
Earth Science Technology Forum (ESTF2017), June 13-15, 2017, Pasadena, CA

Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

NASA AIST (NNX15AM85G)

Dr. Chaowei (Phil) Yang, Mr. Yongyao Jiang, Ms. Yun Li
Geography and GeoInformation Science, George Mason University

Mr. Edward M. Armstrong, Mr. Thomas Huang, Mr. David Moroni, Mr. Chris Finch, Dr. Lewis J. McGibbney, Mr. Frank Greguska, Mr. Gary Chen
Jet Propulsion Laboratory, NASA

Agenda

•  Project Background
   –  Problems
   –  Objectives
   –  Functions
•  System
   –  Log mining
   –  Query semantics
   –  Ranking
   –  Recommendation
•  Results
•  Next step

Data Discovery Problems

•  Keyword-based matching (traditional search engines)
   –  User query: ocean wind
   –  Final query: ocean AND wind

•  Reveal the real intent of the user query
   –  ocean wind = "ocean wind" OR "greco" OR "surface wind" OR "mackerel breeze" …

•  PO.DAAC UWG Recommendation 2014-07

•  NASA ESDSWG Search Relevance Recommendations 2016 & 2017

Objectives

•  Analyze web logs to discover user knowledge (query and data relationships)
•  Construct a knowledge base by combining semantics and the profile analyzer
•  Improve data discovery through 1) better ranking; 2) recommendation; 3) ontology navigation

Functions/Modules

•  Web log preprocessing
•  Semantic analysis of user queries & navigation
•  Machine learning based search ranking
•  Data recommendation

Web log processing

Web logs

•  Requests sent from a client (e.g. browser, command-line tool) are recorded by the server
•  Log files provided by PO.DAAC (HTTP(S), FTP)

Client IP: 68.180.228.99
Request date/time: [31/Jan/2015:23:59:13 -0800]
Request: "GET /datasetlist/... HTTP/1.1"
HTTP code: 200
Bytes returned: 84779
Referrer/previous page: "/ghrsst/"
User agent/browser: "Mozilla/5.0 ..."

68.180.228.99 - - [31/Jan/2015:23:59:13 -0800] "GET /datasetlist/... HTTP/1.1" 200 84779 "/ghrsst/" "Mozilla/5.0 ..."
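The field layout above matches the common "combined" access log format, so one regular expression can split a raw line into the labeled fields. The following is a minimal sketch, not the MUDROD implementation: the pattern, the function name `parse_log_line`, and the sample line (with a shortened placeholder path and user agent) are assumptions made for illustration.

```python
import re
from datetime import datetime

# Combined log format: ip ident user [time] "request" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record

# Hypothetical line modeled on the slide; path and agent are shortened placeholders.
sample = ('68.180.228.99 - - [31/Jan/2015:23:59:13 -0800] '
          '"GET /datasetlist HTTP/1.1" 200 84779 "/ghrsst/" "Mozilla/5.0"')
record = parse_log_line(sample)
```

Lines that do not follow the layout return `None`, which lets a caller count and skip malformed entries instead of failing on them.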

Data preprocess

Goal: reconstruct user browsing patterns (search history & clickstream) from a set of raw logs

[Figure: preprocessing workflow. Stages shown: web logs, user identification, crawler detection, structure reconstruction, session identification. Outputs: search history and clickstream]

Additional steps include word normalization, stop word removal, and stemming
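Session identification can be illustrated with a simple inactivity rule. This is only a sketch under two assumptions that the slides do not confirm: users are identified by IP address, and a session ends after 30 minutes of inactivity. The function name `split_sessions` and the sample requests are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed inactivity threshold; the slides do not state the value that was used.
SESSION_GAP = timedelta(minutes=30)

def split_sessions(requests, gap=SESSION_GAP):
    """Group (ip, timestamp, url) tuples into per-user sessions.

    A new session starts whenever the same IP has been idle for longer than `gap`.
    """
    sessions = []
    last_seen = {}     # ip -> timestamp of that user's previous request
    open_session = {}  # ip -> index of that user's current session
    for ip, ts, url in sorted(requests, key=lambda r: r[1]):
        if ip not in last_seen or ts - last_seen[ip] > gap:
            sessions.append({"ip": ip, "urls": []})
            open_session[ip] = len(sessions) - 1
        sessions[open_session[ip]]["urls"].append(url)
        last_seen[ip] = ts
    return sessions

# Hypothetical requests from two users.
t0 = datetime(2015, 1, 31, 23, 0, 0)
requests = [
    ("1.1.1.1", t0, "/search?q=ocean+wind"),
    ("1.1.1.1", t0 + timedelta(minutes=2), "/dataset/A"),
    ("2.2.2.2", t0 + timedelta(minutes=3), "/search?q=sst"),
    ("1.1.1.1", t0 + timedelta(hours=2), "/search?q=salinity"),
]
sessions = split_sessions(requests)
```

Each resulting session keeps its URLs in time order, so a search request followed by dataset views can be read off as one search-history entry with its clickstream.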

Reconstructed session structure

Data preprocess results

1. User search history
2. Clickstream

Jiang, Y., Y. Li, C. Yang, E. M. Armstrong, T. Huang & D. Moroni (2016) "Reconstructing Sessions from Data Discovery and Access Logs to Build a Semantic Knowledge Base for Improving Data Discovery." ISPRS International Journal of Geo-Information, 5, 54.

Semantic similarity

Existing ontology (SWEET)
•  SWEET (Raskin and Pan 2003)
•  Focus on only two relations
•  The closer, the more similar

User search history
•  Create a query–user matrix
•  Calculate binary cosine similarity

[Figure: conceptual example]
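A binary cosine similarity over a query–user matrix can be sketched as follows. Each row of the matrix is stored as the set of users who issued that query, so the dot product is the size of the intersection. The user IDs and counts are invented for illustration and are not taken from the PO.DAAC logs.

```python
from math import sqrt

def binary_cosine(users_a, users_b):
    """Cosine similarity of two binary vectors stored as sets of user IDs."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))

# Hypothetical query-user matrix: each query maps to the set of users who issued it.
query_users = {
    "ocean temperature": {"u1", "u2", "u3", "u4"},
    "sea surface temperature": {"u1", "u2", "u3", "u5"},
    "ocean wind": {"u4", "u6"},
}
sim_sst = binary_cosine(query_users["ocean temperature"],
                        query_users["sea surface temperature"])
sim_wind = binary_cosine(query_users["ocean temperature"],
                         query_users["ocean wind"])
```

In this toy data the two temperature queries share three of four users and score 0.75, while "ocean wind" shares a single user and scores lower.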

Clickstream
•  Hypothesis: similar queries can result in similar clicking behavior
•  If two queries are similar, the data that get clicked after they are searched would be more likely to be similar

[Figure: Query a and Query b both linked to Data; are the two queries similar?]

Metadata
•  Hypothesis: semantically related terms tend to appear in the same metadata more frequently
•  Essentially the same as the clickstream analysis
•  Perform Latent Semantic Analysis (LSA) over the term–metadata matrix

[Figure: Query a and Query b both linked to Metadata; are the two queries similar?]

Integration
•  All four results can be converted to (Concept A, Concept B, Similarity) triples
•  Problems:
   –  None of them is perfect (uncertainty in data, hypothesis, and method)
   –  Metadata and ontology might contain terms unknown to search engine end users
   –  Sometimes, similarity values from different methods are inconsistent

Integration

•  The maximum similarity of all of the components (a large similarity appears to be more reliable)
•  The adjustment increment becomes larger when the similarity exists in more sources

Semantic Similarity Calculation Workflow

Results and evaluation

Query: ocean temperature

Source            Related terms (similarity)
Search history    sea surface temperature (0.66), sea surface topography (0.56), ocean wind (0.56), aqua (0.49)
Clickstream       sea surface temperature (0.94), sst (0.94), group high resolution sea surface temperature dataset (0.89), ghrsst (0.87)
Metadata          sst (0.96), ghrsst (0.77), sea surface temperature (0.72), surface temperature (0.63), reynolds (0.58)
SWEET             None
Integrated list   sst (1.0), sea surface temperature (1.0), ghrsst (1.0), group high resolution sea surface temperature dataset (0.99), reynolds sea surface temperature (0.74)

Sample group (evaluated by domain experts)    Overall accuracy
Most popular 10 queries                       88%
Least popular 10 queries                      61%
Randomly selected 10 queries                  83%

Jiang, Y., Y. Li, C. Yang, K. Liu, E. M. Armstrong, T. Huang & D. Moroni (2017) A Comprehensive Approach to Determining the Linkage Weights among Geospatial Vocabularies - An Example with Oceanographic Data Discovery. International Journal of Geographical Information Science (minor revision)

What can we use it for?

•  Query suggestion
•  Query modification
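Query modification can be pictured as rewriting the user query into an OR of the query and its most similar terms, in the style of the "ocean wind" example earlier in the deck. The sketch below uses the integrated list reported for "ocean temperature". The 0.9 cutoff and the function name `expand_query` are assumptions, not values or names from the project.

```python
def expand_query(query, related_terms, threshold=0.9):
    """Join the query and its sufficiently similar terms into one OR query."""
    terms = [query] + [term for term, sim in related_terms
                       if sim >= threshold and term != query]
    return " OR ".join(f'"{term}"' for term in terms)

# Integrated similarity list for "ocean temperature" as reported in the results table.
integrated = [
    ("sst", 1.0),
    ("sea surface temperature", 1.0),
    ("ghrsst", 1.0),
    ("group high resolution sea surface temperature dataset", 0.99),
    ("reynolds sea surface temperature", 0.74),
]
expanded = expand_query("ocean temperature", integrated)
```

Raising the threshold keeps the expansion close to the original intent, while lowering it trades precision for recall.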

Search ranking

Background

•  Ranking is a long-standing problem in geospatial data discovery
•  Typically, a query returns hundreds or even thousands of matches
•  The number of matches can grow as more Earth observation data is collected

Objective and Methods

•  Put the most desired data at the top of the result list
•  What features can represent users' search preferences for geospatial data?
•  How can the ranking function reach a balance of all these features?
•  Identified eleven features from
   –  Geospatial metadata attributes
   –  Query–metadata content overlap
   –  User behavior from web logs

Ranking features – Metadata attributes

Feature                  Description
Release date             The date when the data was published
Processing level (PL)    The processing level of image products, ranging from level 0 to level 4
Version number           The published version of the data
Spatial resolution       The spatial resolution of the data
Temporal resolution      The temporal resolution of the data

•  Five metadata features
•  Verified by domain experts
•  Query-independent: static, depends on the data itself, won't change with the query

Spatial query-metadata overlap

•  Spatial similarity between the query area and the coverage of a particular dataset
•  Overlap area normalized by the original areas of the query and the data

Ranking features – User behavior
•  All-time, monthly, user popularity, and semantic popularity (retrieved from web logs)
•  Semantic popularity: the number of times that the data has been clicked after searching a particular query and its highly related ones (query-dependent)

[Figure: the query "ocean temperature" is linked to "ocean temperature", "sea surface temperature", and "sst", each with similarity 1.0; Data A has click counts of 10, 5, and 5 for these queries]
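The figure's numbers suggest how semantic popularity could be accumulated: clicks on a dataset are summed over the query and its highly related queries. The sketch below additionally weights each count by the query similarity, which is an assumption of this example. With all weights at 1.0, as in the figure, it reduces to a plain sum. The pairing of click counts with queries is also assumed.

```python
def semantic_popularity(dataset, related_queries, clicks):
    """Sum clicks on `dataset` over a query and its related queries.

    Weighting each count by the query similarity is an assumption of this sketch.
    """
    return sum(weight * clicks.get((query, dataset), 0)
               for query, weight in related_queries.items())

# Similarities and click counts as shown in the figure (pairing assumed).
related = {"ocean temperature": 1.0, "sea surface temperature": 1.0, "sst": 1.0}
clicks = {
    ("ocean temperature", "Data A"): 10,
    ("sea surface temperature", "Data A"): 5,
    ("sst", "Data A"): 5,
}
popularity = semantic_popularity("Data A", related, clicks)
```

Under these assumptions Data A receives a semantic popularity of 20 for the query "ocean temperature", which is why the feature is query-dependent: a different query selects a different set of related queries.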

RankSVM
•  One of the well-recognized ML ranking algorithms
•  Converts a ranking problem into a classification problem that a regular SVM algorithm can solve
•  3 main steps
   Ø  1) Standardize: mean = 0, std = 1
      o  SVM is not scale invariant
      o  Over-optimized
      o  Longer to train
   Ø  2) For any pair of training data, calculate the difference
   Ø  3) The ranking problem becomes a binary classification problem, where SVM is applied to find the optimal decision boundary
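Steps 1 and 2 can be sketched with the standard library alone. The example standardizes each feature column and then turns every pair of items with different relevance into a difference vector labeled +1 or -1. Step 3, training the SVM on these pairs, is left out. The feature values, relevance grades, and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def standardize(rows):
    """Scale every feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def pairwise_transform(rows, relevance):
    """Turn ranked items into difference vectors labeled +1 or -1."""
    diffs, labels = [], []
    for i, j in combinations(range(len(rows)), 2):
        if relevance[i] == relevance[j]:
            continue  # ties carry no ordering information
        diffs.append([a - b for a, b in zip(rows[i], rows[j])])
        labels.append(1 if relevance[i] > relevance[j] else -1)
    return diffs, labels

# Hypothetical training data: two features per dataset and a relevance grade.
features = [[3.0, 10.0], [1.0, 30.0], [2.0, 20.0]]
relevance = [2, 0, 1]
diffs, labels = pairwise_transform(standardize(features), relevance)
```

Any binary linear SVM trained on `diffs` and `labels` then yields a weight vector whose dot product with an item's standardized features serves as its ranking score.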

Architecture Sec. 15.4.2

•  All of these (except for training) can be finished within 2 seconds
•  None of the mainstream open source ML libraries provides a ranking algorithm
•  Implemented it ourselves with the aid of Spark MLlib

[Figure: ranking architecture. Components shown: index, user query, semantic query, top K retrieval, ranking model, learning algorithm, training data, re-ranked results, feature extractor, web logs, user clicks, knowledge base]

[Figure: NDCG(K) for five different ranking methods at varying K (1-40)]

Jiang, Y., Y. Li, C. Yang, K. Liu, E. M. Armstrong, T. Huang, D. Moroni & L. J. McGibbney (2017) Towards intelligent geospatial discovery: a machine learning ranking framework. International Journal of Digital Earth (minor revision)
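For readers unfamiliar with the metric, NDCG(K) compares the discounted gain of a ranked list with that of its ideal ordering. The sketch below uses the common exponential gain, 2^rel - 1, with a log2 position discount. The slides do not say which variant was used, so treat the gain function and the sample relevance grades as assumptions.

```python
from math import log2

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance grades."""
    return sum((2 ** rel - 1) / log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades in the order a ranker returned the results.
ranked_relevance = [3, 2, 3, 0, 1]
score = ndcg_at_k(ranked_relevance, 5)
```

A list that is already in ideal order scores 1.0, and any misplaced highly relevant item lowers the score, more so the nearer it is to the top.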

Data recommendation

How to recommend geospatial data?
•  Use geospatial metadata for content-based recommendation
   -  Metadata spatiotemporal similarity
   -  Metadata attribute similarity
   -  Metadata content similarity
•  Leverage user behavior data for collaborative filtering (CF) recommendation
   -  Session-based co-occurrence of data

Geographic metadata

Attribute type                     Attribute name                   Attribute description
Spatiotemporal attributes          DatasetCoverage-EastLon          The east longitude of the bounding rectangle
                                   DatasetCoverage-WestLon          The west longitude of the bounding rectangle
                                   DatasetCoverage-NorthLat         The north latitude of the bounding rectangle
                                   DatasetCoverage-SouthLat         The south latitude of the bounding rectangle
                                   DatasetCoverage-StartTimeLong    The start time of the data
                                   DatasetCoverage-StopTimeLong     The end time of the data
Categorical geographic attributes  DatasetRegion-Region             Region of the dataset, such as global or Atlantic
                                   Dataset-ProjectionType           Projection type, such as cylindrical lat-lon
                                   Dataset-ProcessingLevel          Data processing level
                                   DatasetPolicy-DataFormat         Data format, e.g. HDF, NetCDF
                                   DatasetSource-Sensor-ShortName   Short name of the sensor
Ordinal geographic attributes      Dataset-TemporalResolution       Temporal resolution of the dataset
                                   Dataset-TemporalRepeat           Temporal resolution of the dataset
                                   Dataset-SpatialResolution        Spatial resolution of the dataset
Descriptive attributes             Dataset-description              Describes the content of the dataset

Spatiotemporal similarity

•  Spatial variables: NorthLat, SouthLat, WestLon, EastLon
•  Temporal variables: DatasetCoverage-StartTimeLong, StopTimeLong
•  Use the volume overlap ratio to calculate similarity

$$\mathrm{spatiotemporal\_sim}(r_i, r_j) = \left(\frac{\mathrm{volume}(r_i \cap r_j)}{\mathrm{volume}(r_i)} + \frac{\mathrm{volume}(r_i \cap r_j)}{\mathrm{volume}(r_j)}\right) \times 0.5$$

$$\mathrm{volume}(r) = |\mathrm{eastlon} - \mathrm{westlon}| \times |\mathrm{southlat} - \mathrm{northlat}| \times |\mathrm{endtime} - \mathrm{starttime}|$$
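The volume overlap ratio can be sketched by treating each record as a box in longitude, latitude, and time. The record keys, the sample records, and the use of years as the time unit are assumptions for illustration. The sketch also ignores bounding boxes that cross the antimeridian.

```python
def interval_overlap(lo_a, hi_a, lo_b, hi_b):
    """Length of the overlap of two closed intervals (0.0 if disjoint)."""
    return max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))

def volume(r):
    """Spatiotemporal volume of a record: lon extent * lat extent * time extent."""
    return (abs(r["east"] - r["west"]) * abs(r["south"] - r["north"])
            * abs(r["end"] - r["start"]))

def spatiotemporal_sim(ri, rj):
    """Mean of the shared volume relative to each record's own volume."""
    vi, vj = volume(ri), volume(rj)
    if vi == 0 or vj == 0:
        return 0.0
    shared = (interval_overlap(ri["west"], ri["east"], rj["west"], rj["east"])
              * interval_overlap(ri["south"], ri["north"], rj["south"], rj["north"])
              * interval_overlap(ri["start"], ri["end"], rj["start"], rj["end"]))
    return 0.5 * (shared / vi + shared / vj)

# Hypothetical records; times are expressed as years for readability.
global_2000s = {"west": -180, "east": 180, "south": -90, "north": 90,
                "start": 2000, "end": 2010}
atlantic_2005 = {"west": -80, "east": 0, "south": 0, "north": 60,
                 "start": 2005, "end": 2015}
sim = spatiotemporal_sim(global_2000s, atlantic_2005)
```

Averaging the two ratios keeps a small regional dataset from looking identical to a global one merely because it lies entirely inside the global coverage.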

Categorical similarity

•  Fixed number of values
•  No intrinsic ordering
•  sensor-name: "AMSR-E", "MODIS", "AVHRR-3", and "WindSat"

$$\mathrm{categorical\_var\_sim}(v_i, v_j) = \frac{|v_i \cap v_j|}{|v_i \cup v_j|}$$
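Read as a ratio of set sizes, the formula is the Jaccard index. A minimal sketch, assuming each dataset carries a set of values for the attribute (for example, several sensors); the sample sensor lists are invented:

```python
def categorical_var_sim(values_i, values_j):
    """Jaccard similarity: shared values divided by all distinct values."""
    set_i, set_j = set(values_i), set(values_j)
    union = set_i | set_j
    return len(set_i & set_j) / len(union) if union else 0.0

# Hypothetical sensor lists for two datasets.
sim = categorical_var_sim(["AMSR-E", "MODIS"], ["MODIS", "AVHRR-3", "WindSat"])
```

For single-valued attributes this reduces to 1.0 when the two values match and 0.0 otherwise.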

Ordinal similarity

An ordinal attribute is similar to a categorical attribute, but its values have a clear order, e.g. spatial resolution
•  Converted into ranks from 1 to R
•  Normalize these ranks for similarity calculation

$$\mathrm{ordinal\_var\_sim}(v_i, v_j) = 1 - |\mathrm{norm\_rank}(v_i) - \mathrm{norm\_rank}(v_j)|$$

$$\mathrm{norm\_rank}(v_i) = \frac{\mathrm{Rank}_{v_i} + 1}{R + 1}$$
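A short sketch of the ordinal similarity, following the normalized-rank expression as it appears on the slide. The resolution labels and their ordering are hypothetical and serve only to give the ranks something to refer to.

```python
def norm_rank(rank, num_ranks):
    """Normalized rank as written on the slide: (rank + 1) / (R + 1)."""
    return (rank + 1) / (num_ranks + 1)

def ordinal_var_sim(rank_i, rank_j, num_ranks):
    """One minus the absolute difference of the two normalized ranks."""
    return 1 - abs(norm_rank(rank_i, num_ranks) - norm_rank(rank_j, num_ranks))

# Hypothetical ordering of spatial resolutions, coarsest to finest.
RESOLUTION_RANK = {"1 degree": 1, "0.25 degree": 2, "4 km": 3, "1 km": 4}
R = len(RESOLUTION_RANK)
sim_near = ordinal_var_sim(RESOLUTION_RANK["4 km"], RESOLUTION_RANK["1 km"], R)
sim_far = ordinal_var_sim(RESOLUTION_RANK["1 degree"], RESOLUTION_RANK["1 km"], R)
```

Adjacent ranks stay close to 1, and the similarity falls linearly as the ranks move apart.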

Descriptive similarity

Original text
Aquarius Level 3 sea surface salinity (SSS) standard mapped image data contains gridded 1 degree spatial resolution SSS averaged over daily, 7 day, monthly, and seasonal time scales. This particular data set is the seasonal climatology, Ascending sea surface salinity product for version 4.0 of the Aquarius data set, which is the official end of prime mission public data release from the AQUARIUS/SAC-D mission. Only retrieved values for Ascending passes have been used to create this product. The Aquarius instrument is onboard the AQUARIUS/SAC-D satellite, a collaborative effort between NASA and the Argentinian Space Agency Comision Nacional de Actividades Espaciales (CONAE). The instrument consists of three radiometers in push broom alignment at incidence angles of 29, 38, and 46 degrees incidence angles relative to the shadow side of the orbit. Footprints for the beams are: 76 km (along-track) x 94 km (cross-track), 84 km x 120 km and 96 km x 156 km, yielding a total cross-track swath of 370 km. The radiometers measure brightness temperature at 1.413 GHz in their respective horizontal and vertical polarizations (TH and TV). A scatterometer operating at 1.26 GHz measures ocean backscatter in each footprint that is used for surface roughness corrections in the estimation of salinity. The scatterometer has an approximate 390 km swath.

Extracted terms
Radiometers Measure Brightness Temperature, AQUARIUS/SAC Mission, Image Data, Broom Alignment, Resolution SSS, AQUARIUS/SAC Satellite, Scatterometer, Scatterometer, Aquarius Data, Argentinian Space Agency Comision Nacional, Incidence Angles, Time Scales, Actividades Espaciales, Cross-track Swath, Official End, Aquarius Instrument, Shadow Side, Ascending Sea Surface Salinity Product, Level, Level, Surface Roughness Corrections, Data Release, Salinity, Density, Salinity, Density, AQUARIUS, L3, SSS, SMIA, SEASONAL-CLIMATOLOGY, V4

Step 1: Phrase extraction
1.  Extract term candidates from the metadata description with POS (part of speech) tagging
2.  Introduce "occurrence" and "strength" to filter terms out of the candidates
    "occurrence": the number of occurrences of a term
    "strength": the number of words in a term

Metadata abstract semantic similarity

Step 2: Represent metadata in the phrase vector space (its dimension is lower than that of the word feature space)

             Term 1  Term 2  Term 3  Term 4  Term 5  Term 6  Term 7  …  Term N
dataset 1    1       0       1       0       0       0       0          1
dataset 2    0       0       0       1       1       0       0          0
dataset k    1       0       1       0       0       0       0          1

Step 3: Calculate cosine similarity

Session-based recommendation

          Session 1  Session 2  …  Session N
Data 1    1          0             1
Data 2    0          0             0
…
Data k    1          0             1

Calculate metadata similarity based on session-level co-occurrence

$$\mathrm{Similarity}(i, j) = \frac{N(i \cap j)}{\sqrt{N(i)} \times \sqrt{N(j)}}$$

N(i): the number of sessions in which dataset i was viewed or downloaded
N(j): the number of sessions in which dataset j was viewed or downloaded
N(i ∩ j): the number of sessions in which both datasets i and j were viewed or downloaded
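The co-occurrence formula can be turned into a small recommender: for a target dataset, score every dataset that shares at least one session with it and sort by similarity. This is a sketch with invented sessions; the function name `session_similarities` is hypothetical.

```python
from collections import Counter
from math import sqrt

def session_similarities(sessions, target):
    """Similarity(target, j) = N(target ∩ j) / (sqrt(N(target)) * sqrt(N(j)))."""
    # N(j): number of sessions containing each dataset
    counts = Counter(d for s in sessions for d in s)
    # N(target ∩ j): number of sessions containing both target and j
    together = Counter(d for s in sessions if target in s
                       for d in s if d != target)
    return {d: n / (sqrt(counts[target]) * sqrt(counts[d]))
            for d, n in together.items()}

# Hypothetical sessions: each set holds the datasets viewed or downloaded in it.
sessions = [
    {"Data 1", "Data k"},
    {"Data 2"},
    {"Data 1", "Data 2", "Data k"},
    {"Data 1"},
]
sims = session_similarities(sessions, "Data 1")
ranked = sorted(sims, key=sims.get, reverse=True)
```

Datasets that never share a session with the target are absent from the result, which is the cold start limitation: newly published data has no sessions to draw on.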

Hybrid recommendation

Recommendation method: Descriptive similarity
   Pros:     1. Natural language processing methods can be adopted to find latent semantic relationships
   Cons:     1. Many datasets have nearly the same abstract with only a few words/values changed
             2. It is hard to extract detailed attributes from the description
   Strategy: Used as the basic method of the recommendation algorithm

Recommendation method: Attribute similarity (spatiotemporal, ordinal, categorical)
   Pros:     1. As structured data, geographic metadata have many variables
   Cons:     1. Variable values may be null or wrong
             2. The quality depends on the weight assigned to every variable
   Strategy: As a supplement to semantic similarity

Recommendation method: Session co-occurrence
   Pros:     1. Reflects users' preference
   Cons:     1. Cold start problem: newly published data don't have usage data
   Strategy: Fine-tune the recommendation list

$$\mathrm{Recommend}(i) = W_{ss} \cdot \mathrm{DescriptiveSimilarity}(i) + \sum W_{cv} \cdot \mathrm{CategoricalSimilarity}(i) + \sum W_{ov} \cdot \mathrm{OrdinalSimilarity}(i) + W_{stv} \cdot \mathrm{SpatioTemporalSimilarity}(i) + W_{so} \cdot \mathrm{SessionSimilarity}(i)$$
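The hybrid score is a weighted sum, with one weight per categorical and per ordinal variable. The sketch below shows the arithmetic only. All weights and component similarities are made-up numbers, since the slides do not list the values that were used.

```python
def hybrid_score(sims, weights):
    """Weighted sum of the component similarities for one candidate dataset."""
    score = weights["descriptive"] * sims["descriptive"]
    score += sum(weights["categorical"][name] * value
                 for name, value in sims["categorical"].items())
    score += sum(weights["ordinal"][name] * value
                 for name, value in sims["ordinal"].items())
    score += weights["spatiotemporal"] * sims["spatiotemporal"]
    score += weights["session"] * sims["session"]
    return score

# Hypothetical weights; the slides do not list the values that were used.
weights = {
    "descriptive": 0.4,
    "categorical": {"sensor": 0.1, "format": 0.05},
    "ordinal": {"spatial_resolution": 0.1},
    "spatiotemporal": 0.15,
    "session": 0.2,
}
# Hypothetical component similarities between a clicked dataset and one candidate.
candidate = {
    "descriptive": 0.8,
    "categorical": {"sensor": 1.0, "format": 0.0},
    "ordinal": {"spatial_resolution": 0.5},
    "spatiotemporal": 0.6,
    "session": 0.9,
}
score = hybrid_score(candidate, weights)
```

Because the weights here sum to 1, the score stays in [0, 1] whenever every component similarity does, which keeps candidates comparable.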

Quantitative Evaluation

[Figure: precision (y-axis, 0 to 1.2) versus position (x-axis, 1 to 10) for hybrid similarity, word-based similarity, attribute similarity, and session-based similarity]

Hybrid similarity outperforms the other similarities since it integrates metadata attributes and user preference.

Y. Li, Jiang, Y., C. Yang, K. Liu, E. M. Armstrong, T. Huang, D. Moroni & L. J. McGibbney (2017) A Geospatial Data Recommender System based on Metadata and User Behaviour (revision)

Conclusion

•  Log mining enables a data portal to integrate implicit user preferences
•  Word similarity retrieved by data mining tasks expands any given query to improve search recall and precision
•  The rich set of ranking features and the ML algorithm provide substantial advantages over other ranking methods
•  The recommendation algorithm can discover latent data relevancy
•  The proposed architecture enables a loosely coupled software structure for a data portal and avoids the cost of replacing the existing system

Products

•  Publications
   •  Jiang, Y., Y. Li, C. Yang, E. M. Armstrong, T. Huang & D. Moroni (2016) Reconstructing Sessions from Data Discovery and Access Logs to Build a Semantic Knowledge Base for Improving Data Discovery. ISPRS International Journal of Geo-Information, 5, 54.
   •  Y. Li, Jiang, Y., C. Yang, K. Liu, E. M. Armstrong, T. Huang & D. Moroni (2016) Leverage cloud computing to improve data access log mining. IEEE Oceans 2016.
   •  Yang, C., et al., 2017. Big Data and cloud computing: innovation opportunities and challenges. International Journal of Digital Earth, 10(1), pp. 13-53. (the 2nd most read paper of IJDE in its decadal history)
   •  Jiang, Y., Y. Li, C. Yang, K. Liu, E. M. Armstrong, T. Huang & D. Moroni (2017) A Comprehensive Approach to Determining the Linkage Weights among Geospatial Vocabularies - An Example with Oceanographic Data Discovery. International Journal of Geographical Information Science (minor revision)
   •  Jiang, Y., Y. Li, C. Yang, K. Liu, E. M. Armstrong, T. Huang, D. Moroni & L. J. McGibbney (2017) Towards intelligent geospatial discovery: a machine learning ranking framework. International Journal of Digital Earth (minor revision)
   •  Y. Li, Jiang, Y., C. Yang, K. Liu, E. M. Armstrong, T. Huang, D. Moroni & L. J. McGibbney (2017) A Geospatial Data Recommender System based on Metadata and User Behaviour (revision)
   •  Jiang, Y., Y. Li, C. Yang, K. Liu, E. M. Armstrong, T. Huang, D. Moroni & L. J. McGibbney (2017) A smart web-based data discovery system for ocean sciences. (ongoing)
•  Source code: https://github.com/mudrod/mudrod
•  PO.DAAC Labs: http://mudrod.jpl.nasa.gov/
•  PD Leverage: http://pd.cloud.gmu.edu/

Component                                  Current TRL  Project end TRL  Description

Semantic search engine
Search Dispatcher                          7            7                Translating a user search query into a set of new semantic queries
Similarity calculator                      7            7                Calculating the semantic similarity from web logs, metadata, and ontology
Recommendation module                      7            7                Recommending similar datasets to the clicked dataset
Ranking module                             7            7                Re-ranking the search results based on the RankSVM ML algorithm

Knowledge base
Ontology                                   7            7                Extensions from the SWEET ontology for earth science data
Triple Store                               7            7                ESIP ontology repository

Vocabulary linkage discovery engine
Profile analyzer                           7            7                Extracting user browsing patterns from raw web logs

Web services/GUI
Ranking service/presenter                  7            7                Providing and presenting the ranked results
Recommendation service/presenter           7            7                Providing and presenting the related datasets
Ontology navigation service/presenter      7            7                Providing and presenting related searches

Next steps

•  Add more features (e.g., temporal similarity)
•  Create training data from web logs for RankSVM
•  Develop a query understanding module to better interpret the user's search intent (e.g. "ocean wind level 3" -> "ocean wind" AND "level 3")
•  Support Solr
•  Support near real-time data ingestion to dynamically update the knowledge base
•  Integration with DOMS and OceanXtremes for an ocean science analytics center
•  Leverage advanced computing techniques to speed up the process

Presentations

•  Yang C., Jiang Y., Li Y., Armstrong E., Huang T., and Moroni D., 2015. "Utilizing Advanced IT Technologies to Support MUDROD to Advance Data Discovery and Access", AGU, San Francisco, CA.
•  Yang C., Jiang Y., Li Y., Armstrong E., Huang T., and Moroni D., 2016. "Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access", ESIP Winter Meeting 2016, Washington D.C.
•  Jiang Y., Yang C., Li Y., Armstrong E., Huang T., and Moroni D., 2016. "A Comprehensive Approach to Determining the Linkage Weights among Geospatial Vocabularies - An Example with Oceanographic Data Discovery", AAG 2016, San Francisco, CA.
•  Yang C., Jiang Y., Li Y., Armstrong E., Huang T., and Moroni D., 2016. "Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access", PO.DAAC UWG, Pasadena, CA.
•  Li Y., Yang C., Jiang Y., Armstrong E., Huang T., and Moroni D., 2016. "Leveraging cloud computing to speed up user access log mining", Oceans 16 MTS IEEE, Monterey, CA.
•  Jiang Y., Yang C., Li Y., Armstrong E., Huang T., and Moroni D., 2017. "Towards intelligent geospatial discovery: a machine learning ranking framework", AAG 2017, Boston, MA.
•  Li Y., Yang C., Jiang Y., Armstrong E., Huang T., and Moroni D., 2017. "A geographic recommender system using metadata and user feedbacks", AAG 2017, Boston, MA.

Acknowledgements

1.  NASA AIST Program (NNX15AM85G)
2.  PO.DAAC SWEET Ontology Team (initially funded by ESTO)
3.  Hydrology DAAC Rahul Ramachandran (providing the earlier version of NOESIS)
4.  ESDIS for providing testing logs of CMR
5.  All team members at JPL and GMU