Microsoft Academic: is the Phoenix getting wings?

Anne-Wil Harzing
Satu Alakangas

Version November 2016
Accepted for Scientometrics

Copyright © 2016, Anne-Wil Harzing, Satu Alakangas
All rights reserved.

Prof. Anne-Wil Harzing
Middlesex University
The Burroughs, Hendon
London NW4 4BT
Email: [email protected]
Web: www.harzing.com

ANNE-WIL HARZING
Middlesex University
The Burroughs, Hendon, London NW4 4BT
Email: [email protected]
Web: www.harzing.com

SATU ALAKANGAS
University of Melbourne
Parkville Campus, Parkville VIC 3010, Australia

Abstract

In this article, we compare publication and citation coverage of the new Microsoft Academic with all other major sources for bibliometric data: Google Scholar, Scopus, and the Web of Science, using a sample of 145 academics in five broad disciplinary areas: Life Sciences, Sciences, Engineering, Social Sciences, and Humanities. When using the more conservative linked citation counts for Microsoft Academic, this data-source provides higher citation counts than both Scopus and the Web of Science for Engineering, the Social Sciences, and the Humanities, whereas citation counts for the Life Sciences and the Sciences are fairly similar across these three databases. Google Scholar still reports the highest citation counts for all disciplines.

When using the more liberal estimated citation counts for Microsoft Academic, its average citation counts are higher than both Scopus and the Web of Science for all disciplines. For the Life Sciences, Microsoft Academic estimated citation counts are higher even than Google Scholar counts, whereas for the Sciences they are almost identical. For Engineering, Microsoft Academic estimated citation counts are 14% lower than Google Scholar citation counts, whereas for the Social Sciences this is 23%. Only for the Humanities are they substantially (69%) lower than Google Scholar citation counts.

Overall, this first large-scale comparative study suggests that the new incarnation of Microsoft Academic presents us with an excellent alternative for citation analysis. We therefore conclude that the Microsoft Academic Phoenix is undeniably growing wings; it might be ready to fly off and start its adult life in the field of research evaluation soon.


Introduction

The bibliometrics literature is awash with articles reviewing and comparing (the coverage of) the Web of Science, Scopus, and Google Scholar, often in the context of research evaluation (for the latest examples see e.g. Delgado-López-Cózar & Repiso-Caballero, 2013; Wildgaard, 2015; Harzing & Alakangas, 2016). However, so far the bibliometric research community has paid little attention to the fourth data-source in this landscape: Microsoft Academic (Search). Although a Google Scholar search with the words Google Scholar, Web of Science, or Scopus in the title results in hundreds of journal articles for each of these three databases, the same search for Microsoft Academic delivers only six published journal articles (see Harzing, 2016).

A comprehensive analysis of Microsoft Academic Search coverage was published by Orduña-Malea, Martín-Martín, Ayllon, & Delgado Lopez-Cozar (2014). This showed that almost no new material had been added since 2012. Microsoft Academic Search was proclaimed all but dead by the bibliometric community. However, in March 2016 Microsoft officially launched a new service: Microsoft Academic. In May 2016, Harzing (2016) provided, for her own publication record, a detailed comparison of coverage of the new Microsoft Academic with Google Scholar, Scopus, and the Web of Science, and proclaimed it to be “a Phoenix arisen from the ashes”. Harzing (2016) showed that Microsoft Academic significantly outperformed the Web of Science in terms of both publication and citation coverage, and could also be considered to be at least an equal to Scopus on both counts. Only Google Scholar outperformed Microsoft Academic. However, Harzing’s study only looked at a single academic’s publication record and as such its results might be idiosyncratic. The recent review published in D-Lib Magazine’s Sept/Oct issue by Herrmannova and Knoth (2016) presented a high-level comparison of the key entities in the Microsoft Academic database with other publicly available databases, but did not include Google Scholar, Scopus, or the Web of Science, nor did it compare individual academics’ records.

In this article, we thus compare publication and citation coverage of the new Microsoft Academic with Google Scholar, Scopus, and the Web of Science for a sample of 145 academics in five broad disciplinary areas: Life Sciences, Sciences, Engineering, Social Sciences, and Humanities. This comparison will be conducted at a fairly high level of aggregation; unlike Harzing (2016) we will not compare each academic’s individual publication record across databases. Instead, we will look at how Microsoft Academic compares with the three other data sources in terms of the average number of papers, citations, h-index and hIa (see Harzing, Alakangas & Adams, 2014) for the 145 academics in our sample. We first conduct our analysis for the sample as a whole, and subsequently explore the differential coverage across disciplines and individuals. Finally, we investigate the extent to which our findings change if we use the more liberal “estimated citation count” in Microsoft Academic rather than the more conservative “linked citation count”.

Methods

Sample

Our sample consists of 145 Associate Professors and Full Professors at the University of Melbourne, Australia. Constraining our sample to a single university allows us to control for extraneous variability and thus concentrate on the differences between the four databases. Full details of the selection procedures can be found in Harzing and Alakangas (2016). In brief, our sample included all 37 disciplines represented at the University of Melbourne, grouped into five major disciplinary fields:

• Humanities: Architecture, Building & Planning; Culture & Communication; History; Languages & Linguistics; Law (19 observations),
• Social Sciences: Accounting & Finance; Economics; Education; Management & Marketing; Psychology; Social & Political Sciences (24 observations),
• Engineering: Chemical & Biomolecular Engineering; Computing & Information Systems; Electrical & Electronic Engineering; Infrastructure Engineering; Mechanical Engineering (20 observations),
• Sciences: Botany; Chemistry; Earth Sciences; Genetics; Land & Environment; Mathematics; Optometry; Physics; Veterinary Sciences; Zoology (39 observations),
• Life Sciences: Anatomy and Neuroscience; Audiology; Biochemistry & Molecular Biology; Dentistry; Obstetrics & Gynaecology; Ophthalmology; Microbiology; Pathology; Pharmacology; Physiology; Population Health (43 observations) [1].

Table 1 provides the descriptive statistics for our sample. As is clearly apparent, there are large variations both across individuals and across databases.

Table 1: Descriptive statistics: number of papers and citations, h-index and hIa index for 145 academics across Google Scholar, Microsoft Academic, Scopus, and Web of Science

Metric     Data source         N    Minimum  Maximum  Mean  Std. Deviation
Papers     Google Scholar      145  21       541      155   102
Papers     Microsoft Academic  145  12       556      137   99
Papers     Scopus              145  3        381      96    76
Papers     Web of Science      145  3        413      96    82
Citations  Google Scholar      145  76       20427    3982  3614
Citations  Microsoft Academic  145  14       10779    2336  2328
Citations  Scopus              145  2        15121    2413  2626
Citations  Web of Science      145  0        14019    2168  2566
H-index    Google Scholar      145  4        71       29    14
H-index    Microsoft Academic  145  2        56       22    12
H-index    Scopus              145  1        60       22    14
H-index    Web of Science      145  0        58       20    14
hIa index  Google Scholar      145  .08      1.86     .59   .26
hIa index  Microsoft Academic  145  .07      1.38     .42   .20
hIa index  Scopus              145  .04      1.12     .42   .20
hIa index  Web of Science      145  .00      1.13     .38   .19

Data sources and procedures

All data were collected in the first week of October 2016. We used Publish or Perish (Harzing, 2007) to conduct searches for Google Scholar and Microsoft Academic. Traditionally, Publish or Perish has been used primarily in conjunction with Google Scholar, but version 5 of the software has implemented Microsoft Academic support through Microsoft’s API. As PoP5 also provides support for Google Scholar Citation Profiles, we used those for the academics in our sample that had created such a profile (just over 50%). Publish or Perish also offers extensive data import facilities, thus providing the ability to import Scopus and Web of Science data. Searches for Scopus and the Web of Science were therefore conducted in their native interfaces, exported and subsequently imported into Publish or Perish to allow for calculation of the various citation metrics. Final statistics of our 145 academics for all four databases were then exported to Excel, allowing for comparison of paper and citation counts, as well as the h-index and hIa.

Search queries for individual authors were refined on an iterative basis through a detailed comparison of the results for the four databases (for details regarding Google Scholar, Scopus, and Web of Science, see Harzing & Alakangas, 2016). For Microsoft Academic, this involved some experimentation, as there did not seem to be a uniformly “best” way to define queries. For some authors, queries with the full given name worked best; for other authors, searches with one or more initials provided the best results. Given that Microsoft Academic has not implemented a NOT search, which would allow the exclusion of namesakes, we had to search with a combination of author name and keywords for some authors. The relevant keywords were identified by reviewing the authors’ publication records in other databases. This procedure was needed for five authors, making data collection for these authors quite time-consuming (30-60 minutes).

[1] Earlier articles on the same dataset (Harzing, Alakangas & Adams, 2014; Harzing & Alakangas, 2016) included an error in the number of observations by discipline, which were reversed for the Sciences and Life Sciences. This did not impact on any of the articles’ statistics or conclusions, but the error was corrected for this paper. Furthermore, we had to remove one academic in the Life Sciences from the original sample of 146 academics as his name was so common that it was impossible to achieve reliable search results.


Metrics

The following metrics were included in our comparisons:

• Publications: Total number of publications per academic
• Citations: Total number of citations per academic
• H-index: An academic with an index of h has published h papers each of which has been cited in other papers at least h times (Hirsch, 2005)
• hIa: hI,norm / academic age (see Harzing, Alakangas & Adams, 2014), where:
  o hI,norm: normalize the number of citations for each paper by dividing the number of citations by the number of authors for that paper, and then calculate the h-index of the normalized citation counts
  o academic age: number of years elapsed since first publication
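The metric definitions above can be expressed compactly in code. The following is a minimal sketch, not the Publish or Perish implementation: the function names and the example paper list are hypothetical, and academic age is assumed to be the plain difference between the current year and the year of first publication.

```python
from typing import Iterable, Sequence, Tuple


def h_index(citations: Iterable[float]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def hi_norm(papers: Sequence[Tuple[int, int]]) -> int:
    """h-index of author-normalized citation counts.

    Each paper is a (citations, number_of_authors) pair.
    """
    return h_index(cites / authors for cites, authors in papers)


def hia(papers: Sequence[Tuple[int, int]],
        first_publication_year: int,
        current_year: int) -> float:
    """hI,norm divided by academic age (assumed: simple year difference)."""
    academic_age = current_year - first_publication_year
    if academic_age <= 0:
        raise ValueError("academic age must be at least one year")
    return hi_norm(papers) / academic_age


# Hypothetical publication record: (citations, number of authors).
example_papers = [(40, 2), (30, 3), (12, 1), (9, 3), (4, 4), (2, 1)]

print(h_index(c for c, _ in example_papers))  # raw h-index
print(hi_norm(example_papers))                # author-normalized h-index
print(hia(example_papers, 2006, 2016))        # individual annual h-index
```

Dividing by the number of authors before ranking is what lets hI,norm, and hence hIa, dampen the advantage of fields with large author teams.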

Results

First, we note that Microsoft Academic coverage has improved substantially in the 5.5 months since we first studied this new data source. Table 2 provides a longitudinal comparison of the first author’s citation counts in Microsoft Academic with citation counts from the three other databases. A comparison on a publication-by-publication basis showed that citations for all publications had increased in Microsoft Academic over the 5.5-month period. The biggest increase, however, was found for several books or book chapters, as well as some publications in minor journals. In addition, the Publish or Perish software now appeared in Microsoft Academic whereas it didn’t before.

Table 2: Increase of citations over time for an individual academic, comparison across Microsoft Academic, Google Scholar, Scopus and Web of Science

Date                            MA citations  GS citations  Scopus citations  WoS citations
16 May 2016                     3424          10409         2946              1844
MA cites as % of other sources                33%           116%              186%
1 Oct 2016                      5237          11177         3271              2012
MA cites as % of other sources                47%           160%              260%
Monthly increase                9.6%          1.4%          2.0%              1.7%
1 Nov 2016                      5420          11345         3330              2044
Monthly increase                3.5%          1.5%          1.8%              1.6%

Overall, with an average growth of nearly 10% per month, citations increased much more significantly in Microsoft Academic than in any of the other databases, most likely reflecting a significant increase in coverage for the former. At 1.4%-2.0%, monthly increases in citation counts for the three other databases were much more modest, and are very much in line with those reported in Harzing and Alakangas (2016) for a much larger sample.
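The percentage rows in Table 2 can be re-derived from the raw citation counts. The sketch below does this for the "MA cites as % of other sources" rows and for the one-month interval between 1 October and 1 November 2016. The May-to-October monthly figures are left out because they depend on how the elapsed months are counted, which is not spelled out here; the function names are illustrative.

```python
# Citation counts for the first author, copied from Table 2.
counts = {
    "16 May 2016": {"MA": 3424, "GS": 10409, "Scopus": 2946, "WoS": 1844},
    "1 Oct 2016": {"MA": 5237, "GS": 11177, "Scopus": 3271, "WoS": 2012},
    "1 Nov 2016": {"MA": 5420, "GS": 11345, "Scopus": 3330, "WoS": 2044},
}


def ma_share(date: str) -> dict:
    """Microsoft Academic citations as a percentage of each other source."""
    row = counts[date]
    return {src: round(100 * row["MA"] / row[src])
            for src in ("GS", "Scopus", "WoS")}


def pct_increase(start: str, end: str) -> dict:
    """Percentage growth in citations per source between two dates."""
    return {src: round(100 * (counts[end][src] - counts[start][src])
                       / counts[start][src], 1)
            for src in counts[start]}


print(ma_share("16 May 2016"))
print(ma_share("1 Oct 2016"))
print(pct_increase("1 Oct 2016", "1 Nov 2016"))
```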

We also reran our searches for the first author in early November, just before submitting this article. The monthly increase for Microsoft Academic had declined to 3.5%, whereas the increases for the other databases remained at a similar level (1.5%-1.8%). This suggests that whilst Microsoft Academic is still expanding its coverage, it is getting closer to a steady-state citation growth. Finally, we reran both Microsoft Academic and Google Scholar searches for the full sample of 145 academics. As Scopus and Web of Science searches are considerably more time-consuming than searches for Microsoft Academic and Google Scholar, we did not rerun searches for the two former databases. [2] The results showed that, for the overall sample, Microsoft Academic results increased by 2.4% in the last month, compared to an increase for Google Scholar of 1.2%. Again, this suggests that further expansion of Microsoft Academic coverage has slowed down, but that it might still be catching up with Google Scholar.

[2] Once queries were defined, repeating Microsoft Academic searches took less than 10 minutes for the entire sample of 145 academics. Due to the much longer necessary delays between requests, Google Scholar searches took several hours, but did not require continuous attention. Scopus and Web of Science searches took up to a full day and required continuous attention as searches involved quite a number of steps for each individual academic.

In terms of data quality, we note that the issues highlighted in Harzing (2016) (namely several erroneous year allocations, and citations that were split between a version of the publication with the main title only and a version with both the main title and a sub-title) have not yet been resolved, although the Microsoft Academic team have indicated they are working on a resolution.

Key metrics across the entire sample

Figure 1 compares the average number of papers and citations across the four databases. On average, Microsoft Academic reports more papers per academic than Scopus and Web of Science and fewer than Google Scholar. However, in addition to covering a wider range of research outputs (such as books), both Google Scholar and Microsoft Academic also include so-called “stray” publications, i.e. publications that are duplicates of other publications, but with a slightly different title or author variant. [3] Hence, a comparison of papers across databases is probably not very informative. However, citations can be more reliably compared across databases as stray publications typically have few citations. As Figure 1 shows, on average Microsoft Academic citations are very similar to Scopus and Web of Science citations and substantively lower only than Google Scholar citations. On average Microsoft Academic provides 59% of the Google Scholar citations, 97% of the Scopus citations and 108% of the Web of Science citations.

Figure 1: Average number of papers and citations for 145 academics across Google Scholar, Microsoft Academic, Scopus and Web of Science

The aforementioned differences in citation patterns are also reflected in the differences in the average h-index and hIa (individual annual h-index) for our sample (see Figure 2). On average, the Microsoft Academic h-index is 77% of the Google Scholar h-index, equal to the Scopus h-index, and 108% of the Web of Science h-index. The Microsoft Academic hIa-index is on average 71% of the Google Scholar index, equal to the Scopus index and 113% of the Web of Science index. Again Microsoft Academic, Scopus and Web of Science present very similar metrics.

[3] Scopus and the Web of Science also contain stray publications, and often (especially for authors with non-journal publications) a far larger number than Google Scholar and Microsoft Academic. However, strays are not shown when using the general search options, most commonly employed for bibliometric studies. For the first author, Scopus reports no less than 442 secondary documents, in addition to the 71 documents shown in the general search. The Web of Science Cited Reference Search would have shown a similar number if she had not submitted weekly data change reports for years, requesting the merging of stray publications into their respective master records. For the first author’s record, both databases thus have more stray publications than either Google Scholar or Microsoft Academic.

Figure 1 data (averages per academic):

           GS    MA    Scopus  WoS
Papers     155   137   96      96
Citations  3982  2336  2413    2168

Figure 2: Average h-index and hIa for 145 academics across Google Scholar, Microsoft Academic, Scopus and Web of Science

Disciplinary comparisons

This aggregate picture hides quite a lot of differences, both between disciplines and between individuals. As to disciplines, Microsoft Academic has fewer citations than Scopus and, marginally, than Web of Science for the Life Sciences and Sciences (see Figure 3). However, overall citation levels for the Life Sciences and Sciences are fairly similar across three of the four databases. To a lesser extent this is true for Engineering as well. For three of our five disciplines, Microsoft Academic thus differs substantially in citation counts only from Google Scholar, providing between 57% and 67% of Google Scholar citations.

Figure 3: Average citations for 145 academics across Google Scholar, Microsoft Academic, Scopus and Web of Science, grouped by five major disciplinary areas

In the Social Sciences, however, Microsoft Academic has a clear advantage over both Scopus and Web of Science, providing 1.5 to 2 times as many citations for our sample. The difference is even starker for the Humanities, where Microsoft Academic has a coverage that is 1.7 to nearly 3 times as high. In both disciplines however, Microsoft Academic provides fewer citations than Google Scholar, less than half for the Social Sciences and only about a fifth for the Humanities.
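The disciplinary ratios quoted in this section follow directly from the average citation counts plotted in Figure 3. A minimal sketch, with the averages copied from the figure and an illustrative helper name:

```python
# Average citations per academic by discipline, as plotted in Figure 3
# (MA = Microsoft Academic linked citation counts).
avg_citations = {
    "Life Sciences":   {"GS": 5525, "MA": 3701, "Scopus": 4102, "WoS": 3711},
    "Sciences":        {"GS": 4835, "MA": 2830, "Scopus": 3039, "WoS": 2924},
    "Social Sciences": {"GS": 3252, "MA": 1452, "Scopus": 995,  "WoS": 702},
    "Engineering":     {"GS": 2613, "MA": 1496, "Scopus": 1429, "WoS": 1120},
    "Humanities":      {"GS": 1100, "MA": 233,  "Scopus": 137,  "WoS": 80},
}


def ma_ratio(discipline: str, other: str) -> float:
    """Microsoft Academic average citations relative to another source."""
    row = avg_citations[discipline]
    return row["MA"] / row[other]


for discipline in avg_citations:
    ratios = {src: round(ma_ratio(discipline, src), 2)
              for src in ("GS", "Scopus", "WoS")}
    print(discipline, ratios)
```

Expressing every comparison as a ratio against the Microsoft Academic average makes the figures comparable across disciplines whose absolute citation levels differ by more than an order of magnitude.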

Figure 2 data (averages per academic):

         GS    MA    Scopus  WoS
h-index  28.9  22.2  22.3    20.4
hIa      0.59  0.42  0.42    0.38

Figure 3 data (average citations by discipline):

        Life Sciences  Sciences  Social Sciences  Engineering  Humanities
GS      5525           4835      3252             2613         1100
MA      3701           2830      1452             1496         233
Scopus  4102           3039      995              1429         137
WoS     3711           2924      702              1120         80

Confirming our earlier study based on the same sample of academics (Harzing & Alakangas, 2016), the differences between disciplines are much smaller when considering the hIa, which was specifically designed to adjust for career length and disciplinary differences (see Figure 4). Apart from the Humanities, the average hIa for the four disciplines does not differ significantly for any of the four databases when using a more conservative Tukey B test.

Figure 4: Average hIa for 145 academics across Google Scholar, Microsoft Academic, Scopus and Web of Science, grouped by five major disciplinary areas

Again we see that Microsoft Academic provides metrics that are very similar to Scopus and Web of Science for the Life Sciences and the Sciences. For Engineering and the Humanities, the Microsoft Academic hIa is very similar to the Scopus hIa, whereas it is 1.2 (Engineering) to 1.5 times (Humanities) as high as the Web of Science hIa. Only for the Social Sciences is the Microsoft Academic hIa substantially higher than both the Scopus and the Web of Science hIa. The Google Scholar hIa is higher for all disciplines than the Microsoft Academic hIa, from 1.3 times as high for Engineering to 1.9 times as high for the Humanities.

Individual comparisons

The coverage of the respective databases differs substantially by individual (see Table 3). Google Scholar citations were higher than Microsoft Academic citations for all but one individual in our sample. Although on average Microsoft Academic reports a very similar level of citations to Scopus and the Web of Science, it has a higher level of citations for 55% of the academics than Scopus does, and a higher level for 72% of the academics when compared with Web of Science. Among the 8-10% of the academics who have substantially lower citation levels in Microsoft Academic than in Scopus and Web of Science are several academics whose older publications (30+ years old) cannot be found in Microsoft Academic. Others have publications with many (500-1500) co-authors that cannot be found in Microsoft Academic when searching for their name.

Table 3: Individual comparisons of Microsoft Academic citation counts with Google Scholar, Scopus and Web of Science

Number of academics (out of 145) for whom citation counts are lower or higher than Microsoft Academic citation counts

Data source     Lower than MA  <5% higher  5%-10% higher  10%-25% higher  >25% higher
Google Scholar  1*             -           -              10              134 (92%)
Scopus          80 (55%)       13          13             25              14 (10%)
Web of Science  105 (72%)      7           8              13              12 (8%)

* This concerned a Google Scholar search problem where, as the academic’s last name was very common, we were forced to search with 2 initials, thus missing some citations. The overall citation count was 8% lower than in Microsoft Academic.
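The binning used in Table 3 can be sketched as a small classification function. The boundary handling is an assumption, since the table does not say which bucket an exact 5%, 10% or 25% difference falls into, and equal counts are placed in the "<5% higher" bucket here. The citation-count pairs are hypothetical, not data from our sample.

```python
from collections import Counter


def bucket(other_count: int, ma_count: int) -> str:
    """Classify another source's citation count relative to the MA count.

    Integer arithmetic avoids floating-point surprises at the boundaries.
    """
    if ma_count <= 0:
        raise ValueError("MA count must be positive for a relative comparison")
    excess = other_count - ma_count
    if excess < 0:
        return "Lower than MA"
    if 100 * excess < 5 * ma_count:
        return "<5% higher"        # assumption: equal counts land here
    if 100 * excess < 10 * ma_count:
        return "5%-10% higher"
    if 100 * excess <= 25 * ma_count:
        return "10%-25% higher"
    return ">25% higher"


def share(n: int, total: int = 145) -> int:
    """Number of academics as a rounded percentage of the sample."""
    return round(100 * n / total)


# Hypothetical (other source, MA) citation-count pairs for five academics.
pairs = [(920, 1000), (1030, 1000), (1070, 1000), (1200, 1000), (1900, 1000)]
print(Counter(bucket(other, ma) for other, ma in pairs))
print(share(80), share(105), share(134))
```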

Figure 4 data (average hIa by discipline):

        Life Sciences  Sciences  Social Sciences  Engineering  Humanities
GS      0.66           0.59      0.71             0.53         0.38
MA      0.46           0.43      0.53             0.42         0.21
Scopus  0.49           0.47      0.43             0.39         0.19
WoS     0.44           0.45      0.35             0.34         0.14

MAS estimated citation counts

Microsoft Academic only includes citation records if it can validate both citing and cited papers as credible. Credibility is established through a sophisticated machine learning based system and citations that are not credible are dropped. [4] The number of dropped citations, however, is used to estimate “true” citation counts. [5] These estimated citation counts were added to the Microsoft Academic database in July/August 2016. In our sample, Microsoft Academic estimated citation counts (API attribute ECC) were on average 66% higher than Microsoft Academic linked citation counts (API attribute CC). This hides large differences between individuals though. Around 10% of the academics have estimated citation counts that are identical to their linked citation counts or are at most 25% higher, whereas another 20% see an increase of between 25% and 50%. The largest group of academics (60%) experiences increases of between 50% and 75%, whereas the remaining 10% see increases over 75%, some seeing their citation counts double or more than double.

Replicating our detailed study of the first author’s publication record (Harzing, 2016), we find that for all but one of the 40 journal articles included in her h-index of 49, the Microsoft Academic estimated citation count is within -24%/+20% of the Google Scholar citation count, with absolute differences ranging from -34 to +42 citations. More than half of the absolute differences are in a range of -/+10 citations. The overall citation count for these 40 journal articles is 8060 in Google Scholar and 8198 in Microsoft Academic, i.e. there is less than 2% difference overall between the two databases. It appears as if, at least for the first author’s own record, the two data-sources achieve convergent results. The main remaining difference between the two data-sources concerns non-journal publications. However, even in this category two publications (a research monograph and the Publish or Perish software) achieve very similar citation levels across the two databases, whereas obviously neither research output is covered in Scopus or Web of Science.

Taking Microsoft Academic estimated citation counts rather than linked citation counts as our basis for the comparison with Scopus, Web of Science, and Google Scholar does change the comparative picture quite dramatically. Looking at our overall sample of 145 academics, Microsoft Academic’s average estimated citation counts (3873) are much higher than both Scopus (2413) and Web of Science (2168) citation counts. This is also true when we compare the average citation counts by discipline. Microsoft Academic estimated citation counts are 1.5 times as high as Scopus counts for the Life Sciences, Sciences, and Engineering and 2.5 times as high for the Social Sciences and Humanities. When comparing Microsoft Academic estimated citation counts with Web of Science citation counts, we find them to be 1.6-1.7 times as high for the Sciences and Life Sciences, twice as high for Engineering, 3.5 times as high for the Social Sciences, and more than 4 times as high for the Humanities. It is clear that in terms of estimated citation counts, Microsoft Academic provides a significantly broader coverage than the two commercial databases, especially for the Social Sciences and Humanities.

[4] Since MA sources publication records from the entire web, it often finds multiple versions of the same article, and in many cases, they don’t agree on the details. A machine learning based system corroborates multiple accounts of the same publication, and only if a confidence threshold is passed does MA deem the record credible and assign a unique “paper entity ID” to it. A citing paper can fail the test and not get an entity ID if MA cannot verify its claimed publication venue, or authorships. The same verification is conducted on each referred article as well. A citation can fail the test for the same aforementioned reasons, or if the paper title is changed. If the test fails because of the publication date, the system can self-correct as more corroborative evidence is observed from the web crawl. [Wang, 2016]

[5] Estimated citation counts use a technique statisticians have developed to estimate the true size of a population if one can only observe a small portion, but can afford to sample multiple times. The math allows taking a portion of the data, counting how many “new” items are not seen before, and inferring how small a portion was sampled. MA’s “linked” citations are a statistical sample of the true citations each paper receives. MA can also find other samples from the web, including GS, other publishers’ websites, etc. MA combines all these as multiple samples and applies the size estimation formula on them. The estimation quality is better if the statistics from samples agree more with one another. As a result, the variance in the estimated counts is not uniform. For fields that have done a better job to put publications online, there are smaller differences between MA and GS results. [Wang, 2016]
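The description in footnote 5 resembles capture-recapture population-size estimation. The footnote does not name the formula Microsoft Academic applies, so the sketch below is only an illustration of the general idea under an assumption: it uses the classic two-sample Lincoln-Petersen estimator, and the sets of citing-paper IDs are hypothetical.

```python
def lincoln_petersen(sample_a: set, sample_b: set) -> float:
    """Two-sample estimate of the size of the full population.

    If the samples are independent, the share of sample_b that was already
    seen in sample_a approximates the share of the population that
    sample_a covers:  N ~ |A| * |B| / |A and B|.
    """
    overlap = len(sample_a & sample_b)
    if overlap == 0:
        raise ValueError("samples share no items; the estimate is undefined")
    return len(sample_a) * len(sample_b) / overlap


# Hypothetical citing-paper IDs for one cited paper:
linked_citations = set(range(0, 60))   # 60 citations validated as "linked"
other_source = set(range(40, 90))      # 50 citations seen in another sample

# 20 citations appear in both samples, so the estimate is 60 * 50 / 20.
print(lincoln_petersen(linked_citations, other_source))
```

The smaller the overlap between the samples, the larger the inferred number of unseen citations, which is consistent with the footnote's remark that estimates are more stable when the samples agree with one another.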


However, Microsoft Academic average estimated citation counts (3873) are also very similar to Google Scholar’s average counts (3982), presenting a difference of less than 3%. Again though, this does obscure rather large differences in comparative citation counts between disciplines and individuals. With regard to disciplines, Figure 5 shows that although Microsoft Academic estimated citation counts are closer to Google Scholar citation counts for all disciplines, Microsoft Academic gets closer for some disciplines than for others. For the Life Sciences Microsoft Academic estimated citation counts are in fact 12% higher than Google Scholar counts, whereas for the Sciences they are almost identical. The availability of repositories such as PubMed reliably informs Microsoft Academic how many papers are behind paywalls that neither Microsoft nor Google have been able to crawl. For Engineering, Microsoft Academic estimated citation counts are 14% lower than Google Scholar citations, whereas for the Social Sciences this is 23%. Only for the Humanities are they substantially (69%) lower than Google Scholar citations. This is most likely caused by Google Books providing Google with an edge over Microsoft Academic for the Social Sciences and Humanities.
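The percentage differences quoted in this paragraph can be checked against the discipline averages behind Figure 5 and the overall sample averages. A minimal sketch; the variable and function names are illustrative:

```python
# Average citations per academic by discipline, as plotted in Figure 5:
# (Google Scholar, MA linked counts, MA estimated counts).
figure5 = {
    "Life Sciences":   (5525, 3701, 6167),
    "Sciences":        (4835, 2830, 4744),
    "Social Sciences": (3252, 1452, 2499),
    "Engineering":     (2613, 1496, 2252),
    "Humanities":      (1100, 233, 337),
}


def pct_diff(value: float, reference: float) -> int:
    """Signed percentage difference of value relative to reference."""
    return round(100 * (value - reference) / reference)


# MA estimated counts relative to Google Scholar, per discipline.
ecc_vs_gs = {d: pct_diff(ecc, gs) for d, (gs, _, ecc) in figure5.items()}
print(ecc_vs_gs)

# Overall sample averages quoted in the text.
print(pct_diff(3873, 3982))  # estimated counts vs Google Scholar
print(pct_diff(3873, 2336))  # estimated counts vs linked counts
```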

Figure 5: Comparison of average Microsoft Academic estimated citation counts with Google Scholar citation counts and Microsoft Academic linked citation counts, grouped by five major disciplinary areas

Looking at individual academics, Table 4 shows that Microsoft Academic estimated citation counts are higher than Web of Science citation counts for 96% of the academics and higher than Scopus citation counts for 94% of the academics. Of the six academics with lower citation counts in Microsoft Academic than in Web of Science, two had very few citations overall and thus the very small difference of respectively 6 and 24 citations between Microsoft Academic and Web of Science made up between 6 and 10% of their citation record. Two other academics, working in Molecular Biology and Astrophysics, had missing publications in Microsoft Academic, resulting in substantially lower citation counts. In the first case, this concerned the academic’s two most highly cited papers, co-authored respectively with 250+ and 1500+ academics. In the second case, half of the academic’s papers and three quarters of his citations concerned papers from large consortia with 500-1000 authors, none of which were found in Microsoft Academic for the author in question. Two further academics had published a very significant number of articles in the 1960s, 1970s, and 1980s that were generally highly cited in Web of Science; Microsoft Academic citations for these older publications, however, were very low. This might be due to more limited coverage in Microsoft Academic in the early years. Herrmannova and Knoth (2016) showed that Microsoft Academic coverage lies below 1 million documents a year before 1980, increasing to 3 million a year around 2000, with a further increase to around 7 million a year in recent years.

Figure 5 data (average citations by discipline; MA ECC = Microsoft Academic estimated citation counts):

        Life Sciences  Sciences  Social Sciences  Engineering  Humanities
GS      5525           4835      3252             2613         1100
MA      3701           2830      1452             1496         233
MA ECC  6167           4744      2499             2252         337

Table 4: Individual comparisons of Microsoft Academic estimated citation counts with Google Scholar, Scopus and Web of Science

Number of academics (out of 145) for whom citation counts are lower or higher than Microsoft Academic estimated citation counts

Data source     Lower than MA  <5% higher  5%-10% higher  10%-25% higher  >25% higher
Google Scholar  60 (41%)       9           6              27              43 (30%)
Scopus          136 (94%)      -           3              3               3 (2%)
Web of Science  139 (96%)      -           2              -               4 (3%)

Of the nine individuals with lower citation counts in Microsoft Academic than in Scopus, four had very few citations overall (37-168 Scopus citations), so that relatively small differences between Microsoft Academic and Scopus made up 6-19% of their citation count. One further academic in the Sciences had only 85 fewer citations in Microsoft Academic (6% lower) as some of his older publications had low citation counts, even though citation counts in Microsoft Academic for his recent publications were generally higher than in Scopus. The remaining four academics with lower estimated citation counts in Microsoft Academic were identical to the four we discussed above, suffering from missing publications and lower citation levels for publications before 1985.

Microsoft Academic estimated citation counts are higher than Google Scholar citation counts for 41% of the academics in our sample. Differences are generally not very large though; only 15% of the academics have Microsoft Academic ECCs that are more than 25% higher than Google Scholar citations. For nearly 60% of the academics in our sample, Microsoft Academic estimated citation counts are lower than their Google Scholar citation counts. This includes all of the Humanities scholars, all but two of the Social Scientists, and all but three of the Engineering academics. Closer inspection revealed, however, that the two Social Scientists in question were Neuro-psychologists. Hence, even though we classified the four Psychology academics in our sample as Social Scientists, publication patterns for two of them were in fact much closer to the Life Sciences. Likewise, two of the three Engineering academics were in Molecular and Chemical Engineering and had publication patterns that were arguably closer to the Sciences. Thus it appears that, both at an overall and at an individual level, Microsoft Academic estimated citation counts are still lower than Google Scholar citation counts for the three disciplines that in previous studies have been shown to benefit most from the expanded coverage of Google Scholar (Harzing & Alakangas, 2016): Engineering, the Social Sciences, and Humanities. This is not the case for the Sciences and the Life Sciences, however. Nearly 60% of the academics in the Sciences have higher Microsoft Academic estimated citation counts than Google Scholar citation counts; for the Life Sciences the proportion was as high as 75%.

Discussion and Conclusion

In this article, we compared publication and citation coverage of the new Microsoft Academic with all other major sources for bibliometric data: Google Scholar, Scopus, and the Web of Science, using a sample of 145 academics in five broad disciplinary areas: Life Sciences, Sciences, Engineering, Social Sciences, and Humanities. We showed that Microsoft Academic compares well with both Scopus and the Web of Science in terms of coverage. When using the more conservative linked citation count for Microsoft Academic, this data-source provided higher citation counts than Scopus and the Web of Science for Engineering, the Social Sciences, and the Humanities, whereas citation counts for the Life Sciences and the Sciences were fairly similar across the three databases. Google Scholar still provided the highest citation counts for all disciplines. At an individual level Microsoft Academic presented higher citation counts for 55% of the academics when compared to Scopus and for 72% of the academics when compared with the Web of Science. Google Scholar, however, still provided the highest citation counts for all but one of the academics in our sample.

When using the more liberal estimated citation counts for Microsoft Academic, its average citation counts were higher than both Scopus and the Web of Science for all disciplines. For the Life Sciences, Microsoft Academic estimated citation counts are even higher than Google Scholar counts, whereas for the Sciences they are almost identical. For Engineering, Microsoft Academic estimated citation counts are 14% lower than Google Scholar citations, whereas for the Social Sciences this is 23%. Only for the Humanities are they substantially (69%) lower than Google Scholar citations. At an individual level, Microsoft Academic had higher citation counts for virtually all academics than Scopus and the Web of Science. However, academics in Engineering, the Social Sciences and Humanities still had higher citation counts in Google Scholar, reflecting the latter’s more comprehensive coverage of books and non-traditional research outputs.

Overall, this first large-scale comparative study suggests that the new incarnation of Microsoft Academic presents us with an excellent alternative for citation analysis. This verdict would be strengthened further if coverage for books and non-traditional research outputs could be improved and the remaining data quality issues regarding year allocation and main/sub title split could be resolved. Our limited comparison of citation growth over the last 6 months also suggests that Microsoft Academic is still increasing its coverage. We therefore conclude that the Microsoft Academic Phoenix is undeniably growing wings; it might be ready to fly off and start its adult life in the field of research evaluation soon.

References

Delgado-López-Cózar, E., & Repiso-Caballero, R. (2013). El impacto de las revistas de comunicación: comparando Google Scholar Metrics, Web of Science y Scopus. Comunicar: Revista Científica de Comunicación y Educación, 21(41), 45-52.

Harzing, A.W. (2007) Publish or Perish, available from http://www.harzing.com/pop.htm

Harzing, A.W. (2016) Microsoft Academic (Search): A Phoenix arisen from the ashes?, Scientometrics, 108(3), 1637-1647.

Harzing, A.W., & Alakangas, S. (2016) Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison, Scientometrics, 106(2), 787-804.

Harzing, A.W., Alakangas, S. & Adams, D. (2014) hIa: An individual annual h-index to accommodate disciplinary and career length differences, Scientometrics, 99(3), 811-821.

Herrmannova, D., & Knoth, P. (2016). An Analysis of the Microsoft Academic Graph. D-Lib Magazine, 22(9/10).

Hirsch, J.E. (2005) An index to quantify an individual's scientific research output, arXiv:physics/0508025v5, 29 Sep 2006.

Orduña-Malea, E., Martín-Martín, A., M. Ayllon, J., & Delgado Lopez-Cozar, E. (2014). The silent fading of an academic search engine: the case of Microsoft Academic Search. Online Information Review, 38(7), 936-953.

Wang, K. (2016) Personal communication with Kuansan Wang, Managing Director at Microsoft Research Outreach, 31 October 2016.

Wildgaard, L. (2015). A comparison of 17 author-level bibliometric indicators for researchers in Astronomy, Environmental Science, Philosophy and Public Health in Web of Science and Google Scholar. Scientometrics, 104(3), 873-906.

