Google Scholar as a source for scholarly evaluation: a bibliographic review of database errors

Authors

Orduna-Malea, E., Martín-Martín, A., & Delgado López-Cózar, E.

DOI:

https://doi.org/10.3989/redc.2017.4.1500

Keywords:

Google Scholar, Academic search engines, Bibliographic databases, Errors, Quality

Abstract


Google Scholar (GS) is an academic search engine and discovery tool launched by Google (now Alphabet) in November 2004. Because GS reports the number of citations each article receives from all other indexed articles, regardless of their source, it has come to be used in bibliometric analysis and academic assessment, especially in the social sciences and humanities. However, the existence of errors, sometimes of great magnitude, has provoked criticism from the academic community. The aim of this article is to carry out an exhaustive bibliographic review of all studies that provide either specific or incidental empirical evidence of the errors found in Google Scholar. The results indicate that the bibliographic corpus dedicated to errors in Google Scholar is still very limited (n = 49), excessively fragmented, and diffuse. The findings are not based on any systematic methodology or on mutually comparable units, so the errors cannot be quantified, nor their impact analysed, with any precision. Certain limitations of the search engine itself (the time required for data cleaning, and the limits on citations per search result and on hits per query) may explain this absence of empirical studies.

Published

2017-12-30

How to Cite

Orduna-Malea, E., Martín-Martín, A., & Delgado López-Cózar, E. (2017). Google Scholar as a source for scholarly evaluation: a bibliographic review of database errors. Revista Española De Documentación Científica, 40(4), e185. https://doi.org/10.3989/redc.2017.4.1500

Issue

Vol. 40, No. 4 (2017)

Section

Studies
