1. INTRODUCTION
Place is a complex concept that can be evoked in many ways, from a physical space (real or imaginary) to a place of understanding or even a social place (Lefebvre, 1991; Pillet Capdepón, 2004), through a multitude of quasi-synonymous terms (e.g., space, territory, location, area, site, landscape). The concept of place can differ significantly depending on the academic domain in which it is handled, be it geography, politics, history, law, sociology, psychology, etc. For this reason, and despite advances in technology and the development of geographic information systems (GIS) and their use in science (Longley et al., 2005), some authors have posited the lack of an overarching theory of place (Sui and Goodchild, 2011; Cresswell, 2014).
However, various attempts have been made to define place based on a consideration of its different uses. Agnew (1987), for example, identifies three fundamental components of the concept: i) location, i.e., an element with fixed objective coordinates about a given point; ii) locale, i.e., the material settings for social relations; and iii) sense of place, i.e., the subjective, emotional attachments people have to a place.
In the science of science, a pluralist sensibility towards the nature of science emerged with the publication of Thomas Kuhn’s seminal work (1962). From that date, the geographical perspective of science grew significantly, showing that “science is indelibly marked by the local and the spatial circumstances of its making” (Shapin, 1998), promoting the undertaking of studies that sought to determine the local or regional impact that science organizations (e.g., universities) have on development, thus linking science and location (Grossetti, 1995; Grossetti et al., 2007; Sterlacchini, 2008; Belenzon and Schankerman, 2013).
Subsequently, the development of pioneering bibliographic databases has enabled us to study different spatial aspects of science (Frenken et al., 2009) and represent science spatially through maps of science (Small and Garfield, 1985). The development of GIS, in turn, has made it possible to explore the intersection between bibliometrics and geographic information in greater depth (Xuemei et al., 2014) in what has become known as spatial bibliometrics (Frenken et al., 2009).
Spatial bibliometrics has facilitated the study of: i) the geographical distribution of scientists (Gaillard, 1991); ii) the geographical distribution of disciplines (Carvalho and Batty, 2006); iii) scientific productivity by place, including country-level (e.g., Zhou and Leydesdorff, 2006), city-level (Van Noorden, 2010; Eckert et al., 2013; Bornmann and De Moya-Anegón, 2019), institutional-level (Leydesdorff and Persson, 2010), and group-level (Cuyala, 2013; Maisonobe, 2013) analyses; iv) citation-based impact by place (Batty, 2003; Wuestman et al., 2019); v) research excellence (Bornmann et al., 2011; Bornmann and Waltman, 2011); vi) scientific collaboration, including conceptual (Cronin, 2008), methodological (Katz, 1994), and applied (Gazni et al., 2012; Hoekman et al., 2009) studies; vii) proximity, including its concept (Frenken et al., 2009) and effects (Ponds et al., 2007; Pan et al., 2012); viii) scientific mobility (Laudel, 2003; Robinson-Garcia et al., 2019), including the brain drain phenomenon (Laudel, 2003); ix) the geographical location of research funders (Grassano et al., 2017); x) the geopolitics of university rankings (Pietrucha, 2018); xi) attendance at scientific events (Van Dijk and Maier, 2006); xii) local and regional scholarly studies (Tijssen et al., 2006); xiii) science maps (Borner, 2010); and xiv) the multi-affiliation of authors (Halevi et al., 2023).
Although spatial bibliometrics has provided meta-science with a geographical perspective, the view is limited because its geographical information is extracted exclusively from an author’s institutional affiliation. However, scholarly publications contain other relevant geographical information in the metadata fields of their bibliographic records (Castro-Torres and Alburez-Gutiérrez, 2022; Miguel et al., 2024) and the main body of their texts (Acheson and Purves, 2021) that can provide insights into the place where a particular study has been conducted, the specific location where samples have been taken or analyzed, the area in which fieldwork has been performed, or the city where a study’s subjects have been interviewed or surveyed. Hence, the design of new scientometric indicators based on place mentions would allow researchers to measure and gain insights into other aspects related to the influence of place on research and vice versa.
Publications concerned with the extraction of geographical information from scientific publications - beyond, that is, the authors’ affiliations - have garnered limited attention. Nevertheless, this body of literature exhibits a solid technical and experimental nature, and has occupied itself with a range of disciplines or scientific domains rich in such information, including orchards and cancer genetics (Acheson and Purves, 2021), food packaging (Lentschat et al., 2020), phylogeography (Weissenbacher et al., 2015; 2017; 2019), the life and earth sciences (Karl, 2019), geology (Leveling, 2015; Kmoch and Uuemaa, 2018), biology (Scott et al., 2021), and ecology (Tamames and De Lorenzo, 2010; Martin et al., 2012; Karl et al., 2013). Among studies taking an information science (and bibliometric) approach, local science-oriented publications stand out at both national (Chinchilla-Rodríguez et al., 2015; Miguel et al., 2015; González et al., 2019; Miguel et al., 2023a) and regional (Arias and González, 2021) levels. Unlike publications based on authors’ affiliations, however, most of this body of literature lacks a scientometric conceptual basis, especially as regards the measurement of places using quantitative indicators.
A few attempts have been made to conceptualize place-mention analysis from a bibliometric perspective (Page, 2010; Eckert et al., 2013; Cascón-Katchadourian et al., 2023); however, these works generally employ basic metrics (e.g., the number of publications mentioning a place) and do not exhaustively exploit all extant place metrics. Thus, a general Scientometrics-inspired framework is necessary to situate the concept within quantitative science studies, integrating, in this way, traditional studies of authors’ affiliation and scientific collaboration with modern studies of place mentions.
Therefore, the present paper’s main objective is to propose and define a Scientometrics framework for the study of place mentions in the scientific literature that can serve as a baseline for future conceptual, methodological, or descriptive studies.
2. METHODS
The framework employed herein was developed systematically from the study of the scientific literature, on the one hand, and from a review undertaken by experts and the authors of this paper, on the other. It was, moreover, constructed by exploiting a sequential point of view obtained in the following three stages: the identification of a place (through place mentions), the description of a place (in terms of its essential characteristics), and the measurement of a place (using bibliometric indicators).
In the identification stage, the scientific literature dedicated to the geoparsing of scholarly publications was taken into consideration - above all, the work of Acheson and Purves (2021) - as well as the literature oriented towards highlighting inconsistencies in authors’ institutional affiliations, e.g., Taşkın and Al (2014). Second, the description stage was concerned with identifying the attributes of a place. To this end, a brainstorming session was conducted among the authors of this paper - all of whom have extensive experience in publishing and reviewing scholarly publications - aimed at identifying these attributes and agreeing on classes for each attribute. The first author distilled all this information, which the other four debated, discussed, and edited. Finally, the measurement stage involved identifying areas based on geographical information (i.e., place mentions), establishing bibliographic relationships between authors, publications, and places (e.g., mentions, citations, co-citations, and bibliographic couplings), and assigning geographical relationships between places (e.g., local, national, regional, and international), via physical and institutional proximity estimations (Frenken et al., 2009).
The scientific literature on spatial bibliometrics (see Frenken et al., 2009) was used to integrate affiliation-related indicators within this framework. Additionally, the concept of ‘heterogeneous couplings’ (Costas et al., 2021) was adopted to refer to the relationships between scholarly and non-scholarly publications when establishing the spatial relationship between two publications mentioning places.
3. RESULTS
The study of place in scholarly literature can be structured into three main blocks (Figure 1). The first block - presence, detected by place identification - is concerned with accurately identifying a place in a scholarly publication, paying particular attention to linguistic uses. The second block - description, focused on the characteristics of place - defines a previously identified place by means of a distinct set of attributes. The third block - metrics - establishes indicators that reflect the uses of a place in the scientific literature.
The three blocks making up this study of place (as depicted in Figure 1) are described in greater depth in the following subsections.
3.1. Presence of science places
When considering the places mentioned in the authors’ affiliation fields, the inherent lack of spatial precision in the information supplied must be borne in mind (Eckert et al., 2013). Such inaccuracies may arise from typographical errors, different degrees of geographical detail (i.e., where the same place may be described with varying levels of information), topographic variations, or insufficient data (e.g., identical place names within the same country). All these factors constitute a substantial challenge to efforts to assign a specific geographical location with any degree of accuracy based on the addresses provided by authors in their publications. This challenge is exacerbated when seeking to set a place in an aggregate space, such as a scientific area (Bornmann and De Moya-Anegón, 2019), given that it requires complex, time-consuming data-cleaning processes.
In the case of places mentioned in the body of the text, the challenge is even more significant. Here, the study of places requires the precise identification and recognition of toponyms (Bensalem and Kholladi, 2010), a process referred to as geoparsing (Leidner and Lieberman, 2011). However, this is frequently hindered by several critical technical limitations attributable to the authors’ linguistic uses, making the identification task more complex. Table I shows the main obstacles to the accurate recognition of place names.
The complexity of this task also depends on the purpose or nature of the analysis. A closed analysis (i.e., identifying all mentions of one place or region) is more straightforward than an open analysis (i.e., identifying all place names mentioned). Likewise, a monolingual analysis (i.e., a corpus of documents written in a single language) is more accessible than a multilingual analysis (i.e., a corpus of publications in different languages). Similarly, a metadata-level analysis (i.e., an analysis of publications based on specific descriptive fields, such as the title, abstract, or keywords) can be completed more readily than a full-text analysis; however, the former is insufficient for identifying all the places that may have been mentioned in a single publication.
Whole-part relationships constitute an additional element of complexity. For example, if a publication mentions the city of Madrid, we can infer that Spain has been indirectly mentioned since Madrid is located in Spain. For this reason, in some studies, whole-part relationships need to be established between places to offer results at different levels of aggregation.
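The whole-part inference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PART_OF` table, the `ancestors` and `rollup` helpers, and the sample mentions are hypothetical, and a real study would draw containment links from a gazetteer rather than a hard-coded dictionary.

```python
from collections import Counter

# Hypothetical child -> parent containment links (assumption for illustration).
PART_OF = {
    "Madrid": "Spain",
    "La Plata": "Argentina",
    "Spain": "Europe",
    "Argentina": "South America",
}

def ancestors(place):
    """Return the chain of places that contain `place`, nearest first."""
    chain = []
    seen = {place}
    while place in PART_OF:
        place = PART_OF[place]
        if place in seen:  # guard against cyclic data
            break
        seen.add(place)
        chain.append(place)
    return chain

def rollup(mentions):
    """Count direct mentions and, separately, mentions inferred from parts."""
    direct = Counter(mentions)
    inferred = Counter()
    for place, n in direct.items():
        for parent in ancestors(place):
            inferred[parent] += n
    return direct, inferred

# Two mentions of Madrid imply two indirect mentions of Spain (and of Europe).
direct, inferred = rollup(["Madrid", "Madrid", "La Plata"])
```

Keeping direct and inferred counts in separate counters lets results be reported at any level of aggregation without confusing explicit mentions with inferred ones.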
Toponym resolution is the linking of a mentioned place to the unambiguous spatial footprint of that same place (e.g., latitude/longitude coordinates). It can be a further element of complexity in identifying a place name, especially in studies that visualize places on 2D or 3D maps, and it becomes critical when the original place is incompletely or ambiguously mentioned.
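As a minimal sketch of why ambiguous mentions make resolution critical, the following Python fragment looks a toponym up in a toy gazetteer and resolves it only when exactly one candidate remains. The `GAZETTEER` entries (with approximate coordinates), the `Footprint` type, and the `candidates` and `resolve` functions are all assumptions made for illustration. They do not describe any particular geoparser.

```python
from typing import NamedTuple

class Footprint(NamedTuple):
    name: str
    country: str
    lat: float
    lon: float

# Toy gazetteer with approximate coordinates (illustrative assumption).
GAZETTEER = [
    Footprint("Cordoba", "Spain", 37.89, -4.78),
    Footprint("Cordoba", "Argentina", -31.42, -64.18),
    Footprint("La Plata", "Argentina", -34.92, -57.95),
]

def candidates(toponym):
    """Return every gazetteer entry whose name matches, ignoring case."""
    key = toponym.strip().casefold()
    return [f for f in GAZETTEER if f.name.casefold() == key]

def resolve(toponym, country=None):
    """Return a single footprint, or None if unknown or still ambiguous."""
    found = candidates(toponym)
    if country is not None:
        found = [f for f in found if f.country == country]
    return found[0] if len(found) == 1 else None

ambiguous = resolve("Cordoba")                  # two candidates, unresolved
resolved = resolve("Cordoba", country="Spain")  # context removes the ambiguity
```

Returning `None` rather than the first match avoids silently placing an ambiguous mention at the wrong point on a map.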
Identifying place names may also be influenced by the scientific traditions that typify different research disciplines and which operate distinct representation mechanisms and rules when mentioning places.
Finally, the names of territories may change because of political conflicts and war. Thus, cities or countries may have changed their name (e.g., Madras became Chennai; Byzantium became Constantinople and later Istanbul), disappeared (e.g., Ctesiphon, the ancient Persian capital city), been disaggregated (e.g., Yugoslavia was broken down into Bosnia and Herzegovina, Croatia, Montenegro, North Macedonia, Serbia, and Slovenia), changed their geographical limits (e.g., the Roman empire), changed ownership (e.g., Strasbourg has belonged to both France and Germany), or ended up sharing the same border (e.g., the twin cities of Nova Gorica in Slovenia and Gorizia in Italy).
3.2. Characteristics of science places
After identifying a place, the next step is to describe it. The following attributes must be defined to make this description:
Position
This attribute refers to the actual location where the mention of a place appears. Here, we propose breaking a publication down into three zones: A (affiliation), B (body of text, including title, abstract, and keywords), and C (references), in which the mentions of a place can occur (Figure 2).
While zones A and C, which focus on the authors’ institutional affiliations, have been extensively addressed in the literature, zone B has barely been explored. Mentions in this zone do not necessarily correspond to affiliations but rather to places used explicitly in the research.
Within zone B, mentions can be distinguished according to the specific section of the article in which they appear: i.e., title, abstract, keywords, introduction, method, results, discussion, conclusions, acknowledgments, or supplementary material (with each discipline/document type potentially introducing variants in the names afforded these sections).
Nomenclature
This attribute captures how the place is mentioned in the publication. An author might use a place name, its geographic coordinates, an image, or a geocode, among other options.
Method
The method employed when mentioning a place may be either direct (e.g., the sample was collected in “La Plata”) or indirect (e.g., the samples were analyzed at the “Instituto Médico Platense” [La Plata Medical Institute]). In the first instance, the place (La Plata, an Argentine city) is explicitly mentioned, while in the second, the place can be inferred from the mention of a hospital located in the city of La Plata.
Geographical category
This attribute refers to the class of the place, as used in geographical nomenclators. Thus, we can distinguish between administrative divisions, populated places and buildings, hydrography, orography, and transportation infrastructure.
Geographical scope
This attribute refers to the breadth of the mention, which might range from a specific space (e.g., a mountain, a bridge, or a building) to a city (e.g., Jeddah), a region (Mecca), a country (Saudi Arabia), a supranational area (Arabia) or a continent (Asia). It applies above all to places categorized as administrative divisions.
Role
This attribute indicates the specific function a mention has in a scholarly publication. Depending on the mention, a place can be categorized as playing either an endogenous (i.e., a mention directly related to the research conducted) or exogenous (i.e., a mention unrelated to the study carried out) role (see Table II). It should be borne in mind that the same place can take on different roles in the same publication depending on the mention it receives.
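The six attributes above could be recorded for each mention in a simple data structure. The Python sketch below is one possible encoding, not the authors' implementation: the class and field names, the free-text values for nomenclature, category, and scope, and the sample roles are all hypothetical. In particular, the role of the affiliation mention is left unset, because assigning it would require the classes in Table II.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Zone(Enum):
    AFFILIATION = "A"
    BODY = "B"
    REFERENCES = "C"

class Method(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"

class Role(Enum):
    ENDOGENOUS = "endogenous"
    EXOGENOUS = "exogenous"

@dataclass(frozen=True)
class PlaceMention:
    place: str
    zone: Zone               # position: zone A, B, or C
    section: Optional[str]   # position within zone B, e.g. "method"
    nomenclature: str        # e.g. "place name", "coordinates", "geocode"
    method: Method
    category: str            # e.g. "populated place", "hydrography"
    scope: str               # e.g. "city", "country"
    role: Optional[Role]     # None when not yet classified (assumption)

# Illustrative mentions of one place within a single publication.
mentions = [
    PlaceMention("La Plata", Zone.BODY, "method", "place name",
                 Method.DIRECT, "populated place", "city", Role.ENDOGENOUS),
    PlaceMention("La Plata", Zone.BODY, "introduction", "place name",
                 Method.DIRECT, "populated place", "city", Role.EXOGENOUS),
    PlaceMention("La Plata", Zone.AFFILIATION, None, "place name",
                 Method.INDIRECT, "populated place", "city", None),
]

def mentions_by_zone(items):
    """Count mentions per zone letter."""
    return Counter(m.zone.value for m in items)

def roles_of(items, place):
    """Collect the distinct roles a place takes on in a publication."""
    return {m.role for m in items if m.place == place and m.role is not None}
```

Storing one record per mention, rather than one per place, preserves the point made above that a single place can play different roles within the same publication.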
3.3. Place metrics
Once the places have been identified and described, their use in the scholarly literature is quantified in the next stage. The term “place metrics” has been specifically coined in the present study to embrace all the metrics related to mentioning places, regardless of their geographical scope, role, method, and nomenclature.
Figure 3 depicts the conceptual framework devised - a Spatial Framework to identify Bibliographic Relationships (SFBR) - in which all “place metrics” are identified. Here, a “cited scholarly publication” is taken as a baseline. This publication can be cited either by another scholarly publication (referred to as “citing scholarly publication”) or by a non-scholarly publication (referred to as “citing non-scholarly publication”). The cited scholarly publication can, in turn, cite other publications that appear in its zone C (see Figure 2) as cited references. Finally, users may consume the cited scholarly publication (e.g., read, download) in different places. Since the acts of citing (active, oriented to productivity) and being cited (passive, oriented to impact) lead to different indicators, they have been separated in the model to illustrate the bibliographic relationships in greater detail. They are, however, two sides of the same act (if A cites B, then B is cited by A).
The different place metrics identified in Figure 3 (letters A to Q) are defined below. For clarity, we assume just one author per citing or cited publication. However, geographic relationships must be computed for each citing/cited author pair.
- A. Multi-affiliation
The author might have one or more institutional affiliations, including a city, region, and country. In this case, a relationship is established between all the author’s affiliations.
- B. Collaboration
The author might collaborate with other authors, co-authoring a publication. In this case, a relationship is established between the affiliations of the co-authors.
- C. Place mention (from author’s affiliation)
The author might mention a place in the publication. In this case, a relationship is established between the author’s affiliation and the place mentioned.
- D. Cited affiliation
The author might include a citation to another publication, which will be described in the references. In this case, a relationship is established between the citing author’s and the cited author’s affiliations.
- E. Place co-mention
The author might mention more than one place in the publication. In this case, a relationship between these places is established.
- F. Place explicitly cited (in the cited document)
A place might be mentioned with a bibliographic reference, indicating a direct relationship between the place and said reference. For example, we might find in the cited document a sentence such as “… previous results have shown that productivity in the Netherlands has increased in the last decade (AuthorName, year)”. In this case, a connection is established between the place (i.e., the Netherlands) and the cited author’s affiliation (i.e., the affiliation of AuthorName).
- G. Co-cited affiliation (in the cited document)
The author might include citations to different publications, which are all described in the references. In this case, a relationship is established between the affiliation of the cited author of one cited reference and the affiliation of the cited author of another.
- H. Citing affiliation
Another scholarly publication might cite the author’s publication. In this case, a relationship is established between the cited author’s and the citing author’s affiliations.
- I. Place explicitly cited (in the citing document)
A place might be mentioned in a publication accompanied by a bibliographic citation. For example, in a document authored by AuthorName1, we might find a sentence such as “The literature has shown that the riots in Paris are related to […] (AuthorName2, year)”. In this case, a relationship is established between the place (i.e., Paris) and the citing author’s affiliation (i.e., the affiliation of AuthorName1).
- J. Place coupling
A place might be mentioned in two different publications. In this case, a relationship is established between the documents mentioning the same place.
- K. Co-citation of affiliations (in the citing document)
The author’s publication might be cited by other scholarly publications, which may include other cited references. In this case, a relationship is established between the authors’ affiliations of each cited reference.
- L. Affiliation coupling
Two different scholarly publications might mention the same place. In this case, a relationship is established between the affiliation of the author of the first publication and that of the author of the second publication.
- M. Heterogeneous affiliation coupling
Two publications, one scholarly and one non-scholarly (e.g., Facebook post, tweet, unpublished report, presentation), might mention the same place. In this case, a relationship is established between the affiliation of the author of the scholarly publication and that of the author of the non-scholarly publication.
- N. Heterogeneous citing affiliation
A non-scholarly publication might cite the author’s publication. In this case, a relationship is established between the cited author’s and the citing author’s affiliations.
- O. Heterogeneous place coupling
A place might be mentioned in two different publications: a scholarly publication and a non-scholarly publication. In this case, a relationship is established between the two publications mentioning the same place.
- P. Heterogeneous place explicitly cited (in the citing document)
A non-scholarly document might cite the author’s publication, including a place mention. In this case, a relationship is established between the place and the citing author’s affiliation.
- Q. Place usage
Users can use the author’s publication (e.g., read, download) in different places. In this case, a relationship between the author’s affiliation and the user’s location is established.
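Several of the relationships listed above reduce to pairing the elements found in zones A, B, and C of one or two publications. The sketch below illustrates relationships B, C, D, E, J, and L on toy data. The dictionary layout, the function names, and the sample places are assumptions made for illustration and are not taken from the framework.

```python
from itertools import combinations

# Toy records: affiliations (zone A), place mentions (zone B) and the
# affiliations of cited authors (zone C). All values are invented.
pub = {
    "id": "P1",
    "affiliations": ["Valencia", "Leiden"],  # one per co-author
    "mentions": ["Paris", "Netherlands"],
    "cited_affiliations": ["Toulouse"],
}
other = {
    "id": "P2",
    "affiliations": ["Oslo"],
    "mentions": ["Paris"],
    "cited_affiliations": [],
}


def collaboration(p):
    """B: pairs of co-author affiliations."""
    return list(combinations(sorted(set(p["affiliations"])), 2))


def place_mention(p):
    """C: (author affiliation, place mentioned) pairs."""
    return [(a, m) for a in p["affiliations"] for m in p["mentions"]]


def cited_affiliation(p):
    """D: (citing affiliation, cited affiliation) pairs."""
    return [(a, c) for a in p["affiliations"] for c in p["cited_affiliations"]]


def place_co_mention(p):
    """E: pairs of places mentioned in the same publication."""
    return list(combinations(sorted(set(p["mentions"])), 2))


def place_coupling(p, q):
    """J: publications linked by a place they both mention."""
    shared = sorted(set(p["mentions"]) & set(q["mentions"]))
    return [(p["id"], q["id"], place) for place in shared]


def affiliation_coupling(p, q):
    """L: affiliations linked by a place both publications mention."""
    shared = sorted(set(p["mentions"]) & set(q["mentions"]))
    return [(a, b, place) for place in shared
            for a in p["affiliations"] for b in q["affiliations"]]
```

Here `place_coupling(pub, other)` links P1 and P2 through Paris, while `affiliation_coupling(pub, other)` turns the same shared mention into links between Valencia, Leiden, and Oslo.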
From the 17 types of geographical relationships, 57 place metrics are proposed (Table III). The definition of each metric, along with illustrative examples, is included in Annexes I (author-level), II (place-level), and III (publication-level). The annexes are presented as supplementary material (see section 9).
Note: A: Author Affiliation; B: Body Text; C: References; AB: metric based on the relation between zones A and B; AC: metric based on the relation between zones A and C; BC: metric based on the relation between zones B and C. AEP = authority explicitly placed and PEC = place explicitly cited. See supplementary material for a detailed description of each indicator.
Places in zones A and C are based exclusively on the institutional affiliation of the publication’s authors. In contrast, places in zone B are based solely on the places explicitly mentioned by the publication’s authors elsewhere. However, note that the metrics of the AB intersection are related to the quantitative study of the places mentioned based on the authors’ affiliation, whereas those of the AC intersection are related to the relationship between the institutional affiliation of the publication’s authors and the institutional affiliation of the cited references’ authors. Finally, those of the BC intersection are related to the relationship between the places mentioned and the affiliations of the cited references’ authors.
Three main types of indicator have been considered to generate metrics. First, “count indicators” gauge the frequency of a place’s mention (i.e., how often an author mentions a place: for example, author A mentions Place 1 five times). Second, “breadth indicators” measure the number of elements generating or receiving mentions (i.e., unique authors mentioning a place or being mentioned from a place: for example, five authors mention Place 1). Third, “profile indicators” determine the scope of the geographical relationship between an author’s affiliation and other places, including those mentioned by the author or the affiliations of collaborating or cited authors (for example, Place 1 and Place 2 have a regional relationship).
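The difference between count and breadth indicators can be made concrete with a small calculation. This is a minimal sketch on invented data; the variable names and the list-of-pairs input format are assumptions, not the definitions given in the annexes.

```python
from collections import Counter, defaultdict

# Each tuple is one mention: (author, place). Author A mentions Place 1
# five times; authors B and C mention it once each.
mentions = (
    [("Author A", "Place 1")] * 5
    + [("Author B", "Place 1"), ("Author B", "Place 2"), ("Author C", "Place 1")]
)

# Count indicator: how often a given author mentions a given place.
mention_count = Counter(mentions)

# Breadth indicator: how many unique authors mention each place.
authors_by_place = defaultdict(set)
for author, place in mentions:
    authors_by_place[place].add(author)
breadth = {place: len(authors) for place, authors in authors_by_place.items()}

print(mention_count[("Author A", "Place 1")])  # 5
print(breadth["Place 1"])                      # 3
```

Place 1 receives seven mentions in total but from only three authors, so a high count does not by itself imply a wide base of mentioning authors.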
While other cross-cutting indicators exist - most notably, the h-index, g-index, and i10-index (each calculated using identical procedures across various bibliographic databases and employing different parameters, that is, places, authors, and publications) - they have not been included here for the sake of clarity. However, they could be applied, expanding the range of possible indicators based on the bibliographic relationships established in Figure 3.
All metrics presented can be computed in absolute and disaggregated forms, depending on the attributes associated with the place, author, or document under analysis. Thus, these counts can be refined by taking into consideration such factors as the location of the mention (e.g., introduction, methods, results), the nomenclature employed (e.g., coordinates, textual mention), the purpose of the reference (e.g., informative, target, affiliation), the document type generating or receiving the mention (e.g., journal articles, conference papers, books, chapters, reports, posts), the publication year of the document generating or receiving the mention, the position of the author generating or receiving the reference, or even the disciplinary focus of the journal generating or receiving the mention.
Moreover, metrics implying a connection between two places (e.g., places cited by an author affiliated with an institution in a specific location) can be further dissected based on the scale of the geographical relationship between these places (e.g., local, regional, national, and international).
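One simple way to derive the scale of a relationship is to compare the administrative units two places share. The sketch below assumes that every place has already been resolved to a city, region, and country. The `Place` tuple, the `relationship_scale` function, and the cut-off rules (same city is local, same region is regional, and so on) are illustrative assumptions rather than definitions from the framework.

```python
from typing import NamedTuple


class Place(NamedTuple):
    city: str
    region: str
    country: str


def relationship_scale(a: Place, b: Place) -> str:
    """Classify the geographical relationship between two places (assumed rules)."""
    if a.country != b.country:
        return "international"
    if a.region != b.region:
        return "national"
    if a.city != b.city:
        return "regional"
    return "local"


jeddah = Place("Jeddah", "Mecca", "Saudi Arabia")
mecca = Place("Mecca", "Mecca", "Saudi Arabia")
riyadh = Place("Riyadh", "Riyadh", "Saudi Arabia")
amsterdam = Place("Amsterdam", "North Holland", "Netherlands")

print(relationship_scale(jeddah, jeddah))     # local
print(relationship_scale(jeddah, mecca))      # regional
print(relationship_scale(jeddah, riyadh))     # national
print(relationship_scale(jeddah, amsterdam))  # international
```

A distance-based rule computed in a GIS would be an alternative; the administrative comparison is used here only because it needs no coordinates.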
Furthermore, all these metrics can be computed based on the geographical scope assigned to the place, ranging from that of a specific area, municipality, region, or country to that of a supranational region, continent, or even the entire planet. This scope may be explicitly mentioned in the publication (e.g., the Netherlands is mentioned directly) or inherited (e.g., Amsterdam is referenced and is additionally attributed to the Netherlands due to the geographical association between the two places).
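The distinction between explicit and inherited scope can be sketched by propagating each mention up a whole-part hierarchy. The `PARENT` mapping below is a hypothetical, hand-written fragment that stands in for a real gazetteer, and the function names are likewise assumptions. The hierarchy is assumed to be acyclic.

```python
from collections import Counter

# Hypothetical whole-part fragment: child place -> containing place.
PARENT = {
    "Amsterdam": "North Holland",
    "North Holland": "Netherlands",
    "Netherlands": "Europe",
}


def with_ancestors(place):
    """Return the place followed by every place that contains it."""
    chain = [place]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def count_mentions(mentions):
    """Return (explicit, total) counts; total adds inherited mentions."""
    explicit = Counter(mentions)
    total = Counter()
    for place, n in explicit.items():
        for containing in with_ancestors(place):
            total[containing] += n
    return explicit, total


explicit, total = count_mentions(["Amsterdam", "Amsterdam", "Netherlands"])
print(explicit["Netherlands"])  # 1 (mentioned directly once)
print(total["Netherlands"])     # 3 (one direct plus two inherited)
```

Keeping the two counters separate preserves the information about whether a country-level figure comes from direct mentions or from mentions of the places it contains.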
The number of place-related indicators expands significantly when the various data disaggregation parameters and geographical coverage units are taken into consideration. Annexes I, II, and III provide illustrative examples for each metric, breaking them down into multiple parameters and providing instances for places of varying geographical scope.
4. DISCUSSION
The above framework comprises three consecutive stages of identifying, describing, and measuring science places. To the best of our knowledge, this is the first attempt to formally and comprehensively define the use of place in scientific publications (i.e., integrating studies based on author affiliations, on the one hand, with studies based on the identification of place names in the text, on the other), and to propose specific indicators where place is explicitly measured as an independent entity (place-level metrics), thus expanding the notion of spatial bibliometrics (Frenken et al., 2009) and integrating the concept of heterogeneous couplings to consider place mentions between scholarly (including peer-reviewed publications and patents) and non-scholarly (e.g., press releases, clinical guidelines, policy reports, working papers) documents (Costas et al., 2021).
Although other place-mention roles or place metrics might be identified, our primary objective here is not to be exhaustive in the lists we have drawn up but rather to structure the different components on which the study of places is based and to situate them cognitively under the umbrella of Scientometrics. To this end, we have drawn on observation, bibliographic review, and expert review. Consequently, the findings need to be expanded and improved by undertaking further empirical and theoretical studies.
In the case of the place metrics proposed here, the following issues must be considered. First, to facilitate calculation, most of the indicators are based only on the primary affiliation of each author; however, there is a clear need to expand their coverage by considering the multiple affiliations of each co-author. Second, the problems created by institutions with numerous headquarters (in some instances located in different regions and cities) must be addressed since this adds considerable complexity to the computation of some indicators. Third, we have included author-, publication-, and place-level metrics; however, other aggregations, including journals, groups, or universities, should also be studied. Fourth, the indicators address content explicitly mentioned within a publication (zones A, B, and C); however, metadata containing other geographical information (e.g., a publisher’s location) may be equally interesting. Fifth, usage metrics (e.g., downloads from specific locations) are not included in the proposal (see supplementary material) and should be specifically developed in future research.
Moreover, the calculation of some place-related indicators is computationally complex, entailing not only the collection of multiple affiliations but also the establishment of a geographical relationship (e.g., affiliation with affiliation, affiliation with place-mention, and place-mention with place-mention) both for cited and citing publications, and for scholarly and non-scholarly publications. To this, we should add the complexity of correctly identifying each place name, which can be challenging even for formatted data included in the authors’ affiliations (Eckert et al., 2013).
Meta-researchers and geographers interested in determining the role played by place in scientific activity; researchers seeking to identify key locations in their disciplines; research evaluators wishing to gain insights into the local/international role of authors, publications and journals; and, practitioners and librarians involved in developing bibliographic products may all have an interest in consulting place metrics and can fruitfully exploit the model proposed in this work. To obtain these place-related metrics, their first source of information is the publication itself, with mentions of place appearing explicitly and implicitly throughout the publication in zones A, B, and C (Figure 2). In this way, they can identify the places that refer to the authors’ affiliations as well as those that might appear in the title, abstract, keywords, body of the work, notes, acknowledgments, and annexes. Their second source of information is the metadata of each publication, curated by publishers, repositories, or bibliographic databases. Mentions of places might appear primarily as toponyms (e.g., Norway) and demonyms (e.g., Norwegian), independently or as part of proper names (e.g., Norwegian University of Science and Technology).
Having obtained the geographical information, each place can then be identified by parsing the publication’s full text and harvesting its bibliographic metadata. The use of place authority lists, gazetteers, and thesauri is recommended to merge variants (e.g., London is equal to Londres), establish whole-part relationships (e.g., Paris belongs to France), and confirm official nomenclature. Likewise, the use of administrative and geodesic place codes (e.g., Mapcode) is recommended to ensure unique identification of each place. Each place mention should then be characterized. In some instances, this task may be automated (e.g., the section in which the mention appears), but, in others, human intervention is required (e.g., to determine the role of the mention). Each characterized place mention should next be allocated to a citing/cited entity (e.g., author, publication, and place). GIS are needed here to describe places more robustly as well as to determine the geographical relationship between two places (e.g., local, regional, national, international). Academic identifiers (e.g., ROR, ORCID, DOI) are also required to assign places to authors, institutions, and publications accurately. OpenAlex natively includes ROR identifiers, and other bibliographic databases are likely to incorporate them gradually.
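The merging of variants and demonyms into a single place can be illustrated with a lookup table and a pattern match. This is a deliberately naive sketch: the `VARIANTS` table is a hypothetical, hand-written stand-in for a gazetteer or authority list, and plain string matching cannot disambiguate homonymous places, which is one reason human intervention remains necessary.

```python
import re

# Hypothetical authority list: lower-cased variant or demonym -> preferred name.
VARIANTS = {
    "london": "London",
    "londres": "London",
    "norway": "Norway",
    "norwegian": "Norway",
    "paris": "Paris",
}

# Longer variants are tried first so that "londres" is not cut short by "london".
_alternatives = sorted(VARIANTS, key=len, reverse=True)
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in _alternatives) + r")\b",
    re.IGNORECASE,
)


def find_places(text):
    """Return the preferred name of every known variant found in the text."""
    return [VARIANTS[m.group(1).lower()] for m in PATTERN.finditer(text)]


sentence = ("Samples from Londres and London were analysed at the "
            "Norwegian University of Science and Technology.")
print(find_places(sentence))  # ['London', 'London', 'Norway']
```

The demonym inside the institution name is resolved to Norway, which matches the case of place names appearing as part of proper names.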
The following considerations should be taken into account when assigning place mentions to citing/cited agents. Given a work A, published by author B (affiliated in C), which cites a work by author D (affiliated in E), all mentions of places located in zone A will be considered citing affiliations. If there is more than one, the geographical relationship between the different affiliations of the same author will be used to calculate multi-affiliation. In contrast, the geographical relationship between the affiliations of the various authors, if any, will be used to calculate collaboration. On the other hand, all places mentioned in zones B and C will be considered cited places. The citing agent will be assigned to the publication (i.e., the place is cited by work A), each author (i.e., the place is mentioned by author B), and each citing affiliation (i.e., the place is mentioned by affiliation C). Finally, the affiliations of the authors cited in zone C will be considered cited affiliations, mentioned by the work (i.e., affiliation E is mentioned by work A), each author (i.e., affiliation E is mentioned by author B), and each citing affiliation (i.e., affiliation E is mentioned by affiliation C). In all these cases, the mention date corresponds with work A's publication date.
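The assignment rules above can be sketched as a function that expands each cited place or cited affiliation into one record per citing agent. The record layout, the `assign_mentions` name, the publication year, and the sample place are assumptions for illustration. The letters A to E follow the example in the text.

```python
def assign_mentions(work):
    """Expand cited places and cited affiliations into per-agent records."""
    # Citing agents: the work itself, each author, each author affiliation.
    agents = [("work", work["id"])]
    for author in work["authors"]:
        agents.append(("author", author["name"]))
        agents.extend(("affiliation", aff) for aff in author["affiliations"])

    # Cited targets: places mentioned in zones B and C, plus the
    # affiliations of the authors cited in zone C.
    targets = [("cited place", p) for p in work["mentioned_places"]]
    targets += [("cited affiliation", a) for a in work["cited_affiliations"]]

    return [
        {
            "agent_type": agent_type,
            "agent": agent,
            "target_type": target_type,
            "target": target,
            "date": work["year"],  # mention date = publication date of the work
        }
        for target_type, target in targets
        for agent_type, agent in agents
    ]


work_a = {
    "id": "A",
    "year": 2023,  # invented publication year
    "authors": [{"name": "B", "affiliations": ["C"]}],
    "mentioned_places": ["Paris"],  # invented place mention
    "cited_affiliations": ["E"],
}
records = assign_mentions(work_a)
```

With one author, one affiliation, one mentioned place, and one cited affiliation, the function yields six records: each of the two targets is attributed to work A, author B, and affiliation C.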
The metrics can be obtained from ad hoc applications (operating from a collection of full texts imported into the system) or they can be directly provided by bibliographic databases, using their entire coverage of publications, thus facilitating the embedding of this information in publication-level metrics, author/ journal profiles, and new place profiles. While the full model requires the application to access the complete text of each publication, many indicators can be calculated on the basis of the information included in affiliations, titles, abstracts, and keywords, and which is already included in the metadata.
Journals could usefully provide authors with explicit guidelines to ensure places are mentioned in a standardized, unequivocal way and, in this way, help researchers in their studies of place, especially in some specific disciplines (e.g., history, geography, urbanism, zoology, archaeology, geology, and regional studies). Similarly, the inclusion of a section listing uniformly and unambiguously all places referenced in a publication would facilitate the automatic extraction of this information. This task could be further facilitated if HTML versions of manuscripts employed geo meta tags for extracting toponyms and demonyms. Finally, repositories could also request geographic information from authors when depositing their manuscripts as a means of generating geo metadata.
Although applications have already been developed that perform part of these tasks, above all the identification of toponyms, their resolution, and the calculation of basic metrics (Eckert et al., 2013; Acheson and Purves, 2021; Cascón-Katchadourian et al., 2023), there is no application currently available that allows toponyms to be identified accurately without human intervention (Gritta et al., 2018) or that permits all the indicators proposed and defined in the supplementary material of this work to be calculated on a large scale. There is, thus, a pressing need to design and test tools that facilitate the data collection and analysis of the place-related metrics embedded in the conceptual framework proposed in this work.
5. CONCLUSIONS
This work has proposed a Scientometrics-inspired framework for integrating different studies of geographical place in the scientific literature (i.e., extended spatial Bibliometrics). Its primary contributions include the identification and description of the main attributes of a place (e.g., location, nomenclature, method, geographical scope, and, especially, place mention roles) and the bibliographic relationships between elements containing geographical information according to the zone in which they are located in the scholarly work, that is, the SFBR model developed herein. The SFBR allows the integration of geographical and bibliographic information, strengthening the identification, description, and testing of new place-oriented bibliometric indicators. Thus, we have proposed 57 bibliometric place-based indicators, divided into author-, publication-, and place-level metrics.
Some indicators may not be prevalent, may be relevant only in corpora with particular characteristics, or may be applicable only in specific fields. For this reason, future studies should test the bibliometric properties of these indicators, especially when the aim is to determine disciplinary differences in the use of places, the local dimension of an author in an evaluation process, the contribution of a place to scientific endeavor, or the use of a place for the development of a particular line of research.
Place-based metrics could be used to assess the spatial profile of authors, institutions, publications, journals, and disciplines; facilitate the development of new bibliographic applications; perform quantitative analyses of places (e.g., studies of under- or over-analyzed places in the scientific literature); generate new place networks (relationships between locations outside the recognized networks of collaboration); provide place-based search features (i.e., bibliographic products that can track publications by the places they mention); study science traditions (i.e., the use and style of places mentioned depending on the scientific field); unveil the reasons underpinning the mention of each location; support local research (i.e., how particular places have contributed to specific research lines and how research has influenced particular areas); and facilitate the deployment of public policies seeking territorial development. Specific research is needed to design the most appropriate methods that can address these issues within the spatial framework proposed in this work for identifying bibliographic relationships.