Descriptors analysis on different knowledge areas in CSIC databases. Application on automatic indexing
DOI:
https://doi.org/10.3989/redc.1997.v20.i2.589
Keywords:
Descriptors analysis, linguistic analysis, statistical analysis, automatic indexing, CSIC databases
Abstract
The value of scientific article titles and abstracts as sources of terms for document indexing is studied in relation to six knowledge areas: Library and Information Science, Medicine, Chemistry, Biology, Psychology and Physics, as indexed in the CSIC databases ISOC, IME and ICYT. The syntagmatic structures of the indexing terms found in the «Descriptors» field are also examined, as well as the relation between the length of the documents and the number of descriptors. To this end, six searches were made in the databases for the six knowledge areas, and 450 bibliographical references were selected (75 per knowledge area), yielding 2,077 descriptors; of these, 38.1% appear in the titles, in the abstracts or in both. With respect to the syntactic structures, 41.9% were «nouns», 32.3% were «noun+adjective» groups and 11.8% were «noun+noun» groups, with the remaining 14% made up of other structures. Lastly, regarding the relationship between the length of documents and the number of descriptors, all possible combinations were found: short articles with few descriptors, long articles with few descriptors, short articles with many descriptors, and long documents with many descriptors. The following conclusions can be drawn from the data obtained: first, if the abstracts are not well written and the titles are not precise, they are not definitive sources for the extraction of concepts; second, the most common syntactic structure is the «noun phrase», followed by «noun+adjective» and «noun+noun»; third, no significant relation is found between the length of a document and the number of descriptors assigned to it.
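As an illustration only, the comparison of descriptors against titles and abstracts could be sketched as follows. The abstract does not say how a descriptor was judged to "appear" in a title or abstract, so the case-insensitive substring match used here is an assumption, and the `Record` type, the `descriptor_coverage` function and the sample records are hypothetical, not data or methods from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Hypothetical bibliographic reference with the three fields the study compares."""
    title: str
    abstract: str
    descriptors: List[str]


def descriptor_coverage(records: List[Record]) -> float:
    """Fraction of descriptors found in the title, the abstract, or both.

    Assumption: a descriptor "appears" if it occurs as a case-insensitive
    substring of the title or abstract text.
    """
    total = 0
    found = 0
    for rec in records:
        text = f"{rec.title} {rec.abstract}".casefold()
        for descriptor in rec.descriptors:
            total += 1
            if descriptor.casefold() in text:
                found += 1
    return found / total if total else 0.0


# Made-up sample records, not taken from ISOC, IME or ICYT.
sample = [
    Record(
        title="Automatic indexing of chemistry abstracts",
        abstract="We test term extraction from titles.",
        descriptors=["automatic indexing", "chemistry", "thesauri"],
    ),
    Record(
        title="Memory in older adults",
        abstract="A study of recall.",
        descriptors=["memory", "psychology"],
    ),
]

coverage = descriptor_coverage(sample)
print(f"{coverage:.1%} of descriptors appear in the title or abstract")
```

Exact substring matching misses inflected or reordered forms (for example a plural descriptor against a singular word in the title), so a figure computed this way would probably differ from one based on a human reading.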
License
Copyright (c) 1997 Consejo Superior de Investigaciones Científicas (CSIC)

This work is licensed under a Creative Commons Attribution 4.0 International License.
© CSIC. Manuscripts published in both the print and online versions of this journal are the property of the Consejo Superior de Investigaciones Científicas, and quoting this source is a requirement for any partial or full reproduction.
All contents of this electronic edition, except where otherwise noted, are distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. You may read the basic information and the legal text of the licence. The indication of the CC BY 4.0 licence must be expressly stated in this way when necessary.
Self-archiving in repositories, personal webpages or similar, of any version other than the final version of the work produced by the publisher, is not allowed.