Automatic Titling of Election Survey Questions
DOI:
https://doi.org/10.3989/redc.2016.2.1236
Keywords:
Automatic titling, information extraction, opinion polls, abstracting
Abstract
This paper describes the work carried out to automatically generate titles for the questions included in the opinion polls held in the databases of CIS (Centro de Investigaciones Sociológicas, the Spanish Center of Sociological Research). In the context of CIS, the title of a question should meet two requirements: in form, it has to be grammatically correct and similar in style to existing titles; in content, it must contain the subject of the question and the different answer options. These conditions on the form and content of titles discourage the techniques used for similar problems, such as automatic abstracting or machine learning with a training corpus, and instead favor a methodology based on analysis and knowledge of the domain. To illustrate the analysis and the resolution strategy, we have selected a set of questions related to elections, given their strategic importance and CIS’s own specialization in opinion polls. The process followed and the subsequent evaluation of results are discussed in detail, with an assessment of both qualitative and quantitative aspects. The evaluation shows that 88.73% of the generated titles strictly meet CIS’s requirements on form and content, which reduces the time the institution’s qualified personnel spend on manual work.
License
Copyright (c) 2016 Consejo Superior de Investigaciones Científicas (CSIC)

This work is licensed under a Creative Commons Attribution 4.0 International License.
© CSIC. Manuscripts published in both the print and online versions of this journal are the property of the Consejo Superior de Investigaciones Científicas, and quoting this source is a requirement for any partial or full reproduction.
All contents of this electronic edition, except where otherwise noted, are distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. The CC BY 4.0 licence must be expressly indicated when necessary.
Self-archiving in repositories, personal webpages or similar, of any version other than the final version of the work produced by the publisher, is not allowed.