Automatic Titling of Election Survey Questions

Authors

  • Carolina Gallardo, Departamento de Sistemas de Información, Escuela Técnica Superior de Sistemas Informáticos, Universidad Politécnica de Madrid
  • Jesús Cardeñosa, Grupo de Validación y Aplicaciones Industriales, ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid

DOI:

https://doi.org/10.3989/redc.2016.2.1236

Keywords:

Automatic titling, information extraction, opinion polls, abstracting

Abstract


This paper describes work on automatically generating titles for the questions in the opinion polls held in the databases of the CIS (Centro de Investigaciones Sociológicas, the Spanish Center for Sociological Research). In the CIS context, a question title must meet two requirements. In form, it must be grammatically correct and similar in style to existing titles; in content, it must state the subject of the question and the available answer options. These constraints on form and content discourage techniques applied to similar problems, such as automatic abstracting or machine learning from a training corpus, and instead favor a methodology based on analysis and knowledge of the domain. To illustrate the analysis and the problem-solving strategy, we selected a set of election-related questions, given their strategic importance and the CIS's own specialization in opinion polls. The process followed and the subsequent evaluation are discussed in detail, covering both qualitative and quantitative aspects. The evaluation shows that 88.73% of the generated titles strictly meet the CIS's requirements on form and content, reducing the time the institution's qualified staff spend on manual work.
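The content requirement above (a title must name the question's subject and list its answer options) can be illustrated with a minimal pattern-based sketch. Everything in it is hypothetical: the English question wordings, the `PATTERNS` table, the `make_title` function and the title templates are assumptions made for illustration, not the authors' rules or CIS conventions, and real CIS questions are in Spanish. It should not be read as the paper's actual procedure.

```python
import re
from typing import Optional

# Hypothetical hand-written patterns: each maps a question wording to a
# title template. These are illustrative assumptions, not CIS or author rules.
PATTERNS = [
    (re.compile(r"^which party did you vote for in the (?P<election>.+?)\?$", re.IGNORECASE),
     "Party voted for in the {election}"),
    (re.compile(r"^did you vote in the (?P<election>.+?)\?$", re.IGNORECASE),
     "Participation in the {election}"),
]


def make_title(question: str, options: list[str]) -> Optional[str]:
    """Build a title containing the question's subject and its answer options.

    Returns None when no pattern matches, so the question can be routed
    to manual titling instead of receiving a malformed title.
    """
    text = question.strip()
    for pattern, template in PATTERNS:
        match = pattern.match(text)
        if match:
            # Fill the template with the subject extracted from the question.
            subject = template.format(**match.groupdict())
            # Append the answer options, since the title must list them.
            return f"{subject} ({', '.join(options)})"
    return None


print(make_title("Which party did you vote for in the last general election?",
                 ["Party A", "Party B", "Other", "Did not vote"]))
```

Returning `None` for unmatched questions is one possible design choice: a knowledge-based approach covers only the question types it has been built for, and anything outside them can fall back to manual titling rather than receive a title that breaks the form or content requirements.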

Published

2016-06-30

How to Cite

Gallardo, C., & Cardeñosa, J. (2016). Automatic Titling of Election Survey Questions. Revista Española de Documentación Científica, 39(2), e133. https://doi.org/10.3989/redc.2016.2.1236

Issue

Vol. 39, No. 2 (2016)

Section

Notes and Experiences