An approach for the detection of attribute-value pairs in natural language descriptions in the context of real estate offering
DOI:
https://doi.org/10.59471/raia2024209
Keywords:
natural language processing, attribute-value extraction, real estate observatory
Abstract
Structured information is a valuable resource for building information systems. The transformation of unstructured data into structured data can be automated; however, processing human-written text requires Natural Language Processing (NLP) techniques. This study explores and evaluates several approaches for automatically extracting attribute-value pairs from real estate advertisement descriptions. The ultimate goal is to enrich the database of the Real Estate Observatory of the Province of Buenos Aires. The Observatory's purpose is to address the lack of information in this area and to facilitate public participation in real estate valuation.
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.