Simpler ETL Tasks with Pandas: Useful Functions Applied to Public Data

Authors

Kamlofsky J, Manzano F, Lopez Yse D

DOI:

https://doi.org/10.59471/raia2024204

Keywords:

artificial intelligence, Python, Pandas library, public data

Abstract

Artificial Intelligence (AI) is undoubtedly one of the most disruptive technologies today, and Python is the most widely used programming language for developing AI models, which are built on the analysis of datasets. Data analytics in Python has grown rapidly since the development of the Pandas library, which provides useful, easy-to-apply functionality for processing raw data, perhaps the most laborious task in the modeling process. This work presents the main functionalities of Pandas, applied to public data, with code examples.
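As a rough illustration of the kind of raw-data processing the abstract refers to, the sketch below fills missing values and one-hot encodes categorical columns with Pandas. It is not taken from the article: the small table only imitates the column layout of the Kaggle Titanic dataset listed in the references, its values are invented, and the helper name clean_passengers is hypothetical.

```python
import pandas as pd

# Hypothetical rows imitating the layout of the Kaggle Titanic dataset;
# the values are invented for illustration only.
raw = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, None, 26.0, 35.0],
        "Embarked": ["S", "C", None, "S"],
    }
)


def clean_passengers(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values and one-hot encode the categorical columns."""
    out = df.copy()
    # Replace missing ages with the median of the known ages.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Replace missing ports with the most frequent port.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Expand each categorical column into 0/1 indicator columns.
    return pd.get_dummies(out, columns=["Sex", "Embarked"], dtype=int)


clean = clean_passengers(raw)
print(clean)
```

Median and mode imputation are only one possible choice here; depending on the dataset, dropping incomplete rows or using a domain-specific default may be more appropriate.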

References

McKinney, W. “Data Structures for Statistical Computing in Python”. In Proceedings of the 9th Python in Science Conference (SciPy 2010).

McKinney, W. “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”. O'Reilly Media. ISBN: 978-1449319793 (2012).

Muhammad Arham, “Pandas. How to One-Hot-Encode Data”, KD-Nuggets (2023). Online: https://www.kdnuggets.com/2023/07/pandas-onehot-encode-data.html

McKinney, Wes. "Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython." O'Reilly Media, 2017. ISBN: 978-1491957660.

Chen, D., Ma, C., & Zheng, L. (2021). "Pandas for Everyone: Python Data Analysis". Addison-Wesley Professional.

Wes McKinney. "Data Structures for Statistical Computing in Python." Proceedings of the 9th Python in Science Conference, 2010. Available at: https://conference.scipy.org/proceedings/scipy2010/pdfs/mckinney.pdf

Pandas Documentation. The official documentation for Pandas. Available at: https://pandas.pydata.org/pandas-docs/stable/

Kahn, Michael. Diabetes. UCI Machine Learning Repository. https://doi.org/10.24432/C5T59G.

Kaggle, Titanic - Machine Learning from Disaster. Available at: https://www.kaggle.com/c/titanic/data

Published

2024-12-30

How to Cite

1.
Kamlofsky J, Manzano F, Lopez Yse D. Simpler ETL Tasks with Pandas: Useful Functions Applied to Public Data. Revista Abierta de Informática Aplicada [Internet]. 2024 Dec. 30 [cited 2025 Feb. 5];8(1):47-70. Available from: https://raia.revistasuai.ar/index.php/raia/article/view/204