Simpler ETL Tasks with Pandas: Useful Functions Applied to Public Data
DOI:
https://doi.org/10.59471/raia2024204
Keywords:
artificial intelligence, Python, Pandas library, public data
Abstract
Artificial Intelligence (AI) is undoubtedly one of the most disruptive technologies today. Python, in turn, is the most widely used programming language for developing AI models, which are built on the analysis of datasets. Data analytics in Python began to grow rapidly with the development of the Pandas library, which provides easy-to-implement and highly useful functionality for processing raw data, perhaps the most laborious task in the modeling process. This work presents the main functionalities of Pandas applied to public data, with code examples.
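As a rough illustration of the kind of raw-data processing the abstract refers to, the following is a minimal sketch of an extract-transform-load pass with Pandas. It is not taken from the article: the inline CSV, its column names (`passenger_id`, `age`, `sex`, `fare`), and the helper `run_etl` are hypothetical and chosen only to keep the example self-contained. The transform steps shown (dropping duplicate rows, filling missing values with the median, one-hot encoding a categorical column) are common choices, not necessarily the ones the authors apply.

```python
import io

import pandas as pd

# Hypothetical raw data (assumption): a tiny passenger-style table with one
# missing age and one exact duplicate row. It is not a real public dataset.
RAW_CSV = """passenger_id,age,sex,fare
1,22,male,7.25
2,,female,71.28
3,26,female,7.92
3,26,female,7.92
"""


def run_etl(raw_csv: str) -> pd.DataFrame:
    """Hypothetical helper: extract, clean and encode the raw table."""
    # Extract: parse the CSV text into a DataFrame.
    df = pd.read_csv(io.StringIO(raw_csv))

    # Transform 1: remove exact duplicate rows and renumber the index.
    df = df.drop_duplicates().reset_index(drop=True)

    # Transform 2: fill missing ages with the median of the observed ages.
    df["age"] = df["age"].fillna(df["age"].median())

    # Transform 3: one-hot encode the categorical column as 0/1 integers.
    df = pd.get_dummies(df, columns=["sex"], dtype=int)
    return df


clean = run_etl(RAW_CSV)

# Load: write the cleaned table to a CSV target (an in-memory buffer here;
# a file path could be passed to to_csv instead).
out = io.StringIO()
clean.to_csv(out, index=False)

print(clean)
```

With real public data, the `io.StringIO` source in the extract step would typically be replaced by a file path or URL passed to `pd.read_csv`, and the fill strategy would depend on the meaning of each column.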
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.