Daily Reading 12
The Pandas Library
###
Explain the purpose and basic functionality of the Pandas library. What are some common operations that can be performed on data using Pandas, and how do they contribute to data analysis and manipulation?
The Pandas library gives you access to a pair of classes intended for data handling, called Series, and DataFrame. With Pandas, you’re able to use simple methods to do views and sorting of the dataset you’ve provided. Dropping columns or rows of the dataset of a DataFrame, creating bar plots of specific columns, and creating an overview percentage of the averages of each column of your data are all single command actions, giving you lots of easy control over getting a better overview of large amounts of data.
What are the primary data structures in Pandas, and how do they differ in terms of use cases?
Pandas uses: Series, an array that supports individual labeling, and can hold data of any type.
DataFrame, a two dimensional array, or table, or spreadsheet. Some form of data format that is two axed. Note that each column of a data frame is a Series.
Because a Series is one dimensional, and a DataFrame is two, the same usecases apply for choosing to use a two dimensional array vs a python list, in that if your data involves multiple statistics that “relate” to a single item, you should probably use a DataFrame.
Describe the process of loading a dataset into a Pandas DataFrame. What are some common file formats that can be used, and which Pandas functions are utilized to read these formats?
import pandas as pd
df = pd.read_csv('pathto/your/data.csv')
Numerous different file formats are supported including sql, excel and json, each with their respective pd.read_x() command.