Pandas Basics
What is Pandas
- Pandas is a Python library aimed at simplifying data analysis.
- It provides functions for loading, inspecting, filtering, sorting, and summarizing a data set.
Pandas has two main data structures:
- Series: similar to a 1-D NumPy array, except that every value carries an index label and the values do not have to be numbers
- DataFrame: the Pandas way of representing a table. Each column is a Series, and all columns share the same row index
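The two structures can be sketched as follows. The column names and values are made up for illustration.

```python
import pandas as pd

# A Series is a labeled 1-D array; the values do not have to be numbers.
colors = pd.Series(["red", "green", "blue"])
print(colors.index.tolist())  # default integer labels

# A DataFrame is a table; selecting a single column gives back a Series.
people = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]})
ages = people["age"]
print(type(ages).__name__)
```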
Apply lambda functions to Pandas Series
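A minimal sketch of this, assuming a small numeric Series (the values and variable names are illustrative). Series.apply calls the function once per element and returns a new Series.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3])

# apply() runs the lambda on each element and returns a new Series
squared = numbers.apply(lambda x: x ** 2)

# the lambda may return any type, not just numbers
parity = numbers.apply(lambda x: "even" if x % 2 == 0 else "odd")

print(squared.tolist())
print(parity.tolist())
```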
Converting CSV to DataFrame and getting metadata information
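A minimal sketch, assuming a small made-up CSV. To keep the example self-contained the CSV text is read from an in-memory buffer; passing a file path to pd.read_csv works the same way.

```python
import io
import pandas as pd

# Hypothetical CSV content, used here instead of a file on disk
csv_text = "name,age,city\nAnn,31,Oslo\nBob,25,Rome\nCid,40,Lima\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names
print(df.dtypes)            # data type of each column
df.info()                   # index range, non-null counts, memory usage
print(df.head(2))           # first two rows
```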
Note: Use df.describe() to get summary statistics (count, mean, std, min, max, and quartiles) for the numeric columns of a DataFrame.
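For instance, on a small made-up DataFrame (the column names are illustrative), the result of describe() is itself a DataFrame whose rows are the statistics:

```python
import pandas as pd

df = pd.DataFrame({"age": [20, 30, 40], "height_cm": [160.0, 170.0, 180.0]})

stats = df.describe()

# pick out a few of the statistics by their row label
print(stats.loc[["mean", "std", "min", "max"]])
```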
Create a Series with custom indexing using Pandas
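A minimal sketch, assuming string labels (the names and scores are made up). The index argument of pd.Series replaces the default 0, 1, 2, ... labels.

```python
import pandas as pd

scores = pd.Series([90, 85, 77], index=["alice", "bob", "carol"])

print(scores["bob"])          # look up a value by its label
print(scores.index.tolist())  # the custom labels
```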
Custom Index for Pandas DataFrame
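One way to do this, assuming an existing column should become the index, is DataFrame.set_index. The column names below are illustrative. By default set_index returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a1", "b2", "c3"], "score": [90, 85, 77]})

# set_index returns a new DataFrame indexed by the "id" column
indexed = df.set_index("id")

print(indexed.loc["b2", "score"])  # row lookup by the new label
print(df.index.tolist())           # the original still has its default index
```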
Note: If you want to overwrite the original dataframe:
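A sketch of two common ways to do this, assuming the index is being set with set_index on a hypothetical "id" column: reassign the result to the same name, or pass inplace=True.

```python
import pandas as pd

df_a = pd.DataFrame({"id": ["a1", "b2"], "score": [90, 85]})
df_b = df_a.copy()

# Option 1: reassign the result to the same variable
df_a = df_a.set_index("id")

# Option 2: modify the DataFrame in place (the call returns None)
df_b.set_index("id", inplace=True)

print(df_a.index.tolist())
print(df_b.index.tolist())
```

Reassignment is often preferred because it also works when calls are chained.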
Sorting DataFrame
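A minimal sketch using DataFrame.sort_values on a made-up table (column names are illustrative). Sorting is ascending by default and returns a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [31, 25, 40]})

# sort rows by the values in the "age" column, smallest first
by_age = df.sort_values(by="age")

print(by_age["name"].tolist())
```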
Note: If you want to sort in descending order:
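Assuming the sort is done with sort_values, passing ascending=False reverses the order. The data below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [31, 25, 40]})

# largest age first
oldest_first = df.sort_values(by="age", ascending=False)

print(oldest_first["name"].tolist())
```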
Selecting Even Rows of Pandas DataFrame
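A sketch of two ways to do this, taking "even rows" to mean the rows at positions 0, 2, 4, and so on (an assumption; the data is made up). The second variant filters on the index labels, so it matches the first only when the DataFrame has the default 0, 1, 2, ... index.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 11, 12, 13, 14]})

# by position: every second row, starting at position 0
even_by_position = df.iloc[::2]

# by label: keep rows whose index label is an even number
even_by_label = df[df.index % 2 == 0]

print(even_by_position["value"].tolist())
```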
Accessing parts of the dataframes
Note: We can use df.iloc[x, y] for position-based selection on a DataFrame, where x selects rows and y selects columns. Standard indexing and slicing techniques apply.
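A few position-based selections, sketched on a made-up DataFrame with columns a, b, and c:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

cell = df.iloc[0, 1]         # first row, second column
block = df.iloc[0:2, 0:2]    # first two rows and first two columns
last_row = df.iloc[-1]       # negative positions count from the end
third_col = df.iloc[:, 2]    # every row of the third column

print(cell)
print(block)
```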
Note: df.loc is similar to df.iloc, except that it selects by label instead of by integer position.
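A few label-based selections, sketched on a made-up DataFrame with string row labels (the labels and columns are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [31, 25, 40], "city": ["Oslo", "Rome", "Lima"]},
    index=["ann", "bob", "cid"],
)

city = df.loc["bob", "city"]           # one cell, by row and column label
subset = df.loc["ann":"bob", ["age"]]  # label slice; the end label is included
over_30 = df.loc[df["age"] > 30]       # loc also accepts a boolean mask

print(city)
print(subset)
```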
Difference between iloc & loc
iloc | loc |
---|---|
Position-based indexing. | Label-based indexing. |
The end of a slice is excluded: df.iloc[0:3] returns the rows at positions 0, 1, and 2. | The end of a slice is included: with the default index, df.loc[0:3] returns the rows labeled 0, 1, 2, and 3. |
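
The difference in slice bounds can be checked on a made-up DataFrame with the default integer index, where positions and labels happen to coincide:

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 11, 12, 13]})

by_position = df.iloc[0:2]  # positions 0 and 1; the end position is excluded
by_label = df.loc[0:2]      # labels 0, 1, and 2; the end label is included

print(len(by_position), len(by_label))
```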