Operations on Pandas DataFrames
Merging DataFrames
Note:
- how can be set to left / right / inner / outer; each behaves like the corresponding SQL join.
- on is the key column on which the merge is performed.
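A minimal sketch of how the join type changes the result. It uses two small hypothetical DataFrames (customers and orders) that share a cust_id key column.

```python
import pandas as pd

# Hypothetical example data: customers and their orders
customers = pd.DataFrame({
    "cust_id": [1, 2, 3],
    "name": ["Asha", "Ben", "Chen"],
})
orders = pd.DataFrame({
    "cust_id": [1, 1, 2, 4],
    "amount": [250, 100, 75, 40],
})

# Inner join: keep only cust_id values present in both frames
inner = pd.merge(customers, orders, how="inner", on="cust_id")

# Left join: keep every customer; order columns are NaN where there is no match
left = pd.merge(customers, orders, how="left", on="cust_id")

# Outer join: keep all keys from both sides (method form of the same call)
outer = customers.merge(orders, how="outer", on="cust_id")

print(inner)
print(left)
print(outer)
```

Customer 3 has no orders, so it appears only in the left and outer results. The order for cust_id 4 has no matching customer, so it appears only in the outer result.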
Concatenating DataFrames
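A minimal sketch of pd.concat on two hypothetical monthly sales tables, stacking them first by rows and then by columns.

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns (e.g. two months of sales)
jan = pd.DataFrame({"region": ["North", "South"], "sales": [120, 90]})
feb = pd.DataFrame({"region": ["North", "South"], "sales": [130, 85]})

# Stack rows (axis=0); ignore_index=True renumbers the index 0..n-1
stacked = pd.concat([jan, feb], axis=0, ignore_index=True)

# Place side by side (axis=1); rows are aligned on the index.
# add_suffix renames feb's columns so the column labels stay unique.
side_by_side = pd.concat([jan, feb.add_suffix("_feb")], axis=1)

print(stacked)
print(side_by_side)
```

Unlike merge, concat does not match rows on a key column. It places frames next to each other along an axis and aligns on column names (axis=0) or on the index (axis=1).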
Operations on Multiple DataFrames
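The heading does not say which operations are meant. One common example is element-wise arithmetic between DataFrames, which pandas aligns on row and column labels. The sketch below assumes two hypothetical score tables indexed by student name.

```python
import pandas as pd

# Hypothetical scores of students in two tests, indexed by name
test1 = pd.DataFrame({"math": [80, 70], "physics": [65, 90]}, index=["Asha", "Ben"])
test2 = pd.DataFrame({"math": [85, 60], "physics": [75, 95]}, index=["Ben", "Chen"])

# '+' aligns on row/column labels; labels missing from either frame give NaN
plain_sum = test1 + test2

# .add() with fill_value treats a missing entry as 0 instead of producing NaN
filled_sum = test1.add(test2, fill_value=0)

print(plain_sum)
print(filled_sum)
```

Only Ben appears in both tests, so he is the only row with a non-NaN total in plain_sum. filled_sum keeps Asha's and Chen's single scores.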
Grouping data
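A minimal sketch of groupby on a hypothetical sales table. It shows a single aggregate per group and several aggregates over a two-column grouping.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "revenue": [100, 80, 150, 60, 70],
})

# Split rows by region, then sum revenue within each group
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Several aggregates at once, grouped on two columns
summary = sales.groupby(["region", "product"])["revenue"].agg(["sum", "mean", "count"])

print(revenue_by_region)
print(summary)
```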
Creating new columns using existing columns
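A minimal sketch, on hypothetical order data, of three ways to derive a new column. The first uses vectorised arithmetic. The second applies a lambda to one column. The third applies a lambda row by row when the logic needs several columns.

```python
import pandas as pd

# Hypothetical order data
orders = pd.DataFrame({"quantity": [2, 5, 1], "unit_price": [10.0, 4.0, 25.0]})

# Vectorised arithmetic on existing columns (usually the fastest option)
orders["total"] = orders["quantity"] * orders["unit_price"]

# A lambda applied element-wise to a single column
orders["bulk"] = orders["quantity"].apply(lambda q: q >= 5)

# A lambda applied row by row (axis=1) when the logic needs several columns
orders["label"] = orders.apply(
    lambda row: "high" if row["total"] > 20 else "low", axis=1
)

print(orders)
```

Row-wise apply is convenient but slower than vectorised operations on large frames. It is best kept for logic that cannot be expressed with column arithmetic.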
Pivot Table
- data --> the DataFrame to summarise
- index --> rows are grouped by this column
- columns --> each distinct value of this column becomes an individual column
- values --> the aggregation is performed on this column
- aggfunc --> the aggregate function, e.g. mean, min, sum
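A minimal sketch of pd.pivot_table that maps each of the arguments above onto a hypothetical sales table.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "product": ["A", "A", "B", "B", "A"],
    "revenue": [100, 80, 150, 60, 50],
})

# index   -> rows grouped by region
# columns -> each distinct product becomes its own column
# values  -> revenue is the column being aggregated
# aggfunc -> how the values falling in each cell are combined
table = pd.pivot_table(
    sales, index="region", columns="product", values="revenue", aggfunc="sum"
)

print(table)
```

North has two product-A rows (100 and 50), so its A cell holds their sum, 150. If aggfunc is omitted, pivot_table defaults to the mean.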
Conclusion
- There are two main data structures in Pandas: Series (a one-dimensional labelled array, like a single column) and DataFrame (a two-dimensional table whose columns are Series). A DataFrame can be created from CSV files, dictionaries, JSON, etc.
- Parts of a DataFrame can be selected (slicing and dicing) using df.loc (label-based) and df.iloc (integer position-based).
- Pandas is a powerful library that simplifies many common data operations. Aggregations such as mean and sum, as well as groupby, can be performed on DataFrames, and lambda functions make it easy to create new columns.
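A minimal sketch of the label-based vs position-based distinction between df.loc and df.iloc. It uses a hypothetical DataFrame with a string index so that labels and positions differ.

```python
import pandas as pd

# Hypothetical DataFrame with a non-default (string) index
df = pd.DataFrame(
    {"age": [25, 32, 47], "city": ["Pune", "Delhi", "Mumbai"]},
    index=["a", "b", "c"],
)

# .loc selects by label; label slices INCLUDE the end point ("b")
by_label = df.loc["a":"b", ["age"]]

# .iloc selects by integer position; slices EXCLUDE the end point (2)
by_position = df.iloc[0:2, 0]

# .loc also accepts a boolean mask for row selection
over_30 = df.loc[df["age"] > 30, "city"]

print(by_label)
print(by_position)
print(over_30)
```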