Operations on Pandas DataFrames

Merging DataFrames

Note:

  • how can be set to left / right / inner / outer, and each option behaves like the SQL join of the same name
  • on is the column (or list of columns) on which the two DataFrames are joined
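The effect of how and on is easier to see on a small example. The following is a minimal sketch; the employees and salaries frames, their column names, and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical example data; names and values are illustrative only.
employees = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cara"],
})
salaries = pd.DataFrame({
    "emp_id": [2, 3, 4],
    "salary": [50000, 60000, 70000],
})

# inner: keep only the emp_id values present in both frames
inner = pd.merge(employees, salaries, how="inner", on="emp_id")

# left: keep every row of employees; missing salaries become NaN
left = pd.merge(employees, salaries, how="left", on="emp_id")

# outer: keep every emp_id that appears in either frame
outer = pd.merge(employees, salaries, how="outer", on="emp_id")

print(inner)
print(left)
print(outer)
```

With this data the inner merge keeps two rows, the left merge keeps three, and the outer merge keeps four.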

Concatenating DataFrames
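As a rough sketch of concatenation, pd.concat can stack DataFrames on top of each other or place them side by side. The q1 and q2 frames below are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical example data; names and values are illustrative only.
q1 = pd.DataFrame({"product": ["A", "B"], "units": [10, 20]})
q2 = pd.DataFrame({"product": ["C", "D"], "units": [30, 40]})

# Stack the rows of q2 under q1; ignore_index=True rebuilds the index as 0..n-1
stacked = pd.concat([q1, q2], ignore_index=True)

# Place the frames side by side; axis=1 aligns rows on the index
side_by_side = pd.concat([q1, q2], axis=1)

print(stacked)
print(side_by_side)
```

Unlike merge, concat does not match rows on a key column; it simply glues the frames together along the chosen axis.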

Operations on Multiple DataFrames

Grouping data
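A minimal sketch of grouping, assuming a made-up sales table with a region column and a units column: groupby splits the rows by region, and the aggregation is then applied to each group.

```python
import pandas as pd

# Hypothetical example data; names and values are illustrative only.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "units": [10, 20, 30, 50],
})

# One aggregation per group: mean units for each region
mean_units = sales.groupby("region")["units"].mean()

# Several aggregations at once with agg
summary = sales.groupby("region")["units"].agg(["sum", "min", "max"])

print(mean_units)
print(summary)
```

The result is indexed by the grouping column, so mean_units["East"] looks up the mean for the East region.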

Creating new columns using existing columns
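One possible way to derive new columns is sketched below, using a hypothetical orders table. The first new column uses plain column arithmetic and the second uses a lambda function through apply; the threshold of 10 is arbitrary.

```python
import pandas as pd

# Hypothetical example data; names and values are illustrative only.
orders = pd.DataFrame({
    "price": [2.0, 5.0, 10.0],
    "quantity": [3, 1, 2],
})

# Column arithmetic works element by element
orders["total"] = orders["price"] * orders["quantity"]

# A lambda applied to each value of an existing column
orders["size"] = orders["total"].apply(lambda t: "large" if t >= 10 else "small")

print(orders)
```

Column arithmetic is usually the faster choice; apply with a lambda is handy when the rule does not map onto a built-in operation.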

Pivot Table

  • data --> the DataFrame to pivot
  • index --> group by this column; its values become the rows
  • columns --> distinct values of this column become individual columns
  • values --> the aggregation is performed on this column
  • aggfunc --> the aggregate function: mean, min, sum, etc.
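The arguments listed above can be put together as in the following sketch. The sales table and its column names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical example data; names and values are illustrative only.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "units": [10, 20, 30, 50],
})

# Rows are regions, columns are quarters, cells hold the summed units
table = pd.pivot_table(
    sales,
    index="region",
    columns="quarter",
    values="units",
    aggfunc="sum",
)

print(table)
```

Each cell of the result is addressed by a row label and a column label, for example table.loc["West", "Q2"].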

Conclusion

  1. Pandas has two core data structures: Series (a single labelled column) and DataFrame (a table whose columns are Series sharing one row index). A DataFrame can be built from CSV files, dictionaries, JSON, and other sources.
  2. Parts of a DataFrame can be selected (slicing and dicing) using df.loc (label based) and df.iloc (integer-position based).
  3. Pandas is a powerful library that simplifies many of the common operations performed on data. Aggregations such as mean and sum, as well as groupby, can be run directly on DataFrames, and lambda functions make it easy to create new columns.

Additional Resources
