Pandas Basics

What is Pandas

  1. Pandas is a Python library aimed at simplifying data analysis.
  2. It provides many functions that come in handy when working with a data set.
  3. Pandas has two main data structures:

    • Series: similar to a 1-D NumPy array, except that it carries an index of labels and can store values other than numbers.
    • DataFrame: the Pandas way of representing a table. Each column is a Series, and all columns share the same row index.
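
To make the two structures concrete, here is a minimal sketch. The sample values and the names `s`, `df`, and `col` are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional, labeled, and not limited to numbers.
s = pd.Series(["a", 2, 3.5], index=["x", "y", "z"])

# A DataFrame: a table whose columns are Series sharing one row index.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

# Selecting a single column gives back a Series.
col = df["age"]
print(type(col).__name__)
```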

Apply lambda functions to Pandas Series
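
A minimal sketch of applying a lambda to each element with `Series.apply`. The `prices` data and the 10% markup are hypothetical.

```python
import pandas as pd

prices = pd.Series([10.0, 20.0, 30.0])  # hypothetical sample data

# apply() calls the lambda once per element and returns a new Series.
with_tax = prices.apply(lambda p: p * 1.1)
labels = prices.apply(lambda p: "high" if p > 15 else "low")

print(with_tax)
print(labels)
```

For simple arithmetic like the markup, the vectorized form `prices * 1.1` gives the same result and is usually faster; `apply` is most useful when the logic does not vectorize easily.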

Converting CSV to DataFrame and getting metadata information

Note: Use df.describe() to get summary statistics (count, mean, std, min, quartiles, and max) for the numeric columns of a DataFrame.
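
A minimal sketch of loading a CSV and inspecting it. To keep the example self-contained, the CSV is an in-memory string with made-up contents; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV contents; pd.read_csv("some_file.csv") works the same way.
csv_text = "name,age,city\nAnn,30,Paris\nBob,25,Rome\nCara,35,Paris\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names
print(df.dtypes)            # data type of each column
df.info()                   # index, non-null counts, memory usage

# describe() summarizes the numeric columns by default.
summary = df.describe()
print(summary)
```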

Create a Series with custom indexing using Pandas
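
A minimal sketch, assuming string labels as the custom index. The names and scores are invented.

```python
import pandas as pd

# The index argument replaces the default 0, 1, 2, ... labels.
scores = pd.Series([90, 85, 70], index=["ann", "bob", "cara"])

print(scores["bob"])          # look up a value by its label
print(scores.index.tolist())  # the custom labels
```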

Custom Index for Pandas DataFrame
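
One common way to do this is `DataFrame.set_index`, sketched below with a hypothetical `id` column. By default it returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({"id": [101, 102, 103], "name": ["Ann", "Bob", "Cara"]})

# Use the "id" column as the row index of a new DataFrame.
indexed = df.set_index("id")

print(indexed.loc[102, "name"])
print(df.columns.tolist())  # the original still has "id" as a column
```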

Note: If you want to overwrite the original dataframe:
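
The original snippet for this note is not available, so the following is only a sketch of two usual options, assuming the index is set with `set_index` on a hypothetical `id` column: reassign the result, or pass `inplace=True`.

```python
import pandas as pd

# Option 1: reassign the result to the same name.
df = pd.DataFrame({"id": [101, 102], "name": ["Ann", "Bob"]})
df = df.set_index("id")

# Option 2: modify the object in place (the call itself returns None).
df2 = pd.DataFrame({"id": [101, 102], "name": ["Ann", "Bob"]})
df2.set_index("id", inplace=True)

print(df.equals(df2))
```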

Sorting DataFrame
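
A minimal sketch of sorting by a column with `sort_values`. The `age` column and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cara"], "age": [30, 25, 35]})

# sort_values sorts in ascending order by default and returns a new DataFrame.
sorted_df = df.sort_values("age")
print(sorted_df)
```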

Note: If you want to sort in descending order:
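
A sketch of the descending case, assuming the sort is done with `sort_values`: pass `ascending=False`. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cara"], "age": [30, 25, 35]})

# ascending=False reverses the sort order.
desc_df = df.sort_values("age", ascending=False)
print(desc_df)
```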

Selecting Even Rows of Pandas DataFrame
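
A minimal sketch, assuming "even rows" means the rows at positions 0, 2, 4, and so on. Two equivalent approaches are shown for a DataFrame with the default integer index; the data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 11, 12, 13, 14]})

# Slice with a step of 2 over row positions.
even_by_position = df.iloc[::2]

# Or build a boolean mask from the (default, integer) index.
even_by_mask = df[df.index % 2 == 0]

print(even_by_position)
```

The slice version works for any index, while the mask version relies on the index being integers.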

Accessing parts of a DataFrame

Note: We can use df.iloc[x, y] for position-based selection on a DataFrame, where x selects rows and y selects columns by integer position. Standard indexing and slicing techniques apply.

Note: df.loc is similar to df.iloc, except that it works on index and column labels instead of integer positions.
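
The two accessors can be sketched side by side. The labels and values below are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 25, 35], "city": ["Paris", "Rome", "Oslo"]},
    index=["ann", "bob", "cara"],
)

# iloc: integer positions (row 0, column 1).
by_position = df.iloc[0, 1]

# loc: labels (row "ann", column "city").
by_label = df.loc["ann", "city"]

# Slices and lists work with both accessors.
first_two_rows = df.iloc[:2, :]
some_ages = df.loc[["ann", "cara"], "age"]

print(by_position, by_label)
```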

Difference between iloc & loc

iloc | loc
-----|----
Position-based indexing. | Label-based indexing.
The upper bound of a row or column slice is excluded. | The upper bound of a row or column slice is included.
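
The difference in upper bounds is easiest to see on a DataFrame with the default integer index, where the same slice `0:2` returns a different number of rows. This is a small sketch with made-up values.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40]})  # index labels are 0, 1, 2, 3

# iloc: positions 0 and 1; the upper bound 2 is excluded.
by_position = df.iloc[0:2]

# loc: labels 0, 1 and 2; the upper bound 2 is included.
by_label = df.loc[0:2]

print(len(by_position), len(by_label))
```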

Selecting DataFrame based on conditions applied over the columns
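
A minimal sketch of boolean-mask selection. The column names and thresholds are hypothetical. Each condition goes in its own parentheses, and conditions are combined with `&` (and) or `|` (or) rather than the Python keywords.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dan"],
        "age": [30, 25, 35, 22],
        "city": ["Paris", "Rome", "Paris", "Paris"],
    }
)

# Keep rows where both conditions hold.
selected = df[(df["age"] > 26) & (df["city"] == "Paris")]
print(selected)
```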

Dropping duplicate rows
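
A minimal sketch using `drop_duplicates`, with invented data. Without arguments it compares whole rows; `subset` restricts the comparison to selected columns. In both cases the first occurrence is kept by default.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Ann", "Ann"], "age": [30, 25, 30, 31]}
)

# Drop rows that are identical in every column.
unique_rows = df.drop_duplicates()

# Drop rows that repeat a value in the "name" column only.
unique_names = df.drop_duplicates(subset="name")

print(unique_rows)
print(unique_names)
```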

Selecting values of a particular quantile in Pandas DataFrame
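
One plausible reading of this heading is "keep the rows at or above a given quantile of a column". The sketch below does that with `Series.quantile`; the `score` column and the 0.75 cutoff are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30, 40, 50]})

# Value below which 75% of the scores fall (linear interpolation by default).
cutoff = df["score"].quantile(0.75)

# Rows in the top quarter.
top_quarter = df[df["score"] >= cutoff]

print(cutoff)
print(top_quarter)
```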

Create Day column from Date Time column in Pandas
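
A minimal sketch using the `.dt` accessor. Since "day" could mean the day of the month or the weekday name, both are shown; the column names `date_time`, `day`, and `day_name` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"date_time": ["2021-01-04 09:30:00", "2021-01-05 18:00:00"]})

# Parse the strings into datetime values first.
df["date_time"] = pd.to_datetime(df["date_time"])

# Day of the month as an integer, and the weekday as a name.
df["day"] = df["date_time"].dt.day
df["day_name"] = df["date_time"].dt.day_name()

print(df)
```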

Percentage-wise column distribution
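
A minimal sketch, assuming the goal is the share of each distinct value in a column. `value_counts(normalize=True)` returns fractions, which are multiplied by 100 to get percentages. The `city` data is made up.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Paris", "Rome", "Paris", "Paris"]})

# Fraction of rows per distinct value, scaled to percentages.
percentages = df["city"].value_counts(normalize=True) * 100

print(percentages)
```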

Frequency Table using pd.crosstab
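
A minimal sketch of `pd.crosstab` counting how often each combination of two columns occurs. The `city` and `gender` columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Paris", "Paris", "Rome", "Rome", "Paris"],
        "gender": ["F", "M", "F", "F", "M"],
    }
)

# Rows come from the first argument, columns from the second.
table = pd.crosstab(df["city"], df["gender"])

print(table)
```

Combinations that never occur, such as Rome and M here, appear as 0.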
