
Operations on Pandas DataFrames
Merging DataFrames
import pandas as pd
df_1 = pd.read_csv('https://query.data.world/s/vv3snq28bp0TJq2ggCdxGOghEQKPZo')
df_2 = pd.read_csv('https://query.data.world/s/9wVKjNT0yiRc3YbVJaiI8a6HGl2d74')
df_3 = pd.merge(df_1, df_2, how = "inner", on = "unique_id")
print(df_3.head(20))Note:
howcan be set to left / right / inner / outer and it is similar to SQLonis the column on which merge can occur
Concatenating DataFrames
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
df_1 = pd.read_csv('https://query.data.world/s/vv3snq28bp0TJq2ggCdxGOghEQKPZo')
df_2 = pd.read_csv('https://query.data.world/s/9wVKjNT0yiRc3YbVJaiI8a6HGl2d74')
df_3 = pd.concat([df_1, df_2])
print(df_3.head())Operations on Multiple DataFrames
import pandas as pd
gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
'Medals': [15, 13, 9]}
)
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
'Medals': [29, 20, 16]}
)
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
'Medals': [40, 28, 27]}
)
gold.set_index('Country', inplace=True)
silver.set_index('Country', inplace=True)
bronze.set_index('Country', inplace=True)
total = gold.add(silver, fill_value=0)
total = total.add(bronze, fill_value=0)
print(total.sort_values(by='Medals', ascending=False))Grouping data
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
#Type your groupby command here
df_1 = df.groupby(['month', 'day'])['rain', 'wind'].mean()
print(df_1.head(20))Creating new columns using existing columns
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df['XY'] = df['X'] * df['Y']
print(df.head(20))Pivot Table
- dataframe
- index --> groupby this column
- column --> distinct values to individual columns
- value --> aggregation will be performed on this
- aggregate function --> mean, min, sum etc.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_1 = df.pivot_table(
index=["month", "day"],
aggfunc = "mean",
values = ["rain", "wind"]
)
print(df_1.head(20))Conclusion
- There are two data structures in Pandas, Series (Columns) & DataFrames (Tables [objects i.e. rows over the series]). The Pandas DataFrame can be obtained from csv, dictionaries, json etc.
- Selecting parts of dataframe [Slicing and Dicing] operations can be performed on DataFrames using df.loc (label based) & df.iloc (index based)
- Pandas is a very powerful library which simplifies many of the common operations that need to be performed on data. Operations such as mean, sum, groupby can be performed on dataframes. Lambda functions can be used to create a new columns easily.