Treating Missing Values in a DataFrame
Identify missing values
Note: If there were any rows missing all values, we would simply drop them.
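A minimal sketch of this step with pandas, assuming a small hypothetical DataFrame `df` (the column names and values are invented for illustration and are not from the original dataset). `isna()` flags missing cells, summing the result gives a per-column count, and `dropna(how="all")` removes only the rows in which every value is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only; not from the original dataset.
df = pd.DataFrame(
    {
        "age": [25, np.nan, 31, np.nan],
        "city": ["Pune", "Delhi", None, None],
        "salary": [50000.0, 62000.0, np.nan, np.nan],
    }
)

# Count of missing values in each column.
missing_per_column = df.isna().sum()

# Fraction of missing values in each column.
missing_fraction = df.isna().mean()

# Boolean mask marking rows where every column is missing.
all_missing_rows = df.isna().all(axis=1)

# Drop only the rows that have no values at all.
df_cleaned = df.dropna(how="all")

print(missing_per_column)
print(df_cleaned.shape)
```

Rows that are only partly missing are kept here, since they may still be usable after imputation.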
Treating missing values
There are broadly two ways to treat missing values:
- Delete the values
- Impute the values
  - Use statistics such as mean, median, or mode to fill values
  - Use predictive models (k-NN, SVM) to fill missing values
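The deletion and statistical-imputation options can be sketched with pandas as follows. The DataFrame, its column names, and the 50% cut-off for dropping a column are all hypothetical choices made for illustration. Imputation with predictive models is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 31.0, 40.0],
        "city": ["Pune", "Delhi", None, "Pune"],
        "notes": [None, None, None, "ok"],
    }
)

# Option 1: delete.
# Drop columns where more than half the values are missing
# (the 50% threshold is an arbitrary assumption), then drop
# any rows that still contain a missing value.
mostly_missing = df.columns[df.isna().mean() > 0.5]
df_deleted = df.drop(columns=mostly_missing).dropna()

# Option 2: impute with simple statistics.
df_imputed = df.drop(columns=mostly_missing).copy()
# Numeric column: fill with the median (the mean works the same way).
df_imputed["age"] = df_imputed["age"].fillna(df_imputed["age"].median())
# Categorical column: fill with the mode (most frequent value).
df_imputed["city"] = df_imputed["city"].fillna(df_imputed["city"].mode()[0])

print(df_deleted)
print(df_imputed)
```

Deletion is simpler but discards data, while imputation keeps every row at the cost of introducing estimated values. Which one fits depends on how much is missing and why.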
Conclusion
- Data can be gathered from various sources such as text files, web scraping, APIs, databases, and PDF files. Using requests, beautifulsoup, selenium, and PyPDF2, one can extract the relevant data and convert it to the desired format.
- In any large enough dataset, there are bound to be missing values.
- The missing values found in a dataset can be treated in two ways:
  - Delete the row/column which has missing values. If a column has too many missing values, we can simply drop that column to avoid bias.
  - Impute missing values: statistics (mean, mode, etc.) can be employed to