Treating Missing Values in a DataFrame

Identify missing values

Note: If there were any rows missing all values, we would simply drop them.
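As a rough sketch of this step, the snippet below counts missing values per column and drops rows in which every value is missing. The DataFrame and its column names (`age`, `city`, `income`) are made up for illustration and are not the dataset used in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 31.0, np.nan, 40.0],
        "city": ["Pune", "Delhi", None, None, "Mumbai"],
        "income": [50000.0, 62000.0, np.nan, np.nan, 58000.0],
    }
)

# Boolean mask: True wherever a value is missing (NaN or None).
missing_mask = df.isna()

# Number and fraction of missing values in each column.
missing_per_column = missing_mask.sum()
missing_fraction = missing_mask.mean()

# Rows in which every value is missing.
all_missing_rows = df[missing_mask.all(axis=1)]

# Drop only those rows; rows with at least one value are kept.
df_clean = df.dropna(how="all")

print(missing_per_column)
print(all_missing_rows.index.tolist())
```

Using `how="all"` keeps partially filled rows, so they remain available for the treatment step that follows.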

Treating missing values

There are broadly two ways to treat missing values:

  1. Delete the values
  2. Impute the values

    • Use statistics such as mean, median, mode to fill values
    • Use predictive models (k-NN, SVM) to fill missing values
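The two approaches listed above could look roughly like this in pandas. This is a minimal sketch on made-up data: the column names and the 50% threshold for dropping a column are assumptions chosen for the example, not recommendations from this post. Only the statistical route is shown; for the model-based route, scikit-learn provides `KNNImputer`.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 31.0, 40.0],
        "city": ["Pune", "Delhi", None, "Pune"],
        "score": [np.nan, np.nan, np.nan, 7.0],
    }
)

# Option 1: delete.
# Keep only the rows that have no missing values at all.
rows_dropped = df.dropna()

# Drop columns whose share of missing values exceeds a threshold.
# The 0.5 cutoff is an assumption made for this sketch.
max_missing_fraction = 0.5
keep = df.columns[df.isna().mean() <= max_missing_fraction]
cols_dropped = df[keep]

# Option 2: impute with simple statistics.
imputed = cols_dropped.copy()
# Numeric column: fill with the mean (median() works the same way).
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())
# Categorical column: fill with the most frequent value (the mode).
imputed["city"] = imputed["city"].fillna(imputed["city"].mode().iloc[0])

print(imputed)
```

Deleting is simplest but discards data, while imputing keeps every row at the cost of introducing estimated values.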

Conclusion

  1. Data can be gathered from various sources such as text files, web scraping, APIs, databases, and PDF files. Using requests, BeautifulSoup, Selenium, and PyPDF2, one can extract the relevant data and convert it to the desired format.
  2. In any large enough dataset, there are bound to be missing values.
  3. The missing values found in a dataset can be treated in two ways:

    • Delete the rows or columns that have missing values. If a column has too many missing values, we can simply drop that column to avoid bias.
    • Impute the missing values: statistics (mean, mode, etc.) can be employed to fill them in.

Tagged in python, data-analysis, pandas