Reading data from external sources into a DataFrame

Reading from text file
Reading from Database
Scraping data using bs4
Reading data from PDF files (PyPDF2)

Reading from text file

import pandas as pd

# companies.txt is tab-separated; read it with encoding="ISO-8859-1"
companies = pd.read_csv("companies.txt", sep="\t", encoding="ISO-8859-1")
companies.head()
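To see why the encoding argument matters, here is a minimal sketch that uses an in-memory buffer instead of a file. The sample row and the names `raw`, `utf8_decode_failed` and `sample` are made up for illustration and are not taken from companies.txt.

```python
import io

import pandas as pd

# Hypothetical tab-separated content, encoded as ISO-8859-1 (Latin-1).
raw = "name\tcity\nCafé Müller\tZürich\n".encode("ISO-8859-1")

# The same bytes are not valid UTF-8, which is pandas' default encoding.
try:
    raw.decode("utf-8")
    utf8_decode_failed = False
except UnicodeDecodeError:
    utf8_decode_failed = True

# Passing the matching encoding lets read_csv decode the accented characters.
sample = pd.read_csv(io.BytesIO(raw), sep="\t", encoding="ISO-8859-1")
print(utf8_decode_failed)
print(sample)
```

If a file raises a UnicodeDecodeError with the default settings, trying the encoding it was actually saved in is usually the first thing to check.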

Reading from Database

import pymysql

# create a connection object 'conn'
conn = pymysql.connect(host="localhost",         # your host; "localhost" for your local machine
                       user="root",              # your username, usually "root" for localhost
                       password="yourpassword",  # your password
                       database="world")         # name of the database; "world" is a MySQL sample database

# create a cursor object c
c = conn.cursor()

# execute a query using c.execute
c.execute("select * from city;")

# fetch all rows of the result; each row is returned as a tuple
all_rows = c.fetchall()

# to get only the first row, use c.fetchone() instead

# notice that it returns a tuple of tuples: each row is a tuple
print(type(all_rows))

# printing the first few rows
print(all_rows[:5])

df = pd.DataFrame(list(all_rows), columns=["ID", "Name", "Country", "District", "Population"])
df.head()
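The column names do not have to be typed by hand: DB-API cursors expose them through `cursor.description`. The sketch below shows the idea with Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a database server. The table layout and the two rows are toy values, not the contents of the real `world.city` table.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database used as a stand-in for the MySQL connection.
demo_conn = sqlite3.connect(":memory:")
cur = demo_conn.cursor()

# Toy table and rows, for illustration only.
cur.execute("CREATE TABLE city (ID INTEGER, Name TEXT, Population INTEGER)")
cur.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [(1, "Alpha", 100), (2, "Beta", 200)],
)

cur.execute("SELECT * FROM city")
rows = cur.fetchall()

# Each entry of cursor.description describes one column; item 0 is its name.
columns = [col[0] for col in cur.description]

city_df = pd.DataFrame(rows, columns=columns)
demo_conn.close()
print(city_df)
```

The same pattern should carry over to the pymysql cursor above, since it follows the same DB-API interface.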

Scraping data using bs4

Refer to Lyrics Extraction and Auto File Renamer to see how web scraping works.
Convert the scraped data to a suitable format so that it can be imported into pandas as a DataFrame.
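As a rough sketch of that conversion step, the example below parses a small HTML table with bs4 and turns it into a DataFrame. The HTML snippet, the `songs` id and the row values are invented for illustration; a real page would be downloaded first and will have its own structure.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML, standing in for a page that has already been downloaded.
html = """
<table id="songs">
  <tr><th>Title</th><th>Artist</th></tr>
  <tr><td>Song A</td><td>Artist X</td></tr>
  <tr><td>Song B</td><td>Artist Y</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="songs")

# Header cells become the column names.
header = [th.get_text(strip=True) for th in table.find_all("th")]

# Every row that contains <td> cells becomes one record.
records = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        records.append(cells)

scraped_df = pd.DataFrame(records, columns=header)
print(scraped_df)
```

A list of lists (or a list of dicts) is usually the easiest intermediate format, because `pd.DataFrame` accepts it directly.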

Reading data from PDF files (PyPDF2)

See the PyPDF2 GitHub repository and docs.
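A minimal sketch of getting PDF text into a DataFrame, assuming a PyPDF2 release that provides `PdfReader` (2.0 or later; older releases used `PdfFileReader`). The helper names `extract_page_texts` and `page_texts_to_dataframe` and the file name `report.pdf` are hypothetical.

```python
import pandas as pd


def extract_page_texts(pdf_path):
    """Return the extracted text of each page of the PDF at pdf_path."""
    # Imported here so the DataFrame helper below works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # Scanned or image-only pages may yield no text, so fall back to "".
    return [page.extract_text() or "" for page in reader.pages]


def page_texts_to_dataframe(page_texts):
    """Build a DataFrame with one row per page (pages numbered from 1)."""
    return pd.DataFrame({
        "page": range(1, len(page_texts) + 1),
        "text": page_texts,
    })


# Usage with a hypothetical file name:
# pdf_df = page_texts_to_dataframe(extract_page_texts("report.pdf"))
# pdf_df.head()
```

Text extraction quality depends heavily on how the PDF was produced, so the `text` column usually needs further cleaning before analysis.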

