14  SQL in Python: Setup

14.1 Flavours of SQL in Python

  • sqldf

  • sqlite3

  • duckdb

  • pandasql, SQLAlchemy

In this section, we will not connect to any standard RDBMS. We will execute SQL on an in-memory Pandas DataFrame (table) that we have read in Python and available in the Python environment.

14.2 tblStudent

We read the following data in the Python environment as a DataFrame object tblStudent.

If required, please install pandas in your virtual environment: pip install pandas

14.3 sqldf

  • sqldf: SQLDF - Structured Query Language (SQL) on DataFrames (DF)

  • GitHub: sqldf

  • The sqldf.run() function is typically passed a single argument which is an SQL select statement where the table names are ordinary Python DataFrame names.

  • It transparently creates a virtual in-memory SQLite3 databases at runtime, converts pd.DataFrame input(s) to SQL table(s), performs the SQL select or other statement, converts the SQL table(s) to updated pd.DataFrame if required and returns the results of the query.

14.4 sqldf: Example

  • pip install sqldf

  • import pandas as pd

  • import sqldf

  • Read the DataFrame

  • Write down the SQL statement as a character string and assign it to a variable

  • Example: sSQL = "SELECT * FROM tblStudent LIMIT 5"

  • Execute the statement as: sqldf.run(sSQL)

  • You can also assign the executed object to a Python DataFrame object

Code
import pandas as pd
import sqldf

tblStudent = pd.read_csv('./data/tblStudent.txt', sep = '\t')

sSQL = "SELECT * FROM tblStudent LIMIT 5"

sqldf.run(sSQL)

# You can also store the results
out = sqldf.run(sSQL)
StudentID SName Gender Age ClassID Attendance
1 Alice Female 15 1 95
2 Bob Male 16 1 88
3 Eve Female 16 2 96
4 Charlie Male 15 2 85
5 David Male 16 3 90


14.5 sqlite3

14.6 sqlite3: Example

  • pip install sqlite3

  • import pandas as pd

  • from sqlite3 import connect

  • Read the DataFrame

  • Set a temporary database connection: conn = connect(':memory:')

  • Write the table to the database using the to_sql method.

  • Write down the SQL statement as a character string and assign it to a variable

  • Example: sSQL = "SELECT * FROM tblStudent"

  • Execute the statement as: pd.read_sql(sSQL, conn)

  • Disconnect the connection

Code
import pandas as pd
from sqlite3 import connect

tblStudent = pd.read_csv('./data/tblStudent.txt', sep = '\t')

conn = connect(':memory:')

tblStudent.to_sql(name = 'tblStudent', con = conn)

sSQL = "SELECT * FROM tblStudent LIMIT 5"

pd.read_sql(sSQL, conn)

# You can also store the results
out = pd.read_sql(sSQL, conn)

conn.close()
StudentID SName Gender Age ClassID Attendance
1 Alice Female 15 1 95
2 Bob Male 16 1 88
3 Eve Female 16 2 96
4 Charlie Male 15 2 85
5 David Male 16 3 90


14.7 duckdb

14.8 duckdb: Example

  • pip install duckdb

  • import pandas as pd

  • import duckdb

  • Read the DataFrame

  • Write down the SQL statement as a character string and assign it to a variable

  • Example: sSQL = "SELECT * FROM tblStudent"

  • Execute the statement as: duckdb.sql(sSQL)

Code
import pandas as pd
import duckdb

tblStudent = pd.read_csv('./data/tblStudent.txt', sep = '\t')

sSQL = "SELECT * FROM tblStudent LIMIT 5"

duckdb.sql(sSQL)

# You can also store the results
out = duckdb.sql(sSQL)
StudentID SName Gender Age ClassID Attendance
1 Alice Female 15 1 95
2 Bob Male 16 1 88
3 Eve Female 16 2 96
4 Charlie Male 15 2 85
5 David Male 16 3 90