12  SQL in R: Setup

12.1 Flavours of SQL in R

  • sqldf

  • RSQLie

  • duckdb

In this section, we will not connect to any standard RDBMS. We will execute SQL on an in-memory data.frame (table) that we have read into R and that are available in the R environment.

12.2 tblStudent

We read the following data in the R environment as a data.frame object tblStudent.

12.3 sqldf

  • sqldf: Manipulate R Data Frames Using SQL

  • GitHub: sqldf

  • The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data.frame names.

  • It transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the data.frame

  • The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle.

  • It supports backend databases like RSQLite, RH2, RMySQL and RPostgreSQL

12.4 sqldf: Example

  • install.packages("sqldf")

  • library(sqldf)

  • Read the data.frame

  • Write down the SQL statement as a character string and assign it to a variable

  • Example: sSQL = "SELECT * FROM tblStudent LIMIT 5"

  • Execute the statement as: sqldf::sqldf(sSQL)

  • You can also assign the executed object to an R data.frame object

Code
library(sqldf)

tblStudent = read.table(file = "tblStudent.txt", sep = "\t", header = TRUE)

sSQL = "SELECT * FROM tblStudent LIMIT 5"

sqldf::sqldf(sSQL)

# You can also store the results
out = sqldf::sqldf(sSQL)
StudentID SName Gender Age ClassID Attendance
1 Alice Female 15 1 95
2 Bob Male 16 1 88
3 Eve Female 16 2 96
4 Charlie Male 15 2 85
5 David Male 16 3 90


12.5 RSQLite

  • RSQLite: SQLite Interface for R

  • GitHub: RSQLite

  • Vignette: RSQLite

  • The library embeds the SQLite database engine in R and provides an interface compliant with the DBI package.

  • RSQLite is the easiest way to use a database from R because the package itself contains SQLite; no external software is needed.

  • RSQLite is a DBI-compatible interface which means you primarily use functions defined in the DBI package.

  • It can also run many complex queries like batch queries and multiple parameterised queries.

12.6 RSQLite: Example

  • install.packages("RSQLite")

  • library(DBI)

  • library(RSQLite)

  • Read the data.frame

  • For running RSQLite query, we need to set a temporary database connection:

    • Option 1: Use "" to connect to a database from the current environment.
      • Example: dbConnect(RSQLite::SQLite(), "")
    • Option 2: use ":memory:" or "file::memory:" for an in-memory database.
      • Example: dbConnect(RSQLite::SQLite(), "mydatabase.db")
    • This database will be automatically deleted when you disconnect from it.
  • Set the connection using one of the above options.

  • Write the table on the database using dbWriteTable function.

  • Write down the SQL statement as a character string and assign it to a variable

  • Example: sSQL = "SELECT * FROM tblStudent"

  • Execute the statement as: dbGetQuery(conn, sSQL)

  • Disconnect the connection

Code
library(DBI)
library(RSQLite)

tblStudent = read.table(file = "tblStudent.txt", sep = "\t", header = TRUE)

conn = dbConnect(RSQLite::SQLite(), "")
dbWriteTable(conn, name = "tblStudent", value = tblStudent)

sSQL = "SELECT * FROM tblStudent LIMIT 5"

dbGetQuery(conn, sSQL)

# You can also store the results
out = dbGetQuery(conn, sSQL)

dbDisconnect(conn, shutdown = TRUE)
StudentID SName Gender Age ClassID Attendance
1 Alice Female 15 1 95
2 Bob Male 16 1 88
3 Eve Female 16 2 96
4 Charlie Male 15 2 85
5 David Male 16 3 90


12.7 duckdb

  • duckdb: DBI Package for the DuckDB Database Management System

  • GitHub: duckdb

  • Website: DuckDB

  • The DuckDB project is an embedded analytical data management system with support for the SQL.

  • The package includes all of DuckDB and a R Database Interface (DBI) connector.

  • DuckDB is an in-process analytical database and supports a feature-rich SQL dialect complemented with deep integration into client application programming interfaces (APIs).

  • Besides R, it offers client APIs for several languages like C, C++, Python, Java, Julia, Rust. See the DuckDB website for further details.

12.8 duckdb: Example

  • install.packages("duckdb")

  • library(DBI)

  • library(duckdb)

  • Read the data.frame

  • For running duckdb query, we need to set a temporary database connection:

  • Register the connection using the function: duckdb_register. The function registers a data.frame as a virtual table in the DuckDB environment.

  • Write down the SQL statement as a character string and assign it to a variable

  • Example: sSQL = "SELECT * FROM tblStudent"

  • Execute the statement as: dbGetQuery(conn, sSQL)

  • Disconnect the connection

Code
library(DBI)
library(duckdb)

conn <- dbConnect(duckdb())
duckdb_register(conn = conn, name = "tblStudent", df = tblStudent)

sSQL = "SELECT * FROM tblStudent LIMIT 5"

dbGetQuery(conn, sSQL)

# You can also store the results
out = dbGetQuery(conn, sSQL)

dbDisconnect(conn, shutdown = TRUE)
StudentID SName Gender Age ClassID Attendance
1 Alice Female 15 1 95
2 Bob Male 16 1 88
3 Eve Female 16 2 96
4 Charlie Male 15 2 85
5 David Male 16 3 90