RSS

Category Archives: sqldf

sqldf


sqldf is an R package for running SQL statements on R data frames and data tables, optimized for convenience. The user simply specifies an SQL statement in R using R object names in place of table names and a database with appropriate table layouts/schema is automatically created, the data frames are automatically loaded into the database, the specified SQL statement is performed, the result  is read back into R and the database is deleted all automatically behind the scenes making the database’s existence transparent to the user who only specifies the SQL statement. Sqldf supports SQLite backend database (by default), so that there is no separate installation for SQLite. We only select records from R object.

I am using IRIS data but it has “.” in between the field name and SQLite returns error while using field name with “.” and we can’t use double quotes in the field name. So, I am renaming the field name using the plyr library.

Renaming of data.frame fields

> ir = rename(iris, c(“Sepal.Length” = “SL”, “Sepal.Width” = “SD”, “Petal.Width” = “PW”, “Petal.Length” = “PL”))

Example of SQL statement with sqldf library

> sqldf(‘select Species, avg(SL) mean_sepal_length, avg(SD) mean_sepal_width, avg(PL) mean_petal_length, avg(PW) mean_petal_width from ir group by Species’)

sqldf

sqldf is case insensitive. It returns error when we used ir and IR. Both are same for sqldf.

sqldf error

We can also perform insert, update and delete statement through sqldf library.

Although data frames referenced in the SQL statement(s) passed to sqldf are automatically imported to SQLite, sqldf does not automatically export anything for safety reasons. Thus if you update a table using sqldf you must explicitly return it as shown in the examples below.

Note that in the select statement we referred to the table as main.DF (main is always the name of the sqlite database.) If we had referred to the table as DF (without qualifying it as being in main) sqldf would have fetched DF from our R workspace rather than using the updated one in the sqlite database

Suppose, we have data frame

> DF = data.frame(a=1:5, b=c(1,3,1,2,1))

Insert

> sqldf(c(“insert into DF values(6,2)”, “select * from main.DF”))

sqldf insert

In the same way, we can perform update and delete operations.

 
Leave a comment

Posted by on September 8, 2013 in sqldf

 

Tags: , , , , , , ,