Introducing rsql: programming SQL in R

SQL is great, but it’s not R

SQL is a powerful language for manipulating data. R is a powerful language for manipulating data. Frequently, data to be analyzed in R actually comes from a database using a SQL query. Fortunately, there are a variety of great R packages for interacting with databases and everything goes smoothly.

Eventually, you find yourself wanting to do a little more of the SQL in R, since R functions tend to be easier to document, generalize, and reuse than SQL scripts. Now, everything is great! You’ve got R functions that get or manipulate data in database tables before bringing the data into R. Everything works, unless you make a mistake. Unfortunately, the arguments to those functions are just strings, so it’s rather cumbersome to combine them programmatically. Seemingly small generalizations become more and more difficult because, ultimately, R is not SQL.
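To make the string problem concrete, here is a minimal sketch in base R of the kind of helper you end up writing. The function name build_query and its arguments are hypothetical, made up for illustration; they are not taken from rsql or any database package.

```r
# Hypothetical helper (not part of rsql): assemble a SELECT statement
# by pasting string fragments together.
build_query <- function(table, columns = "*", where = NULL) {
  sql <- paste("SELECT", paste(columns, collapse = ", "), "FROM", table)
  if (!is.null(where)) {
    # Each condition is an opaque string; R cannot check or transform it.
    conditions <- paste0("(", where, ")", collapse = " AND ")
    sql <- paste(sql, "WHERE", conditions)
  }
  sql
}

build_query("x", where = "y > 0")
# "SELECT * FROM x WHERE (y > 0)"

build_query("x", columns = c("y", "z"), where = c("y > 0", "z < 10"))
# "SELECT y, z FROM x WHERE (y > 0) AND (z < 10)"
```

This works, but every condition is just text: a typo such as "y >> 0" is only caught when the database rejects the query, and changing a condition programmatically means more string surgery.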

Why rsql is better

When you operate on a data frame, you write this:

x.sub = subset(x,subset=(y>0))

Not this:

x.sub = subset(x,subset=("y > 0"))

So why would you want to do that just because the data is in a database?

Caveats

Unfortunately, things are just barely less awesome:

x.sub = subset(x,subset=.(y>0))
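For the curious, here is a rough sketch in base R of how a quoting wrapper like .() can work in principle. This is an assumption about the general technique, not a description of rsql's actual implementation, and the to_where helper is hypothetical.

```r
# Hypothetical quoting helper: return the expression unevaluated
# instead of looking up y and comparing it to 0.
. <- function(expr) substitute(expr)

# Hypothetical translator: turn a captured R expression into the text
# of a SQL WHERE condition. Only the == operator is mapped here; a real
# translator would also need to handle &, |, %in%, and so on.
to_where <- function(cond) {
  text <- paste(deparse(cond), collapse = " ")
  gsub("==", "=", text, fixed = TRUE)
}

cond <- .(y > 0)
class(cond)           # "call"
to_where(cond)        # "y > 0"
to_where(.(y == 0))   # "y = 0"
```

Because the condition arrives as an R expression rather than a string, it can be inspected, checked, and rewritten before any SQL is generated. The price is the extra .() around the expression.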

Check it out on GitHub.
