This fall, I found myself writing a number of Hive SQL queries, which was fun. The problem was that I was trying to do some not entirely simple things. In particular, I was trying to implement some of the R functions for data frames, e.g. melt, aggregate, etc. I started out writing these as shell scripts, but
S4 for rsql
For a variety of reasons, some good and some bad, I wanted to use S4 classes for the table objects in rsql. Among other things, S4 gives you inheritance, prototypes, and some other neat stuff. Just as important to me, though totally superficial, is the fact that the contents (i.e. slots) of an S4 object are
SQL is great, but it’s not R
SQL is a powerful language for manipulating data. R is a powerful language for manipulating data. Frequently, data to be analyzed in R actually comes from a database using a SQL query. Fortunately, there are a variety of great R
Hadoop Streaming and TypedBytes
TypedBytes is a binary format for serializing data that is supported by Hadoop streaming. Several different Hadoop applications have found dramatic performance improvements by transitioning from text formats (e.g. CSV) to
Trying to help
One of the things I’ve liked about git is how much it encourages responsible and clean commits. Personally, I’m not there yet. I commit when I think to, usually when I’m at a reasonable stopping point, which means I just end up doing git commit -a rather than adding and committing clean groups of related
Implicit conversion with
But no unboxing with
Equality is not transitive
"" == 0
0 == "0"
"" != "0"
parseInt is not base ten!
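One way to see this, as a rough sketch: when parseInt is called without a radix, it inspects the string for a prefix, so a leading "0x" is read as hexadecimal (and some older engines also read a leading "0" as octal). Passing the radix explicitly avoids the guesswork. The helper name parseDecimal below is hypothetical, introduced only for this example.

```javascript
// Hypothetical helper: always parse as base ten by passing the radix.
function parseDecimal(text) {
  return parseInt(text, 10);
}

// No radix: the "0x" prefix makes parseInt treat the string as hexadecimal.
const withoutRadix = parseInt("0x10");

// Radix 10: parsing stops at the first character that is not a decimal digit.
const withRadix = parseDecimal("0x10");

// A leading zero is harmless once the radix is explicit.
const leadingZero = parseDecimal("08");

console.log(withoutRadix, withRadix, leadingZero);
```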
First things first, so install rvm and all your dependencies.
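# Install RVM (the leading backslash bypasses any shell alias for curl)
\curl -L https://get.rvm.io | bash -s stable
# Install Ruby 1.9.3 and switch the current shell to it
rvm install 1.9.3
rvm use 1.9.3
# Load RVM in future login shells
echo "source $HOME/.rvm/scripts/rvm" >> ~/.bash_profile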
For my first post, I thought I’d document how I built this site.
Middleman is a tool for dynamically building a static website. There are a variety of advantages to building a static website. For me, I wanted a blog a