Passing named arguments to Ruby Rake tasks using docopt for data science pipelines

Introduction: Have you ever considered using Rake to run tasks, but got stuck on the unnatural way Rake tasks take their arguments? Or have you seen the fancy argument parsing that docopt and similar libraries can do for you? This article describes how…
Connecting Hive and Spark on AWS

Connecting Hive and Spark on AWS in five easy steps

Hive and Spark are great tools for storing, processing, and mining big data. Many organizations deploy them separately. While each is useful on its own, the combination is even more powerful. Here is the missing HOWTO…
Running Apache Spark on AWS

Apache Spark is being adopted at a rapid pace by organizations big and small to speed up and simplify big data mining and analytics architectures. First invented by researchers at the AMPLab at UC Berkeley, the Spark codebase is being worked on by hundreds…
Setting up Hadoop

Setting up Hadoop 2.4 and Pig 0.12 on OSX locally

This is the first of many blog posts to come from our dev bootcamp. Oftentimes you want to test your scripts and run code locally before you hit the push button. We want to share our findings, which we think will be helpful to the wider world…

