Even the best personal laptop quickly reaches its limits when faced with analytics tasks. While contestants in various Kaggle competitions report that they often do well with 4 cores and 8 to 16 GB of RAM, my own experience is that building many models on even moderately sized data sets, as well as parameter tuning, requires a different sort of machine. Amazon and its Elastic Compute Cloud (Amazon EC2) come to our rescue.

EC2 Setup and Amazon Machine Images

When I searched the Internet for guidance on setting up Amazon EC2 for the first time, I stumbled upon Louis Aslett's website. He maintains various Amazon Machine Images and provides an excellent installation guide, which together make it really easy to set up your own RStudio Server in the Amazon EC2 environment in less than an hour.

EC2 Instance Types

Amazon offers different hardware setups (referred to as EC2 Instance Types) at different rates. Which one you choose depends not only on your computational requirements but also on your budget.

Compute Optimized (both C4 and C3) instances: “High performance front-end fleets, web-servers, batch processing, distributed analytics, high performance science and engineering applications, ad serving, MMO gaming, and video-encoding.”

Memory Optimized (R3) instances: “We recommend memory-optimized instances for high performance databases, distributed memory caches, in-memory analytics, genome assembly and analysis, larger deployments of SAP, Microsoft SharePoint, and other enterprise applications.”

In case you wonder which EC2 instance to choose: I tend to use the C4, C3 or R3 instances, depending on whether the job is bound by computation or by memory.
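To make that rule of thumb concrete, here is a minimal sketch in Python of how one might pick between the two families. It is only an illustration: the function name `suggest_family` and the threshold of 4 GB of RAM per core are my own assumptions, not anything Amazon publishes, and you should check the current instance specifications and prices before deciding.

```python
def suggest_family(cores_needed, memory_gb_needed, threshold_gb_per_core=4.0):
    """Suggest an EC2 instance family from rough resource requirements.

    The threshold is an assumed cut-off for illustration only: jobs that
    need a lot of RAM per core lean towards memory-optimized instances,
    everything else towards compute-optimized ones.
    """
    if cores_needed <= 0:
        raise ValueError("cores_needed must be positive")
    if memory_gb_needed < 0:
        raise ValueError("memory_gb_needed must not be negative")

    # RAM required per core decides which family fits better.
    gb_per_core = memory_gb_needed / cores_needed
    if gb_per_core > threshold_gb_per_core:
        return "R3"        # memory optimized
    return "C4 or C3"      # compute optimized


if __name__ == "__main__":
    # Parameter tuning across 8 cores on a modest data set.
    print(suggest_family(cores_needed=8, memory_gb_needed=16))
    # A large in-memory data set processed on 4 cores.
    print(suggest_family(cores_needed=4, memory_gb_needed=60))
```

The heuristic deliberately ignores price. In practice I would compare the hourly rates of the candidate instances in the chosen family before launching anything.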


