A “Runbook” or “Run Book” by any other name…

We use a runbook template to document what processes occur automatically or on demand. Typically these processes are either date-triggered or event-triggered. An example of a date-triggered process might be a scheduled email sent nightly at midnight; an example of an event-triggered process might be an alert email sent when any disk on a server reaches 90% capacity.
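As a rough sketch of the event-triggered case, here is a minimal Python check; the mount points, threshold, and print-based notification step are hypothetical placeholders (a real runbook entry might send email via smtplib instead):

```python
import shutil

MOUNT_POINTS = ["/"]   # assumption: the paths you want to monitor
THRESHOLD_PCT = 90     # alert when a disk reaches 90% capacity

def disks_over_threshold(paths, threshold_pct):
    """Return (path, pct_used) for each path at or above the threshold."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)
        pct_used = usage.used / usage.total * 100
        if pct_used >= threshold_pct:
            alerts.append((path, round(pct_used, 1)))
    return alerts

def notify(alerts):
    # Placeholder notification; swap in smtplib or your paging tool here.
    for path, pct in alerts:
        print(f"ALERT: {path} is {pct}% full")

if __name__ == "__main__":
    notify(disks_over_threshold(MOUNT_POINTS, THRESHOLD_PCT))
```

A scheduler (cron, for instance) would run this periodically, which is exactly the kind of process the runbook's automated-process tab should record.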

Configuration Management

Use this template as a starting point for building your own. It should be a key element of the Configuration Management Database (CMDB) you use to document and operate your enterprise.


The simple runbook template below features three tabs. The first lists all your automated processes, those that occur without assistance. The second captures operational processes: tasks your team performs either on demand or in response to some event. The last is a lookup tab that holds the dropdown values for the other tabs; you can safely ignore it.

The tabs are pretty self-explanatory. Fill in each task your group performs, how often it runs, how long it normally takes, and any accompanying notes. Feel free to add more columns to capture other configuration items, such as who to contact in case of failure, the email distribution groups that support the process, or the organization the process supports.
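For illustration, the operational tab's rows could also be kept as structured data and exported to CSV for pasting into the spreadsheet. The column names and sample tasks below are illustrative only, mirroring the template's columns plus one of the optional extras suggested above:

```python
import csv
import io

# Hypothetical column set: the template's core columns plus a failure contact.
COLUMNS = ["task", "frequency", "duration", "notes", "failure_contact"]

runbook_rows = [
    {"task": "Nightly summary email", "frequency": "Daily 00:00",
     "duration": "5 min", "notes": "Date-triggered",
     "failure_contact": "ops@example.com"},
    {"task": "Disk-full alert", "frequency": "On event",
     "duration": "1 min", "notes": "Fires at 90% usage",
     "failure_contact": "ops@example.com"},
]

def to_csv(rows, columns=COLUMNS):
    """Render runbook rows as CSV for pasting into a spreadsheet tab."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(runbook_rows))
```

Keeping the rows as data like this makes it easy to add columns later without reworking the spreadsheet by hand.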



Please note that you are free to use this for any purpose, public or private.  I submit it as a (hopefully) helpful tool back to the community of professionals who have given me advice and feedback and trialed it in real-world operations.

Planning and Communicating Your Cluster Design

When creating a new Amazon Web Services (AWS) Hadoop cluster, putting together a configuration plan or topology can be overwhelming.  Below is a fill-in Hadoop reference architecture template I’ve built that addresses the key aspects of planning, building, configuring, and communicating your Hadoop cluster on AWS.

I’ve done this many times, and as part of my focus on tools and templates I thought I’d add a template you can use as a basic guideline for planning your Cloudera big data cluster.  The template includes configurations for:

  • instance basics
  • instance list
  • storage
  • operating system
  • CDH version
  • cluster topology
  • metastore detail for Hive, YARN, Hue, Impala, Sqoop, Oozie, and Cloudera Manager
  • high availability
  • resource management
  • additional detail for custom service descriptors (CSDs) for Storm and Redis
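As a rough illustration, the same plan sections could also be captured as a structured document and versioned alongside your code. Every value below is a hypothetical placeholder, not a recommendation:

```python
# Hypothetical sketch of the template's sections as a nested dict.
# All values are placeholders for illustration only.
cluster_plan = {
    "instance_basics": {"region": "us-east-1", "az": "us-east-1a"},
    "instances": [{"name": "master1", "type": "m3.2xlarge"}],
    "storage": {"ebs_backed": True, "root_gb": 100},
    "operating_system": "RHEL 7",
    "cdh_version": "CDH 5.x",
    "topology": {"masters": 3, "workers": 18},
    "metastores": {"hive": "MySQL", "oozie": "MySQL", "hue": "MySQL"},
    "high_availability": {"hdfs_ha": True, "rm_ha": True},
    "resource_management": {"yarn": True},
    "csds": ["Storm", "Redis"],
}

for section in cluster_plan:
    print(section)
```

A structure like this can back the spreadsheet tabs, or be serialized to JSON/YAML for review alongside infrastructure changes.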

No Warranty Expressed or Implied

It’s not meant to be exhaustive, as many items are not covered (AWS security groups, network optimization, Dockerization, continuous integration, monitoring, etc.), but it is an example of a real-world cluster in AWS (instance and Availability Zone details changed for security).

Screenshot of the roles and services in the big data design template

Example list of EC2 instances for the cluster plan

Cloudera Hadoop reference architecture configuration template for Amazon Web Services (AWS)


Please feel free to let me know how it works for you and if you have any improvements for it.

I’ve attached an Excel file for a full-featured Big Data (Hadoop) production topology, a good starting place for an architecture that supports full Lambda processing (streaming for seconds-old recency, batch for heavy lifting, and services to logically merge the two on demand).  The cluster is composed of 21 AWS instances with EBS backing.  The HDFS layer can be partitioned so that older data (more than one year old, for example) resides on cheaper S3 storage while remaining fully queryable.
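The "older data on S3" idea can be sketched by generating Hive DDL that repoints aged partitions at an s3a:// location. The table name, bucket, and year-based partition scheme below are assumptions for illustration, not details from the attached topology:

```python
def s3_migration_ddl(table, bucket, years, cutoff_year):
    """Return ALTER TABLE statements repointing partitions older than
    cutoff_year at S3, so they stay queryable on cheaper storage."""
    stmts = []
    for year in years:
        if year < cutoff_year:
            stmts.append(
                f"ALTER TABLE {table} PARTITION (year={year}) "
                f"SET LOCATION 's3a://{bucket}/{table}/year={year}';"
            )
    return stmts

# Hypothetical table/bucket names; emit DDL for everything before 2016.
for stmt in s3_migration_ddl("events", "my-archive-bucket",
                             years=range(2012, 2017), cutoff_year=2016):
    print(stmt)
```

The data files would be copied to the bucket first (e.g. with distcp); Hive then resolves queries against the S3-backed partitions transparently.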

The use cases covered in this architecture:

  1. Accessibility
    1. Data miner support through SQL and machine learning libraries into the raw data
    2. Ad-hoc querying through SQL in a dimensional model
    3. REST, thrift, and other API access with load balancing, data merging (from any data technology), and efficient data source routing
    4. OLAP cubes with perspectives (through data marts) for business analysis
  2. Technical
    1. Open source, free licensing model
    2. Fault tolerance and re-entrancy on failure
    3. Scalable design with massive parallelism
    4. Cloud design for flexibility


Below is an example topology for a Big Data development cluster that I’ve actually used with some customers.  It’s composed of six Amazon Web Services (AWS) servers, each with a particular purpose.  We have been able to perform full Lambda processing using this topology, along with Teiid (for data abstraction), on terabytes of data.  It’s not sufficient for a production cluster but is a good starting point for a development group.  The total cost of this cluster as configured (less storage) is under $6/hour.

Here’s a link to this dev_topology in Excel.


| Service | Category | Server1 | Server2 | Server3 | Server4 | Server5 | Server6 |
|---|---|---|---|---|---|---|---|
| Cloudera Mgr | Cluster Mgt | Alert Pub | Server | Host Mon | Svc Mon | Event Svr | Act Mon |
| ZooKeeper | Infra | Server | Server | Server | | | |
| YARN | Infra | Node Mgr | Node Mgr | JobHist | Node Mgr | RM/NM | |
| Redis | Infra | Master | Slave | Slave | | | |
| Hive | Data | Hive Server | Metastore | HCat | | | |
| Impala | Data | App Master | Cat Svr | Daemon | Daemon | Daemon | |
| Storm | Data | Nimbus/UI | Supervisor | Supervisor | Supervisor | | |
| Hue | UI | Server | | | | | |
| Pentaho BI | UI | BI Server | | | | | |
| AWS details | | | | | | | |
| Instance type | | m3.2xlarge | m3.2xlarge | m3.2xlarge | r3.4xlarge | r3.4xlarge | r3.4xlarge |
| vCPU | | 8 | 8 | 8 | 16 | 16 | 16 |
| Memory (GB) | | 30.0 | 30.0 | 30.0 | 122.0 | 122.0 | 122.0 |
| Instance storage (GB) | | SSD 2 x 80 | SSD 2 x 80 | SSD 2 x 80 | SSD 1 x 320 | SSD 1 x 320 | SSD 1 x 320 |
| I/O | | High | High | High | High | High | High |
| EBS option | | Yes | Yes | Yes | Yes | Yes | Yes |
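The "under $6/hour" figure can be sanity-checked against approximate historical on-demand prices for these instance types. The per-hour numbers below are illustrative assumptions; actual AWS pricing varies by region and date:

```python
# Approximate historical on-demand prices (USD/hour); treat as illustrative.
HOURLY_PRICE = {"m3.2xlarge": 0.532, "r3.4xlarge": 1.33}

# The dev cluster above: three m3.2xlarge and three r3.4xlarge instances.
cluster = ["m3.2xlarge"] * 3 + ["r3.4xlarge"] * 3

total = sum(HOURLY_PRICE[i] for i in cluster)
print(f"~${total:.2f}/hour")  # roughly $5.59/hour with these assumptions
```

With these assumed rates the compute cost comes in just under the $6/hour figure quoted above, before storage.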