Below is an example topology for a Big Data development cluster that I've actually used for some customers. It consists of six Amazon Web Services (AWS) servers, each with a specific purpose. Using this topology together with Teiid (for data abstraction), we have been able to run a full Lambda architecture over terabytes of data. It isn't sufficient for a production cluster, but it is a good starting point for a development group. The total cost of this cluster as configured (excluding storage) is under $6/hour.
Here’s a link to this dev_topology in Excel.
| Service | Category | Server1 | Server2 | Server3 | Server4 | Server5 | Server6 |
|---|---|---|---|---|---|---|---|
| Cloudera Manager | Cluster mgmt | Alert Publisher | Server | Host Monitor | Service Monitor | Event Server | Activity Monitor |
| HDFS | Infra | NameNode | SNN/DN/JN/HA | DN | DN/JN | DN | DN/JN |
| ZooKeeper | Infra | Server | Server | Server | | | |
| YARN | Infra | NodeManager | NodeManager | JobHistory Server | NodeManager | RM/NM | |
| Redis | Infra | Master | Slave | Slave | | | |
| Hive | Data | Hive Server | Metastore | HCatalog | | | |
| Impala | Data | App Master | Catalog Server | Daemon | Daemon | Daemon | |
| Storm | Data | Nimbus/UI | Supervisor | Supervisor | Supervisor | | |
| Hue | UI | Server | | | | | |
| Pentaho BI | UI | BI Server | | | | | |

Abbreviations: SNN = Secondary NameNode, DN = DataNode, JN = JournalNode, HA = NameNode high availability, RM = ResourceManager, NM = NodeManager.
AWS details:

| | Server1 | Server2 | Server3 | Server4 | Server5 | Server6 |
|---|---|---|---|---|---|---|
| Instance type | m3.2xlarge | m3.2xlarge | m3.2xlarge | r3.4xlarge | r3.4xlarge | r3.4xlarge |
| vCPU | 8 | 8 | 8 | 16 | 16 | 16 |
| Memory (GiB) | 30.0 | 30.0 | 30.0 | 122.0 | 122.0 | 122.0 |
| Instance storage (GB) | SSD 2 x 80 | SSD 2 x 80 | SSD 2 x 80 | SSD 1 x 320 | SSD 1 x 320 | SSD 1 x 320 |
| I/O | High | High | High | High | High | High |
| EBS-optimized available | Yes | Yes | Yes | Yes | Yes | Yes |
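To sanity-check a layout like this, it can help to encode the tables as data. The following is a minimal Python sketch and not part of the original setup: the role strings and instance specs are copied from the tables above, while the helper names (`servers_with`, `tolerated_failures`, `totals`) are hypothetical. ZooKeeper ensembles and HDFS JournalNodes both need a majority of their members to be up, so a set of n members tolerates (n - 1) // 2 failures.

```python
from dataclasses import dataclass

# Role placement copied from the topology table; index i is Server(i+1).
# Roles use the table's abbreviations; "" means no role on that server.
ROLES = {
    "ZooKeeper": ["Server", "Server", "Server", "", "", ""],
    "HDFS": ["NameNode", "SNN/DN/JN/HA", "DN", "DN/JN", "DN", "DN/JN"],
    "YARN": ["NM", "NM", "JobHist", "NM", "RM/NM", ""],
}


@dataclass(frozen=True)
class Instance:
    name: str
    vcpu: int
    memory_gib: float
    storage_gb: int  # total local (instance) SSD storage


M3 = Instance("m3.2xlarge", 8, 30.0, 2 * 80)
R3 = Instance("r3.4xlarge", 16, 122.0, 1 * 320)
SERVERS = [M3, M3, M3, R3, R3, R3]  # Server1..Server6


def servers_with(service: str, role: str) -> list[int]:
    """Return the 1-based server numbers whose cell for `service` lists `role`."""
    return [
        i + 1
        for i, cell in enumerate(ROLES[service])
        if role in cell.split("/")  # cells like "DN/JN" hold several roles
    ]


def tolerated_failures(quorum_size: int) -> int:
    """A majority quorum of n members survives (n - 1) // 2 member failures."""
    return (quorum_size - 1) // 2


def totals(servers: list[Instance]) -> tuple[int, float, int]:
    """Sum vCPU, memory (GiB) and instance storage (GB) across the cluster."""
    return (
        sum(s.vcpu for s in servers),
        sum(s.memory_gib for s in servers),
        sum(s.storage_gb for s in servers),
    )


if __name__ == "__main__":
    zk = servers_with("ZooKeeper", "Server")
    jn = servers_with("HDFS", "JN")
    print("ZooKeeper servers:", zk, "tolerates", tolerated_failures(len(zk)), "failure(s)")
    print("JournalNodes:", jn, "tolerates", tolerated_failures(len(jn)), "failure(s)")
    print("DataNodes:", servers_with("HDFS", "DN"))
    print("NodeManagers:", servers_with("YARN", "NM"))
    print("vCPU, memory (GiB), instance storage (GB):", totals(SERVERS))
```

Run as a script, it reports ZooKeeper on servers 1 to 3 and JournalNodes on servers 2, 4 and 6 (each set tolerating one failure), DataNodes on servers 2 through 6, NodeManagers on servers 1, 2, 4 and 5, and cluster totals of 72 vCPUs, 456 GiB of memory and 1,440 GB of instance SSD storage.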