Patterns for Deploying On-prem BigData Applications on Amazon Web Services (AWS)

July 16, 2019

Tags: Amazon Web Services (AWS) | BigData Applications | Cloud Automation | On-prem


One of our clients, a leading financial organization, was looking to move its existing on-premises applications to the AWS cloud. The client turned to Apps Associates to help design and develop several frameworks that automate secure deployment to AWS.

Working closely with the client, Apps Associates applied a set of principles that every application must follow before moving to AWS. Our approach included:

- Automate Everything
  - Develop automated routines to create, manage and update your applications
  - Refrain from using the AWS console to make manual changes

- Encrypt Everything
  - All data must be encrypted at rest
  - All S3 buckets must be encrypted using the KMS master key
  - All EBS volumes must be encrypted
  - All data in transit must be encrypted using TLS

- Secure Everything
  - Do not store any credentials in the source code
  - Always externalize credentials from the application artifacts, so the same artifacts can be used in every environment
  - Always use the right role, policy and permissions to operate the application. Do not share roles and policies across applications.

- Be Frugal
  - Always shut down resources when they are not needed, and use automation to bring them online when they are needed
  - Prefer serverless capabilities over dedicated infrastructure

- Be Mindful of Privileges
  - Use the minimum set of permissions needed to operate a service. Do not use an overly broad set of permissions
  - Do not use security groups or firewall rules that are overly broad or permissive
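The encryption and privilege principles lend themselves to automated checks. The sketch below is a minimal example, not part of the client's framework: it assumes the inputs are dicts shaped like the boto3 responses of s3.get_bucket_encryption() and ec2.describe_security_groups(), it makes no AWS calls itself, and the function names are hypothetical.

```python
# Minimal policy-as-code checks for the "Encrypt Everything" and
# "Be Mindful of Privileges" principles. Assumption: inputs are dicts shaped
# like boto3 responses from s3.get_bucket_encryption() and
# ec2.describe_security_groups(); no AWS calls are made here.
from typing import Any

# CIDR blocks that expose a rule to the whole internet (IPv4 and IPv6).
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def bucket_uses_kms(encryption_response: dict[str, Any]) -> bool:
    """Return True if a default-encryption rule on the bucket uses SSE-KMS."""
    rules = (encryption_response
             .get("ServerSideEncryptionConfiguration", {})
             .get("Rules", []))
    return any(
        rule.get("ApplyServerSideEncryptionByDefault", {}).get("SSEAlgorithm") == "aws:kms"
        for rule in rules
    )


def open_ingress_rules(sg_response: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (group_id, ports, cidr) for every ingress rule open to the internet."""
    findings = []
    for group in sg_response.get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            # IpProtocol "-1" means all protocols and all ports.
            if perm.get("IpProtocol") == "-1":
                ports = "all"
            else:
                ports = f"{perm.get('FromPort')}-{perm.get('ToPort')}"
            for cidr in cidrs:
                if cidr in OPEN_CIDRS:
                    findings.append((group["GroupId"], ports, cidr))
    return findings
```

Checks like these could run in a CI job or a scheduled Lambda; AWS Config managed rules cover similar ground without custom code.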

Framework for Deploying On-prem BigData Applications into AWS EMR

Apps Associates worked with the client to develop a framework for deploying its on-prem big data applications. The main objectives included:

- Enable teams to focus on application development
- Provide a configuration-driven EMR platform service
- Provision transient EMR clusters
- Submit a job and, once the job completes, terminate the EMR cluster and clean up resources
- Create a consistent logging and alerting mechanism
- Ease management of core EMR-related services
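One way to keep alerting consistent is to build every alert from the same source. The following sketch assumes the input is shaped like the boto3 emr.describe_cluster() response and turns a cluster that ended in TERMINATED_WITH_ERRORS into a uniform message. The "initiated-by" tag and the "platform-team" fallback are hypothetical conventions, and delivery (for example through SNS or SES) is left out.

```python
# Sketch of a uniform alert for transient EMR runs. Assumptions: input is
# shaped like boto3's emr.describe_cluster() response, and the person who
# started the run is stored in a hypothetical "initiated-by" cluster tag.
import json
from typing import Any, Optional


def build_alert(describe_cluster_response: dict[str, Any]) -> Optional[dict[str, str]]:
    """Return an alert dict for a failed cluster, or None if it ended cleanly."""
    cluster = describe_cluster_response["Cluster"]
    status = cluster["Status"]
    if status["State"] != "TERMINATED_WITH_ERRORS":
        return None
    reason = status.get("StateChangeReason", {})
    tags = {t["Key"]: t["Value"] for t in cluster.get("Tags", [])}
    # Same fields, same order, for every alert the platform sends.
    details = {
        "cluster_id": cluster["Id"],
        "cluster_name": cluster.get("Name", ""),
        "state": status["State"],
        "reason_code": reason.get("Code", "UNKNOWN"),
        "reason_message": reason.get("Message", ""),
    }
    return {
        "to": tags.get("initiated-by", "platform-team"),  # hypothetical fallback
        "subject": f"[EMR FAILED] {details['cluster_name']} ({details['cluster_id']})",
        "body": json.dumps(details, indent=2, sort_keys=True),
    }
```

Checking only the cluster state is a simplification: if steps use ActionOnFailure CONTINUE, a cluster may end as TERMINATED even though a step failed, so step-level status may also need inspecting.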

Shared EMR Cluster

The shared EMR cluster is designed for application teams to submit their big data jobs (Spark jobs or Hive/Presto queries) for processing data on AWS. It is a highly available deployment in an active-standby configuration, which avoids downtime and supports disaster recovery. The master nodes of the active and standby clusters sit behind a load balancer that routes application traffic. All data is encrypted at rest and in transit. Auto scaling is enabled on metrics such as memory utilization, so core nodes are provisioned automatically to handle the processing load. It works as follows:

- Provision an ELB and the active and standby EMR clusters
- The ELB sits in front of the primary EMR cluster, and a CloudWatch alarm monitors the health of the cluster
- If the master node becomes unhealthy, the alarm fires and the SNS topic notifies all subscribers, including email recipients and a Lambda function
- The Lambda function deregisters the current primary cluster's master node from the ELB and registers the standby master
- The standby cluster can start small; once failover begins and queries are routed to it, auto scaling or manual scaling grows it to the required size
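The failover Lambda described above can be sketched as follows. This is an illustration under assumptions not stated in the original design: the load balancer is an ALB/NLB target group (the elbv2 API), master nodes are registered by EC2 instance ID, and the target group ARN and instance IDs come from hypothetical environment variables. The decision logic is kept in a pure function so it can be tested without AWS.

```python
# Sketch of the active-standby failover Lambda. Assumptions: elbv2 target
# group with instance targets; PRIMARY_MASTER_ID, STANDBY_MASTER_ID and
# TARGET_GROUP_ARN are hypothetical environment variables.
import json
import os
from typing import Any


def plan_failover(sns_event: dict[str, Any], primary_id: str,
                  standby_id: str) -> dict[str, list[dict[str, str]]]:
    """Decide which targets to swap, based on the CloudWatch alarm in the SNS event."""
    to_register: list[dict[str, str]] = []
    to_deregister: list[dict[str, str]] = []
    for record in sns_event.get("Records", []):
        # SNS delivers the CloudWatch alarm as a JSON string.
        alarm = json.loads(record["Sns"]["Message"])
        if alarm.get("NewStateValue") == "ALARM":
            to_deregister = [{"Id": primary_id}]
            to_register = [{"Id": standby_id}]
    return {"register": to_register, "deregister": to_deregister}


def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Lambda entry point: apply the plan with elbv2 register/deregister calls."""
    import boto3  # imported lazily so plan_failover can be tested offline

    plan = plan_failover(event, os.environ["PRIMARY_MASTER_ID"],
                         os.environ["STANDBY_MASTER_ID"])
    elbv2 = boto3.client("elbv2")
    tg_arn = os.environ["TARGET_GROUP_ARN"]
    if plan["register"]:
        # Register the standby first so the target group is never left empty.
        elbv2.register_targets(TargetGroupArn=tg_arn, Targets=plan["register"])
    if plan["deregister"]:
        elbv2.deregister_targets(TargetGroupArn=tg_arn, Targets=plan["deregister"])
    return plan
```

Registering the standby before deregistering the primary is a choice made in this sketch; the steps above list them the other way round.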

Patterns for Transient EMR provisioning

The transient EMR provisioning patterns help application teams deploy their big data applications to AWS. The clusters they create are transient: as soon as the application job completes, the EMR cluster is terminated and the corresponding backend resources are cleaned up. These patterns let application teams focus on application development instead of provisioning EMR clusters. Everything is configuration driven, so application teams can adjust the parameters to fit their requirements.
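Configuration-driven provisioning usually comes down to merging a team's overrides onto platform defaults and passing the result to the template. The sketch below assumes a platform-owned CloudFormation template whose parameter names match the hypothetical keys shown; the default values are examples only, not the client's settings.

```python
# Sketch of configuration-driven parameters for a transient EMR stack.
# Assumption: a platform CloudFormation template exposes parameters with
# these hypothetical names; values below are examples only.
from typing import Any

PLATFORM_DEFAULTS: dict[str, str] = {
    "ReleaseLabel": "emr-5.25.0",      # example release label
    "MasterInstanceType": "m5.xlarge",
    "CoreInstanceType": "m5.xlarge",
    "CoreInstanceCount": "2",
}
# Keys every application team must supply.
REQUIRED_FROM_TEAM = {"AppName", "SparkScriptS3Uri"}


def build_stack_parameters(team_config: dict[str, Any]) -> list[dict[str, str]]:
    """Merge team overrides onto defaults and emit CloudFormation Parameters."""
    supplied = set(team_config)
    missing = REQUIRED_FROM_TEAM - supplied
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    unknown = supplied - set(PLATFORM_DEFAULTS) - REQUIRED_FROM_TEAM
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    merged = {**PLATFORM_DEFAULTS, **team_config}  # team values win
    # CloudFormation expects every parameter value as a string.
    return [{"ParameterKey": k, "ParameterValue": str(v)}
            for k, v in sorted(merged.items())]
```

Rejecting unknown keys is a deliberate choice here: a typo in a team's configuration fails fast instead of silently falling back to a default.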

We developed multiple patterns based on the application teams' needs, including:

- S3 notification event pattern
- CloudWatch schedule pattern

S3 notification event pattern

- Create an event notification on the S3 bucket
- When an object arrives in the bucket, the notification triggers a Lambda function
- The Lambda function executes the CloudFormation templates to provision the EMR cluster
- Once the EMR cluster is created, a Spark job is submitted
- Once the Spark job completes, the EMR cluster is terminated and the resources are cleaned up
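The S3-triggered Lambda in this pattern might look like the sketch below. It assumes a hypothetical platform template (at TEMPLATE_URL) that creates the EMR cluster and its Spark step with KeepJobFlowAliveWhenNoSteps set to false, so the cluster terminates once the step finishes; the InputS3Uri parameter and the tag are hypothetical. Deleting the stack after the cluster terminates is not shown.

```python
# Sketch of the S3-notification Lambda that provisions a transient EMR stack.
# Assumptions: TEMPLATE_URL (hypothetical env var) points at a template that
# creates the cluster and Spark step; parameter names are hypothetical.
import os
import re
from typing import Any
from urllib.parse import unquote_plus


def stack_name_for(key: str, run_id: str) -> str:
    """Build a valid stack name: starts with a letter, then letters, digits, hyphens."""
    base = re.sub(r"[^A-Za-z0-9-]+", "-", key).strip("-") or "object"
    return f"emr-{base}-{run_id}"[:128]  # stack names are limited to 128 chars


def create_stack_requests(event: dict[str, Any], template_url: str,
                          run_id: str) -> list[dict[str, Any]]:
    """Turn each S3 record into keyword arguments for cloudformation.create_stack."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        requests.append({
            "StackName": stack_name_for(key, run_id),
            "TemplateURL": template_url,
            "Parameters": [
                {"ParameterKey": "InputS3Uri", "ParameterValue": f"s3://{bucket}/{key}"},
            ],
            "OnFailure": "DELETE",  # remove partially created resources on failure
            "Tags": [{"Key": "trigger", "Value": "s3-notification"}],
        })
    return requests


def handler(event: dict[str, Any], context: Any) -> list[str]:
    """Lambda entry point: create one stack per uploaded object."""
    import boto3  # lazy import keeps the pure functions testable offline

    cfn = boto3.client("cloudformation")
    run_id = context.aws_request_id[:8]
    return [cfn.create_stack(**req)["StackId"]
            for req in create_stack_requests(event, os.environ["TEMPLATE_URL"], run_id)]
```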

CloudWatch schedule pattern

- Create a CloudWatch scheduled event
- On the specified schedule, the event triggers a Lambda function
- The Lambda function executes the CloudFormation templates to provision the EMR cluster
- Once the EMR cluster is created, a Spark job is submitted
- Once the Spark job completes, the EMR cluster is terminated and the resources are cleaned up
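Setting up the schedule involves three calls: create the rule, point it at the Lambda, and allow the events service to invoke that Lambda. The sketch below builds those requests for the CloudWatch Events (EventBridge) and Lambda APIs; the rule name, function ARN, and the JSON job configuration passed as the event are hypothetical, and the expression check is a simplified format test rather than full validation.

```python
# Sketch of the CloudWatch schedule pattern set-up. Assumptions: rule name,
# function ARN and job_config contents are hypothetical.
import json
import re
from typing import Any

_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
_CRON = re.compile(r"^cron\((\S+ ){5}\S+\)$")  # six space-separated fields


def schedule_requests(rule_name: str, schedule: str, function_arn: str,
                      job_config: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Build put_rule / put_targets / add_permission arguments for a scheduled run."""
    if not (_RATE.match(schedule) or _CRON.match(schedule)):
        raise ValueError(f"not a rate() or cron() expression: {schedule!r}")
    return {
        "put_rule": {"Name": rule_name, "ScheduleExpression": schedule,
                     "State": "ENABLED"},
        "put_targets": {"Rule": rule_name, "Targets": [{
            "Id": "provision-emr",
            "Arn": function_arn,
            # Constant JSON delivered to the Lambda as its event on every run.
            "Input": json.dumps(job_config, sort_keys=True),
        }]},
        "add_permission": {"FunctionName": function_arn,
                           "StatementId": f"{rule_name}-invoke",
                           "Action": "lambda:InvokeFunction",
                           "Principal": "events.amazonaws.com"},
    }


def apply_schedule(requests: dict[str, dict[str, Any]]) -> None:
    """Create the rule and target, then grant the rule permission to invoke the Lambda."""
    import boto3  # lazy import keeps schedule_requests testable offline

    events = boto3.client("events")
    rule_arn = events.put_rule(**requests["put_rule"])["RuleArn"]
    events.put_targets(**requests["put_targets"])
    # SourceArn limits the invoke permission to this one rule (least privilege).
    boto3.client("lambda").add_permission(SourceArn=rule_arn,
                                          **requests["add_permission"])
```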

Using this approach, we provided the client with key benefits, including:

- Configuration-driven EMR platform service
- Teams can focus on application development
- Centralized security and compliance routines
- Consistent logging and alerting mechanism
- Enhanced AWS infrastructure design with better maintainability and cost optimization
- Easier management of core EMR-related services

With Apps Associates' help, the client achieved the following:

- Application teams can focus on application development instead of creating CloudFormation templates and other platform functions for provisioning EMR and related services
- Application teams now only need to select parameters and run a Jenkins job to provision EMR
- The EMR cluster terminates as soon as its job completes, which saves significant cost
- If EMR fails to provision or the EMR job fails to complete, a detailed email notification is sent to the person who initiated the provisioning
- Core and task nodes scale automatically based on utilization
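Utilization-based scaling on EMR instance groups can be expressed as an automatic scaling policy. The sketch below builds a policy dict in the AutoScalingPolicy structure accepted by emr.put_auto_scaling_policy, scaling on the YARNMemoryAvailablePercentage metric; the thresholds, capacities and cool-down are illustrative assumptions, not the client's values, and the cluster also needs an auto scaling role for the policy to take effect.

```python
# Sketch of an EMR automatic scaling policy driven by YARN memory.
# Assumption: thresholds, capacities and cool-down are illustrative only.
from typing import Any


def memory_rule(name: str, operator: str, threshold: float,
                adjustment: int) -> dict[str, Any]:
    """One scaling rule: change capacity by `adjustment` when the alarm fires."""
    return {
        "Name": name,
        "Action": {"SimpleScalingPolicyConfiguration": {
            "AdjustmentType": "CHANGE_IN_CAPACITY",
            "ScalingAdjustment": adjustment,
            "CoolDown": 300,
        }},
        "Trigger": {"CloudWatchAlarmDefinition": {
            "ComparisonOperator": operator,
            "EvaluationPeriods": 1,
            "MetricName": "YARNMemoryAvailablePercentage",
            "Namespace": "AWS/ElasticMapReduce",
            "Period": 300,
            "Statistic": "AVERAGE",
            "Threshold": threshold,
            "Unit": "PERCENT",
            # EMR substitutes the real cluster ID for this placeholder.
            "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
        }},
    }


def memory_scaling_policy(min_nodes: int, max_nodes: int) -> dict[str, Any]:
    """Scale out when free YARN memory is low, scale in when it is plentiful."""
    if min_nodes < 0 or min_nodes > max_nodes:
        raise ValueError("need 0 <= min_nodes <= max_nodes")
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": [
            memory_rule("ScaleOutOnLowMemory", "LESS_THAN", 15.0, 1),
            memory_rule("ScaleInOnIdleMemory", "GREATER_THAN", 75.0, -1),
        ],
    }
```

The resulting dict would be passed as the AutoScalingPolicy argument to put_auto_scaling_policy together with the cluster ID and the instance group ID. Keeping the scale-out and scale-in thresholds far apart reduces the chance of the group oscillating.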

If you have any questions or need help with an integration, please reach out to me directly at [email protected].

Balaji Bobba
