Amazon’s EMR Spark

Parallel Computing Jobs in AWS

Bruce Haydon
May 9, 2022

Spark is an open-source framework that has become a foundational technology in the machine learning space. Like MapReduce, it distributes the processing of large data workloads across a cluster, supporting machine learning, stream processing, and analytics. Unlike MapReduce, which writes intermediate results to disk between stages, Spark can cache data in memory, which can significantly improve performance for data processing and analytics, particularly for jobs that reuse the same data more than once.

Like other frameworks in the Hadoop ecosystem, Spark can be installed on an Amazon EMR (Elastic MapReduce) cluster in AWS.
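As a rough illustration of what installing Spark on EMR can look like, the sketch below builds the request for the boto3 EMR client's `run_job_flow` call with Spark listed as an application. The helper names, cluster name, release label, instance type, node counts, and the two default IAM role names are all assumptions chosen for the example, not values taken from this article. Treat it as a starting point under those assumptions rather than a recommended configuration.

```python
def build_spark_cluster_request(name, release_label="emr-6.6.0",
                                instance_type="m5.xlarge", core_nodes=2):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All defaults here are illustrative assumptions; adjust them for
    your own account, region, and workload.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Listing Spark here asks EMR to install it on the cluster.
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary node",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core nodes",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": core_nodes,
                },
            ],
            # Keep the cluster running after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names; assumed to already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_spark_cluster(request, region_name="us-east-1"):
    """Submit the request to EMR and return the new cluster ID.

    Requires boto3 and valid AWS credentials; calling this creates
    billable resources. The region is an assumption.
    """
    import boto3  # imported lazily so the builder works without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


if __name__ == "__main__":
    request = build_spark_cluster_request("spark-demo-cluster")
    print(request["Applications"])
```

Separating the request builder from the launch call lets the configuration be inspected or reviewed before anything is created in AWS; only `launch_spark_cluster` contacts the service.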

DRAFT Framework Chap link 3V72- Bruce Haydon ©2021, 2022
