Spark is a very popular framework for processing data and doing machine learning in a distributed environment.
In development you will often work on a single node, such as your local PC or laptop, because not everyone has access to a multi-node distributed environment.
But what if you could spin up some Docker containers, thereby creating additional nodes on which to test the scalability of your Spark code?
Here are links to some Docker images that may help you do this:
- Mesosphere – Docker repository for Spark image
- Big Data Europe – Spark Docker images on GitHub
- GettyImages – Spark Docker image on GitHub, also available on the Docker website
- SequenceIQ – Docker repository Spark image
Or simply create a cloud account on the Databricks Community website and set up your own Spark environment to play with and learn from.
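If you take the Docker route, a small test cluster is usually one master container plus a few worker containers on a shared Docker network. The sketch below only builds and prints the `docker` commands for such a setup and does not run them. It makes several assumptions that are not part of the list above: the image names and tag are taken from the Big Data Europe images and may be out of date, the `SPARK_MASTER` environment variable is how I believe that worker image finds its master, and the network and container names are made up for illustration. Check the README of whichever image you pick before using it.

```python
import shlex
from typing import List

# Assumed image names and tag (Big Data Europe); verify against their repository.
MASTER_IMAGE = "bde2020/spark-master:3.3.0-hadoop3.3"
WORKER_IMAGE = "bde2020/spark-worker:3.3.0-hadoop3.3"
NETWORK = "spark-net"  # hypothetical network name


def cluster_commands(num_workers: int) -> List[List[str]]:
    """Return docker commands for one master and `num_workers` workers."""
    if num_workers < 1:
        raise ValueError("need at least one worker")

    commands = [
        # A user-defined network lets workers reach the master by container name.
        ["docker", "network", "create", NETWORK],
        # Master: 8080 is the Spark web UI, 7077 is the standalone master port.
        ["docker", "run", "-d", "--name", "spark-master",
         "--network", NETWORK,
         "-p", "8080:8080", "-p", "7077:7077",
         MASTER_IMAGE],
    ]
    for i in range(1, num_workers + 1):
        commands.append(
            ["docker", "run", "-d", "--name", f"spark-worker-{i}",
             "--network", NETWORK,
             # Assumption: the worker image reads the master URL from SPARK_MASTER.
             "-e", "SPARK_MASTER=spark://spark-master:7077",
             WORKER_IMAGE]
        )
    return commands


# Print the commands so they can be reviewed before being pasted into a shell.
for cmd in cluster_commands(num_workers=2):
    print(shlex.join(cmd))
```

Changing `num_workers` lets you compare how the same Spark job behaves with one, two, or more workers on a single machine. Keep in mind that all containers still share that machine's CPU and memory.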