Hadoop on containers


I have started working on a project to make Hadoop/Spark run effectively in a containerized environment. Microservices such as a web server are well suited to containers because they are stateless, but running a big data technology like Hadoop, Spark, or Kafka can be a challenge because these systems are stateful, while containers are ephemeral by default. There are some workarounds available, and I have written a one-page report that gives a brief, high-level overview and takes no more than 2-3 minutes to read. If you have any questions, please leave a comment and I’ll try to address them.