Kubernetes has steadily become one of the most popular container orchestration platforms, while Hadoop has long been a widely used open source big data platform. The question is: how can we use the new cloud-native toolset to administer and manage Hadoop-based clusters? Is there any benefit to running Hadoop and other big data applications on top of Kubernetes?
In this presentation I will show that Hadoop is not a legacy application: it can run easily in a cloud-native environment thanks to its generic, distributed-by-design architecture.
The first step to running an application in a Kubernetes cluster is containerization. Creating containers for an application is easy (even if it's a good old distributed application like Apache Hadoop); it's just a few steps of packaging. The hard part isn't packaging: it's deployment.
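For example, packaging can be as small as a single Dockerfile. This is only a minimal sketch: the base image, Hadoop version, and download URL are illustrative choices, not an official image.

```dockerfile
# Minimal sketch: package a stock Hadoop release into an image.
# Base image and version are illustrative, not official choices.
FROM eclipse-temurin:8-jre
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
ENV HADOOP_VERSION=3.3.6
RUN curl -fsSL "https://downloads.apache.org/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz" \
      | tar -xz -C /opt \
 && ln -s /opt/hadoop-${HADOOP_VERSION} /opt/hadoop
ENV HADOOP_HOME=/opt/hadoop \
    PATH=/opt/hadoop/bin:/opt/hadoop/sbin:$PATH
# One generic image can start any daemon; the role is picked at run time.
CMD ["hdfs", "namenode"]
```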
How can we run the containers together? How do we configure them? How do the services in the containers find and talk to each other? How do you deploy and manage clusters with hundreds of nodes?
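Kubernetes answers these questions with its own primitives: a headless Service plus a StatefulSet gives each daemon a stable DNS name the others can be configured against, and a ConfigMap distributes the shared configuration as files. A minimal sketch, assuming a hypothetical myrepo/hadoop image (as built above) and a hadoop-config ConfigMap:

```yaml
# Headless Service: each pod gets a stable DNS name such as
# datanode-0.datanode.default.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: datanode
spec:
  clusterIP: None        # headless: DNS resolves directly to the pods
  selector:
    app: datanode
  ports:
    - port: 9866         # HDFS datanode data transfer port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: datanode
spec:
  serviceName: datanode
  replicas: 3
  selector:
    matchLabels:
      app: datanode
  template:
    metadata:
      labels:
        app: datanode
    spec:
      containers:
        - name: datanode
          image: myrepo/hadoop:3.3.6   # hypothetical image name
          command: ["hdfs", "datanode"]
          volumeMounts:
            - name: config
              mountPath: /opt/hadoop/etc/hadoop  # HADOOP_CONF_DIR
      volumes:
        - name: config
          configMap:
            name: hadoop-config        # holds core-site.xml, hdfs-site.xml, ...
```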
Modern cloud-native tools like Kubernetes or Consul/Nomad can help a lot, but they can be used in different ways.
In this presentation I will demonstrate multiple solutions for managing containerized clusters with different cloud-native tools, including Kubernetes and Docker Swarm/Compose.
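With Docker Compose/Swarm the same idea takes a different shape, but the building blocks match: Compose service names double as DNS names, so the daemons can find each other by name. Again a hedged sketch; the image and the environment variable are hypothetical and assume an entrypoint that renders them into Hadoop's XML configuration:

```yaml
# Hypothetical docker-compose sketch of the same cluster.
version: "3"
services:
  namenode:
    image: myrepo/hadoop:3.3.6
    command: ["hdfs", "namenode"]
  datanode:
    image: myrepo/hadoop:3.3.6
    command: ["hdfs", "datanode"]
    environment:
      # hypothetical knob consumed by the image's entrypoint to set
      # fs.defaultFS in core-site.xml; "namenode" resolves via Compose DNS
      HADOOP_FS_DEFAULTFS: "hdfs://namenode:9000"
    deploy:
      replicas: 3        # honored in swarm mode
```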
No matter which tools you use, the same questions of service discovery and configuration management arise. This talk will show the key elements needed to make such a containerized cluster work.