At Cloudera Analyst Days, Cloudera announced the details of its highly anticipated direction. Cloudera Data Platform (CDP) combines the Cloudera and Hortonworks stacks, pulling Spark, Tez, Hive, LLAP, Impala, Druid, and other project capabilities, more than 30 in all, into a single integrated stack.

This stack will have common security, governance, and administration across the enterprise hybrid cloud environment: “Develop once, Deploy anywhere, Manage everything”. Importantly, with CDP, Cloudera has put the HDFS-only approach to data management in the rearview mirror. The pain of on-premises Hadoop implementations is now palpable, and Cloudera is giving customers a choice of HDFS, S3, DataProc, or Blob Storage with ORC, Avro, and Parquet. For customers remaining on premises, it also supports Apache Ozone, a scalable, redundant, and distributed object store for Hadoop.

According to Cloudera, an implementation of CDP could serve the enterprise as the data warehouse, as the data lake, and as the platform for machine learning functions. Its cloud portability will make Cloudera Data Platform a better alternative for users looking to run machine learning applications and other types of advanced analytics at enterprise scale.

CDP eliminates YARN in favor of Kubernetes for container management. A container-based approach provides significant advantages for the machine learning environment and is certainly the direction for successful enterprises. Experimentation has gotten a bad name in the enterprise, and Cloudera wants that to change.

With 1,600 current data warehouse customers and a target market of the top 5,000 companies in the world, Cloudera is using CDP to give customers access to robust data management and to simplify technical complexity in a hybrid cloud environment. Although CDP has been long awaited in this fast-paced enterprise data platform environment, it represents rapid development by the combined entity and Cloudera’s very best foot forward into that environment.

McKnight Consulting Group