Apache Ozone

What Is Apache Ozone?

Ozone is a scalable, redundant, and distributed object store for Hadoop. Ozone can function effectively in containerized environments such as Kubernetes and YARN.

Ozone is an object store designed for big data applications.

Why Ozone Is an Object Store

Object stores are simpler to build and use than standard file systems, and they are also easier to scale.

Apache HDFS is designed for large objects:

  • It is not meant for many small objects. Small files create memory pressure on the NameNode.
  • Each small file occupies a block on a DataNode, and DataNodes send all of their block information to the NameNode in block reports. Both of these create scalability issues on the NameNode.
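To make the memory pressure concrete, here is a rough back-of-the-envelope sketch. It assumes about 150 bytes of NameNode heap per namespace object (one per file and one per block). That figure is a commonly quoted rule of thumb, not a number from this article, and the function name and parameters are hypothetical.

```python
# Rough estimate of NameNode heap consumed by file and block metadata.
# ASSUMPTION: ~150 bytes per namespace object (file or block). This is a
# commonly quoted rule of thumb, not a measured or official value.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Return the approximate heap needed to track the given number of files."""
    # Each file contributes one file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    for files in (1_000_000, 100_000_000, 1_000_000_000):
        gib = estimate_namenode_heap_bytes(files) / 2**30
        print(f"{files:>13,} small files -> ~{gib:.1f} GiB of heap")
```

Under this assumption the cost grows linearly with the number of files regardless of how small they are, which is why many small files hurt more than a few large ones.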

Volumes, Buckets, and Keys in Ozone

A storage volume is similar to a home directory in the Ozone world. Only an administrator can create one. Volumes are used to store buckets.

A bucket holds keys and their objects. It is similar to a bucket in Amazon S3 or a container in Azure, and it plays a role similar to a directory. A bucket can contain any number of keys, but buckets cannot contain other buckets.

Once a volume is created, users can create as many buckets as needed.

Ozone stores data as keys that live inside these buckets.
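The rules above (admin-only volume creation, buckets inside volumes, keys inside buckets, no nested buckets) can be sketched as a small in-memory model. This is only an illustration of the hierarchy: the class and method names are hypothetical and are not the Ozone client API.

```python
from typing import Dict, Set


class Bucket:
    """A flat container of keys. It has no method for adding child buckets."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.keys: Dict[str, bytes] = {}

    def put_key(self, key: str, data: bytes) -> None:
        self.keys[key] = data

    def get_key(self, key: str) -> bytes:
        return self.keys[key]


class Volume:
    """Holds buckets. Any user may add buckets once the volume exists."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.buckets: Dict[str, Bucket] = {}

    def create_bucket(self, name: str) -> Bucket:
        if name in self.buckets:
            raise ValueError(f"bucket {name!r} already exists")
        self.buckets[name] = Bucket(name)
        return self.buckets[name]


class ToyObjectStore:
    """Top level of the hierarchy: volume -> bucket -> key."""

    def __init__(self, admins: Set[str]) -> None:
        self.admins = admins
        self.volumes: Dict[str, Volume] = {}

    def create_volume(self, user: str, name: str) -> Volume:
        # Only administrators may create volumes.
        if user not in self.admins:
            raise PermissionError(f"{user!r} is not an administrator")
        self.volumes[name] = Volume(name)
        return self.volumes[name]


if __name__ == "__main__":
    store = ToyObjectStore(admins={"admin"})
    volume = store.create_volume("admin", "vol1")
    bucket = volume.create_bucket("bucket1")
    bucket.put_key("report.csv", b"a,b,c\n")
    print(bucket.get_key("report.csv"))
```

Because `Bucket` only stores keys, the "buckets cannot contain other buckets" rule is enforced by the shape of the model rather than by a runtime check.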

Ozone Architecture

The main objective of Ozone is scalability: it aims to scale to billions of objects.

Ozone separates namespace management from block space management, which are combined in HDFS. This separation helps Ozone scale much better. The namespace is managed by a daemon called Ozone Manager (OM), and block space is managed by Storage Container Manager (SCM).

 

The above diagram shows the high-level overview of Ozone Architecture.

The Ozone Manager is the namespace manager, the Storage Container Manager manages the physical and data layer, and Recon is the management interface for Ozone.

This means that when you want to write some data, you ask the Ozone Manager for a block; it gives you one and remembers that information. When you want to read the data back, you ask the Ozone Manager for the address of the block, and it returns it to you.
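The split between namespace and block space can be illustrated with a toy model. The sketch below assumes that the namespace manager obtains block IDs from the block space manager and records which blocks belong to which key. The class names, method names, and integer block IDs are hypothetical and do not mirror Ozone's internal interfaces.

```python
import itertools
from typing import Dict, List


class ToySCM:
    """Stand-in for block space management: it only hands out block IDs."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def allocate_block(self) -> int:
        return next(self._ids)


class ToyOM:
    """Stand-in for namespace management: it maps key names to block IDs."""

    def __init__(self, scm: ToySCM) -> None:
        self._scm = scm
        self._namespace: Dict[str, List[int]] = {}

    def allocate_for_write(self, key: str) -> int:
        # Get a block from the block space manager and remember it for the key.
        block_id = self._scm.allocate_block()
        self._namespace.setdefault(key, []).append(block_id)
        return block_id

    def lookup(self, key: str) -> List[int]:
        # Return a copy of the block list so callers cannot mutate the namespace.
        return list(self._namespace[key])


if __name__ == "__main__":
    om = ToyOM(ToySCM())
    om.allocate_for_write("/vol1/bucket1/key1")
    om.allocate_for_write("/vol1/bucket1/key1")
    print(om.lookup("/vol1/bucket1/key1"))
```

The namespace manager never needs to know where a block physically lives, and the block space manager never sees key names.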

Key reads are simpler:

  • The client requests the block list from the Ozone Manager.
  • The Ozone Manager returns the block list and block tokens, which allow the client to read the data from the DataNodes.
  • The client connects to a DataNode, presents the block token, and reads the data.
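The read steps above can be sketched as a small simulation. The article does not describe how block tokens are built, so this sketch substitutes an HMAC over the block ID with a shared secret purely for illustration. The secret, class names, and method names are all hypothetical and are not Ozone's token format or API.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# ASSUMPTION: a secret shared by the manager and the data nodes. This is an
# illustrative stand-in for however real block tokens are issued and checked.
SECRET = b"demo-shared-secret"


def make_token(block_id: int) -> str:
    """Return a hex HMAC-SHA256 tag that authorizes access to one block."""
    return hmac.new(SECRET, str(block_id).encode(), hashlib.sha256).hexdigest()


class ToyManager:
    """Knows which blocks make up each key and issues a token per block."""

    def __init__(self, namespace: Dict[str, List[int]]) -> None:
        self._namespace = namespace

    def get_block_list(self, key: str) -> List[Tuple[int, str]]:
        # Steps 1 and 2: return every block ID together with its token.
        return [(block_id, make_token(block_id)) for block_id in self._namespace[key]]


class ToyDataNode:
    """Stores block data and serves it only when a valid token is presented."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self._blocks = blocks

    def read_block(self, block_id: int, token: str) -> bytes:
        # Step 3: verify the token before returning any data.
        if not hmac.compare_digest(token, make_token(block_id)):
            raise PermissionError("invalid block token")
        return self._blocks[block_id]


def read_key(manager: ToyManager, datanode: ToyDataNode, key: str) -> bytes:
    """Fetch the block list, then read each block in order from the data node."""
    parts = [datanode.read_block(b, t) for b, t in manager.get_block_list(key)]
    return b"".join(parts)


if __name__ == "__main__":
    manager = ToyManager({"/vol1/bucket1/key1": [1, 2]})
    datanode = ToyDataNode({1: b"hello, ", 2: b"ozone"})
    print(read_key(manager, datanode, "/vol1/bucket1/key1"))
```

The point of the design is that the data node can check a token on its own, so the manager is not involved in the actual data transfer.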

 

Storage Container Manager

Storage Container Manager (SCM) is the leader node for block space management. Its main responsibility is to create and manage containers, which are the main replication unit of Ozone.

 

The Storage Container Manager provides multiple critical functions for the Ozone cluster: it acts as the cluster manager, certificate authority, block manager, and replica manager.