Object Storage is All You Need: Justin Cormack

2024/12/01

A summary of one of the standout talks from KubeCon by Justin Cormack, CTO of Docker: “Object Storage is All You Need.”

Amazon S3 has proven to be an excellent primitive for object storage. As Jeff Bezos put it, it truly is the “malloc for the internet.” But what exactly is an object store?

What is an Object Store?

An object store is not a POSIX file system, and that “non-POSIX” part is particularly important. Let’s dig into why.

POSIX — Portable Operating System Interface

At a high level, POSIX is a set of standards to ensure that applications developed on one UNIX flavor can run on other UNIXes. (This compatibility is one of the reasons Linux remains relevant today.)

The POSIX standard describes how system calls must behave, with specific conditions that must be enforced:

Why Isn’t S3 POSIX-Compliant?

How Concurrency in Object Stores Evolved

Phase 1: The Beginning

S3’s design was initially influenced by Amazon’s experience shipping websites. With only one deployment pipeline, concurrency wasn’t a primary concern.

Phase 2: Content Addressing

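Content addressing identifies and retrieves data using a hash of its content rather than its location or name. On AWS, you can generate signed URLs that allow a file with a specific SHA-256 hash to be written to a designated location. This helps with concurrency, since two writers uploading the same content produce the same object and cannot conflict, but it doesn’t solve everything: anything mutable, such as a name that points at the latest hash, still needs coordination.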
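To make the idea concrete, here is a minimal sketch of content addressing in Python. It uses an in-memory dictionary as a stand-in for a bucket; the class name ContentAddressedStore and the "sha256/" key prefix are illustrative assumptions, not something from the talk or from any AWS API.

```python
import hashlib


class ContentAddressedStore:
    """In-memory stand-in for a bucket (hypothetical, for illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself, not chosen by the caller.
        key = "sha256/" + hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same key, so a repeated or
        # concurrent write of the same bytes is harmless.
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # A reader can verify integrity by re-hashing what it fetched.
        expected = key.split("/", 1)[1]
        if hashlib.sha256(data).hexdigest() != expected:
            raise ValueError("content does not match its address")
        return data
```

Two writers that call put with the same bytes get back the same key, which is why this pattern sidesteps write conflicts for immutable data. It does not help with deciding which key is "current", and that is the gap the later phases address.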

Phase 3: Database Integration

Databases were brought in to handle concurrency more robustly:

Phase 4: PUT with If-None-Match: *

A conditional PUT with the If-None-Match: * header is a game-changer for distributed systems. The write succeeds only if no object already exists at that key, which gives object stores an atomic create-if-absent primitive and opens up new ways of managing concurrency.
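The semantics can be sketched with a small in-memory simulation in Python. This is not a client for any real object store: InMemoryObjectStore, PreconditionFailed, try_acquire, and race are hypothetical names, and the internal lock stands in for the atomicity that the service itself would provide. The exception mirrors the HTTP 412 Precondition Failed status that a failed conditional request returns.

```python
import threading


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 response (hypothetical name)."""


class InMemoryObjectStore:
    """Toy store that models PUT with If-None-Match: * (illustration only)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()  # models the store's atomic check-and-write

    def put(self, key, body, if_none_match=None):
        with self._lock:
            # With If-None-Match: *, refuse the write if the key already exists.
            if if_none_match == "*" and key in self._objects:
                raise PreconditionFailed(key)
            self._objects[key] = body

    def get(self, key):
        return self._objects[key]


def try_acquire(store, key, owner):
    """Try to claim `key`; only the first writer succeeds."""
    try:
        store.put(key, owner.encode(), if_none_match="*")
        return True
    except PreconditionFailed:
        return False


def race(store, key, owners):
    """Run one thread per owner and return the owners that won the claim."""
    results = {}

    def worker(owner):
        results[owner] = try_acquire(store, key, owner)

    threads = [threading.Thread(target=worker, args=(o,)) for o in owners]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [o for o, won in results.items() if won]
```

Calling race(store, "locks/leader", ["a", "b", "c"]) returns a list with exactly one winner, however the threads interleave. That single-winner guarantee is what makes patterns such as leader election or append-only logs possible on top of an object store without an external database.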

Where Things Stand Today

A lot has been built on top of these primitives:

Other notable innovations:

This space is just getting started.
