How was Notion built? What is Notion's architecture?

Notion is built on a block-based data model where every piece of content — text, images, database rows, pages — is the same type of object called a 'block.' Each block has a UUID, a type, properties, and two kinds of relationships to other blocks. This model is stored in PostgreSQL on Amazon RDS (96 servers, 5 shards each as of 2023). The sync pipeline uses optimistic local apply on the client, server validation via a /saveTransactions endpoint, and real-time updates through WebSocket subscriptions to a system called MessageStore.

What is the Notion block model?

The Notion block model is the data architecture where every piece of content in Notion — paragraphs, headings, images, toggles, database rows, even full pages — is represented as the same type of object: a block. Each block has a UUID identifier, a type, properties (like title or color), and two kinds of relationships: a content pointer (ordered list of child IDs for rendering) and a parent pointer (upward link for permission inheritance). This single abstraction makes nesting, transforming, and reordering content a database operation rather than a UI trick.

What is Notion's tech stack?

Notion's backend uses PostgreSQL on Amazon RDS for block storage, partitioned by workspace ID. Clients use SQLite (native apps) or IndexedDB (web) for local caching. The real-time sync layer uses WebSocket connections. Their API uses custom JSON (not Markdown) to represent rich text, because Markdown can't represent Notion-specific block types like colored text, callouts, or synced blocks.

What is the single most important Notion architecture decision?

The block-based data model. It enables composability, transformation, and nested editing, but it requires a transaction system, a local persistence layer, and a sync pipeline that most 'Notion clone' projects underestimate. Every other architectural decision — the two-pointer system, the transaction queue, the breadth-first API pagination — flows from the decision to make 'everything a block.'

Why is Notion so hard to replicate despite being 'just a document editor'?

Because the difficulty is not in the editor UI. It is in the transaction model, sync pipeline, and data consistency constraints that make nested collaborative editing reliable. Those parts are not visible to users but take years to get right. Specifically: the RecordCache + TransactionQueue local persistence system, the /saveTransactions server validation that handles concurrent edits, and the MessageStore WebSocket subscription system for real-time updates.

How does Notion's real-time sync work?

Notion uses a five-step pipeline: (1) User actions create 'operations' grouped into transactions. (2) The transaction is applied locally immediately (optimistic update). (3) The transaction is persisted in a local TransactionQueue. (4) The transaction is posted to /saveTransactions on the server, which validates and commits it. (5) The server notifies all subscribed clients through MessageStore WebSocket connections, triggering them to call syncRecordValues and update their local RecordCache.

What is the content pointer vs parent pointer distinction in Notion?

Notion uses two different relationships between blocks for two different jobs. Content pointers are ordered lists of child block IDs used to render the nested content tree — what you see on the page. Parent pointers are upward links used only for permission inheritance. They're separate because: (1) scanning all content arrays to find a block's parent would be slow, and (2) historically blocks could appear in multiple content arrays, making permission inheritance via content traversal ambiguous.

Can founders understand Notion's architecture without access to private code?

Yes. Notion published detailed engineering writeups covering the block model, transaction pipeline, RecordCache, TransactionQueue, MessageStore, and their API design rationale. These posts contain internal system names and specific tradeoffs. Additionally, HowWorks provides structured breakdowns of Notion's architecture for product builders who want to understand the decisions before building something similar.

How Notion Was Built: Block Model, Architecture, and Sync Pipeline Explained

How Was Notion Built?

Notion is built on a block-based data model where every piece of content (text, headings, images, database rows, even full pages) is the same type of object called a "block." This single architectural decision shapes everything: how the editor works, how data syncs between users, how permissions cascade through nested content, and why the API looks the way it does. Notion stores these blocks in PostgreSQL on Amazon RDS (96 servers with 5 logical shards each as of 2023), with a real-time sync pipeline that keeps edits responsive on unreliable networks.

Most "how Notion was built" explanations describe the product surface: pages, databases, blocks, drag-and-drop. They stop before the interesting part.

The interesting part is why that surface is reliable. Notion's editor works offline. Edits from multiple users merge without visible conflict. Pages with hundreds of blocks load fast. None of that is magic — it is a specific set of architectural decisions that Notion's engineering team published in detail.

This post unpacks those decisions using Notion's own primary sources.

1) The block model (what "everything is a block" actually means)

In Notion's data model, a block is the atomic unit of information. Each block has:

a UUID v4 identifier
a type
properties (for example, a title, color, or toggle state)
relationships to other blocks

Notion's public API mirrors this closely. The official API docs describe a "block object" as a piece of content within Notion. Each block type has its own type-specific object, stored under a key named after the type, and the surrounding shape is consistent across all of them.

Here is what the block object looks like in the API (simplified):

{
  "object": "block",
  "id": "c02f…aea7",
  "type": "heading_2",
  "has_children": false,
  "heading_2": {
    "rich_text": [{ "plain_text": "Lacinato kale" }],
    "color": "default"
  }
}
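One practical consequence of this shape is that the payload always sits under a key named after the block's type. The following is a minimal Python sketch of reading it, assuming the simplified object above. The helper name block_plain_text is hypothetical and not part of any Notion SDK.

```python
import json

# Simplified block object from the example above (ID left elided as in the source).
raw = """
{
  "object": "block",
  "id": "c02f…aea7",
  "type": "heading_2",
  "has_children": false,
  "heading_2": {
    "rich_text": [{ "plain_text": "Lacinato kale" }],
    "color": "default"
  }
}
"""

def block_plain_text(block: dict) -> str:
    """Return the concatenated plain text of a block, or "" if it has none."""
    payload = block.get(block["type"], {})   # type-specific data lives under the type name
    return "".join(part.get("plain_text", "") for part in payload.get("rich_text", []))

block = json.loads(raw)
print(block["type"], "->", block_plain_text(block))
```

Because every type follows the same pattern, one generic reader like this can cover many block types without a separate parser for each.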

Two consequences of this model matter for anyone building something similar:

Your editor becomes a tree editor, not a linear document editor. Nesting, reordering, and transforming content are all data-model operations — not UI tricks.

Most UX features are implemented as operations on blocks. Dragging a block to nest it, using "Turn into" to change its type, syncing blocks between pages — these are database operations. Building them requires thinking about your data model first, not your interface.
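To make the "database operation" framing concrete, here is a small sketch of "Turn into" as a change to one field of a record. The Block class and turn_into function are illustrative assumptions, not Notion's internal code.

```python
from dataclasses import dataclass, field, replace
from uuid import uuid4

@dataclass(frozen=True)
class Block:
    type: str
    properties: dict
    content: tuple = ()            # ordered child block IDs
    id: str = field(default_factory=lambda: str(uuid4()))

def turn_into(block: Block, new_type: str) -> Block:
    """Change only the type; the ID, properties, and children carry over."""
    return replace(block, type=new_type)

todo = Block(type="to_do", properties={"title": "Buy milk"}, content=("child-1",))
heading = turn_into(todo, "heading_2")
print(heading.type, heading.id == todo.id)
```

Nothing about the rendered text has to be re-parsed: the same record is simply interpreted by a different renderer once its type changes.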

2) Content pointers vs parent pointers (render tree vs permissions)

Notion uses two different relationships between blocks for two different jobs:

Content (ordered set of child block IDs): used to render nested content — for example, blocks inside a toggle.
Parent (upward pointer): used for permissions inheritance.

Why split these? Notion's engineering post explains two reasons:

Historically, blocks could be referenced by multiple content arrays, which makes permission inheritance ambiguous — one block might logically "belong" in two places.
Walking "up" the tree by scanning all content arrays would be inefficient, especially on the client side where speed matters.

This is why indenting a block in Notion is a structural operation, not a visual one. You are moving the block into another block's content list — changing the render tree.
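The two pointers can be sketched side by side. In this simplified model (field and function names are assumptions), indenting edits content lists and the parent pointer together, while permission lookup only ever follows parent pointers.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Block:
    id: str
    parent: Optional[str] = None                  # upward pointer, used for permissions
    content: list = field(default_factory=list)   # ordered child IDs, used for rendering
    permissions: Optional[str] = None             # hypothetical: set only where access is defined

def indent(blocks: dict, block_id: str, new_parent_id: str) -> None:
    """Move a block into another block's content list and repoint its parent."""
    block = blocks[block_id]
    blocks[block.parent].content.remove(block_id)
    blocks[new_parent_id].content.append(block_id)
    block.parent = new_parent_id

def effective_permissions(blocks: dict, block_id: str) -> Optional[str]:
    """Walk parent pointers upward until a block defines permissions."""
    current = blocks.get(block_id)
    while current is not None:
        if current.permissions is not None:
            return current.permissions
        current = blocks.get(current.parent) if current.parent else None
    return None

blocks = {
    "page": Block("page", content=["a", "b"], permissions="workspace:read-write"),
    "a": Block("a", parent="page"),
    "b": Block("b", parent="page"),
}
indent(blocks, "b", "a")   # "b" now renders nested under "a"
print(blocks["a"].content, effective_permissions(blocks, "b"))
```

The upward walk touches one record per level, which is the efficiency argument for keeping a dedicated parent pointer instead of scanning every content array.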

3) The life of a block: from keypress to collaborator

Notion's engineering post on the data model is unusually concrete about the full pipeline. Here is the end-to-end flow, using their own terminology.

3.1 Client-side: operations and transactions

When you type or drag in the UI, Notion expresses changes as operations that create or update records. Operations are grouped into transactions, which the server commits or rejects as a group.

Example: pressing Enter in a to-do list builds a single transaction from two operations, which the client then applies locally:

Create a new block with a fresh UUID and initial attributes
Insert that block's ID into the parent's content list at the right position
Apply the transaction locally so the UI updates immediately, before server confirmation

This optimistic local apply is why Notion feels fast even on slow connections.
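The Enter-key flow can be sketched as plain data: one transaction holding a create operation and a list insert, followed by a local apply. The operation fields (command, id, args) and the function names are assumptions for illustration and are not Notion's actual wire format.

```python
from uuid import uuid4

def press_enter(parent_id: str, after_id: str) -> dict:
    """Build one transaction for a new to-do block inserted after `after_id`."""
    new_id = str(uuid4())
    return {
        "id": str(uuid4()),
        "operations": [
            {"command": "create", "id": new_id,
             "args": {"type": "to_do", "properties": {"title": ""}, "parent": parent_id}},
            {"command": "list_after", "id": parent_id,
             "args": {"child": new_id, "after": after_id}},
        ],
    }

def apply_locally(records: dict, tx: dict) -> None:
    """Optimistic apply: mutate local state before the server has answered."""
    for op in tx["operations"]:
        if op["command"] == "create":
            records[op["id"]] = {**op["args"], "content": []}
        elif op["command"] == "list_after":
            content = records[op["id"]]["content"]
            content.insert(content.index(op["args"]["after"]) + 1, op["args"]["child"])

records = {"list": {"type": "page", "content": ["todo-1", "todo-2"]}}
tx = press_enter("list", after_id="todo-1")
apply_locally(records, tx)
print(records["list"]["content"])
```

The same transaction object that was applied locally is what would later be sent to the server, so the client and server reason about identical operations.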

3.2 Local persistence: RecordCache + TransactionQueue

On native apps, Notion caches records in RecordCache, an LRU cache backed by SQLite or IndexedDB. Transactions are persisted in TransactionQueue (also backed by IndexedDB or SQLite, depending on the platform) until the server confirms or rejects them.

This is the mechanism behind offline editing: the UI is driven by local state, and the network layer is responsible for eventual consistency. If you lose your connection mid-edit, your changes are queued and sent when connectivity returns.
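A durable queue of this kind can be approximated with Python's built-in sqlite3 module. This is only a sketch under assumed names: the table schema, the flush method, and the use of ConnectionError to signal "offline" are all hypothetical, not Notion's implementation.

```python
import json
import sqlite3

class TransactionQueue:
    """Durable FIFO of pending transactions (hypothetical schema)."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, tx_id TEXT UNIQUE, body TEXT)"
        )

    def enqueue(self, tx: dict) -> None:
        with self.db:  # commits on success, so a file-backed queue survives a restart
            self.db.execute(
                "INSERT INTO pending (tx_id, body) VALUES (?, ?)",
                (tx["id"], json.dumps(tx)),
            )

    def pending(self) -> list:
        rows = self.db.execute("SELECT body FROM pending ORDER BY seq").fetchall()
        return [json.loads(body) for (body,) in rows]

    def settle(self, tx_id: str) -> None:
        """Drop a transaction once the server has confirmed or rejected it."""
        with self.db:
            self.db.execute("DELETE FROM pending WHERE tx_id = ?", (tx_id,))

    def flush(self, send) -> None:
        """Send queued transactions in order; stop at the first network failure."""
        for tx in self.pending():
            try:
                send(tx)
            except ConnectionError:
                return            # still offline: keep this and later items queued
            self.settle(tx["id"])

queue = TransactionQueue()
queue.enqueue({"id": "tx-1", "operations": []})
queue.enqueue({"id": "tx-2", "operations": []})
```

Calling flush with a sender that raises ConnectionError leaves both entries in place; calling it again once the network is back drains them in their original order.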

3.3 Server-side: saveTransactions validation and commit

The client serializes the transaction to JSON and posts it to an internal endpoint Notion calls "/saveTransactions".

The server then:

Loads the blocks and parents involved in the transaction
Applies the operations to produce "before" and "after" state
Validates permissions and data coherency
Commits created and modified records to the source-of-truth databases

The explicit "before/after" validation step is where conflict detection happens. Two clients editing the same block simultaneously will both attempt to commit — the server resolves this by validating against the actual current state, not the client's assumed state.
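The load, apply, validate, commit sequence can be sketched as one function. Everything here is an assumption made for illustration: the in-memory store, the editors field used for the permission check, the two supported commands, and the Rejected exception.

```python
import copy

class Rejected(Exception):
    pass

def save_transaction(store: dict, user: str, tx: dict) -> None:
    """Apply operations to a copy, validate before/after, then commit all or nothing."""
    ids = {op["id"] for op in tx["operations"]}
    before = {i: copy.deepcopy(store.get(i)) for i in ids}   # current server state
    after = copy.deepcopy(before)

    for op in tx["operations"]:
        if op["command"] == "create":
            if before[op["id"]] is not None:
                raise Rejected(f"{op['id']} already exists")
            after[op["id"]] = dict(op["args"])
        elif op["command"] == "set":
            if after[op["id"]] is None:
                raise Rejected(f"{op['id']} does not exist")
            after[op["id"]].update(op["args"])
        else:
            raise Rejected(f"unknown command {op['command']}")

    # Permission check against the "before" picture (hypothetical `editors` field).
    for i, record in before.items():
        if record is not None and user not in record.get("editors", []):
            raise Rejected(f"{user} cannot edit {i}")

    # Coherency check on the "after" picture: every child ID must resolve.
    for record in after.values():
        for child in record.get("content", []):
            if child not in after and child not in store:
                raise Rejected(f"dangling child {child}")

    store.update(after)   # commit only after every check has passed

store = {"page": {"type": "page", "content": [], "editors": ["ana"]}}
tx = {"id": "tx-1", "operations": [
    {"command": "create", "id": "b1", "args": {"type": "to_do", "content": []}},
    {"command": "set", "id": "page", "args": {"content": ["b1"]}},
]}
save_transaction(store, "ana", tx)
```

Because all checks run against copies, a rejected transaction leaves the store exactly as it was, which is what "committed or rejected as a group" requires.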

3.4 Real-time updates: MessageStore + WebSocket subscriptions

Clients maintain a persistent WebSocket connection to a real-time updates service called MessageStore.

When a client renders a record, it subscribes to that record's updates. When the server commits a change, MessageStore notifies all subscribed clients. Those clients call an API Notion calls "syncRecordValues", update their local RecordCache, and re-render.

This publish-subscribe model means updates propagate to all connected users without polling — and because each client caches records locally, re-rendering is fast.
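The subscribe, notify, sync loop can be modeled in-process. This is a sketch only: the real service runs over WebSockets, and the version field, the class layout, and the method names other than syncRecordValues (written here as sync_record_values) are assumptions.

```python
from collections import defaultdict

class MessageStore:
    """In-process stand-in for the real-time service: maps record IDs to subscribers."""

    def __init__(self):
        self.subscribers = defaultdict(set)

    def subscribe(self, record_id: str, client: "Client") -> None:
        self.subscribers[record_id].add(client)

    def publish(self, record_id: str, version: int) -> None:
        for client in list(self.subscribers[record_id]):
            client.on_update(record_id, version)

class Client:
    def __init__(self, server_records: dict, bus: MessageStore):
        self.server_records = server_records   # stands in for the server API
        self.bus = bus
        self.record_cache = {}                 # local copy used for rendering

    def render(self, record_id: str) -> dict:
        self.sync_record_values([record_id])
        self.bus.subscribe(record_id, self)    # rendering a record subscribes to it
        return self.record_cache[record_id]

    def on_update(self, record_id: str, version: int) -> None:
        cached = self.record_cache.get(record_id)
        if cached is None or cached["version"] < version:   # skip stale notifications
            self.sync_record_values([record_id])

    def sync_record_values(self, record_ids: list) -> None:
        for rid in record_ids:
            self.record_cache[rid] = dict(self.server_records[rid])

server = {"b1": {"version": 1, "title": "Draft"}}
bus = MessageStore()
ana, bob = Client(server, bus), Client(server, bus)
ana.render("b1")
bob.render("b1")

server["b1"] = {"version": 2, "title": "Final"}   # a commit lands on the server
bus.publish("b1", version=2)
print(ana.record_cache["b1"]["title"], bob.record_cache["b1"]["title"])
```

Note that the notification carries only an identifier and a version; the client pulls the actual record afterwards, which keeps the push channel small.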

3.5 Loading a page: loadPageChunk

When you open a page, the client first tries local data. If data is missing or stale, it calls an internal method Notion names "loadPageChunk", which descends from a starting block down the content tree and returns the blocks needed to render, plus their dependent records.

This explains why very large Notion pages can be slower to load: the content tree can be arbitrarily deep, and loadPageChunk has to chase all those dependencies before the page can render.
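A rough sketch of that descent, assuming a depth-first walk with a block limit. The source only says the method descends the content tree, so the traversal order, the limit parameter, and the function signature are all guesses made for illustration.

```python
def load_page_chunk(store: dict, start_id: str, limit: int = 50) -> dict:
    """Descend the content tree from `start_id`, collecting up to `limit` blocks."""
    chunk = {}
    stack = [start_id]
    while stack and len(chunk) < limit:
        block_id = stack.pop()
        block = store.get(block_id)
        if block is None or block_id in chunk:
            continue
        chunk[block_id] = block
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(block.get("content", [])))
    return chunk

store = {
    "page": {"content": ["h1", "toggle"]},
    "h1": {"content": []},
    "toggle": {"content": ["p1"]},
    "p1": {"content": []},
}
print(list(load_page_chunk(store, "page")))
print(list(load_page_chunk(store, "page", limit=2)))
```

With a limit in place, a deep page needs several chunks before everything is available, which is one way to picture the load-time cost described above.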

4) What the public Notion API reveals (and why it matters)

Notion launched its public API in public beta on May 13, 2021, and reached general availability on March 2, 2022.

The engineering post "Creating the Notion API" is useful precisely because it shows the constraints imposed by the block model on API design — constraints that are not obvious until you try to build the API yourself.

Why Notion's API uses custom JSON instead of Markdown

Notion considered Markdown for its portability but found it could not represent Notion's rich content: colored text, equations, callouts, toggle blocks, inline mentions, and more. They chose a custom JSON representation. The result is an API that is more expressive but more verbose than you might expect.
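The gap is easy to see by converting rich text spans to Markdown and recording what gets dropped. The annotation names (bold, italic, code, color) follow the public API's rich text objects, while the to_markdown helper and the sample spans are hypothetical.

```python
def to_markdown(rich_text: list) -> tuple:
    """Render rich text spans to Markdown; return (markdown, list of dropped features)."""
    out, lost = [], []
    for span in rich_text:
        text = span["plain_text"]
        ann = span.get("annotations", {})
        if ann.get("code"):
            text = f"`{text}`"
        if ann.get("bold"):
            text = f"**{text}**"
        if ann.get("italic"):
            text = f"*{text}*"
        color = ann.get("color", "default")
        if color != "default":
            # Standard Markdown has no syntax for text color, so this is lost.
            lost.append(f"color={color} on {span['plain_text']!r}")
        out.append(text)
    return "".join(out), lost

rich_text = [
    {"plain_text": "Ship", "annotations": {"bold": True, "color": "default"}},
    {"plain_text": " today", "annotations": {"bold": False, "color": "red"}},
]
markdown, lost = to_markdown(rich_text)
print(markdown)
print(lost)
```

Bold survives the conversion, but the red span comes back as plain text, so a Markdown round trip would silently change the document.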

Why block hierarchies are paginated breadth-first

A page is an arbitrarily deep tree of blocks. Notion chose breadth-first pagination: return the top-level blocks first, require additional requests to fetch children. This was a performance choice — returning the whole tree in one request for a large page would be prohibitively slow.

Once you understand this, Notion integrations make sense: you cannot assume "one request returns the whole page" unless the page is small.
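In practice an integration walks the tree level by level and page by page. The sketch below uses the response fields the public API returns for block children (results, has_more, next_cursor) and the has_children flag on each block, but runs against a hypothetical in-memory fake instead of the real endpoint.

```python
from collections import deque

def fetch_all_blocks(list_children, root_id: str) -> list:
    """Collect every block under `root_id`, one level and one page at a time."""
    blocks, queue = [], deque([root_id])
    while queue:
        parent_id = queue.popleft()
        cursor = None
        while True:
            page = list_children(parent_id, start_cursor=cursor)
            for block in page["results"]:
                blocks.append(block)
                if block["has_children"]:
                    queue.append(block["id"])   # children need their own request
            if not page["has_more"]:
                break
            cursor = page["next_cursor"]
    return blocks

# Hypothetical fake of the children endpoint: parent ID -> ordered child IDs.
TREE = {"page": ["a", "b", "c"], "b": ["b1"]}

def fake_list_children(block_id, start_cursor=None, page_size=2):
    ids = TREE.get(block_id, [])
    start = int(start_cursor or 0)
    chunk = ids[start:start + page_size]
    end = start + len(chunk)
    return {
        "results": [{"id": i, "has_children": i in TREE} for i in chunk],
        "has_more": end < len(ids),
        "next_cursor": str(end) if end < len(ids) else None,
    }

all_blocks = fetch_all_blocks(fake_list_children, "page")
print([b["id"] for b in all_blocks])
```

Even this tiny tree takes three requests (two pages for the top level, one for the nested block), which shows why request counts grow with both page length and nesting depth.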

Why Notion uses global versioning by date

Rather than per-resource version numbers, Notion uses global versions tagged by date (similar to Stripe and AWS). This signals that block types and property formats are expected to evolve over time — and that the API team prefers a single coordinated versioning surface over managing many concurrent resource versions.
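For a client, date-based versioning shows up as a single header sent with every request. The sketch below builds a request without sending it. The Notion-Version header and the 2022-06-28 value come from the public API docs at the time of writing (check the current docs before pinning a version), and the block ID and token are placeholders.

```python
from urllib.request import Request

NOTION_VERSION = "2022-06-28"   # one dated version pinned for every endpoint

def children_request(block_id: str, token: str) -> Request:
    """Build (but do not send) a request for a block's children."""
    return Request(
        f"https://api.notion.com/v1/blocks/{block_id}/children",
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
        },
    )

req = children_request("BLOCK_ID", "SECRET_TOKEN")   # placeholder values
print(req.get_method(), req.full_url)
```

Upgrading then means changing one constant and retesting, instead of tracking a separate version number for each resource type.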

5) What to actually learn from Notion (if you are building something similar)

The common mistake in Notion-inspired projects is to start with the UI: the drag-and-drop, the slash commands, the block type picker. Those parts are relatively straightforward to build. What kills these projects is the work underneath that they underestimate.

The hard parts of a Notion-like product are:

The transaction model: once you have multiple users, you need a clear contract for what an "operation" is, how transactions are validated, and how conflicts are resolved.
The sync pipeline: how does the UI stay responsive while syncing is in progress? Where does local state live? What happens to in-flight transactions on reconnect?
The data hierarchy: when content and permissions need to be decoupled (and they will), you need to have designed that separation from the start.

If you are building a Notion-like product, the single most useful design exercise is not wireframing the UI — it is writing down your data model and your transaction contract before you write the first line of editor code.

6) If you want to research Notion-like products on GitHub

You do not need Notion's private codebase to learn from its architecture. Analyze open-source block editors and collaborative editors instead.

When evaluating them, look specifically for:

The block schema: how is a block defined? What fields does it have? What determines its type?
The operation model: how are user actions expressed? Is there an op log, a transaction queue, a CRDT?
The nesting model: how are children represented? Content list, parent pointer, or both?
The sync mechanism: WebSocket, polling, checkpoints? What happens on reconnect?
The hard problems in the issue tracker: the GitHub issues for any Notion-style editor will tell you exactly where the difficult edge cases live.

If you want to start a research topic on Notion-like architecture, you can use HowWorks.

Related Reading on HowWorks

How Top Tech Products Are Built: A Guide for Non-Developers — Research methodology for studying any product's architecture using primary sources
How to Build an App Like Linear: Scope, Stack, and Tradeoffs — How to apply architecture research to real build decisions
The AI Tech Stack Explained for Non-Technical Founders — The five-layer framework for understanding any AI product's infrastructure
Before You Vibe Code: Why Research Changes Everything — How to use architecture research before your first AI coding prompt

Sources

Notion engineering: The data model behind Notion's flexibility (blocks, render tree, RecordCache, TransactionQueue, /saveTransactions, MessageStore, loadPageChunk): Notion blog
Notion engineering: Creating the Notion API (custom JSON rich text, breadth-first pagination, global versioning): Notion blog
Notion API reference: Block object: Notion developers
Notion release: May 13, 2021 API public beta: Notion releases
Notion blog: March 2, 2022 API GA: Notion blog
