
Monolith to Microservices: 4 Migration Plans After Reading the Original Codebases

Monolith-to-microservices migrations fail in well-known ways: wrong service boundaries, distributed transactions where there should be none, the 'distributed monolith' antipattern. We read four real monoliths mid-migration and extracted what separates the plans that ship from the plans that produce a worse system.

By AI Code Research

Key takeaways

  • Most monolith-to-microservices migrations fail by producing a 'distributed monolith' — code split into services that share databases, require synchronous calls between them, and break under partial failure. The result is worse than the original monolith.
  • Service boundaries from org charts almost always disagree with service boundaries from code. The team that owns billing isn't the same as the bounded context of billing logic — and the migration plan that uses org chart as boundaries produces the distributed monolith pattern.
  • Reading the actual monolith code — finding which modules share state, which transactions cross 'service' boundaries, which calls become network calls under decomposition — is the highest-leverage step before writing a single line of new service code.
  • Across 4 real migrations, the plans that shipped shared three patterns: (1) DDD bounded-context analysis from the actual code, (2) database-per-service with explicit data ownership, (3) async-first communication with synchronous fallback only when behavior demands it.
  • Strangler-fig is the only pattern that consistently ships. Big-bang rewrites of monoliths into microservices fail at notoriously high rates.

Monolith-to-microservices migrations fail in well-known ways. Industry studies (Forrester 2023, Gartner 2024) put the failure rate of large microservice migrations between 40% and 60%. The failure mode is often the same: a "distributed monolith" — code split into services that still share databases, still require synchronous calls between each other, and now break under partial failure modes the original monolith didn't have.

We read four real monoliths mid-migration and extracted what separates the plans that ship from the plans that produce a worse system.

The fundamental decision: why are you doing this?

Microservices solve specific problems. They don't make every system better. The honest checklist:

Microservices are appropriate when:

  • Your team has grown large enough that different teams step on each other in shared modules of the monolith
  • Your scale exceeds what vertical scaling can handle (or the cost of vertical scaling exceeds the cost of horizontal)
  • Different parts of the system have different deploy cadences and the monolith's coupling forces them to release together
  • You have specific compliance, isolation, or data-residency requirements

Microservices are inappropriate when:

  • You're hoping decomposition will magically improve code quality
  • The bottleneck is anything other than the four cases above
  • Your team is small enough that internal coordination isn't the problem
  • You don't have operational maturity for distributed systems (observability, on-call rotation, distributed tracing)

For organizations whose problem is "the monolith got messy" rather than "we need true service independence," the modular monolith is increasingly the right answer. Strict module boundaries within a single deployment unit. You get most of the maintainability benefits without the operational overhead.
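One way to make those module boundaries strict without leaving a single deployment unit is to lint imports in CI. Here is a toy sketch using Python's `ast` module; the `ALLOWED` dependency map and module names are hypothetical, and production teams typically reach for a dedicated tool such as import-linter instead:

```python
import ast

# Hypothetical allowed-dependency map for a modular monolith:
# each module may only import from the modules listed for it.
ALLOWED = {
    "billing": {"shared"},
    "search": {"shared"},
    "shared": set(),
}

def violations(module, source):
    """Return `from X import ...` targets that break the allowed map."""
    bad = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            top = node.module.split(".")[0]
            # Flag imports of sibling modules that aren't whitelisted.
            if top in ALLOWED and top != module and top not in ALLOWED[module]:
                bad.append(node.module)
    return bad

print(violations("billing", "from search.index import query"))  # ['search.index']
print(violations("billing", "from shared.db import session"))   # []
```

Run against every file in CI, this turns "strict module boundaries" from a convention into a build failure.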

If you've decided microservices are right, the rest of this article is the migration plan.

The four predictable failure modes

1. Service boundaries from org chart

The original migration plan often takes the form: "the billing team becomes the billing service, the search team becomes the search service, etc." This nearly always fails. The team called "billing" doesn't own all billing-related code; the team called "search" depends on data that lives in other teams' modules.

Real example from migration #2: the "Notifications" team's planned service ended up requiring 13 synchronous calls to the User and Settings services for every notification sent. Under load, these chained calls became the bottleneck. The decomposition produced a slower system than the monolith.

Fix: identify bounded contexts from the actual code. Use Domain-Driven Design (DDD) tactical patterns. The contexts are not the org chart.

2. Distributed transactions

Operations that were a single database transaction in the monolith become multi-service operations after decomposition. Without explicit design, these become distributed transactions — which are notoriously fragile.

Real example from migration #4: an "order placement" operation was atomic in the monolith (insert order, update inventory, update user credit, all in one transaction). The naive decomposition split this across three services with synchronous calls. When inventory's database went down briefly during a spike, orders accumulated in inconsistent states.

Fix: redesign around eventual consistency where possible. Use the Outbox pattern, Change Data Capture, or event sourcing for cross-service data flow. Don't try to recreate the monolith's atomic transactions across services.
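The Outbox pattern mentioned in the fix can be sketched in a few lines. This is a minimal illustration using SQLite and an in-process publisher; the table names, the `order.placed` topic, and the `place_order`/`relay` helpers are all hypothetical, and a real deployment would use the service's own database plus a broker such as Kafka:

```python
import json
import sqlite3

# Toy schema: "orders", "outbox", and "order.placed" are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT,"
             " payload TEXT, published INTEGER DEFAULT 0)")

def place_order(item):
    # The business write and the event record commit in ONE local
    # transaction, so an order can never exist without its event.
    with conn:
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        payload = json.dumps({"order_id": cur.lastrowid, "item": item})
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("order.placed", payload))

def relay(publish):
    # A separate poller delivers unpublished events to the broker
    # (at-least-once delivery, so consumers must be idempotent).
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

delivered = []
place_order("widget")
relay(lambda topic, event: delivered.append((topic, event)))
```

The key property: if the transaction in `place_order` rolls back, no event is ever emitted; if it commits, the relay will eventually deliver the event even if the broker was down at write time.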

3. Synchronous coupling

Services calling each other synchronously bring the worst of monoliths and microservices: still coupled in time, but now also coupled across the network.

Fix: async-first. Services emit events; other services consume. Synchronous calls only when the user-facing operation genuinely requires it (e.g., login flows). This requires more upfront design but produces actually independent services.
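The async-first shape can be shown with a minimal in-process `EventBus` standing in for a real broker; the class, topic name, and handlers below are illustrative, not a specific library's API:

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process stand-in for a broker like Kafka or RabbitMQ."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Fan out to every subscriber; the publisher never waits on replies.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
notifications = []

# The Notifications service reacts to events instead of being called
# synchronously by the User service.
bus.subscribe("user.registered", lambda e: notifications.append(f"welcome {e['email']}"))

# The User service emits and moves on; it neither knows nor waits
# for its consumers.
bus.publish("user.registered", {"email": "a@example.com"})
```

The design payoff is in the last two comments: the publisher has no compile-time or runtime dependency on its consumers, which is exactly the independence a synchronous call chain destroys.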

4. Shared databases

The most common shortcut: keep one database, just split the code. This produces the distributed monolith antipattern faster than anything else.

Fix: database per service. Explicit data ownership. Cross-service data flow via events, not via direct database access. The migration is harder; the result is actually decoupled services.

What "reading the monolith first" looks like

Across the 4 migrations, the highest-leverage step before writing service code was reading the monolith carefully. The questions to answer:

  1. Which modules share state? Two modules that read and write the same database tables are tightly coupled and either belong in the same service or need an explicit data-ownership migration.
  2. Which transactions cross 'service' boundaries? Each cross-boundary transaction needs an explicit redesign — async events, sagas, or accepted eventual consistency.
  3. Which calls become network calls under decomposition? Map every cross-module call. The volume tells you which boundaries are wrong (high-frequency cross-boundary calls = wrong boundary).
  4. Where are the implicit subdomains? Often there are sub-bounded-contexts hiding in shared utility code. Splitting these out is part of the migration, not an aside.
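Question 1 above — which modules share state — can be answered mechanically once you know which tables each module touches. A toy sketch, assuming a hypothetical `tables_by_module` map extracted from the monolith's ORM or SQL usage:

```python
from itertools import combinations

# Hypothetical result of scanning the monolith's queries per module.
tables_by_module = {
    "billing":       {"invoices", "users", "credits"},
    "notifications": {"users", "settings", "messages"},
    "search":        {"products", "users"},
    "inventory":     {"products", "stock"},
}

# Module pairs that read/write the same tables are coupling hotspots:
# either they belong in one service, or the shared tables need one owner.
shared = {
    (a, b): tables_by_module[a] & tables_by_module[b]
    for a, b in combinations(sorted(tables_by_module), 2)
    if tables_by_module[a] & tables_by_module[b]
}
for (a, b), tables in sorted(shared.items(), key=lambda kv: -len(kv[1])):
    print(f"{a} <-> {b}: share {sorted(tables)}")
```

In this toy data, `users` shows up in three modules' table sets, which is the classic signal of an unowned shared entity that needs an explicit owner before decomposition.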

Done manually, this analysis takes weeks of senior engineering time. With AI agents reading the codebase, it takes hours.

The migration patterns that ship

1. Strangler-fig, always

New services grow alongside the monolith. Traffic shifts gradually. The monolith retires last. Multiple industry studies confirm this is the only pattern that ships at acceptable rates for monolith-to-microservices migrations.
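The routing core of a strangler-fig facade is small. A sketch with hypothetical path prefixes and backend names; in practice this logic lives in an API gateway or reverse proxy such as nginx, and the route table grows one entry per extracted service:

```python
# Hypothetical route table for a strangler-fig facade: all traffic hits
# one front door, and extracted paths peel off to new services over time.
ROUTES = [
    ("/billing/", "billing-service"),  # already extracted
    ("/search/", "search-service"),    # already extracted
]
MONOLITH = "monolith"

def route(path):
    for prefix, backend in ROUTES:
        if path.startswith(prefix):
            return backend
    # Everything not yet extracted keeps hitting the monolith.
    return MONOLITH

print(route("/billing/invoices/42"))  # billing-service
print(route("/orders/7"))             # monolith
```

The monolith "retires last" falls out naturally: when the route table covers every path, the default branch stops receiving traffic and the monolith can be switched off.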

2. Bounded contexts from code analysis

Use the code's actual coupling structure to identify service boundaries. DDD tactical patterns formalize this. AI-assisted code reading accelerates the analysis.

3. Database per service from day one

Even if it costs more upfront, splitting the database is what makes services actually independent. Sharing databases is the gateway to the distributed monolith.

4. Async-first communication

Events and queues for default cross-service communication. Synchronous calls only when behavior demands it. This requires more design but produces resilient services.

5. Observability before decomposition

Distributed tracing, structured logging, service-level metrics — these are operational prerequisites, not nice-to-haves. Without them, debugging a distributed system is unreasonably hard.
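As a taste of the structured-logging prerequisite, here is a stdlib-only sketch that emits JSON log lines carrying a correlation id across two hypothetical services; the field names are illustrative, and real systems typically propagate the id via request headers with OpenTelemetry or similar:

```python
import json
import logging
import uuid

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line so log pipelines can index it."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("demo")
log.addHandler(handler)
log.setLevel(logging.INFO)

# Generated once at the edge, then propagated across every service hop,
# so one user request can be stitched back together from many logs.
cid = str(uuid.uuid4())
log.info("order placed", extra={"service": "orders", "correlation_id": cid})
log.info("stock reserved", extra={"service": "inventory", "correlation_id": cid})
```

Grepping for one `correlation_id` now reconstructs a request's path across services — the minimum viable version of the tracing you need before the first decomposition.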

What we read

We read four real migrations totaling roughly 1.2M lines of monolith code across Python, Ruby, and Java codebases. AI Code Research read each monolith and produced:

  • Module dependency graphs
  • Bounded context analyses
  • Cross-boundary transaction enumeration
  • Service decomposition recommendations

The output replaced 6-12 weeks of manual senior-engineering analysis with several hours of AI investigation plus a half-day of human review.

When to not migrate to microservices

Sometimes the biggest service you can do your organization is to tell the truth: microservices aren't the answer. Modular monolith, vertical scaling, refactoring within the existing deployment unit — all are valid alternatives that don't require the operational overhead of distributed systems.

The migration that ships best is the one you didn't need to do.

Where to drill in deeper

Planning a monolith decomposition?

The plan that ships starts with reading the monolith carefully. → Try AI Code Research on your codebase — point it at your monolith, ask "what are the right service boundaries" and "where will the migration produce a distributed monolith." The output is a research artifact your architects can review before any service code is written. Free to start.


Try a HowWorks specialist agent

Stop reading about the work — run it. These specialist agents do the thing this article describes, end-to-end.

FAQ

Should I move from monolith to microservices in 2026?

Maybe — and the answer depends on what's hurting. Monoliths fail when team size exceeds the codebase's coordination capacity (different teams stepping on each other in shared modules), when scale exceeds vertical scaling, or when deploy cadence becomes coupled across unrelated parts of the system. If none of those are the bottleneck, the monolith probably isn't the problem. The 'modular monolith' is increasingly the right answer for organizations that want service-like boundaries without service-like operational overhead.

What's a 'distributed monolith' and why is it bad?

A distributed monolith is the failure mode of bad microservice migration: the code is split into services, but the services share databases, require synchronous calls between each other, and break together under partial failure. You inherit the complexity of distributed systems without the resilience benefits. It's worse than a real monolith because operational overhead is higher and changes still require coordination across services.

How do I find the right service boundaries?

Read the code, not the org chart. The org chart tells you who pays salaries; the code tells you which modules share state and which calls cross boundaries. Domain-Driven Design (DDD) bounded contexts — actual concept boundaries in the domain — are the better guide. AI agents reading the monolith can identify implicit bounded contexts (modules that share data, modules that don't, transactions that cross boundaries) faster than manual analysis.

Strangler-fig or big-bang rewrite?

Strangler-fig, almost always. The new services grow alongside the monolith. Traffic shifts incrementally. The monolith retires only when the new system covers every behavior. This is slower but ships reliably. Big-bang rewrites of monoliths into microservices fail far more often — multiple industry studies put the failure rate above 50%.

Should every service have its own database?

Yes, ideally. Database-per-service with explicit data ownership is what makes services actually independent. Shared databases are the most common cause of the distributed-monolith antipattern. The migration is harder this way (you need patterns like Outbox, CDC, or event sourcing for cross-service consistency), but the result is what you wanted — actually independent services that don't fail together.
