Monolith-to-microservices migrations fail in well-known ways. Industry studies (Forrester 2023, Gartner 2024) put the failure rate of large microservice migrations between 40% and 60%. The failure mode is often the same: a "distributed monolith" — code split into services that still share databases, still require synchronous calls between each other, and now break under partial failure modes the original monolith didn't have.
We read four real monoliths mid-migration and extracted what separates the plans that ship from the plans that produce a worse system.
The fundamental decision: why are you doing this?
Microservices solve specific problems. They don't make every system better. The honest checklist:
Microservices are appropriate when:
- Your team has grown large enough that different teams step on each other in shared modules of the monolith
- Your scale exceeds what vertical scaling can handle (or the cost of vertical scaling exceeds the cost of horizontal)
- Different parts of the system have different deploy cadences and the monolith's coupling forces them to release together
- You have specific compliance, isolation, or data-residency requirements
Microservices are inappropriate when:
- You're hoping decomposition will magically improve code quality
- Your actual bottleneck is something other than the four cases above
- Your team is small enough that internal coordination isn't the problem
- You don't have operational maturity for distributed systems (observability, on-call rotation, distributed tracing)
For organizations whose problem is "the monolith got messy" rather than "we need true service independence," the modular monolith is increasingly the right answer. Strict module boundaries within a single deployment unit. You get most of the maintainability benefits without the operational overhead.
If you've decided microservices are right, the rest of this article is the migration plan.
The four predictable failure modes
1. Service boundaries from org chart
The original migration plan often takes the form: "the billing team becomes the billing service, the search team becomes the search service, etc." This nearly always fails. The team called "billing" doesn't own all billing-related code; the team called "search" depends on data that lives in other teams' modules.
Real example from migration #2: the "Notifications" team's planned service ended up requiring 13 synchronous calls to the User and Settings services for every notification sent. Under load, these chained calls became the bottleneck. The decomposition produced a slower system than the monolith.
Fix: identify bounded contexts from the actual code, using Domain-Driven Design (DDD) strategic patterns such as context mapping. The contexts are not the org chart.
2. Distributed transactions
Operations that were a single database transaction in the monolith become multi-service operations after decomposition. Without explicit design, these become distributed transactions — which are notoriously fragile.
Real example from migration #4: an "order placement" operation was atomic in the monolith (insert order, update inventory, update user credit, all in one transaction). The naive decomposition split this across three services with synchronous calls. When inventory's database went down briefly during a spike, orders accumulated in inconsistent states.
Fix: redesign around eventual consistency where possible. Use the Outbox pattern, Change Data Capture, or event sourcing for cross-service data flow. Don't try to recreate the monolith's atomic transactions across services.
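A minimal sketch of the Outbox pattern, using SQLite as a stand-in for the service's database; the table names and the `publish` callback are illustrative. The point is that the order row and its event commit in one local transaction, and a separate relay publishes the event afterward:

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, sku TEXT, qty INTEGER);
CREATE TABLE IF NOT EXISTS outbox (aggregate_id TEXT, event_type TEXT,
    payload TEXT, sent INTEGER DEFAULT 0);
"""

def place_order(conn: sqlite3.Connection, order_id: str, sku: str, qty: int) -> None:
    """Write the order AND its outbox event in one local transaction.

    The order and the event can never disagree, because they commit
    atomically in the service's own database."""
    with conn:  # single local transaction
        conn.execute(
            "INSERT INTO orders (id, sku, qty) VALUES (?, ?, ?)",
            (order_id, sku, qty),
        )
        conn.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
            (order_id, "OrderPlaced", json.dumps({"sku": sku, "qty": qty})),
        )

def relay_pending(conn: sqlite3.Connection, publish) -> int:
    """Relay loop body: publish unsent events, then mark them sent.

    Publishing before marking gives at-least-once delivery, so
    consumers must be idempotent."""
    rows = conn.execute(
        "SELECT rowid, event_type, payload FROM outbox WHERE sent = 0"
    ).fetchall()
    for rowid, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE rowid = ?", (rowid,))
    return len(rows)
```

Downstream services (inventory, credit) consume the `OrderPlaced` event on their own schedule; a brief inventory outage delays consumption rather than corrupting order state.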
3. Synchronous coupling
Services calling each other synchronously bring the worst of monoliths and microservices: still coupled in time, but now also coupled across the network.
Fix: async-first. Services emit events; other services consume them. Make synchronous calls only when the user-facing operation genuinely requires them (e.g., login flows). This takes more upfront design but produces genuinely independent services.
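The shape of async-first communication, sketched with an in-process event bus standing in for a real broker (Kafka, RabbitMQ, SNS/SQS); the event and service names are illustrative:

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """In-process stand-in for a real message broker.

    Services publish facts about what happened; they never call each
    other directly, so a slow or down consumer can't stall the producer."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._subscribers[event_type]:
            handler(payload)

# Producer side: the order service emits an event and moves on.
def place_order(bus: EventBus, order_id: str, sku: str) -> None:
    bus.publish("OrderPlaced", {"order_id": order_id, "sku": sku})

# Consumer side: inventory reacts on its own schedule, with no
# knowledge of who produced the event.
reserved: list[str] = []
def reserve_stock(event: dict) -> None:
    reserved.append(event["sku"])
```

The producer's code contains no reference to the consumer; adding a second consumer (say, analytics) is a subscription, not a change to the order service.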
4. Shared databases
The most common shortcut: keep one database, just split the code. This produces the distributed monolith antipattern faster than anything else.
Fix: database per service. Explicit data ownership. Cross-service data flow via events, not via direct database access. The migration is harder; the result is actually decoupled services.
What "reading the monolith first" looks like
Across the four migrations, the highest-leverage step before writing service code was reading the monolith carefully. The questions to answer:
- Which modules share state? Two modules that read and write the same database tables are tightly coupled and either belong in the same service or need an explicit data-ownership migration.
- Which transactions cross 'service' boundaries? Each cross-boundary transaction needs an explicit redesign — async events, sagas, or accepted eventual consistency.
- Which calls become network calls under decomposition? Map every cross-module call. The volume tells you which boundaries are wrong (high-frequency cross-boundary calls = wrong boundary).
- Where are the implicit subdomains? Often there are sub-bounded-contexts hiding in shared utility code. Splitting these out is part of the migration, not an aside.
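The first question (which modules share state) can be approximated mechanically. A rough sketch, assuming raw SQL strings in module source; a real analysis would also walk ORM models and migrations, and the regex here is deliberately crude:

```python
import re
from collections import defaultdict

# Crude heuristic: table names after FROM / INTO / UPDATE / JOIN in
# raw SQL strings embedded in module source.
TABLE_RE = re.compile(r"\b(?:FROM|INTO|UPDATE|JOIN)\s+([a-z_][a-z0-9_]*)",
                      re.IGNORECASE)

def tables_used(source: str) -> set[str]:
    return {m.group(1).lower() for m in TABLE_RE.finditer(source)}

def shared_tables(module_sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each table to the modules that touch it, keeping only tables
    touched by two or more modules: the coupling hot spots that need an
    explicit data-ownership decision before decomposition."""
    owners: dict[str, set[str]] = defaultdict(set)
    for module, source in module_sources.items():
        for table in tables_used(source):
            owners[table].add(module)
    return {t: mods for t, mods in owners.items() if len(mods) > 1}
```

Every table this flags is either evidence that two modules belong in the same service, or a data-ownership migration that must appear in the plan.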
Done manually, this analysis takes weeks of senior engineering time. With AI agents reading the codebase, it takes hours.
The migration patterns that ship
1. Strangler-fig, always
New services grow alongside the monolith. Traffic shifts gradually. The monolith retires last. Multiple industry studies confirm this is the only pattern that ships at acceptable rates for monolith-to-microservices migrations.
2. Bounded contexts from code analysis
Use the code's actual coupling structure to identify service boundaries. DDD strategic patterns (bounded contexts, context maps) formalize this. AI-assisted code reading accelerates the analysis.
3. Database per service from day one
Even if it costs more upfront, splitting the database is what makes services actually independent. Sharing databases is the gateway to the distributed monolith.
4. Async-first communication
Events and queues for default cross-service communication. Synchronous calls only when behavior demands it. This requires more design but produces resilient services.
5. Observability before decomposition
Distributed tracing, structured logging, service-level metrics — these are operational prerequisites, not nice-to-haves. Without them, debugging a distributed system is unreasonably hard.
What we read
We analyzed four real migrations totaling roughly 1.2M lines of code across Python, Ruby, and Java monoliths. AI Code Research read each monolith and produced:
- Module dependency graphs
- Bounded context analyses
- Cross-boundary transaction enumeration
- Service decomposition recommendations
The output replaced 6-12 weeks of manual senior-engineering analysis with several hours of AI investigation plus a half-day of human review.
When not to migrate to microservices
Sometimes the biggest service you can do your organization is to say plainly that microservices aren't the answer. A modular monolith, vertical scaling, or refactoring within the existing deployment unit are all valid alternatives that don't carry the operational overhead of distributed systems.
The migration that ships best is the one you didn't need to do.
Where to drill in deeper
- Legacy Code Modernization — broader modernization patterns
- We Read 5 JS-to-TS Migrations — smaller-scale migration patterns
- What Is AI Code Research? — the agent that read these monoliths
Planning a monolith decomposition?
The plan that ships starts with reading the monolith carefully. → Try AI Code Research on your codebase — point it at your monolith, ask "what are the right service boundaries" and "where will the migration produce a distributed monolith." The output is a research artifact your architects can review before any service code is written. Free to start.