What Is an AI Tech Stack?
An AI tech stack is the layered combination of tools and services that power an AI product. Think of it like a building: the foundation model is the structure, the retrieval system is the plumbing, the orchestration layer is the electrical, the application interface is the interior, and observability is the maintenance system.
You don't need to build any of these layers yourself. But you need to understand them well enough to make informed decisions about which to buy, which to build, and which tradeoffs matter for your specific product.
The AI tech stack market is overwhelming in 2026 — over 200 providers and tools exist across these layers. The teams shipping successfully aren't evaluating all of them. They're standardizing on a small, interoperable set and moving.
The Five Layers of Every AI Product
Layer 1: Foundation Model (The Reasoning Engine)
The foundation model is the LLM at the center of your product — the component that understands language, generates text, and does the reasoning. This is what people mean when they say "the AI" in most conversations.
The main options in 2026:
| Model | Best For | Context Window | Relative Cost |
|---|---|---|---|
| GPT-4o (OpenAI) | General purpose, tool use, vision | 128K tokens | Medium |
| Claude 3.5 Sonnet (Anthropic) | Long documents, nuanced writing, instruction-following | 200K tokens | Medium |
| Gemini 1.5 Pro (Google) | Multimodal, very long context, Google integrations | 1M tokens | Medium |
| Llama 3 (Meta, open-source) | Privacy-sensitive data, on-premise deployment, cost optimization at scale | Varies | Low (self-hosted) |
The non-technical founder's decision framework:
For most first AI products, start with Claude 3.5 Sonnet or GPT-4o. Both have reliable APIs, straightforward pay-as-you-go pricing, and sufficient capability for 95% of use cases. The decision between them is less important than the decisions in the layers below.
When to reconsider your model choice:
- You need to process very long documents — roughly 90,000+ words, the point where a 128K-token window runs out → Claude's 200K context window becomes relevant
- You're handling sensitive user data that can't leave your infrastructure → Llama 3 self-hosted becomes necessary
- You're building at scale (millions of daily active users) → cost optimization through model selection and caching becomes critical
Layer 2: Retrieval (The Memory System)
The foundation model only knows what it was trained on. Retrieval is how you give it access to specific, current, or private information.
This is the architectural decision with the most downstream implications. It affects your data requirements, your infrastructure costs, your response latency, and what features are technically possible.
The core pattern: Retrieval-Augmented Generation (RAG)
User asks a question → Search database for relevant documents → Pass relevant documents to LLM as context → LLM generates a grounded answer
The "database" in RAG is a vector database — a storage system that converts documents into numerical representations of meaning (embeddings), enabling semantic search. You ask "What's our refund policy for enterprise customers?", and the vector database finds the relevant policy document based on meaning, not just keywords.
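The mechanics behind semantic search fit in a few lines. This is a toy sketch, not a production index: the embedding vectors below are made up for illustration (a real system calls an embedding model and gets ~1,500 dimensions), and real vector databases use approximate nearest-neighbor indexes instead of a linear scan:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two embedding vectors (1.0 = pointing the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical pre-computed embeddings, keyed by document text.
documents = {
    "Enterprise refund policy: refunds within 60 days.": [0.9, 0.1, 0.2],
    "Office dog policy: dogs welcome on Fridays.":       [0.1, 0.8, 0.3],
}

def vector_search(query_embedding: list[float], top_k: int = 1) -> list[str]:
    """Linear scan over stored embeddings; vector DBs do this with ANN indexes."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(documents[doc], query_embedding),
        reverse=True,
    )
    return ranked[:top_k]

# A query about refunds embeds close to the refund document,
# even though it shares no exact keywords with the query text.
print(vector_search([0.85, 0.15, 0.25]))
```

The key property: similarity is computed on meaning vectors, so "Can I get my money back?" would land near the refund document even with zero keyword overlap.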
The main vector database options:
| Service | Best For | Free Tier | Hosted |
|---|---|---|---|
| Supabase pgvector | Teams already using Supabase, simple use cases | Yes | Yes |
| Pinecone | Scalable production search, dedicated vector operations | Yes (limited) | Yes |
| Weaviate | Open-source flexibility, complex search requirements | Yes (self-hosted) | Both |
| Chroma | Local development, testing, small-scale production | Yes (open-source) | No (self-host) |
The decision for non-technical founders:
If you're already using Supabase (which you should be for most early-stage products), start with Supabase's built-in pgvector extension. It handles retrieval sufficiently for early-stage products without adding another service to manage.
Move to Pinecone when: you have more than a few hundred thousand documents, you need sub-100ms retrieval at high concurrency, or your search requirements are complex enough that general-purpose databases can't meet them.
When you don't need RAG:
- Your AI product only needs general knowledge (a writing tool, a generic chatbot)
- You're building a simple text generation tool where the model's base training is sufficient
- You're validating the core product idea before adding infrastructure complexity
Start without RAG if you can. Add it when you have a specific validated use case that requires access to private or dynamic information.
Layer 3: Orchestration (The Action Layer)
Orchestration is how your AI product connects to external tools, takes actions, and manages multi-step workflows.
A simple AI product (a chatbot that answers questions using RAG) doesn't need a dedicated orchestration layer. You pass the retrieved context and user message to the LLM and return the response.
A complex AI product (an agent that searches the web, writes code, runs tests, and commits to GitHub) needs orchestration: a system that manages the sequence of actions, handles tool calls, manages state between steps, and handles errors.
The main orchestration frameworks:
| Framework | What It Does | When to Use |
|---|---|---|
| LangChain | Chains LLM calls, tools, and data sources | Complex multi-step workflows, many integrations |
| LlamaIndex | Data ingestion and retrieval optimization | RAG-heavy products, complex document processing |
| Direct API calls | Simple, no framework | Single-step or two-step workflows |
| Vercel AI SDK | Next.js integration, streaming | Web applications using React/Next.js |
The non-technical founder's decision:
Most first AI products don't need LangChain. Frameworks add complexity, learning curve, and dependencies. Direct API calls with structured prompts handle 80% of use cases.
Reach for orchestration frameworks when:
- Your workflow has more than 3-4 sequential AI calls
- You need to manage complex tool use (your AI calling external APIs, running code, searching the web)
- You have a production product with complex state management requirements
The pattern that's almost always right for early-stage:
```python
def answer() -> str:
    user_message = get_user_input()
    relevant_docs = vector_search(user_message)   # RAG: semantic lookup
    context = format_context(relevant_docs)
    response = llm_call(context + user_message)   # direct API call
    return response
```
That's it. No framework required for this pattern. Add a framework when this becomes insufficient.
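When the workflow does outgrow a single call — the "complex tool use" case — the core of what orchestration frameworks manage is a loop: the model proposes an action, your code executes it, and the result is fed back in. A minimal sketch with stand-in functions (everything here is hypothetical; a real version would parse structured tool calls from the LLM API rather than plain strings):

```python
# Hypothetical tool registry; real agents expose these through the
# model's native tool-calling API, not string parsing.
TOOLS = {
    "search_web": lambda q: f"results for {q!r}",
    "run_code":   lambda src: f"executed {len(src)} chars",
}

def fake_llm(history: list[str]) -> str:
    """Stand-in for a real model: asks for one tool, then finishes."""
    if not any(line.startswith("TOOL_RESULT") for line in history):
        return "CALL search_web python vector databases"
    return "FINAL Here is a summary of what I found."

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [task]
    for _ in range(max_steps):          # cap steps: agents must terminate
        action = fake_llm(history)
        if action.startswith("FINAL"):
            return action.removeprefix("FINAL ").strip()
        name, _, arg = action.removeprefix("CALL ").partition(" ")
        result = TOOLS[name](arg)       # execute the requested tool
        history.append(f"TOOL_RESULT {result}")
    return "Stopped: step limit reached."

print(run_agent("Research vector databases"))
```

The step cap and the state in `history` are exactly the kind of bookkeeping that frameworks like LangChain handle for you — which is why they're worth adopting only once this loop gets genuinely complicated.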
Layer 4: Application Interface (What Users See)
The application interface is the product — the thing users interact with. For most AI products, this is a web application, a chat interface, or an API.
The AI-specific consideration in the application layer: how do you design for probabilistic output?
Traditional software is deterministic — a button click always produces the same result. AI output is probabilistic — the same prompt can produce different results. This requires different UX patterns:
- Loading states that communicate that the AI is reasoning, not loading
- Source citations that let users verify claims (Perplexity's core UX innovation)
- Regeneration options for when the first response isn't useful
- Confidence signaling — communicating when the AI is uncertain
- Feedback mechanisms that let users correct bad outputs
The standard 2026 stack for AI web applications:
- Framework: Next.js (React) — the default for TypeScript web apps with AI
- Hosting: Vercel — zero-configuration deployment, integrates with Next.js
- Auth: Clerk or Supabase Auth — never build authentication from scratch
- Database: Supabase (PostgreSQL + pgvector) — handles both structured data and RAG
- AI UI components: Vercel AI SDK — streaming responses, chat primitives, tool call display
This stack can be deployed in under a week, scales to 100,000 users, and costs under $100/month at early stage.
Layer 5: Observability (How You Know It's Working)
Observability is how you measure whether your AI product is performing as intended and catch regressions before they reach users.
This is the layer most early-stage products skip — and the layer most production AI products break on.
The challenge: AI output quality isn't binary. There's no error log for "the response was mediocre." You need to define and measure quality explicitly.
The minimum viable observability setup:
Logging: Record every prompt, every response, and the latency for each AI call. This is table stakes — you can't debug what you don't log.
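Wrapping every model call in a thin logging layer is enough to start. A sketch — the `log_call` wrapper and in-memory log are illustrative; in production you'd write to a database or route through a tool like Helicone or Langfuse:

```python
import functools
import time

CALL_LOG: list[dict] = []   # illustrative; use durable storage in production

def log_call(llm_fn):
    """Record prompt, response, and latency for every LLM call."""
    @functools.wraps(llm_fn)
    def wrapper(prompt: str) -> str:
        start = time.perf_counter()
        response = llm_fn(prompt)
        CALL_LOG.append({
            "prompt": prompt,
            "response": response,
            "latency_ms": (time.perf_counter() - start) * 1000,
        })
        return response
    return wrapper

@log_call
def llm_call(prompt: str) -> str:
    return "stub response"   # stand-in for a real API call

llm_call("What is our refund policy?")
print(CALL_LOG[0]["prompt"])
```

Because it's a decorator, this costs one line per call site and captures everything needed to replay a bad interaction later.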
Evals: A test dataset of 50-200 representative inputs with known-good outputs. Run this before and after any model changes, prompt changes, or retrieval changes. This is how you know if a change made things better or worse.
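A minimal eval is just a dataset plus a pass/fail check per case. The sketch below uses exact keyword checks for simplicity — the dataset and checks are invented for illustration; real evals usually grade with an LLM judge or fuzzier matching:

```python
# Hypothetical eval cases: an input plus keywords a good answer must contain.
EVAL_CASES = [
    {"input": "How do I get a refund?",   "must_contain": ["refund", "60 days"]},
    {"input": "What plans do you offer?", "must_contain": ["plan"]},
]

def run_evals(answer_fn) -> float:
    """Return the pass rate of answer_fn over the eval dataset."""
    passed = 0
    for case in EVAL_CASES:
        output = answer_fn(case["input"]).lower()
        if all(kw in output for kw in case["must_contain"]):
            passed += 1
    return passed / len(EVAL_CASES)

# Stand-in for the real pipeline; swap in your actual answer function.
def stub_answer(question: str) -> str:
    return "Refunds are available within 60 days. We offer three plans."

print(f"pass rate: {run_evals(stub_answer):.0%}")
```

Run this before and after every prompt or model change; a pass rate that drops is the regression signal error logs will never give you.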
User feedback: A thumbs up/down mechanism (or equivalent) that lets users flag bad responses. This creates a signal for quality regressions that your evals might miss.
Production monitoring: Track the distribution of response quality over time. Are responses getting shorter? More generic? Are there specific query types that consistently underperform?
The tools:
- LangSmith (by LangChain) — traces LLM calls, records inputs/outputs, visualizes pipelines
- Langfuse — open-source alternative, self-hostable
- Helicone — proxy that adds logging, caching, and rate-limiting to any LLM API call
- DeepEval — open-source eval framework for defining and running AI quality tests
The Three AI Stack Patterns for Founders
Across the layers above, three patterns emerge based on product complexity:
Pattern 1: The "Just Works" Stack (Right for Most First Products)
Use when: You're validating the product idea, you have a single core use case, and your team is small.
Foundation: Claude 3.5 Sonnet or GPT-4o API
Retrieval: Supabase pgvector (if needed)
Orchestration: Direct API calls, Vercel AI SDK
Interface: Next.js + Vercel
Auth: Supabase Auth or Clerk
Observability: Basic logging + thumbs up/down feedback
Cost at 10K MAU: $50-$200/month
Time to deploy: 1-2 weeks
Pattern 2: The Production Stack (When You're Growing)
Use when: You have validated product-market fit, consistent user load, and data requirements the "just works" stack can't handle.
Foundation: Tiered (GPT-4o for complex queries, GPT-4o-mini for simple ones)
Retrieval: Pinecone for vector search, PostgreSQL for structured data
Orchestration: LangChain or direct API with structured pipeline
Interface: Next.js + Vercel, custom streaming
Auth: Clerk (scale features)
Observability: LangSmith or Langfuse + custom evals
Cost at 100K MAU: $500-$3,000/month
Team required: 2-3 engineers
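The "tiered" foundation layer in this pattern usually amounts to a small routing function: cheap model by default, capable model when the query looks hard. A sketch with an intentionally crude heuristic — the keyword list and length threshold are illustrative assumptions; real routers use a classifier or a cheap model to judge difficulty:

```python
# Crude heuristic routing between a cheap and a capable model.
# The signals and threshold below are illustrative, not tuned values.
HARD_SIGNALS = ("analyze", "compare", "step by step", "write code")

def pick_model(query: str) -> str:
    q = query.lower()
    if len(q) > 500 or any(signal in q for signal in HARD_SIGNALS):
        return "gpt-4o"        # capable, more expensive
    return "gpt-4o-mini"       # cheap default for simple queries

print(pick_model("What's your refund policy?"))
print(pick_model("Compare these two contracts step by step"))
```

Even a rough router like this can cut model spend substantially, because most production traffic is simple queries that never needed the expensive model.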
Pattern 3: The Enterprise Stack
Use when: Your customers are enterprises with security requirements, your product handles sensitive data, or you're operating at large scale.
Foundation: Hosted models with BAAs (Business Associate Agreements), or Llama 3 on your own infrastructure
Retrieval: Private vector database deployment, full data isolation per customer
Orchestration: Custom pipeline with enterprise integrations (SSO, SCIM, audit logs)
Interface: Custom, enterprise design standards
Auth: Enterprise SSO (Okta, Azure AD) via Clerk or Auth0
Observability: Full audit trail, compliance logging
Cost: Highly variable — enterprise contracts are custom
Team required: Dedicated infrastructure and security team
What to Research Before Choosing Your Stack
Before making any stack decision, understand how similar products in your category have implemented these layers. The decisions are interdependent — your retrieval strategy affects your orchestration requirements, which affects your infrastructure costs.
HowWorks breaks down the architecture of real AI products — Cursor, Perplexity, Notion AI, and others — at the decision level. Spending 30-60 minutes there before you make stack choices shows you what real teams chose, why they chose it, and what problems emerged at scale.
The most expensive AI infrastructure mistakes come from over-engineering before validation. The most expensive product mistakes come from choosing a stack that can't support the product you'll need to build at 10x scale.
Understanding both is how you avoid both.
The Non-Technical Founder's Checklist
Before your first conversation with an engineer about your AI product's architecture, be able to answer:
- Do I need RAG, or is general model knowledge sufficient for my use case?
- What data does my AI product need access to that isn't in public training data?
- Is my workflow single-step (one LLM call) or multi-step (multiple calls, tool use)?
- What does "good output" mean for my core feature — can I define it in specific behavioral terms?
- How will I know if the AI quality degrades after a model update?
- What are the privacy requirements for my users' data?
These questions don't require an engineering background to answer. They require understanding your product. And they're the questions that separate founders who run productive architecture conversations from founders who delegate all technical judgment to engineers.
Related Reading on HowWorks
- How AI Apps Are Built — How Cursor, Perplexity, Notion AI, and Lovable implement these layers in production
- The Non-Technical Founder's Guide to Product Research — Research workflow for founders before hiring or building
- Before You Vibe Code: Why Research Changes Everything — Architectural research before building with AI tools