The AI Bubble Debate: What Will Actually Survive?

Everyone’s talking about the AI bubble bursting. Here’s what will actually survive.

Prasanna Hariram

10/16/2025 · 5 min read

Context Setting:

If you squint a little, 2024 feels a lot like 1999: dazzling demos, breathless headlines, and a long tail of startups that look suspiciously like wrappers around someone else’s core tech. The dot-com era didn’t end the internet; it ended the idea that a homepage alone was a business. What endured were boringly powerful primitives (payments, logistics, search), companies that cared about unit economics, and tight loops between technology and real customer value. AI is on the same trajectory. “AI for AI’s sake” will fade. Practical, business-value-driven AI—the stuff that ships, saves money, and compounds—will survive.

Problem-Solution Framework:

  • The context: Foundation models gave us an incredible universal prior over text, code, and media. But enterprises don’t buy priors; they buy outcomes. In production, the physics show up: latency, p95 reliability, cost-per-query, data rights, safety, compliance, and good old integration plumbing.

  • The core challenges:

    • ROI over vibes: Pretty demos don’t pay salaries; measurable productivity gains, ticket deflection, and revenue lift do.

    • Inference economics: FLOPs, memory bandwidth, batching, and power budgets dominate deployment. “Biggest model” is not a strategy.

    • Grounding and governance: Raw model answers hallucinate; enterprises need citations, permissions, and auditability.

    • Agents with constraints: Open-ended autonomy collides with safety and QA. Narrow, workflow-embedded agents with a human in the loop are what work.

    • Data rights and trust: “Scrape now, apologize later” doesn’t scale legally or reputationally. Rights-cleared data is a moat.

    • Regulation and provenance: EU AI Act, watermarking, evals, and audit trails are becoming buying criteria.

    • Power is the new platform: GPU supply, energy, and cooling cap what you can run. Efficiency wins.

    • Stack choices: Open-source, right-sized models and edge inference change the cost curve; lock-in becomes a strategic risk.

    • Market gravity: Incumbent suites will absorb winning features; undifferentiated wrappers face consolidation.

  • The solution thesis: Build narrow, workflow-native AI that moves a business KPI; choose the smallest reliable model; ground it with retrieval and permissions; add constrained agentic steps; respect data rights; design for compliance; optimize inference like your margins depend on it (they do); use open models and edge where it makes sense; assume platform consolidation and plan to plug in.

Step-by-Step Approach:

1. Start with a KPI, not a model

  • Pick a single, valuable metric you can measure weekly: developer throughput (+20–30% tasks completed), support deflection rate (+15–40%), sales ops cycle time (-25%), first-contact resolution (+10%), or average handle time (-20%).

  • Write the acceptance test before the model: what is “good” at p50 and p95? What’s the cost ceiling per event? A minimal sketch of such a spec follows below.
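
A minimal sketch of such a spec, assuming illustrative metric names and thresholds (the numbers here are examples, not recommendations):

```python
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class AcceptanceSpec:
    """Acceptance criteria written before any model is chosen."""
    p50_latency_ms: float       # median latency ceiling
    p95_latency_ms: float       # tail latency ceiling
    max_cost_per_event: float   # USD ceiling per handled event
    min_task_success: float     # fraction of golden-set tasks that must pass

def passes(spec: AcceptanceSpec, latencies_ms: list[float],
           cost_per_event: float, task_success: float) -> bool:
    pct = quantiles(latencies_ms, n=100)   # 1st..99th percentiles
    p50, p95 = pct[49], pct[94]
    return (p50 <= spec.p50_latency_ms
            and p95 <= spec.p95_latency_ms
            and cost_per_event <= spec.max_cost_per_event
            and task_success >= spec.min_task_success)

# Example: a support assistant must answer in under 2s median and 6s tail,
# cost under $0.03 per ticket, and pass 85% of the golden set.
spec = AcceptanceSpec(2000, 6000, 0.03, 0.85)
```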

2. Embed AI where work already happens (copilots > chatbots)

  • Ship inside IDEs, CRM, ticketing, and docs; autocomplete and next-best actions beat new tabs.

  • Patterns we see scaling: developer copilots that reduce keystrokes and context switches; support copilots that surface cited answers; sales ops assistants that draft, summarize, and file with CRM hygiene.

3. Choose the smallest model that meets the spec

  • Start with mature open models (Llama, Mistral) and strong domain fine-tunes. Distill where possible.

  • Track cost-per-query and latency distribution (p50/p95/p99), not leaderboard deltas. If a 7B quantized model with good retrieval passes your evals, ship it.

  • Use mixture-of-experts or routing: specialized models for code, support, or sales content often beat a single giant generalist (see the routing sketch below).
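
A sketch of the routing idea; the model names are placeholders, not recommendations:

```python
# Route each request to the smallest specialized model that passes its evals.
ROUTES = {
    "code":    "code-specialist-7b-q4",     # quantized code model
    "support": "support-specialist-7b-q4",  # fine-tuned on cited KB answers
    "sales":   "sales-drafter-3b-q4",       # drafting/summarizing needs less
}
FALLBACK = "generalist-70b"  # reserved for unrouted or genuinely hard cases

def route(task_type: str) -> str:
    return ROUTES.get(task_type, FALLBACK)

assert route("code") == "code-specialist-7b-q4"
assert route("legal") == "generalist-70b"   # unknown tasks fall back
```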

4. Make RAG the default, not an afterthought

  • Retrieval-augmented generation with governed, audited data is the enterprise baseline.

  • Implement: high-recall indexing (hybrid lexical + vector), chunking tuned to task, metadata filters, and permission-aware retrieval.

  • Require citations and link-backs in every answer. Fail closed if grounding is weak. Cache verified answers. A sketch of this pattern follows below.
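
A sketch of permission-aware, fail-closed retrieval; the `index.search` hybrid searcher, the `llm` callable, and the grounding threshold are assumed stand-ins you would replace and calibrate:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    id: str
    text: str
    acl: set[str] = field(default_factory=set)  # groups allowed to read this chunk
    score: float = 0.0                          # hybrid relevance score

GROUNDING_THRESHOLD = 0.55  # assumed value; calibrate against your golden set

def answer(query: str, user_groups: set[str], index, llm) -> str:
    hits = index.search(query)                       # hybrid lexical + vector
    docs = [d for d in hits if d.acl & user_groups]  # permission-aware filter
    if not docs or max(d.score for d in docs) < GROUNDING_THRESHOLD:
        # Fail closed: weak grounding means no generated answer.
        return "No supported answer found in your permitted documents."
    context = "\n\n".join(f"[{d.id}] {d.text}" for d in docs[:5])
    # Require citations: the prompt demands [doc-id] markers on every claim.
    return llm(f"Answer using only the sources below; cite [doc-id] per claim.\n"
               f"{context}\n\nQuestion: {query}")
```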

5. Constrain agents; keep a human in the loop

  • Build narrow agents that call deterministic tools (search, CRUD, templated forms, code mod), with clear preconditions and postconditions.

  • Add guardrails: allowed actions, budget caps, reversible operations, and sandboxed environments.

  • Put a human in the loop at escalation points; capture feedback as supervised fine-tune data. Open-ended “autonomy” can wait. See the sketch after this list.
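
A sketch of what “constrained” can mean in code; `plan_next_step`, `execute`, and `escalate_to_human` are assumed interfaces, and the whitelist is illustrative:

```python
ALLOWED_ACTIONS = {"search_kb", "draft_reply", "update_ticket"}  # whitelist
REVERSIBLE = {"draft_reply", "update_ticket"}  # actions that can be undone
MAX_STEPS = 5                                  # hard budget cap per task

def run_agent(task, plan_next_step, execute, escalate_to_human):
    for _ in range(MAX_STEPS):
        action, args = plan_next_step(task)
        if action == "done":
            return "completed"
        if action not in ALLOWED_ACTIONS:
            # Precondition: never execute an off-whitelist action.
            return escalate_to_human(task, reason=f"blocked action: {action}")
        result = execute(action, args)
        if not result.ok:
            if action in REVERSIBLE:
                result.undo()  # roll back before handing off
            return escalate_to_human(task, reason="postcondition failed")
        task = task.with_result(result)
    # Budget exhausted: hand off rather than loop indefinitely.
    return escalate_to_human(task, reason="step budget exhausted")
```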

6. Treat data rights and quality as a moat

  • Use rights-cleared training corpora and explicit licenses (news, forums, image/audio). Maintain a data lineage register.

  • Build synthetic data pipelines where safe: self-play, augmentation, and counterfactuals to cover long tails—label and segregate clearly.

  • Apply automatic PII detection/redaction and policy-based access control at both retrieval and generation, as in the sketch below.
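
A minimal regex-based redaction pass as an illustration; production systems layer NER models and policy engines on top, and these patterns are deliberately not exhaustive:

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # checked before PHONE
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace detected PII with typed placeholders before indexing or prompting."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach Ana at ana@example.com or 555-867-5309."))
# -> Reach Ana at [EMAIL] or [PHONE].
```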

7. Evaluate like a system, not a model

  • Curate golden sets: task-specific prompts, expected outputs, and edge cases. Measure groundedness, factuality (with citations), safety, and bias.

  • Online A/B with guardrails: monitor business KPI, CSAT, and intervention rates. Track regressions with model/version fingerprints.

  • Log everything: prompts, retrieved docs, decisions, latency, costs, user feedback. Build the audit trail you’ll wish you had. A minimal eval loop follows below.
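
A sketch of a system-level eval loop over a golden set; the JSONL schema and the containment-based groundedness check are simplifying assumptions (real judges are usually rubric- or model-based):

```python
import json, time

def run_golden_set(llm, retriever, golden_path="golden_set.jsonl"):
    """Replay golden cases through the whole pipeline and log everything."""
    results = []
    with open(golden_path) as f:
        for line in f:
            case = json.loads(line)  # assumed schema: {"query": ..., "must_cite": [...]}
            t0 = time.perf_counter()
            docs = retriever(case["query"])
            output = llm(case["query"], docs)
            results.append({
                "query": case["query"],
                "grounded": all(cid in output for cid in case["must_cite"]),
                "latency_ms": (time.perf_counter() - t0) * 1000,
            })
    with open("eval_run.jsonl", "w") as out:  # this log is your audit trail
        for r in results:
            out.write(json.dumps(r) + "\n")
    return sum(r["grounded"] for r in results) / len(results)
```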

8. Win on inference economics

  • Quantize aggressively (e.g., 4-bit where acceptable), batch requests, cache KV states, and use speculative/streaming decoding for UX.

  • Choose inference runtimes optimized for throughput (vLLM, TensorRT-LLM, TGI). Profile kernels; watch memory bandwidth and attention hotspots.

  • Track a real P&L: $/1k tokens, $/action, joules/query, and GPU hours. Kill workloads that miss the margin target. A back-of-envelope calculator follows below.
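
A back-of-envelope calculator for that P&L; every input is a number you would measure yourself, and the example figures are purely illustrative:

```python
def unit_economics(gpu_hour_usd: float, tokens_per_sec: float,
                   tokens_per_query: float, gpu_watts: float):
    """Convert measured throughput and power into per-query economics."""
    queries_per_hour = tokens_per_sec * 3600 / tokens_per_query
    usd_per_query = gpu_hour_usd / queries_per_hour
    usd_per_1k_tokens = gpu_hour_usd / (tokens_per_sec * 3.6)  # 3600 s / 1000 tok
    joules_per_query = gpu_watts * tokens_per_query / tokens_per_sec
    return usd_per_query, usd_per_1k_tokens, joules_per_query

# Illustrative: $2.50/GPU-hr, 1,500 tok/s batched, 800 tokens/query, 700 W.
q, k, j = unit_economics(2.50, 1500, 800, 700)
print(f"${q:.5f}/query   ${k:.5f}/1k tokens   {j:.0f} J/query")
# If $/query exceeds the margin target from step 1, the workload gets cut.
```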

9. Be regulation-ready by design

  • Map your use case to the EU AI Act risk categories; maintain a model card, risk register, and safety evals.

  • Add provenance: content authenticity signals, watermarking where applicable, and verifiable citation chains.

  • Provide controls: user consent flows, data deletion, opt-outs, rate limits, and escalation paths. Opaque systems get filtered out in procurement. A skeleton of these artifacts follows below.
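
One way to keep those artifacts concrete: a skeleton record versioned alongside the code. The field names follow common model-card practice and are assumptions, not a mandated EU AI Act schema:

```python
# Per-deployment compliance record, versioned alongside the code.
MODEL_CARD = {
    "model": "support-specialist-7b-q4",          # placeholder name
    "intended_use": "support-ticket drafting with human review",
    "risk_category": "self-assessed; confirm mapping with counsel",
    "training_data": ["licensed KB corpus v3", "synthetic long-tail set v1"],
    "evals": {"golden_set": "v12", "groundedness": 0.93, "safety_suite": "pass"},
    "provenance": {"citations_required": True, "watermarking": "where applicable"},
    "controls": ["consent flow", "data deletion", "opt-out", "rate limits",
                 "human escalation path"],
}
```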

10. Choose open-source and right-sized to avoid lock-in

  • Prefer portable models you can fine-tune and host. Keep adapters/LoRA for domain layers.

  • Abstract your orchestration (prompting, tools, retrieval) behind interfaces so swapping models is a configuration, not a rewrite.

  • Use cloud where it adds value; keep the option to move on-prem or onto alternative clouds as costs and policies shift (see the interface sketch below).
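
A sketch of that abstraction as a structural interface; the adapter classes are placeholders for whatever SDK or runtime you actually call:

```python
from typing import Protocol

class ChatModel(Protocol):
    """The only surface orchestration code is allowed to depend on."""
    def complete(self, prompt: str, max_tokens: int = 512) -> str: ...

class HostedModel:
    def __init__(self, endpoint: str):
        self.endpoint = endpoint
    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        raise NotImplementedError("wire up your hosted API here")

class LocalModel:
    def __init__(self, weights_path: str):
        self.weights_path = weights_path
    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        raise NotImplementedError("wire up your local runtime here")

def build_model(config: dict) -> ChatModel:
    # Swapping models is a config change, not a rewrite.
    if config.get("backend") == "local":
        return LocalModel(config["weights_path"])
    return HostedModel(config["endpoint"])
```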

11. Push to edge/on-device when it helps privacy and cost

  • Offload classification, reranking, summarization, and draft generation to NPUs on phones and PCs where feasible (Apple’s on-device features, Copilot+ PCs).

  • Hybrid patterns: the device does a first pass plus retrieval hints; the server finalizes with grounding and audit. Reduce cloud tokens, improve latency, and respect data locality. A sketch follows below.
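
A sketch of the hybrid routing decision; `on_device_draft`, `confidence`, and `server_finalize` stand in for components you would supply:

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune against your evals

def hybrid_answer(query, on_device_draft, confidence, server_finalize):
    draft = on_device_draft(query)       # runs on the NPU; data stays local
    if confidence(draft) >= CONFIDENCE_THRESHOLD:
        return draft                     # zero cloud tokens, lowest latency
    # Escalate with the draft as a retrieval hint so the server prompt stays
    # small and the grounded, audited finalization happens where the logs live.
    return server_finalize(query, hint=draft)
```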

12. Assume platform consolidation; design to be absorbed or indispensable

  • Incumbents (Microsoft, Google, Salesforce, ServiceNow, Adobe) will absorb winning features. Integrate with their ecosystems and app stores.

  • Your defensibility: proprietary eval loops, outcomes data, rights-cleared corpora, and deep workflow integration—things that don’t clone easily.

  • Be M&A-ready: clean data contracts, clear IP, reproducible infra, and metrics that show durable ROI.

What to Stop Doing:

  • Generic chatbots with no data grounding and no KPI.

  • Chasing state-of-the-art benchmarks that don’t move your business metric.

  • Training or fine-tuning on dubious data; “scrape now, apologize later” invites lawsuits and brand damage.

  • Shipping open-ended autonomous agents into production without guardrails or QA.

Closing Thoughts:

The dot-com crash didn’t kill the internet; it cleared the noise so the useful primitives could compound. We’re heading for the same filtration in AI. What survives will look almost disappointingly practical: copilots that measurably boost output; RAG systems with citations and permissions; agentic workflows that are narrow, audited, and reversible; models that are small, fast, and cheap; data that is licensed and traced; deployments that respect power, privacy, and regulation; and stacks that are open and portable. The hype objects will fade. The value objects—those that win on ROI, inference economics, and trust—will quietly take over everything. Build for that world and you won’t care whether someone calls it a bubble. You’ll be too busy shipping.