Turn a 47-page sales deck into a defensible risk verdict — in the time it takes to read the cover page.
This is how an AI-powered deal assessment platform ingests a sales artifact, reads every slide, applies your risk policy, and produces dashboard intelligence every stakeholder can trust — with evidence, not opinions. We walk through a real deck: the ACME — SNCF Voyageurs BAFO V2 review.
Start with the story
The problem, the promise, and a reviewer's Tuesday morning with the platform.
Skip to the architecture
The stack, the data model, the extraction pipeline, and the API surface.
Deal risk review is slow, inconsistent, and impossible to audit.
Every major deal now goes through a Bid Risk Management (BRM) review — and the larger the deal, the longer the deck. What reviewers do with those decks today is broadly the same across the industry, and it's broken in broadly the same ways.
Per reviewer, per deck
A BRM analyst reads the deck once, builds a side-spreadsheet of risks, chases the bid team for missing numbers, then drafts a memo. Bigger decks — 50+ slides with financial annexes — routinely take a full day before a verdict lands.
Round-trips with the bid team
Reviewers don't catch every risk on the first read. Penalty caps, cash exposure peaks, sub-contractor capacity, data-residency exposure — each lives on a different slide, in a different format, and triggers a separate clarification email.
Citations in the final memo
The GO/NO-GO recommendation reaches the committee as prose. When Legal or Treasury later asks “where did the 4.8 M€ cash exposure figure come from?”, the answer is “I'll find it” — and finding it means re-opening the 47-page deck.
Four steps. Every one of them traceable.
At a high level, the platform does what a team of analysts would do if they could read every page simultaneously, apply the same rubric every time, and never lose track of which slide a finding came from.
Receive
A reviewer — or the bid team directly — uploads the BRM deck. The file is logged, versioned, and validated. Nothing is lost if the document changes between V1 and BAFO.
Seconds
Read
The platform reads every slide, table, and chart. Scanned pages are OCR'd; layouts are preserved so a finding on slide 34 knows it came from slide 34 — and from the risk table, not the footer.
~60 seconds
Interpret
Twelve specialist extractors, covering domains such as delivery, legal, financial, cyber, and compliance, each pull the risks they understand. Your risk office's policy then scores, flags, and escalates according to rules you control.
~3 minutes
Deliver
A live dashboard with a risk score, categorized cards, and one-click traceability to the exact sentence on the exact slide. Legal and Treasury see the same view — no emailed memos.
Instant
One reviewer. One deck. One hour to decision.
Here is what the SNCF review actually looks like, timestamp by timestamp, for a BRM analyst using the platform for the first time this quarter.
Aude — BRM Reviewer, Southern & Central Europe SBU
The SNCF deck, now queryable intelligence.
This is what opens on Aude's screen at 09:08. Every card is backed by a cited finding; every figure links to the exact location in the source deck. Hover, click, share.
Elevated: 6 rules fired, 3 High-impact risks without contingency
Delivery and Commercial are the primary drivers. Penalty-cap absence plus GenAI dependency on client readiness contribute 40% of the total index. Morocco scenario adds geographic risk, offset by mitigating GOP expansion.
What Aude does next: opens any card, reads the source slide, forwards to the relevant owner with one click. What she doesn't do: build a spreadsheet, chase the bid team, re-read the deck cover-to-cover.
The platform's job is not to replace your judgment. It's to defend it.
Every figure, flag, and excerpt on the dashboard is evidence — captured with the precision of a footnote and the speed of a search engine. Three principles enforce this, and they're structural, not aspirational.
Every insight has a citation — no exceptions.
A risk cannot appear on the dashboard without a record pointing to the exact slide and bounding box it came from. This is enforced at the database level, not by convention. If the platform can't cite it, the platform doesn't show it.
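The page says this guarantee lives in the database. The sketch below only mirrors the same invariant at the application layer, as an illustration of what "no citation, no card" means in practice. All type and function names are hypothetical, the field names are borrowed from the trace record shown on this page, and returning the uncited risks in a separate list is an assumption rather than documented behaviour.

```typescript
// Hypothetical shapes for illustration only.
interface Citation {
  id: string;
  page: number;                            // slide number in the source deck
  bbox: [number, number, number, number];  // bounding box on the rasterized page
  text: string;                            // the cited excerpt
}

interface CandidateRisk {
  id: string;
  score: number;
  citation?: Citation; // an extractor may fail to locate its source
}

// A displayable risk is a candidate whose citation is no longer optional.
interface CitedRisk extends CandidateRisk {
  citation: Citation;
}

function isCited(risk: CandidateRisk): risk is CitedRisk {
  const c = risk.citation;
  return c !== undefined && c.page >= 1 && c.text.trim().length > 0;
}

// Only cited risks are shown; the rest are kept apart (assumption) instead of being rendered.
function partitionByCitation(
  candidates: CandidateRisk[],
): { shown: CitedRisk[]; withheld: CandidateRisk[] } {
  const shown: CitedRisk[] = [];
  const withheld: CandidateRisk[] = [];
  for (const r of candidates) {
    if (isCited(r)) {
      shown.push(r);
    } else {
      withheld.push(r);
    }
  }
  return { shown, withheld };
}
```

Because `shown` is typed as `CitedRisk[]`, any rendering code downstream can read `risk.citation.page` without a null check, which is the type-level counterpart of the database constraint described above.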
The reasoning is replayable.
Each risk carries a full chain: which AI model read it, which prompt version, which rules fired, which version of your policy applied. Six months from now, when Legal asks how you reached a 2026 decision, the platform will recreate the exact view the reviewer saw.
The reviewer is always in the loop.
Scores are recommendations, not verdicts. A reviewer can accept, modify, or reject any flag, and their override is captured with a reason code — feeding back into policy refinement. The platform learns from the humans it serves.
risk.id = e8a4…
score = 18.0
rag = red
impact = High
citation.id = 3f1c…
extracted_by = claude-sonnet
confidence = 0.94
rules_fired = R-DEL-011
page = 34
section = risk.delivery
bbox = [141,302,1782,346]
text = “GenAI tolling not ready…”
What happens when the platform is wrong?
Large-language-model outputs are probabilistic, and the platform treats them that way. Every extraction is stored with a confidence score; anything below threshold is routed to human review rather than auto-scored. Reviewer overrides are captured and fed back into model evaluation — disagreements between human and model are a signal, not a problem.
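The routing described above can be sketched in a few lines. This is a minimal illustration, not the platform's code: the names are hypothetical, and the 0.8 threshold is an assumed value, since the page does not state the real one.

```typescript
type Route = "auto_score" | "human_review";

interface Extraction {
  id: string;
  confidence: number; // expected to lie in [0, 1]
}

// Assumed value for illustration; the actual threshold is not given in the source.
const REVIEW_THRESHOLD = 0.8;

function routeExtraction(e: Extraction, threshold: number = REVIEW_THRESHOLD): Route {
  // NaN or out-of-range confidence is treated as untrusted and sent to a human.
  if (!(e.confidence >= 0 && e.confidence <= 1)) {
    return "human_review";
  }
  return e.confidence < threshold ? "human_review" : "auto_score";
}
```

Under this assumed threshold, the 0.94-confidence extraction from the trace record would be auto-scored, while anything malformed defaults to the safer path.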
When the rule set or the extractor logic changes, historical deals are not silently re-scored. Each projection is versioned; the dashboard always shows “scored on policy v3.7, extractor v12” and offers a “preview under new policy” toggle. You keep the audit record of what was decided, when, and under what rubric.
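One way to picture "versioned, never silently re-scored" is a preview function that computes a second projection next to the recorded one instead of overwriting it. The sketch below assumes a deliberately simple weighted-sum policy; the shapes, the weights, and the "v3.8" version string are all hypothetical.

```typescript
interface Finding { category: string; severity: number; }
interface Policy { version: string; weights: Record<string, number>; }

interface ScoredProjection {
  dealId: string;
  score: number;
  policyVersion: string;
  extractorVersion: string;
}

// Toy scoring: severity times the category weight (weight 1 if the policy has none).
function scoreUnder(
  dealId: string,
  findings: Finding[],
  policy: Policy,
  extractorVersion: string,
): ScoredProjection {
  const score = findings.reduce(
    (sum, f) => sum + f.severity * (policy.weights[f.category] ?? 1),
    0,
  );
  return { dealId, score, policyVersion: policy.version, extractorVersion };
}

// The recorded projection is returned untouched; the preview is a separate object.
function previewUnderNewPolicy(recorded: ScoredProjection, findings: Finding[], next: Policy) {
  const preview = scoreUnder(recorded.dealId, findings, next, recorded.extractorVersion);
  return { recorded, preview, delta: preview.score - recorded.score };
}

function label(p: ScoredProjection): string {
  return `scored on policy ${p.policyVersion}, extractor ${p.extractorVersion}`;
}
```

The design point is that the audit record and the what-if view are two different values, so toggling the preview can never change what was decided.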
Your risk office owns the rules. Not your engineering team.
The AI extracts. The policy decides. That separation is deliberate — because what counts as “high risk” is a business judgment that should be made by the people accountable for it, and changed whenever appetite changes, without waiting for a software release.
Rules are data, not code.
Every rule is a plain-language statement with a predicate (what to look for), an action (what to do), and a delta (how much it moves the score). The risk office authors, reviews, and deploys rules through an admin interface — no engineering ticket required.
When the Head of BRM decides that public-sector deals over 300 M€ require treasury sign-off regardless of cash profile, the rule goes live the next morning and applies to the next deal. Every historical deal can also be re-scored against the new rule in preview, in under a second, to see what would have changed; the scores already on record stay as they were.
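To make "rules are data" concrete, here is a minimal sketch in which a rule is a plain, serializable record with a statement, a predicate, an action, and a delta, and a small generic evaluator applies it. The schema, the rule ids, and the delta values are assumptions for illustration; the two example rules paraphrase the treasury sign-off and penalty-cap cases mentioned on this page.

```typescript
interface Deal {
  sector: "public" | "private";
  valueMEur: number;
  hasPenaltyCap: boolean;
}

// The predicate is declarative data, not a function, so it can be stored and versioned.
interface Condition {
  field: keyof Deal;
  op: "eq" | "gt" | "lt";
  value: string | number | boolean;
}

interface Rule {
  id: string;
  statement: string;            // plain-language wording shown to the risk office
  when: Condition[];            // predicate: all conditions must hold
  action: "flag" | "escalate";
  delta: number;                // how much the rule moves the score
}

function matches(deal: Deal, c: Condition): boolean {
  const actual = deal[c.field];
  if (c.op === "eq") return actual === c.value;
  if (typeof actual !== "number" || typeof c.value !== "number") return false;
  return c.op === "gt" ? actual > c.value : actual < c.value;
}

function applyRules(deal: Deal, rules: Rule[], baseScore: number = 0) {
  const fired = rules.filter((r) => r.when.every((c) => matches(deal, c)));
  return {
    score: fired.reduce((s, r) => s + r.delta, baseScore),
    rulesFired: fired.map((r) => r.id),
    actions: fired.map((r) => r.action),
  };
}

// Hypothetical rule set; ids and deltas are invented for this example.
const exampleRules: Rule[] = [
  {
    id: "R-EXAMPLE-1",
    statement: "Public-sector deals over 300 M€ require treasury sign-off.",
    when: [
      { field: "sector", op: "eq", value: "public" },
      { field: "valueMEur", op: "gt", value: 300 },
    ],
    action: "escalate",
    delta: 5,
  },
  {
    id: "R-EXAMPLE-2",
    statement: "Deals without a penalty cap are flagged.",
    when: [{ field: "hasPenaltyCap", op: "eq", value: false }],
    action: "flag",
    delta: 8,
  },
];
```

Since `exampleRules` contains no functions, it survives a round trip through JSON, which is what lets an admin interface author and deploy rules without a code release.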
The policy is versioned and auditable.
Rule changes are tracked the way source code is tracked: who changed what, when, and why. The dashboard shows which policy version produced a given score, so you can defend decisions against the rubric that actually applied at the time — not the rubric in force today.
This is what makes the platform usable across regulated clients. The rulebook is the product; the AI is the reader.
For the engineers in the room.
Everything above described what the platform does. This section is for the stakeholders who need to know how — the stack, the pipeline, the data model, and the API surface. Skip this section if you're only here for the outcomes.
The platform is opinionated about its dependencies. Next.js 15 on Vercel for the edge and server components. Supabase as the system of record (Postgres + pgvector + Storage + Realtime + RLS). Inngest for durable orchestration. A provider-agnostic LLM router fronting Claude Sonnet 4.5, Haiku 4.5, and Voyage-3 embeddings. Observability through Langfuse. Every piece earns its place.
Next.js 15 App Router — RSC + Server Actions
React Server Components render dashboard shells on the edge. Supabase Realtime streams risk-score updates into client components as the pipeline completes. Suspense boundaries match pipeline stages, so partial results paint progressively — Aude sees delivery risks before commercial ones finish computing.
Server Actions, tRPC, and Supabase RPC
Type-safe server actions handle mutations. Read paths use row-level-security-scoped Supabase queries with PgBouncer pooling. Long-running reads go through tRPC subscriptions; high-fanout reads are served from Upstash Redis with a 60-second TTL. The front end never talks to Anthropic or OpenAI directly.
Inngest durable workflows + worker fleet
Each document triggers a multi-step Inngest function with retries, step-level idempotency, and checkpointing. Heavy extraction runs on a Fly.io worker pool; OCR and layout parsing run on dedicated containers with larger memory footprints. Failures replay from any step without reprocessing the whole document.
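The idea behind step-level checkpointing can be shown with a toy in-memory model. This sketch does not use the Inngest SDK and is not how Inngest is implemented; it only illustrates why a retry skips steps that already succeeded. Steps are synchronous here for brevity (real ones would be asynchronous), and the step names and return values are hypothetical.

```typescript
// Completed step results, keyed by a stable step id.
type Checkpoints = Map<string, unknown>;

function runStep<T>(store: Checkpoints, stepId: string, fn: () => T): T {
  if (store.has(stepId)) {
    return store.get(stepId) as T; // already completed on an earlier attempt
  }
  const result = fn();
  store.set(stepId, result); // checkpoint only after the step succeeds
  return result;
}

function processDocument(
  store: Checkpoints,
  docId: string,
  log: string[],
  extractorUp: boolean,
): string {
  const pages = runStep(store, `${docId}:parse`, () => {
    log.push("parse");
    return 47; // page count of the parsed deck
  });
  const risks = runStep(store, `${docId}:extract`, () => {
    log.push("extract");
    if (!extractorUp) throw new Error("extractor unavailable");
    return ["risk cited on page 34"];
  });
  return `${pages} pages, ${risks.length} risks`;
}

// First attempt fails at the extract step; the retry reuses the parse checkpoint.
const store: Checkpoints = new Map();
const log: string[] = [];
try {
  processDocument(store, "doc-1", log, false);
} catch {
  // a real orchestrator would schedule a retry here
}
const summary = processDocument(store, "doc-1", log, true);
```

After the retry, `log` contains one `parse` entry and two `extract` entries: the expensive parse step was not repeated, which is the property the paragraph above describes.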
LLM routing, structured extraction, embeddings
Claude Sonnet 4.5 handles high-stakes risk reasoning and clause interpretation via structured outputs (JSON Schema + tool use). Haiku 4.5 runs cheap first-pass classification and routing. Mistral OCR handles scanned pages. Voyage-3 produces embeddings for semantic search and cross-document memory. All calls pass through a provider-agnostic router with fallbacks and cost telemetry.
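A provider-agnostic router with fallbacks and cost telemetry can be reduced to a small loop. The sketch below is an assumption about the general shape, not the platform's router: providers are stubbed as plain synchronous functions, no real vendor API is called, and the cost figures are arbitrary units.

```typescript
interface Provider {
  name: string;
  costPerCall: number;               // arbitrary units, for telemetry only
  call: (prompt: string) => string;  // stub; a real provider call would be async
}

interface RoutedResult {
  provider: string;   // the provider that finally answered
  output: string;
  attempts: string[]; // every provider tried, in order
  cost: number;       // summed over all attempts
}

function routeWithFallback(prompt: string, providers: Provider[]): RoutedResult {
  const attempts: string[] = [];
  let cost = 0;
  for (const p of providers) {
    attempts.push(p.name);
    cost += p.costPerCall; // assumption: failed attempts are counted too
    try {
      return { provider: p.name, output: p.call(prompt), attempts, cost };
    } catch {
      // fall through to the next provider in priority order
    }
  }
  throw new Error(`all providers failed: ${attempts.join(", ")}`);
}
```

Callers see one function and one result shape regardless of which vendor answered, and the `attempts` and `cost` fields give the telemetry layer something to record.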
Supabase Postgres + pgvector + Storage + R2
Postgres holds the normalized deal model, extractions, risks, and citations. pgvector stores chunk embeddings (HNSW-indexed). Supabase Storage holds the original artifact plus rasterized page images. Cloudflare R2 mirrors the immutable archive for long-term retention. All writes flow through a typed deal_events append-only log — the dashboard is a projection of that log.
The pipeline, stage by stage.
Eleven stages. Each one shows what the platform does in plain English and — for the engineers — the actual operation. The interactive trace-player is Deliverable-5 work; this is the static contract.
The data model.
Append-only events as the source of truth. Risks, scores, and citations are projections computed from that stream — which is why the full reasoning chain can always be replayed for any dashboard element.
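A projection over an append-only log is essentially a fold over events. The sketch below shows that shape with hypothetical event types and field names (the page names the `deal_events` log but not its schema), and shows how replaying a prefix of the log recreates an earlier view.

```typescript
// Hypothetical event shapes for illustration.
type DealEvent =
  | { type: "risk_extracted"; riskId: string; score: number; citationId: string }
  | { type: "risk_overridden"; riskId: string; score: number; reasonCode: string }
  | { type: "risk_rejected"; riskId: string; reasonCode: string };

interface RiskView {
  riskId: string;
  score: number;
  citationId: string;
  history: string[]; // which events shaped this card
}

// Fold the log into the current dashboard state. The log itself is never mutated.
function project(events: readonly DealEvent[]): Map<string, RiskView> {
  const view = new Map<string, RiskView>();
  for (const e of events) {
    if (e.type === "risk_extracted") {
      view.set(e.riskId, {
        riskId: e.riskId,
        score: e.score,
        citationId: e.citationId,
        history: [e.type],
      });
    } else if (e.type === "risk_overridden") {
      const r = view.get(e.riskId);
      if (r) {
        r.score = e.score;
        r.history.push(`${e.type}:${e.reasonCode}`);
      }
    } else {
      view.delete(e.riskId); // a rejected risk leaves the dashboard but stays in the log
    }
  }
  return view;
}

// Replaying only the first n events recreates what the dashboard showed at that point.
function projectAsOf(events: readonly DealEvent[], n: number): Map<string, RiskView> {
  return project(events.slice(0, n));
}
```

Because every view is derived from the same log, the current score and any historical score are both just calls to `project` with different inputs.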
The platform earns its place by moving measurable numbers.
Every figure below comes from the before/after delta we've observed with pilot customers running on their own historical deal portfolios. The units are hours, euros, and consistency scores — not model benchmarks.