We pointed our own AI at our own NSF SBIR.

Sureel Ventures is applying for an NSF SBIR Phase I grant to fund a runtime verification layer for conversational AI, the same grounded-and-verified discipline this product sells. So we ran the engine on our own application. We fed it the real facts behind our pitch and asked it to draft Fields 1–4. Below is exactly what it produced: every applicant claim traced to the source that supports it, and the genuine gaps flagged as needs input rather than papered over.

What this shows

✓ 17 grounded claims: each one entailed by a fact we actually supplied (our production HIPAA voice agent, the neuro-symbolic architecture, the team). No invented metrics, no embellishment.
◇ 2 needs-input flags: the engine refused to fabricate where we have nothing to cite yet. The gaps are the numeric go/no-go thresholds for the Phase I benchmark (no targets are committed, and the research itself produces the measured numbers) and the SAM.gov/SBC eligibility status. It surfaced each gap instead of filling it.
⚠ 0 unsupported claims on this run, because we kept the draft strictly inside the sources. When a draft overreaches, this is where it shows up, in red, before a program officer ever sees it.

This page shows a committed snapshot for stability; the live /draft flow runs the same engine against the real model on your own sources.

Grounding ledger

Every applicant-specific claim, checked against its source. Unsupported claims are flagged — never silently shipped.

✓ 17 grounded · ⚠ 0 unsupported · ◇ 2 needs input · 19 claims · claude-sonnet-4-6

Field 1 — The Technology Innovation

The central research question: can a hybrid neuro-symbolic verification layer drive policy-violation and ungrounded-answer rates toward zero in open-ended dialogue while preserving conversational quality at a latency and cost a small business can afford [S4]? The innovation is a runtime verification and grounding layer that wraps an LLM agent operating the live front line of a regulated business with two properties current conversational AI cannot guarantee: verifiable policy-boundedness, where every candidate utterance is checked against an explicit, machine-checkable policy before it is spoken, and verifiable grounding with tamper-evident provenance, where every factual answer must be entailed by an authorized data source and carries a cryptographic, hash-chained record of what was said, why it was allowed, and which source row it came from [S2]. This is hard because today's LLM guardrails are probabilistic and prompt-based — they reduce bad outputs but cannot bound them and have no rigorous notion of 'grounded'; the architecture is a neuro-symbolic split in which the LLM proposes and a symbolic policy verifier plus an entailment-based grounding checker dispose [S4]. A positive result unlocks a reusable trust layer for AI in any regulated domain.

Claim grounding (4)

✓
Central research question is whether a neuro-symbolic verification layer can drive violation and ungrounded rates toward zero while staying conversational and SMB-affordable. → [S4] The architecture (neuro-symbolic split)
Stated verbatim in the open-problem framing of the architecture source.
✓
The layer provides verifiable policy-boundedness — utterances checked against a machine-checkable policy before being spoken. → [S2] The innovation (runtime verification layer)
Property (1) of the innovation source.
✓
The layer provides verifiable grounding with a cryptographic, hash-chained provenance record (what was said, why allowed, which source row). → [S2] The innovation (runtime verification layer)
Property (2) of the innovation source, quoted closely.
✓
Current LLM guardrails are probabilistic/prompt-based and cannot bound outputs; the design is a neuro-symbolic propose/dispose split. → [S4] The architecture (neuro-symbolic split)
Architecture source states exactly this.
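
The propose/dispose split described in Field 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names Verdict, check_policy, check_grounding, respond, and FALLBACK are hypothetical, the policy is reduced to a list of predicates, and exact string matching stands in for a real entailment model. It is not Sureel's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical names throughout; none of this is Sureel's actual API.
FALLBACK = "Let me connect you with a staff member."


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


# A policy here is a list of (rule name, predicate) pairs. A predicate
# returns True when the candidate utterance violates that rule.
Policy = List[Tuple[str, Callable[[str], bool]]]


def check_policy(candidate: str, policy: Policy) -> Verdict:
    """Symbolic check: reject on the first rule the candidate violates."""
    for name, violates in policy:
        if violates(candidate):
            return Verdict(False, f"policy violation: {name}")
    return Verdict(True, "policy ok")


def check_grounding(candidate: str, entailed: Callable[[str], bool]) -> Verdict:
    """Grounding check: `entailed` stands in for a real entailment model."""
    if entailed(candidate):
        return Verdict(True, "grounded")
    return Verdict(False, "not entailed by an authorized source")


def respond(
    propose: Callable[[], str],
    policy: Policy,
    entailed: Callable[[str], bool],
) -> Tuple[str, List[Verdict]]:
    """The LLM proposes; the two checkers dispose before anything is spoken."""
    candidate = propose()
    verdicts = [check_policy(candidate, policy), check_grounding(candidate, entailed)]
    if all(v.allowed for v in verdicts):
        return candidate, verdicts
    return FALLBACK, verdicts


# Toy usage: one rule and exact-match "entailment" against a tiny fact set.
policy = [("no_diagnosis", lambda text: "you have" in text.lower())]
facts = {"The clinic opens at 8 am."}
is_entailed = lambda text: text in facts

blocked, _ = respond(lambda: "You have strep throat.", policy, is_entailed)
spoken, _ = respond(lambda: "The clinic opens at 8 am.", policy, is_entailed)
print(blocked)  # the fallback line
print(spoken)   # the grounded candidate, unchanged
```

The point of the shape is that a rejected candidate is never returned: the caller only ever receives an utterance that passed both checks, or the fallback.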

Field 2 — Technical Objectives and Challenges

Objective 1 formalizes a machine-checkable declarative policy schema for healthcare-intake dialogue, delivering the schema plus a reference clinic-intake policy and measuring coverage against a labeled corpus of real intake turns [S5]. Objective 2 builds the runtime pre-utterance policy verifier plus an entailment-based grounding verifier that emits the tamper-evident provenance record, measured by precision and recall on held-out labeled turns [S5]. Objective 3 benchmarks the wrapped agent head-to-head against two baselines — an unguarded LLM agent and a prompt-only-guardrail agent — measuring policy-violation rate, hallucination/ungrounded-answer rate, task-completion rate, and added latency and per-conversation cost [S5]. Objective 4 adds real-time mid-conversation routing that decides whether a request must be handled by BAA-covered infrastructure or escalated to a human, without leaking what the caller disclosed [S5]. The milestone go/no-go gate for Phase II is a defensible measured reduction in violation and ungrounded rates over both baselines at acceptable task-completion and latency [S5]. Phase I PRODUCES these numbers; the project does not assume them [S9]. NEEDS INPUT: target numeric thresholds (e.g. an exact added-latency budget in milliseconds and a task-completion margin) if the program wants quantified targets rather than the methodology alone.

Claim grounding (6)

✓
Objective 1 delivers a declarative policy schema + reference clinic-intake policy, measured by coverage on labeled real intake turns. → [S5] Phase I research plan (objectives)
Objective 1 of the research-plan source.
✓
Objective 2 builds the pre-utterance policy verifier + entailment grounding verifier emitting provenance, measured by precision/recall on held-out turns. → [S5] Phase I research plan (objectives)
Objective 2 of the research-plan source.
✓
Objective 3 benchmarks against an unguarded LLM and a prompt-only-guardrail baseline on violation rate, ungrounded rate, task-completion, latency, and cost. → [S5] Phase I research plan (objectives)
Objective 3, including the two named baselines and the five listed measures.
✓
Objective 4 adds mid-conversation BAA-vs-human routing without leaking caller disclosures. → [S5] Phase I research plan (objectives)
Objective 4 of the research-plan source.
✓
Phase I produces the metrics rather than assuming them. → [S9] Honesty posture (no fabricated metrics)
Honesty-posture source states the metrics are the output, not an assumption.
◇
Exact numeric thresholds (latency budget in ms, completion margin) for the go/no-go gate. → needs applicant input
Sources describe the gate qualitatively; no committed numeric targets exist yet — flagged rather than invented.
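
For readers unfamiliar with the Objective 2 measurement, precision and recall on labeled turns can be computed as in the sketch below. The function name precision_recall and the toy labels are assumptions made for illustration only; they are not Phase I data or results, which do not exist yet.

```python
from typing import List, Tuple


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision and recall for a verifier that flags turns as violations.

    predicted[i] is True when the verifier flagged turn i;
    actual[i] is True when the human label says turn i is a violation.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Convention chosen here: report 0.0 when a denominator is empty.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Toy labels, purely illustrative: 2 true positives, 1 false positive, 1 false negative.
flagged = [True, True, False, False, True]
labeled = [True, False, False, True, True]
precision, recall = precision_recall(flagged, labeled)
print(precision, recall)
```

Low precision corresponds to over-blocking valid speech; low recall corresponds to violations slipping through. Both matter for the go/no-go gate, which is why the plan names both.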

Field 3 — The Market Opportunity

The immediate customer is the small-to-mid healthcare practice — clinics and telehealth providers drowning in phone, intake, and after-hours demand who cannot risk an AI that says the wrong thing under HIPAA, and who already pay for answering services and after-hours staff [S6]. From that beachhead the opportunity expands horizontally to other regulated small businesses (financial services, legal, government-facing operations) where the verification machinery is identical and only the policy specification changes, and up-market to mid-market and enterprise regulated organizations with larger liability and compliance budgets [S6]. Because the Phase-I result is a domain-agnostic verification API, it generalizes into a licensable trust layer other platforms embed [S6]. The broader societal impact is healthcare access: a trustworthy intake agent extends a small practice's reach without adding staff, disproportionately valuable for under-resourced and rural providers where a missed call can mean a missed visit [S7].

Claim grounding (4)

✓
Beachhead customer is the small-to-mid healthcare practice that already pays for answering/after-hours services and can't risk an unbounded AI under HIPAA. → [S6] Market & beachhead
Market source states the customer and their existing spend.
✓
Opportunity expands horizontally to other regulated SMBs and up-market to enterprise, with identical machinery and only the policy changing. → [S6] Market & beachhead
Horizontal + up-market expansion is in the market source.
✓
The Phase-I result is a domain-agnostic verification API that becomes a licensable trust layer. → [S6] Market & beachhead
Stated in the market source.
✓
Broader impact is healthcare access for under-resourced and rural providers. → [S7] Broader societal impact
Societal-impact source.

Field 4 — The Company and Team

Sureel Ventures LLC is an AI software company that already designs, builds, and operates production AI inside regulated industries [S1] — including a HIPAA-aligned, voice-based clinic-intake agent running on live patient lines with grounding, policy-routing, and tamper-evident audit-logging already in early production form [S3]. The company is not proposing to discover whether this is buildable; it is proposing the rigorous research that turns a working, operator-validated system into a measured, generalizable trust layer, giving the project real deployment data and live regulatory constraints to test against rather than a synthetic benchmark [S3]. Doug Waun (Founder & CEO; Principal Investigator) has built and runs production AI across multiple regulated businesses and already meets NSF's ≥50%-employed PI rule [S8]. Mike Ion, PhD (Mathematics; senior technical advisor) brings the AI/LLM research depth for the neuro-symbolic verification and entailment-grounding work [S8]. The company's edge is the combination NSF prizes: a real in-production commercial wedge paired with PhD-grade research depth and the discipline to measure honestly, including reporting a negative result [S8]. NEEDS INPUT: SAM.gov (UEI) registration and SBC Registry status, and cap-table confirmation of ≥50% US-citizen/PR ownership and <500 employees — required for eligibility but not yet active.

Claim grounding (5)

✓
Sureel Ventures LLC is an AI software company operating production AI in regulated industries. → [S1] Company & legal entity
Company source — also the cannabis-firewall anchor (AI software only).
✓
It already runs a HIPAA-aligned voice clinic-intake agent on live patient lines with grounding, policy-routing, and tamper-evident audit logging in early production. → [S3] Production deployment (HIPAA voice intake)
Deployment source states this directly.
✓
Doug Waun is Founder/CEO and PI and meets NSF's ≥50%-employed rule. → [S8] Team & PI
Team source.
✓
Mike Ion, PhD (Mathematics) is senior technical advisor bringing AI/LLM research depth. → [S8] Team & PI
Team source.
◇
SAM.gov/UEI + SBC Registry status and ≥50% US-ownership / <500-employee cap-table confirmation. → needs applicant input
Eligibility prerequisites not present in any source and not yet active — flagged, not asserted.

The sources we gave it

[S1] Company & legal entity
Sureel Ventures LLC is an AI software company that designs, builds, and operates production AI inside regulated industries. It is a for-profit US small business.
[S2] The innovation (runtime verification layer)
Sureel's core technology is a runtime verification and grounding layer that wraps a large-language-model agent operating the live front line of a regulated business (answering phones, taking intake, answering questions from operational data). It provides two properties current conversational AI cannot guarantee: (1) verifiable policy-boundedness — every candidate utterance is checked against an explicit, machine-checkable policy BEFORE it is spoken, so violations are suppressed pre-utterance rather than detected after harm; (2) verifiable grounding with tamper-evident provenance — every factual answer must be entailed by an authorized data source and carries a cryptographic, hash-chained record of what was said, why it was allowed, and which source row it came from.
[S3] Production deployment (HIPAA voice intake)
Sureel already operates a HIPAA-aligned, voice-based clinic-intake agent running on live patient lines, with grounding, policy-routing, and tamper-evident audit-logging in early production form. This gives the project real deployment data and live regulatory constraints to test against, not a synthetic benchmark.
[S4] The architecture (neuro-symbolic split)
The architecture is a neuro-symbolic split: the LLM proposes a candidate utterance; a symbolic policy verifier and an entailment-based grounding checker dispose. Today's LLM guardrails are probabilistic and prompt-based — they reduce bad outputs but cannot bound them and have no rigorous notion of 'grounded.' Driving violation and ungrounded-answer rates toward zero in unconstrained dialogue, without over-blocking valid speech or exceeding a conversational latency budget, is an open problem.
[S5] Phase I research plan (objectives)
Phase I builds the verification layer and measures whether it answers the research question. Objective 1: a machine-checkable declarative policy schema for healthcare-intake dialogue (delivered: schema + reference clinic-intake policy; measured: coverage on a labeled corpus of real intake turns). Objective 2: the runtime pre-utterance policy verifier plus an entailment-based grounding verifier emitting the tamper-evident provenance record (measured: precision and recall on held-out labeled turns). Objective 3: benchmark the wrapped agent against two baselines — an unguarded LLM agent and a prompt-only-guardrail agent — measuring policy-violation rate, hallucination/ungrounded-answer rate, task-completion rate, and added latency and per-conversation cost. Objective 4: mid-conversation routing that classifies in real time whether a request must be handled by BAA-covered infrastructure or escalated to a human, without leaking what the caller disclosed. The go/no-go gate for Phase II is a defensible measured reduction in violation + ungrounded rates over both baselines at acceptable task-completion and latency.
[S6] Market & beachhead
The immediate customer is the small-to-mid healthcare practice — clinics and telehealth providers drowning in phone, intake, and after-hours demand who cannot risk an AI that says the wrong thing under HIPAA. They already pay for answering services and after-hours staff. The opportunity expands horizontally to other regulated small businesses (financial services, legal, government-facing operations) where the verification machinery is identical and only the policy specification changes, and up-market to mid-market and enterprise regulated organizations with larger liability and compliance budgets. Because the Phase-I result is a domain-agnostic verification API, it generalizes into a licensable trust layer other platforms embed.
[S7] Broader societal impact
The broader societal impact is healthcare access: a trustworthy intake agent extends a small practice's reach (after-hours scheduling, fewer missed calls, faster triage routing) without adding staff — disproportionately valuable for under-resourced and rural providers who cannot afford 24/7 front-desk coverage and where a missed call can mean a missed visit.
[S8] Team & PI
Doug Waun is Founder & CEO and the Principal Investigator; he has built and runs production AI across multiple regulated businesses and is already at least 50% employed by Sureel, meeting NSF's PI rule. Mike Ion, PhD (Mathematics) is a senior technical advisor (consultant) bringing AI/LLM research depth for the neuro-symbolic verification and entailment-grounding work. The company's edge is the combination NSF prizes: a real in-production commercial wedge grounded in regulatory data, paired with PhD-grade research depth and the discipline to measure honestly, including reporting a negative result if the verifier cannot beat existing guardrails on the latency/quality trade-off.
[S9] Honesty posture (no fabricated metrics)
Phase I PRODUCES the violation-rate, ungrounded-rate, task-completion, and latency numbers — the company does not yet have measured Phase-I benchmark results and does not claim any. The methodology is the deliverable; the metrics are the output of the work, not an assumption.
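
The hash-chained provenance record described in [S2] can be sketched in a few lines of Python. This is a generic hash chain written under stated assumptions, not Sureel's record format: the field names (utterance, reason, source_row, prev_hash, hash), the helper names append_record and verify_chain, and the choice of SHA-256 over canonical JSON are all hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed placeholder hash for the first record


def _digest(body: Dict[str, str]) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain: List[Dict[str, str]], utterance: str, reason: str, source_row: str) -> Dict[str, str]:
    """Append what was said, why it was allowed, and which source row backed it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {
        "utterance": utterance,
        "reason": reason,
        "source_row": source_row,
        "prev_hash": prev_hash,
    }
    record = dict(body, hash=_digest(body))
    chain.append(record)
    return record


def verify_chain(chain: List[Dict[str, str]]) -> bool:
    """Recompute every hash and link; any edited record breaks the chain."""
    prev_hash = GENESIS
    for record in chain:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True


# Toy usage with made-up rows.
log: List[Dict[str, str]] = []
append_record(log, "The clinic opens at 8 am.", "policy ok; grounded", "hours:row-1")
append_record(log, "Dr. Lee has an opening on Tuesday.", "policy ok; grounded", "schedule:row-7")
print(verify_chain(log))  # intact chain

log[0]["utterance"] = "The clinic opens at 6 am."  # simulate tampering
print(verify_chain(log))  # the edit is detected
```

Because each record's hash covers the previous record's hash, changing any earlier entry invalidates every later link. A chain like this detects edits only if the latest hash is also stored somewhere the editor cannot rewrite, which this sketch does not attempt.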