
1. Does it survive unscripted input?
Say: “Let me type something weird.”
Change:
Grammar
Spelling
Format
Language
Word order
Then throw in pure nonsense.
Real system:
Handles ambiguity, fails gracefully.
Fake system:
Breaks, resets, or suddenly “refreshes”.
2. Where exactly is the intelligence?
Ask:
“Which part is AI and which part is normal software?”
Good founders point to:
Retrieval
Ranking
Scoring
Routing
Validation
Decision logic
Bad founders point at:
The screen.
3. Does it depend on perfect input?
If the demo needs:
Copy-pasted text
Pre-filled data
Ideal conditions
The founder at the keyboard
…you’re watching demo magic, not a product.
4. What happens when it’s wrong?
Ask:
“Show me a failure case.”
Real founders:
Willingly show errors
Explain why
Show safeguards
Fake founders:
Avoid showing failure
Quote high accuracy with no evidence
Change the topic
If there’s no error handling,
there’s no intelligence.
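What does a safeguard look like in practice? One common pattern is abstention: below a confidence cut-off, the system declines to answer instead of guessing. A minimal Python sketch, assuming the model returns a label plus a confidence score between 0 and 1; `Prediction`, `guarded_answer`, and the 0.7 threshold are hypothetical, not any particular product's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to lie in [0, 1]


def guarded_answer(pred: Prediction, threshold: float = 0.7) -> Optional[str]:
    """Return the label only when confidence clears the threshold; otherwise abstain (None)."""
    if not 0.0 <= pred.confidence <= 1.0:
        # An out-of-range score points to an upstream bug: surface it instead of guessing.
        raise ValueError(f"confidence out of range: {pred.confidence}")
    if pred.confidence < threshold:
        # Abstain; the caller decides what happens next (human review, clarifying question, refusal).
        return None
    return pred.label


print(guarded_answer(Prediction("approve", 0.92)))  # approve
print(guarded_answer(Prediction("approve", 0.41)))  # None
```

Ask what their equivalent of the `None` branch does. A founder who can answer that has thought about failure.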
5. Can it be broken by refreshing the page?
Ask:
“What persists across sessions?”
If memory or state vanishes on refresh,
it’s either a stateless wrapper
or caching tricks.
Real systems know what happened earlier.
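“Knowing what happened earlier” means state lives on the server, keyed by something that survives a reload (a session or user ID), not only inside the browser tab. A toy Python sketch, assuming a single process and a local JSON file standing in for a real database; `SessionStore` and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class SessionStore:
    """Toy server-side store: message history keyed by session ID, saved to a JSON file."""

    def __init__(self, path: Path):
        self.path = Path(path)

    def _load_all(self) -> dict:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def append(self, session_id: str, message: str) -> None:
        data = self._load_all()
        data.setdefault(session_id, []).append(message)
        self.path.write_text(json.dumps(data), encoding="utf-8")

    def history(self, session_id: str) -> list:
        return self._load_all().get(session_id, [])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sessions.json"
    SessionStore(path).append("session-42", "My budget is 10k")
    # Simulate a page refresh: a brand-new store object reading the same file.
    restored = SessionStore(path).history("session-42")
    print(restored)  # ['My budget is 10k']
```

A production system would use a database with concurrency control; the point is only that a fresh object, like a refreshed page, can read back what an earlier one wrote.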
6. How fast is it?
Watch:
Latency
Thinking pauses
Weird delays
If it exactly mirrors ChatGPT speeds,
that’s a clue it’s a thin wrapper.
Real architecture usually shows:
Response times that vary by task
Model routing
Retrieval delays
7. Ask about evaluation
“How do you measure success?”
Green flags:
Ground-truth comparisons
Error analysis
Confidence scoring
Monitoring dashboards
Red flags:
“Users love it”
“Adoption is great”
No metrics
If nobody measures it,
nobody controls it.
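“Ground-truth comparisons” and “error analysis” can start as simply as scoring predictions against labelled answers and tallying which mistakes recur. A minimal Python sketch, assuming classification-style outputs with one labelled answer per example; `evaluate` is a hypothetical helper.

```python
from collections import Counter


def evaluate(predictions: list, ground_truth: list) -> dict:
    """Score predictions against labelled answers and tally recurring mistakes."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("need at least one labelled example")
    # Each error is recorded as (expected, got) so repeated confusions stand out.
    errors = Counter(
        (truth, pred) for pred, truth in zip(predictions, ground_truth) if pred != truth
    )
    correct = len(ground_truth) - sum(errors.values())
    return {
        "accuracy": correct / len(ground_truth),
        "top_errors": errors.most_common(3),
    }


report = evaluate(
    predictions=["cat", "dog", "dog", "cat"],
    ground_truth=["cat", "cat", "dog", "dog"],
)
print(report["accuracy"])    # 0.5
print(report["top_errors"])  # each (expected, got) pair with its count
```

Accuracy alone hides which mistakes dominate; the `(expected, got)` tallies are where error analysis begins.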
8. Who operates it?
Ask:
“What breaks at 2 AM?”
If the answer is
“We’ll handle it.”
Run.
Real companies talk about:
Alerting
Failovers
On-call systems
Automated recovery
9. Can it explain itself?
Say:
“Why did it give this answer?”
If the answer is vague or hand-wavy:
No reasoning layer exists.
Explainability isn’t optional.
It’s architecture.
10. Kill the internet. Seriously.
Ask:
“What happens if your API goes down?”
If:
The demo dies.
Chats vanish.
Features evaporate.
It’s an API hostage system.
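The alternative to an API hostage system is graceful degradation: when the upstream model call fails, serve something useful instead of a blank screen. A minimal Python sketch, assuming the upstream call is any function that takes a question string and raises on failure; `answer_with_fallback`, the in-memory cache, and the fallback message are hypothetical. Real systems might fall back to a second provider or a local model rather than a cache.

```python
from typing import Callable, Dict


def answer_with_fallback(
    question: str,
    primary: Callable[[str], str],
    cache: Dict[str, str],
) -> str:
    """Call the upstream model; on failure, degrade gracefully instead of dying."""
    try:
        answer = primary(question)
    except Exception:  # broad on purpose for the sketch: timeouts, connection errors, HTTP 5xx
        if question in cache:
            return cache[question] + " (cached: live model unavailable)"
        return "The assistant is temporarily unavailable. Please try again shortly."
    cache[question] = answer  # keep the last good answer for the next outage
    return answer


# Hypothetical stand-ins for an external model API.
def working_api(question: str) -> str:
    return f"answer to {question}"


def broken_api(question: str) -> str:
    raise ConnectionError("provider is down")


cache: Dict[str, str] = {}
print(answer_with_fallback("q1", working_api, cache))  # answer to q1
print(answer_with_fallback("q1", broken_api, cache))   # answer to q1 (cached: live model unavailable)
print(answer_with_fallback("q2", broken_api, cache))   # The assistant is temporarily unavailable. Please try again shortly.
```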
Bonus Trick (use it once, quietly):
Ask them to:
Run the demo from a laptop you choose.
Or connect through your own phone’s hotspot.
If they flinch,
you’ve caught theatre.
The Demo Smell Index
Count how often you hear:
“Roadmap”
“Soon”
“Next quarter”
“Coming feature”
“Enterprise”
“Scale”
vs:
“Bug”
“Latency”
“Tradeoff”
“Constraint”
“Limit”
“Fail”
If the second list never comes up, it’s fiction.
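The index is simple enough to run on a meeting transcript. A rough Python sketch, assuming a plain-text transcript; the term lists are the ones above, matching is case-insensitive and also counts simple inflections (so “bugs” and “failure” count), and `count_terms` and `smell_index` are hypothetical helpers.

```python
import re

# Term lists taken from the Demo Smell Index.
HYPE = {"roadmap", "soon", "next quarter", "coming feature", "enterprise", "scale"}
ENGINEERING = {"bug", "latency", "tradeoff", "constraint", "limit", "fail"}


def count_terms(text: str, terms: set) -> int:
    """Case-insensitive count of each term at a word start, plus any trailing word characters."""
    lowered = text.lower()
    return sum(len(re.findall(rf"\b{re.escape(term)}\w*", lowered)) for term in terms)


def smell_index(transcript: str) -> dict:
    hype = count_terms(transcript, HYPE)
    engineering = count_terms(transcript, ENGINEERING)
    # The rule: if the engineering list never comes up, treat the demo as fiction.
    return {"hype": hype, "engineering": engineering, "fiction": engineering == 0}


pitch = "Enterprise scale is on the roadmap for next quarter, coming soon."
review = "The main constraint is latency; we hit a bug and two failures last week."
print(smell_index(pitch))   # {'hype': 5, 'engineering': 0, 'fiction': True}
print(smell_index(review))  # {'hype': 0, 'engineering': 4, 'fiction': False}
```

Spelling variants such as “trade-off” slip through this matcher, so treat the counts as a prompt for sharper questions.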
Final Rule:
Demos should make you uncomfortable.
If you never see fragility, it’s fake.