Human–AI Loop · Pilot Program
The Human–AI Collaboration Testing Kit
We’ve been running the Human–AI Loop for two years. We think it produces meaningfully better outcomes than prompt-focused AI use. Now we want to find out if you get the same results — and what we need to improve.
Passive AI use → faster outputs. Human–AI Loop → better outcomes. This kit is how we test that hypothesis together.
Quick reminder
What the Human–AI Loop is. And what it isn’t.
The Loop is a structured collaboration methodology — not a prompting technique, not a tool, not a workflow automation. It’s a system for how humans and AI work together on knowledge work that requires judgment, creativity, and accountability.
The Loop IS
◎ A collaboration pattern — Test → Build → Codify → Share
◎ Human-led throughout, AI-amplified at every stage
◎ Designed for creative, judgment-intensive work
◎ A system that compounds — learning builds on learning
The Loop IS NOT
✕ A better prompt technique
✕ An automation or AI agent setup
✕ A tool you install and forget
✕ A replacement for human judgment — it’s the opposite
Want the full picture first? Read the Overview →
⚑ Important — read this before you start
The Loop requires more human judgment upfront than passive AI use. That’s by design — it’s what produces better outcomes.
You will need to brief your AI more deliberately, push back on outputs more actively, and invest time in capturing what you learned. This isn’t overhead you can skip — it’s the mechanism that makes the collaboration work.
Our thesis is that this investment is worth it: more creative decisions, more robust outputs, and reusable learning that compounds justify the upfront cost. But we want you to experience that firsthand, not take our word for it.
We’re also committed to building tools that make the human side more scalable. One of the things we’d love your feedback on is exactly that — what tools would make the most difference in your day-to-day? More on that below.
Before you pilot
The Loop isn’t for everything. Here’s how to tell.
We’d rather you pilot something that’s a genuine fit than force the methodology onto the wrong type of work. If your use case falls into the “not a good fit” column, we’ve linked to approaches that may serve you better.
Good fit for the Loop
✓ Product strategy and decision write-ups
✓ Cross-functional planning and alignment docs
✓ Customer insight synthesis and research framing
✓ Thought leadership and positioning work
✓ Complex problem-solving with real tradeoffs
✓ Any work where human judgment shapes the outcome
Better suited to a different approach
✕ Data labeling, annotation, content moderation at scale
✕ Compliance checking and QA pipelines
✕ Fully automated processes that just need oversight
✕ Quick lookups, summaries, and single-turn tasks
✕ Anything with PII, regulated data, or legal/HR content
What we’re testing together
Our hypothesis. Your test.
We’ve seen this play out in two years of real work. Now we’re asking you to test it on yours — and tell us honestly what you find.
The hypothesis
When AI is treated as a true collaborative partner — with clear roles, structured handoffs, and human judgment at every stage — the outcomes are meaningfully more creative, more robust, and more impactful than when AI is used as a tool for faster output.
We believe the difference shows up in the quality of decisions, the depth of options considered, the reusability of what gets produced, and the confidence teams have in their outputs.
We’re not asking you to take our word for it. We’re asking you to run one cycle on real work, pay attention to what changed, and tell us what you found — including where it fell short or created friction you didn’t expect.
How to start
Two ways to run the pilot. Pick the one that fits.
The Loop works differently depending on whether you’re running it solo with your AI partners or facilitating it across a team. Both are valid starting points — but the steps look different. Figure out which scenario fits, then follow that path.
Scenario A
You + your AI partners
You’re a PM or leader who wants to test the Loop yourself first — with one or two AI teammates — before bringing it to a team. This is the fastest way to get a feel for the methodology on real work.
Your steps
Pick one real piece of work
A decision write-up, a strategy brief, a customer insight synthesis. Something bounded, something where the outcome matters.
Brief your AI deliberately
Don’t just paste in a request. Share context: what you’re trying to achieve, what you already know, what constraints exist, what a good outcome looks like.
Run Test → Build → Codify → Share
Explore the problem space first (Test). Build the output with AI (Build). Capture what worked as a reusable pattern (Codify). Share what you built and learned (Share).
Stay in the driver’s seat
Push back on output that sounds polished but misses the point. Verify claims. Redirect when needed. The Loop only works when human judgment is active throughout.
Compare before and after
Was the output more creative, more robust, or more useful than what you’d have produced with tool-style AI use? Was the overhead worth it? That’s the signal we’re after.
Recommended starting workflow
Decision or strategy one-pagers: they're universal, they naturally produce a before/after comparison, and the Codify output (a reusable brief template) is immediately useful for the next one.
Scenario B
You + a team + AI
You’re a leader who wants to run a structured pilot across a cross-functional team, with humans and AI working together through the full Loop cycle. This requires a shared test plan and clear roles before you start.
Your steps
Choose the workflow and the team
Pick one bounded piece of work and 2–4 team members. Cross-functional is ideal — different perspectives sharpen the brief and stress-test the output.
Set roles and ground rules
Who owns the brief? Who reviews AI output? Who makes the final call? Clarity here prevents the most common failure mode: everyone assuming someone else is driving.
Run the Loop as a team
Brief collectively, explore with AI, build together, then debrief on what the AI contributed vs. what humans had to provide. Each stage should have a clear human owner.
Codify as a team
Don’t let one person carry the learning. Capture what worked collectively — a shared prompt scaffold, a brief template, a set of principles. This is how it becomes a team asset, not a personal one.
Run a structured retro
What did AI contribute that surprised you? Where did it miss? What would you change about the collaboration setup next time? That’s your feedback to us.
Need a shareable team test plan?
We’ve put together a step-by-step guide you can share directly with your team — covering roles, how to run each stage together, and how to debrief.
Evaluation lens
What to pay attention to as you run it.
You don’t need to score anything formally. As you run the cycle, here are the areas we’re most interested in hearing about — the dimensions that tell us whether the system is working, or where it needs to improve.
Did the AI understand what you were actually trying to achieve?
Not just what you asked for — the underlying goal. If it kept missing the point, your brief probably needed more context. That’s useful signal.
We’re paying attention to: intent clarity
Did AI add something you wouldn’t have thought of alone?
Options, framings, risks, angles. If it mostly reformatted what you already knew, that's a tool-use pattern, not a collaboration pattern. Either outcome is worth noting.
We’re paying attention to: contribution quality
Did you stay in the driver’s seat?
Did you push back, verify, redirect? Or did you find yourself accepting polished output without scrutiny? The Loop only works when human judgment is active throughout.
We’re paying attention to: human oversight
Did it fit how your team actually works?
Or did it create friction and confusion about roles? The methodology should reduce drag over time — not add it permanently.
We’re paying attention to: workflow fit
Your feedback
What we’d love to hear from you.
After you’ve run one cycle, we want your honest read. Not a formal report — just what you observed. Raw notes are fine. A conversation is even better.
Tell us
What changed — or didn’t
Was the output more creative, more robust, or more useful than tool-style AI use? Where did the methodology fall flat or disappoint?
Tell us
Where the friction was
What slowed you down? What felt forced? What would you change about the system to make it work better for your context?
Tell us
What tools would help
We’re building tools to make the human side more scalable. What would make the most difference — brief templates, retro guides, decision frameworks, something else entirely?
Send your feedback to Maura
Share your observations — what worked, what didn’t, what surprised you, what tools you’d want. The rougher the better. We’re looking for signal, not polish.
Go deeper
Everything you need to run this well.
Intent • Explore • Impact
We’re less interested in what AI can produce than in what humans and AI can achieve together.
That’s the question this pilot exists to answer. We’re grateful you’re willing to help us find out.