
Auto‑Answering Engineering Questions in Slack

[Image: Meme showing Fred from Scooby-Doo unmasking a ghost labeled 'AGENT' to reveal 'PROMPTS, IF-ELSE, LOOPS, FUNCTIONS' underneath]

There's been a lot of talk about building AI agents, but real-world examples are still thin on the ground. In this post I'll break down one I shipped at Unblocked: a Slack bot that automatically answers engineering questions.

Every engineering org has a long tail of repeat questions that slow everyone down: "How do I restart the CI runner?", "Why did we fix issue #18363 by adding the if/else in src/login.ts API?", "Why does ValidationTemplateFactory take so many arguments?"

We wanted a bot that would:

  1. Listen in public Slack channels.
  2. Answer only when the question is engineering‑related and we're confident in the response.
  3. Stay polite—no spamming, no low‑quality guesses.

The result is a lightweight "agent" that handled 10-40% of routine engineering questions in public channels, saving devs hours each week.


Architecture at a glance

Slack message
      │
      ▼
[1] Question Detector — regex drops 80% of chatter
      │
      ▼
[2] Engineering Classifier — small LLM decides "is this an eng question?"
      │
      ▼
[3] Answer Generator — larger LLM drafts a reply with citations
      │
      ▼
[4] Quality Gate — second-pass LLM vetoes weak or risky answers
      │
      ▼
[5] Post to Thread — bot replies in-thread


Step 1 – Detecting questions cheaply

A handful of cheap regex checks (question marks, interrogative openers) removes roughly 80% of Slack chatter before any model runs, keeping per-message cost near zero.
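A minimal sketch of what such a detector could look like. The patterns below are illustrative, not Unblocked's actual rules:

```python
import re

# Cheap heuristics that drop most non-question chatter before any LLM call.
# These patterns are examples only; tune them against your own Slack traffic.
QUESTION_PATTERNS = [
    re.compile(r"\?\s*$"),                                    # ends with "?"
    re.compile(r"^(how|why|what|where|when|who|which|can|does|is|are)\b", re.I),
    re.compile(r"\b(anyone know|any idea|help with)\b", re.I),
]

def looks_like_question(message: str) -> bool:
    """Return True if the message plausibly contains a question."""
    text = message.strip()
    if not text:
        return False
    return any(p.search(text) for p in QUESTION_PATTERNS)
```

Anything that fails this filter never reaches a model, which is what makes the first stage effectively free.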


Step 2 – Classify "engineering" questions

What the "Is Engineering Question?" prompt actually does

  1. Technical-scope filter. Only software-engineering or DevOps topics qualify; anything else → NO.
  2. Explicit exclusions. Logistics, task assignments, opinion polls, or "please do X" requests short-circuit to NO.
  3. Structured audit output. Emit JSON with links, key "clues," and a single-token YES/NO verdict — clean telemetry for dashboards and tuning.

We used a cheaper, faster model for this step: the classification doesn't require heavy inference, and we prioritized response speed.

Sample Prompt
[[SYSTEM]]
# ROLE
You are an AI assistant that ...

# TASK
Evaluate whether the user's input is a question that needs software-engineering or DevOps expertise.

# EXCLUSION CATEGORIES
- Task assignment
- Team logistics
- Non-technical matters
- Opinion polls
- Action requests

# INSTRUCTIONS

1. Clue Identification:
- Identify up to 5 keywords in the user's input that strongly indicate a software engineering or DevOps context.

2. Relevance Determination:
- Evaluate whether the question falls into any of the EXCLUSION CATEGORIES. Set `isEngineeringQuestion` to NO if it does.

3. Answer: Provide a YES or NO answer on whether the question is relevant and answerable.

# JSON OUTPUT FORMAT
{
"clues": ["clue1", ...],
"isEngineeringQuestion": "<YES|NO>"
}
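The single-token verdict is easy to consume downstream, but model output should still be parsed defensively. A sketch, assuming the model returns the JSON format above; any malformed output is treated as NO so the bot fails closed:

```python
import json

def parse_classifier_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse the classifier's JSON verdict into (is_engineering, clues).

    Malformed or unexpected output maps to (False, []) — i.e. the bot
    stays silent rather than answering on a garbled verdict.
    """
    try:
        data = json.loads(raw)
        is_eng = str(data.get("isEngineeringQuestion", "NO")).upper() == "YES"
        clues = [str(c) for c in data.get("clues", [])]
        return is_eng, clues
    except (json.JSONDecodeError, AttributeError):
        return False, []
```

The clues double as the "clean telemetry" mentioned above: logging them per message makes it cheap to see which keywords are driving YES verdicts.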


Step 3 – Draft the answer

Questions that pass the gate flow into our standard answer-completion pipeline, which:

  1. Retrieves the most relevant docs and code snippets.
  2. Generates a concise reply.
  3. Embeds citations so users can verify every claim.
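The citation step is the part worth sketching: every claim in the reply carries a numbered marker that resolves to a source link. A minimal illustration (the retrieval and generation layers are assumed to exist upstream):

```python
def format_with_citations(answer: str, sources: list[dict]) -> str:
    """Append a numbered source list so each [n] marker in the answer
    resolves to a verifiable link. `sources` is assumed to be a list of
    {"title": ..., "url": ...} dicts from the retrieval layer."""
    lines = [answer, ""]
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src['title']}: {src['url']}")
    return "\n".join(lines)
```

Keeping citations structural rather than free-text means the quality gate in the next step can also check that every marker has a matching source.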


Step 4 – Validate the answer

This step is our double-edged sword: a great answer earns big trust, but a bad one hurts twice as much. So we let a slower, pricier model run these checks. Speed takes a back seat to accuracy here.

Sample Prompt
[[SYSTEM]]
You are an AI assistant that vets answers for software-engineering questions.

# EVALUATION CRITERIA
Explicit • Helpful • Direct • Accurate • Actionable

# EXCLUSION LIST
"However, without specific details…"
"It's hard to confirm…"
"If you need more specific guidance…"

# EVAL STEPS (summary)
0. Rephrase question
1. Check excluded phrases
2. Identify key parts
3. Score clues (SATISFIED / UNSATISFIED)
4. Directness analysis
5. Faithfulness to docs
6. Concise reasoning
9. FINAL: YES or NO (JSON)
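The exclusion-list check in step 1 of the eval doesn't actually need a model: hedging phrases can veto an answer with a plain string scan before the slower LLM pass runs. A sketch using the phrases from the prompt above:

```python
# Hedging phrases from the quality gate's exclusion list. An answer that
# leans on any of these is rejected outright — no LLM call needed.
HEDGING_PHRASES = [
    "however, without specific details",
    "it's hard to confirm",
    "if you need more specific guidance",
]

def passes_phrase_gate(answer: str) -> bool:
    """Return False if the answer contains a known hedging phrase."""
    lowered = answer.lower()
    return not any(phrase in lowered for phrase in HEDGING_PHRASES)
```

Running this cheap check first means the expensive second-pass model only ever sees answers that at least sound confident.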


Step 5 – Post the vetted answer in the thread

Once the generated response passes the final check, the bot politely responds in the thread. We also include feedback buttons labeled "Helpful" and "Unhelpful" to measure the response's usefulness to users.
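In Slack terms, that reply is a Block Kit payload passed to `chat.postMessage` with the original message's `thread_ts`, so the bot answers in-thread rather than in-channel. A sketch of the payload builder; the `action_id` names are illustrative, not Unblocked's actual identifiers:

```python
def build_reply_blocks(answer_text: str) -> list[dict]:
    """Build a Slack Block Kit payload: the answer as a section block,
    followed by Helpful/Unhelpful feedback buttons."""
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": answer_text}},
        {
            "type": "actions",
            "elements": [
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "Helpful"},
                    "action_id": "feedback_helpful",
                },
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "Unhelpful"},
                    "action_id": "feedback_unhelpful",
                },
            ],
        },
    ]
```

The button clicks arrive back as interaction events keyed by `action_id`, which is what feeds the usefulness metrics.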


Food for thought

If you've read this far, you're probably already asking the next hard questions. Those deserve their own deep dives. Stay tuned!