
Auto‑Answering Engineering Questions in Slack

[Image: Meme showing Fred from Scooby-Doo unmasking a ghost labeled 'AGENT' to reveal 'PROMPTS, IF-ELSE, LOOPS, FUNCTIONS' underneath]

There's been a lot of talk about building AI agents, but real-world examples are still thin on the ground. In this post I'll break down one I shipped at Unblocked: a Slack bot that automatically answers engineering questions.

Every engineering org has a long tail of repeat questions that slow everyone down: "How do I restart the CI runner?", "Why did we fix issue #18363 by adding the if/else in src/login.ts API?", "Why does ValidationTemplateFactory take so many arguments?"

We wanted a bot that would:

  1. Listen in public Slack channels.
  2. Answer only when the question is engineering‑related and we're confident in the response.
  3. Stay polite—no spamming, no low‑quality guesses.

The result is a lightweight "agent" that handled 10-40% of routine engineering questions in public channels, saving devs hours each week.


Architecture at a glance

Slack message
      │
      ▼
[1] Question Detector — regex drops 80% of chatter
      │
      ▼
[2] Engineering Classifier — small LLM decides "is this an eng question?"
      │
      ▼
[3] Answer Generator — larger LLM drafts a reply with citations
      │
      ▼
[4] Quality Gate — second-pass LLM vetoes weak or risky answers
      │
      ▼
[5] Post to Thread — bot replies in-thread


Step 1 – Detecting questions cheaply

A handful of cheap regex checks (question marks, interrogative openers) removes roughly 80% of Slack chatter before any model runs, keeping per-message cost near zero.
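A minimal sketch of what such a detector could look like. The patterns below are illustrative, not Unblocked's actual rules:

```python
import re

# Cheap heuristics that drop most non-question chatter before any LLM call.
# These patterns are examples only; tune them against your own Slack traffic.
QUESTION_PATTERNS = [
    re.compile(r"\?\s*$"),                                    # ends with "?"
    re.compile(r"^(how|why|what|where|when|who|which|can|does|is|are)\b", re.I),
    re.compile(r"\b(anyone know|any idea|help with)\b", re.I),
]

def looks_like_question(message: str) -> bool:
    """Return True if the message plausibly contains a question."""
    text = message.strip()
    if not text:
        return False
    return any(p.search(text) for p in QUESTION_PATTERNS)
```

Anything that fails this filter never reaches a model, which is what makes the first stage effectively free.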


Step 2 – Classify "engineering" questions

What the "Is Engineering Question?" prompt actually does

  1. Technical-scope filter. Only software-engineering or DevOps topics qualify; anything else → NO.
  2. Explicit exclusions. Logistics, task assignments, opinion polls, or "please do X" requests short-circuit to NO.
  3. Structured audit output. Emit JSON with links, key "clues," and a single-token YES/NO verdict — clean telemetry for dashboards and tuning.

We used a cheaper, faster model for this step: the classification doesn't require heavy inference, and we prioritized response speed.

Sample Prompt
[[SYSTEM]]
# ROLE
You are an AI assistant that ...

# TASK
Evaluate whether the user's input is a question that needs software-engineering or DevOps expertise.

# EXCLUSION CATEGORIES
- Task assignment
- Team logistics
- Non-technical matters
- Opinion polls
- Action requests

# INSTRUCTIONS

1. Clue Identification:
- Identify up to 5 keywords in the user's input that strongly indicate a software engineering or DevOps context.

2. Relevance Determination:
- Evaluate whether the question falls into any of the EXCLUSION CATEGORIES. Set `isEngineeringQuestion` to NO if it does.

3. Answer: Provide a YES or NO answer on whether the question is relevant and answerable.

# JSON OUTPUT FORMAT
{
"clues": ["clue1", ...],
"isEngineeringQuestion": "<YES|NO>"
}
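The single-token verdict is easy to consume downstream, but model output should still be parsed defensively. A sketch, assuming the model returns the JSON format above; any malformed output is treated as NO so the bot fails closed:

```python
import json

def parse_classifier_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse the classifier's JSON verdict into (is_engineering, clues).

    Malformed or unexpected output maps to (False, []) — i.e. the bot
    stays silent rather than answering on a garbled verdict.
    """
    try:
        data = json.loads(raw)
        is_eng = str(data.get("isEngineeringQuestion", "NO")).upper() == "YES"
        clues = [str(c) for c in data.get("clues", [])]
        return is_eng, clues
    except (json.JSONDecodeError, AttributeError):
        return False, []
```

The clues double as the "clean telemetry" mentioned above: logging them per message makes it cheap to see which keywords are driving YES verdicts.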


Step 3 – Draft the answer

Questions that pass the gate flow into our standard answer-completion pipeline, which:

  1. Retrieves the most relevant docs and code snippets.
  2. Generates a concise reply.
  3. Embeds citations so users can verify every claim.
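The citation step is the part worth sketching: every claim in the reply carries a numbered marker that resolves to a source link. A minimal illustration (the retrieval and generation layers are assumed to exist upstream):

```python
def format_with_citations(answer: str, sources: list[dict]) -> str:
    """Append a numbered source list so each [n] marker in the answer
    resolves to a verifiable link. `sources` is assumed to be a list of
    {"title": ..., "url": ...} dicts from the retrieval layer."""
    lines = [answer, ""]
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src['title']}: {src['url']}")
    return "\n".join(lines)
```

Keeping citations structural rather than free-text means the quality gate in the next step can also check that every marker has a matching source.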


Step 4 – Validate the answer

This step is our double-edged sword: a great answer earns big trust, but a bad one hurts twice as much. So we let a slower, pricier model run these checks. Speed takes a back seat to accuracy here.

Sample Prompt
[[SYSTEM]]
You are an AI assistant that vets answers for software-engineering questions.

# EVALUATION CRITERIA
Explicit • Helpful • Direct • Accurate • Actionable

# EXCLUSION LIST
"However, without specific details…"
"It's hard to confirm…"
"If you need more specific guidance…"

# EVAL STEPS (summary)
0. Rephrase question
1. Check excluded phrases
2. Identify key parts
3. Score clues (SATISFIED / UNSATISFIED)
4. Directness analysis
5. Faithfulness to docs
6. Concise reasoning
9. FINAL: YES or NO (JSON)
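The exclusion-list check in step 1 of the eval doesn't actually need a model: hedging phrases can veto an answer with a plain string scan before the slower LLM pass runs. A sketch using the phrases from the prompt above:

```python
# Hedging phrases from the quality gate's exclusion list. An answer that
# leans on any of these is rejected outright — no LLM call needed.
HEDGING_PHRASES = [
    "however, without specific details",
    "it's hard to confirm",
    "if you need more specific guidance",
]

def passes_phrase_gate(answer: str) -> bool:
    """Return False if the answer contains a known hedging phrase."""
    lowered = answer.lower()
    return not any(phrase in lowered for phrase in HEDGING_PHRASES)
```

Running this cheap check first means the expensive second-pass model only ever sees answers that at least sound confident.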


Step 5 – Post the vetted answer in the thread

Once the generated response passes the final check, the bot politely responds in the thread. We also include feedback buttons labeled "Helpful" and "Unhelpful" to measure the response's usefulness to users.
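In Slack terms, that reply is a Block Kit payload passed to `chat.postMessage` with the original message's `thread_ts`, so the bot answers in-thread rather than in-channel. A sketch of the payload builder; the `action_id` names are illustrative, not Unblocked's actual identifiers:

```python
def build_reply_blocks(answer_text: str) -> list[dict]:
    """Build a Slack Block Kit payload: the answer as a section block,
    followed by Helpful/Unhelpful feedback buttons."""
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": answer_text}},
        {
            "type": "actions",
            "elements": [
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "Helpful"},
                    "action_id": "feedback_helpful",
                },
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "Unhelpful"},
                    "action_id": "feedback_unhelpful",
                },
            ],
        },
    ]
```

The button clicks arrive back as interaction events keyed by `action_id`, which is what feeds the usefulness metrics.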


Food for thought

If you've read this far, you're probably already asking the next hard questions. Those deserve their own deep dives. Stay tuned!