How I made my AI Slack agent debuggable with Honeycomb traces

If your AI Slack bot hides tool calls, retries, and failures behind one final reply, Honeycomb traces make the hidden middle visible.

River. Viktor. Stilla. PailFlow. From DIY bots to venture-funded startups, AI coworkers that live in Slack are having a moment.
Most Slack agents have the same basic shape. A person sends a Slack message. Slack sends an event to your app. Your app kicks off an agent, often inside a sandbox. The agent runs tools, writes output, and your app posts back into Slack.
When I use an agent in the terminal, I can see the mess: tool calls, permission prompts, stalls, and weird little "I can't do that" moments. When the same kind of agent runs through Slack, most of that disappears behind a typing indicator and one final message.
If you're building an AI tool that works this way, you're going to need traces. This is the setup I used to send mine to Honeycomb.

What is a trace and how do you determine your trace shape?

A trace is the story of one request as it moves through your system. For a Slack agent, that means one Slack message turning into one agent run and one Slack reply.
Your trace shape depends on your architecture. PailFlow has a chat gateway that pings an E2B sandbox, loads the right skills and templates, and runs the agent with OpenCode.
For PailFlow, I put the telemetry in the chat gateway because that layer sees the whole workflow. It sees the Slack prompt, the sandbox lifecycle, the OpenCode event stream, stdout and stderr, retry behavior, run status, and Slack delivery status.
That gateway creates one trace for one run. The trace is made of spans. A span is one timed step inside the run, like preparing the sandbox, starting OpenCode, watching a tool call, or sending the final Slack reply.
For my bot, the trace includes spans like:
  • pailflow.run.execute
  • pailflow.e2b.sandbox.prepare
  • pailflow.opencode.command
  • invoke_agent pailflow.standard_opencode
The first few spans explain the backend path. The invoke_agent span explains the agent session. Inside that span, PailFlow attaches events for the user message, assistant responses, tool calls, retries, and run summary.
That gives me the infrastructure path and the agent timeline in the same trace.

How to add Honeycomb traces step by step

1. Send OpenTelemetry traces to Honeycomb

OpenTelemetry is the vendor-neutral standard for creating and exporting trace data. Honeycomb is where I review that data. In PailFlow, the gateway sends traces to Honeycomb when HONEYCOMB_API_KEY is present.
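As a rough sketch of that "only when the key is present" behavior, the function below builds the standard OTLP exporter environment settings from HONEYCOMB_API_KEY. This is not PailFlow's actual code: the function name otlp_settings and the service name pailflow-gateway are assumptions made for illustration. The endpoint and the x-honeycomb-team header are what Honeycomb documents for OTLP ingest.

```python
import os

HONEYCOMB_ENDPOINT = "https://api.honeycomb.io"


def otlp_settings(env=None):
    """Return OTLP exporter settings, or None when tracing should stay off."""
    env = os.environ if env is None else env
    api_key = env.get("HONEYCOMB_API_KEY", "").strip()
    if not api_key:
        # No key: the gateway keeps running, it just does not export traces.
        return None
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": env.get(
            "OTEL_EXPORTER_OTLP_ENDPOINT", HONEYCOMB_ENDPOINT
        ),
        # Honeycomb authenticates OTLP requests with this header.
        "OTEL_EXPORTER_OTLP_HEADERS": f"x-honeycomb-team={api_key}",
        # Hypothetical service name; pick whatever identifies your gateway.
        "OTEL_SERVICE_NAME": env.get("OTEL_SERVICE_NAME", "pailflow-gateway"),
    }
```

Returning None instead of raising keeps local development working without a Honeycomb account. The gateway can check the result once at startup and skip exporter setup.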

2. Create one top-level span for the agent run

For PailFlow, the top-level run span is pailflow.run.execute. This is the span I want to find when someone says, "the bot got stuck" or "Slack never got a good answer."

3. Add spans for the big infrastructure steps

These are the spans that tell me whether the basic plumbing worked:
  • pailflow.e2b.sandbox.prepare
  • pailflow.opencode.command
They answer questions like: Did the sandbox start? Did OpenCode run? How long did each step take? Did the failure happen before the agent really got going?
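To show how the run span and the infrastructure spans nest, here is a minimal stdlib-only stand-in for a tracer. It is a sketch, not the gateway's real implementation: RunTrace, execute_run, and the two callables are hypothetical names. In a real gateway the same nesting would come from an OpenTelemetry tracer rather than this recorder.

```python
import time
import uuid
from contextlib import contextmanager


class RunTrace:
    """Collects timed spans that share one trace id (stand-in for a tracer)."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []
        self._stack = []  # names of currently open spans, innermost last

    @contextmanager
    def span(self, name, attributes=None):
        record = {
            "name": name,
            "parent": self._stack[-1] if self._stack else None,
            "attributes": dict(attributes or {}),
            "status": "ok",
        }
        self.spans.append(record)
        self._stack.append(name)
        started = time.perf_counter()
        try:
            yield record
        except Exception as exc:
            # Mark the failing step, then let the error reach the parent span.
            record["status"] = "error"
            record["attributes"]["error.message"] = str(exc)
            raise
        finally:
            record["duration_ms"] = (time.perf_counter() - started) * 1000
            self._stack.pop()


def execute_run(trace, run_id, prepare_sandbox, run_opencode):
    """Wrap one agent run in a top-level span with two child spans."""
    with trace.span("pailflow.run.execute", {"app.run_id": run_id}):
        with trace.span("pailflow.e2b.sandbox.prepare"):
            sandbox = prepare_sandbox()
        with trace.span("pailflow.opencode.command"):
            return run_opencode(sandbox)
```

If run_opencode raises, the command span and the run span are both marked as errors while the sandbox span stays ok. That is the "did the failure happen before the agent really got going?" signal.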

4. Add an agent invocation span for Honeycomb Agent Timeline

One thing I missed at first: adding traces is not the same thing as getting an agent timeline.
You can emit a bunch of spans and Honeycomb will show you a normal backend trace. That is useful, but it still reads like infrastructure: Slack event received, sandbox started, command ran, response sent. You can see what services ran and how long they took, but you do not automatically get a readable view of what the agent did.
For that, you need Honeycomb's Agent Timeline.
The way I understand it, Agent Timeline works when Honeycomb can recognize an agent invocation span, then see the agent's activity as events inside that span. So instead of showing only backend work, the trace can show the agent session: the user prompt, assistant messages, tool calls, tool results, retries, and final output.
A normal trace might look like:
slack.event.received -> sandbox.prepare -> opencode.command -> slack.reply.sent
An agent timeline is more like:
invoke_agent -> user_message -> assistant_response -> tool_call -> tool_result -> assistant_response -> run_summary
That distinction matters. Honeycomb can automatically show normal service spans, but it will not magically know what happened inside your agent. You have to emit the agent-shaped events yourself.
As far as I understand it, "Agent Timeline" is Honeycomb's own product view for this. Other observability tools may support GenAI tracing, span events, or custom trace views, but Agent Timeline is the feature I was targeting.
PailFlow emits a Honeycomb Agent Timeline-compatible span:
invoke_agent pailflow.standard_opencode
That span gets attributes Honeycomb can use to group and display the run:
  • gen_ai.conversation.id
  • gen_ai.agent.name
  • gen_ai.operation.name
  • gen_ai.request.model
  • app.run_id
This is the span that helps Honeycomb show the agent session as a timeline instead of a pile of unrelated events.
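A small helper can keep the span name and those attributes consistent. The sketch below assumes the "invoke_agent" operation name and the "invoke_agent {agent name}" span name, which match my reading of the OpenTelemetry GenAI semantic conventions and the span name shown above. The function name, the choice of a Slack thread identifier as the conversation id, and the model value in the usage example are all hypothetical.

```python
def invoke_agent_span(agent_name, conversation_id, model, run_id):
    """Return (span_name, attributes) for the agent invocation span."""
    attributes = {
        "gen_ai.operation.name": "invoke_agent",
        "gen_ai.agent.name": agent_name,
        # Assumption: one Slack thread maps to one conversation.
        "gen_ai.conversation.id": conversation_id,
        "gen_ai.request.model": model,
        "app.run_id": run_id,
    }
    # Fail loudly: a span missing any of these will not group as expected.
    missing = [key for key, value in attributes.items() if not value]
    if missing:
        raise ValueError(f"missing agent span attributes: {', '.join(missing)}")
    return f"invoke_agent {agent_name}", attributes
```

For example, invoke_agent_span("pailflow.standard_opencode", "C123:1700000000.000100", "example-model", "run-1") returns the span name invoke_agent pailflow.standard_opencode plus the five attributes, ready to pass to whatever tracer the gateway uses.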

5. Attach the agent's activity as events

The invocation span is only a container. Honeycomb cannot know what your agent did inside it unless you tell it, so the gateway sends the moments I actually care about: the user message, the tool call, the agent response, and the run summary.
For PailFlow, the gateway parses the structured output of opencode run --format json and emits events such as:
  • opencode.user_message
  • opencode.tool_call
  • opencode.agent_response
  • opencode.run_summary
  • opencode.parser_skipped
That makes the hidden middle visible: the prompt, assistant-visible text, tool names, tool status, token and cost data, retry markers, and failure summaries.
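The parsing step might look roughly like the sketch below. It assumes one JSON object per output line with a "type" field, which is a guess at the shape and not OpenCode's documented schema. The type values in EVENT_NAMES and the function name to_span_events are hypothetical, so adjust them to the real output. Only the event names come from the list above.

```python
import json

# Hypothetical mapping from a "type" field to the event names listed above.
EVENT_NAMES = {
    "user": "opencode.user_message",
    "tool": "opencode.tool_call",
    "assistant": "opencode.agent_response",
    "summary": "opencode.run_summary",
}


def to_span_events(stdout_lines):
    """Turn JSON output lines into (event_name, attributes) pairs."""
    events = []
    for line_number, line in enumerate(stdout_lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            payload = None
        kind = payload.get("type") if isinstance(payload, dict) else None
        name = EVENT_NAMES.get(kind) if isinstance(kind, str) else None
        if name is None:
            # Record that something was skipped without copying its content.
            events.append(
                ("opencode.parser_skipped",
                 {"line_number": line_number, "length": len(line)})
            )
            continue
        # Keep only flat scalar fields; span attributes cannot hold nested objects.
        attributes = {
            key: value
            for key, value in payload.items()
            if key != "type" and isinstance(value, (str, int, float, bool))
        }
        events.append((name, attributes))
    return events
```

Emitting opencode.parser_skipped instead of silently dropping a line means a change in the output format shows up in the trace as a gap you can see, not a gap you have to guess at.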

6. Verify with a real failed run

After adding traces, trigger a run from Slack, ideally one you expect to fail, and inspect the trace in Honeycomb. You should be able to answer:
  • Did Slack reach the gateway?
  • Was the run created?
  • Did E2B prepare a sandbox?
  • Did OpenCode start?
  • What prompt did the agent receive?
  • Which tools did it call?
  • What assistant-visible text did it produce?
  • Did it retry?
  • Did the run fail or complete?
  • Did the Slack reply send?
  • Is anything sensitive leaking?
If the trace cannot answer those questions, add another span or event at the layer that can see the missing context.
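Part of that checklist can be turned into a quick check against the span and event names found in a trace. The sketch below covers only the questions that map to names mentioned in this post. The mapping itself and the function name unanswered_questions are my own assumptions, and it takes plain lists of names, however you export them from Honeycomb.

```python
# Checklist questions mapped to the span or event that would answer them.
REQUIRED_SPANS = {
    "Was the run created?": "pailflow.run.execute",
    "Did E2B prepare a sandbox?": "pailflow.e2b.sandbox.prepare",
    "Did OpenCode start?": "pailflow.opencode.command",
}
REQUIRED_EVENTS = {
    "What prompt did the agent receive?": "opencode.user_message",
    "What assistant-visible text did it produce?": "opencode.agent_response",
    "Did the run fail or complete?": "opencode.run_summary",
}


def unanswered_questions(span_names, event_names):
    """Return the checklist questions this trace cannot answer yet."""
    spans = set(span_names)
    events = set(event_names)
    gaps = [q for q, name in REQUIRED_SPANS.items() if name not in spans]
    gaps += [q for q, name in REQUIRED_EVENTS.items() if name not in events]
    return gaps
```

A non-empty result is not always a bug. A run that died during sandbox preparation will legitimately have no agent response, but the list still tells you where the trace goes quiet.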

Conclusion

If your AI bot runs inside Slack, traces are how you make the hidden middle visible.
Start by asking what you need to know when the bot responds badly. Then emit spans around those moments so one Slack request becomes one reviewable run in Honeycomb.

Written by

Lola

Lola is the founder of Lunch Pail Labs. She enjoys discussing product, app marketplaces, and running a business. Feel free to connect with her on Twitter or LinkedIn.