The previous post wired Claude Code to the Aspire dashboard’s MCP server. Logs, traces, resource state — all queryable from the editor.

That works only if the running system has something useful to say.

Most .NET apps emit a lot at runtime. HTTP spans, EF Core spans, HttpClient spans, a wall of ILogger output. None of it answers the questions you actually have during a bad afternoon: which tenant is this? how big was the batch? did we deduplicate before we wrote? where exactly did this fail?

The information is missing because nobody put it there.

This post is the third in the small Aspire arc. It is about producing telemetry that is worth querying — with Claude Code as the instrument that adds the right ActivitySource, picks defensible attributes, and reviews the sampling configuration before it hits production.


Prerequisites

You need:

  • a .NET 8 (or later) application — Aspire-hosted is convenient but not required
  • the OpenTelemetry .NET SDK (Aspire wires it up via ServiceDefaults)
  • an OTLP-compatible exporter — the Aspire dashboard locally, Application Insights / Seq / Jaeger / Tempo in production
  • Claude Code

Everything below works without Aspire. I am sticking with the Aspire example for continuity with the previous two posts, and because the dashboard makes the result visible without extra plumbing.


What you get for free

Aspire’s ServiceDefaults project enables OpenTelemetry by default. The AddOpenTelemetry call usually looks something like this:

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               .AddRuntimeInstrumentation();
    })
    .WithTracing(tracing =>
    {
        tracing.AddAspNetCoreInstrumentation()
               .AddHttpClientInstrumentation()
               .AddEntityFrameworkCoreInstrumentation();
    });

builder.Services.Configure<OpenTelemetryLoggerOptions>(o =>
{
    o.IncludeFormattedMessage = true;
    o.IncludeScopes = true;
});

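// AddOpenTelemetry returns the same builder on repeated calls; this one
// just attaches the OTLP exporter to all three signals.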
builder.Services.AddOpenTelemetry()
    .UseOtlpExporter();

For Grouply — the multi-tenant fleet platform from the earlier posts — that gives me, out of the box:

  • a span per inbound HTTP request to apiservice
  • a span per outbound HttpClient call
  • a span per EF Core query
  • runtime metrics (GC pauses, heap size, thread pool)
  • log records that carry the active trace and span id

For a POST /imports request, the dashboard then shows the HTTP span as a parent, with EF Core spans for each query and HttpClient spans for each enrichment call.

Useful, but flat.

The actual workflow — validate the batch, deduplicate VINs, enrich from the external registration API, persist, publish to Service Bus, audit — is collapsed into the single inbound HTTP span. The shape of your domain is invisible. From the dashboard you cannot tell which tenant ran the import, how big the batch was, how many vehicles were rejected, or whether the slow part was deduplication or enrichment.

That is the gap custom instrumentation closes.


Custom spans for business operations

The Grouply case: VehicleImportService.ImportBatchAsync(tenantId, batch). Validates the batch, deduplicates by VIN, enriches from an external registration API, writes to Postgres, and publishes one event per accepted vehicle to the vehicles Service Bus topic.

Without instrumentation that workflow is one span, regardless of how complex it gets.

The minimum viable change is one ActivitySource per service and one StartActivity per operation worth naming:

using System.Diagnostics;

public sealed class VehicleImportService
{
    private static readonly ActivitySource ActivitySource =
        new("Grouply.VehicleImport");

    private readonly IVehicleRepository _repository;
    private readonly IRegistrationLookup _lookup;
    private readonly IVehicleEventPublisher _publisher;

    public VehicleImportService(
        IVehicleRepository repository,
        IRegistrationLookup lookup,
        IVehicleEventPublisher publisher)
    {
        _repository = repository;
        _lookup = lookup;
        _publisher = publisher;
    }

    public async Task<ImportResult> ImportBatchAsync(
        Guid tenantId,
        IReadOnlyList<VehicleDto> batch,
        CancellationToken ct)
    {
        using var activity = ActivitySource.StartActivity("ImportBatch");
        activity?.SetTag("tenant.id", tenantId);
        activity?.SetTag("import.batch_size", batch.Count);

        var deduplicated = await DeduplicateAsync(batch, ct);
        var enriched = await EnrichAsync(deduplicated, ct);
        var persisted = await PersistAsync(tenantId, enriched, ct);
        await PublishAsync(persisted, ct);

        activity?.SetTag("import.accepted_count", persisted.Count);
        activity?.SetTag("import.rejected_count",
            batch.Count - persisted.Count);

        return new ImportResult(persisted.Count, batch.Count - persisted.Count);
    }

    // DeduplicateAsync, EnrichAsync, PersistAsync, and PublishAsync
    // follow the same pattern; one child activity each, shown below.
}

For each inner step (DeduplicateAsync, EnrichAsync, PersistAsync, PublishAsync) you do the same thing: open a child activity, set the small handful of tags that matter for that step, let it close on dispose.
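
For illustration, the Deduplicate step might look like this. FilterKnownVinsAsync is a hypothetical repository method; the pattern is what matters:

private async Task<IReadOnlyList<VehicleDto>> DeduplicateAsync(
    IReadOnlyList<VehicleDto> batch, CancellationToken ct)
{
    // Becomes a child of ImportBatch automatically, because that
    // activity is Activity.Current when this one starts.
    using var activity = ActivitySource.StartActivity("Deduplicate");

    var deduplicated = await _repository.FilterKnownVinsAsync(batch, ct);

    activity?.SetTag("import.duplicate_count",
        batch.Count - deduplicated.Count);
    return deduplicated;
}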

Register the source so the SDK actually exports it:

tracing.AddSource("Grouply.VehicleImport");

The dashboard now shows a span tree like this:

POST /imports                                  [apiservice]
└── ImportBatch    tenant.id=…, batch_size=120
    ├── Deduplicate                duplicates=4
    ├── Enrich                     external_calls=116
    ├── Persist                    accepted=114
    └── Publish                    events=114

Same workflow, same code paths — readable instead of opaque.

I rarely write this from scratch anymore. I ask Claude Code:

Add OpenTelemetry spans to VehicleImportService.ImportBatchAsync and its private helpers. Use a single static ActivitySource named Grouply.VehicleImport. Set tags for tenant id, batch size, and per-step counts. Mark the activity as failed when an exception escapes. Follow the OpenTelemetry semantic conventions for naming.

The reasons for letting Claude Code do this rather than typing it out:

  • it remembers the null-conditional activity?. pattern (StartActivity returns null when nothing is listening to the source)
  • it gets the package reference right (System.Diagnostics.DiagnosticSource)
  • it produces consistent tag names across services in the same solution
  • it picks up existing semantic-convention names where they exist instead of inventing new ones

What it cannot decide for you: which operations deserve a span. That is a domain decision. As a rule of thumb, instrument anything that has its own retry, timeout, error category, or SLA. A loop iteration almost never deserves its own span. A cross-service call almost always does.


Attributes that aren’t noise

Tags are how spans become searchable. They are also how telemetry bills explode and how PII leaks into observability storage.

Three dimensions to weigh before adding a tag:

Cardinality. A tag’s cost grows with the number of distinct values it can take. tenant.id has hundreds of values across the fleet — fine. vehicle.vin has millions — every span becomes unique, aggregation breaks, the backend complains. Put high-cardinality identifiers in span events or structured logs, not as span tags.

Cost. Most managed backends bill by ingested volume. Application Insights, Datadog, New Relic — all of them charge for what you send. A tag you set on every span in a 50-RPS service is millions of writes per day.

PII. Anything you put on a span gets exported, indexed, and retained for as long as the backend keeps telemetry. Emails, names, full addresses, raw request bodies — none of it belongs there.
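
There is an escape hatch for the cardinality case. A span event can carry the identifier on the one span where it matters without becoming an indexed dimension everywhere. A sketch; the event name, the rejection.reason value, and the vehicle variable are illustrative choices:

// Attach the VIN to the current span as an event, not a tag.
// Events travel with the trace; most backends do not index their
// attributes as aggregation dimensions.
activity?.AddEvent(new ActivityEvent(
    "vehicle.rejected",
    tags: new ActivityTagsCollection
    {
        ["vehicle.vin"] = vehicle.Vin,
        ["rejection.reason"] = "duplicate"
    }));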

For the Grouply import flow, the rough split.

Worth a tag:

  • tenant.id — bounded cardinality, useful for filtering, not PII on its own
  • import.batch_size, import.accepted_count, import.rejected_count — small integer ranges, useful aggregations
  • import.source (api, csv, partner_feed) — single-digit cardinality, very useful
  • retry.count — bounded, drives alerting

Not as a tag:

  • vehicle.vin — millions of distinct values, kills aggregation
  • user.email — PII, never as a span tag
  • request.body — PII risk plus exporter cost
  • connection_string, anything ending in secret — obvious, but worth saying

This is the kind of review Claude Code is well-suited for. After it adds spans I usually ask:

Review the span attributes you just added against three criteria: cardinality (no per-record identifiers), cost (no large strings), and PII (no emails, names, addresses, raw payloads). Flag any tag that fails one of these and suggest a safer alternative.

It will sometimes push back on its own earlier choices, which is exactly what you want.


Correlating traces back to logs

The OpenTelemetry logging bridge connects ILogger output to the active span. With the bridge configured (Aspire’s ServiceDefaults already does this), every log record carries the current trace id and span id automatically:

builder.Logging.AddOpenTelemetry(o =>
{
    o.IncludeFormattedMessage = true;
    o.IncludeScopes = true;
    o.ParseStateValues = true;
});

That single hook is what makes the previous post’s MCP queries useful. Without it, asking Claude Code

Show me errors from apiservice in the last 10 minutes.

returns a list of strings. With it, every entry in that list has a trace id, and follow-up questions like

Pull the full trace for the most recent failed import, including the log entries attached to each span.

return a span tree with the structured logs at the right nodes.

This is where the trilogy clicks shut. Post 1 made the AppHost readable. Post 2 made the runtime queryable. This post puts the signal in the right place so the queries from post 2 actually return something useful.

A small habit pays off here: log at the start and end of each instrumented operation, with the structured fields you would also put on the span. The log message stays human-readable, the span carries the same fields as tags, and trace correlation does the rest.

using var activity = ActivitySource.StartActivity("ImportBatch");
activity?.SetTag("tenant.id", tenantId);
activity?.SetTag("import.batch_size", batch.Count);

_logger.LogInformation(
    "Starting import for tenant {TenantId} with {BatchSize} vehicles",
    tenantId, batch.Count);

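And at the end of the operation, the mirror image, with the same fields and the final counts (persisted here comes from the surrounding method):

_logger.LogInformation(
    "Finished import for tenant {TenantId}: {Accepted} accepted, {Rejected} rejected",
    tenantId, persisted.Count, batch.Count - persisted.Count);
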
The redundancy is intentional. The span is for the trace timeline, the log is for the textual story. Both are linked by the trace id.


Sampling — when 100% kills your bill

In local development, every span gets exported. Aspire’s dashboard handles it without complaint and you want full fidelity while you debug.

In production that does not survive contact with your backend’s billing page.

A modest service handling 50 requests per second, with five custom spans per request, produces around 22 million spans per day (50 × 5 × 86,400 ≈ 21.6 million). At Application Insights pricing that is a meaningful number on the invoice. Multiply by the number of services and the number of environments and the picture sharpens fast.

Two strategies, both supported by OpenTelemetry:

Head-based sampling decides at the start of a trace whether to keep it. Cheap, predictable, simple to configure. The downside is that you make the keep/drop decision before you know whether the trace contained errors.

tracing.SetSampler(new ParentBasedSampler(
    new TraceIdRatioBasedSampler(0.1)));

That keeps 10% of traces, all-or-nothing per trace, and respects the parent’s sampling decision when one exists. Good default. Bad fit when errors are rare: the keep/drop decision is made before the outcome is known, so nine out of ten error traces are dropped along with the healthy ones.
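
Since local development wants full fidelity and production does not, it is worth reading the ratio from configuration instead of hard-coding it. A sketch; the Telemetry:SamplingRatio key is my naming, not an Aspire or OpenTelemetry convention:

// Defaults to 1.0 (keep everything) when the key is absent, i.e. locally.
var ratio = builder.Configuration.GetValue("Telemetry:SamplingRatio", 1.0);

tracing.SetSampler(new ParentBasedSampler(
    new TraceIdRatioBasedSampler(ratio)));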

Tail-based sampling decides after the trace is complete. Errors and slow traces get kept, healthy fast traces get sampled down. It runs in the OpenTelemetry Collector (or a vendor equivalent) rather than in-process.

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow
        type: latency
        latency: { threshold_ms: 2000 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }

Errors always kept. Slow traces always kept. 5% of the rest as baseline. The collector buffers traces in memory until it can decide, which is why decision_wait matters.

Picking a strategy is a judgment call. Service shape, error rate, budget, and how aggressively you want to investigate latency outliers all feed into it. Claude Code is useful here as a reviewer, not a decider:

Review my sampling configuration. The service handles roughly 50 requests per second with a 0.5% error rate. I want to keep all errors and any trace slower than 2 seconds, plus a small baseline. Tell me what this configuration will and will not catch, and what I should monitor after rolling it out.

Treat the answer as a starting point. Real traffic shape always surprises you.


Where this falls short

OpenTelemetry is moving fast, and the fast-moving parts can hurt.

Semantic conventions shift. HTTP attribute names changed in the last few releases — http.method to http.request.method, http.status_code to http.response.status_code. Dashboards built on the old names quietly stopped matching. If you pin to one version of the convention, document it.

SDK churn. OpenTelemetry’s .NET packages release frequently. Mixing minor versions across OpenTelemetry, OpenTelemetry.Extensions.Hosting, and the various OpenTelemetry.Instrumentation.* packages is a recipe for runtime surprises. Keep them aligned and update them together.

Auto-instrumentation can be too eager. EF Core instrumentation will emit a span for every query, including the chatty ones from health checks and identity middleware. HttpClient instrumentation will emit a span for every retry inside a Polly policy. The right answer is usually trimming or filtering — tracing.AddEntityFrameworkCoreInstrumentation(o => o.SetDbStatementForText = false) to keep SQL text out of span payloads, a request filter on the inbound side, or a custom processor that drops noisy spans — not removing the instrumentation.
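
The filter variant, for the health-probe case: AddAspNetCoreInstrumentation exposes a Filter callback, and returning false suppresses the span. The /health and /alive paths below match what the default Aspire templates map; adjust them to your own probes:

tracing.AddAspNetCoreInstrumentation(o =>
{
    // Health probes dominate span volume in quiet services and
    // never carry useful signal; drop them at the source.
    o.Filter = httpContext =>
        !httpContext.Request.Path.StartsWithSegments("/health") &&
        !httpContext.Request.Path.StartsWithSegments("/alive");
});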

Cost is real. Application Insights bills per GB ingested. A naive 100% sampling configuration on a busy service can burn a four-figure monthly budget without anyone noticing until the invoice arrives.

Claude Code cannot predict your traffic shape. It will give you a sensible default sampling configuration, but it does not know your actual RPS, error rate, latency distribution, or fairness rules across tenants. The first week after rolling out custom instrumentation needs a human watching dashboards.

Attribute selection stays a judgment call. Cardinality bombs and obvious PII are easy to flag. Whether vehicle.year belongs on every span, whether tenant.tier should be on a span or only on a metric — those are domain decisions. Ask, do not delegate.


What this completes

The Aspire arc has three pieces.

The first post made the architecture readable. Claude Code reads the AppHost, follows the dependency graph, and explains what is wired to what.

The second post made the runtime queryable. The Aspire dashboard’s MCP server lets Claude Code ask the running system what is happening — logs, traces, resource state, restart commands.

This post puts the signal in your code. Custom spans for business operations, attributes that are useful and safe, sampling that survives production, and logs correlated to traces. Without this layer, the second post returns lists of HTTP spans and walls of unstructured log lines. With it, every question from the second post lands on telemetry that has something to say.

If you want a concrete next step, pick one service. Add an ActivitySource. Instrument one workflow with Claude Code:

Add OpenTelemetry spans to this service for the import workflow. Use a single static ActivitySource. Set tags for tenant id, batch size, and per-step counts. Then review the attributes for cardinality, cost, and PII risk.

Then start the AppHost, run the workflow, and go back to the MCP server from the previous post:

Show me the most recent trace for an import on tenant acme, including the structured logs attached to each span.

The first time the answer comes back as a clean span tree with the right tags and the right log entries on the right spans, the loop closes.

The blueprint is in the AppHost.
The behaviour is in the dashboard.
The signal is in your code — and now it is worth reading.