Recipe for Trace Resume with OpenZipkin

Until the official documentation is updated, let me help you with resuming a trace.

Let's say you need to resume, or temporarily switch to, an existing OpenZipkin trace in Java while the code is already in the scope of an active span. The code will look like this.

NOTE: This is just an example of how to resume a span. For more details, please check out the official documentation.

```java
// Assume that you already have a brave.Tracer called "tracer" and a brave.propagation.TraceContext object called "context".
final Span span = tracer.joinSpan(context);

// Set the scope of the trace to the resumed span.
try (Tracer.SpanInScope __ = tracer.withSpanInScope(span)) {
    final Span nextSpan = tracer.nextSpan().name("span-resumed-from-given-context");
    nextSpan.tag("presenter", "shiroyuki");

    nextSpan.start();

    // Set the scope to the next span.
    try (Tracer.SpanInScope nextScope = tracer.withSpanInScope(nextSpan)) {
        //////////////////////////
        //                      //
        // Doing something here //
        //                      //
        //////////////////////////
    } finally {
        nextSpan.finish();
    }
}
```

Unless you want to spend a lot of time going through the documentation, with plenty of trial and error (and tears) along the way, here is the explanation.

0. The context of this example

Suppose the code is currently running within its own span. Let's call it a span of “Trace A” (Trace A, Span Ax).

Suppose you also have an instance of brave.propagation.TraceContext, named context, for the span we want to resume. For this example, it does not matter whether it belongs to a new trace or an existing one; we will call it a span of “Trace B” (Trace B, Span Bi).

What does this code actually do?

From this point on, the code is annotated with comments in the pattern /* TC: (TraceId, SpanId) */, e.g., /* TC: (A, Ax) */ for (Trace A, Span Ax), showing the trace ID and span ID currently in the tracer's scope.

1. Join Trace B, Span Bi (span to resume)

First, we join the span based on the given trace context, as you can see in the first statement of the example:

```java
/* TC: (A, Ax) */ final Span span = tracer.joinSpan(context); /* TC: (A, Ax) */
```

Despite invoking tracer.joinSpan, the tracer's scope still belongs to Trace A.

2. Set the scope to the span to resume

Now that we have a reference to the span of Trace B, we need to set the scope of the tracer to that span.

```java
/* TC: (A, Ax) */ try (Tracer.SpanInScope __ = tracer.withSpanInScope(span)) {
    /* TC: (B, Bi) --> Span Bi resumed */
    // ...
```

We open the scope in a try-with-resources block, so it is closed automatically, restoring the previous scope, when the block exits.

Now, any code executed inside this try block runs in the scope of the resumed span of Trace B.
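To see why leaving the try block undoes the switch, here is a minimal toy model, assuming the tracer keeps a thread-local "current context" that a scope swaps when opened and restores when closed. ScopeRestoreDemo and its nested SpanInScope class are hypothetical stand-ins written for this explanation, not Brave's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not Brave's code): a scope swaps the thread-local
// "current context" when it is opened and restores the previous one on close().
final class ScopeRestoreDemo {
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Stand-in for Tracer.SpanInScope; close() throws no checked exception.
    static final class SpanInScope implements AutoCloseable {
        private final String previous;

        SpanInScope(String next) {
            this.previous = CURRENT.get(); // remember the outer scope
            CURRENT.set(next);             // switch to the new one
        }

        @Override
        public void close() {
            CURRENT.set(previous);         // restore the outer scope
        }
    }

    static List<String> run() {
        List<String> seen = new ArrayList<>();
        CURRENT.set("(A, Ax)");            // already inside Trace A
        seen.add(CURRENT.get());
        try (SpanInScope ignored = new SpanInScope("(B, Bi)")) {
            seen.add(CURRENT.get());       // resumed span in scope
        }
        seen.add(CURRENT.get());           // back to Trace A
        CURRENT.remove();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [(A, Ax), (B, Bi), (A, Ax)]
    }
}
```

Using try-with-resources means the restore happens even if the body throws, which is why the recipe never has to switch back to Trace A by hand.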

3. Create the next span (Trace B, Span Bj)

Now, let’s start a new span.

```java
/* TC: (B, Bi) */ final Span nextSpan = tracer.nextSpan().name("..."); /* TC: (B, Bi) */
```

Here, nextSpan represents Span Bj of Trace B, created as a child of Span Bi because Bi is the span currently in scope.

However, the tracer's scope is still on Span Bi.
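The relationship between Bi and Bj can be sketched with another toy model, assuming nextSpan() simply reads whatever context is in scope and returns a child of it without changing the scope. NextSpanDemo, Ctx, and the explicit "Bj" span ID are hypothetical; a real tracer generates its own IDs.

```java
// Toy model (hypothetical, not Brave's code): nextSpan() derives a child from
// whatever context is currently in scope, but does not change the scope.
final class NextSpanDemo {
    static final class Ctx {
        final String traceId;
        final String spanId;
        final String parentId; // null when unknown or for a root span

        Ctx(String traceId, String spanId, String parentId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
        }

        @Override
        public String toString() {
            return "(" + traceId + ", " + spanId + ", parent=" + parentId + ")";
        }
    }

    static final ThreadLocal<Ctx> CURRENT = new ThreadLocal<>();

    // The new span ID is passed in only to keep the example deterministic.
    static Ctx nextSpan(String newSpanId) {
        Ctx parent = CURRENT.get();
        if (parent == null) {
            // Nothing in scope: start a new trace (trace ID reuses the span ID here).
            return new Ctx(newSpanId, newSpanId, null);
        }
        return new Ctx(parent.traceId, newSpanId, parent.spanId);
    }

    static String run() {
        CURRENT.set(new Ctx("B", "Bi", null)); // resumed span Bi is in scope
        Ctx next = nextSpan("Bj");             // child of Bi, same trace
        String result = "next=" + next + " current=" + CURRENT.get();
        CURRENT.remove();
        return result;
    }

    public static void main(String[] args) {
        // prints next=(B, Bj, parent=Bi) current=(B, Bi, parent=null)
        System.out.println(run());
    }
}
```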

4. Start the span

At this point, the next span Bj has been created and tagged, but not started. What we have to do next is start it.

```java
/* TC: (B, Bi) */ nextSpan.start(); /* TC: (B, Bi) */
```

As you can see in the annotation, the scope is still on Span Bi even after invoking start. nextSpan.start() only records when the span starts; it does not change the scope. Conversely, tracer.withSpanInScope changes the scope but does not record a start time.

WARNING: If you skip straight to putting the next span (Bj) in scope without calling nextSpan.start(), Zipkin will not be able to visualize the timeline, because it cannot draw a span that is missing its start or finish time.

In this example, you will see that we don't start the resumed span (Trace B, Span Bi), simply because it is assumed to be managed by a different thread or process, outside the control of this example code.
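The separation between scope and timestamps can be illustrated with a toy span that stores its own start and finish times. This is a hypothetical model (StartVsScopeDemo, ToySpan, and the hand-picked microsecond values are made up for determinism) that mirrors the warning above rather than Zipkin's actual rendering logic: a span that was put in scope but never started has no start time, so no duration can be computed for it.

```java
import java.util.Arrays;

// Toy model (hypothetical, not Brave or Zipkin code): a span stores its own
// timestamps, and putting it "in scope" never touches them.
final class StartVsScopeDemo {
    static final class ToySpan {
        long startMicros;  // 0 means start() was never called
        long finishMicros; // 0 means finish() was never called

        void start(long nowMicros) { startMicros = nowMicros; }
        void finish(long nowMicros) { finishMicros = nowMicros; }

        // A timeline bar needs both ends; -1 means "cannot be drawn".
        long durationMicros() {
            return (startMicros == 0 || finishMicros == 0) ? -1 : finishMicros - startMicros;
        }
    }

    // Stand-in for "the span currently in scope".
    static final ThreadLocal<ToySpan> CURRENT = new ThreadLocal<>();

    static long[] run() {
        ToySpan skipped = new ToySpan();
        CURRENT.set(skipped);   // put in scope, but start() is skipped
        skipped.finish(2_000);
        CURRENT.remove();

        ToySpan started = new ToySpan();
        started.start(1_000);   // start() first, as in the recipe
        CURRENT.set(started);
        started.finish(2_000);
        CURRENT.remove();

        return new long[] {skipped.durationMicros(), started.durationMicros()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [-1, 1000]
    }
}
```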

5. Change the scope to the next span

As mentioned earlier, nextSpan.start() does not change the scope of the tracer. To do that, we need to use tracer.withSpanInScope again.

```java
/* TC: (B, Bi) */ try (Tracer.SpanInScope nextScope = tracer.withSpanInScope(nextSpan)) {
    /* TC: (B, Bj) --> Span Bj in scope */
    // ...
```

From this point, whatever you do inside the try block of nextScope happens in the scope of the new span.

6. Return to the origin (Trace A, Span Ax)

Once you are done with nextScope, make sure that the code invokes nextSpan.finish() to record when the span ends.

```java
    } finally {
        nextSpan.finish(); /* TC: (B, Bi) --> nextScope is already closed when finally runs */
    } /* TC: (B, Bi) --> Still on Trace B, Span Bi */
} /* TC: (A, Ax) --> Back to Trace A */
```

Likewise, we don't finish the resumed span (Trace B, Span Bi), because it is assumed to be managed by a different thread or process, outside the control of this example code.
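Which scope is active when nextSpan.finish() runs follows from a plain Java rule rather than anything Zipkin-specific: in a try-with-resources statement, the declared resources are closed before any catch or finally block executes. Assuming the scope restores the previous span when closed, nextScope has already been closed by the time the finally block calls nextSpan.finish(), so the tracer is back on (B, Bi). The self-contained snippet below (the class name is arbitrary) records the order of events without any tracing library.

```java
import java.util.ArrayList;
import java.util.List;

// Plain Java, no tracing library: logs the order in which a try-with-resources
// body, the resource's close(), and the finally block run.
final class CloseBeforeFinallyDemo {
    static List<String> run() {
        List<String> events = new ArrayList<>();
        // The lambda plays the role of nextScope; its close() just logs.
        try (AutoCloseable scope = () -> events.add("scope closed")) {
            events.add("body");
        } catch (Exception e) {
            // AutoCloseable.close() may throw a checked Exception; not expected here.
            throw new IllegalStateException(e);
        } finally {
            events.add("finally"); // where nextSpan.finish() sits in the recipe
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [body, scope closed, finally]
    }
}
```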

So, what do we learn here?

To resume or switch to a different trace context with Brave's Tracer while the code is in another trace context, we need to be aware of a few things:

  • Simply invoking Tracer::joinSpan does not mean the scope of the tracer is now switched away from the active one.
  • Similarly, simply invoking Tracer::nextSpan also does not mean the scope of the tracer is switched to the new one.
  • Simply invoking Tracer::withSpanInScope does not mean the start and finish timestamps of the said span are set.
  • Make sure that Span::start and Span::finish are called at the right time if you want to see the timeline visualized.
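To tie the points together, the whole recipe can be replayed against a toy tracer that logs which (trace, span) pair is in scope after each step. Everything here (RecipeReplay, Ctx, ToySpan, Scope, and the simplified joinSpan, nextSpan, and withSpanInScope) is a hypothetical model built to match the behaviour described in this post; it is not Brave and should not be read as Brave's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tracer (hypothetical, NOT Brave): just enough behaviour to replay the
// recipe and log which (trace, span) pair is in scope after each step.
final class RecipeReplay {
    static final class Ctx {
        final String traceId;
        final String spanId;
        Ctx(String traceId, String spanId) { this.traceId = traceId; this.spanId = spanId; }
        @Override public String toString() { return "(" + traceId + ", " + spanId + ")"; }
    }

    static final class ToySpan {
        final Ctx context;
        boolean started;
        boolean finished;
        ToySpan(Ctx context) { this.context = context; }
        void start() { started = true; }   // timestamps only, scope untouched
        void finish() { finished = true; }
    }

    // Like AutoCloseable, but close() throws no checked exception.
    interface Scope extends AutoCloseable {
        @Override void close();
    }

    static final ThreadLocal<Ctx> CURRENT = new ThreadLocal<>();

    // Returns a handle for the given context; the scope is left alone.
    static ToySpan joinSpan(Ctx context) { return new ToySpan(context); }

    // Child of whatever is in scope; the scope is left alone.
    static ToySpan nextSpan(String spanId) {
        return new ToySpan(new Ctx(CURRENT.get().traceId, spanId));
    }

    // Switches the scope now and restores the previous one on close().
    static Scope withSpanInScope(ToySpan span) {
        Ctx previous = CURRENT.get();
        CURRENT.set(span.context);
        return () -> CURRENT.set(previous);
    }

    static List<String> replay() {
        List<String> log = new ArrayList<>();
        CURRENT.set(new Ctx("A", "Ax"));
        ToySpan span = joinSpan(new Ctx("B", "Bi"));
        log.add("after joinSpan: " + CURRENT.get());
        try (Scope outer = withSpanInScope(span)) {
            ToySpan next = nextSpan("Bj");
            log.add("after nextSpan: " + CURRENT.get());
            next.start();
            log.add("after start: " + CURRENT.get());
            try (Scope inner = withSpanInScope(next)) {
                log.add("inside inner scope: " + CURRENT.get());
            } finally {
                next.finish();
                log.add("at finish: " + CURRENT.get());
            }
            log.add("Bj started/finished: " + next.started + "/" + next.finished);
        }
        log.add("after outer scope: " + CURRENT.get());
        CURRENT.remove();
        return log;
    }

    public static void main(String[] args) {
        replay().forEach(System.out::println);
    }
}
```

Running it prints the same sequence as the TC annotations: (A, Ax) after joinSpan, (B, Bi) after nextSpan and start, (B, Bj) inside the inner scope, (B, Bi) at finish, and (A, Ax) once the outer scope closes.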

If it is confusing or there are errors, please ping me on Twitter. Thanks for reading.