Instrumentation

Instrumentation for OpenTelemetry Erlang/Elixir

You are viewing the English version of this page because it has not yet been fully translated. Interested in helping out? See Contributing.

插桩是指向应用中添加可观测性代码的行为。

如果你正在对一个应用进行插桩,需要使用适合你语言的 OpenTelemetry SDK。然后,你可以使用 SDK 初始化 OpenTelemetry,并使用 API 对代码进行插桩。这将从你的应用及其安装的任何带有插桩的库中导出遥测数据。

如果你正在对一个库进行插桩,只需安装适合你语言的 OpenTelemetry API 包。你的库不会自行导出遥测数据。 只有当该库作为使用 OpenTelemetry SDK 的应用的一部分时,它才会导出遥测数据。 有关如何对库进行插桩的更多信息,请参见插桩库

有关 OpenTelemetry API 和 SDK 的更多信息,请参见规范

Setup

Add the following dependencies to your project:

  • opentelemetry_api: contains the interfaces you’ll use to instrument your code. Things like Tracer.with_span and Tracer.set_attribute are defined here.
  • opentelemetry: contains the SDK that implements the interfaces defined in the API. Without it, all the functions in the API are no-ops.
# mix.exs
def deps do
  [
    {:opentelemetry, "~> 1.3"},
    {:opentelemetry_api, "~> 1.2"},
  ]
end

Traces

Initialize Tracing

To start tracing a TracerProvider is required for creating a Tracer. When the OpenTelemetry SDK Application (opentelemetry) boots, it starts and configures a global TracerProvider. A Tracer for each loaded OTP Application is created once the TracerProvider has started.

If a TracerProvider is not successfully created (for example, the opentelemetry application is not booted or fails to boot), the OpenTelemetry APIs for tracing will use a no-op implementation and will not generate data.

Acquiring a Tracer

Each OTP Application has a Tracer created for it when the opentelemetry Application boots. The name and version of each Tracer is the same as the name and version of the OTP Application the module using the Tracer is in. If the call to use a Tracer is not in a module, for example when using the interactive shell, a Tracer with a blank name and version is used.

The created Tracer’s record can be looked up by the name of a module in the OTP Application:

opentelemetry:get_application_tracer(?MODULE)
:opentelemetry.get_application_tracer(__MODULE__)

This is how the Erlang and Elixir macros for starting and updating Spans get a Tracer automatically without need for you to pass the variable in each call.

Create Spans

Now that you have Tracers initialized, you can create Spans.

?with_span(main, #{}, fun() ->
                        %% do work here.
                        %% when this function returns the Span ends
                      end).
require OpenTelemetry.Tracer

...

OpenTelemetry.Tracer.with_span :main do
  # do work here
  # when the block ends the Span ends
end

The above code sample shows how to create an active Span, which is the most common kind of Span to create.

Create Nested Spans

parent_function() ->
    ?with_span(parent, #{}, fun child_function/0).

child_function() ->
    %% this is the same process, so the span parent set as the active
    %% span in the with_span call above will be the active span in this function
    ?with_span(child, #{},
               fun() ->
                   %% do work here. when this function returns, child will complete.
               end).
require OpenTelemetry.Tracer

def parent_function() do
    OpenTelemetry.Tracer.with_span :parent do
        child_function()
    end
end

def child_function() do
    # this is the same process, so the span :parent set as the active
    # span in the with_span call above will be the active span in this function
    OpenTelemetry.Tracer.with_span :child do
        ## do work here. when this function returns, :child will complete.
    end
end

Spans in Separate Processes

The examples in the previous section were Spans with a child-parent relationship within the same process where the parent is available in the process dictionary when creating a child Span. Using the process dictionary this way isn’t possible when crossing processes, either by spawning a new process or sending a message to an existing process. Instead, the context must be manually passed as a variable.

To pass Spans across processes we need to start a Span that isn’t connected to particular process. This can be done with the macro start_span. Unlike with_span, the start_span macro does not set the new span as the currently active span in the context of the process dictionary.

Connecting a span as a parent to a child in a new process can be done by attaching the context and setting the new span as currently active in the process. The whole context should be attached in order to not lose other telemetry data like baggage.

SpanCtx = ?start_span(child),

Ctx = otel_ctx:get_current(),

proc_lib:spawn_link(fun() ->
                        otel_ctx:attach(Ctx),
                        ?set_current_span(SpanCtx),

                        %% do work here

                        ?end_span(SpanCtx)
                    end),
span_ctx = OpenTelemetry.Tracer.start_span(:child)
ctx = OpenTelemetry.Ctx.get_current()

task = Task.async(fn ->
                      OpenTelemetry.Ctx.attach(ctx)
                      OpenTelemetry.Tracer.set_current_span(span_ctx)
                      # do work here

                      # end span here
                      OpenTelemetry.Tracer.end_span(span_ctx)
                  end)

_ = Task.await(task)

Linking the New Span

A Span can be created with zero or more Span Links that causally link it to another Span. A Span Link needs a Span context to be created.

Parent = ?current_span_ctx,
proc_lib:spawn_link(fun() ->
                        %% a new process has a new context so the span created
                        %% by the following `with_span` will have no parent
                        Link = opentelemetry:link(Parent),
                        ?with_span('other-process', #{links => [Link]},
                                   fun() -> ok end)
                    end),
parent = OpenTelemetry.Tracer.current_span_ctx()
task = Task.async(fn ->
                    # a new process has a new context so the span created
                    # by the following `with_span` will have no parent
                    link = OpenTelemetry.link(parent)
                    Tracer.with_span :"my-task", %{links: [link]} do
                      :hello
                    end
                 end)

Adding Attributes to a Span

Attributes let you attach key/value pairs to a Span so it carries more information about the current operation that it’s tracking.

The following example shows the two ways of setting attributes on a span by both setting an attribute in the start options and then again with set_attributes in the body of the span operation:

?with_span(my_span, #{attributes => [{'start-opts-attr', <<"start-opts-value">>}]},
           fun() ->
               ?set_attributes([{'my-attribute', <<"my-value">>},
                                {another_attribute, <<"value-of-attribute">>}])
           end)
Tracer.with_span :span_1, %{attributes: [{:"start-opts-attr", <<"start-opts-value">>}]} do
  Tracer.set_attributes([{:"my-attributes", "my-value"},
                         {:another_attribute, "value-of-attributes"}])
end

Semantic Attributes

Semantic Attributes are attributes that are defined by the OpenTelemetry Specification in order to provide a shared set of attribute keys across multiple languages, frameworks, and runtimes for common concepts like HTTP methods, status codes, user agents, and more. These attribute keys are generated from the specification and provided in opentelemetry_semantic_conventions.

For example, an instrumentation for an HTTP client or server would need to include semantic attributes like the scheme of the URL:

-include_lib("opentelemetry_semantic_conventions/include/trace.hrl").

?with_span(my_span, #{attributes => [{?HTTP_SCHEME, <<"https">>}]},
           fun() ->
             ...
           end)
alias OpenTelemetry.SemanticConventions.Trace, as: Trace

Tracer.with_span :span_1, %{attributes: [{Trace.http_scheme(), <<"https">>}]} do

end

Adding Events

A Span Event is a human-readable message on an Span that represents a discrete event with no duration that can be tracked by a single timestamp. You can think of it like a primitive log.

?add_event(<<"Gonna try it">>),

%% Do the thing

?add_event(<<"Did it!">>),
Tracer.add_event("Gonna try it")

%% Do the thing

Tracer.add_event("Did it!")

Events can also have attributes of their own:

?add_event(<<"Process exited with reason">>, [{pid, Pid)}, {reason, Reason}]))
Tracer.add_event("Process exited with reason", pid: pid, reason: Reason)

Set Span Status

A Status can be set on a Span, typically used to specify that a Span has not completed successfully - StatusCode.ERROR. In rare scenarios, you could override the Error status with StatusCode.OK, but don’t set StatusCode.OK on successfully-completed spans.

The status can be set at any time before the span is finished:

-include_lib("opentelemetry_api/include/opentelemetry.hrl").

?set_status(?OTEL_STATUS_ERROR, <<"this is not ok">>)
Tracer.set_status(:error, "this is not ok")

Metrics

To produce metrics the dependencies opentelemetry_experimental_api and opentelemetry_experimental must be added to the project. Application environment configuration for opentelemetry_experimental is used to configure a MeterProvider which is initialized when the application starts. Meters are created with the MeterProvider automatically on boot and the appropriate Meter is used to create instruments depending on where in your code your create the instrument. OpenTelemetry Erlang currently supports the following instruments:

  • Counter, a synchronous instrument that supports non-negative increments
  • Asynchronous Counter, an asynchronous instrument which supports non-negative increments
  • Histogram, a synchronous instrument that supports arbitrary values that are statistically meaningful, such as histograms, summaries, or percentile
  • Asynchronous Gauge, an asynchronous instrument that supports non-additive values, such as room temperature
  • UpDownCounter, a synchronous instrument that supports increments and decrements, such as the number of active requests
  • Asynchronous UpDownCounter, an asynchronous instrument that supports increments and decrements

For more on synchronous and asynchronous instruments, and which kind is best suited for your use case, see Supplementary Guidelines.

Initialize Metrics

To enable metrics in your application, you’ll need to have an initialized MeterProvider with a Reader. This is done through configuration of the opentelemetry_experimental application:

{opentelemetry_experimental,
  [{readers, [#{module => otel_metric_reader,
                config => #{export_interval_ms => 1000,
                            exporter => {otel_exporter_metrics_otlp, #{}}}}]}]},

This configuration tells the application to create a MetricProvider with a single Reader. The Reader exports every second to an OTLP receiver, like the collector, at localhost:4318 by default. To change the endpoint add to the map endpoints => ["<host>:<port>"] and configure the protocol to use protocol => http_protobuf | grpc.

Use exporter => {otel_exporter_metrics_console, #{}} for outputting the metrics to the console.

Acquiring a Meter

Instruments are created with a Meter. Acquiring a Meter manually is not required but done automatically when the macros for instrument creation are used.

Synchronous and asynchronous instruments

Using Counters

Counters can be used to measure a non-negative, increasing value.

Creating a counter can be done with the ?create_counter macro:

?create_counter(my_fun_counter, #{description => ~"Number of times this function
is called."})

To increment the counter use the ?counter_add macro passing the name of the instrument, the increment value and a map of attributes:

?counter_add(my_fun_counter, 1, #{}),

Using UpDown Counters

UpDown counters can increment and decrement, allowing you to observe a cumulative value that goes up or down.

For example, here’s how you report the number of items of some collection:

create_items_counter() ->
  ?create_counter('items.counter', #{description => ~"Number of items",
                                     unit => '{items}'})

add_item(Item) ->
  ...
  ?updown_counter_add('items.counter', 1),

remove_item(Item) ->
  ...
  ?updown_counter_add('items.counter', -1),

Using Histograms

Histograms are used to measure a distribution of values over time.

?create_histogram('task.duration', #{description => ~"Duration of a task",
                                     unit => 's'}),

The ?histogram_record macro is then used to record a measurement:

{Microseconds, Result} = timer:tc(TaskFun),
?histogram_record('task.duration', Microseconds),

Using Observable Counters

Observable counters can be used to measure an additive, non-negative, monotonically increasing value.

For example, here’s how you report time since the Erlang node started:

?create_observable_counter('uptime', fun(_Args) ->
                                         Uptime = erlang:convert_time_unit(erlang:monotonic_time() - erlang:system_info(start_time), native, seconds),
                                         [{Uptime, #{}}]
                                     end,
                                     [],
                                     #{description => ~"The duration since the node started.",
                                       unit => 's'}),

Using Observable UpDown Counters

Observable UpDown counters can increment and decrement, allowing you to measure an additive, non-negative, non-monotonically increasing cumulative value.

For example, the number of active HTTP connections for a web server:

?create_observable_updown_counter('http.server.active_requests', fun(_Args) ->
                                         ActiveRequests = ....
                                         [{ActiveRequests, #{}}]
                                     end,
                                     [],
                                     #{description => ~"Number of active HTTP server requests.",
                                       unit => {request}'}),

Using Observable Gauges

Observable Gauges should be used to measure non-additive values.

For example, here’s how you report memory usage of ETS tables on a node:

?create_observable_gauge('memory.ets', fun(_Args) ->
                                         EtsMemory = erlang:memory(ets),
                                         [{EtsMemory, #{}}]
                                     end,
                                     [],
                                     #{description => ~"Memory used by ETS tables.",
                                       unit => 'By'}),

Adding Attributes

Attributes can be added to any measurement as a map in the last place in the recording macro:

?updown_counter_add('items.counter', 1, #{~"key-1" => ~"value-1"}),

Registering Views

A view provides SDK users with the flexibility to customize the metrics output by the SDK. You can customize which metric instruments are to be processed or ignored. You can also customize aggregation and what attributes you want to report on metrics.

Every instrument has a default view, which retains the original name, description, and attributes, and has a default aggregation that is based on the type of instrument. When a registered view matches an instrument, the default view is replaced by the registered view. Additional registered views that match the instrument are additive, and result in multiple exported metrics for the instrument.

Here’s how you create a view that renames the latency instrument to request.latency:

{opentelemetry_experimental,
  [...
    {views, [#{name => request.latency',
               selector => #{instrument_name => 'latency'}}]}
  ]},

Or if instead you want a histogram for latency:

{opentelemetry_experimental,
  [...
    {views, [#{selector => #{instrument_name => 'latency'},
               aggregation_module => otel_aggregation_histogram_explicit}]}
  ]},

The SDK filters metrics and attributes before exporting metrics. For example, you can use views to reduce memory usage of high cardinality metrics or drop attributes that might contain sensitive data.

Here’s how you create a view that drops the latency:

{opentelemetry_experimental,
  [...
    {views, [#{selector => #{instrument_name => 'latency'},
               aggregation_module => otel_aggregation_drop}]}
  ]},

A wildcard can be used to match all instruments:

{opentelemetry_experimental,
  [...
    {views, [#{selector => #{instrument_name => '*'},
               aggregation_module => otel_aggregation_drop}]}
  ]},

Since Views are additive any additional views mean specific metrics can be exported while all others, that have no match besides the wildcard, are dropped.

Logs

The logs API, found in apps/opentelemetry_experimental_api of the opentelemetry-erlang repository, is currently unstable, documentation TBA.

Next Steps

You’ll also want to configure an appropriate exporter to export your telemetry data to one or more telemetry backends.