Observability is the ability to measure the internal states of a system by examining its outputs. Whether through application performance monitoring (APM), telemetry data, log analytics, traces, or metrics, the more real-time insight you have into your system, the more quickly you can pinpoint performance problems and mitigate risks.
“Observability is about getting the right information at the right time into the hands of the people who have the ability and responsibility to do the right thing. Helping them make better technical and business decisions driven by real data, not guesses, or hunches, or shots in the dark. Time is the most precious resource you have — your own time, your engineering team’s time, your company’s time.”
Observability is important because it allows you to spot bottlenecks, resolve outages, and glean valuable insights about how your software behaves.
Complex distributed systems can be observed by correlating three kinds of telemetry data: traces, metrics, and logs.
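As a minimal sketch of what correlation means in practice, assume that every log line and every request-level span carries a shared trace ID. The record types `SpanSummary` and `LogLine`, the log messages, and the 200 ms latency budget are all hypothetical. This is plain Java, not any observability platform's API. The sketch finds the logs emitted while serving slow requests:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, minimal telemetry records. Real spans and logs carry much more
// data; the shared traceId is what makes correlation possible.
record SpanSummary(String traceId, String name, long durationMillis) {}
record LogLine(String traceId, String message) {}

class TelemetryCorrelation {
    // For each trace whose request-level span exceeded the latency budget,
    // returns the log messages emitted while serving that trace, in order.
    static Map<String, List<String>> logsForSlowTraces(
            List<SpanSummary> spans, List<LogLine> logs, long budgetMillis) {
        Set<String> slowTraceIds = spans.stream()
                .filter(s -> s.durationMillis() > budgetMillis)
                .map(SpanSummary::traceId)
                .collect(Collectors.toSet());
        return logs.stream()
                .filter(l -> slowTraceIds.contains(l.traceId()))
                .collect(Collectors.groupingBy(LogLine::traceId,
                        Collectors.mapping(LogLine::message, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<SpanSummary> spans = List.of(
                new SpanSummary("t1", "GET /hello", 12),
                new SpanSummary("t2", "GET /hello", 480));
        List<LogLine> logs = List.of(
                new LogLine("t1", "served hello"),
                new LogLine("t2", "cache miss, falling back to database"),
                new LogLine("t2", "served hello"));
        // Prints {t2=[cache miss, falling back to database, served hello]}
        System.out.println(logsForSlowTraces(spans, logs, 200));
    }
}
```

An observability platform performs the same join at scale, usually on trace IDs that the instrumentation injects into logs automatically.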
For example, when you call a method named requestStarted, you can observe the resulting flow of execution as a trace that is broken down into its constituent spans.
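The idea of a trace made of spans can be sketched in plain Java. This is not the OpenTelemetry API. `SketchSpan` is a hypothetical class, and the child span names and timings are made up for illustration. The sketch shows how parent/child links turn individual timed operations into a single tree rooted at requestStarted:

```java
import java.util.ArrayList;
import java.util.List;

// A hypothetical, simplified span. Real OpenTelemetry spans also carry
// attributes, events, status, and a span context, but the parent/child
// structure below is what turns individual spans into a trace.
class SketchSpan {
    final String name;
    final long startMillis;
    final long endMillis;
    final List<SketchSpan> children = new ArrayList<>();

    SketchSpan(String name, long startMillis, long endMillis) {
        this.name = name;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    // Creates a child span nested under this one and returns it.
    SketchSpan child(String name, long start, long end) {
        SketchSpan c = new SketchSpan(name, start, end);
        children.add(c);
        return c;
    }

    long durationMillis() {
        return endMillis - startMillis;
    }

    // Renders the trace as an indented tree, one line per span.
    void render(StringBuilder out, int depth) {
        out.append("  ".repeat(depth))
           .append(name).append(" (").append(durationMillis()).append(" ms)\n");
        for (SketchSpan c : children) {
            c.render(out, depth + 1);
        }
    }

    public static void main(String[] args) {
        SketchSpan root = new SketchSpan("requestStarted", 0, 40);
        SketchSpan work = root.child("doWork", 5, 30); // hypothetical child span
        work.child("queryDatabase", 10, 25);          // hypothetical grandchild span
        StringBuilder out = new StringBuilder();
        root.render(out, 0);
        System.out.print(out);
    }
}
```

Running `main` prints `requestStarted (40 ms)`, then `doWork (25 ms)` indented one level, then `queryDatabase (15 ms)` indented two levels.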
With this information, it becomes possible to find bugs, isolate performance bottlenecks, or set intelligent alerts.
OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project dedicated to creating an open standard for application telemetry instrumentation. OpenTelemetry is emerging as the preeminent telemetry standard in the observability industry.
Sometimes metrics go wonky and make you raise an eyebrow, at which time you’ll check the traces. Sometimes traces show increased latency, which makes you check on the metrics to see what they might tell you. The application may also be emitting logs that provide more information about what was going on at the time of degraded performance. In this way, we start to see how analyzing metrics, traces, and logs together using an observability platform makes it much easier to understand and troubleshoot your complex distributed systems. Popular application performance monitoring (APM) observability platforms include Datadog, Splunk, and Elastic.
Several of these companies use Confluent to move and process telemetry data at high throughput and low latency, and you can use Confluent’s fully managed connectors to more easily integrate with the observability backend of your choice. Confluent has fully managed sink connectors for Datadog, Splunk, and Elastic, with many more being added all the time.
Confluent also offers a metrics API to give users observability into their own Confluent Cloud usage, with first-class integrations to Datadog and Grafana Cloud.
Logs, metrics, and traces pertain to observability as it relates to application performance monitoring (APM). However, businesses are also highly interested in observing how business data flows end-to-end. This is called data observability and is often spoken about in the context of “data governance”.
For example, Confluent Cloud offers a powerful Stream Lineage interface to observe data as it flows throughout a business.
The purpose of this lab is to explore a working example of how OpenTelemetry enables metrics and traces in Java using the OpenTelemetry Java agent. What is nice about the Java agent is that it does most of the work: the application only instantiates Meter and Tracer objects, a few environment variables configure the agent, and the agent takes care of sending the telemetry data.
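Attaching the agent is a JVM launch concern rather than an application code change. The sketch below builds, but does not start, such a launch using the standard `-javaagent` JVM option and two standard OpenTelemetry environment variables, `OTEL_SERVICE_NAME` and `OTEL_EXPORTER_OTLP_ENDPOINT`. The jar file names and the collector URL are hypothetical placeholders. The lab itself wires this up through docker-compose, and the variables it sets may differ.

```java
import java.util.Map;

class AgentLaunchSketch {
    // Builds (but does not start) a process that runs the app with the
    // OpenTelemetry Java agent attached. Jar paths and the collector
    // endpoint passed in are hypothetical placeholders.
    static ProcessBuilder withAgent(String agentJar, String appJar,
                                    String serviceName, String otlpEndpoint) {
        ProcessBuilder pb = new ProcessBuilder(
                "java", "-javaagent:" + agentJar, "-jar", appJar);
        Map<String, String> env = pb.environment();
        env.put("OTEL_SERVICE_NAME", serviceName);            // service name shown in the APM UI
        env.put("OTEL_EXPORTER_OTLP_ENDPOINT", otlpEndpoint); // where OTLP data is sent
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = withAgent("opentelemetry-javaagent.jar", "hello-app.jar",
                "hello-app", "http://collector:4317");
        System.out.println(pb.command());
        // Calling pb.inheritIO().start() would actually launch the application.
    }
}
```

Keeping configuration in the environment is what lets the same application jar report to a different backend without recompiling.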
There is a Spring Boot Java application that exposes an endpoint at http://localhost:8888/hello. App metrics and request traces are sent via OTLP (OpenTelemetry Protocol) to an observability backend (Elastic Observability APM in this case).
Launch the lab environment by clicking [https://gitpod.io/#https://github.com/riferrei/otel-with-java](https://gitpod.io/#https://github.com/riferrei/otel-with-java). On launch, all services are built and started with docker-compose.
Inspect the source code of the HelloApp Java application. Specifically, look at src/main/java/riferrei/otel/java/HelloAppController.java. This is where OpenTelemetry tracing and custom metrics are implemented.
Send GET requests to the Hello app.
[source,bash]
curl http://localhost:8888/hello
Repeat the previous curl command several times, against the /hello endpoint as well as other endpoints (requests to other endpoints are expected to return error responses).
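If you prefer to generate this traffic from code rather than by hand, the following sketch uses the JDK's `java.net.http.HttpClient` (Java 11+). The extra paths `/missing` and `/error` are hypothetical examples of endpoints the app does not serve, and the count of 10 requests is arbitrary.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

class TrafficGenerator {
    // Builds GET requests: helloCount requests to /hello plus one per extra path.
    static List<HttpRequest> buildRequests(String baseUrl, int helloCount, List<String> extraPaths) {
        List<HttpRequest> requests = new ArrayList<>();
        for (int i = 0; i < helloCount; i++) {
            requests.add(HttpRequest.newBuilder(URI.create(baseUrl + "/hello")).GET().build());
        }
        for (String path : extraPaths) {
            requests.add(HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build());
        }
        return requests;
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        // /missing and /error are hypothetical paths expected to produce error responses.
        for (HttpRequest request : buildRequests("http://localhost:8888", 10, List.of("/missing", "/error"))) {
            try {
                HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println(request.uri() + " -> " + response.statusCode());
            } catch (IOException e) {
                // For example, the app is not up yet.
                System.out.println(request.uri() + " -> failed: " + e.getMessage());
            }
        }
    }
}
```

Each request is handled by the instrumented app, so every iteration should add one transaction to the traces you inspect next.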
Execute the following echo command and Ctrl+Click the resulting URL to open the traces for the hello-app in the Kibana UI.
[source,bash]
echo https://5601-${GITPOD_WORKSPACE_URL#https://}/app/apm/services/hello-app/transactions
NOTE: The URL will look something like https://5601-aquamarine-python-rsq28cwb.ws-us17.gitpod.io/app/apm/services/hello-app/transactions
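The echo command relies on the shell expansion `${GITPOD_WORKSPACE_URL#https://}`, which strips a leading `https://` from the workspace URL so that the Kibana port can be prepended as `5601-`. Here is a sketch of the same string manipulation in Java. `KibanaUrl` is a hypothetical helper name, and the workspace URL used to check it is inferred from the example in the note above.

```java
class KibanaUrl {
    // Mirrors ${GITPOD_WORKSPACE_URL#https://}: strip a leading "https://"
    // if present (otherwise leave the value unchanged), then prefix the
    // Kibana port 5601 and append the UI path.
    static String forPath(String workspaceUrl, String path) {
        String host = workspaceUrl.startsWith("https://")
                ? workspaceUrl.substring("https://".length())
                : workspaceUrl;
        return "https://5601-" + host + path;
    }

    public static void main(String[] args) {
        String workspace = System.getenv("GITPOD_WORKSPACE_URL");
        if (workspace == null) {
            System.err.println("GITPOD_WORKSPACE_URL is not set (not running in Gitpod?)");
            return;
        }
        System.out.println(forPath(workspace, "/app/apm/services/hello-app/transactions"));
        System.out.println(forPath(workspace, "/app/discover"));
    }
}
```

The same helper produces the /app/discover URL used later in the lab.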
Scroll down to the bottom of the page and select the /hello endpoint from the Transactions section.
Scroll down again to see the trace samples, which show latency measurements at various stages of execution.
TIP: These trace samples are a great tool for understanding what is happening in a transaction. In more complex applications, a trace sample would show the flow of execution across many microservices, helping you to identify bugs and performance bottlenecks much more quickly.
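Trace samples make bottlenecks visible because each span's duration can be compared with the time spent in its children. The following is a rough sketch of that reasoning. It assumes children run sequentially inside their parent; concurrent children would need interval merging instead of a plain sum. `Timing`, the span names, and the durations are hypothetical.

```java
import java.util.List;

class SelfTime {
    // Hypothetical flattened span: its own duration plus the durations of
    // its direct children.
    record Timing(String name, long durationMillis, List<Long> childDurationsMillis) {
        // Time spent in this span itself, assuming its children ran one after
        // another (no overlap) inside it.
        long selfMillis() {
            long inChildren = childDurationsMillis.stream().mapToLong(Long::longValue).sum();
            return durationMillis - inChildren;
        }
    }

    // Returns the span with the largest self time: a first candidate for the bottleneck.
    static Timing hottest(List<Timing> spans) {
        Timing best = spans.get(0);
        for (Timing t : spans) {
            if (t.selfMillis() > best.selfMillis()) {
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Timing> spans = List.of(
                new Timing("requestStarted", 50, List.of(30L)),
                new Timing("handleHello", 30, List.of(22L)),
                new Timing("queryDatabase", 22, List.of()));
        // Self times are 20, 8 and 22 ms, so this prints queryDatabase.
        System.out.println(hottest(spans).name());
    }
}
```

The outermost span is always the longest, so ranking by total duration would point at the root. Ranking by self time points at the operation that actually consumed the time.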
Execute the following echo command and Ctrl+Click the resulting URL to open the "discover" area of the Kibana UI, where OpenTelemetry metrics will be automatically discovered.
[source,bash]
echo https://5601-${GITPOD_WORKSPACE_URL#https://}/app/discover
NOTE: The URL will look something like https://5601-aquamarine-python-rsq28cwb.ws-us17.gitpod.io/app/discover. Ignore warnings.
Investigate the custom.metric.heap.memory and custom.metric.number.of.exec metrics, which are the custom metrics defined in Constants.java and HelloAppController.java.
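To make the two kinds of metric concrete, the following plain-JDK sketch produces a gauge-style heap reading and a counter-style execution count under the same names. It does not use the OpenTelemetry Meter API that the lab relies on to register and export these metrics. It also assumes that "heap memory" means used heap bytes, which may not match how the lab computes custom.metric.heap.memory. `CustomMetricsSketch` is a hypothetical class name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.atomic.AtomicLong;

class CustomMetricsSketch {
    // Metric names taken from the lab; the measurement code is plain JDK.
    static final String HEAP_MEMORY = "custom.metric.heap.memory";
    static final String NUMBER_OF_EXEC = "custom.metric.number.of.exec";

    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();
    private static final AtomicLong executions = new AtomicLong();

    // Gauge-style reading: current used heap in bytes, sampled on demand.
    static long heapUsedBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    // Counter-style update: intended to be called once per handled request.
    static long recordExecution() {
        return executions.incrementAndGet();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            recordExecution();
        }
        System.out.println(NUMBER_OF_EXEC + " = " + executions.get());
        System.out.println(HEAP_MEMORY + " = " + heapUsedBytes() + " bytes");
    }
}
```

The distinction matters when reading the charts in Kibana. A gauge such as heap usage can go up and down between samples, while a counter such as the number of executions only grows.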
NOTE: This lab comes from [https://github.com/riferrei/otel-with-java](https://github.com/riferrei/otel-with-java), created by Ricardo Ferreira. There is a sibling repository for Go at [https://github.com/riferrei/otel-with-golang](https://github.com/riferrei/otel-with-golang). The main difference between the Java and Go implementations is that the OpenTelemetry Java agent creates trace spans automatically, while Go requires more manual instrumentation. There is an associated in-depth video walkthrough from SREcon 2021.
In this lab, you explored how to instrument a Java application using OpenTelemetry and send those app metrics and request traces to an observability backend for analysis.