Spring Boot Observability: Metrics Tuning
After writing about database metrics and metrics for Spring Batch, I realized that apart from these more advanced topics, there are some simple tuning tips & tricks that I haven’t yet written down comprehensively, but have only covered in various presentations and tweets.
So, without further ado, here are some simple ways to improve your experience when using Spring metrics with Micrometer!
Reducing tag cardinality
“Cardinality” is just a fancy term to indicate the number of possible values that something can have. Systems that store metrics don’t cope well with high-cardinality tags, such as order IDs. In its new observability API, Micrometer therefore distinguishes explicitly between high-cardinality values, which are only used for exported traces, and low-cardinality values, which are also used as tags (or dimensions, which is simply a different name for the same concept) of metrics.
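To make that distinction concrete, here’s a minimal sketch of the Observation API; the observation name, tags and surrounding code are made up for illustration:
Observation.createNotStarted("order.process", observationRegistry)
        // low cardinality: a handful of possible values, so also usable as a metric tag
        .lowCardinalityKeyValue("order.type", order.getType())
        // high cardinality: unbounded values, only attached to the exported trace
        .highCardinalityKeyValue("order.id", order.getId())
        .observe(() -> process(order));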
I’ve found the HTTP metrics that Spring provides to be tremendously valuable for dashboarding. You can use them to see the number of requests, filter by status, see execution times, etc.: check this demo for some examples.
For both incoming and outgoing requests, these metrics include a uri tag that holds the request’s URL path.
However, it can happen that these values include request parameters that often have unique values. That will cause the cardinality of this tag to explode, which can eventually result in it simply not being recorded anymore.
Fortunately, Micrometer makes it very easy to tune how metrics are captured and exported by defining beans of type MeterFilter. Here’s how we use that to ensure that any query parameters are stripped from the uri tag of the http.server.requests and http.client.requests metrics:
/**
 * Ensures that <code>uri</code> tags of HTTP-related metrics do not include request parameters,
 * to reduce cardinality. The {@code @Order} ensures this filter runs before the auto-configured
 * ones that check if the maximum cardinality for the {@code uri} tag has been reached.
 *
 * @see <a href="https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#actuator.metrics.supported.spring-mvc">MVC metrics</a>
 * @see <a href="https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#actuator.metrics.supported.http-clients">RestTemplate metrics</a>
 */
@Bean @Order(-10)
MeterFilter queryParameterStrippingMeterFilter() {
    return MeterFilter.replaceTagValues("uri", url -> {
        int i = url.indexOf('?');
        return i == -1 ? url : url.substring(0, i);
    });
}
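To convince yourself that the filter does what it should, you can exercise it against a standalone SimpleMeterRegistry. A rough sketch (in the application itself Spring Boot applies the filter bean to the auto-configured registry for you):
// quick sanity check against a throwaway registry
SimpleMeterRegistry registry = new SimpleMeterRegistry();
registry.config().meterFilter(queryParameterStrippingMeterFilter());

Timer.builder("http.server.requests")
        .tag("uri", "/orders?id=12345")
        .register(registry);

// the query string has been stripped from the tag value:
// registry.get("http.server.requests").timer().getId().getTag("uri") returns "/orders"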
You can add this to a regular configuration class; we have a mono-repo with three dozen microservices and some shared libraries, so we actually define this as a Spring Boot auto-configuration that is automatically applied to all services.
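A sketch of what such a shared auto-configuration could look like; the class name is made up, and you register it in META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports so every service that depends on the shared library picks it up:
// hypothetical shared-library auto-configuration
@AutoConfiguration
public class MetricsTuningAutoConfiguration {

    @Bean @Order(-10)
    MeterFilter queryParameterStrippingMeterFilter() {
        // same filter as shown above
        return MeterFilter.replaceTagValues("uri", url -> {
            int i = url.indexOf('?');
            return i == -1 ? url : url.substring(0, i);
        });
    }
}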
For more information, check out my Boot Loot presentation!
Ensuring that host name is captured for HTTP client metrics
As I mentioned already, I believe that the metrics for HTTP requests are extremely useful. We have a lot of dashboards that plot all sorts of graphs based on these, including some that show info on outgoing requests filtered by the host name of the server that we’re calling.
However, after upgrading to Spring Boot 3 I noticed that the corresponding client.name tag was no longer present in the HTTP client metrics. It turns out that in this version, this tag is excluded by default, but again it’s very easy to change the configuration.
This is how you can ensure that the name of the downstream server (which is expected to be a low-cardinality value in most applications) is included again:
/**
 * Starting with Boot 3, the {@code client.name} tag is no longer included by default
 * in the {@code http.client.requests} metrics. Restore it by overriding
 * {@link DefaultClientRequestObservationConvention#getLowCardinalityKeyValues(ClientRequestObservationContext)}.
 *
 * @return {@link ClientRequestObservationConvention} that adds the {@code client.name} to the low-cardinality key-values.
 */
@Bean
ClientRequestObservationConvention clientNameAddingObservationConvention() {
    return new DefaultClientRequestObservationConvention() {
        @Override
        public KeyValues getLowCardinalityKeyValues(ClientRequestObservationContext context) {
            return super.getLowCardinalityKeyValues(context).and(this.clientName(context));
        }
    };
}
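Spring Boot will use a ClientRequestObservationConvention bean like this for the HTTP clients it instruments. One thing to keep in mind, sketched below with made-up names: for RestTemplate the instrumentation (and therefore this convention) only applies to instances created through the auto-configured RestTemplateBuilder:
// hypothetical client; only RestTemplates built through the auto-configured
// RestTemplateBuilder are instrumented, so the convention above applies here
@Service
class PricingClient {

    private final RestTemplate restTemplate;

    PricingClient(RestTemplateBuilder builder) {
        this.restTemplate = builder.rootUri("https://pricing.example.com").build();
    }

    String priceFor(String sku) {
        return restTemplate.getForObject("/prices/{sku}", String.class, sku);
    }
}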
Adding Common Tags
Especially in a microservices environment, it’s important to be able to distinguish between metrics produced by different services. Also, when you’re running multiple instances of the same service it’s crucial that metrics are all stored individually and don’t overwrite each other. With Micrometer, an easy way to do this is to add a MeterRegistryCustomizer bean. Here’s an example that shows how we apply this:
/**
 * Adds some common tags for all metrics.
 */
@Bean
MeterRegistryCustomizer<MeterRegistry> commonTags(Environment env) {
    String serviceName = env.getProperty("spring.application.name", "unknown-service");
    Region awsRegion = null;
    if (!env.acceptsProfiles(Profiles.of("local", "test"))) {
        var regionProvider = DefaultAwsRegionProviderChain.builder().build();
        awsRegion = regionProvider.getRegion();
    }
    String region = awsRegion != null ? awsRegion.id() : "local";
    String hostname = awsRegion != null ? System.getenv("HOSTNAME") : "localhost";
    return registry -> registry.config().commonTags(
            "service", serviceName,
            "region", region,
            // this one gets mapped to a hostname by the DatadogMeterRegistry, ensuring metrics are unique across Pods
            "instance", hostname
    );
}
We include the name of the service by reading the standard spring.application.name property, the AWS Region using the AWS SDK (unless we know we’re not running on AWS), and the hostname.
This last one is interesting: at first we did not do this, but after updating to a newer version of Micrometer, metrics started to go missing. It turned out that starting with that version, metrics were always published to Datadog at the same second of every minute, causing metrics produced by different Kubernetes Pods running the same service to overwrite each other, because to Datadog they all looked exactly the same. For more information about this, have a look at my Spring I/O talk on this!
Cutting Costs
Metrics are a great tool, and a Spring Boot application will gather and publish a lot of different metrics for you. However, storing all these metrics in a time-series database comes at a cost. We use Datadog, which charges for every custom metric that you have. That’s fine if you’re actually using those metrics for dashboarding, alerting or just occasional exploration, but wasteful if you’re not.
Although you can tune tools like Datadog to not index all incoming metrics, an easier way is often to simply not publish the metrics you’re not using at all. We recently added an allow list for metrics, where only well-known metrics are published and everything else is simply discarded. Here’s what that looks like:
/**
 * Filters out metrics whose names have not been whitelisted, if filtering is enabled.
 */
@Bean
MeterFilter whitelistMetricsFilter(MetricsProperties metricsProperties) {
    if (!metricsProperties.isFilteringEnabled()) return MeterFilter.accept();
    return MeterFilter.denyUnless(id -> metricsProperties.getWhitelist().stream()
            .anyMatch(entry -> {
                // it's whitelisted if name matches an entry with a wildcard, or if it occurs as-is
                if (entry.endsWith("*")) return id.getName().startsWith(entry.substring(0, entry.length() - 1));
                return entry.equals(id.getName());
            }));
}
The MetricsProperties that you see is simply a @ConfigurationProperties class with a boolean property that enables the filtering and a whitelist property that’s a List<String>.
For our purposes, that defaults to http.*,jvm.memory.*,tomcat.threads.*,resilience4j.circuitbreaker.not.permitted.calls, which is a fairly short list given what a Spring application will make available by default.
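A minimal sketch of what such a properties class could look like; the custom.metrics prefix and the hard-coded defaults are just for illustration:
// hypothetical @ConfigurationProperties class backing the filter above
@ConfigurationProperties("custom.metrics")
public class MetricsProperties {

    /** Whether to discard all metrics that are not on the whitelist. */
    private boolean filteringEnabled;

    /** Metric names, or prefixes ending in '*', that should be published. */
    private List<String> whitelist = List.of(
            "http.*", "jvm.memory.*", "tomcat.threads.*",
            "resilience4j.circuitbreaker.not.permitted.calls");

    public boolean isFilteringEnabled() { return filteringEnabled; }
    public void setFilteringEnabled(boolean filteringEnabled) { this.filteringEnabled = filteringEnabled; }
    public List<String> getWhitelist() { return whitelist; }
    public void setWhitelist(List<String> whitelist) { this.whitelist = whitelist; }
}
With that in place, enabling the filter or overriding the list per service is just a matter of setting custom.metrics.filtering-enabled and custom.metrics.whitelist in the application properties.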
For a big system with multiple services, multiple instances per service, and multiple environments, the cost savings of a simple filter like this can easily be thousands of euros/dollars/<insert your preferred currency here> per year!
Conclusion
Metrics support as provided by Micrometer and Spring is the best thing since sliced bread. I’ve sworn never again to build an application that doesn’t capture metrics in one of the many supported backends, but without tweaking the configuration you might run into problems like missing tags, missing metrics, not being able to distinguish between the publishing services, or very high operational costs.
Hopefully this blog has provided you with some battle-hardened tips & tricks to avoid these pitfalls!