Skip to content

Commit

Permalink
Kubernetes tracing and metrics improvements (#3298)
Browse files Browse the repository at this point in the history
* Add missing REST configuration to docs
* Change REST API to log at info level by default to reduce log volume
* Change REST API to generate a consistent trace log for all responses
* Change Traefik to only log non-2xx requests to reduce log volume
* Fix client IP not showing up in any trace logs
* Fix Lettuce histogram metrics enabled by default and generating ~10K series
* Fix monitor not tracing requests in some scenarios and add test coverage
* Improve GCP Pub/Sub performance by reusing printer across transactions
* Remove some REST API metric buckets to reduce number of overall time series

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
  • Loading branch information
steven-sheehy authored Feb 15, 2022
1 parent e10b457 commit e5a826c
Show file tree
Hide file tree
Showing 15 changed files with 255 additions and 42 deletions.
6 changes: 5 additions & 1 deletion charts/hedera-mirror-common/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -340,10 +340,14 @@ traefik:
deployment:
kind: DaemonSet
enabled: true
globalArguments: []
globalArguments: # Expose X-Forwarded-For header for tracing
- --entryPoints.web.forwardedHeaders.insecure
- --entryPoints.websecure.forwardedHeaders.insecure
logs:
access:
enabled: true
filters:
statuscodes: 300-599
podDisruptionBudget:
enabled: true
minAvailable: 1
Expand Down
10 changes: 7 additions & 3 deletions docs/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -311,7 +311,7 @@ The following table lists the available properties along with their default valu
value, it is recommended to only populate overridden properties in the custom `application.yml`.

| Name | Default | Description |
| -------------------------------------------------------- | ----------------------- | ---------------------------------------------------------------------------------------------- |
| -------------------------------------------------------- |-------------------------| ---------------------------------------------------------------------------------------------- |
| `hedera.mirror.rest.cache.entityId.maxAge` | 1800 | The number of seconds until the entityId cache entry expires |
| `hedera.mirror.rest.cache.entityId.maxSize` | 100000 | The maximum number of entries in the entityId cache |
| `hedera.mirror.rest.db.host` | 127.0.0.1 | The IP or hostname used to connect to the database |
Expand All @@ -321,15 +321,18 @@ value, it is recommended to only populate overridden properties in the custom `a
| `hedera.mirror.rest.db.pool.maxConnections` | 10 | The maximum number of clients the database pool can contain |
| `hedera.mirror.rest.db.pool.statementTimeout` | 20000 | The number of milliseconds to wait before timing out a query statement |
| `hedera.mirror.rest.db.port` | 5432 | The port used to connect to the database |
| `hedera.mirror.rest.db.username` | mirror_api | The username the processor uses to connect to the database |
| `hedera.mirror.rest.db.sslMode` | DISABLE | The ssl level of protection against Eavesdropping, Man-in-the-middle (MITM) and Impersonation on the db connection. Accepts either DISABLE, ALLOW, PREFER, REQUIRE, VERIFY_CA or VERIFY_FULL. |
| `hedera.mirror.rest.db.tls.ca` | "" | The path to the certificate authority used by the database for secure connections |
| `hedera.mirror.rest.db.tls.cert` | "" | The path to the public key the client should use to securely connect to the database |
| `hedera.mirror.rest.db.tls.enabled` | false | Whether TLS should be used for the database connection |
| `hedera.mirror.rest.db.tls.key` | "" | The path to the private key the client should use to securely connect to the database |
| `hedera.mirror.rest.log.level` | debug | The logging level. Can be trace, debug, info, warn, error or fatal. |
| `hedera.mirror.rest.db.username` | mirror_api | The username the processor uses to connect to the database |
| `hedera.mirror.rest.log.level` | info | The logging level. Can be trace, debug, info, warn, error or fatal. |
| `hedera.mirror.rest.maxRepeatedQueryParameters` | 100 | The maximum number of times any query parameter can be repeated in the uri |
| `hedera.mirror.rest.maxTimestampRange` | 7d | The maximum amount of time a timestamp range query param can span for some APIs. |
| `hedera.mirror.rest.metrics.enabled` | true | Whether metrics should be collected and exposed for scraping |
| `hedera.mirror.rest.metrics.config` | See application.yml | The configuration to pass to Swagger stats (https://swaggerstats.io/guide/conf.html#options) |
| `hedera.mirror.rest.metrics.ipMetrics` | false | Whether metrics should be associated with a masked client IP label |
| `hedera.mirror.rest.network.unreleasedSupplyAccounts` | [0.0.2, 0.0.42, ...] | An array of account IDs whose aggregated balance subtracted from the total supply is the released supply |
| `hedera.mirror.rest.port` | 5551 | The REST API port |
| `hedera.mirror.rest.metrics.enabled` | true | Whether metrics are enabled for the REST API |
Expand All @@ -339,6 +342,7 @@ value, it is recommended to only populate overridden properties in the custom `a
| `hedera.mirror.rest.metrics.config.uriPath` | '/swagger' | The REST API metrics uri path |
| `hedera.mirror.rest.openapi.specFileName` | 'openapi' | The file name of the OpenAPI spec file |
| `hedera.mirror.rest.openapi.swaggerUIPath` | '/docs' | Swagger UI path for your REST API |
| `hedera.mirror.rest.response.compression` | true | Whether GZIP compression should be enabled for response bodies |
| `hedera.mirror.rest.response.includeHostInLink` | false | Whether to include the hostname and port in the next link in the response |
| `hedera.mirror.rest.response.limit.default` | 25 | The default value for the limit parameter that controls the REST API response size |
| `hedera.mirror.rest.response.limit.max` | 100 | The maximum size the limit parameter can be that controls the REST API response size |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,11 @@

import io.github.mweirauch.micrometer.jvm.extras.ProcessMemoryMetrics;
import io.github.mweirauch.micrometer.jvm.extras.ProcessThreadMetrics;
import io.lettuce.core.metrics.MicrometerCommandLatencyRecorder;
import io.lettuce.core.metrics.MicrometerOptions;
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.MeterBinder;
import io.micrometer.core.instrument.binder.db.DatabaseTableMetrics;
import io.micrometer.core.instrument.binder.db.PostgreSQLDatabaseMetrics;
Expand All @@ -47,6 +52,14 @@ class MetricsConfiguration {
private final DataSource dataSource;
private final DataSourceProperties dataSourceProperties;

// Override default ClientResources to disable histogram metrics
@Bean(destroyMethod = "shutdown")
ClientResources clientResources(MeterRegistry meterRegistry) {
MicrometerOptions options = MicrometerOptions.builder().build();
var commandLatencyRecorder = new MicrometerCommandLatencyRecorder(meterRegistry, options);
return DefaultClientResources.builder().commandLatencyRecorder(commandLatencyRecorder).build();
}

@Bean
MeterBinder processMemoryMetrics() {
return new ProcessMemoryMetrics();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,13 @@
import java.io.IOException;

public class ProtoJsonSerializer extends JsonSerializer<Message> {

private static final JsonFormat.Printer PRINTER = JsonFormat.printer()
.includingDefaultValueFields()
.omittingInsignificantWhitespace();

@Override
public void serialize(Message message, JsonGenerator gen, SerializerProvider serializers) throws IOException {
gen.writeRawValue(JsonFormat.printer()
.includingDefaultValueFields()
.omittingInsignificantWhitespace()
.print(message));
gen.writeRawValue(PRINTER.print(message));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,8 @@
import lombok.extern.log4j.Log4j2;
import org.apache.commons.lang3.StringUtils;
import org.apache.logging.log4j.Level;
import org.reactivestreams.Publisher;
import org.springframework.http.server.reactive.ServerHttpRequest;
import org.springframework.http.server.reactive.ServerHttpResponse;
import org.springframework.util.CollectionUtils;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
Expand All @@ -38,35 +38,46 @@
@Named
public class LoggingFilter implements WebFilter {

static final String X_FORWARDED_FOR = "X-Forwarded-For";
private static final String ACTUATOR_PATH = "/actuator/";
private static final String LOCALHOST = "127.0.0.1";
private static final String X_FORWARDED_FOR = "X-Forwarded-For";
private static final String LOG_FORMAT = "{} {} {} in {} ms: {}";

@Override
public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
return chain.filter(exchange).transformDeferred(call -> doFilter(exchange, call));
long start = System.currentTimeMillis();
return chain.filter(exchange).transformDeferred(call ->
call.doOnEach(signal -> doFilter(exchange, signal.getThrowable(), start))
.doOnCancel(() -> doFilter(exchange, new CancelledException(), start)));
}

private Publisher<Void> doFilter(ServerWebExchange exchange, Mono<Void> call) {
long startTime = System.currentTimeMillis();
return call.doOnSuccess(done -> onSuccess(exchange, startTime))
.doOnError(cause -> onError(exchange, startTime, cause));
private void doFilter(ServerWebExchange exchange, Throwable cause, long start) {
ServerHttpResponse response = exchange.getResponse();
if (response.isCommitted() || cause instanceof CancelledException) {
logRequest(exchange, start, cause);
} else {
response.beforeCommit(() -> {
logRequest(exchange, start, cause);
return Mono.empty();
});
}
}

private void onSuccess(ServerWebExchange exchange, long startTime) {
private void logRequest(ServerWebExchange exchange, long startTime, Throwable cause) {
long elapsed = System.currentTimeMillis() - startTime;
ServerHttpRequest request = exchange.getRequest();
URI uri = request.getURI();
Level level = StringUtils.startsWith(uri.getPath(), ACTUATOR_PATH) ? Level.DEBUG : Level.INFO;
log.log(level, "{} {} {} in {} ms: {}", getClient(request), request.getMethod(), uri, elapsed,
exchange.getResponse().getStatusCode());
}
Object message = exchange.getResponse().getStatusCode();
Level level = Level.INFO;

private void onError(ServerWebExchange exchange, long startTime, Throwable t) {
long elapsed = System.currentTimeMillis() - startTime;
ServerHttpRequest request = exchange.getRequest();
log.warn("{} {} {} in {} ms: {}", getClient(request), request.getMethod(), request.getURI(), elapsed,
t.getMessage());
if (cause != null) {
level = Level.WARN;
message = cause.getMessage();
} else if (StringUtils.startsWith(uri.getPath(), ACTUATOR_PATH)) {
level = Level.DEBUG;
}

log.log(level, LOG_FORMAT, getClient(request), request.getMethod(), uri, elapsed, message);
}

private String getClient(ServerHttpRequest request) {
Expand All @@ -84,4 +95,12 @@ private String getClient(ServerHttpRequest request) {

return LOCALHOST;
}

private static class CancelledException extends RuntimeException {
private static final String MESSAGE = "cancelled";

CancelledException() {
super(MESSAGE);
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
package com.hedera.mirror.monitor.config;

/*-
* ‌
* Hedera Mirror Node
* ​
* Copyright (C) 2019 - 2022 Hedera Hashgraph, LLC
* ​
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
* ‍
*/

import static org.assertj.core.api.Assertions.assertThat;

import java.io.StringWriter;
import java.time.Duration;
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.Logger;
import org.apache.logging.log4j.core.appender.WriterAppender;
import org.apache.logging.log4j.core.layout.PatternLayout;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;
import org.springframework.mock.http.server.reactive.MockServerHttpRequest;
import org.springframework.mock.web.server.MockServerWebExchange;
import reactor.core.publisher.Mono;
import reactor.test.StepVerifier;

class LoggingFilterTest {

private static final Duration WAIT = Duration.ofSeconds(5);

private LoggingFilter loggingFilter;
private StringWriter logOutput;
private WriterAppender appender;

@BeforeEach
void setup() {
loggingFilter = new LoggingFilter();
logOutput = new StringWriter();
appender = WriterAppender.newBuilder()
.setLayout(PatternLayout.newBuilder().withPattern("%p|%m%n").build())
.setName("stringAppender")
.setTarget(logOutput)
.build();
Logger logger = (Logger) LogManager.getLogger(loggingFilter);
logger.addAppender(appender);
logger.setLevel(Level.DEBUG);
appender.start();
}

@AfterEach
void cleanup() {
appender.stop();
Logger logger = (Logger) LogManager.getLogger(loggingFilter);
logger.removeAppender(appender);
}

@CsvSource({
"/, 200, INFO",
"/actuator/, 200, DEBUG"
})
@ParameterizedTest
void filterOnSuccess(String path, int code, String level) {
MockServerWebExchange exchange = MockServerWebExchange.from(MockServerHttpRequest.get(path).build());
exchange.getResponse().setRawStatusCode(code);

loggingFilter.filter(exchange, serverWebExchange -> Mono.defer(() -> exchange.getResponse().setComplete()))
.as(StepVerifier::create)
.expectComplete()
.verify(WAIT);

assertLog(Level.getLevel(level), "\\w+ GET " + path + " in \\d+ ms: " + code);
}

@Test
void filterXForwardedFor() {
String clientIp = "10.0.0.100";
MockServerWebExchange exchange = MockServerWebExchange.from(MockServerHttpRequest.get("/")
.header(LoggingFilter.X_FORWARDED_FOR, clientIp)
.build());
exchange.getResponse().setRawStatusCode(200);

loggingFilter.filter(exchange, serverWebExchange -> Mono.defer(() -> exchange.getResponse().setComplete()))
.as(StepVerifier::create)
.expectComplete()
.verify(WAIT);

assertLog(Level.INFO, clientIp + " GET / in \\d+ ms: 200");
}

@Test
void filterOnCancel() {
MockServerWebExchange exchange = MockServerWebExchange.from(MockServerHttpRequest.get("/").build());

loggingFilter.filter(exchange, serverWebExchange -> exchange.getResponse().setComplete())
.as(StepVerifier::create)
.thenCancel()
.verify(WAIT);

assertLog(Level.WARN, "\\w+ GET / in \\d+ ms: cancelled");
}

@Test
void filterOnError() {
MockServerWebExchange exchange = MockServerWebExchange.from(MockServerHttpRequest.get("/").build());
exchange.getResponse().setRawStatusCode(500);

var exception = new IllegalArgumentException("error");
loggingFilter.filter(exchange, serverWebExchange -> Mono.error(exception))
.onErrorResume((t) -> exchange.getResponse().setComplete())
.as(StepVerifier::create)
.expectComplete()
.verify(WAIT);

assertLog(Level.WARN, "\\w+ GET / in \\d+ ms: " + exception.getMessage());
}

private void assertLog(Level level, String pattern) {
assertThat(logOutput)
.asString()
.hasLineCount(1)
.contains(level.toString())
.containsPattern(pattern);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.Logger;
import org.apache.logging.log4j.core.appender.WriterAppender;
import org.apache.logging.log4j.core.layout.PatternLayout;
import org.assertj.core.api.InstanceOfAssertFactories;
import org.assertj.core.api.ObjectAssert;
import org.junit.jupiter.api.AfterEach;
Expand Down Expand Up @@ -77,6 +78,7 @@ void setup() {

logOutput = new StringWriter();
writerAppender = WriterAppender.newBuilder()
.setLayout(PatternLayout.newBuilder().withPattern("%p|%m%n").build())
.setName("stringAppender")
.setTarget(logOutput)
.build();
Expand Down Expand Up @@ -166,6 +168,7 @@ void onError(Throwable throwable, String status) {
assertThat(logOutput)
.asString()
.hasLineCount(1)
.contains("INFO")
.contains("Scenario " + SCENARIO_NAME + " published 0 transactions in")
.contains("Errors: {" + status + "=1}");
}
Expand All @@ -179,6 +182,7 @@ void statusSuccess() throws Exception {
assertThat(logOutput)
.asString()
.hasLineCount(1)
.contains("INFO")
.contains("Scenario " + SCENARIO_NAME + " published 1 transactions in")
.contains("Errors: {}");
}
Expand Down
Loading

0 comments on commit e5a826c

Please sign in to comment.