Easily connecting your Spring Boot applications to the Elastic Stack with Log4j2

Because if the Elastic Stack fails for some reason, we can still access the logs stored in the files for some time, before they are overwritten.

The settings for how long you want to keep the “backup” files depend on the situation and the application, and are something to agree on with the production guys.
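To give an idea of what such an agreement can translate to, here is a minimal, hypothetical sketch of a Log4j2 rolling file appender that deletes rolled files after a week; the file names, the layout pattern and the seven-day value are placeholders rather than the actual production settings, and baseDir is the property described further down:

```xml
<!-- Sketch only: a daily-rolling file appender with a simple retention policy -->
<RollingFile name="file" fileName="${baseDir}/application.log"
             filePattern="${baseDir}/application-%d{yyyy-MM-dd}.log.gz">
    <PatternLayout pattern="%d %-5level [%t] %c{1} - %msg%n"/>
    <Policies>
        <!-- roll the file once a day -->
        <TimeBasedTriggeringPolicy/>
    </Policies>
    <DefaultRolloverStrategy>
        <!-- keep the rolled "backup" files for 7 days, then delete them -->
        <Delete basePath="${baseDir}" maxDepth="1">
            <IfFileName glob="application-*.log.gz"/>
            <IfLastModified age="7d"/>
        </Delete>
    </DefaultRolloverStrategy>
</RollingFile>
```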

In any case, the interesting bit is the Gelf appender.

With that one, we are sending the logs directly to the Elastic Stack and defining the fields we want to send.

The appender has some default fields that are sent if you don’t specify anything (check the documentation for details), but I always build on top of that.

In this case I’m sending (the resulting appender configuration is sketched after this list):

· Timestamp: essential.
· Log level: essential.
· Simple class name producing the log: I prefer it over the lengthy fully qualified name.
· Class name: I send it just in case, but I’ve never used it, so I have it filtered out in Logstash.
· Hostname and simple hostname: the same as with the class name, the simple name is usually good enough and is shorter.
· Application name: so I can filter easily and also have different settings in the Elastic Stack depending on the application.
· Thanks to includeFullMdc="true", all the fields added to the Log4j2 Mapped Diagnostic Context (MDC) will be added as fields to the log.
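For reference, the relevant piece of a log4j2.xml built along those lines could look roughly like the sketch below. It assumes the logstash-gelf library’s Gelf appender; the port, the timestamp pattern and the exact field names are assumptions, so check the library’s documentation before copying anything:

```xml
<!-- Sketch only: goes inside <Appenders> in log4j2.xml -->
<Gelf name="gelf" host="tcp:${env:LOGSTASH_PROXY}" port="12201"
      extractStackTrace="true" includeFullMdc="true">
    <Field name="timestamp" pattern="%d{yyyy-MM-dd HH:mm:ss.SSS}" />
    <Field name="level" pattern="%level" />
    <Field name="simpleClassName" pattern="%C{1}" />
    <Field name="className" pattern="%C" />
    <!-- hostname fields; the library also offers simple/fqdn variants -->
    <Field name="server" pattern="%host" />
    <Field name="server.fqdn" pattern="%host{fqdn}" />
    <!-- applicationName is a property declared in the Properties section -->
    <Field name="applicationName" literal="${applicationName}" />
</Gelf>
```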

This feature is very useful as, for example, it means that the token-per-request ID described in the entry “Spring Boot: Setting a unique ID per request” is added automatically. Isn’t it cool?

Configuration and environment variables needed

If you check the production file, you’ll see that baseDir, the directory where the log files are stored, is set to the logs directory at the home of the user running the application ($${env:HOME}/logs).

Change that if you want to store the logs elsewhere.

Also, the address of the machine where Logstash runs is passed in through an environment variable, LOGSTASH_PROXY, because we want to be able to point to a different Logstash server in each environment without redeploying all the applications when its address changes, but that is up to you.

The applicationName variable is something that I unfortunately have to write explicitly in the configuration file, because Log4j2 is configured before Spring Boot has finished starting and I have not found a way to get Log4j2 to use a JVM property.
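As an illustration (not the actual production file), those two values could be declared as Log4j2 properties at the top of the configuration, with “my-application” as a placeholder to edit per service:

```xml
<Properties>
    <!-- Logs directory under the home of the user running the application -->
    <Property name="baseDir">$${env:HOME}/logs</Property>
    <!-- Written by hand because Log4j2 is configured before Spring Boot finishes starting -->
    <Property name="applicationName">my-application</Property>
</Properties>
```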

If someone can find a better way to do it, I’m all ears.

Logstash configuration

In order to be able to accept requests from the library, you have to define a Logstash pipeline with a configuration similar to gelf-pipeline.conf.

We are using TCP instead of UDP because this way we can load-balance Logstash automatically with Fabio, which does not support UDP yet, but if you don’t have that restriction, UDP is fine.
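The real gelf-pipeline.conf is not reproduced here, but a minimal sketch along the lines described below could look like this; the port, the Elasticsearch hosts and the fallback application name are placeholders, and TCP support in the gelf input depends on your plugin version:

```
input {
  # GELF input over TCP so it can be load-balanced
  gelf {
    port    => 12201
    use_tcp => true
    use_udp => false
  }
}

filter {
  # Drop fields we never query so they don't consume space in the indexes
  mutate {
    remove_field => [ "source_host", "className", "facility" ]
  }
  # Make sure applicationName is always present, since the index name depends on it
  if ![applicationName] {
    mutate {
      add_field => { "applicationName" => "unknown-application" }
    }
  }
}

output {
  # Uncomment while testing to see in the console what is received
  # stdout { codec => rubydebug }
  elasticsearch {
    hosts => [ "http://localhost:9200" ]
    index => "%{applicationName}-%{+YYYY.MM.dd}"
  }
}
```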

After that, we remove some of the fields that we never use (source_host, className, facility), so they don’t consume space in the indexes, and then we set the application name in case it is not being sent, because we need it in the next step.

Finally, we define the index based on the application name and the date.

Why? Because Elasticsearch index management, usually done through the curator utility, is based on patterns against the index names, and this way we can, for example, keep the logs of one application for 30 days and the logs of another application for just a week.

Elastic Stack introduced some index lifecycle management options in 6.6, but I still find it easier to manage them through the curator.
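As an illustration of that per-application retention, a curator action file could look roughly like the following; the index prefix and the retention period are made-up examples, not our actual policy:

```yaml
actions:
  1:
    action: delete_indices
    description: Keep the logs of app-one for 30 days
    options:
      ignore_empty_list: True
    filters:
      # Match indexes named app-one-YYYY.MM.dd
      - filtertype: pattern
        kind: prefix
        value: app-one-
      # Delete the ones whose date in the name is older than 30 days
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 30
    # A second action with value: app-two- and unit_count: 7 would keep
    # the other application's logs for just a week.
```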

As the comments in the configuration file tell you, if you uncomment the stdout output, you will be able to see, in the console where you launched Logstash, what is received and then sent to Elasticsearch (or, better yet, comment out the elasticsearch section until you are comfortable with what you are seeing).

That’s pretty useful if you want to test that it really works: install Logstash locally, send the logs there, and check in the console what actually arrives.

Extra bonus

As mentioned earlier, with the configuration we have shown, the Gelf library sends all the MDC fields to be indexed, and that’s something you can take advantage of.

For example, we are using it to add some extra fields like “operation called”, “parameters”, “Authenticated principal”, “client ip” through some AOP magic (see Aspect Oriented Programming with Spring for reference).

But basically, the idea is to have some code like the sketch below to add extra fields automatically.

We use this technique only inside Aspects, so as not to clutter the code, to add some common fields, but it is still possible to parse the message string in Logstash and extract the fields there using the typical grok filter (see the Grok filter plugin for reference).
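As a rough illustration (not our exact code), an aspect that pushes a couple of extra fields into the MDC around a service call could look like this; the package in the pointcut and the field names are placeholders:

```java
import java.util.Arrays;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class LoggingFieldsAspect {

    // Placeholder pointcut: adjust it to match your own service layer
    @Around("execution(* com.example.myapp.service..*(..))")
    public Object addLoggingFields(ProceedingJoinPoint joinPoint) throws Throwable {
        // Everything put into the MDC becomes a separate field thanks to includeFullMdc="true"
        MDC.put("operationCalled", joinPoint.getSignature().toShortString());
        MDC.put("parameters", Arrays.toString(joinPoint.getArgs()));
        try {
            return joinPoint.proceed();
        } finally {
            // Clean up so the fields do not leak into unrelated log lines of the same thread
            MDC.remove("operationCalled");
            MDC.remove("parameters");
        }
    }
}
```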

You can see here a sample log message, as it is displayed in Kibana when looking at the details:

Sample log message details as seen in Kibana

Conclusion

With these settings, one should be able to develop applications that, when deployed, send the logs automatically to the Elastic Stack, without wasting much space due to old log files and with messages that are easier to index and filter.

Don’t forget that what I have shown is just one way of doing it, the one that serves us well; there are lots of things that can be customized to suit a different set of requirements.

In any case, remember not to log everything but just what is necessary, especially in production ;).

Happy coding!

