Centralize logs from Docker applications
This article shows how to centralize logs from a Docker application in a database where we can then query them.
It is built around an example where our application consists of an nginx instance, an Elasticsearch database, and Kibana to render beautiful graphs and diagrams. The code of the example is available on GitHub.
We need to collect and transport our logs as a data flow from a distributed system to a centralized remote location. That way, we can get an aggregated view of the system in near real time.
The logging system is plugged in at the container level because the application should be loosely coupled with the logging system. Depending on the environment (development, pre-production, production) we might not send logs to the same destination: a file for development, Elasticsearch for pre-production, Elasticsearch and HDFS for production.
Architecture
Choosing our middleware
We need a tool to extract the logs from the Docker container and push them in Elasticsearch. In order to do that, we can choose between several tools like Logstash or Fluentd.
Logstash is built by Elastic, is well integrated with Elasticsearch and Kibana, and has lots of plugins. Fluentd describes itself as an open source data collector for a unified logging layer. Docker provides a driver to push logs directly into Fluentd, and Fluentd also has a lot of plugins, including one to connect to Elasticsearch.
I've chosen Fluentd because Docker promotes it and Kubernetes (an important project in the Docker ecosystem) uses it. Furthermore, in our example, Fluentd's Elasticsearch plugin plays well with Kibana.
Infrastructure
We can use two types of infrastructure: either a classic architecture with servers or a cloud-based one. I've chosen the classic one for simplicity's sake.
Therefore two servers are needed, one for our application and Fluentd, and one for our Elasticsearch database and Kibana.
Process
The process can be described in six steps:
- Users connect to our application (nginx) and this generates logs
- Our containerized application sends its logs to stdout and stderr
- Docker intercepts logs from the container and uses its native Fluentd output driver to send them to the Fluentd container running locally
- Fluentd parses and structures logs
- Structured data is sent to Elasticsearch in batches, so we might have to wait a minute or two for the data to arrive in Elasticsearch. We can parameterize this behavior with the buffer plugins (see the sketch after this list)
- Data is exposed to administrators through graphs and diagrams with Kibana
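For instance, buffering can be tuned directly on the Elasticsearch output step that we configure later in this article. A minimal sketch; the host and the values are illustrative, not recommendations:
<match nginx.docker.**>
  type elasticsearch
  hosts http://elasticsearch.host.com:9200
  logstash_format true
  # Buffer settings: keep chunks in memory and flush them every 10 seconds
  buffer_type memory
  flush_interval 10s
  buffer_chunk_limit 8m
  buffer_queue_limit 64
</match>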
Application
As the application is a simple nginx, I've packaged a new image since the official one uses a custom logger that is not appropriate for our purpose. We can run the app using docker-compose up with the following configuration:
$ cat ./docker-compose.yml
nginx:
  image: thibautgery/docker-nginx
  ports:
    - 8080:80
Fluentd
Fluentd is a middleware to collect logs; the logs flow through steps and are identified with tags. Here is a simple configuration with two steps that receives logs through HTTP and prints them to stdout:
$ cat ./fluentd/fluentd.sample
<source>
  @type http
  port 8888
</source>

<match myapp.access>
  @type stdout
</match>
In this sample, each step defines how the data is processed:
- The first step defines how to capture the data, here on port 8888 using HTTP.
- The second step defines how to output the data, in this case by printing it to stdout.
The data is streamed through Fluentd. Each chunk of data is tagged with a label which is used to route the data between the different steps.
In the previous example, the tag is specified after the match keyword: myapp.access. With the HTTP input, the tag of the incoming data is taken from the path of the request. For example, running curl 'http://localhost:8888/myapp.access?json={"event":"data"}' outputs {"event":"data"} to stdout.
This slideshare explains the basics of Fluentd.
Each step is a plugin. There are more than 150 plugins divided into 6 categories. The most important ones are:
- Input plugins to accept and parse data
- Output plugins to send the data to external systems, in our example Elasticsearch
Configuration
Docker pushes its logs to Fluentd
First of all, the Fluentd agent can run anywhere, but for simplicity's sake we run it on the same node as the application. The official image can be found on Docker Hub.
We need to configure the Fluentd agent:
$ cat ./conf/Fluentd
<source>
  @type forward
</source>

<match nginx.docker.**>
  @type stdout
</match>
Fluentd accepts connections on port 24224 and prints logs to stdout thanks to two built-in plugins, in_forward and out_stdout.
We can run Fluentd with docker-compose -f docker-compose-fluentd.yml up:
$ cat ./docker-compose-fluentd.yml
fluentd:
  image: fluent/fluentd
  restart: always
  ports:
    - 24224:24224
  volumes:
    - ./conf:/fluentd/etc
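Once the agent is up, we can quickly check that the forward input accepts data. This is just a sketch; it assumes the fluentd gem is installed on the host, which provides the fluent-cat utility that sends events to localhost:24224 by default:
# Send a test event tagged nginx.docker.test to the local agent
$ echo '{"message": "hello"}' | fluent-cat nginx.docker.test
The event should then be printed on the Fluentd container's stdout, since the tag matches nginx.docker.**.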
The default logging driver for Docker is json-file. We can use the log-driver option to select fluentd; by default, the fluentd driver connects to localhost on port 24224.
$ cat ./docker-compose.yml
nginx:
  image: thibautgery/docker-nginx
  ports:
    - 8080:80
  log_driver: fluentd
  log_opt:
    fluentd-tag: "nginx.docker.{{.Name}}"
The Docker driver uses a default Fluentd tag of docker._container-id_. We override it to nginx.docker._container-name_ with the log_opt entry fluentd-tag: "nginx.docker.{{.Name}}". The tag set by the Docker driver must match the pattern used in the Fluentd configuration. We should now see the nginx logs in the Fluentd container's output.
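For reference, the same configuration can be expressed without Compose. A sketch, assuming Docker 1.9+ and a Fluentd agent listening on the default address:
# Run nginx and ship its stdout/stderr to the local Fluentd agent
$ docker run -d -p 8080:80 \
    --log-driver=fluentd \
    --log-opt fluentd-tag="nginx.docker.{{.Name}}" \
    thibautgery/docker-nginx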
Right now, our system is useless. We need to send logs to a remote database, Elasticsearch.
Fluentd pushes its logs to Elasticsearch
Fluentd needs the fluent-plugin-elasticsearch plugin in order to send data to Elasticsearch. I have packaged the image here.
We need to update the Fluentd agent configuration:
$ cat ./conf/Fluentd
<source>
  @type forward
</source>

<match nginx.docker.**>
  type elasticsearch
  hosts http://elasticsearch.host.com:9200
  logstash_format true
</match>
Don't forget to change the hosts to point to the Elasticsearch instance.
The logstash_format true setting writes data into Elasticsearch in a Logstash-compliant format (logstash-* indices with a @timestamp field), which allows us to leverage Kibana.
We can run Fluentd with:
$ cat ./docker-compose-fluentd.yml
fluentd:
  image: thibautgery/fluent.d-es
  ports:
    - 24224:24224
  volumes:
    - ./conf:/fluentd/etc
Then we can run the application and query it with our favorite browser to fetch some lines from Elasticsearch in the Logstash index. Since Fluentd buffers the data before sending it in batches, we might have to wait a minute or two.
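To check that documents are actually arriving, we can also query Elasticsearch directly. A minimal sketch, assuming Elasticsearch is reachable at the host configured above:
# Count the documents stored in the Logstash-formatted indices
$ curl 'http://elasticsearch.host.com:9200/logstash-*/_count?pretty'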
Unfortunately, only the Docker metadata (like the container name, labels, and id) are structured; the log field contains the raw nginx log line, which is not structured. For example, we cannot query all failed HTTP requests (status code >= 400). These log lines need to be parsed.
Structure the application logs
Fluentd needs the fluent-plugin-parser plugin in order to parse a specific field a second time. I have packaged the image with it here.
We need to update the configuration:
$ cat ./conf/Fluentd
<source>
  @type forward
</source>

<match nginx.docker.**>
  type parser
  key_name log
  format nginx
  remove_prefix nginx
  reserve_data yes
</match>

<match docker.**>
  type elasticsearch
  hosts http://elasticsearch.host.com:9200
  logstash_format true
</match>
The second block of configuration:
- uses the parser plugin
- parses the log field
- parses it using the pre-built nginx regex
- removes the nginx prefix from the tag nginx.docker.**
- keeps the existing information in the message and re-emits it as docker.**
Here we use the tag concept to route the data through the correct steps:
- the data arrives with the tag set by the Docker driver: nginx.docker._container-name_
- Fluentd routes it to the second step (the parser)
- the tag is rewritten to docker._container-name_
- the data goes through the third step
- the data is sent to Elasticsearch
Here is the kind of structured data we can now use to create diagrams:
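A parsed record looks roughly like the following; the field names come from Fluentd's built-in nginx format and the Docker metadata, while the values are purely illustrative:
{
  "container_name": "/nginx_1",
  "remote": "203.0.113.10",
  "method": "GET",
  "path": "/index.html",
  "code": "404",
  "size": "571",
  "agent": "curl/7.43.0"
}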
We can then run the application and query it with our favorite browser to see the data correctly formatted in Kibana.
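With the log line parsed, the query that was impossible before becomes straightforward. A sketch against Elasticsearch, assuming the same host and the code field produced by the nginx format:
# Find all failed HTTP requests (status code 4xx or 5xx) in the Logstash indices
$ curl 'http://elasticsearch.host.com:9200/logstash-*/_search?q=code:[400%20TO%20599]&pretty'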
Run it
Our system collects logs from our application and sends them to Elasticsearch. The Docker engine requires Fluentd to be up and running in order to start our container.
Even if Fluentd dies afterwards, the containers using it continue to work properly. Furthermore, if Fluentd stops for a short period of time, we do not lose any logs: the Docker engine buffers unsent messages and sends them once Fluentd is back online.
Finally, since Docker 1.9 we can forward container labels and environment variables through the Docker logging driver. In our example, we added the label service: nginx and it shows up in Kibana.
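A sketch of what this could look like in the Compose file; the service label and the labels logging option are assumptions based on the Docker 1.9 logging options:
nginx:
  image: thibautgery/docker-nginx
  labels:
    service: nginx
  log_driver: fluentd
  log_opt:
    fluentd-tag: "nginx.docker.{{.Name}}"
    labels: service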
We are now able to create graphs such as this one:
Conclusion
So far, we have seen how to collect and structure logs from Docker and push them into Elasticsearch. We can easily swap the Elasticsearch plugin for the MongoDB or HDFS plugin and push logs to the database of our choice. We can also add an alerting system like Zabbix.
We can add nodes to our infrastructure and run several containers on one node. Keep in mind that this article doesn't cover everything. For instance, we have not answered the following questions:
- how to monitor the Docker container running Fluentd?
- how to keep the monitoring system highly available?
Run everything in two commands with the Ansible-scripted repository.