Centralized logs with Loki and Vector

Context

In this post we will be using Loki and Vector to centralize logs from a docker-compose application.

In our previous post we saw how to run a sidecar pattern in docker-compose, so we will be using the same pattern here; but besides sharing the network namespace, we will also share the log files via a shared volume.

Services

In our example we will be using:

  • Loki: A log aggregation system

  • Vector: A log aggregator that can be used to retrieve and push logs using multiple data sources and data sinks.

  • Fake Service: A simple service that will be used to generate logs

docker-compose.yml

So let’s take a look at the docker-compose.yml file.

First let’s declare the services:

version: '3.7'
services:
  loki:
    image: grafana/loki:2.2.1
    command: -config.file=/etc/loki/loki-config.yaml
    ports:
      - 3100:3100
    volumes:
      - ./loki:/etc/loki
      - ./loki-data:/loki-data
  fakeservice:
    image: nicholasjackson/fake-service:v0.24.2
    environment:
      LOG_OUTPUT: /var/log/web.log
      LOG_FORMAT: json
    volumes:
      - fakeservice:/var/log
  vector:
    image: timberio/vector:latest-alpine
    command: --config /etc/vector/vector.toml
    volumes:
      - ./vector:/etc/vector

Here we are bind-mounting host directories into the containers for the configuration of both vector and loki, and we are also using a named volume for the fakeservice logs.

Now let’s set vector up as a sidecar: it will share the network namespace of the fakeservice container, and it will also share the log files via the named volume.

We do this by adding network_mode: "service:fakeservice" to the vector service, so it joins the same network namespace as fakeservice, and by mounting the same named volume so the log path is identical in both containers.

We also need to declare the named volume under the top-level volumes key of the docker-compose file.

This is how our final docker-compose.yml should look:

version: '3.7'
services:
  loki:
    image: grafana/loki:2.2.1
    command: -config.file=/etc/loki/loki-config.yaml
    ports:
      - 3100:3100
    volumes:
      - ./loki:/etc/loki
      - ./loki-data:/loki-data
  fakeservice:
    image: nicholasjackson/fake-service:v0.24.2
    environment:
      LOG_OUTPUT: /var/log/web.log
      LOG_FORMAT: json
    volumes:
      - fakeservice:/var/log
  vector:
    image: timberio/vector:latest-alpine
    command: --config /etc/vector/vector.toml
    volumes:
      - ./vector:/etc/vector
      - fakeservice:/var/log
    network_mode: "service:fakeservice"
volumes:
  fakeservice:

But we are still missing the configuration for both loki and vector, so let’s take a look at that.

Loki configuration

For the loki configuration we will start from the local configuration example provided by Grafana Labs in the loki repository.

We will only change the listener address to allow access from everywhere for simplicity.

#Default values from https://github.com/grafana/loki/blob/master/cmd/loki/loki-local-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  lifecycler:
    address: 0.0.0.0
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1MB (1048576 bytes), flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /tmp/loki/chunks

compactor:
  working_directory: /tmp/loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  storage:
    type: local
    local:
      directory: /tmp/loki/rules
  rule_path: /tmp/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true

The important parts for our use case are the storage_config, where we use the filesystem to store the log chunks, and the schema_config, where we use boltdb-shipper as the index store.

Also, ingester.lifecycler.address is set to 0.0.0.0.

And that’s it for our loki configuration in this example.

Vector configuration

For the vector configuration we will be using:

  • Source
    • Which will be a file source in our case as we will be reading the logs from the shared volume
  • Transform
    • Which will be a remap transform to parse the JSON logs (this example previously used the json_parser transform type, which has since been deprecated in favour of remap)
  • Sink
    • Which will be a loki sink to send the logs to loki

[sources.logs]
type = "file"
ignore_older_secs = 600
include = ["/var/log/web.log"]
read_from = "beginning"

Because we are sharing the volume, and we set LOG_OUTPUT to /var/log/web.log, we can just reference the path directly.
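
Once the stack is up (we will start it below), we can sanity-check that vector actually sees the file through the shared volume; the alpine image ships the usual busybox tools, so something like this should do:

docker-compose exec vector ls -l /var/log/web.log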

Our transform will be a remap transform, which will parse the json logs.

[transforms.logs_json]
inputs       = ["logs"]
type         = "remap"
source       = ". = parse_json!(.message)"
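
To make the remap concrete: the file source emits each raw log line under the message key, and the . = parse_json!(.message) expression replaces the whole event with the parsed object, so the fake-service fields become top-level event fields. Roughly (illustrative only, fields abbreviated):

# event as read by the file source, before the remap
{"message": "{\"@level\":\"info\",\"@message\":\"Handle inbound request\", ...}", "file": "/var/log/web.log"}

# event after the remap
{"@level": "info", "@message": "Handle inbound request", ...}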

And finally our sink will be a loki sink, which will send the logs to loki.

[sinks.loki]
type = "loki"
inputs = ["logs_json"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels = {app="web", env="production"}

We are labeling the logs with app and env labels, which will be useful later when querying the logs through the loki API.

And that’s it for our vector configuration in this example.
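
Optionally, vector ships a validate subcommand that can lint the configuration before we wire everything together. Assuming the image’s entrypoint is the vector binary and the --no-environment flag is available in this version (it skips checks that need the sources and sinks to be reachable), something along these lines should work:

docker-compose run --rm vector validate --no-environment /etc/vector/vector.toml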

Running the example and caveats

Now that we have our docker-compose and our configuration files, we can run the example.

docker-compose up -d
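
With the containers running, we can quickly confirm that vector really is sharing the fakeservice network namespace (purely a sanity check, not required for the setup):

docker inspect -f '{{.HostConfig.NetworkMode}}' "$(docker-compose ps -q vector)"

This should print something like container:<id>, where the id belongs to the fakeservice container.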

Since we start all services without any dependency ordering, if we check the vector service logs we will see that it fails to connect to loki, as loki is not ready yet.

docker-compose logs vector
vector_1       | 2023-01-29T14:44:54.061551Z ERROR vector::topology::builder: msg="Healthcheck failed." error=A non-successful status returned: 503 Service Unavailable component_kind="sink" component_type="loki" component_id=loki component_name=loki
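
Loki itself usually becomes ready a few seconds after starting, which we can confirm directly through its readiness endpoint:

curl localhost:3100/ready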

Vector will eventually retry sending the logs, but the topology builder’s healthcheck is not retried, so we need to either restart the vector service or, as we will do here, add healthcheck = false to the loki sink in our vector configuration.

[sinks.loki]
type = "loki"
inputs = ["logs_json"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels = {app="web", env="production"}
healthcheck = false

So let’s restart all our services and keep healthcheck = false in place; otherwise vector won’t retry the connection at boot time.

docker-compose down && docker-compose up -d

Now if we check the vector logs, instead of a failed healthcheck we should see that the healthcheck is disabled.

docker-compose logs vector
vector_1       | 2023-01-29T15:10:43.649612Z  INFO vector::topology::running: Running healthchecks.
vector_1       | 2023-01-29T15:10:43.649740Z  INFO vector::topology::builder: Healthcheck disabled.

Now we need to generate some logs so we can check that everything is working as expected.

docker-compose exec fakeservice curl localhost:9090

We should get a response along the lines of:

{
  "name": "Service",
  "uri": "/",
  "type": "HTTP",
  "ip_addresses": [
    "192.168.80.3"
  ],
  "start_time": "2023-01-29T14:47:31.464335",
  "end_time": "2023-01-29T14:47:31.465316",
  "duration": "980.833µs",
  "body": "Hello World",
  "code": 200
}

We can rerun the curl command above a couple of times, and then check our logs in loki using its API.
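
For example, a small shell loop generates a handful of requests in one go:

for i in $(seq 1 5); do docker-compose exec fakeservice curl -s localhost:9090 > /dev/null; done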

Given we are setting the labels app and env, we can query the logs using those labels.

curl localhost:3100/loki/api/v1/query --data-urlencode 'query={app="web",env="production"}' | jq

And we should get a response from loki similar to the following sample output:

{
  "status": "success",
  "data": {
    "resultType": "streams",
    "result": [
      {
        "stream": {
          "app": "web",
          "env": "production"
        },
        "values": [
          [
            "1675005378303990220",
            "{\"@level\":\"info\",\"@message\":\"Finished handling request\",\"@timestamp\":\"2023-01-29T15:16:16.312688Z\",\"duration\":78750}"
          ],
          [
            "1675005378303987762",
            "{\"@level\":\"info\",\"@message\":\"Handle inbound request\",\"@timestamp\":\"2023-01-29T15:16:16.312632Z\",\"request\":\"GET / HTTP/1.1\\nHost: localhost:9090\\nuser-agent: curl/7.83.1\\naccept: */*\"}"
          ],
          [
            "1675005378303984595",
            "{\"@level\":\"info\",\"@message\":\"Finished handling request\",\"@timestamp\":\"2023-01-29T15:16:15.410187Z\",\"duration\":236708}"
          ],
          [
            "1675005378303981095",
            "{\"@level\":\"info\",\"@message\":\"Handle inbound request\",\"@timestamp\":\"2023-01-29T15:16:15.409971Z\",\"request\":\"GET / HTTP/1.1\\nHost: localhost:9090\\nuser-agent: curl/7.83.1\\naccept: */*\"}"
          ],
          [
            "1675005378303975554",
            "{\"@level\":\"info\",\"@message\":\"Finished handling request\",\"@timestamp\":\"2023-01-29T15:16:14.335484Z\",\"duration\":117250}"
          ],
          [
            "1675005378303918637",
            "{\"@level\":\"info\",\"@message\":\"Handle inbound request\",\"@timestamp\":\"2023-01-29T15:16:14.335391Z\",\"request\":\"GET / HTTP/1.1\\nHost: localhost:9090\\nuser-agent: curl/7.83.1\\naccept: */*\"}"
          ]
        ]
      }
    ],
    "stats": {
      "summary": {
        "bytesProcessedPerSecond": 196694,
        "linesProcessedPerSecond": 2625,
        "totalBytesProcessed": 899,
        "totalLinesProcessed": 12,
        "execTime": 0.004570542
      },
      "store": {
        "totalChunksRef": 0,
        "totalChunksDownloaded": 0,
        "chunksDownloadTime": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      },
      "ingester": {
        "totalReached": 1,
        "totalChunksMatched": 1,
        "totalBatches": 1,
        "totalLinesSent": 6,
        "headChunkBytes": 899,
        "headChunkLines": 12,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      }
    }
  }
}
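
To pull out just the raw log lines from that response, we can pipe the same query through a slightly more specific jq filter:

curl -s localhost:3100/loki/api/v1/query --data-urlencode 'query={app="web",env="production"}' | jq -r '.data.result[].values[][1]'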

If loki has not received any logs yet, we will instead see the following JSON output:

{
  "status": "success",
  "data": {
    "resultType": "streams",
    "result": [],
    "stats": {
      "summary": {
        "bytesProcessedPerSecond": 0,
        "linesProcessedPerSecond": 0,
        "totalBytesProcessed": 0,
        "totalLinesProcessed": 0,
        "execTime": 0.005716292
      },
      "store": {
        "totalChunksRef": 0,
        "totalChunksDownloaded": 0,
        "chunksDownloadTime": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      },
      "ingester": {
        "totalReached": 1,
        "totalChunksMatched": 1,
        "totalBatches": 0,
        "totalLinesSent": 0,
        "headChunkBytes": 0,
        "headChunkLines": 0,
        "decompressedBytes": 0,
        "decompressedLines": 0,
        "compressedBytes": 0,
        "totalDuplicates": 0
      }
    }
  }
}

If that happens, it’s probably because vector’s buffer hasn’t flushed yet, so we can force more logs to be generated by running the curl command through docker-compose again.

docker-compose exec fakeservice curl localhost:9090
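
Once logs are flowing, we can also use the query_range endpoint together with a LogQL line filter to narrow down the results, for example:

curl -s -G localhost:3100/loki/api/v1/query_range --data-urlencode 'query={app="web"} |= "Handle inbound request"' --data-urlencode 'limit=10' | jq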

Conclusion

And that’s it, we have a simple logging pipeline set up using vector and loki.

This approach of using a file-based source is helpful when you don’t have the ability to change the logging driver, or when you don’t want to ship logs over the network directly from the service to loki’s HTTP endpoint.

I’ve used this in Nomad to scrape allocation logs and send them to loki, segregating each job with its own loki and grafana. But that is a post for another day.

As usual, the code for this post is available on github under the vector-loki-docker repo.

This post is licensed under CC BY 4.0 by the author.