Logging to Amazon S3

Has anyone built, or does anyone know how to build, a logging framework that deposits the log files on Amazon S3?

We're building a SaaS app, and, naturally, we plan to have a zillion servers and customers. I'd like to simplify the entire logging structure. At the moment we use SLF4j for logging and Logback as the actual logging implementation.

I'm thinking that we could drop in another implementation that would accumulate log messages in memory, compress them, and then upload them to S3. If the S3 connection were down or slow, the implementation would queue the files up on disk. Kind of like Facebook's Scribe.
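As a rough sketch of that idea (the class and method names are mine, and the S3 upload is stubbed out rather than wired to the real AWS SDK), the accumulate-compress-spill part might look like this:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Sketch only: buffers log lines in memory, gzips the batch, and tries to
// upload; if the upload fails (S3 down or slow), the batch is queued on
// disk for a later retry -- the Scribe-like behavior described above.
class S3LogBuffer {
    private final List<String> buffer = new ArrayList<>();
    private final Path spillDir;

    S3LogBuffer(Path spillDir) {
        this.spillDir = spillDir;
    }

    synchronized void append(String line) {
        buffer.add(line);
    }

    // Gzip the buffered lines into a single byte array.
    synchronized byte[] compress() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            for (String line : buffer) {
                gz.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        return bytes.toByteArray();
    }

    // Try to upload the batch; on failure, spill it to local disk.
    synchronized void flush() throws IOException {
        byte[] batch = compress();
        try {
            uploadToS3(batch); // stub below
        } catch (IOException e) {
            // S3 unreachable: queue the compressed batch on disk
            Path spill = spillDir.resolve("batch-" + System.nanoTime() + ".log.gz");
            Files.write(spill, batch);
        }
        buffer.clear();
    }

    // Placeholder: a real implementation would call the AWS SDK here,
    // e.g. an S3 client's putObject with the compressed bytes.
    protected void uploadToS3(byte[] batch) throws IOException {
        throw new IOException("no S3 client wired up in this sketch");
    }
}
```

A real implementation would also need a background thread to retry the spilled batches, which is most of those four or five days.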

My guess is that it would take me four or five days to write and test this. If there's another implementation out there, I'd love to know about it.



Solution 1:[1]

There is a plugin for fluentd that writes log files to S3. (Fluentd is a nice "log file collector".)

Read more about it here: https://docs.fluentd.org/output/s3

If the S3 connection is down or slow, it will buffer the output for you.
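For illustration, a minimal match section for the S3 output plugin might look like this (bucket name, region, and paths are placeholders; the buffer section is what gives you the disk queueing):

```
<match app.logs>
  @type s3
  s3_bucket my-log-bucket        # placeholder bucket name
  s3_region us-east-1
  path logs/
  <buffer>
    @type file                   # buffer chunks on disk while S3 is down/slow
    path /var/log/fluentd/s3-buffer
    timekey 3600                 # flush a chunk roughly every hour
  </buffer>
</match>
```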

Solution 2:[2]

You can write a custom appender for Logback or Log4j 2 and reference it in the respective configuration.

This way, you don't have to write an entire logging framework, only the part you need, and you reuse the rest of a working framework.

There are also a few of these on GitHub, for instance shuwada/logback-s3.
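Wiring a custom appender into logback.xml looks like wiring any other appender (the class name and property below are illustrative placeholders, not from any real project):

```xml
<configuration>
  <!-- "com.example.S3Appender" is a placeholder for your custom appender class -->
  <appender name="S3" class="com.example.S3Appender">
    <bucket>my-log-bucket</bucket>
  </appender>

  <root level="INFO">
    <appender-ref ref="S3" />
  </root>
</configuration>
```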

Solution 3:[3]

I was searching Google for the same question, but I'm slightly better off: I know how to log to S3. There is no out-of-the-box solution.

I suggest something like a FUSE filesystem for S3, so you can mount a bucket as your syslog directory: https://github.com/s3fs-fuse/s3fs-fuse

Now all you need is to mount it in your log-parser system as well, which can be any off-the-shelf system that reads logs from a directory.
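As a sketch (bucket name, mount point, and file names are placeholders), the mount can go in /etc/fstab and rsyslog can be pointed at the mounted directory:

```
# /etc/fstab -- mount the bucket via s3fs
my-log-bucket  /mnt/s3-logs  fuse.s3fs  _netdev,allow_other  0 0

# /etc/rsyslog.d/s3.conf -- write all messages into the mounted bucket
*.*  /mnt/s3-logs/syslog.log
```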

This is what I came up with.

What I am still investigating before implementing this is the performance of logging to S3, since AWS has its own quirks. There was a discussion on tuning block/file size to improve performance and lower read/write costs. Hope this helps another lost soul.

Solution 4:[4]

I was looking for something similar. I'm already using winston.js for logging, but I found a plugin that lets you save your logs to AWS S3 (Winston-S3). I haven't tried it yet, but I will shortly.

It shouldn't be too difficult to show those logs in some admin page.

Solution 5:[5]

You could log to Datadog and then configure log archiving. For example, you could have containers write logs to stdout as JSON and have the Datadog agent or fluentd/fluentbit forward those logs to Datadog. Datadog would automatically parse and index the logs for rapid searching. Other logging formats also work if you write your own parsing rules. At the end of the retention period, if you have log archiving configured, Datadog will automatically upload the logs to S3 for you.

The disadvantages are being locked into Datadog and the price. The advantage is that you can easily re-hydrate the logs back into Datadog for fast searching, and you don't have to worry about maintaining a self-hosted solution.

If you want an open-source alternative, you could try out Loki. It has an S3 storage backend.
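For example, the relevant fragment of a Loki configuration pointing its storage at S3 might look like this (region and bucket are placeholders; consult Loki's storage_config documentation for the full set of options):

```yaml
storage_config:
  aws:
    s3: s3://us-east-1/my-loki-bucket   # region/bucket are placeholders
```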

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Xavier Guihot
Solution 3 Gunith D
Solution 4 Eric Dela Cruz
Solution 5 Almenon