Solved: goaccess: Support structured log formats such as JSON
Accepted Answer
Native JSON support has been added. Feel free to build from development to test this out. It will be pushed out in the upcoming release (v1.4.4). In addition, I added a quick pre-defined format for Caddy, e.g.,
# goaccess access.log --log-format=CADDY -o report.html
or from a remote server:
# ssh -n root@server 'tail -f access.log' | goaccess - --log-format=CADDY -o report.html --real-time-html
Note: SSH requires -n so GoAccess can read from stdin. Also, make sure to use SSH keys for authentication, as it won't work if a passphrase is required. Finally, please make sure to build using --with-openssl so that GoAccess can use libssl to include the cipher and TLS version in the report.
The CADDY format is composed of these fields:
{
"ts": "%x.%^",
"request": {
"remote_addr": "%h:%^",
"proto": "%H",
"method": "%m",
"host": "%v",
"uri": "%U",
"headers": {
"User-Agent": ["%u"]
},
"tls": {
"cipher_suite": "%k",
"proto": "%K"
}
},
"duration": "%T",
"size": "%b",
"status": "%s",
"resp_headers": {
"Content-Type": ["%M"]
}
}
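To make the mapping above easier to read, here is a minimal Python sketch that pulls the same keys out of one JSON log line. It only illustrates which key each specifier points at; it is not how GoAccess parses internally. The names CADDY_FIELDS, lookup and extract are made up for this example.

```python
import json

# Specifier -> key path, transcribed from the CADDY format listed above.
CADDY_FIELDS = {
    "%x": ("ts",),
    "%h": ("request", "remote_addr"),
    "%H": ("request", "proto"),
    "%m": ("request", "method"),
    "%v": ("request", "host"),
    "%U": ("request", "uri"),
    "%u": ("request", "headers", "User-Agent", 0),
    "%k": ("request", "tls", "cipher_suite"),
    "%K": ("request", "tls", "proto"),
    "%T": ("duration",),
    "%b": ("size",),
    "%s": ("status",),
    "%M": ("resp_headers", "Content-Type", 0),
}


def lookup(record, path):
    """Walk nested dicts/lists along path; return None if a step is missing."""
    node = record
    for key in path:
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError):
            return None
    return node


def extract(line):
    """Return a dict of specifier -> value for one JSON log line."""
    record = json.loads(line)
    fields = {spec: lookup(record, path) for spec, path in CADDY_FIELDS.items()}
    # "%h:%^" keeps the address and discards the trailing ":port".
    if isinstance(fields["%h"], str):
        fields["%h"] = fields["%h"].rsplit(":", 1)[0]
    # "%x.%^" keeps the part of the timestamp before the dot.
    if fields["%x"] is not None:
        fields["%x"] = int(fields["%x"])
    return fields
```

For a plain HTTP request without a "tls" object, extract() returns None for %k and %K instead of failing.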
Thanks again for all the info provided.
Closing this, feel free to reopen it if needed.
Other Answers:
@ArpitKotecha It's certainly in the pipeline. Stay tuned!
@rumpelsepp You certainly can, I saw this post https://alexmv12.xyz/blog/goaccess_caddy/, so you could derive a one-liner:
example.com:X 127.0.0.1 [30/Mar/2020:19:38:34 CST] "GET / HTTP/2.0" 200 2326 "http://example.com/bot.html" "curl/7.64.1" 1.4711e-05
jq -j '.ts |= strftime("%d/%b/%Y:%H:%M:%S %Z") | .request.remote_addr |= .[:-6] | .request.host, ":X ", .request.remote_addr, " [", .ts ,"]", " \"", .request.method, " ", .request.uri, " ", .request.proto, "\" ", .status, " ", .size, " \"", .request.headers.Referer[0] // "-","\"", " \"", .request.headers."User-Agent"[0] // "-","\"", " ", .latency, "\n"' access.log | goaccess - --log-format=VCOMBINED
or to capture the latency:
jq -j '.ts |= strftime("%d/%b/%Y:%H:%M:%S %Z") | .request.remote_addr |= .[:-6] | .request.host, ":X ", .request.remote_addr, " [", .ts ,"]", " \"", .request.method, " ", .request.uri, " ", .request.proto, "\" ", .status, " ", .size, " \"", .request.headers.Referer[0] // "-","\"", " \"", .request.headers."User-Agent"[0] // "-","\"", " ", .latency, "\n"' ../logs/caddy.log | ./goaccess - --log-format='%v:%^ %h %^[%d:%t %^] "%r" %s %b "%R" "%u" %T' --date-format=%d/%b/%Y --time-format=%T
I'm getting closer to the point where I can implement this natively.
As an interim workaround while this works its way along the pipeline, I have created a Python script that allows Caddy JSON data logs to be streamed to GoAccess in real time. Details and the code can be found on my GitHub site, CaddyGoAccessDataLoggerConverter.
It may be that this approach can be adapted for other providers of structured logs too.
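For anyone who wants to see the shape of such a conversion, here is a minimal sketch written independently of the script linked above; it is not that script's code. It mirrors the jq filter from the earlier comment and assumes the field names of the sample log in this thread ("ts" as epoch seconds, "latency", "request.remote_addr" as host:port). The function names first_header and to_vcombined are made up for this example.

```python
import json
from datetime import datetime, timezone


def first_header(headers, name):
    """Return the first value of a header, or "-" if it is absent or empty."""
    values = headers.get(name) or ["-"]
    return values[0]


def to_vcombined(line):
    """Convert one Caddy JSON log line to a VCOMBINED-style line plus latency."""
    rec = json.loads(line)
    req = rec["request"]
    headers = req.get("headers", {})
    # Epoch seconds -> "dd/Mon/YYYY:HH:MM:SS +0000", always in UTC.
    ts = datetime.fromtimestamp(rec["ts"], tz=timezone.utc).strftime(
        "%d/%b/%Y:%H:%M:%S %z"
    )
    # Drop the ":port" suffix (the jq filter slices off a fixed 6 characters).
    client = req["remote_addr"].rsplit(":", 1)[0]
    return '{}:X {} [{}] "{} {} {}" {} {} "{}" "{}" {}'.format(
        req["host"], client, ts,
        req["method"], req["uri"], req["proto"],
        rec["status"], rec["size"],
        first_header(headers, "Referer"),
        first_header(headers, "User-Agent"),
        rec["latency"],  # assumption: key is named "latency" as in the sample
    )
```

Calling to_vcombined() on each line read from stdin and printing the result (flushed) would let the output be piped into goaccess with the same --log-format options as the jq one-liners.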
Matt, thanks for pointing this out. I do agree that having structured logs seems pretty useful, especially for debugging purposes. I think having an easier way to parse such logs in goaccess would be awesome and I can certainly start looking into this.
I guess a quick way of having this implemented would be to use a JSON parser but I'm positive that would impact the performance of the current parser. One of the main goals of goaccess is the ability to parse a log as fast as possible while being flexible enough to process virtually any log format. To accomplish this it performs a lot of pointer arithmetic while sacrificing the ease of writing a custom format. Nonetheless, I certainly believe that there's room for improvement in this area and I plan to work on it.
As far as JSON objects not necessarily being ordered, is there a case where Caddy would log requests in different formats in the same access log? For instance, you mentioned that header fields vary, TLS may or may not be used, etc. Does that mean that the same access log may or may not contain certain fields and will depend on the request? I ask because that would certainly make things a bit more tricky for the current parser. If that is the case, feel free to post a few lines straight from your access log and I may be able to find a pattern in it.
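To illustrate the concern, here is a small Python sketch (not GoAccess code) of a key-based reader. Unlike a positional format string, it gives the same result whatever the key order is, and it falls back to "-" when the "tls" object is absent. The two sample lines are hypothetical, trimmed-down entries, and read_fields is a name made up for this example.

```python
import json


def read_fields(line):
    """Pick a few fields by key; key order and a missing "tls" do not matter."""
    rec = json.loads(line)
    req = rec.get("request", {})
    tls = req.get("tls") or {}
    return {
        "method": req.get("method", "-"),
        "uri": req.get("uri", "-"),
        "status": rec.get("status", "-"),
        "tls_proto": tls.get("proto", "-"),
    }


# Hypothetical lines: same request, different key order, second one without TLS.
with_tls = '{"status": 200, "request": {"method": "GET", "uri": "/", "tls": {"proto": "h2"}}}'
without_tls = '{"request": {"uri": "/", "method": "GET"}, "status": 200}'

for sample in (with_tls, without_tls):
    print(read_fields(sample))
```

The price is a full JSON parse per line, which is the performance trade-off mentioned above.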
For what it's worth, the following format should work for the sample line you posted above (it assumes there's a space between the key and the value in most cases, "key": "val"):
{ "level": "info", "ts": 1585597114.7687502, "logger": "http.log.access", "msg": "handled request", "request": { "method": "GET", "uri": "/", "proto": "HTTP/2.0", "remote_addr": "127.0.0.1:50876", "host": "example.com", "headers": { "User-Agent": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 Edg/81.0.416.72" ], "Accept": [ "*/*" ] }, "tls": { "resumed": false, "version": 771, "ciphersuite": 49196, "proto": "h2", "proto_mutual": true, "server_name": "example.com" } }, "latency": 0.000014711, "size": 2326, "status": 200, "resp_headers": { "Server": [ "Caddy" ], "Content-Type": ["text/html"] } }
goaccess access.log --log-format='%^:%^: %x.%^ %^{%^: "%m"%^: "%U"%^: "%H"%^: "%h:%^"%^: "%v"%^[ "%u"%^]%^}%^}%^}%^: %T,%^: %b,%^: %s, %^' --date-format=%s --time-format=%s
About the name, the "go" comes from my initials.
If I understand correctly, GoAccess needs to extract certain fields from mostly-arbitrary lines of data, in order to do its job.
Structured logs make this very easy by using a formally-structured encoding such as JSON or logfmt to key fields to values. Caddy adopts this approach since it is extremely useful for log aggregation and processing. In fact, Common Log Format and Combined and other antiquated standard formats don't provide enough information to be useful in advanced server applications.
An example access log from Caddy contains much more information than most traditional log formats such as CLF or Combined:
(In reality, log entries do not have newlines; I have formatted this one for readability.)
Logs that are emitted in this manner (instead of a custom format) also benefit from performance improvements; these logs require no allocations.
Unfortunately, it seems a bit tedious (impossible?) to get GoAccess to work with structured logs. For example, I do not think there is a way to skip until a next substring -- and even that only works if we know the order of the fields. (In Caddy's case, the field order is fairly deterministic since the JSON encoder is append-only, but JSON objects in general are not necessarily ordered.)
As you can see, all the required information (and much more!) is available in that log emission, but I don't know how to make GoAccess work with it: header fields vary, TLS may or may not be used, etc. The formats of durations and timestamps can even be customized, so that shouldn't be a problem for compatibility.
The main problem is that there doesn't seem to be a way to parse a structured format such as JSON and then specify fields from which to extract the needed information.
For example:
Does that make sense?
I am new to this project, and was about to jump in and contribute JSON support, but then realized that GoAccess is not written in Go.