[Solved] goaccess: Support structured log formats such as JSON

If I understand correctly, GoAccess needs to extract certain fields from mostly-arbitrary lines of data, in order to do its job.

Structured logs make this very easy by using a formally structured encoding such as JSON or logfmt to map field names to values. Caddy adopts this approach since it is extremely useful for log aggregation and processing. In fact, Common Log Format, Combined, and other antiquated standard formats don't provide enough information to be useful in advanced server applications.

An example access log from Caddy contains much more information than most traditional log formats such as CLF or Combined:

{
	"level": "info",
	"ts": 1585597114.7687502,
	"logger": "http.log.access",
	"msg": "handled request",
	"request": {
		"method": "GET",
		"uri": "/",
		"proto": "HTTP/2.0",
		"remote_addr": "127.0.0.1:50876",
		"host": "example.com",
		"headers": {
			"User-Agent": [
				"curl/7.64.1"
			],
			"Accept": [
				"*/*"
			]
		},
		"tls": {
			"resumed": false,
			"version": 771,
			"ciphersuite": 49196,
			"proto": "h2",
			"proto_mutual": true,
			"server_name": "example.com"
		}
	},
	"latency": 0.000014711,
	"size": 2326,
	"status": 200,
	"resp_headers": {
		"Server": [
			"Caddy"
		],
		"Content-Type": ["text/html"]
	}
}

(In reality, log entries do not have newlines; I have formatted this one for readability.)

Logs that are emitted in this manner (instead of a custom format) also benefit from performance improvements; these logs require no allocations.

Unfortunately, it seems a bit tedious (impossible?) to get GoAccess to work with structured logs. For example, I do not think there is a way to skip until a next substring -- and even that only works if we know the order of the fields. (In Caddy's case, the field order is fairly deterministic since the JSON encoder is append-only, but JSON objects in general are not necessarily ordered.)

As you can see, all the required information (and much more!) is available in that log emission, but I don't know how to make GoAccess work with it: header fields vary, TLS may or may not be used, etc. The formats of durations and timestamps can even be customized, so that shouldn't be a problem for compatibility.

The main problem is that there doesn't seem to be a way to parse a structured format such as JSON and then specify fields from which to extract the needed information.

For example:

%x: ts
%v: request>host
%h: request>remote_addr
%U: request>uri
%H: request>proto
%T: latency
%R: request>headers>Referer
%s: status
%b: size
...

Does that make sense?
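
To make the idea concrete, here is a rough sketch in Python (not GoAccess code). The `FIELD_PATHS` table and the `lookup` and `extract` helpers are hypothetical names invented for illustration, and the `>` path syntax is only the one proposed above, not an existing GoAccess option. The lookup is by key, so it does not depend on field order, and missing fields simply come back as `None`.

```python
import json

# Hypothetical mapping from specifiers to ">"-separated JSON paths,
# copied from the proposal above. Not an existing GoAccess feature.
FIELD_PATHS = {
    "%x": "ts",
    "%v": "request>host",
    "%h": "request>remote_addr",
    "%U": "request>uri",
    "%H": "request>proto",
    "%T": "latency",
    "%R": "request>headers>Referer",
    "%s": "status",
    "%b": "size",
}


def lookup(entry, path, default=None):
    """Follow a '>'-separated path through nested dicts.

    Returns `default` as soon as a key along the path is missing.
    """
    node = entry
    for key in path.split(">"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def extract(line):
    """Parse one JSON log line and resolve every configured path."""
    entry = json.loads(line)
    return {spec: lookup(entry, path) for spec, path in FIELD_PATHS.items()}


# A shortened version of the sample entry above, on a single line.
sample = (
    '{"ts": 1585597114.7687502, "request": {"method": "GET", "uri": "/", '
    '"proto": "HTTP/2.0", "remote_addr": "127.0.0.1:50876", '
    '"host": "example.com", "headers": {"User-Agent": ["curl/7.64.1"]}}, '
    '"latency": 0.000014711, "size": 2326, "status": 200}'
)
fields = extract(sample)
print(fields["%v"], fields["%s"], fields["%R"])  # example.com 200 None
```

Here `%R` resolves to `None` because the sample request carries no Referer header, which is the kind of per-request variation mentioned above.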

I am new to this project, and was about to jump in and contribute JSON support, but then realized that GoAccess is not written in Go 😅.

29 Answers

✔️ Accepted Answer

Native JSON support has been added. Feel free to build from development to test this out. It will be pushed out in the upcoming release (v1.4.4). In addition, I added a quick pre-defined format for Caddy, e.g.,

# goaccess access.log --log-format=CADDY -o report.html

or from a remote server:

# ssh -n root@server 'tail -f access.log' | goaccess - --log-format=CADDY -o report.html --real-time-html

Note: SSH requires -n so GoAccess can read from stdin. Also, make sure to use SSH keys for authentication, as it won't work if a passphrase is required. Finally, please build using --with-openssl so that GoAccess can use libssl to show the cipher and TLS version in the report.

The CADDY format is composed of these fields:

{
    "ts": "%x.%^",
    "request": {
        "remote_addr": "%h:%^",
        "proto": "%H",
        "method": "%m",
        "host": "%v",
        "uri": "%U",
        "headers": {
            "User-Agent": ["%u"]
        },
        "tls": {
            "cipher_suite": "%k",
            "proto": "%K"
        }
    },
    "duration": "%T",
    "size": "%b",
    "status": "%s",
    "resp_headers": {
        "Content-Type": ["%M"]
    }
}
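
One way to read a template like this is as a tree that is walked alongside each log entry, pairing every pattern with the value found under the same keys. The Python sketch below illustrates that reading only; it is an assumption about the idea, not a description of how GoAccess implements it internally. `TEMPLATE` is a trimmed copy of the format above, `pair_fields` is a hypothetical helper, and the patterns themselves (`%x.%^`, `%h:%^`, and so on) are left uninterpreted.

```python
import json

# Trimmed copy of the CADDY template above (keys and patterns as posted).
TEMPLATE = {
    "ts": "%x.%^",
    "request": {
        "remote_addr": "%h:%^",
        "host": "%v",
        "headers": {"User-Agent": ["%u"]},
        "tls": {"proto": "%K"},
    },
    "status": "%s",
}


def pair_fields(template, entry, out=None):
    """Walk template and entry together, collecting (pattern, value) pairs.

    Keys present in the template but absent from the entry are skipped.
    """
    if out is None:
        out = []
    if isinstance(template, dict):
        if isinstance(entry, dict):
            for key, sub in template.items():
                if key in entry:
                    pair_fields(sub, entry[key], out)
    elif isinstance(template, list):
        # A template array holds one pattern; pair it with the first element.
        if isinstance(entry, list) and template and entry:
            pair_fields(template[0], entry[0], out)
    else:
        out.append((template, entry))
    return out


# A plain-HTTP request: no "tls" object, and keys in a different order.
line = (
    '{"status": 200, "request": {"host": "example.com", '
    '"remote_addr": "127.0.0.1:50876", '
    '"headers": {"User-Agent": ["curl/7.64.1"]}}, "ts": 1585597114.7687502}'
)
pairs = pair_fields(TEMPLATE, json.loads(line))
for pattern, value in pairs:
    print(pattern, "<-", value)
```

In this sketch the `%K` pattern is never paired because the entry has no `tls` object, and the key order of the entry does not matter since the walk follows the template.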

Thanks again for all the info provided.

Closing this, feel free to reopen it if needed.

Other Answers:

@ArpitKotecha It's certainly in the pipeline. Stay tuned!

@rumpelsepp You certainly can, I saw this post https://alexmv12.xyz/blog/goaccess_caddy/, so you could derive a one-liner:

example.com:X 127.0.0.1 [30/Mar/2020:19:38:34 CST] "GET / HTTP/2.0" 200 2326 "http://example.com/bot.html" "curl/7.64.1" 1.4711e-05

jq -j '.ts |= strftime("%d/%b/%Y:%H:%M:%S %Z") | .request.remote_addr |= .[:-6]  | .request.host, ":X ", .request.remote_addr, " [", .ts ,"]", " \"", .request.method, " ", .request.uri, " ", .request.proto, "\" ", .status, " ", .size, " \"", .request.headers.Referer[0] // "-","\"", " \"", .request.headers."User-Agent"[0] // "-","\"", " ", .latency, "\n"' access.log | goaccess - --log-format=VCOMBINED

or to capture the latency:

jq -j '.ts |= strftime("%d/%b/%Y:%H:%M:%S %Z") | .request.remote_addr |= .[:-6]  | .request.host, ":X ", .request.remote_addr, " [", .ts ,"]", " \"", .request.method, " ", .request.uri, " ", .request.proto, "\" ", .status, " ", .size, " \"", .request.headers.Referer[0] // "-","\"", " \"", .request.headers."User-Agent"[0] // "-","\"", " ", .latency, "\n"' ../logs/caddy.log | ./goaccess - --log-format='%v:%^ %h %^[%d:%t %^] "%r" %s %b "%R" "%u" %T' --date-format=%d/%b/%Y --time-format=%T

I'm getting closer to the point where I can implement this natively.

As an interim workaround while this works its way along the pipeline, I have created a Python script that allows Caddy JSON logs to be streamed to GoAccess in real time. Details and the code can be found on my GitHub site, CaddyGoAccessDataLoggerConverter.

It may be that this approach can be adapted for other providers of structured logs too.
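
For readers who want to see the shape of such a conversion, here is a minimal Python sketch. It is not the script mentioned above; `caddy_to_vcombined` is a hypothetical helper that mirrors the jq one-liner earlier in the thread, using the field names from the sample entry (including `latency`). It assumes the timestamp is a Unix epoch in seconds, renders it in UTC, and splits `remote_addr` on the last colon rather than cutting a fixed number of characters.

```python
import json
from datetime import datetime, timezone


def caddy_to_vcombined(line):
    """Turn one Caddy JSON log line into a VCOMBINED-style line plus latency."""
    entry = json.loads(line)
    req = entry["request"]
    headers = req.get("headers", {})

    # Drop the port by splitting on the last ":"; works for any port length.
    # An IPv6 address would keep its surrounding brackets.
    client_ip = req["remote_addr"].rpartition(":")[0]

    # Assumes "ts" is epoch seconds; month names follow the current locale.
    stamp = datetime.fromtimestamp(entry["ts"], tz=timezone.utc)
    ts = stamp.strftime("%d/%b/%Y:%H:%M:%S %z")

    # Header values are arrays; fall back to "-" when a header is absent.
    referer = headers.get("Referer", ["-"])[0]
    agent = headers.get("User-Agent", ["-"])[0]

    return (
        f'{req["host"]}:X {client_ip} [{ts}] '
        f'"{req["method"]} {req["uri"]} {req["proto"]}" '
        f'{entry["status"]} {entry["size"]} "{referer}" "{agent}" '
        f'{entry["latency"]}'
    )


sample = (
    '{"ts": 1585597114.7687502, "request": {"method": "GET", "uri": "/", '
    '"proto": "HTTP/2.0", "remote_addr": "127.0.0.1:50876", '
    '"host": "example.com", "headers": {"User-Agent": ["curl/7.64.1"]}}, '
    '"latency": 0.000014711, "size": 2326, "status": 200}'
)
converted = caddy_to_vcombined(sample)
print(converted)
```

To stream a whole log, the same function could be applied to each line read from `sys.stdin` and the result written to stdout, then piped into `goaccess -` as in the jq examples.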

Matt, thanks for pointing this out. I do agree that having structured logs seems pretty useful, especially for debugging purposes. I think having an easier way to parse such logs in goaccess would be awesome and I can certainly start looking into this.

I guess a quick way of having this implemented would be to use a JSON parser, but I'm positive that would impact the performance of the current parser. One of the main goals of goaccess is the ability to parse a log as fast as possible while being flexible enough to process virtually any log format. To accomplish this, it performs a lot of pointer arithmetic, at the cost of making custom formats harder to write. Nonetheless, I certainly believe that there's room for improvement in this area and I plan to work on it.

As far as JSON objects not necessarily being ordered, is there a case where Caddy would log requests in different formats in the same access log? For instance, you mentioned that header fields vary, TLS may or may not be used, etc. Does that mean that the same access log may or may not contain certain fields, depending on the request? I ask because that would certainly make things a bit more tricky for the current parser. If that is the case, feel free to post a few lines straight from your access log and I may be able to find a pattern in them.

For what it's worth, the following format should work for the sample line you posted above (it assumes there's a space between the key and the value in most cases, "key": "val"),

{ "level": "info", "ts": 1585597114.7687502, "logger": "http.log.access", "msg": "handled request", "request": { "method": "GET", "uri": "/", "proto": "HTTP/2.0", "remote_addr": "127.0.0.1:50876", "host": "example.com", "headers": { "User-Agent": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 Edg/81.0.416.72" ], "Accept": [ "*/*" ] }, "tls": { "resumed": false, "version": 771, "ciphersuite": 49196, "proto": "h2", "proto_mutual": true, "server_name": "example.com" } }, "latency": 0.000014711, "size": 2326, "status": 200, "resp_headers": { "Server": [ "Caddy" ], "Content-Type": ["text/html"] } }

goaccess access.log --log-format='%^:%^: %x.%^ %^{%^: "%m"%^: "%U"%^: "%H"%^: "%h:%^"%^: "%v"%^[ "%u"%^]%^}%^}%^}%^: %T,%^: %b,%^: %s, %^' --date-format=%s --time-format=%s

About the name, the go comes from my initials 😊. I do appreciate the intent to contribute, though.
