Skip to content
This repository has been archived by the owner on Apr 2, 2024. It is now read-only.

SandboxDecoder / nginx_access.lua doesn't work with SSL log #1974

Open
wwalker opened this issue Aug 2, 2016 · 4 comments
Open

SandboxDecoder / nginx_access.lua doesn't work with SSL log #1974

wwalker opened this issue Aug 2, 2016 · 4 comments
Labels

Comments

@wwalker
Copy link

wwalker commented Aug 2, 2016

Using the SandboxDecoder with nginx_access.lua, our logs don't parse. nothing comes out.
I can parse the same data with the PayloadRegexDecoder and it works.

Using 0.10.0 from the RPM on my workstation and from the .deb on my server. Both fail the same way. silently don't write anything to the output from nginx, but do write things from heka.{memstats,statmetrics,all-report}

This is my config:

  [hekad]
  maxprocs = 8

  [Dashboard]
  type = "DashboardOutput"
  address = ":4352"
  ticker_interval = 15

  [TestWebserver]
  type = "LogstreamerInput"
  log_directory = "/var/log/nginx"
  file_match = 'access\.log'
  decoder = "CombinedLogDecoder"

  [CombinedLogDecoder]
  type = "SandboxDecoder"
  filename = "lua_decoders/nginx_access.lua"

  [CombinedLogDecoder.config]
  type = "combined"
  user_agent_transform = true
  # combined log format
  #log_format = '$remote_addr - $remote_user [$time_local $time_zone] $server_port $ssl_protocol/$ssl_cipher $upstream_addr "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" -- [$request_time]'
  log_format = '$remote_addr - $remote_user [$time_local] $server_port $ssl_protocol/$ssl_cipher $upstream_addr "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" -- [$request_time]'

  [ESJsonEncoder]
  index = "%{Type}-%{%Y.%m.%d}"
  es_index_from_timestamp = true
  type_name = "%{Type}"
      [ESJsonEncoder.field_mappings]
      Timestamp = "@timestamp"
      Severity = "level"

  [ElasticSearchOutput]
  server = "http://10.20.30.50:9200"
  flush_interval = 5000
  flush_count = 10
  encoder = "ESJsonEncoder"
  message_matcher = "TRUE"

This is some log data:
10.20.30.40 - - [01/Aug/2016:06:31:59 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "POST /api/v2/samples HTTP/1.1" 200 446 "-" "Ruby" -- [0.078]
10.20.30.40 - - [01/Aug/2016:06:31:59 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "POST /api/v2/samples/state HTTP/1.1" 200 4161 "-" "Ruby" -- [0.054]
10.20.30.40 - - [01/Aug/2016:06:32:03 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "GET /api/v2/samples/2130a46fddf4fde50b7ff0d64f6d9706/video.webm?api_key=REdactED HTTP/1.1" 200 256537 "-" "Ruby" -- [8.951]
10.20.30.40 - - [01/Aug/2016:06:32:03 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "GET /api/v2/samples/4a62f6ba29c285a0afc7430922dfac07/video.webm?api_key=REdactED HTTP/1.1" 200 39615 "-" "Ruby" -- [4.038]
10.20.30.40 - - [01/Aug/2016:06:32:03 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "GET /api/v2/samples/2130a46fddf4fde50b7ff0d64f6d9706/processes.json?api_key=REdactED HTTP/1.1" 200 583852 "-" "Ruby" -- [0.390]

@wwalker
Copy link
Author

wwalker commented Aug 2, 2016

I've narrowed the problem down to this: $ssl_protocol/$ssl_cipher

that is straight from the nginx.conf as it should be, but build_grammar doesn't work right.
If I change it to $ssl_protocol it works, including the data that should be in the ssl_cipher in $ssl_protocol.

@robison
Copy link

robison commented Aug 2, 2016

$ssl_protocol/$ssl_cipher aren't defined fields per the common_log_format[0] module (which is used here by nginx_access.lua) to decode the field. It simply sees a single field without space or quote delimiting, and interprets it as a single field. Here's an example; you may have some success by space-delimiting the two fields you want:

http://lpeg.trink.com/share/11180597858328222060

[0]: https://github.com/mozilla-services/lua_sandbox_extensions/blob/master/lpeg/modules/lpeg/common_log_format.lua

Also, as an aside, you may have more success asking these sorts of questions on the mailing list or IRC channel, rather than opening an issue. Please lemme know if this doesn't at least put you on the right track. Thanks!

@wwalker
Copy link
Author

wwalker commented Aug 2, 2016

I asked on both the mailing list and IRC. I only reported it here when I was sure it was a bug. The documentation says to paste the log_format directly from the nginx.conf. I did that and it doesn't work; therefore it is a bug that should either be fixed, or being a documented known bug. I had already worked around it both with PayloadRegexDecoder and with modifying the log_format in SandboxDecoder/nginx_access.lua (losing a bit of data granularity).

@trink trink added the feature label Aug 3, 2016
@trink
Copy link
Contributor

trink commented Aug 3, 2016

As @robison pointed out that grammar has not had support added for the ssl module so it falls back to a basic space delimited or quoted string parsing. There are two options: alter your log_format configuration (as suggested above) or extend the grammar to include the ssl module variables.

Note: it is possible to construct a log_format string with ambiguous output (i.e. not parse able) even with 'known' variables.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants