Add JSON logging to Fluentd TCP socket for Bedrock #2499

Open
rushatgabhane wants to merge 10 commits into Expensify:main from rushatgabhane:fluentd
Conversation

@rushatgabhane (Member) commented Feb 6, 2026

Details

Adds the ability to log directly to Fluentd's TCP socket in native JSON format. Since logs are preformatted as JSON, Fluentd can forward them without regex parsing : )

Reconnects on failure.

Supports three log destinations via -logDestination:

  • rsyslog - existing behavior
  • fluentd - JSON logging to Fluentd only
  • both - dual logging

Fluentd options:

  • fluentdHost (default: 127.0.0.1)
  • fluentdPort (default: 24224)
  • fluentdTag (default: bedrock)

Fixed Issues

Fixes https://github.com/Expensify/Expensify/issues/594121

Tests

  1. Run Bedrock with -logDestination both
  2. Verify logs go to both rsyslog and Fluentd
  3. Test reconnection by restarting Fluentd while Bedrock is running
  4. Verify JSON format matches expected Fluentd protocol

Internal Testing Reminder: when changing bedrock, please compile auth against your new changes

@rushatgabhane (Member, Author)

@MelvinBot review this PR please

Copilot AI left a comment

Pull request overview

Adds optional Fluentd TCP JSON logging to Bedrock so logs can be forwarded without regex parsing, with support for -logDestination rsyslog|fluentd|both and basic reconnect behavior.

Changes:

  • Introduces a Fluentd TCP logging backend (SFluentdInitialize, SFluentdLog) that emits JSON frames.
  • Adds SSyslogNoop to disable rsyslog logging when Fluentd-only is selected.
  • Updates Bedrock startup arg handling to configure Fluentd and optionally disable rsyslog.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File: Description
libstuff/libstuff.h: Adds Fluentd logging declarations; updates SSYSLOG macro to also emit to Fluentd; introduces SSyslogNoop.
libstuff/libstuff.cpp: Implements Fluentd TCP connection + JSON payload emission; adds no-op syslog implementation.
BedrockServer.cpp: Adds -logDestination and Fluentd config args, wiring initialization and rsyslog disabling.

Comment on lines 297 to 313
SFluentdSocketFD = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (SFluentdSocketFD == -1) {
return false;
}

struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(SFluentdPort);
inet_pton(AF_INET, SFluentdHost.c_str(), &addr.sin_addr);

if (connect(SFluentdSocketFD, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
close(SFluentdSocketFD);
SFluentdSocketFD = -1;
return false;
}

Copilot AI Feb 6, 2026

SFluentdConnect calls inet_pton(AF_INET, SFluentdHost.c_str(), …) without checking the return value, and it only supports numeric IPv4 literals. Passing a hostname like localhost (or IPv6) will silently produce an invalid address and connection attempts will fail. Use getaddrinfo (already available via <netdb.h>) or at least validate inet_pton and fail fast with SFluentdSocketFD = -1 when parsing fails.

Suggested change
SFluentdSocketFD = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (SFluentdSocketFD == -1) {
return false;
}
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(SFluentdPort);
inet_pton(AF_INET, SFluentdHost.c_str(), &addr.sin_addr);
if (connect(SFluentdSocketFD, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
close(SFluentdSocketFD);
SFluentdSocketFD = -1;
return false;
}
struct addrinfo hints;
memset(&hints, 0, sizeof(hints));
hints.ai_family = AF_INET; // Preserve existing IPv4-only behavior
hints.ai_socktype = SOCK_STREAM;
hints.ai_protocol = IPPROTO_TCP;
struct addrinfo* res = nullptr;
string portStr = to_string(SFluentdPort);
int rc = getaddrinfo(SFluentdHost.c_str(), portStr.c_str(), &hints, &res);
if (rc != 0 || !res) {
SFluentdSocketFD = -1;
if (res) {
freeaddrinfo(res);
}
return false;
}
SFluentdSocketFD = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
if (SFluentdSocketFD == -1) {
freeaddrinfo(res);
return false;
}
if (connect(SFluentdSocketFD, res->ai_addr, res->ai_addrlen) == -1) {
close(SFluentdSocketFD);
SFluentdSocketFD = -1;
freeaddrinfo(res);
return false;
}
freeaddrinfo(res);

record[key] = value;
}

string json = "[\"" + SFluentdTag + "\"," + to_string(time(nullptr)) + "," + SComposeJSONObject(record) + "]\n";

Copilot AI Feb 6, 2026

The Fluentd tag is injected into the JSON payload via string concatenation ("[\"" + SFluentdTag + "\"," ...) without JSON escaping. If -fluentdTag contains quotes/backslashes/control characters, the emitted JSON becomes invalid and can be abused for log injection. Build the outer JSON using the existing JSON helpers (e.g., SToJSON(SFluentdTag, /*forceString=*/true)) so the tag is correctly escaped.

Suggested change
string json = "[\"" + SFluentdTag + "\"," + to_string(time(nullptr)) + "," + SComposeJSONObject(record) + "]\n";
string json = "[" + SToJSON(SFluentdTag, /*forceString=*/true) + "," + to_string(time(nullptr)) + "," + SComposeJSONObject(record) + "]\n";

Comment on lines +358 to +360
if (send(SFluentdSocketFD, json.c_str(), json.size(), MSG_NOSIGNAL) == -1) {
close(SFluentdSocketFD);
SFluentdSocketFD = -1;

Copilot AI Feb 6, 2026

send() on a TCP socket can return a short write; the current code treats any return value other than -1 as success, which can truncate the JSON frame and break Fluentd parsing. Track the number of bytes sent and loop until the full buffer is written, or treat partial writes as an error and reconnect.

Suggested change
if (send(SFluentdSocketFD, json.c_str(), json.size(), MSG_NOSIGNAL) == -1) {
close(SFluentdSocketFD);
SFluentdSocketFD = -1;
size_t totalSent = 0;
const size_t totalSize = json.size();
while (totalSent < totalSize) {
ssize_t bytesSent = send(SFluentdSocketFD, json.c_str() + totalSent, totalSize - totalSent, MSG_NOSIGNAL);
if (bytesSent <= 0) {
// Error or connection closed; close the socket so it will reconnect on the next attempt
close(SFluentdSocketFD);
SFluentdSocketFD = -1;
break;
}
totalSent += static_cast<size_t>(bytesSent);
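
A minimal sketch of the loop-until-written approach as a standalone helper. `sendAll` is a hypothetical name, not something in the PR, and unlike the suggestion above it also retries when `send()` is interrupted with `EINTR`:

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper (not part of libstuff): writes the whole buffer or reports failure.
// Returns true only if all `len` bytes were accepted by send().
bool sendAll(int fd, const char* data, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, MSG_NOSIGNAL);
        if (n > 0) {
            // Short write: advance and send the remainder on the next iteration.
            sent += static_cast<size_t>(n);
        } else if (n == -1 && errno == EINTR) {
            // Interrupted by a signal before any data was sent; retry.
            continue;
        } else {
            // Hard error or no progress; the caller should close and reconnect.
            return false;
        }
    }
    return true;
}
```

A caller would close the socket and reset the descriptor to -1 when `sendAll` returns false, so the next log attempt reconnects.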

Comment on lines +317 to +325
void SFluentdInitialize(const string& host, int port, const string& tag)
{
lock_guard<mutex> lock(SFluentdSocketMutex);
SFluentdHost = host;
SFluentdPort = port;
SFluentdTag = tag;
SFluentdConnect(lock);
SFluentdConfigured.store(true);
}

Copilot AI Feb 6, 2026

SFluentdInitialize / SFluentdConnect will overwrite SFluentdSocketFD with a new socket() result without closing any previously-open Fluentd socket. If initialization can happen more than once in a process (e.g., tests, reconfiguration), this will leak file descriptors. Close the existing FD (if != -1) before creating a new socket, or make initialization explicitly one-shot and enforce it.
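
One way the "close the existing FD first" option could look, as a sketch only. `fluentdSocketFD` and `resetFluentdSocket` are hypothetical stand-ins for the PR's globals; in the real code this would run under `SFluentdSocketMutex` before the new `socket()` call:

```cpp
#include <unistd.h>

// Hypothetical stand-in for the PR's global socket descriptor (-1 means "not connected").
static int fluentdSocketFD = -1;

// Closes any previously opened descriptor so a repeated initialize/connect cannot leak it.
// Safe to call when no socket is open.
void resetFluentdSocket()
{
    if (fluentdSocketFD != -1) {
        close(fluentdSocketFD);
        fluentdSocketFD = -1;
    }
}
```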

Comment on lines +335 to +346
record["timestamp"] = to_string(time(nullptr));
record["priority"] = to_string(priority);
record["thread_name"] = SThreadLogName;
record["thread_prefix"] = SThreadLogPrefix;
record["process"] = SProcessName;
record["message"] = message;

for (const auto& [key, value] : params) {
record[key] = value;
}

string json = "[\"" + SFluentdTag + "\"," + to_string(time(nullptr)) + "," + SComposeJSONObject(record) + "]\n";

Copilot AI Feb 6, 2026

SFluentdLog calls time(nullptr) twice (once for record["timestamp"] and again for the Fluentd event time). If a second boundary is crossed between calls, the two timestamps can disagree. Capture the timestamp once and reuse it for both fields to keep the record consistent.

// Fluentd JSON logging stuff
// --------------------------------------------------------------------------
// Initialize Fluentd TCP socket connection. Call once at startup.
void SFluentdInitialize(const string& host, int port, const string& tag);

Copilot AI Feb 6, 2026

The comment for SFluentdInitialize says it “Returns true on success”, but the function is declared/defined as void and there’s no success signal. Either change the signature to return bool (and actually reflect connection success), or update the comment to match the current behavior (e.g., best-effort configure + lazy reconnect).

Suggested change
void SFluentdInitialize(const string& host, int port, const string& tag);
// Performs best-effort initialization. If initialization fails, SFluentdLog will be a no-op.

Comment on lines +285 to 292
const string s = addLogParams(string(__rawMsg), ## __VA_ARGS__); \
const string prefix = SWHEREAMI; \
for (size_t i = 0; i < s.size(); i += 7168) { \
(*SSyslogFunc)(_PRI_, "%s", (prefix + s.substr(i, 7168)).c_str()); \
} \
SFluentdLog(_PRI_, prefix + __rawMsg, ## __VA_ARGS__); \
} \
} while (false)

Copilot AI Feb 6, 2026

When -logDestination is set to fluentd, SSyslogFunc becomes SSyslogNoop, but this macro still builds s, appends params, and loops over 7k chunks calling the no-op function. This adds avoidable per-log CPU overhead in fluentd-only mode; consider gating the rsyslog formatting/chunking path behind a separate enabled flag (or skip the chunk loop when rsyslog is disabled).
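
A rough sketch of the gating idea, outside the macro for readability. `rsyslogEnabled` and `logToRsyslog` are hypothetical names; only the 7168-byte chunk size comes from the macro above. The formatting callback is not invoked at all when rsyslog is disabled:

```cpp
#include <atomic>
#include <cstddef>
#include <string>

// Hypothetical flag: would be set to false when -logDestination is fluentd.
std::atomic<bool> rsyslogEnabled{true};

// Runs the (potentially expensive) format step and the chunk loop only when rsyslog is enabled.
// `format` returns the full message; `emit` receives one chunk at a time.
template <typename FormatFn, typename EmitFn>
void logToRsyslog(FormatFn format, EmitFn emit)
{
    if (!rsyslogEnabled.load(std::memory_order_relaxed)) {
        return; // Fluentd-only mode: skip formatting and chunking entirely.
    }
    const std::string s = format();
    for (std::size_t i = 0; i < s.size(); i += 7168) {
        emit(s.substr(i, 7168));
    }
}
```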

record["message"] = message;

for (const auto& [key, value] : params) {
record[key] = value;

Copilot AI Feb 6, 2026

When merging params into record, user-supplied keys can overwrite reserved metadata fields like timestamp, priority, process, or even message. To keep the structured log schema stable, consider preventing overwrites (only insert when absent) or namespacing user params under a dedicated sub-object key (e.g., params).

Suggested change
record[key] = value;
// Prevent user-supplied parameters from overwriting reserved metadata fields.
if (!record.count(key)) {
record[key] = value;
}

Comment on lines +333 to +346
// Build JSON before acquiring lock to avoid doing heavy stuff in the critical section
STable record;
record["timestamp"] = to_string(time(nullptr));
record["priority"] = to_string(priority);
record["thread_name"] = SThreadLogName;
record["thread_prefix"] = SThreadLogPrefix;
record["process"] = SProcessName;
record["message"] = message;

for (const auto& [key, value] : params) {
record[key] = value;
}

string json = "[\"" + SFluentdTag + "\"," + to_string(time(nullptr)) + "," + SComposeJSONObject(record) + "]\n";

Copilot AI Feb 6, 2026

New behavior builds a specific Fluentd JSON frame ([tag, time, record]\n) and reconnection logic, but there are existing libstuff unit tests (e.g., JSON helpers) and nothing here validates the emitted frame or escaping. Consider adding a unit test that exercises the JSON payload composition (including escaping and reserved-field behavior) without requiring a real Fluentd instance (e.g., by extracting payload formatting into a helper function).
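
As an illustration of the "extract payload formatting into a helper" idea, here is a self-contained sketch. `Record`, `quote`, and `composeFluentdFrame` are hypothetical; the PR itself uses `STable` and `SComposeJSONObject`, whose exact output (for example, whether numeric values are quoted) may differ from this stand-in. Passing the event time as an argument keeps the function deterministic and also lets the caller reuse one timestamp for both the record and the frame:

```cpp
#include <cstdio>
#include <ctime>
#include <map>
#include <string>

// Hypothetical stand-in for STable.
using Record = std::map<std::string, std::string>;

// Minimal JSON string quoting for this sketch: quotes, backslashes, control characters.
std::string quote(const std::string& in)
{
    std::string out = "\"";
    for (unsigned char c : in) {
        if (c == '"' || c == '\\') {
            out += '\\';
            out += static_cast<char>(c);
        } else if (c < 0x20) {
            char buf[7];
            std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
            out += buf;
        } else {
            out += static_cast<char>(c);
        }
    }
    return out + "\"";
}

// Pure function that builds the [tag, time, record]\n frame; no socket or clock access.
std::string composeFluentdFrame(const std::string& tag, std::time_t eventTime, const Record& record)
{
    std::string json = "[" + quote(tag) + "," + std::to_string(static_cast<long long>(eventTime)) + ",{";
    bool first = true;
    for (const auto& kv : record) {
        if (!first) {
            json += ",";
        }
        json += quote(kv.first) + ":" + quote(kv.second);
        first = false;
    }
    return json + "}]\n";
}
```

A unit test can then compare the returned string against an expected frame, including a tag that contains a quote, without a running Fluentd instance.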

Comment on lines 297 to 309
SFluentdSocketFD = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (SFluentdSocketFD == -1) {
return false;
}

struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(SFluentdPort);
inet_pton(AF_INET, SFluentdHost.c_str(), &addr.sin_addr);

if (connect(SFluentdSocketFD, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
close(SFluentdSocketFD);

Copilot AI Feb 6, 2026

The Fluentd socket is created in blocking mode and connect()/send() are called while holding SFluentdSocketMutex. If Fluentd is slow/unreachable, these operations can block and stall logging (and potentially threads that are trying to log). Consider setting reasonable connect/send timeouts (e.g., SO_SNDTIMEO / non-blocking connect with poll) and/or avoiding holding the mutex across potentially-blocking syscalls.
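
A small sketch of the `SO_SNDTIMEO` option mentioned above. `setSendTimeout` is a hypothetical helper, and the timeout value is for the caller to choose. On Linux, socket(7) describes this timeout as also applying to a blocking `connect()`; other platforms may behave differently, so treat that part as an assumption:

```cpp
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical helper: bounds how long a blocking send() on `fd` may stall.
// Returns true if the option was applied.
bool setSendTimeout(int fd, int seconds)
{
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == 0;
}
```

It would be called right after `socket()` succeeds and before `connect()`, so a slow or unreachable Fluentd cannot hold the logging mutex indefinitely.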

@rushatgabhane (Member, Author)

@MelvinBot review this PR please

@rushatgabhane marked this pull request as ready for review February 16, 2026 16:29
@rushatgabhane (Member, Author)

@MelvinBot please review this PR

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15d6e4f078

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +342 to +343
for (const auto& [key, value] : params) {
record[key] = value;

P1: Redact params before writing Fluentd records

SFluentdLog currently copies every entry from params directly into the JSON record, which bypasses the existing redaction path used by addLogParams (libstuff/SLog.cpp lines 81-98). When -logDestination is fluentd or both, any sensitive fields passed in logging params (that are intentionally redacted in rsyslog output) will be emitted in cleartext to Fluentd, creating a data-leak regression.

Useful? React with 👍 / 👎.
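
To make the concern concrete, a sketch of what filtering params before they reach the record could look like. It assumes allowlist-based redaction; that is an assumption, since the actual rules in `addLogParams` are not shown in this thread. `redactParams` and the `<REDACTED>` placeholder are hypothetical:

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical helper: keeps values only for keys on the allowlist and masks everything else.
// The real redaction rules live in addLogParams and may differ.
std::map<std::string, std::string> redactParams(const std::map<std::string, std::string>& params,
                                                const std::set<std::string>& allowedKeys)
{
    std::map<std::string, std::string> out;
    for (const auto& kv : params) {
        const bool allowed = allowedKeys.count(kv.first) > 0;
        out[kv.first] = allowed ? kv.second : "<REDACTED>";
    }
    return out;
}
```

Ideally both destinations would go through the same redaction code rather than a second copy, so the rsyslog and Fluentd outputs cannot drift apart.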

}

// Try to send the log over TCP. Close the socket on failure. It'll try to reconnect on next attempt
if (send(SFluentdSocketFD, json.c_str(), json.size(), MSG_NOSIGNAL) == -1) {

P2: Handle partial TCP writes when sending Fluentd JSON

The send path treats any return value other than -1 as success, but send() on a TCP socket may return a short byte count under backpressure; in that case the remaining bytes are dropped and the emitted Fluentd frame is truncated. This can corrupt JSON log records intermittently for larger messages or busy sockets, even though no reconnect/error path is triggered.

Useful? React with 👍 / 👎.

memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(SFluentdPort);
inet_pton(AF_INET, SFluentdHost.c_str(), &addr.sin_addr);

P2: Resolve non-literal fluentdHost values before connect

Connection setup uses inet_pton(AF_INET, SFluentdHost.c_str(), ...) and ignores its return value, so hostnames like fluentd.service are parsed as invalid and leave the destination as 0.0.0.0; every connect attempt then fails silently. Since -fluentdHost is exposed as a host option, this breaks Fluentd logging whenever operators provide a DNS/service name instead of a raw IPv4 literal.

Useful? React with 👍 / 👎.
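
If full name resolution is out of scope, the smaller fix is to check the `inet_pton` result and fail fast. A minimal sketch, with `parseIPv4Literal` as a hypothetical helper name:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string>

// Hypothetical helper: returns false unless `host` is a valid dotted-quad IPv4 literal.
bool parseIPv4Literal(const std::string& host, in_addr& out)
{
    // inet_pton returns 1 on success, 0 if the text is not a valid address, -1 on error.
    return inet_pton(AF_INET, host.c_str(), &out) == 1;
}
```

With this check, a value such as `fluentd.service` is rejected up front instead of silently leaving the zeroed address in place.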
