Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit logs in JSON format #101

Merged
merged 1 commit into from
Aug 22, 2023
Merged

Emit logs in JSON format #101

merged 1 commit into from
Aug 22, 2023

Conversation

kotyara1005
Copy link
Contributor

@kotyara1005 kotyara1005 commented May 15, 2023

Description:
This PR adds structured logging support for lambda functions.
Log format and level now encoded into message header.
User are able to set format and level through the environment variables AWS_LAMBDA_LOG_FORMAT and AWS_LAMBDA_LOG_LEVEL.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@briensea briensea requested review from briensea and alex-pewpew May 16, 2023 07:17
@kotyara1005
Copy link
Contributor Author

kotyara1005 commented May 16, 2023

ping @luben

def format_stacktrace(self, exc_info):
if not exc_info:
return None
return self.formatException(exc_info)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be:

return traceback.format_tb(exc_info[2])

so we get the trace as an array of frames, not as a string



def lambda_instance_logs_update(key: str, value: str):
INSTANCE_LOGGING_CONTEXT[str(key)] = str(value)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why str(value)? I think the value should not be stringified - customer may want to attach an object/dict there that will be rendered as such in JSON

@@ -367,9 +368,29 @@ def create_log_sink():
return StandardLogSink()


AWS_LAMBDA_LOG_FORMAT = os.environ.get("AWS_LAMBDA_LOG_FORMAT", "TEXT")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also take into account the AWS_LAMBDA_LOG_LEVEL envvar ans set the logger verbosity accordingly

@kotyara1005 kotyara1005 requested a review from luben May 16, 2023 15:20
@@ -367,9 +368,31 @@ def create_log_sink():
return StandardLogSink()


AWS_LAMBDA_LOG_FORMAT = os.environ.get("AWS_LAMBDA_LOG_FORMAT", "TEXT")
AWS_LAMBDA_LOG_LEVEL = os.environ.get("AWS_LAMBDA_LOG_LEVEL", "INFO")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is envvar is not set, we should not default the filtering to info. I the envvar is not set we will not apply the fiter

if log_format == "JSON":
logger_handler.setFormatter(JsonFormatter())
logging._levelToName[logging.CRITICAL] = "FATAL"
logger.setLevel(log_level)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the log level should be set for both Text and JSON is it's defined

@kotyara1005 kotyara1005 requested a review from luben May 18, 2023 08:12
@thenewguy39
Copy link

thenewguy39 commented May 18, 2023

Please set flag from Text to JSON in frame header.


@staticmethod
def set_log_format(frame_type, log_format):
if log_format == "JSON":
Copy link

@luben luben May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not create new enum here:

class TextFormat(Enum):
   JSON=0
   TEXT=1

...

def log(self, msg, log_level=logging.NOTSET, log_format: LogFormat:
   ...
   frame_type = frame_type | log_format.value

@@ -347,7 +379,7 @@ def log(self, msg):

def log_error(self, message_lines):
error_message = "\n".join(message_lines)
self.log(error_message)
self.log(error_message, log_level=logging.FATAL)
Copy link

@luben luben May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be ERROR - at least this is what we use today, e.g. [ERROR] ZeroDivisionError: division by zero

@kotyara1005 kotyara1005 requested a review from luben May 24, 2023 15:39
@luben
Copy link

luben commented May 31, 2023

LGTM

@@ -333,20 +385,27 @@ def __enter__(self):
def __exit__(self, exc_type, exc_value, exc_tb):
self.file.close()

def log(self, msg):
def set_log_level(self, frame_type, log_level):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you inline this function? I think if we expose that name, customer will think that they can change the log level.

@kotyara1005 kotyara1005 requested a review from luben June 12, 2023 16:33
@@ -333,20 +395,22 @@ def __enter__(self):
def __exit__(self, exc_type, exc_value, exc_tb):
self.file.close()

def log(self, msg):
def log(self, msg, log_level=logging.NOTSET, log_format: int = _TEXT_FORMAT):
frame_type = FRAME_TYPES.get((log_format, log_level), DEFAULT_FRAME_TYPE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For existing customers this is going to create new tuple objects and do hashtable lookups. Not too big of a problem, but can we avoid it for the sake of cleaner implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, moved to formatter.

+ len(encoded_msg).to_bytes(4, "big")
+ timestamp.to_bytes(8, "big")
+ encoded_msg
)
self.file.write(log_msg)

def log_error(self, message_lines):
def log_error(self, message_lines, log_level=logging.ERROR):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With feature enabled, is there need in this method? Are jsons multiline too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used by log_error(error_result, log_sink) function.

@@ -370,6 +434,28 @@ def create_log_sink():
_GLOBAL_AWS_REQUEST_ID = None


def setup_logging(log_format, log_level, log_sink):
logging._levelToName[logging.CRITICAL] = "FATAL"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it going to affect existing customers who are structuring their logging themselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need it anymore.

@@ -13,8 +13,8 @@
# simplejson's Decimal encoding allows '-NaN' as an output, which is a parse error for json.loads
# to get the good parts of Decimal support, we'll special-case NaN decimals and otherwise duplicate the encoding for decimals the same way simplejson does
class Encoder(json.JSONEncoder):
def __init__(self):
super().__init__(use_decimal=False)
def __init__(self, ensure_ascii=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this overload used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

)

if log_level:
logger.setLevel(log_level)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should we put this part under if log_format == LogFormat.JSON: condition?
  2. In case of incorrect value, should we gracefully handle it?

Example:
Code looks like

logging.addLevelName(33, "PEW")
def lambda_handler():
  logging.addLevelName(42, "PEWPEW")

Natural desire would be to use env variable to limit log level to PEWPEW or PEW, but at the moment of running setup_logging it's not yet registered.
Adding levels in a handler looks quite strange, but natural at INIT phase. If the intention is to support custom log levels, need to setLevel at least after INIT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yeah let's move it
  2. Let's restrict to standard levels only

self.log_sink.log(msg)

self.log_sink.log(
msg, frame_type=getattr(record, "_frame_type", _FRAME_TYPES.get(()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should _FRAME_TYPES.get(()) be replaced with just None ?
  2. Nit: Is there any value for attribute lookup for non-JSON logging? If no, would be cleaner to avoid doing it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yeah, it's a leftover
  2. There is no value in attribute lookup here. But I'm not sure how to avoid it.

@@ -370,6 +416,27 @@ def create_log_sink():
_GLOBAL_AWS_REQUEST_ID = None


def setup_logging(log_format, log_level, log_sink):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hide this behind an _

Copy link

@luben luben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@alex-pewpew alex-pewpew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please

  1. Squash your feature commits to make history cleaner
  2. Improve PR description

super().__init__(datefmt=_DATETIME_FORMAT)

@staticmethod
def format_stacktrace(exc_info):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method and below - are they supposed to be private? If yes, let's have them mangled with __

@kotyara1005 kotyara1005 force-pushed the struct-logging branch 2 times, most recently from 0fa37f2 to f7e8115 Compare August 21, 2023 16:18
@alex-pewpew
Copy link
Contributor

LGTM

@briensea briensea merged commit 8b34c48 into aws:main Aug 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants