Notebook does not save with rather unclear message about a string being too long #6006
Hi @sidviny! Thanks for taking the time to report this. I have a few questions to get a better idea of how to reproduce what you're seeing:
Our code for parsing this file is below:

```python
import pandas as pd


def read_event_logging(
    start_date: str,
    filename: str = "Security.evtx"
):
    """Read auditing logging.

    Parameters
    ----------
    start_date : str
        Start date from which to read the logging.
    filename : str, optional
        Which event logging to read.

    Returns
    -------
    pandas DataFrame
        The dataframe only contains logging about files that were changed.

    Examples
    --------
    dfs = read_event_logging()
    dfs = read_event_logging('2024-01-11')
    """
    # Imports specified here as only this function requires PyEvtxParser
    import json

    from evtx import PyEvtxParser

    print(f"Reading file {filename}")
    parser = PyEvtxParser(filename)
    print("Converting to dataframe")
    df = pd.json_normalize(parser.records_json())
    # Add extra columns
    df["object_name"] = ""
    df["object_type"] = ""
    df["user_name"] = ""
    df["access_mask"] = ""
    print("Start parsing...")
    for index, row in df.iterrows():
        # We skip the logging before the start_date
        if row["timestamp"] < start_date:
            continue
        # For the later events, we read the data
        item = row["data"]
        data = json.loads(item)
        event = data["Event"]
        print(item)
        print(event)
        print("-------------------")
        if "EventData" in event.keys():
            event_data = event["EventData"]
            if isinstance(event_data, dict):
                if "ObjectName" in event_data.keys():
                    object_name = event_data["ObjectName"]
                    object_type = event_data.get("ObjectType", "")
                    user_name = event_data.get("SubjectUserName", "")
                    access_mask = event_data.get("AccessMask", "")
                    info = {
                        "event_id": [row["event_record_id"]],
                        "timestamp": [row["timestamp"]],
                        "object_name": [object_name],
                        "object_type": [object_type],
                        "user_name": [user_name],
                        "access_mask": [access_mask],
                    }
                    df.loc[index, "object_name"] = object_name
                    df.loc[index, "object_type"] = object_type
                    df.loc[index, "user_name"] = user_name
                    df.loc[index, "access_mask"] = access_mask
                    print(info)
    # Only take records where a file was touched
    # dfs = df[df.object_name.str.len() > 1].copy()
    # Only take records where a file with an extension was touched
    dfs = df[df.object_name.str.find(".") > 0].copy()
    dfs = dfs[dfs.object_type == "File"].copy()
    access_rights = pd.DataFrame(
        {
            "access_mask": {
                0: "0x10",
                1: "",
                2: "0x10000",
                3: "0x2",
                4: "0x6",
                5: "0x20000",
                6: "0xc0000",
                7: "0x1000000",
                8: "0x10c0000",
                9: "0x100",
                10: "0x0",
                11: "0x1",
                12: "0x80",
                13: "0x10080",
                14: "0x4",
                15: "0x40000",
            },
            "action": {
                0: "write_ea",
                1: "",
                2: "delete",
                3: "write",
                4: "0x6",
                5: "read_control",
                6: "0xc0000",
                7: "0x1000000",
                8: "0x10c0000",
                9: "write",  # 'write_attributes'
                10: "0x0",
                11: "read_data",
                12: "read_attributes",
                13: "delete",  # 'append_data'
                14: "add_subdirectory",
                15: "write_dac",
            },
        }
    )
    # merge data
    dfs = dfs.merge(access_rights)
    # return the data
    return dfs
```
This is the code used in that function. I tested with the Security-big-sample.evtx. Well, that file does not seem to be big enough. Our file is way larger. As said, I cannot upload the file here. It's our Security.evtx, which contains too much information for the whole world to see. However, I can send you this file via WeTransfer. We were already in contact with Posit before. I work for ArcelorMittal in Ghent (Belgium). My email is [email protected].
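As an aside on the function above: the final `dfs.merge(access_rights)` step relies on `DataFrame.merge` defaulting to an inner join on the shared `access_mask` column, so any row whose mask is not in the lookup table is silently dropped. A minimal sketch of that step, using made-up rows rather than data from the real log:

```python
import pandas as pd

# Hypothetical parsed event rows; the access_mask values mirror entries
# from the access_rights table in the function above.
dfs = pd.DataFrame({
    "object_name": ["a.txt", "b.txt", "c.txt"],
    "access_mask": ["0x2", "0x10000", "0x1"],
})

# Lookup table mapping hex access masks to human-readable actions.
access_rights = pd.DataFrame({
    "access_mask": ["0x2", "0x10000", "0x1"],
    "action": ["write", "delete", "read_data"],
})

# merge() defaults to an inner join on the common column "access_mask",
# so rows whose mask is missing from the lookup table are dropped.
merged = dfs.merge(access_rights)
print(merged)
```

Passing `how="left"` instead would keep unmatched rows (with `NaN` actions), which can help spot masks missing from the lookup table.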
Even if the dataframe is very large, notebooks should only contain source code and cell outputs, so they should typically be small, especially when there are no images. Either:
Here's one way to differentiate: Are you able to save the notebook after first running the "Notebook: Clear All Outputs" command? If so, you could try to narrow down to the specific notebook cell that has the problematic output by clearing all outputs, running a cell, saving, running the next cell, saving, and so on.
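If saving cell by cell is tedious, the same search can be scripted: an `.ipynb` file is just JSON, so the serialized size of each cell's outputs can be measured directly. A sketch, where the notebook dict is a hand-built stand-in for `json.load(open("notebook.ipynb"))`:

```python
import json


def heaviest_cells(nb: dict, top: int = 3):
    """Return (cell_index, serialized_output_bytes) pairs, largest first."""
    sizes = []
    for i, cell in enumerate(nb.get("cells", [])):
        payload = json.dumps(cell.get("outputs", []))
        sizes.append((i, len(payload)))
    return sorted(sizes, key=lambda t: t[1], reverse=True)[:top]


# Hand-built stand-in for a loaded notebook file:
nb = {
    "cells": [
        {"cell_type": "code", "source": "x = 1", "outputs": []},
        {"cell_type": "code", "source": "print(s)",
         "outputs": [{"output_type": "stream", "text": "x" * 10_000}]},
        {"cell_type": "markdown", "source": "# notes"},
    ]
}
print(heaviest_cells(nb))  # cell 1 dominates
```

Whichever cell tops the list is the one whose output is most likely tripping the save.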
System details:
Positron and OS details:
Positron Version: 2024.12.0 (system setup) build 96
Code - OSS Version: 1.93.0
Commit: c5ce275
Date: 2024-11-28T02:50:45.229Z
Electron: 30.4.0
Chromium: 124.0.6367.243
Node.js: 20.15.1
V8: 12.4.254.20-electron.0
OS: Windows_NT x64 10.0.26100
Interpreter details:
name: am2412
channels:
dependencies:
prefix: C:\ProgramData\miniforge3\envs\am2412
Describe the issue:
I was converting a Windows Event Log file into a dataframe with our own code. That all works; we have been using the code for a couple of months. The final dataframe is about 95k rows and 9 columns. When I tried to save the notebook, I got the following error:
I suppose this is because of the size of the dataframe it's trying to include in the notebook, but the message is unclear.
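Worth noting from the reproduction code earlier in the thread: the parsing loop calls `print(item)`, `print(event)`, and `print(info)` for every row, so with roughly 95k records the cell's captured stream output can grow far larger than the dataframe itself. A back-of-the-envelope estimate, where the 2 KB per-record figure is an assumption rather than a measurement:

```python
# Rough estimate of accumulated stream output from per-row prints.
n_records = 95_000        # approximate row count reported in the issue
bytes_per_record = 2_000  # assumed average size of the printed JSON + dict
total = n_records * bytes_per_record
print(f"~{total / 1_000_000:.0f} MB of cell output")  # prints "~190 MB of cell output"
```

Output of that scale gets embedded in the `.ipynb` on save, whereas the dataframe's displayed repr is truncated by pandas and stays small.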
Steps to reproduce the issue:
Expected or desired behavior:
It should either save the file or show a better message.
Were there any error messages in the UI, Output panel, or Developer Tools console?