Error reading coredump from flash in ESP32-S3 (IDFGH-14443) #53
Comments
You can verify the coredump data by reading it with OpenOCD. For example:
Connect OpenOCD from telnet
Then analyze the ELF file:
Print ELF header
Print note section
Print program headers
|
I have Secure Boot v2 enabled so JTAG debugging is disabled - I'm not sure OpenOCD would work? As per the code above, I'm manually reading the first 16 bytes of the partition and printing them to console before sending the partition contents - and can verify that they match (and are wrong on both). |
No, unfortunately OpenOCD will not work. For cases other than abort(), I wonder whether another exception occurred during the coredump and prevented the write from completing. Were you able to reproduce it manually? If so, I can try it here as well. I haven't tried the secure boot + flash encryption scenario before, so I need to check whether something there affects the coredump process. By the way, in your code above, why do you read and encode the entire partition instead of only the written coredump data? You can get the data length from the first 4 bytes of the header. |
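As an illustration of the suggestion above, here is a minimal sketch of reading the length field and using it to bound how much of the partition gets encoded. It assumes the length is stored as a little-endian 32-bit value in the first 4 bytes, as stated in the comment; the helper names `coredump_read_len` and `coredump_bytes_to_send` are hypothetical and not part of ESP-IDF. The sanity check matters here because an erased or undecryptable header will produce a nonsensical length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the little-endian 32-bit length stored in the
 * first 4 bytes of the coredump image header. */
uint32_t coredump_read_len(const uint8_t *hdr)
{
    return (uint32_t)hdr[0]
         | ((uint32_t)hdr[1] << 8)
         | ((uint32_t)hdr[2] << 16)
         | ((uint32_t)hdr[3] << 24);
}

/* Hypothetical helper: return how many bytes of the partition to encode,
 * or 0 if the length field is implausible (for example 0xFFFFFFFF on an
 * erased partition, or garbage when the data was not decrypted). */
size_t coredump_bytes_to_send(const uint8_t *hdr, size_t partition_size)
{
    uint32_t len = coredump_read_len(hdr);

    if (len < 4 || len > partition_size) {
        return 0;
    }
    return (size_t)len;
}
```

ESP-IDF also provides `esp_core_dump_image_get()`, which reports the address and size of a stored image, so parsing the header by hand may not be necessary in practice.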
Many thanks for this. Most of my 'real' crashes at the moment are due to heap issues - so I changed my panic function to be as follows rather than abort():
I was expecting this to fail, but it actually works as well, and I'm currently able to download the coredump file in the correct format. I'll try to leave my device connected via UART and hopefully catch an unplanned crash at some point, to see whether the UART log provides any clues. |
Here's an example of the output to UART for a core dump that just happened. When I then read the coredump partition content after a reboot, the first 16 bytes are |
Looking at the log, there is nothing wrong with the coredump write. It completed without an error.
Your coredump partition is also encrypted, right? |
It is also reading the summary. |
OK. So does the summary rely on the coredump partition being valid? I assume that helps us identify whether the partition itself is correct and it's the reading / decrypting of this that may be the error? Please let me know if you need any further logs / data from my side to support debugging? |
Please share the full reboot logs after coredump write and the logs from your function so we can verify that the coredump-related address matches. |
Attached is a file from a recent core dump (forced by freeing a statically allocated char*), which shows the UART output of the dump itself, the subsequent reboot, and the output from esp_core_dump_get_summary and the encoding of the core dump partition ready to send. I've also attached the associated coredump.bin and .txt files downloaded from the partition after reboot. Please let me know if you need anything else. |
I didn't expect to see this. #if CONFIG_ESP_INSIGHTS_COREDUMP_ENABLE https://github.com/espressif/esp-insights/blob/main/components/esp_insights/src/esp_insights.c#L301 You need to do coredump encoding before |
Ah OK, that's a bit of a problem as I was hoping to use ESP_insights alongside being able to manually download the core dump data. Is there a specific reason that ESP_Insights needs to delete the coredump data once it's sent to the dashboard? For the moment I'll try just manually comment out that line in esp_insights - will that work? |
I believe the reason could be to avoid sending old coredump data after every boot. As long as you delete the image after encoding, it should be fine @vikramdattu @shahpiyushv Could you please comment? |
Ok, so this may be a niche use-case, however my manual encoding of the core dump partition is on-demand, using an RPC function over MQTT to allow secure remote debugging in VSCode. I don’t automatically encode and upload the partition data to a server. In this case, I want to be able to make an RPC call at any time and download the most recent core dump for a device. Perhaps my only option is to disable core dump download in ESP Insights? However, I do find this a very useful feature, particularly for a summary analysis of older firmware versions, where the .elf file on my development machine may be a later version. |
@erhankur On further review, I'm not sure this is the problem. In the example files I uploaded above, I called the coredump RPC function immediately after boot, and as you'll see from the archive file, the core dump summary was OK, but the encoding of the coredump partition failed. However, if I wait for Insights to transmit the coredump info, if I then call the coredump RPC function I get a simple So Insights is an issue for sure, but I think the encoding problem with the coredump partition may be a separate issue. |
I don't think so. I have checked your files from Archive.zip. The
Could you send me your latest version of coredump encoder function? In the first version above, you will not get |
Here is the updated function which outputs the coredump summary, then encodes the partition data for sending over MQTT / UART to the requestor. Interesting that you're able to parse the coredump.bin file - whenever I run Am I doing something stupid? `
} |
Yes, you are skipping only 4 bytes of the header. Skip the full header
In summary, you can read the core dump data at boot time. However, on the 2nd attempt, |
Thanks, I'll check this. Is there any way for Insights to save the SHA of the transmitted core dump to NVS, then only send a new summary if the SHA changes, rather than having to delete the coredump partition after the data is sent to the dashboard? |
Hi @gadget-man, the problem with that is that we would then miss crashes that are exactly the same. For example, if the board crashed at the same code point multiple times, the logic would skip reporting, thinking it's the same crash. |
In which case, could the insights component not check |
In that case, if we fail to send the dump on the first attempt for whatever reason, such as a hard or soft reset after the crash, the crash dump will never be sent. |
OK, so in that case and if there's no option to use NVS flags etc and the only way is to send the partition data if it exists, can I suggest you update the ESP Insights documents to make it clear that, if For the moment, I'll have to stop using ESP Insights for core dump reporting. |
Hi @gadget-man thanks for the suggestion. We will add this in the README. |
Thanks for updating the readme. Just out of interest, is there a way I could manually call the function to send the core dump to the insights dashboard? Although I may miss a few, I’m comfortable that the vast majority of crashes would be caught by checking boot reason, and I could then send the data across without having to subsequently delete the core dump partition. |
Hi @gadget-man, the crash dump is sent in the boot-time data message, which is sent at startup. So it is not straightforward to send the crash dump later, separately from the boot info. You can, however, achieve your use-case by modifying the esp-insights component so that it does not erase the crash dump after it is sent. Re-enable core dump from menuconfig and remove calls to |
Answers checklist.
General issue report
I have an ESP32-S3 with secure boot and encryption enabled. I'm saving coredump data to flash. IDF version 5.4.0.
If I manually call abort() within the code, once the device has rebooted I can call a command to read the coredump partition to UART, which works successfully, and I can see that the base64-encoded core dump starts with:
f0VMRgEBAQAAAAAAAAAAAAQAXgABAAAAAAAAADQAAAAAAAAAAAAAADQAIAAsACgAAAAAAAQAAAC0BQAAAAAAAAAAAADgMQAA4DEAAAYAAAAAAAAAAQAAAJQ3AADYycw/2MnMP1wBAABcAQAABgAAAAAAAAABAAAA8DgAALBaGzywWhs8QAMAAEADAAAGAAAAAAAAAAEAAAAwPAAA+GjLP/hoyz9cAQAAXAEAAAYAAAAAAAAAAQAAAIw9AAAAZss/AGbLP6ACAACgAgAABgAAAAAAAAABAAAALEAAAPBwyz/wcMs/XAEAAFwBAAAGAAAAAAAAAAEAAACIQQAAAG7LPwBuyz+gAgAAoAIAAAYAAAAAAAAAAQAAAChEAAAwI84/MCPOP1wBAABcAQAABgAAAAAAAAABAAAAhEUAAIBsGjyAbBo80AMAANADAAAGAAAAAAAAAAEAAABUSQAAxDDMP8QwzD9cAQAAXAEAAAYAAAAAAAAAAQAAALBKAACgK8w/oCvMPxADAAAQAwAABgAAAAAAAAABAAAAwE0AAFyYzD9cmMw/XAEAAFwBAAAGAAAAAAAAAAEAAAAcTwAAcJTMP3CUzD/gAgAA
However, if the firmware crashes for some other reason (e.g. Heap Memory poisoning) and after rebooting I then call the same command to read the coredump partition, I get a very different looking response, starting as follows:
630b1pcoDKbNIrI76sCdTgUUVt2r4/oYDPIn6rDrtCKhZomDTMDG62Cu/BobAezrz1WLZQzA9ogCy/75VLC9stc4BHg9GnaW/pwcb8ZoEZvmEYLUFy0shsZ4Lf66YslIIVBahj0+eqaMNAGM4jzNZn1EwJLg/kbxR8VGd0Ou7mcvw0n4cfE/TjjxjCbu6sKAbFhLDCtmfo1ADX5Nbay3wvwkegrfw3cOcSBy8cVMUh0RSsF/NqyL62PCDy1ZYyrDr2RFxpdwkqlB2S40Lhf8hQ20Qx4IMxIAm0YR6Zf+Gpe830ZSkIwLbYYh5Ll8ZcWNT2hsxRtf2ir80aVTL4/7wtaGqBy1VMilCwb9oXuSfmtLPERjOyZN5x80HxB8ShHJ5NZvjOZBJr/BqBsTnNWxwcdr5XiwnM6BwqwyfP5hAryLnyC+eYGc6UVTdx7Q3dv8xzIK99XTzZu8yPm66kTwejWcZQbYvWLgPmCXbg5jrNLd0ryBCQ39Vci7iL3zEfuUf/hfx2UxHtwk0IeHtKJscoA8fZCXYOYPBTKyurGyb7aYHTcidfIloySul4YHsZxdFJsmNBOQbR5RfK0LkL0Ixm0a8HdzA6
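The difference between the two outputs above can be checked mechanically. The four ELF magic bytes `\x7fELF` (0x7F 0x45 0x4C 0x46) always encode to the base64 prefix `f0VMR`, because 4 bytes fully determine the first five base64 characters. The sketch below is a plain C illustration of that check; the function name `b64_looks_like_elf` is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: report whether a base64 string can decode to data
 * beginning with the ELF magic 0x7F 'E' 'L' 'F'. Those 4 bytes (32 bits)
 * fix the first five base64 characters (30 bits) as "f0VMR". */
bool b64_looks_like_elf(const char *b64)
{
    static const char elf_prefix[] = "f0VMR";

    return b64 != NULL
        && strncmp(b64, elf_prefix, sizeof(elf_prefix) - 1) == 0;
}
```

Applied to the two dumps above, the abort() case passes this check and the heap-poisoning case does not, which is a quick way to flag a bad read before sending the data over MQTT or UART.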
Note that currently I'm trimming the first 24 bytes of the partition data, so that when working correctly, the decoded data starts with
\x7fELF
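The trimming step described above can be sketched as a small bounds-checked helper. It assumes the 24-byte header size mentioned in this report (which may differ for other coredump formats or IDF versions); `COREDUMP_HEADER_SIZE` and `coredump_elf_payload` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption taken from this report: the ELF data starts 24 bytes into the
 * stored image. Verify this against the IDF version in use. */
#define COREDUMP_HEADER_SIZE 24u

/* Hypothetical helper: return a pointer to the ELF payload inside a coredump
 * image read from flash, or NULL if the image is too short or the bytes
 * after the header are not the ELF magic. */
const uint8_t *coredump_elf_payload(const uint8_t *image, size_t image_len,
                                    size_t *payload_len)
{
    static const uint8_t elf_magic[4] = {0x7F, 'E', 'L', 'F'};

    if (image == NULL || image_len < COREDUMP_HEADER_SIZE + sizeof(elf_magic)) {
        return NULL;
    }
    if (memcmp(image + COREDUMP_HEADER_SIZE, elf_magic, sizeof(elf_magic)) != 0) {
        return NULL;
    }
    if (payload_len != NULL) {
        *payload_len = image_len - COREDUMP_HEADER_SIZE;
    }
    return image + COREDUMP_HEADER_SIZE;
}
```

A NULL result is a cheap signal that the data read back is not a valid ELF coredump, for example because the partition was erased or read without decryption.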
Here is the function I'm using to read the coredump data from the partition:
`
// Find the coredump partition
const esp_partition_t *core_dump_partition = esp_partition_find_first(
ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_DATA_COREDUMP, "coredump");
`
this is the relevant section of sdkconfig:
CONFIG_ESP_COREDUMP_ENABLE_TO_FLASH=y
CONFIG_ESP_COREDUMP_DATA_FORMAT_ELF=y
CONFIG_ESP_COREDUMP_CHECKSUM_CRC32=y
CONFIG_ESP_COREDUMP_CHECK_BOOT=y
CONFIG_ESP_COREDUMP_ENABLE=y
CONFIG_ESP_COREDUMP_LOGS=y
CONFIG_ESP_COREDUMP_MAX_TASKS_NUM=64
CONFIG_ESP_COREDUMP_USE_STACK_SIZE=y
CONFIG_ESP_COREDUMP_STACK_SIZE=2048
Is this a bug?