HertzDevil
Collaborator

Resolves #5.

This is a very primitive binary dump. As mentioned in the issue, it requires compiler support to emit the appropriate type info at build time so that other tools can interpret the dumps. For now, CRYSTAL_DUMP_TYPE_ID=1 lets you visually identify some allocations in a hex editor.

Apart from these custom formats, maybe we could try to produce industry-standard or community-standard memory dumps in the future?

@HertzDevil added the enhancement label on Jun 29, 2025
Contributor

@ysbaddaden left a comment

That's pretty nice! I've been pondering this for a while.

I see just a couple of issues:

  1. We should stop the world while we dump the heap so that it's MT-compatible (that one's easy).

  2. We can't allocate anything, but stdlib will happily allocate anywhere... Maybe we could introduce Crystal::System.read(fd, slice) and .write(fd, slice) methods that would directly read from and write to a system fd or handle?
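
As a rough illustration of the second point, a direct write to a system fd could look like the sketch below. It assumes a POSIX target; `RawFD.write_all` is a hypothetical name, not the proposed `Crystal::System.write` and not an existing stdlib method. It reports failure by returning the `Errno` instead of raising, since building an exception would itself allocate.

```crystal
# Hypothetical helper (POSIX only): writes a slice straight to a file
# descriptor through LibC.write, without any IO buffering.
module RawFD
  # Returns nil on success, or the Errno of the first failing write.
  def self.write_all(fd : Int32, slice : Bytes) : Errno?
    until slice.empty?
      written = LibC.write(fd, slice, slice.size)
      if written < 0
        err = Errno.value
        # Retry when the call was merely interrupted by a signal.
        next if err == Errno::EINTR
        return err
      end
      # Skip the bytes already written; partial writes are possible.
      slice += written
    end
    nil
  end
end
```

The loop only touches its arguments and local variables, so nothing in it should reach the GC heap.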

# not have this list.
#
# All the records are then terminated by a single `UInt64::MAX` field.
def self.graph(io : IO) : Nil
Contributor

Suggestion: maybe call it #compact as the description states? It would be more aligned with the #full version.

@HertzDevil
Collaborator Author

> 2. We can't allocate anything, but stdlib will happily allocate anywhere

Do you mean that GC.lock_write + sync = true + stop-the-world are still insufficient because there could still be fiber switches while dumping?

Contributor

@ysbaddaden left a comment

Thank you, it looks great to me for the initial step 🙇

@HertzDevil
Collaborator Author

It is possible to eliminate GC allocations from the IO itself, like this (Win32):

class RawIO < IO
  def initialize(@handle : LibC::HANDLE)
  end

  # Fills `slice` from the handle and returns the number of bytes read.
  def read(slice : Bytes)
    LibC.ReadFile(@handle, slice, slice.size, out read_count, nil)
    read_count.to_i32
  end

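  # Writes `slice` straight to the handle, with no buffering in between.
  def write(slice : Bytes) : Nil
    # The byte count pointer must be non-null when no OVERLAPPED is given.
    LibC.WriteFile(@handle, slice, slice.size, out _, nil)
  end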
end

macro open_raw_file(path, mode = "r", &block)
  {% write = mode == "w" ? true : mode == "r" ? false : mode.raise "Unknown file mode: #{mode.id}" %}

  # if `path.to_utf16` uses a fixed-size stack buffer then
  # this macro could even be turned into a regular method
  %handle = ::LibC.CreateFileW(
    {{ path.to_utf16 }},
    {% if write %} ::LibC::FILE_GENERIC_WRITE {% else %} ::LibC::FILE_GENERIC_READ {% end %},
    ::LibC::DEFAULT_SHARE_MODE,
    nil,
    {% if write %} ::LibC::CREATE_ALWAYS {% else %} ::LibC::OPEN_EXISTING {% end %},
    ::LibC::FILE_FLAG_BACKUP_SEMANTICS,
    nil,
  )

  if %handle == ::LibC::INVALID_HANDLE_VALUE
    ::raise(::RuntimeError.new("CreateFileW"))
  end

  begin
    %raw_io_buf = uninitialized ::ReferenceStorage(::RawIO)
    {{ block.args[0] }} = ::RawIO.unsafe_construct(pointerof(%raw_io_buf), %handle)
    {{ block.body }}
  ensure
    ::LibC.CloseHandle(%handle)
  end
end

open_raw_file("heap.bin", "w") do |f|
  PerfTools::DumpHeap.full(f)
end

straight-shoota pushed a commit to crystal-lang/crystal that referenced this pull request Jul 28, 2025
If the environment variable `CRYSTAL_DUMP_TYPE_INFO` is set, at build time the compiler will emit a bunch of type information to a JSON file at that path. The JSON looks something like:

```json
{
    "types": [
        {
            "name": "Regex",
            "id": 46,
            "min_subtype_id": 46,
            "supertype_id": 188,
            "has_inner_pointers": true,
            "size": 8,
            "align": 8,
            "instance_size": 56,
            "instance_align": 8,
            "instance_vars": [
                {
                    "name": "@re",
                    "type_name": "Pointer(LibPCRE2::Code)",
                    "offset": 8,
                    "size": 8
                },
                {
                    "name": "@jit",
                    "type_name": "Bool",
                    "offset": 16,
                    "size": 1
                },
                {
                    "name": "@source",
                    "type_name": "String",
                    "offset": 24,
                    "size": 8
                },
                {
                    "name": "@match_data",
                    "type_name": "Crystal::ThreadLocalValue(Pointer(LibPCRE2::MatchData))",
                    "offset": 32,
                    "size": 16
                },
                {
                    "name": "@options",
                    "type_name": "Regex::Options",
                    "offset": 48,
                    "size": 8
                }
            ]
        }
    ]
}
```

At the moment, this is intended to be an internal tool that supplements the similarly named `CRYSTAL_DUMP_TYPE_ID` environment variable. I originally made this to generate human-readable reports from [GC heap dumps](crystal-lang/perf-tools#30), but there are probably other good uses like enhancing the debugger support scripts.
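
As an illustration of how a tool might consume such a file, here is a minimal sketch that loads the JSON with `JSON::Serializable` and lists each type's instance variables by offset. The `IvarInfo`, `TypeInfo`, `TypeInfoFile` and `describe` names are hypothetical, only a subset of the fields shown above is mapped, and treating `instance_size` and `instance_vars` as optional is an assumption about types that are not shown in the example.

```crystal
require "json"

# Hypothetical mirror of one entry in "instance_vars".
struct IvarInfo
  include JSON::Serializable

  getter name : String
  getter type_name : String
  getter offset : Int32
  getter size : Int32
end

# Hypothetical mirror of one entry in "types"; unmapped keys are ignored.
struct TypeInfo
  include JSON::Serializable

  getter name : String
  getter id : Int32
  # Assumption: not every type necessarily carries these two fields.
  getter instance_size : Int32?
  getter instance_vars : Array(IvarInfo) = [] of IvarInfo
end

struct TypeInfoFile
  include JSON::Serializable

  getter types : Array(TypeInfo)
end

# Builds one summary line per type, followed by its ivars sorted by offset.
def describe(json : String) : Array(String)
  lines = [] of String
  TypeInfoFile.from_json(json).types.each do |type|
    lines << "#{type.name} (id #{type.id}, #{type.instance_size} bytes)"
    type.instance_vars.sort_by(&.offset).each do |ivar|
      lines << "  +#{ivar.offset} #{ivar.name} : #{ivar.type_name} (#{ivar.size} bytes)"
    end
  end
  lines
end

# Trimmed-down version of the example above.
sample = <<-JSON
  {"types": [{"name": "Regex", "id": 46, "instance_size": 56,
    "instance_vars": [
      {"name": "@jit", "type_name": "Bool", "offset": 16, "size": 1},
      {"name": "@re", "type_name": "Pointer(LibPCRE2::Code)", "offset": 8, "size": 8}
    ]}]}
  JSON

describe(sample).each { |line| puts line }
```

A heap dump reader could combine a table like this with the type IDs found in the dump to label each allocation's fields.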
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Dumping the entire dynamic heap
3 participants