pnnx numpy file input #6285

AtomAlpaca · 2025-08-26T15:03:03Z

Changes Introduced

Numpy file parsing implementation in utils.cpp.
Add parameters input and input, which allow pnnx to read the shape and contents of a tensor from a numpy file.
Add tests under tests/numpy.

The structure of a numpy file

The .npy format is the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. The format stores all of the shape and dtype information necessary to reconstruct the array correctly even on another machine with a different architecture.

A numpy file consists of two parts, separated by a newline character (\n). The first part stores metadata about the file and the array, and the second part is the raw binary data itself.

Header data

For any numpy file, the first 6 bytes are a magic string: \x93NUMPY

The next 2 bytes are the major and minor version number of this file. e.g. \x01\x00.

Currently the numpy format has three major versions. In version 1.0, the next 2 bytes form a little-endian unsigned short, HEADER_LEN, giving the length of the header data. In version 2.0, HEADER_LEN is a 4-byte little-endian unsigned int instead.

In versions 1.0 and 2.0, the next HEADER_LEN bytes are an ASCII string containing a Python literal expression of a dictionary. In version 3.0 the string is UTF-8 encoded to support structured types with arbitrary Unicode field names. Given the low actual demand, we did not implement this version.
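The fixed-size part described above can be sketched in a few lines of Python. This is only an illustration of the layout, not the C++ code in utils.cpp; read_preamble and its return tuple are hypothetical names chosen for this example.

```python
import struct

MAGIC = b"\x93NUMPY"


def read_preamble(buf: bytes):
    """Return (major, minor, header_len, header_start) for .npy bytes.

    header_start is the offset of the first byte of the dictionary string.
    """
    if buf[:6] != MAGIC:
        raise ValueError("not a .npy file: bad magic string")
    major, minor = buf[6], buf[7]
    if major == 1:
        # Version 1.0: HEADER_LEN is a 2-byte little-endian unsigned short.
        (header_len,) = struct.unpack_from("<H", buf, 8)
        header_start = 10
    elif major == 2:
        # Version 2.0: HEADER_LEN is a 4-byte little-endian unsigned int.
        (header_len,) = struct.unpack_from("<I", buf, 8)
        header_start = 12
    else:
        # Version 3.0 is deliberately left out, as in the description above.
        raise ValueError(f"unsupported .npy version {major}.{minor}")
    return major, minor, header_len, header_start


# Hand-built version 1.0 preamble announcing a 118-byte header.
sample = MAGIC + b"\x01\x00" + struct.pack("<H", 118)
print(read_preamble(sample))  # (1, 0, 118, 10)
```

The array data then starts at header_start + header_len.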

The dictionary contains three keys:

descr: dtype.descr, an object that can be passed as an argument to numpy.dtype. For simple types it is formatted as <endian><data type><type length>, e.g. <f4 means "little-endian, float, 4 bytes long".
fortran_order: bool, whether the array is stored as Fortran-contiguous.
shape: tuple of int, the shape of the array.

In real numpy files, these keys are not guaranteed to appear in alphabetical order, so the parser must not depend on key order.

After the dictionary come some padding spaces (\x20) and finally the \n, chosen so that len(magic string) + 2 + len(length) + HEADER_LEN is evenly divisible by 64 for alignment purposes.
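As a rough illustration of the dictionary and the padding rule, the Python sketch below pads a dictionary string the way the format requires and reads the three keys back. It relies on Python's ast.literal_eval, which the C++ implementation obviously cannot use; padded_header and parse_header_dict are hypothetical helper names.

```python
import ast


def padded_header(dict_text: str, len_field_size: int = 2) -> bytes:
    """Append spaces and a final newline so that
    6 (magic) + 2 (version) + len_field_size + HEADER_LEN is a multiple of 64."""
    unpadded = 6 + 2 + len_field_size + len(dict_text) + 1  # +1 for '\n'
    pad = (-unpadded) % 64
    return dict_text.encode("ascii") + b" " * pad + b"\n"


def parse_header_dict(header: bytes):
    """Return (descr, fortran_order, shape) from a version 1.0/2.0 header."""
    info = ast.literal_eval(header.decode("ascii"))
    # Look keys up by name so that their order in the file does not matter.
    return info["descr"], info["fortran_order"], tuple(info["shape"])


text = "{'descr': '<f4', 'fortran_order': False, 'shape': (1, 3, 224, 224), }"
header = padded_header(text)
print(parse_header_dict(header))       # ('<f4', False, (1, 3, 224, 224))
print((6 + 2 + 2 + len(header)) % 64)  # 0
```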

Data

Following the header comes the array data. If the dtype contains Python objects (i.e. dtype.hasobject is True), then the data is a Python pickle of the array, a case that does not arise in our program. Otherwise the data is the contiguous (either C- or Fortran-, depending on fortran_order) bytes of the array.

Implementation Details

Given the simplicity of the structure of the numpy format, we made a minimalist implementation instead of including a third-party library.

Most of the implementation, such as the parsing, is straightforward. We only discuss the challenging and noteworthy parts here.

Endian

The endianness of the data may be different from the system's.

To get the system endianness we use a trick:

char get_system_endian()
{
    uint16_t i = 1;
    return (*(char*)&i) ? '<' : '>';
}

On a little-endian machine, i is stored in memory as 0x01 0x00, and on a big-endian machine as 0x00 0x01, so we can simply check whether the first byte of i is 0x01.

To convert the endianness, we reverse the bytes of every element: std::swap(bytes[j], bytes[type_size - j - 1]); where j loops from 0 to type_size / 2 - 1.
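The same two ideas can be tried out quickly in Python. This is a sketch for experimentation only: get_system_endian mirrors the C++ helper above, while swap_endian is a hypothetical name for the per-element byte reversal.

```python
import struct
import sys


def get_system_endian() -> str:
    """Pack 1 as a native 16-bit integer and look at its first byte."""
    return "<" if struct.pack("=H", 1)[0] == 1 else ">"


def swap_endian(data: bytes, type_size: int) -> bytes:
    """Reverse the bytes of every type_size-byte element in data."""
    if len(data) % type_size != 0:
        raise ValueError("data length is not a multiple of type_size")
    out = bytearray(data)
    for start in range(0, len(out), type_size):
        out[start:start + type_size] = out[start:start + type_size][::-1]
    return bytes(out)


# Cross-check the trick against what Python itself reports.
print(get_system_endian(), sys.byteorder)

# Two big-endian float32 values become little-endian after the swap.
big = struct.pack(">2f", 1.0, 2.5)
little = swap_endian(big, 4)
print(struct.unpack("<2f", little))  # (1.0, 2.5)
```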

Fortran order

Put simply, the two orders differ in which coordinate varies fastest when the array is stored flat in contiguous memory.

For example, given an array a[2][2][2], C order stores it as a[0][0][0], a[0][0][1], a[0][1][0], ..., a[1][0][0], ... while Fortran order stores it as a[0][0][0], a[1][0][0], a[0][1][0], ..., a[0][0][1], ..., i.e. the first index varies fastest instead of the last.

To convert Fortran order to C order, we enumerate every coordinate, compute its offset in memory under both orders, and copy the element across.

To do this we compute the "stride" of each dimension in each order, that is, how many elements we must move through memory to increase the coordinate of that dimension by 1. For example, for a[2][2][2] the stride of the third dimension is 1 in C order but 4 in Fortran order.

Handling the boundary conditions carefully, we can compute the strides:

c_strides[dims - 1] = 1;
for (int i = dims - 2; i >= 0; --i)
{
    c_strides[i] = c_strides[i + 1] * shape[i + 1];
}

f_strides[0] = 1;
for (int i = 1; i <= dims - 1; ++i)
{
    f_strides[i] = f_strides[i - 1] * shape[i - 1];
}
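For a concrete feel of the two loops above, here is the same computation as a small Python sketch on a non-cubic shape, where the two results are easy to tell apart. The function names c_strides and f_strides are chosen for this example and return strides measured in elements, not bytes.

```python
def c_strides(shape):
    """Strides (in elements) when the last index varies fastest."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def f_strides(shape):
    """Strides (in elements) when the first index varies fastest."""
    strides = [1] * len(shape)
    for i in range(1, len(shape)):
        strides[i] = strides[i - 1] * shape[i - 1]
    return strides


shape = (2, 3, 4)
print(c_strides(shape))  # [12, 4, 1]
print(f_strides(shape))  # [1, 2, 6]
```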

Then we can compute both indices for every coordinate and copy the element.

// index[] holds the current coordinate and starts as all zeros
for (int i = 1; i <= content_len; ++i)
{
    int64_t c_index = 0;
    int64_t f_index = 0;
    for (int j = 0; j <= dims - 1; ++j)
    {
        c_index += c_strides[j] * index[j];
        f_index += f_strides[j] * index[j];
    }

    memcpy((char*)dst + c_index * type_size, (char*)src + f_index * type_size, type_size);

    // advance the coordinate by one, carrying from the last dimension
    for (int j = dims - 1; j >= 0; --j)
    {
        index[j]++;
        if (index[j] < shape[j])
        {
            break;
        }
        index[j] = 0;
    }
}
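The whole conversion can be checked on a tiny array with the Python sketch below. It is an illustration under simplifying assumptions: it works on a flat list of elements rather than raw bytes, and it uses itertools.product to enumerate coordinates instead of the hand-written carry loop. fortran_to_c is a hypothetical name.

```python
from itertools import product


def fortran_to_c(src, shape):
    """Reorder a flat Fortran-ordered list into C order."""
    dims = len(shape)
    c_str = [1] * dims
    for i in range(dims - 2, -1, -1):
        c_str[i] = c_str[i + 1] * shape[i + 1]
    f_str = [1] * dims
    for i in range(1, dims):
        f_str[i] = f_str[i - 1] * shape[i - 1]

    dst = [None] * len(src)
    # Visit every coordinate once and copy the element from its
    # Fortran-order offset to its C-order offset.
    for index in product(*(range(n) for n in shape)):
        c_index = sum(s * k for s, k in zip(c_str, index))
        f_index = sum(s * k for s, k in zip(f_str, index))
        dst[c_index] = src[f_index]
    return dst


# The 2x3 matrix [[0, 1, 2], [3, 4, 5]] stored column by column.
fortran_flat = [0, 3, 1, 4, 2, 5]
print(fortran_to_c(fortran_flat, (2, 3)))  # [0, 1, 2, 3, 4, 5]
```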

Notes

strtok() is not thread-safe: it keeps internal static state, so concurrent calls can race.
Two tests may read the same file at the same time when the tests run in parallel, so each test uses a different file name.

Add tests

In torch we can use the numpy() method to convert a Tensor to a numpy array and then save it as a numpy file with save(), so that pnnx can simply read it.

# ...
torch.manual_seed(0)
x = torch.rand(1, 3, 224, 224)

np = x.numpy()
npy.save("test_convnext_tiny_input1.npy", np)

a = net(x)

mod = torch.jit.trace(net, x)
mod.save("test_convnext_tiny.pt")

import os
os.system("../../src/pnnx test_convnext_tiny.pt input=test_convnext_tiny_input1.npy")
# ...

To test whether the fortran_order and endianness conversions work, we can use asfortranarray() and astype().

# ...
np = x.numpy()
f_np = npy.asfortranarray(np)
# ...

# ...
np = x.numpy()
r_np = np.astype('>f4')
# ...
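When torch and numpy are not at hand, a small input file for experiments can also be built by hand. The sketch below is not part of the PR's tests; it assembles a version 1.0 file for float32 data using only the Python standard library, and write_npy_v1 is a hypothetical helper that handles just the '<f4' and '>f4' descriptors in C order.

```python
import struct


def write_npy_v1(values, shape, descr="<f4"):
    """Return the bytes of a version 1.0 .npy file holding float32 values."""
    if descr not in ("<f4", ">f4"):
        raise ValueError("this sketch only supports '<f4' and '>f4'")
    text = "{'descr': '%s', 'fortran_order': False, 'shape': %r, }" % (
        descr, tuple(shape))
    # Pad with spaces and end with '\n' so the data starts on a 64-byte boundary.
    unpadded = 6 + 2 + 2 + len(text) + 1
    header = text.encode("ascii") + b" " * ((-unpadded) % 64) + b"\n"
    preamble = b"\x93NUMPY\x01\x00" + struct.pack("<H", len(header))
    # The first character of descr doubles as the struct byte-order prefix.
    data = struct.pack("%s%df" % (descr[0], len(values)), *values)
    return preamble + header + data


blob = write_npy_v1([1.0, 2.0, 3.0, 4.0], (2, 2), ">f4")
header_len = struct.unpack_from("<H", blob, 8)[0]
print((10 + header_len) % 64)  # 0
```

Writing blob to a file in binary mode gives a big-endian input comparable to the astype('>f4') case above.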

To run these tests we create a folder pnnx/tests/numpy and add these tests:

find_package(Python3 REQUIRED COMPONENTS Interpreter)

macro(pnnx_numpy_add_test name)
    add_test(NAME test_${name} COMMAND ${CMAKE_COMMAND} -DPYTHON_EXECUTABLE=${Python3_EXECUTABLE} -DPYTHON_SCRIPT=${CMAKE_CURRENT_SOURCE_DIR}/test_${name}.py -P ${CMAKE_CURRENT_SOURCE_DIR}/../run_test.cmake)
endmacro()

pnnx_numpy_add_test(convnext_tiny)
pnnx_numpy_add_test(convnext_tiny_endian)
pnnx_numpy_add_test(convnext_tiny_fortran_array)
pnnx_numpy_add_test(nn_Conv3d)
pnnx_numpy_add_test(pnnx_eliminate_noop_expand)
pnnx_numpy_add_test(pnnx_fuse_multiheadattention)
