Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improper message length issues with Rigol DS1102E #472

Open
aWZHY0yQH81uOYvH opened this issue Nov 27, 2024 · 14 comments
Open

Improper message length issues with Rigol DS1102E #472

aWZHY0yQH81uOYvH opened this issue Nov 27, 2024 · 14 comments

Comments

@aWZHY0yQH81uOYvH
Copy link

aWZHY0yQH81uOYvH commented Nov 27, 2024

I have a Rigol DS1102E connected over USB using the pyvisa-py backend. I am unable to receive waveform data correctly. This manifests in a few ways.

In normal memory depth mode, the scope should return 16384 bytes of data with an IEEE header.

scope.timeout = 10000 # required or hangs forever
scope.write(':ACQ:MEMD NORMAL')
scope.write(':RUN')
data = scope.query_binary_values(':WAV:DATA? CHAN1', datatype = 'B')
# timeout exception

I think this used to give the exception show below. It seems to be very inconsistent.

I can get some data using read_raw

scope.write(':WAV:DATA? CHAN1')
data = scope.read_raw()
len(data) # 16384

But this sometimes doesn't work (throws the exception shown below), and it returns the incorrect amount of data (should be 16394 bytes including the header).

#470 seems to fix this issue. query_binary_values works as expected with that patch, and read_raw returns the correct amount of data.

However, if I request more data by switching into long memory mode (should return 1048576 samples),

scope.write(':ACQ:MEMD LONG')
scope.write(':RUN')
data = scope.query_binary_values(':WAV:DATA? CHAN1', datatype = 'B')

I get the following exception, where it appears to be trying to parse a zero-length packet.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/me/.local/venv/lib/python3.12/site-packages/pyvisa/resources/messagebased.py", line 750, in query_binary_values
    return self.read_binary_values(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/.local/venv/lib/python3.12/site-packages/pyvisa/resources/messagebased.py", line 574, in read_binary_values
    block = self._read_raw(chunk_size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/.local/venv/lib/python3.12/site-packages/pyvisa/resources/messagebased.py", line 441, in _read_raw
    chunk, status = self.visalib.read(self.session, size)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/.local/venv/lib/python3.12/site-packages/pyvisa_py/highlevel.py", line 513, in read
    data, status_code = self.sessions[session].read(count)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/.local/venv/lib/python3.12/site-packages/pyvisa_py/usb.py", line 168, in read
    return self._read(
           ^^^^^^^^^^^
  File "/Users/me/.local/venv/lib/python3.12/site-packages/pyvisa_py/sessions.py", line 792, in _read
    current = reader()
              ^^^^^^^^
  File "/Users/me/.local/venv/lib/python3.12/site-packages/pyvisa_py/usb.py", line 152, in _usb_reader
    return self.interface.read(count)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/.local/venv/lib/python3.12/site-packages/pyvisa_py/protocols/usbtmc.py", line 481, in read
    response = BulkInMessage.from_bytes(resp)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/.local/venv/lib/python3.12/site-packages/pyvisa_py/protocols/usbtmc.py", line 114, in from_bytes
    msgid, btag, btaginverse = struct.unpack_from("BBBx", data)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: unpack_from requires a buffer of at least 4 bytes for unpacking 4 bytes at offset 0 (actual buffer size is 0)

Sometimes it takes a very long time for this to finally show up. It used to happen much faster; not sure what is different now.

Attempting to use read_raw results in the same error.

Using @Jimmyvandenbergh's #470, I get a timeout error using query_binary_values and get a bunch of 20480-byte blocks using read_raw. Attempting to use scope.read_bytes(1048586) also results in a timeout error.

From the GitHub issues, it seems like a lot of Rigol/Siglent instruments are triggering this zero-length packet exception, so not sure if what I'm encountering is the same bug as the others. #470 fixes this but appears to be losing data somewhere given the timeout exceptions.

Output of pyvisa-info

Machine Details:
   Platform ID:    macOS-14.6.1-x86_64-i386-64bit
   Processor:      i386

Python:
   Implementation: CPython
   Executable:     /Users/me/.local/venv/bin/python3.12
   Version:        3.12.7
   Compiler:       Clang 15.0.0 (clang-1500.3.9.4)
   Architecture:   ('x86', 64)
   Build:          Oct  1 2024 02:05:46 (#main)
   Unicode:        UCS4

PyVISA Version: 1.14.1

Backends:
   ivi:
      Version: 1.14.1 (bundled with PyVISA)
      Binary library: Not found
   py:
      Version: 0.7.2
      USB INSTR: Available via PyUSB (1.2.1). Backend: libusb1
      USB RAW: Available via PyUSB (1.2.1). Backend: libusb1
      TCPIP INSTR: Available 
         Resource discovery:
         - VXI-11: ok
         - hislip: ok
      TCPIP SOCKET: Available 
      ASRL INSTR:
         Please install PySerial (>=3.0) to use this resource type.
         No module named 'serial'
      VICP INSTR:
         Please install PyVICP to use this resource type.
      GPIB INSTR:
         Please install linux-gpib (Linux) or gpib-ctypes (Windows, Linux) to use this resource type. Note that installing gpib-ctypes will give you access to a broader range of functionalities.
         No module named 'gpib'
      GPIB INTFC:
         Please install linux-gpib (Linux) or gpib-ctypes (Windows, Linux) to use this resource type. Note that installing gpib-ctypes will give you access to a broader range of functionalities.
         No module named 'gpib'


@aWZHY0yQH81uOYvH
Copy link
Author

aWZHY0yQH81uOYvH commented Nov 27, 2024

With the main branch, I get all timeout exceptions for every message, including *IDN?.

pyvisa-info
Machine Details:
   Platform ID:    macOS-14.6.1-x86_64-i386-64bit
   Processor:      i386

Python:
   Implementation: CPython
   Executable:     /Users/me/.local/venv/bin/python3.12
   Version:        3.12.7
   Compiler:       Clang 15.0.0 (clang-1500.3.9.4)
   Architecture:   ('x86', 64)
   Build:          Oct  1 2024 02:05:46 (#main)
   Unicode:        UCS4

PyVISA Version: 1.14.1

Backends:
   ivi:
      Version: 1.14.1 (bundled with PyVISA)
      Binary library: Not found
   py:
      Version: 0.7.3.dev35+g5147749
      USB INSTR: Available via PyUSB (1.2.1). Backend: libusb1
      USB RAW: Available via PyUSB (1.2.1). Backend: libusb1
      TCPIP INSTR: Available 
         Resource discovery:
         - VXI-11: ok
         - hislip: ok
      TCPIP SOCKET: Available 
      ASRL INSTR:
         Please install PySerial (>=3.0) to use this resource type.
         No module named 'serial'
      VICP INSTR:
         Please install PyVICP to use this resource type.
      GPIB INSTR:
         Please install linux-gpib (Linux) or gpib-ctypes (Windows, Linux) to use this resource type. Note that installing gpib-ctypes will give you access to a broader range of functionalities.
         No module named 'gpib'
      GPIB INTFC:
         Please install linux-gpib (Linux) or gpib-ctypes (Windows, Linux) to use this resource type. Note that installing gpib-ctypes will give you access to a broader range of functionalities.
         No module named 'gpib'

@aWZHY0yQH81uOYvH
Copy link
Author

Problems Identified

  1. This causes a new read to be done even when the first packet is exactly response.transfer_size. Perhaps this is unique behavior to this scope since I've read talk of another short packet being used to indicate the end of a transfer? This explains the timeouts on the main branch.
    while (
    len(resp) == self.usb_recv_ep.wMaxPacketSize
    or len(received_transfer) < response.transfer_size
    ):
    Seems like it would make sense to break out of all reading loops once the received_transfer has reached response.transfer_size?
  2. This scope sends an incorrect length for the first packet. The first packet should contain (and does contain) 62 bytes of data, but it sends a length of 52 in the header. Presumably they forgot to count the 10-byte IEEE header, and so this truncates 10 bytes off of the first block of data. This explains the 16384 length instead of 16394.
    data = data[12 : 12 + transfer_size]
    (also here)
    received_message.extend(received_transfer[: response.transfer_size])
    Perhaps worth addressing with a "quirky" handler? Or just don't truncate? I'm only learning of this codebase and USBTMC today, so I have no idea if other equipment requires the truncation to work correctly.
  3. Random packets are returned with length zero from raw_read. For example, of the 20165 packets required to transfer the 1048586 bytes for the long memory capture, 13 randomly placed packets were empty in one run, 23 in another, etc. This causes the zero length exception when it tries to unpack the USBTMC header. The number missing is different each time. Yellow lines are missing packets. Seems like it might have something to do with CPU load, so maybe a race condition or threading issue somewhere in the USB stack. missing This seems like a more difficult problem. I'll take a look and see what I can figure out.

@Jimmyvandenbergh
Copy link
Contributor

Jimmyvandenbergh commented Nov 27, 2024

@aWZHY0yQH81uOYvH

first thanks for all information. i need a little time to go through all information in depth. but just to clarify some of your comments in the last message.

  1. I think the error you receive is that a new 'transfer' that is being requested, line 474 from Support partial usbtmc reads and request the size amount of bytes from the device #470 this due to the fact EOM is not set on the last transaction and the requested bytes is larger then received bytes. Therefore according to spec, we want to read a new transfer. This would include the 12 byte header of the USBTMC spec, we should expect a zero packet in the while loop, as one of two while checks should be fulfilled. I have seen this issue occurring when a message on these device is end and it is equal to the wMaxPacketSize they do not send a new transaction. But Support partial usbtmc reads and request the size amount of bytes from the device #470 should solve this issue, unless there is more data requested and 'EOM' is not send.
    Exit if the response.transfer_size is reached, is not the solution, per following quote of the spec, this will leave the short package to be in buffer in other devices that comply to this:

The device must always terminate a Bulk-IN transfer by sending a short packet. The short packet may be zero-length or non zero-length. The device may send extra alignment bytes (up to wMaxPacketSize – 1) to avoid sending a zero-length packet. The alignment bytes should be 0x00-valued, but this is not required. A device is not required to send any alignment bytes.

  1. transfer_size which is send in the header should not include the 12 byte header or alignment bytes therefore if the message is 52 bytes we would expect one wMaxPacketSize, if wMaxPacketSize = 64 , (52bytes + 12byte header) and another short usb packet with 0 - 4 bytes see quote of the spec above:

  2. Are you able to provide a full data log of the bytes that are returned? so the header of the first message, and all new headers that are requested, including all data bytes received. Maybe this can give an insight in what they sending back and if we are missing some case in the USBTMC implementation. (or if we can update the routine conforming). please test this with Support partial usbtmc reads and request the size amount of bytes from the device #470
    .

@aWZHY0yQH81uOYvH
Copy link
Author

aWZHY0yQH81uOYvH commented Nov 27, 2024

@Jimmyvandenbergh Hi!

  1. That makes sense. Just for testing with my scope, I'm existing as soon as the packet is fully received. I'll play with Support partial usbtmc reads and request the size amount of bytes from the device #470 later, hopefully that'll fix it properly as you say.
  2. The header I'm describing is something of the format #801048576 in ASCII that's part of the data (saying that there is 1048576 bytes of stuff coming -- see here), not the USBTMC header. As far as I understand that's above the level of the protocol we're discussing and therefore should not be counted/touched by the functions we're talking about here. The problem is the USBTMC header coming from the scope says the length of the data including that string above is 52 bytes when it really should be 62 (we're receiving 74 bytes from raw_read for this packet, and 64 for the rest). Again, I see this is a problem with the scope itself at this time, but perhaps would warrant a workaround on the pyvisa-py side.
  3. With some hacks in place to deal with the aforementioned things, it seems like the remaining big problem is the random empty packets. I think the total number of packets is correct, so none are being dropped, just some are empty. So far, I've traced this all the way back to libusb, whose libusb_bulk_transfer is returning zero (no error), while also returning zero length. Interestingly, I'm also seeing it return -7 (LIBUSB_ERROR_TIMEOUT) on a completely different set of random packets (98 out of 20165). Seems unrelated (not before/after/same number as zero-length packets) and isn't a major problem yet since they at least have the correct amount of data in them (64). Though if a timeout really is happening I'd imagine that's slowing things down massively.

I'll continue investigating with #470 applied and see if I can figure out what's going on, including logging the packet data. Maybe something can be learned from the progression of _btag which looks like some kind of sequence number?

@aWZHY0yQH81uOYvH
Copy link
Author

  1. Fixed by Support partial usbtmc reads and request the size amount of bytes from the device #470.
  2. Still present, but now with read_raw reading 20480 bytes at a time, the scope returns 20490 bytes but has a USBTMC header length field matching the request of 0x00050000 (LE) = 20480, so we're still losing 10 bytes to truncation.
  3. We're still losing packets. However, now I don't see zero-length packets, they're just gone. See the attached text file. All sections (starting with "sending request") contain 320x64 byte (wMaxPacketSize) reads plus a 12-byte one at the end to make up for the header, totaling the expected 20480 (except for the first which has the aforementioned extra 10 bytes). However, the last one ending with a timeout error only has 319x64 byte reads, indicating that one of the 64-byte reads was lost somewhere. It keeps trying to read since it hasn't gotten the 20480 bytes it wanted, and times out. Which read_raw call this happens on is random.
    packets.txt

@aWZHY0yQH81uOYvH
Copy link
Author

aWZHY0yQH81uOYvH commented Nov 27, 2024

One more log with LIBUSB_DEBUG=4 env var enabled and a shorter (128) chunk_size. I don't see anything out of the ordinary other than the USB packets not being there. Strange that before they existed but had zero length, and now just don't exist. This might be as far as I can go on macOS without disabling SIP to get kernel-level logging of USB packets. I suppose I could try on a native Linux machine next.

With a 128-byte read size, it's strangely consistent at failing on the second read_raw most of the time.

Also nothing strange in the system log other than the reporting of the USB timeout.

AppleUSBIORequest: AppleUSBIORequest::complete: device 21 (DS1000 SERIES @14300000) endpoint 0x82: status 0xe00002d6 (timeout): 0 bytes transferred

libusb debug log.txt

@aWZHY0yQH81uOYvH
Copy link
Author

Same behavior in Linux. Packets just aren't there in Wireshark either, so maybe yet another bug in this 10 year old scope. Perhaps only a certain packet size or timing works.

@Jimmyvandenbergh
Copy link
Contributor

  1. Fixed by Support partial usbtmc reads and request the size amount of bytes from the device #470.

good to hear this fixes the problem you are facing

  1. Still present, but now with read_raw reading 20480 bytes at a time, the scope returns 20490 bytes but has a USBTMC header length field matching the request of 0x00050000 (LE) = 20480, so we're still losing 10 bytes to truncation.

Here I think the bytes you get is actually correct. You are requesting 20480 bytes, which are also returned. see below:

  1. a request is made to the scop to send 20480 bytes.
  2. the scope returns with a header, saying it will return 20480 bytes.
  3. you receive 20480 bytes.

Now you have read 10 bytes less then the scope would have returned when requested enough bytes from the device.
These, missing 10 bytes, should then be read again with a read_raw() which, should send a new USBTMC header, and then should receive a new USBTMC header, which marks 10 bytes in transfer_size. This last part seems not always to be supported. But if you would request for example 20544 bytes from the device. this would send back all the data. Can you try and confirm?

  1. We're still losing packets. However, now I don't see zero-length packets, they're just gone. See the attached text file. All sections (starting with "sending request") contain 320x64 byte (wMaxPacketSize) reads plus a 12-byte one at the end to make up for the header, totaling the expected 20480 (except for the first which has the aforementioned extra 10 bytes). However, the last one ending with a timeout error only has 319x64 byte reads, indicating that one of the 64-byte reads was lost somewhere. It keeps trying to read since it hasn't gotten the 20480 bytes it wanted, and times out. Which read_raw call this happens on is random.
    packets.txt

I will have a look into packets.txt and your debug log when I have some additional time.

@Jimmyvandenbergh
Copy link
Contributor

@aWZHY0yQH81uOYvH

2. The header I'm describing is something of the format #801048576 in ASCII that's part of the data (saying that there is 1048576 bytes of stuff coming -- see here), not the USBTMC header. As far as I understand that's above the level of the protocol we're discussing and therefore should not be counted/touched by the functions we're talking about here. The problem is the USBTMC header coming from the scope says the length of the data including that string above is 52 bytes when it really should be 62 (we're receiving 74 bytes from raw_read for this packet, and 64 for the rest). Again, I see this is a problem with the scope itself at this time, but perhaps would warrant a workaround on the pyvisa-py side.

Got it this is a definite arbitrary block format read from the device. where # denotes the start of the block, 8 stands for the size of number of bytes to read that will actually be send. making you expect 01048576 bytes in data. this is excluding the USBTMC header and or any alignment bytes. Personally i don't use the code in py-visa as we have set this up different then py-visa does. as we first read #<number> then we read number amount of bytes which represent then we read the number of bytes transferred

see code snippet. this is different then py-visa does as they try to read_raw(20 * 1024)

def readArbitraryBlock(self)  -> bytes:
        """Read an Arbitrary block from the device with bytes as output from the instrument. 

        Returns:
            List[float]: bytes
        """
        DATA_BYTES_SIZE_SIZE = 2
        DATA_BYTES_SIZE_OFFSET = 1

        data_bytes_size_array = self.base_instrument.read_bytes(DATA_BYTES_SIZE_SIZE)
        if data_bytes_size_array[0] != ord('#'):
            raise Exception("Invalid arbitrary block Read data does not start with #")
        
        data_bytes_size = int(data_bytes_size_array[DATA_BYTES_SIZE_OFFSET]) - ord('0')
        data_bytes_array = self.base_instrument.read_bytes(data_bytes_size)
        data_bytes = int(data_bytes_array.decode('utf-8'))
        # we adjust the chuck size to required buffer!
        if data_bytes < 20 * 1024: # 20KB is default chunck size of PyVISA
            self.base_instrument.chunk_size = 20 * 1024
        elif self.base_instrument.chunk_size < data_bytes:
            self.base_instrument.chunk_size = data_bytes
        data = self.base_instrument.read_bytes(data_bytes + 1)
        data = data[:-1]
        return data ****

@MatthieuDartiailh
Copy link
Member

For small block that do not contain a termchar it means you do more query than you could could. I would accept a PR to change the chunk_size for subsequent reading in pyvisa. And I would be interested ion whether or not you see a difference in term of performance.

@Jimmyvandenbergh
Copy link
Contributor

@MatthieuDartiailh I do not follow your comment read more data than you could. As this problem seems to be related to arbitrary block read (IEEE 488.2 specification), which is specified as shown in this comment. #<number of bytes><number of bytes to transferr><data><\n><END>

chunk_size is changed to wMaxPaccketSize but have not tested performance in #470`.

@aWZHY0yQH81uOYvH
Copy link
Author

Ok, I actually have some interesting findings to report. Setting chunk_size to wMaxPacketSize (64) or 52 does not work, as the oscilloscope does not respond to some of the bulk IN requests for 64 bytes for whatever reason, data is lost, and the read times out.

Setting chunk_size to 1048586 (1048576 + 10 byte IEEE data header) does work, and all of the data is transferred successfully (bar issue 2). Of course the problem with this is that you need to start transferring data to know the data length to request in the first place.

So, I propose something like this with two transfers that should be the most efficient for reading data of this specific format.

scope.write(':ACQ:MEMD LONG')
scope.write(':RUN')
time.sleep(1)
scope.write(query)

# First read enough to see the IEEE data header (plus some to fill wMaxPacketSize)
read_len = 52

data = scope.read_raw(read_len)

# Read expected length from IEEE header we just got
# This obviously assumes the header is a certain format so is not robust
data_len = int(data[2:10])
print(f'header data length = {data_len}') # 1048576
read_len = data_len - (len(data) - 10) # IEEE header takes up 10 bytes of data already read -> need to get 1048534 more bytes

# Read the rest of the data all in one go
data += scope.read_raw(read_len)

print(f'actual data length = {len(data)}') # 1048576 incorrect -- see below, but at least no more timeout issues and lost packets

Now I still have the problem that the scope does not send the correct length in the first USBTMC header and the total data length is 1048576 instead of 1048586. If I remove the truncation based on the USBTMC header length, this is the first time I have been able to transfer all of the long memory data successfully.

To further explain, this is what's happening with the truncation over the two transfers:

  1. Ask for 52 bytes in the first transfer. Scope responds with a USBTMC header saying it will have 52 bytes, but it responds with 62 bytes (64 byte wMaxPacketSize read + 10 byte second read - 12 byte USBTMC header = 62 bytes). The code then throws out the 10 extra bytes the scope sends over the expected 52 length field in the header.
  2. I ask for the remaining 1048576 - (52 - 10) = 1048534 bytes in a second transfer, but the scope responds with a header saying 1048524 byes are coming, and does only return that many, ending the transfer early. Clearly the 10 bytes that were thrown out by the truncation of the first transfer were the one chance we had to actually see them. Again, this is a problem with the scope.

Takeaway here is that splitting up the transfer into many smaller ones (chunk_size) seems to be bad, and it would be nice to be able to disable the truncation/strict enforcement of the USBTMC length field for weird devices like mine.

The former would be something to be implemented in pyvisa, and the latter something to be implemented in pyvisa-py.

@Jimmyvandenbergh
Copy link
Contributor

  1. If i understand correctly : Asking for wMaxPacketSize in the while loop is not something the device supports.
    If above true:
    I will check if we can exploit just a read that takes full multitudes of the transfer_size when more data is to be read in the while loop (read_bytes = ((transfer_size % wMaxPacketSize) + 1) * wMaxPacketSize. This would require all data to be fetched from USB in one read, if missing data we will then update read_bytes - len(resp) and continue reading until transfer_size is reached. (still starting with one time resp = raw_read(self.usb_recv_ep.wMaxPacketSize)). This still enables us to detect short packages if i am correct but needs extra logic.

  2. The data format of scope:
    The device send the USBTMC header (the 12 byte header that does not belong to the data) that it will send 1048576 (0x100000) bytes. Which is actually correct, as I don't think Rigol DS1102E does send a IEEE 488.2 compliant arbitrary block data format (as you mentioned). But is just fixed to 1048576. As looking through their manual and example code. See picture below. But then also NORMAL memory deviates from the expected 20480 as you mentioned.
    image
    image

please let me know if there is other documentation that specifies the return data of the Rigol. (as rigol downloads seem not to be working)

@aWZHY0yQH81uOYvH
Copy link
Author

  1. I think I agree with what you're saying here. We want these lines running only twice, first for a small amount, then again for the rest of the data.
    req = BulkInMessage.build_array(self._btag, recv_chunk, None)
    raw_write(req)
    Not sure about your first calculation for read_bytes there since it should count the data already received in the first read, and that modulo should probably be an integer division, but then you get an off-by-one error if transfer_size is an exact multiple of wMaxPacketSize if your intention is to round up to the next multiple of wMaxPacketSize.
  2. That programming manual is the only reference I've been using from Rigol. I'm very clearly seeing the IEEE header at the start of the data since it's ASCII, so I think their documentation is just omitting that. The data lengths in that table are what's reported in the IEEE header, and it returns that many raw 8-bit ADC samples after the header. The 20480 length was a result of the default chunk_len that I was using before I started supplying a length argument to read_raw. I was doing many read_raws at that point in my testing and concatenating their return. Sorry for the confusion about that, but the scope has always been trying to return the amount of data shown in that table (raw/half-channel entries).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants