Modbus Exception 3 #599

@HenriqueAurelio

Intermittent Modbus Exception 3 in Production - Critical Coil Operations Lost

Description

I'm experiencing intermittent Modbus Exception 3 (Illegal data value) errors when reading from specific PLCs in production. The issue is critical: every missed coil read costs my application its core functionality (gate control actions are not executed).

The same application code works reliably with most clients (15+ PLCs), but with some specific clients, Exception 3 errors occur frequently, disrupting operations.

Environment

  • Library Version: modbus-serial (version number not shown)
  • Node.js Version: v18
  • Protocol: Modbus TCP
  • Runtime: Node.js in Docker containers
  • Network: Production environments with varying network quality
  • Scale: 20+ PLCs running the same codebase

The Problem

Error Message

Error: Modbus exception 3: Illegal data value (value cannot be written to this register)
    at ModbusRTU._onReceive (/app/node_modules/modbus-serial/index.js:474:21)
    at TcpPort.emit (node:events:517:28)
    at TcpPort.emit (node:domain:489:12)
    at Socket.<anonymous> (/app/node_modules/modbus-serial/ports/tcpport.js:123:22)
    ...

Frequency & Pattern

  • Occurs frequently on ~5 specific client PLCs (approximately 25% of deployments)
  • Rarely/never occurs on 15+ other client PLCs using identical code
  • Most common when reading coils, but also happens with discrete inputs and holding registers
  • Intermittent - not every request fails, making it hard to reproduce consistently
  • ⚠️ More frequent during peak hours or suspected network congestion periods

Critical Impact

I cannot afford to lose coil operations. When a coil read fails with Exception 3, my application misses gate control actions, and gate control is the core functionality of the system. This results in:

  • Gates not opening when vehicles arrive
  • Operational delays and safety concerns
  • Lost revenue and customer complaints

Code Sample

const ModbusRTU = require("modbus-serial");
const client = new ModbusRTU();

// Connection setup
client.setTimeout(5000); // Increased from default 1000ms
await client.connectTCP(plcHost, { port: 502 });
client.setID(1);

// Polling loop (runs every 1 second)
setInterval(async () => {
  try {
    // Read discrete inputs
    const inputs = await client.readDiscreteInputs(0, 7);
    await new Promise(resolve => setTimeout(resolve, 100)); // Delay between reads

    // Read coils - THIS IS WHERE EXCEPTION 3 OCCURS MOST
    const coils = await client.readCoils(0, 7);
    await new Promise(resolve => setTimeout(resolve, 100));

    // Read holding registers
    const registers = await client.readHoldingRegisters(0, 7);
  } catch (error) {
    console.error('Modbus error:', error.message, error.modbusCode);
    // Exception 3 here means lost gate control action!
  }
}, 1000);

Investigation & Testing

Network Quality Simulation

I used Clumsy (Windows network emulator) to simulate poor network conditions on my local development environment:

Condition                | Setting   | Result
Latency (Lag)            | 200-500ms | ✅ No errors - works fine
Packet loss (Drop)       | 5-10%     | ✅ No errors - works fine
Data corruption (Tamper) | Enabled   | ❌ Reproduced similar errors, but not exactly the issue seen in production

Key Finding: With Tamper enabled (random byte corruption), I got similar errors:

Data length error, expected 7 got 71
Modbus exception 3: Illegal data value

Analysis: The byte 0x07 was corrupted to 0x47 (71 decimal, a single flipped bit), so the length was interpreted incorrectly. This strongly suggests that network packet corruption is the root cause in production environments.
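The arithmetic behind that analysis can be checked with a short sketch. The helper name `differingBits` is illustrative only; the two byte values are the ones quoted above.

```javascript
// Count how many bit positions differ between two byte values.
function differingBits(a, b) {
  let x = (a ^ b) & 0xff;
  let count = 0;
  while (x !== 0) {
    count += x & 1;
    x >>>= 1;
  }
  return count;
}

const original = 0x07;  // byte before corruption, per the analysis above
const corrupted = 0x47; // byte after corruption, per the analysis above

console.log(corrupted);                           // 71, matching "got 71"
console.log((original ^ corrupted).toString(2));  // 1000000 (only bit 6 differs)
console.log(differingBits(original, corrupted));  // 1
```

A one-bit difference is what a single tampered bit would produce, which fits the Tamper test. On its own it does not show that the same thing happens in production.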

What I've Tried

1. Increased Timeout ⚠️ Minimal Improvement

client.setTimeout(5000); // From default 1000ms

Result: Reduced frequency slightly, but Exception 3 errors still occur regularly

2. Added Delays Between Reads ⚠️ Minimal Improvement

await new Promise(resolve => setTimeout(resolve, 100));

Result: No significant improvement - errors persist

3. Connection Management ❌ No Change

  • Tried reconnecting on errors
  • Tried closing/reopening connection periodically
  • Tried reducing polling frequency

Result: None of these strategies eliminated the issue

4. Error Handling (Current Workaround) ⚠️ Not Acceptable

try {
  coils = await client.readCoils(0, 7);
} catch (error) {
  if (error.modbusCode === 3) {
    // Keep previous values - BUT THIS MEANS MISSING COIL ACTIONS!
    logger.warn('Exception 3 - maintaining previous state');
  }
}

Problem: This prevents crashes, but I lose critical coil operations, which is unacceptable for a gate control system.

Detailed Error Logs from Production

=== DETAILED ERROR DURING COIL READ ===
Requested address: 0
Requested quantity: 7
Message: Modbus exception 3: Illegal data value (value cannot be written to this register)
Modbus Code: 3
errno: undefined
Connection state isOpen: true
=========================================

Observation: The connection remains open (isOpen: true), suggesting this isn't a connection failure but rather a protocol-level issue.
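For context, in the Modbus protocol an exception reply is a short frame whose function code has the high bit (0x80) set, followed by a one-byte exception code. The sketch below decodes such a frame from raw Modbus TCP bytes. The function name `decodeResponseHeader` and the sample bytes are hypothetical, not a capture from the affected sites and not part of the library.

```javascript
// Decode the MBAP header and first PDU bytes of a raw Modbus TCP response.
function decodeResponseHeader(frame) {
  if (frame.length < 9) {
    throw new RangeError("frame too short for MBAP header plus 2 PDU bytes");
  }
  const functionByte = frame.readUInt8(7);
  const isException = (functionByte & 0x80) !== 0;
  return {
    transactionId: frame.readUInt16BE(0),
    protocolId: frame.readUInt16BE(2), // 0 for Modbus
    length: frame.readUInt16BE(4),     // number of bytes after the length field
    unitId: frame.readUInt8(6),
    functionCode: functionByte & 0x7f, // original function code without the error bit
    isException,
    exceptionCode: isException ? frame.readUInt8(8) : null,
  };
}

// Hypothetical bytes: exception 3 in reply to a Read Coils (0x01) request.
const sample = Buffer.from([0x00, 0x2a, 0x00, 0x00, 0x00, 0x03, 0x01, 0x81, 0x03]);
console.log(decodeResponseHeader(sample));
```

Running captured production bytes through a decoder like this would show whether the frames arriving at the application are exception replies (function byte 0x81 for a coil read) or something else.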

Questions & Feature Requests

1. Does the library validate TCP frame integrity?

  • Is there CRC or checksum validation for Modbus TCP frames?
  • If corruption is detected, does the library automatically request a retransmission?
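As background for this question: the Modbus TCP frame format itself carries no CRC field (the CRC-16 trailer belongs to Modbus RTU), so integrity checking is left to the TCP/IP and Ethernet layers. The sketch below builds the request from the code sample by hand to show the layout. `buildReadCoilsRequest` is a hypothetical helper and does not reflect how the library builds frames internally.

```javascript
// Build a Modbus TCP "Read Coils" request: 7-byte MBAP header + 5-byte PDU, no checksum.
function buildReadCoilsRequest(transactionId, unitId, address, quantity) {
  const frame = Buffer.alloc(12);
  frame.writeUInt16BE(transactionId, 0); // transaction identifier
  frame.writeUInt16BE(0, 2);             // protocol identifier (0 = Modbus)
  frame.writeUInt16BE(6, 4);             // bytes that follow: unit id + 5 PDU bytes
  frame.writeUInt8(unitId, 6);           // unit identifier
  frame.writeUInt8(0x01, 7);             // function code 0x01: Read Coils
  frame.writeUInt16BE(address, 8);       // starting address
  frame.writeUInt16BE(quantity, 10);     // number of coils
  return frame;
}

// Equivalent of readCoils(0, 7) for unit 1, with transaction id 1.
console.log(buildReadCoilsRequest(1, 1, 0, 7).toString("hex"));
// 000100000006010100000007
```

Every byte in the frame is protocol data. If a byte changes in a way the lower layers do not catch, nothing in the frame itself flags it.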

2. Could there be a race condition or buffer issue?

  • Multiple reads in quick succession (every 1 second)
  • Could previous response data remain in the buffer and interfere with current requests?
  • Should I be explicitly flushing buffers between reads?
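One thing that can be ruled out on the application side: `setInterval` with an async callback does not wait for the previous cycle, so with a 5000 ms timeout a slow cycle can still be awaiting a reply when the next tick fires. Whether that contributes to the errors here is not established. The sketch below is one way to guarantee that cycles never overlap. `pollSequentially` and `sleep` are hypothetical helper names.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run pollOnce repeatedly; a new cycle starts only after the previous one has settled.
async function pollSequentially(pollOnce, intervalMs, shouldContinue) {
  while (shouldContinue()) {
    const startedAt = Date.now();
    try {
      await pollOnce();
    } catch (error) {
      console.error("Poll cycle failed:", error.message);
    }
    // Wait out the rest of the interval, or continue at once if the cycle overran it.
    const elapsed = Date.now() - startedAt;
    await sleep(Math.max(0, intervalMs - elapsed));
  }
}

// Usage with the client from the code sample (not executed here):
// let running = true;
// pollSequentially(async () => {
//   const inputs = await client.readDiscreteInputs(0, 7);
//   const coils = await client.readCoils(0, 7);
//   const registers = await client.readHoldingRegisters(0, 7);
// }, 1000, () => running);
```

If the errors persist with strictly sequential cycles, overlapping requests can be excluded as a cause.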

3. Network-specific configurations?

Are there any undocumented settings or best practices for:

  • Unreliable networks with potential data corruption
  • High-latency connections
  • Industrial environments with electrical interference

4. Frame-level debug logging?

Is there a way to enable detailed logging to capture:

  • Raw bytes sent (request frame)
  • Raw bytes received (response frame)
  • Parsing/validation steps
  • Exact point where Exception 3 is determined

This would help identify if corruption happens in transit or if the PLC is actually rejecting valid requests.
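Until such logging exists in the library, one library-independent way to capture raw bytes is a small logging relay built on Node's `net` module: the application connects to the relay instead of the PLC. This is a minimal sketch, and `formatChunk` and `startLoggingProxy` are hypothetical names. Note that TCP chunks are not guaranteed to line up with Modbus frame boundaries.

```javascript
const net = require("node:net");

// Format one chunk as "<ISO time> <direction> <space-separated hex bytes>".
function formatChunk(direction, chunk, now = new Date()) {
  const hex = chunk.toString("hex").replace(/(..)(?=.)/g, "$1 ");
  return `${now.toISOString()} ${direction} ${hex}`;
}

// Relay listenPort -> plcHost:plcPort and log every chunk in both directions.
function startLoggingProxy(listenPort, plcHost, plcPort = 502, log = console.log) {
  const server = net.createServer((appSocket) => {
    const plcSocket = net.connect({ host: plcHost, port: plcPort });
    appSocket.on("data", (chunk) => {
      log(formatChunk("app->plc", chunk));
      plcSocket.write(chunk);
    });
    plcSocket.on("data", (chunk) => {
      log(formatChunk("plc->app", chunk));
      appSocket.write(chunk);
    });
    // Tear down both sides when either one errors or closes.
    const closeBoth = () => {
      appSocket.destroy();
      plcSocket.destroy();
    };
    appSocket.on("error", closeBoth);
    plcSocket.on("error", closeBoth);
    appSocket.on("close", closeBoth);
    plcSocket.on("close", closeBoth);
  });
  server.listen(listenPort);
  return server;
}

// Usage (not executed here): startLoggingProxy(1502, plcHost);
// then connect the client to localhost:1502 instead of plcHost:502.
```

The log shows the bytes exactly as the application host sent and received them, which is the information needed to compare a failing request/response pair against a successful one.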

5. Feature Request: Automatic Retry Logic ⭐

For TCP connections (where transient errors are expected), would it be possible to add:

Option A - Built-in Retry Configuration:

client.setRetryConfig({
  maxRetries: 3,
  retryDelay: 50, // ms
  retryOnExceptions: [3], // Retry on exception 3
  exponentialBackoff: true
});

Option B - Per-Operation Retry:

const coils = await client.readCoils(0, 7, {
  retry: { attempts: 3, delay: 50 }
});

Why this is critical:

  • Exception 3 due to network corruption is transient - a retry would likely succeed
  • This would prevent lost coil operations in mission-critical applications
  • Other Modbus libraries have similar retry mechanisms (for example, pymodbus clients accept a retries option)
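In the meantime, the behaviour requested above can be approximated in application code. The following is a minimal sketch of a user-land wrapper, not a library feature. `withRetry`, `delay`, and the option names are hypothetical, and it relies only on the `modbusCode` property already used in the error handler above.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call operation() up to `attempts` times, retrying only on the listed Modbus exception codes.
async function withRetry(operation, { attempts = 3, delayMs = 50, retryOnCodes = [3] } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const retryable = retryOnCodes.includes(error.modbusCode);
      if (!retryable || attempt === attempts) break;
      // Exponential backoff: delayMs, 2 * delayMs, 4 * delayMs, ...
      await delay(delayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}

// Usage with the client from the code sample (not executed here):
// const coils = await withRetry(() => client.readCoils(0, 7));
```

Retrying is reasonable for reads because repeating them has no side effects. Wrapping write operations the same way would need more care, since a write may have been applied even though its reply was lost or damaged.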

6. Better Error Differentiation

Can the library distinguish between:

  • "PLC rejected the request" (legitimate Exception 3 - should not retry)
  • "Network corruption detected" (should retry)

For example, if a Data length error precedes Exception 3, that's clearly network corruption and should trigger an automatic retry.
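A rough version of this distinction can already be made in application code from the two error shapes shown in this report. The sketch below is an assumption-laden heuristic: `classifyModbusError` and its category labels are hypothetical, and it matches on the message text, which could change between library versions.

```javascript
// Sort an error into a coarse category using only the fields seen in this report.
function classifyModbusError(error) {
  if (typeof error.modbusCode === "number") {
    // The reply was parsed as a Modbus exception frame.
    return "exception-reply";
  }
  if (typeof error.message === "string" && error.message.startsWith("Data length error")) {
    // The reply did not have the expected length.
    return "frame-error";
  }
  return "other";
}

const exceptionError = Object.assign(
  new Error("Modbus exception 3: Illegal data value"),
  { modbusCode: 3 }
);
console.log(classifyModbusError(exceptionError));                                      // exception-reply
console.log(classifyModbusError(new Error("Data length error, expected 7 got 71")));   // frame-error
console.log(classifyModbusError(new Error("socket hang up")));                         // other
```

The "exception-reply" category does not prove that the PLC rejected the request, because a reply damaged in transit could still parse as an exception frame. It only separates replies that parsed cleanly from those that did not.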

Additional Context

  • Same codebase deployed across 20+ client sites
  • Works perfectly at 75% of sites - no issues for months
  • 25% of sites experience frequent Exception 3 errors
  • Issues correlate with older network infrastructure and industrial environments
  • Running in production where I cannot control network quality
  • Gate control is mission-critical - missed coil operations = operational failure
  • Budget constraints prevent network infrastructure upgrades at problematic sites

Request for Guidance

I would greatly appreciate:

  1. Confirmation that this is likely network corruption based on the Clumsy testing
  2. Any configuration options I might be missing
  3. Consideration of adding retry logic to the library
  4. Frame-level debugging capabilities to capture exact corruption patterns

Thank you for maintaining this excellent library! 🙏


Related Information:

  • Connection type: Modbus TCP (persistent connection, not reconnecting per read)
  • Docker container network mode: bridge
  • No firewalls or proxies between application and PLCs
  • PLCs are from various manufacturers (Siemens, Allen-Bradley, Schneider)
