Intermittent Modbus Exception 3 in Production - Critical Coil Operations Lost
Description
I'm experiencing intermittent Modbus Exception 3 (Illegal data value) errors when reading from specific PLCs in production environments. This issue is critical because a failed coil read causes my application to lose core functionality (gate control actions are not executed).
The same application code works reliably with most clients (15+ PLCs), but with some specific clients, Exception 3 errors occur frequently, disrupting operations.
Environment
- Library Version: [email protected]
- Node.js Version: v18
- Protocol: Modbus TCP
- Runtime: Node.js in Docker containers
- Network: Production environments with varying network quality
- Scale: 20+ PLCs running the same codebase
The Problem
Error Message
Error: Modbus exception 3: Illegal data value (value cannot be written to this register)
at ModbusRTU._onReceive (/app/node_modules/modbus-serial/index.js:474:21)
at TcpPort.emit (node:events:517:28)
at TcpPort.emit (node:domain:489:12)
at Socket.<anonymous> (/app/node_modules/modbus-serial/ports/tcpport.js:123:22)
...
Frequency & Pattern
- ❌ Occurs frequently on ~5 specific client PLCs (approximately 25% of deployments)
- ✅ Rarely/never occurs on 15+ other client PLCs using identical code
- ❌ Most common when reading coils, but also happens with discrete inputs and holding registers
- ❌ Intermittent - not every request fails, making it hard to reproduce consistently
- ⚠️ More frequent during peak hours or suspected network congestion periods
Critical Impact
I cannot afford to lose coil operations - when a coil read fails with Exception 3, my application misses gate control actions, which is the core functionality of the system. This results in:
- Gates not opening when vehicles arrive
- Operational delays and safety concerns
- Lost revenue and customer complaints
Code Sample
const ModbusRTU = require("modbus-serial");
const client = new ModbusRTU();
// Connection setup
client.setTimeout(5000); // Increased from default 1000ms
await client.connectTCP(plcHost, { port: 502 });
client.setID(1);
// Polling loop (runs every 1 second)
setInterval(async () => {
  try {
    // Read discrete inputs
    const inputs = await client.readDiscreteInputs(0, 7);
    await new Promise(resolve => setTimeout(resolve, 100)); // Delay between reads
    // Read coils - THIS IS WHERE EXCEPTION 3 OCCURS MOST
    const coils = await client.readCoils(0, 7);
    await new Promise(resolve => setTimeout(resolve, 100));
    // Read holding registers
    const registers = await client.readHoldingRegisters(0, 7);
  } catch (error) {
    console.error('Modbus error:', error.message, error.modbusCode);
    // Exception 3 here means lost gate control action!
  }
}, 1000);
Investigation & Testing
Network Quality Simulation
I used Clumsy (Windows network emulator) to simulate poor network conditions on my local development environment:
| Condition | Setting | Result |
|---|---|---|
| Latency (Lag) | 200-500ms | ✅ No errors - works fine |
| Packet loss (Drop) | 5-10% | ✅ No errors - works fine |
| Data corruption (Tamper) | Enabled | ❌ Reproduced similar errors, though not exactly the same issue that occurs in production |
Key Finding: With Tamper enabled (random byte corruption), I got similar errors:
Data length error, expected 7 got 71
Modbus exception 3: Illegal data value
Analysis: The byte 0x07 was corrupted to 0x47 (decimal 71), so the length field was misread as 71 instead of 7. This strongly suggests network packet corruption is the root cause in production environments.
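To make the arithmetic behind that analysis explicit, here is a small sketch that compares the two byte values. The helper name `describeByteDiff` is made up for illustration; it only shows that 0x07 and 0x47 differ in a single bit, which is the kind of change a random tamper would produce.

```javascript
// Hypothetical helper: reports how two byte values differ at the bit level.
function describeByteDiff(expected, observed) {
  const mask = expected ^ observed; // 1-bits mark the positions that changed
  return {
    observedDecimal: observed,
    mask: mask.toString(2).padStart(8, "0"),
    flippedBits: mask.toString(2).split("1").length - 1,
  };
}

console.log(describeByteDiff(0x07, 0x47));
// { observedDecimal: 71, mask: '01000000', flippedBits: 1 }
```

A single flipped bit (bit 6) turns the expected 7 into the reported 71, which matches the "expected 7 got 71" message from the Tamper test.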
What I've Tried
1. Increased Timeout ⚠️ Minimal Improvement
client.setTimeout(5000); // From default 1000ms
Result: Reduced frequency slightly, but Exception 3 errors still occur regularly
2. Added Delays Between Reads ⚠️ Minimal Improvement
await new Promise(resolve => setTimeout(resolve, 100));
Result: No significant improvement - errors persist
3. Connection Management ❌ No Change
- Tried reconnecting on errors
- Tried closing/reopening connection periodically
- Tried reducing polling frequency
Result: None of these strategies eliminated the issue
4. Error Handling (Current Workaround) ⚠️ Not Acceptable
catch (error) {
  if (error.modbusCode === 3) {
    // Keep previous values - BUT THIS MEANS MISSING COIL ACTIONS!
    logger.warn('Exception 3 - maintaining previous state');
  }
}
Problem: This prevents crashes but I lose critical coil operations, which is unacceptable for gate control systems
Detailed Error Logs from Production
=== DETAILED ERROR DURING COIL READ ===
Requested address: 0
Requested quantity: 7
Message: Modbus exception 3: Illegal data value (value cannot be written to this register)
Modbus Code: 3
errno: undefined
Connection state isOpen: true
=========================================
Observation: The connection remains open (isOpen: true), suggesting this isn't a connection failure but rather a protocol-level issue.
Questions & Feature Requests
1. Does the library validate TCP frame integrity?
- Is there CRC or checksum validation for Modbus TCP frames?
- If corruption is detected, does the library automatically request a retransmission?
2. Could there be a race condition or buffer issue?
- Multiple reads in quick succession (every 1 second)
- Could previous response data remain in the buffer and interfere with current requests?
- Should I be explicitly flushing buffers between reads?
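One thing worth ruling out on the application side: `setInterval` does not wait for an async callback to finish, so with a 5000 ms timeout and a 1000 ms interval a slow cycle can still have requests in flight when the next cycle starts. The following is a minimal sketch of a loop that awaits each cycle before scheduling the next. `pollSerially`, `pollOnce` and `shouldStop` are hypothetical names, and whether overlapping cycles actually contribute to the errors is an assumption that still needs testing.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `pollOnce` repeatedly and never starts a cycle before the previous one settles.
async function pollSerially(pollOnce, intervalMs, shouldStop) {
  while (!shouldStop()) {
    const started = Date.now();
    try {
      await pollOnce();
    } catch (error) {
      console.error("Modbus error:", error.message, error.modbusCode);
    }
    // Sleep only for whatever is left of the interval, if anything.
    const remaining = intervalMs - (Date.now() - started);
    if (remaining > 0) await sleep(remaining);
  }
}

// Usage, assuming `client` is the connected client from the code sample:
// pollSerially(async () => {
//   await client.readDiscreteInputs(0, 7);
//   await client.readCoils(0, 7);
//   await client.readHoldingRegisters(0, 7);
// }, 1000, () => false);
```

Compared with the `setInterval` version, a cycle that runs long simply delays the next one instead of running alongside it.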
3. Network-specific configurations?
Are there any undocumented settings or best practices for:
- Unreliable networks with potential data corruption
- High-latency connections
- Industrial environments with electrical interference
4. Frame-level debug logging?
Is there a way to enable detailed logging to capture:
- Raw bytes sent (request frame)
- Raw bytes received (response frame)
- Parsing/validation steps
- Exact point where Exception 3 is determined
This would help identify if corruption happens in transit or if the PLC is actually rejecting valid requests.
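To illustrate the level of detail I am after, here is a sketch of a decoder for a raw Modbus TCP response frame. The field layout follows the standard MBAP header (transaction ID, protocol ID, length, unit ID) followed by the function code. The function name `decodeModbusTcpFrame` is hypothetical, and how the raw bytes would be obtained (library hook, proxy, or packet capture) is left open here.

```javascript
// Hypothetical helper: decodes the MBAP header and function code of one Modbus TCP frame.
function decodeModbusTcpFrame(buf) {
  if (buf.length < 8) throw new Error(`Frame too short: ${buf.length} bytes`);
  const rawFunctionCode = buf.readUInt8(7);
  const isException = (rawFunctionCode & 0x80) !== 0; // high bit marks an exception response
  const length = buf.readUInt16BE(4);
  return {
    hex: buf.toString("hex"),
    transactionId: buf.readUInt16BE(0),
    protocolId: buf.readUInt16BE(2), // 0 for Modbus
    length,                          // counts the unit ID plus the PDU bytes
    unitId: buf.readUInt8(6),
    functionCode: rawFunctionCode & 0x7f,
    isException,
    exceptionCode: isException && buf.length > 8 ? buf.readUInt8(8) : null,
    lengthMatches: length === buf.length - 6,
  };
}

// Example: an exception 3 response to a Read Coils (function 1) request.
const frame = Buffer.from([0x00, 0x01, 0x00, 0x00, 0x00, 0x03, 0x01, 0x81, 0x03]);
console.log(decodeModbusTcpFrame(frame));
```

Logging the decoded fields next to the request that was sent would show whether the exception byte really came from the PLC as a well-formed frame or whether the header itself looks damaged (for example, `lengthMatches` being false or a transaction ID that does not match the request).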
5. Feature Request: Automatic Retry Logic ⭐
For TCP connections (where transient errors are expected), would it be possible to add:
Option A - Built-in Retry Configuration:
client.setRetryConfig({
  maxRetries: 3,
  retryDelay: 50, // ms
  retryOnExceptions: [3], // Retry on exception 3
  exponentialBackoff: true
});
Option B - Per-Operation Retry:
const coils = await client.readCoils(0, 7, {
  retry: { attempts: 3, delay: 50 }
});
Why this is critical:
- Exception 3 due to network corruption is transient - a retry would likely succeed
- This would prevent lost coil operations in mission-critical applications
- Other Modbus libraries have similar retry mechanisms
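In the meantime, something like the following could be done in application code. This is only a sketch: `withRetry` and its options (`attempts`, `delayMs`, `retryOn`) are hypothetical names, not part of modbus-serial, and it assumes the wrapped operation is a read that is safe to repeat.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: retries `operation` while `retryOn(error)` returns true.
async function withRetry(operation, { attempts = 3, delayMs = 50, retryOn = () => true } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Give up on the last attempt or when the error is not considered transient.
      if (attempt === attempts || !retryOn(error)) break;
      await delay(delayMs);
    }
  }
  throw lastError;
}

// Usage, assuming `client` is the connected client from the code sample:
// const coils = await withRetry(() => client.readCoils(0, 7), {
//   retryOn: (error) => error.modbusCode === 3,
// });
```

Retrying a read is harmless because reads are idempotent; the same wrapper around a write would need more thought, since a write that succeeded on the PLC but whose response was lost would be sent twice.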
6. Better Error Differentiation
Can the library distinguish between:
- "PLC rejected the request" (legitimate Exception 3 - should not retry)
- "Network corruption detected" (should retry)
For example, if a Data length error precedes Exception 3, that's clearly network corruption and should trigger an automatic retry.
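A rough application-side approximation of this distinction, based only on the error shapes shown in this issue (the `modbusCode` property and the "Data length error" message text), might look like the sketch below. `classifyModbusError` and its category labels are hypothetical, and matching on message text is fragile, which is part of why library-level support would be preferable.

```javascript
// Hypothetical classifier built from the error shapes seen in the logs above.
function classifyModbusError(error) {
  const message = String((error && error.message) || "");
  // The library reported a malformed frame: most likely damaged in transit.
  if (/data length error/i.test(message)) return "corrupt-frame";
  // A well-formed exception response: cannot be told apart from a genuine PLC rejection here.
  if (error && typeof error.modbusCode === "number") return "device-exception";
  return "other";
}

console.log(classifyModbusError(new Error("Data length error, expected 7 got 71")));
// corrupt-frame
```

Note the limitation: an Exception 3 that arrives as a well-formed frame lands in "device-exception" regardless of whether the PLC genuinely rejected the request, so this alone cannot decide whether a retry is appropriate.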
Additional Context
- Same codebase deployed across 20+ client sites
- Works perfectly at 75% of sites - no issues for months
- 25% of sites experience frequent Exception 3 errors
- Issues correlate with older network infrastructure and industrial environments
- Running in production where I cannot control network quality
- Gate control is mission-critical - missed coil operations = operational failure
- Budget constraints prevent network infrastructure upgrades at problematic sites
Request for Guidance
Would greatly appreciate:
- Confirmation that this is likely network corruption based on the Clumsy testing
- Any configuration options I might be missing
- Consideration of adding retry logic to the library
- Frame-level debugging capabilities to capture exact corruption patterns
Thank you for maintaining this excellent library! 🙏
Related Information:
- Connection type: Modbus TCP (persistent connection, not reconnecting per read)
- Docker container network mode: bridge
- No firewalls or proxies between application and PLCs
- PLCs are from various manufacturers (Siemens, Allen-Bradley, Schneider)