Panic in retryDelay due to integer overflow on x86 architecture

**Bug Report**

### Description

When using the `openai-go` client with a high retry limit, `retryDelay` can panic because an out-of-range float-to-integer conversion produces a negative delay. We have only seen this on x86-64 (`linux/amd64`); it does not occur on ARM64 (e.g., Apple M-series chips).

The panic occurs with the message: `panic: invalid argument to Int63n`

Here is the stack trace from our server logs:

```
panic: invalid argument to Int63n

goroutine 74428 [running]:
math/rand.(*Rand).Int63n(0x6a80?, 0x3ff0000000000000?)
/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.1.linux-amd64/src/math/rand/rand.go:122 +0xcb
math/rand.Int63n(0xe000000000000000)
/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.1.linux-amd64/src/math/rand/rand.go:454 +0x25
github.com/openai/openai-go/internal/requestconfig.retryDelay(0x1cca420?, 0x23)
/root/go/pkg/mod/github.com/openai/openai-go@v1.12.0/internal/requestconfig/requestconfig.go:373 +0x91
github.com/openai/openai-go/internal/requestconfig.(*RequestConfig).Execute(0xc004e2d040)
/root/go/pkg/mod/github.com/openai/openai-go@v1.12.0/internal/requestconfig/requestconfig.go:466 +0x5a5
github.com/openai/openai-go/internal/requestconfig.ExecuteNewRequest({0x21855b0?, 0xc0046b2fc0?}, {0x1e7df52?, 0x521518?}, {0x1e98cb3?, 0x5?}, {0x1e55720?, 0xc004964c08?}, {0x19c6780, 0xc00020a0a8}, ...)
/root/go/pkg/mod/github.com/openai/openai-go@v1.12.0/internal/requestconfig/requestconfig.go:562 +0x9b
github.com/openai/openai-go.(*ChatCompletionService).New(_, {_, _}, {{0xc006281400, 0x2, 0x2}, {0xc000813590, 0xe}, {0x0, 0x0, ...}, ...}, ...)
/root/go/pkg/mod/github.com/openai/openai-go@v1.12.0/chatcompletion.go:66 +0x16a
...
```

### Root Cause Analysis

The panic originates in the `retryDelay` function in `internal/requestconfig/requestconfig.go`:

```go
func retryDelay(res *http.Response, retryCount int) time.Duration {
	// ...
	maxDelay := 8 * time.Second
	delay := time.Duration(0.5 * float64(time.Second) * math.Pow(2, float64(retryCount)))
	if delay > maxDelay {
		delay = maxDelay
	}

	jitter := rand.Int63n(int64(delay / 4)) // Panics here
	delay -= time.Duration(jitter)
	return delay
}
```

The `math/rand.Int63n` function panics if its argument is less than or equal to 0. The issue arises from this line:
`delay := time.Duration(0.5 * float64(time.Second) * math.Pow(2, float64(retryCount)))`

When `retryCount` is large (35 or higher; the stack trace above shows `retryCount` = `0x23` = 35), `0.5 * float64(time.Second) * math.Pow(2, float64(retryCount))` exceeds `math.MaxInt64` nanoseconds. Converting that `float64` to `time.Duration` (which is an `int64`) is then out of range, and the Go specification leaves the result implementation-dependent. In practice the behavior differs by architecture:
*   **On x86-64 (amd64):** The conversion yields `math.MinInt64`, so `delay` is a large negative value (`-2562047h47m16.854775808s`). It passes the `delay > maxDelay` check unchanged, `delay / 4` is also negative, and `rand.Int63n` panics.
*   **On ARM64 (arm64):** The conversion "saturates" at `math.MaxInt64` instead. `delay` is then clamped to `maxDelay`, and the code does not panic.
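
The threshold can be checked without relying on any platform's conversion behavior. The following is a minimal sketch; `firstOverflowRetryCount` is a hypothetical helper written for this report, not part of `openai-go`. It does the comparison entirely in `float64`:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// firstOverflowRetryCount returns the smallest retryCount for which
// 0.5s * 2^retryCount, expressed in nanoseconds, no longer fits in an int64.
// The comparison stays in float64, so the result does not depend on how the
// platform handles the out-of-range conversion.
func firstOverflowRetryCount() int {
	limit := float64(math.MaxInt64) // rounds to 2^63 as a float64
	for n := 0; n < 1024; n++ {
		raw := 0.5 * float64(time.Second) * math.Pow(2, float64(n))
		if raw >= limit {
			return n
		}
	}
	return -1 // not reached for this formula
}

func main() {
	fmt.Println("first overflowing retryCount:", firstOverflowRetryCount())
}
```

The helper returns 35, which matches the `0x23` argument in the stack trace: 5e8 ns * 2^34 is about 8.59e18 and still fits, while 5e8 ns * 2^35 is about 1.72e19 and does not.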

We encountered this in a long-running offline processing service where we set `MaxRetries` to a high value (e.g., 100) to ensure completion despite potential rate limiting.

### Steps to Reproduce

This panic can be reliably reproduced on an x86-64 (amd64) machine using a fuzz test.

1.  Create a test file (e.g., `rand_fuzz_test.go`, the name that appears in the output below):

    ```go
    package requestconfig_test

    import (
    	"math"
    	"math/rand"
    	"testing"
    	"time"
    )

    // Simplified version of the internal retryDelay for testing
    func retryDelay(t *testing.T, retryCount uint) time.Duration {
    	maxDelay := 8 * time.Second
    	// This is the problematic line
    	delay := time.Duration(0.5 * float64(time.Second) * math.Pow(2, float64(retryCount)))
    	if delay > maxDelay {
    		delay = maxDelay
    	}

    	if delay/4 <= 0 {
    		// This demonstrates the out-of-range conversion on amd64.
    		// There, for retryCount >= 35 (e.g. 48), delay becomes math.MinInt64.
    		t.Logf("retryCount=%d, delay=%v, delay/4=%v", retryCount, delay, delay/4)
    	}

    	jitter := rand.Int63n(int64(delay / 4))
    	delay -= time.Duration(jitter)
    	return delay
    }

    func FuzzRetryDelay(f *testing.F) {
    	f.Fuzz(func(t *testing.T, a uint) {
    		// Limit 'a' to a reasonable range to find the issue faster.
    		retryCount := a % 100 
    		retryDelay(t, retryCount)
    	})
    }
    ```

2.  Run the fuzz test on an **x86-64 (amd64)** machine. It will quickly fail.

    ```bash
    go test -fuzz=Fuzz -fuzztime=10s .
    ```

3.  **Failing Output (on x86):**

    ```
    --- FAIL: FuzzRetryDelay (0.00s)
        --- FAIL: FuzzRetryDelay (0.00s)
            rand_fuzz_test.go:21: retryCount=48, delay=-2562047h47m16.854775808s, delay/4=-2305843009213693952
            testing.go:1693: panic: invalid argument to Int63n
                goroutine 24 [running]:
                ...
                math/rand.Int63n(0xe000000000000000)
                ...
    FAIL
    exit status 1
    ```

### Suggested Fix

The exponential backoff calculation should guard against this overflow. A simple fix would be to cap the `retryCount` used in the `math.Pow` calculation to a safe value that won't overflow `int64` when converted to nanoseconds.

For example, capping `retryCount` at 30 would prevent the overflow:

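```go
func retryDelay(res *http.Response, retryCount int) time.Duration {
	// ...
	// Cap the exponent so that 0.5s * 2^n stays well within int64 nanoseconds
	// before the float64 -> time.Duration conversion. Any n >= 4 already
	// reaches maxDelay (8s), so the cap does not change the resulting delay.
	effectiveRetryCount := retryCount
	if effectiveRetryCount > 30 {
		effectiveRetryCount = 30
	}
	delay := time.Duration(0.5 * float64(time.Second) * math.Pow(2, float64(effectiveRetryCount)))
	// ...
}
```

Alternatively, check whether `delay` is negative after the conversion and clamp it to `maxDelay`, although that relies on how a given platform handles the out-of-range conversion.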
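
A third option is to clamp in floating point before converting, so the out-of-range conversion never happens on any platform. The sketch below is an assumption about how that could look, not the library's implementation: `safeRetryDelay` is a hypothetical name, and it omits the `*http.Response` parameter and any other logic in the real `retryDelay`.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

const maxDelay = 8 * time.Second

// safeRetryDelay is a hypothetical variant of the backoff portion of
// retryDelay. It clamps in float64 first, so the value converted to
// time.Duration is always within [0, maxDelay].
func safeRetryDelay(retryCount int) time.Duration {
	if retryCount < 0 {
		retryCount = 0
	}
	raw := 0.5 * float64(time.Second) * math.Pow(2, float64(retryCount))
	// math.Min also handles +Inf, which math.Pow returns for very large exponents.
	delay := time.Duration(math.Min(raw, float64(maxDelay)))

	// Guard the jitter bound as well, since rand.Int63n panics for n <= 0.
	if bound := int64(delay / 4); bound > 0 {
		delay -= time.Duration(rand.Int63n(bound))
	}
	return delay
}

func main() {
	for _, n := range []int{0, 3, 35, 100, 5000} {
		fmt.Printf("retryCount=%d delay=%v\n", n, safeRetryDelay(n))
	}
}
```

With this shape the result stays positive and at most `maxDelay` for any `retryCount`, including values far beyond the `MaxRetries` of 100 mentioned above.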

### Environment

*   **`openai-go` version:** `v1.12.0`
*   **Go version:** `go1.24.1`
*   **Failing Architecture:** `linux/amd64`
*   **Passing Architecture:** `darwin/arm64` (Apple M1/M2/M3)

Panic in retryDelay due to integer overflow on x86 architecture #489
