AI Gateway Management for Agent Manager


### Discussed in https://github.com/wso2/agent-manager/discussions/285

<div type='discussions-op-text'>

<sup>Originally posted by **menakaj** February 5, 2026</sup>
### Problem

The Agent Manager currently lacks a comprehensive gateway management system that can:

1. **Support multiple deployment models**: Organizations need flexibility to deploy gateways either on-premise (self-managed) or in the cloud (managed service), but the current architecture is tightly coupled to a single deployment approach.

2. **Manage gateway lifecycle**: There is no centralized system to register, configure, monitor, and decommission gateway instances across different environments (development, staging, production).

3. **Enable environment-based organization**: Gateways need to be logically grouped by deployment stage (dev/staging/prod) to support environment-specific deployment strategies and configurations.

4. **Support future gateway types**: The system needs to handle both EGRESS gateways (for AI traffic) and future INGRESS gateways (for traditional APIs) without architectural changes.

5. **Abstract deployment complexity**: Business logic should not be tightly coupled to specific gateway deployment mechanisms (HTTP REST, cloud APIs, etc.); such coupling makes the system inflexible and hard to evolve.



### User Stories

### Platform Administrator

- As a platform administrator, I want to register on-premise gateway instances with their control plane URLs so that Agent Manager can orchestrate deployments to them.

- As a platform administrator, I want to provision cloud-managed gateways through Agent Manager so that I don't need to interact with cloud provider APIs directly.

- As a platform administrator, I want to organize gateways into environments (dev, staging, prod) so that I can deploy resources to all gateways in an environment with a single operation.

- As a platform administrator, I want to monitor gateway health and status from a central location so that I can quickly identify and troubleshoot issues.

- As a platform administrator, I want to switch between on-premise and cloud deployment modes via configuration so that I can migrate deployment models without code changes.

### Development Team

- As a developer, I want to deploy AI resources to specific environments (e.g., "deploy to all staging gateways") so that I can test changes before production.

- As a developer, I want the system to validate gateway connectivity before allowing registration so that I don't accidentally configure unreachable gateways.

- As a developer, I want to query gateway metrics (resource count, request rates) so that I can understand gateway utilization.

### SRE/Operations

- As an SRE, I want to decommission gateways safely with checks for active deployments so that I don't accidentally break running services.

- As an SRE, I want to update gateway configurations (URLs, display names) so that I can respond to infrastructure changes.

- As an SRE, I want to see which AI resources are deployed to which gateways so that I can troubleshoot deployment issues.



### Existing Solutions

N/A

### Proposed Solution

## Overview

This proposal introduces a **pluggable gateway management architecture** for Agent Manager that abstracts gateway operations behind a unified interface (`IGatewayAdapter`). The design supports multiple deployment models (on-premise, cloud, custom) through adapter implementations, while maintaining a single, clean API for gateway lifecycle management.

### Key Concepts

**Central Control Plane Pattern**: Agent Manager serves as the single source of truth for gateway configurations and orchestrates all gateway operations, regardless of deployment model.

**Pluggable Adapter Architecture**: Gateway operations (register, deploy, health check) are defined through a common interface. Different adapter implementations (on-premise, cloud) handle deployment-specific details, selected at startup via configuration.

**Environment-Based Organization**: Gateways are logically grouped into environments (development, staging, production) with many-to-many relationships, enabling environment-level deployment strategies.

**Gateway Types**: The system distinguishes between EGRESS gateways (AI/LLM traffic) and future INGRESS gateways (traditional API traffic) at the data model level.

**Single Active Adapter**: Only one adapter type runs at a time (either on-premise or cloud), selected at application startup. There is no runtime switching between deployment models; changing the model requires a configuration change and a restart.

## Design

### Architecture Changes

#### Three-Layer Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    AGENT MANAGER CORE                        │
│              (Deployment-Agnostic Business Logic)            │
│                                                              │
│  ┌────────────────────────────────────────────────────┐    │
│  │  Gateway Management Service                         │    │
│  │  - Gateway CRUD operations                          │    │
│  │  - Environment management                           │    │
│  │  - Health monitoring                                │    │
│  │  - Metrics aggregation                              │    │
│  └────────────────────────────────────────────────────┘    │
│                         ↓                                    │
│  ┌────────────────────────────────────────────────────┐    │
│  │  Gateway Abstraction Layer (IGatewayAdapter)       │    │
│  │  - RegisterGateway()                                │    │
│  │  - ListGateways()                                   │    │
│  │  - CheckHealth()                                    │    │
│  │  - GetMetrics()                                     │    │
│  └────────────────────────────────────────────────────┘    │
│                         ↓                                    │
│  ┌────────────────────────────────────────────────────┐    │
│  │  Adapter Selection (Configuration-Based)            │    │
│  │  ┌──────────────┐  OR  ┌──────────────┐           │    │
│  │  │ On-Premise   │      │    Cloud     │           │    │
│  │  │   Adapter    │      │   Adapter    │           │    │
│  │  └──────────────┘      └──────────────┘           │    │
│  └────────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────┘
```

**Layer 1 - Business Logic**: Environment management, gateway CRUD operations, and health monitoring. This layer is completely agnostic to the deployment model.

**Layer 2 - Abstraction Interface**: `IGatewayAdapter` interface defines all gateway operations. Business logic depends only on this interface, never on concrete implementations.

**Layer 3 - Adapter Implementations**: Concrete adapters handle deployment-specific details. Only one adapter is active at runtime, selected via configuration.

#### Adapter Pattern

The adapter pattern enables multiple deployment models without coupling business logic to deployment details:

- **IGatewayAdapter Interface**: Defines 20+ methods covering gateway lifecycle, health checks, and metrics
- **AdapterFactory**: Creates the appropriate adapter based on configuration (`type: on-premise | cloud | custom`)
- **Common Data Models**: Gateway, HealthStatus, GatewayMetrics shared across all adapters
- **Dependency Injection**: Single adapter instance injected into all services at startup

### API Surface

#### Environment Management Endpoints

```
POST   /api/v1/environments
  Body: { name, displayName, description }
  Response: { uuid, name, displayName, createdAt }

GET    /api/v1/environments
  Query: ?organizationId=<uuid>
  Response: { environments: [...] }

GET    /api/v1/environments/{id}
  Response: { uuid, name, displayName, description, createdAt, updatedAt }

PUT    /api/v1/environments/{id}
  Body: { displayName, description }
  Response: { uuid, name, displayName, updatedAt }

DELETE /api/v1/environments/{id}
  Response: 204 No Content
```

#### Gateway Management Endpoints

```
POST   /api/v1/gateways
  Body: {
    name, displayName, gatewayType: "EGRESS" | "INGRESS",
    environmentIds: [<uuid>],
    adapterConfig: {
      // On-premise: { controlPlaneUrl: "http://gw:9090" }
      // Cloud: { region: "us-east-1", tier: "premium", autoScaling: {...} }
    }
  }
  Response: { uuid, name, status, endpoint, createdAt }

GET    /api/v1/gateways
  Query: ?type=EGRESS&environment=<uuid>&status=ACTIVE
  Response: { gateways: [...] }

GET    /api/v1/gateways/{id}
  Response: { uuid, name, type, status, endpoint, environments: [...] }

PUT    /api/v1/gateways/{id}
  Body: { displayName, adapterConfig }
  Response: { uuid, name, updatedAt }

DELETE /api/v1/gateways/{id}
  Response: 204 No Content (fails if active deployments exist)

POST   /api/v1/gateways/{id}/environments/{envId}
  Response: 201 Created

DELETE /api/v1/gateways/{id}/environments/{envId}
  Response: 204 No Content

GET    /api/v1/gateways/{id}/environments
  Response: { environments: [...] }

GET    /api/v1/environments/{id}/gateways
  Response: { gateways: [...] }

GET    /api/v1/gateways/{id}/health
  Response: { status, lastHeartbeat, responseTime, errorMessage }

GET    /api/v1/gateways/{id}/metrics
  Response: { resourceCount, providerCount, proxyCount, mcpCount, requestRate, errorRate, averageLatency }
```

#### Request/Response Schemas

**Create Gateway Request** (On-Premise):
```json
{
  "name": "prod-gateway-1",
  "displayName": "Production Gateway 1",
  "gatewayType": "EGRESS",
  "environmentIds": ["env-prod-uuid"],
  "adapterConfig": {
    "controlPlaneUrl": "http://gateway-1.internal:9090"
  }
}
```

**Create Gateway Request** (Cloud):
```json
{
  "name": "cloud-gateway-us-east",
  "displayName": "Cloud Gateway US East",
  "gatewayType": "EGRESS",
  "environmentIds": ["env-prod-uuid"],
  "adapterConfig": {
    "region": "us-east-1",
    "tier": "premium",
    "autoScaling": {
      "minInstances": 2,
      "maxInstances": 10
    }
  }
}
```

**Gateway Response**:
```json
{
  "uuid": "gw-uuid-123",
  "organizationId": "org-uuid",
  "name": "prod-gateway-1",
  "displayName": "Production Gateway 1",
  "type": "EGRESS",
  "status": "ACTIVE",
  "endpoint": "http://gateway-1.internal:9090",
  "region": "us-east-1",
  "environments": [
    {
      "uuid": "env-prod-uuid",
      "name": "production",
      "displayName": "Production"
    }
  ],
  "metadata": {},
  "createdAt": "2025-02-05T10:00:00Z",
  "updatedAt": "2025-02-05T10:00:00Z"
}
```

**Health Status Response**:
```json
{
  "gatewayId": "gw-uuid-123",
  "status": "ACTIVE",
  "lastHeartbeat": "2025-02-05T10:05:00Z",
  "responseTime": "45ms",
  "errorMessage": null,
  "checkedAt": "2025-02-05T10:05:30Z"
}
```

**Gateway Metrics Response**:
```json
{
  "gatewayId": "gw-uuid-123",
  "resourceCount": 15,
  "providerCount": 5,
  "proxyCount": 8,
  "mcpCount": 2,
  "requestRate": 125.5,
  "errorRate": 0.8,
  "averageLatency": "120ms",
  "timestamp": "2025-02-05T10:05:30Z"
}
```

### Data Model Changes

#### New Tables

**environments**
```sql
CREATE TABLE environments (
    uuid UUID PRIMARY KEY,
    organization_uuid UUID NOT NULL REFERENCES organizations(uuid),
    name VARCHAR(64) NOT NULL,              -- "development", "staging", "production"
    display_name VARCHAR(128) NOT NULL,
    description TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW(),

    UNIQUE(organization_uuid, name)
);

CREATE INDEX idx_environments_org ON environments(organization_uuid);
```

**Purpose**: Logical grouping of gateways by deployment stage. Enables environment-level operations like "deploy to all production gateways."

#### Modified Tables

**gateways** (Extended)
```sql
-- Add new columns to existing gateways table
ALTER TABLE gateways
  ADD COLUMN gateway_type VARCHAR(16) NOT NULL DEFAULT 'INGRESS'
    CHECK (gateway_type IN ('INGRESS', 'EGRESS')),
  ADD COLUMN control_plane_url TEXT,
  ADD COLUMN status VARCHAR(32) NOT NULL DEFAULT 'ACTIVE'
    CHECK (status IN ('ACTIVE', 'INACTIVE', 'PROVISIONING', 'ERROR')),
  ADD COLUMN region VARCHAR(64),
  ADD COLUMN adapter_config JSONB,
  ADD COLUMN endpoint TEXT;

-- Make control_plane_url required for EGRESS gateways
-- (enforced at application level, not database constraint)

CREATE INDEX idx_gateways_type ON gateways(gateway_type);
CREATE INDEX idx_gateways_status ON gateways(status);
CREATE INDEX idx_gateways_org_type ON gateways(organization_uuid, gateway_type);
```

**New Fields**:
- `gateway_type`: INGRESS (future traditional APIs) or EGRESS (AI resources)
- `control_plane_url`: Gateway controller endpoint (on-premise mode)
- `status`: Current gateway state (ACTIVE, INACTIVE, PROVISIONING, ERROR)
- `region`: Geographic region (cloud mode)
- `adapter_config`: JSON blob for adapter-specific configuration
- `endpoint`: Gateway API endpoint (may differ from control plane URL)
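The SQL comment above leaves the "`control_plane_url` required for EGRESS gateways" rule to the application. A minimal Go sketch of such a check, also mirroring the two `CHECK` constraints, might look like this; `GatewayRecord` and `ValidateGateway` are hypothetical names, and requiring an absolute `http`/`https` URL is an added assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// GatewayRecord carries only the columns involved in validation.
type GatewayRecord struct {
	GatewayType     string
	Status          string
	ControlPlaneURL string
}

// validStatus mirrors the CHECK constraint on gateways.status.
var validStatus = map[string]bool{
	"ACTIVE": true, "INACTIVE": true, "PROVISIONING": true, "ERROR": true,
}

// ValidateGateway applies the database enum rules plus the application-level
// rule that EGRESS gateways must carry a control plane URL.
func ValidateGateway(g GatewayRecord) error {
	if g.GatewayType != "INGRESS" && g.GatewayType != "EGRESS" {
		return fmt.Errorf("invalid gateway_type %q", g.GatewayType)
	}
	if !validStatus[g.Status] {
		return fmt.Errorf("invalid status %q", g.Status)
	}
	if g.GatewayType == "EGRESS" {
		u, err := url.Parse(g.ControlPlaneURL)
		if err != nil || u.Host == "" || (u.Scheme != "http" && u.Scheme != "https") {
			return errors.New("control_plane_url must be an absolute http(s) URL for EGRESS gateways")
		}
	}
	return nil
}

func main() {
	ok := GatewayRecord{GatewayType: "EGRESS", Status: "ACTIVE", ControlPlaneURL: "http://gateway-1.internal:9090"}
	missing := GatewayRecord{GatewayType: "EGRESS", Status: "ACTIVE"}
	fmt.Println(ValidateGateway(ok))
	fmt.Println(ValidateGateway(missing))
}
```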

#### New Junction Tables

**gateway_environment_mappings**
```sql
CREATE TABLE gateway_environment_mappings (
    id SERIAL PRIMARY KEY,
    gateway_uuid UUID NOT NULL REFERENCES gateways(uuid) ON DELETE CASCADE,
    environment_uuid UUID NOT NULL REFERENCES environments(uuid) ON DELETE CASCADE,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),

    UNIQUE(gateway_uuid, environment_uuid)
);

CREATE INDEX idx_gem_gateway ON gateway_environment_mappings(gateway_uuid);
CREATE INDEX idx_gem_environment ON gateway_environment_mappings(environment_uuid);
```

**Purpose**: Many-to-many relationship between gateways and environments. Supports:
- Single gateway in multiple environments (shared dev/test gateway)
- Multiple gateways in one environment (horizontal scaling)

### Component Interactions

#### Gateway Registration Flow (On-Premise)

```
User/CLI → Agent Manager API → Gateway Service → OnPremiseAdapter
                                                        ↓
                                      1. Validate control plane URL reachable
                                      2. Perform health check (GET /health)
                                      3. Create gateway record in PostgreSQL
                                      4. Create environment mappings
                                      5. Return gateway details
```

#### Gateway Registration Flow (Cloud)

```
User/CLI → Agent Manager API → Gateway Service → CloudAdapter
                                                        ↓
                                      1. Call cloud provider API
                                      2. Wait for gateway provisioning
                                      3. Create gateway record in PostgreSQL
                                      4. Create environment mappings
                                      5. Return gateway details with endpoint
```

#### Health Check Flow

```
Background Job (every 30s) → Gateway Service → Adapter.CheckHealth()
                                                        ↓
                          On-Premise: HTTP GET to control plane /health
                          Cloud: Query cloud provider status API
                                                        ↓
                                      Update gateway status in PostgreSQL
                                      Record last heartbeat timestamp
```

#### Environment-Based Query Flow

```
API Request: GET /api/v1/gateways?environment=production&type=EGRESS&status=ACTIVE
                ↓
Gateway Service → Database Query:
  SELECT g.* FROM gateways g
  JOIN gateway_environment_mappings gem ON g.uuid = gem.gateway_uuid
  JOIN environments e ON gem.environment_uuid = e.uuid
  WHERE e.name = 'production'
    AND g.gateway_type = 'EGRESS'
    AND g.status = 'ACTIVE'
                ↓
Return filtered gateway list
```

#### Adapter Selection Flow (Startup)

```
Application Startup → Load Configuration (YAML/ENV)
                                ↓
                    Gateway Adapter Type = "on-premise" | "cloud"
                                ↓
                    AdapterFactory.CreateAdapter(config)
                                ↓
                    Initialize adapter (HTTP client / cloud SDK)
                                ↓
                    Inject adapter into GatewayService
                                ↓
                    Application ready with single active adapter
```
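The startup sequence above can be sketched as a factory that maps the configured type to one adapter and fails fast on anything else. To stay short, the sketch uses a one-method `Adapter` placeholder instead of the full `IGatewayAdapter`; the `GATEWAY_ADAPTER_TYPE` variable name and the fallback to `on-premise` are assumptions made for the demo.

```go
package main

import (
	"fmt"
	"os"
)

// Adapter is a placeholder for IGatewayAdapter with a single method.
type Adapter interface {
	Kind() string
}

type onPremiseAdapter struct{}

func (onPremiseAdapter) Kind() string { return "on-premise" }

type cloudAdapter struct{}

func (cloudAdapter) Kind() string { return "cloud" }

// AdapterConfig holds the value of gateway.adapter.type.
type AdapterConfig struct {
	Type string
}

// CreateAdapter returns the single adapter selected by configuration and
// rejects unknown types so misconfiguration surfaces at startup.
func CreateAdapter(cfg AdapterConfig) (Adapter, error) {
	switch cfg.Type {
	case "on-premise":
		return onPremiseAdapter{}, nil
	case "cloud":
		return cloudAdapter{}, nil
	default:
		return nil, fmt.Errorf("unsupported gateway adapter type %q", cfg.Type)
	}
}

// GatewayService receives its adapter through constructor injection.
type GatewayService struct {
	adapter Adapter
}

func NewGatewayService(a Adapter) *GatewayService {
	return &GatewayService{adapter: a}
}

func main() {
	// GATEWAY_ADAPTER_TYPE is a hypothetical environment variable name.
	cfg := AdapterConfig{Type: os.Getenv("GATEWAY_ADAPTER_TYPE")}
	if cfg.Type == "" {
		cfg.Type = "on-premise" // demo fallback, not part of the proposal
	}
	adapter, err := CreateAdapter(cfg)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	svc := NewGatewayService(adapter)
	fmt.Println("active adapter:", svc.adapter.Kind())
}
```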

### Configuration Schema

**On-Premise Configuration**:
```yaml
gateway:
  adapter:
    type: on-premise
    onPremise:
      defaultTimeout: 30s
      retryPolicy:
        maxAttempts: 3
        backoffMultiplier: 2
        maxBackoff: 30s
      healthCheck:
        interval: 30s
        timeout: 5s
```

**Cloud Configuration**:
```yaml
gateway:
  adapter:
    type: cloud
    cloud:
      provider: wso2-cloud
      apiEndpoint: https://api.wso2.cloud/v1
      authentication:
        type: oauth2
        clientId: ${CLOUD_CLIENT_ID}
        clientSecret: ${CLOUD_CLIENT_SECRET}
      defaultRegion: us-east-1
```
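The `${CLOUD_CLIENT_ID}` and `${CLOUD_CLIENT_SECRET}` placeholders imply that configuration values are resolved from the environment at load time. How Agent Manager does this is not stated; one possible approach in Go uses the standard library's `os.Expand` and fails when a referenced variable is unset. The helper name `expandSecret` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// expandSecret replaces ${NAME} references in raw using lookup and returns
// an error if any referenced variable is missing, so that a forgotten
// secret fails at startup rather than at the first cloud API call.
func expandSecret(raw string, lookup func(string) (string, bool)) (string, error) {
	var missing []string
	out := os.Expand(raw, func(name string) string {
		v, ok := lookup(name)
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unset environment variables: %v", missing)
	}
	return out, nil
}

func main() {
	// os.LookupEnv has the signature expected by expandSecret.
	id, err := expandSecret("${CLOUD_CLIENT_ID}", os.LookupEnv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("client id length:", len(id))
}
```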

### Database Migration Strategy

**Migration 001: Add Environments**
- Create `environments` table
- Add indexes

**Migration 002: Extend Gateways**
- Add `gateway_type`, `control_plane_url`, `status`, `region`, `adapter_config`, `endpoint` columns
- Add indexes on new columns

**Migration 003: Gateway-Environment Mappings**
- Create `gateway_environment_mappings` table
- Add foreign keys and indexes

**Migration 004: Backfill Existing Gateways**
- Set default `gateway_type` = 'INGRESS' for existing gateways
- Create default "production" environment
- Map all existing gateways to production environment

**Migration 005: Add Triggers**
- Add trigger for `updated_at` timestamp on gateways and environments

## Out of Scope

### Not Included in This Proposal

1. **Gateway Configuration Management**: Detailed gateway configuration (routes, policies, rate limits) is handled by AI Resource Management. This proposal only covers gateway instance registration and lifecycle.

2. **xDS Protocol Details**: Agent Manager uses gateway-controller's REST API. xDS implementation remains in gateway-controller.

3. **Gateway-to-Gateway Communication**: This proposal does not cover service mesh or gateway federation scenarios.

4. **Multi-Tenancy Isolation**: Organization-level isolation is assumed to exist. This proposal does not add new multi-tenancy mechanisms.

5. **Gateway Autoscaling**: While cloud adapters support autoscaling configuration, the autoscaling logic itself is handled by cloud providers.

6. **Gateway Monitoring/Observability**: Advanced monitoring (logs, traces, detailed metrics) is out of scope. Only basic health checks and resource counts are included.

7. **Gateway Authentication/Authorization**: This proposal assumes gateways have existing auth mechanisms. It does not introduce new auth flows between Agent Manager and gateways.

8. **WebSocket Connections**: Current implementation focuses on REST API interactions. WebSocket-based real-time updates are not included.

9. **Gateway Backup/Restore**: Disaster recovery and gateway configuration backup/restore are not covered.

10. **Custom Adapter Plugin System**: While the architecture supports custom adapters, a formal plugin mechanism (dynamic loading, versioning) is not included in the initial implementation.



### Alternatives Considered

_No response_

### Open Questions

_No response_

## Milestones

| Phase | Scope | Target |
|-------|-------|--------|
| **Phase 1: Database & Models** | Create environments table, extend gateways table, create gateway-environment mappings, write migrations, create Go models | Database schema deployed |
| **Phase 2: Adapter Interface** | Design IGatewayAdapter interface, create common types (Gateway, HealthStatus, GatewayMetrics), implement AdapterFactory | Adapter interface complete |
| **Phase 3: On-Premise Adapter** | Implement OnPremiseAdapter with HTTP client, gateway registration, lifecycle operations, health checks, retry logic | On-premise adapter functional |
| **Phase 4: Services & APIs** | Implement Environment and Gateway repositories/services, create REST controllers (17 endpoints), add validation | All APIs live |
| **Phase 5: Configuration** | Create YAML configuration, implement adapter selection at startup, Docker Compose setup, deployment scripts | System deployable |


---

## Success Criteria

- [ ] On-premise adapter communicates with gateway-controller
- [ ] All 17 API endpoints functional
- [ ] Environment-gateway many-to-many relationship working
- [ ] Application starts with configured adapter type
- [ ] End-to-end tests pass

---

## Tasks

- [x] #291
- [x] #292
- [x] #293
- [x] #294

---

## Risks

| Risk | Mitigation |
|------|------------|
| Gateway-controller API changes | Version the API calls and maintain backward compatibility |
| On-premise connectivity issues | Retry logic and clear error messages |

</div>

AI Gateway Management for Agent Manager #287
