This tutorial will guide you through setting up an AI and MCP Gateway using KubeLB with KGateway to securely manage Large Language Model (LLM) requests and MCP tool servers.
Overview
KubeLB leverages KGateway, a CNCF Sandbox project (accepted March 2025), to provide advanced AI Gateway capabilities. KGateway is built on Envoy and implements the Kubernetes Gateway API specification, offering:
- AI Workload Protection: Secure applications, models, and data from inappropriate access
- LLM Traffic Management: Intelligent routing to LLM providers with load balancing based on model metrics
- Prompt Engineering: System-level prompt enrichment and guards
- Multi-Provider Support: Works with OpenAI, Anthropic, Google Gemini, Mistral, and local models like Ollama
- Model Context Protocol (MCP) Gateway: Federates MCP tool servers into a single, secure endpoint
- Advanced Security: Authentication, authorization, rate limiting tailored for AI workloads
Key Features
AI-Specific Capabilities
- Prompt Guards: Protect against prompt injection and data leakage
- Model Failover: Automatic failover between LLM providers
- Function Calling: Support for LLM function/tool calling
- AI Observability: Detailed metrics and tracing for AI requests
- Semantic Caching: Cache responses based on semantic similarity
- Token-Based Rate Limiting: Control costs with token consumption limits
Gateway API Inference Extension
KGateway supports the Gateway API Inference Extension which introduces:
- InferenceModel CRD: Define LLM models and their endpoints
- InferencePool CRD: Group models for load balancing and failover
- Intelligent endpoint picking based on model performance metrics
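As a rough sketch of how these resources fit together, the example below follows the upstream Gateway API Inference Extension (API group inference.networking.x-k8s.io, version v1alpha2); the names, labels, and field values are illustrative assumptions, not KubeLB defaults:

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama-pool          # hypothetical pool of model-serving pods
  namespace: kubelb
spec:
  # Pods serving the model are selected by label and reached on this port (assumed values)
  targetPortNumber: 8000
  selector:
    app: llama-server
  extensionRef:
    name: llama-endpoint-picker   # endpoint picker deployment, assumed name
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: llama-3-8b
  namespace: kubelb
spec:
  # Logical model name exposed to clients, backed by the pool above
  modelName: llama-3-8b
  criticality: Standard
  poolRef:
    name: llama-pool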
Setup
Step 1: Enable KGateway AI Extension
Update the values.yaml of the KubeLB manager chart to enable KGateway with AI capabilities:
kubelb:
  enableGatewayAPI: true

kubelb-addons:
  enabled: true
  kgateway:
    enabled: true
    gateway:
      aiExtension:
        enabled: true
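Then roll the updated values out to the manager installation, for example with Helm. The release name, chart reference, and namespace below are placeholders; reuse whatever you installed the KubeLB manager with:

# Placeholder release/chart names; adjust to your existing KubeLB manager installation
helm upgrade --install kubelb-manager kubelb/kubelb-manager \
  --namespace kubelb \
  -f values.yaml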
Step 2: Create Gateway-Specific Resources
- Deploy a Gateway resource to handle AI traffic:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-gateway
  namespace: kubelb
  labels:
    app: ai-gateway
spec:
  gatewayClassName: kgateway
  infrastructure:
    parametersRef:
      name: ai-gateway
      group: gateway.kgateway.dev
      kind: GatewayParameters
  listeners:
    - protocol: HTTP
      port: 8080
      name: http
      allowedRoutes:
        namespaces:
          from: All
- Deploy a GatewayParameters resource to enable the AI extension:
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayParameters
metadata:
  name: ai-gateway
  namespace: kubelb
  labels:
    app: ai-gateway
spec:
  kube:
    aiExtension:
      enabled: true
      ports:
        - name: ai-monitoring
          containerPort: 9092
      image:
        registry: cr.kgateway.dev/kgateway-dev
        repository: kgateway-ai-extension
        tag: v2.1.0-main
    service:
      type: LoadBalancer
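Apply both manifests and wait for the Gateway to be programmed. The file names below are placeholders for wherever you saved the resources above:

# File names are placeholders for the two manifests created above
kubectl apply -f ai-gateway.yaml -f ai-gateway-parameters.yaml

# The Gateway should eventually report PROGRAMMED=True and an address
kubectl get gateway ai-gateway -n kubelb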
OpenAI Integration Example
This example shows how to set up secure access to OpenAI through the AI Gateway.
Step 1: Store OpenAI API Key
Create a Kubernetes secret with your OpenAI API key:
export OPENAI_API_KEY="sk-..."
kubectl create secret generic openai-secret \
--from-literal=Authorization="Bearer ${OPENAI_API_KEY}" \
--namespace kubelb
Step 2: Create Backend Configuration
Define an AI Backend that uses the secret for authentication:
apiVersion: gateway.kgateway.dev/v1alpha1
kind: Backend
metadata:
  name: openai
  namespace: kubelb
spec:
  type: AI
  ai:
    llm:
      provider:
        openai:
          authToken:
            kind: SecretRef
            secretRef:
              name: openai-secret
              namespace: kubelb
          model: "gpt-3.5-turbo"
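Other hosted providers follow the same pattern. As a rough sketch, an Anthropic backend could look like the example below; the anthropic block and its fields mirror the OpenAI example above and are an assumption, so check the kgateway Backend reference for the exact schema:

apiVersion: gateway.kgateway.dev/v1alpha1
kind: Backend
metadata:
  name: anthropic
  namespace: kubelb
spec:
  type: AI
  ai:
    llm:
      provider:
        anthropic:
          authToken:
            kind: SecretRef
            secretRef:
              name: anthropic-secret   # assumed secret, created like the OpenAI one
              namespace: kubelb
          model: "claude-3-5-sonnet-20241022"   # example model name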
Step 3: Create HTTPRoute
Route traffic to the OpenAI backend:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-route
  namespace: kubelb
spec:
  parentRefs:
    - name: ai-gateway
      namespace: kubelb
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /openai
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplaceFullPath
              replaceFullPath: /v1/chat/completions
      backendRefs:
        - name: openai
          namespace: kubelb
          group: gateway.kgateway.dev
          kind: Backend
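To confirm the route was accepted by the Gateway, inspect its status conditions:

kubectl get httproute openai-route -n kubelb \
  -o jsonpath='{.status.parents[0].conditions}'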
Step 4: Test the Configuration
Get the Gateway’s external IP:
kubectl get gateway ai-gateway -n kubelb
export GATEWAY_IP=$(kubectl get svc -n kubelb ai-gateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
Send a test request:
curl -X POST "http://${GATEWAY_IP}/openai" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
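If the route and backend are wired up correctly, the gateway forwards the request to OpenAI and returns a standard chat completions response. An abridged example of the shape (ids, token counts, and content will differ):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 17,
    "total_tokens": 30
  }
}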
Rate Limiting (Optional)
Add rate limiting to control costs and prevent abuse:
apiVersion: gateway.kgateway.dev/v1alpha1
kind: RateLimitPolicy
metadata:
  name: openai-ratelimit
  namespace: kubelb
spec:
  targetRef:
    kind: HTTPRoute
    name: openai-route
    namespace: kubelb
  limits:
    - requests: 100
      unit: hour
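With the policy in place, requests beyond the configured limit should be rejected, typically with HTTP 429. A quick way to observe the returned status codes while testing (lower the limit temporarily if you want to see the rejection without sending 100 requests):

# Sends a few requests and prints only the HTTP status code of each
for i in $(seq 1 5); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST "http://${GATEWAY_IP}/openai" \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "ping"}]}'
done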
MCP Gateway
Similar to the AI Gateway, you can also use agentgateway to connect to one or multiple MCP servers in any environment.
Please follow this guide to set up the MCP Gateway: MCP Gateway
Further Reading
For advanced configurations and features: