Modern cloud environments require defense layers that scale seamlessly without introducing unacceptable latency. When dealing with hundreds of millions of requests daily, traditional proxies often become the architectural bottleneck.
"Security at the edge should not come at the cost of performance. If your WAF introduces 50ms of latency, developers will inevitably find a way to bypass it." — Core Architectural Principle
To solve this, we moved away from static configurations and built a unified API gateway on Cloudflare's Pingora framework. Writing the proxy layer entirely in Rust gives us compile-time memory safety guarantees out of the box.
The Architectural Shift
Moving from an NGINX-based legacy system to a custom pingora-proxy implementation required rethinking our entire ingress pipeline. We needed a system that could handle:
- Dynamic Routing: Configuration updates without reloading the daemon.
- Memory Safety: Eliminating buffer overflows and dangling pointers common in C/C++ proxies.
- Deep Customization: Integrating our custom x509 validation logic directly at the socket level.
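One common way to update configuration without reloading the daemon is to keep the routing table behind a shared pointer that can be swapped at runtime. The following is a minimal sketch of that idea using only the standard library, not the gateway's actual code. The names RouteTable, Router, snapshot, and reload, the longest-prefix matching rule, and the upstream addresses are all assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Immutable snapshot of the routing configuration (hypothetical shape).
#[derive(Debug, Default)]
struct RouteTable {
    // Maps a path prefix to an upstream address.
    routes: HashMap<String, String>,
}

impl RouteTable {
    /// Returns the upstream for the longest matching path prefix, if any.
    fn resolve(&self, path: &str) -> Option<&str> {
        self.routes
            .iter()
            .filter(|(prefix, _)| path.starts_with(prefix.as_str()))
            .max_by_key(|(prefix, _)| prefix.len())
            .map(|(_, upstream)| upstream.as_str())
    }
}

/// Shared handle: readers clone the current snapshot, writers swap in a new one.
#[derive(Clone, Default)]
struct Router {
    current: Arc<RwLock<Arc<RouteTable>>>,
}

impl Router {
    /// The read lock is held only long enough to clone the inner Arc.
    fn snapshot(&self) -> Arc<RouteTable> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Publishes a new table; requests holding an old snapshot keep using it.
    fn reload(&self, table: RouteTable) {
        *self.current.write().unwrap() = Arc::new(table);
    }
}

fn main() {
    let router = Router::default();
    router.reload(RouteTable {
        routes: HashMap::from([("/api".to_string(), "10.0.0.1:8080".to_string())]),
    });

    // A request that started before the reload keeps its snapshot.
    let before = router.snapshot();
    router.reload(RouteTable {
        routes: HashMap::from([
            ("/api".to_string(), "10.0.0.2:8080".to_string()),
            ("/api/v2".to_string(), "10.0.0.3:8080".to_string()),
        ]),
    });

    println!("{:?}", before.resolve("/api/users")); // Some("10.0.0.1:8080")
    println!("{:?}", router.snapshot().resolve("/api/v2/users")); // Some("10.0.0.3:8080")
}
```

Because each request works from its own snapshot, a reload never changes routing decisions halfway through a request, and the write lock is held only for the pointer swap.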
System Topology
Below is a high-level overview of how traffic flows through the new edge layer before hitting the internal Kubernetes clusters. Notice how the anomaly detection engine sits parallel to the main traffic flow.
Implementation Details
One of the core requirements was centralizing authentication. Let's look at how a custom authentication filter is implemented within the request_filter lifecycle hook. Notice how the auth_header value is read from the Authorization header and validated with an awaited call, so the worker thread can serve other requests while validation is pending.
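```rust
use async_trait::async_trait;
use pingora_core::upstreams::peer::HttpPeer;
use pingora_core::Result;
use pingora_proxy::{ProxyHttp, Session};

pub struct SecurityGateway;

#[async_trait]
impl ProxyHttp for SecurityGateway {
    // No per-request state is needed for this filter.
    type CTX = ();
    fn new_ctx(&self) -> Self::CTX {}

    // Runs before the request is proxied. Returning Ok(true) means a response
    // was already sent; Ok(false) lets the request continue to the upstream.
    async fn request_filter(&self, session: &mut Session, _ctx: &mut Self::CTX) -> Result<bool> {
        let headers = session.req_header();

        // Extract and validate x509 certificate or JWT token.
        // validate_token is our async validator, defined elsewhere.
        if let Some(auth_header) = headers.headers.get("Authorization") {
            if validate_token(auth_header).await {
                return Ok(false); // Token valid, proceed to upstream
            }
        }

        // Reject unauthorized attempts directly at the edge
        let _ = session.respond_error(401).await;
        Ok(true)
    }

    // ProxyHttp also requires upstream_peer; the address below is a placeholder.
    async fn upstream_peer(&self, _session: &mut Session, _ctx: &mut Self::CTX) -> Result<Box<HttpPeer>> {
        Ok(Box::new(HttpPeer::new(("127.0.0.1", 8080), false, String::new())))
    }
}
```
Performance Comparison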
The transition yielded significant improvements in both resource consumption and tail latency (p99). Here is the benchmark comparison running on c2-standard-16 GCP instances:
| Metric | Legacy Gateway (C++) | Pingora Gateway (Rust) | Delta |
|---|---|---|---|
| RPS per Node | 12,500 | 45,000 | +260% |
| Memory Footprint | 2.4 GB | 350 MB | -85% |
| p99 Latency | 18 ms | 4 ms | -78% |
| CVEs (Last 2 Yrs) | 14 | 0 | -100% |
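The Delta column is the relative change from the legacy value to the Pingora value, reported in whole percentages. As a quick check, the small sketch below recomputes the first three rows to one decimal place. The helper percent_delta is a name introduced here, and 2.4 GB is assumed to mean 2,400 MB.

```rust
/// Percentage change from `old` to `new`, rounded to one decimal place.
fn percent_delta(old: f64, new: f64) -> f64 {
    ((new - old) / old * 1000.0).round() / 10.0
}

fn main() {
    // (metric, legacy value, Pingora value) taken from the table above.
    let rows = [
        ("RPS per node", 12_500.0, 45_000.0),
        ("Memory footprint (MB)", 2_400.0, 350.0), // assumes 2.4 GB = 2,400 MB
        ("p99 latency (ms)", 18.0, 4.0),
    ];
    for (metric, old, new) in rows {
        println!("{metric}: {:+.1}%", percent_delta(old, new));
    }
    // Prints:
    // RPS per node: +260.0%
    // Memory footprint (MB): -85.4%
    // p99 latency (ms): -77.8%
}
```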
Deployment Phases
Rolling this out to a global infrastructure serving 20M+ users required a meticulous, phased approach to ensure zero downtime:
- Shadow Mode: Duplicating 10% of ingress traffic to the new Rust nodes without returning responses to the client.
- Canary Release: Routing internal staging traffic through the new gateway.
- Regional Rollout: Shifting live production traffic in the EU-West region first, monitoring error rates using our Wazuh SIEM integration.
- Global Enforcement: Full deprecation of the legacy pipeline.
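The phases above can be modelled as a small state machine, with shadow-mode sampling decided per request. The sketch below is an illustration under stated assumptions, not our rollout tooling: the Phase enum, the advance and should_mirror helpers, the 1% error budget, and the idea of gating every phase on an error rate are all hypothetical. Only the 10% sampling figure comes from the list above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// The four rollout phases, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Shadow,
    Canary,
    Regional,
    Global,
}

impl Phase {
    /// Moves to the next phase only if the observed error rate is within budget.
    fn advance(self, error_rate: f64, max_error_rate: f64) -> Phase {
        if error_rate > max_error_rate {
            return self; // hold the current phase until errors are back under budget
        }
        match self {
            Phase::Shadow => Phase::Canary,
            Phase::Canary => Phase::Regional,
            Phase::Regional | Phase::Global => Phase::Global,
        }
    }
}

/// Decides whether a request is mirrored to the new nodes in shadow mode.
/// Hashing the request ID gives the same answer every time for a given ID.
fn should_mirror(request_id: &str, percent: u64) -> bool {
    let mut hasher = DefaultHasher::new();
    request_id.hash(&mut hasher);
    hasher.finish() % 100 < percent
}

fn main() {
    // Roughly 10% of synthetic request IDs should be selected for mirroring.
    let mirrored = (0..100_000)
        .filter(|i| should_mirror(&format!("req-{i}"), 10))
        .count();
    println!("mirrored {mirrored} of 100000 requests");

    let mut phase = Phase::Shadow;
    // Hypothetical error rates observed at each checkpoint, with a 1% budget.
    for rate in [0.002, 0.030, 0.004, 0.001] {
        phase = phase.advance(rate, 0.01);
        println!("error rate {rate:.3} -> {phase:?}");
    }
}
```

With these sample rates the rollout holds at Canary during the 3% spike and reaches Global only after two further checkpoints come in under budget.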
Building with Rust and Pingora didn't just solve our scaling issues; it fundamentally transformed our edge from a static router into a programmable, intelligent defense layer.