Edge
Frontend
Cloudflare
Argo routing, HTTP/3
App
Streaming SSR, 500ms latency
SERP
Search results, knowledge panels, AI assistant
Ingestion
Crawling
Fleet
50K pages/sec, multi-level stochastic queues
Queue
queued
Custom RocksDB-based queue, 300K ops/sec
Processing
Text
Parser
WHATWG spec, semantic extraction
Normalizer
Chrome removal, canonicalization
Chunker
spaCy sentencizer, context preservation
Statement chainer
DistilBERT classifier, anaphora resolution
Compute
GPU
Runpod
200+ RTX 4090s, SBERT model, 100K embeddings/sec, 90% utilization
CPU
Query vectorizer
Real-time query embeddings
Storage
Records + blobs
Distributed RocksDB
64 shards, 82TB SSDs, BlobDB, 200K writes/sec, xxHash routing
Vector index
Distributed HNSW
64 shards, 4TB RAM, 3B embeddings, parallel queries
Knowledge graph
Wikipedia
Extracts, images
Wikidata
Entity graph, properties
Infrastructure
Service mesh
mTLS + HTTP/2
Universal auth, encryption, MessagePack RPC
CoreDNS
Internal DNS resolution, service discovery
systemd
cgroups, journald