GenixBit Genius
Architecture Whitepaper
Technical specifications documenting our fallback routing loops, latency penalty profiles, and semantic indexing systems.
Latency Exponential Moving Average (EMA)
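The AI routing layer uses Redis to track model latencies dynamically. After every successful API call, we update a moving average of the response time using an exponential decay filter ($\alpha = 0.2$). In the standard form of this filter, with $x_t$ the latest response time:
$$\text{EMA}_t = \alpha \, x_t + (1 - \alpha) \, \text{EMA}_{t-1}$$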
If a model endpoint returns a rate-limit error, times out, or responds with an HTTP 5xx status, the router applies a temporary 10.0-second latency penalty to it. The penalty steers subsequent user queries away from the failing provider during the failover window.
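The latency bookkeeping described above can be sketched in Python as follows. This is a minimal illustration, not the production code: it keeps state in an in-memory dictionary rather than Redis, and the `LatencyTracker` class, the `PENALTY_TTL` value of 60 seconds, and the choice to add the penalty on top of the EMA (rather than overwrite it) are all assumptions made for the example. Only $\alpha = 0.2$ and the 10.0-second penalty come from the text.

```python
import time

ALPHA = 0.2             # smoothing factor from the text
PENALTY_SECONDS = 10.0  # latency penalty from the text
PENALTY_TTL = 60.0      # assumed: how long the penalty stays in effect


class LatencyTracker:
    """In-memory stand-in for the Redis-backed latency store (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._ema = {}            # model_id -> EMA latency in seconds
        self._penalty_until = {}  # model_id -> time at which the penalty expires

    def record_success(self, model_id, latency):
        """Fold one successful call's latency into the model's EMA."""
        previous = self._ema.get(model_id)
        if previous is None:
            self._ema[model_id] = latency  # first sample seeds the average
        else:
            self._ema[model_id] = ALPHA * latency + (1 - ALPHA) * previous
        return self._ema[model_id]

    def record_failure(self, model_id):
        """Start (or restart) the penalty window for a failing model."""
        self._penalty_until[model_id] = self._clock() + PENALTY_TTL

    def effective_latency(self, model_id):
        """Return the EMA, plus the penalty while the penalty window is open."""
        base = self._ema.get(model_id, 0.0)
        if self._clock() < self._penalty_until.get(model_id, float("-inf")):
            return base + PENALTY_SECONDS
        return base
```

As a worked case, two successful calls of 1.0 s and 2.0 s give an EMA of 0.2 × 2.0 + 0.8 × 1.0 = 1.2 s. A failure then raises the effective latency to 11.2 s until the assumed penalty window closes.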
Failover Traversal Queue
Each model ID maps to a fallback chain of alternative models in a similar capability class:
- gpt-4o → gemini-1.5-pro → claude-3-5-sonnet → deepseek-v3
- llama-3-70b → qwen-2.5-72b → mixtral-8x7b
When a query is dispatched, the backend traverses this queue sequentially. If a provider call fails, the exception is caught and the request is retried immediately on the next provider within the same task execution context, so the error is not propagated to the client.
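The traversal can be illustrated with a short Python sketch. The chains are the ones listed above, read here as "first model is the key, the rest are its fallbacks", which is an assumption about how the mapping is stored. The names `dispatch`, `call_model`, `on_failure`, `ProviderError`, and `AllProvidersFailed` are hypothetical and stand in for whatever the backend actually uses.

```python
FALLBACK_CHAINS = {
    "gpt-4o": ["gemini-1.5-pro", "claude-3-5-sonnet", "deepseek-v3"],
    "llama-3-70b": ["qwen-2.5-72b", "mixtral-8x7b"],
}


class ProviderError(Exception):
    """Hypothetical error type covering rate limits, timeouts, and HTTP 5xx."""


class AllProvidersFailed(Exception):
    """Raised only when every model in the chain has failed."""


def dispatch(model_id, prompt, call_model, on_failure=None):
    """Try the requested model, then each fallback in order.

    `call_model(model_id, prompt)` is a caller-supplied function that
    returns a response or raises ProviderError.
    """
    queue = [model_id] + FALLBACK_CHAINS.get(model_id, [])
    errors = []
    for candidate in queue:
        try:
            return candidate, call_model(candidate, prompt)
        except ProviderError as exc:
            # Caught inside the same call, so the client never sees this error.
            errors.append((candidate, exc))
            if on_failure is not None:
                on_failure(candidate)  # e.g. apply the latency penalty
    raise AllProvidersFailed(errors)
```

In this sketch an error reaches the caller only when the whole chain is exhausted. How the real backend reports that case is not specified in this document.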
Vector Retrieval Mechanics
Semantic searches are executed inside PostgreSQL using the pgvector cosine distance operator (<=>). Chunks are restricted to documents owned by the requesting user (d.user_id) and sorted by ascending distance, so the closest matches come first:
SELECT c.content, c.chunk_index, d.filename,
       (c.embedding <=> :query_embedding) AS distance  -- cosine distance to the query vector
FROM document_chunks c
JOIN documents d ON c.document_id = d.id
WHERE d.user_id = :user_id                             -- only the requesting user's documents
ORDER BY distance ASC                                  -- closest chunks first
LIMIT :limit;
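To make the ranking concrete, the following Python sketch reproduces the query's logic on in-memory data: filter by owner, compute cosine distance (1 minus cosine similarity, which is what pgvector's <=> operator returns), sort ascending, and truncate. It is an illustration only. The `search` function and the dictionary layout of each chunk are assumptions for the example, and the real work happens inside PostgreSQL.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; undefined for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(chunks, query_embedding, user_id, limit):
    """Mirror the SQL: WHERE user_id, ORDER BY distance ASC, LIMIT."""
    scored = [
        (chunk["content"], cosine_distance(chunk["embedding"], query_embedding))
        for chunk in chunks
        if chunk["user_id"] == user_id
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:limit]
```

A distance of 0 means the vectors point in the same direction, 1 means they are orthogonal, and 2 means they point in opposite directions.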