
Stellus.

Backend Systems & Web3 Engineer · Nov 2024 — Apr 2025

Product: Atlas Blockchain API Platform
Domain: Blockchain Infrastructure-as-a-Service
Chains: Polygon · BSC · EVM
Duration: 6 months
Stack: FastAPI · Python · Celery · Web3.py · Redis · PostgreSQL · Prometheus · Docker · Azure · AES-GCM · PBKDF2 · gevent · Alchemy · QuickNode

Overview

Atlas Project

At Stellus I built Atlas, a blockchain API platform where developers deploy, manage, and interact with smart contracts across EVM networks.

We needed an asynchronous system that absorbs extreme blockchain volatility and RPC rate limits without propagating latency, a transaction pipeline that reliably executes on-chain operations despite slow block confirmations and network fragmentation, and a cryptographic custody mechanism that secures EVM private keys with zero margin for error.

I owned the entire backend and protocol layer from inception, with no existing infrastructure to build on.

System Architecture

System Flow

Atlas is built around three independent execution flows: a modular authentication system supporting both OAuth and Web3 wallet signatures; an async smart contract deployment pipeline that never blocks the HTTP layer; and a cache-first charting system that absorbs high-frequency dashboard queries without flooding the database or RPC providers.

A — Modular Authentication Flow

1. Client → Auth Router (v1): POST /auth/login (username, password)
2. Router → FastAPI Auth Service: authenticate_user()
3. Auth Service → DB (Postgres): fetch and verify user hash; valid user record returned
4. Auth Service: generate JWT and refresh tokens
5. Auth Service ↔ Redis: store and validate session state
6. Server sets a secure HttpOnly cookie carrying the refresh token
7. Response: 200 OK + Set-Cookie (refresh token) + access JWT
8. handle_auth_notification() fires a WebSocket event to the client

Auth flow — JWT + HttpOnly refresh cookie with Redis session validation and WebSocket notification on login
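A minimal sketch of the login route's cookie mechanics, assuming illustrative helper names (issue_tokens, store_session); authenticate_user and handle_auth_notification are the calls named in the diagram:

```python
from fastapi import APIRouter, HTTPException, Response
from pydantic import BaseModel

router = APIRouter(prefix="/auth")

class LoginRequest(BaseModel):
    username: str
    password: str

@router.post("/login")
async def login(body: LoginRequest, response: Response):
    user = await authenticate_user(body.username, body.password)
    if user is None:
        raise HTTPException(status_code=401, detail="Invalid credentials")
    access_jwt, refresh = issue_tokens(user)      # illustrative helper: JWT + refresh pair
    await store_session(user.id, refresh)         # Redis session state, per the diagram
    # The refresh token travels only in a secure HttpOnly cookie; client JS never sees it.
    response.set_cookie("refresh_token", refresh, httponly=True,
                        secure=True, samesite="strict")
    await handle_auth_notification(user.id)       # WebSocket push on login
    return {"access_token": access_jwt, "token_type": "bearer"}
```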

B — Smart Contract Deployment Flow

1. User → Deploy Router (v1): POST /erc20/deploy (name, symbol)
2. Router → DeployErc20 Service (Celery): create() → deploy(...)
3. Service → DB: record intent and update transaction state
4. Service: inject logic and load pre-compiled ABI and bytecode
5. Service → Web3 node: sign_and_send_transaction() → transaction hash / receipt
6. Service → DB: update transaction status to "Deployed"
7. Background task (signed_notification.delay) → WebSocket Manager: broadcast "Deploy Success"
8. Client receives the real-time "Deploy Completed" event and 200 OK with contract address data

Deployment flow — async Celery task handles on-chain signing; WebSocket push confirms completion in real time
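Because a Celery worker cannot hold the client's WebSocket connection, the notification step is easiest to picture as a relay. A hedged sketch, assuming a Redis pub/sub channel that the WebSocket manager subscribes to (the channel naming is illustrative):

```python
import json
import redis
from celery import shared_task

r = redis.Redis()  # host/config illustrative

@shared_task
def signed_notification(user_id: str, contract_address: str) -> None:
    # Publish to a per-user channel; the WebSocket manager relays it to the
    # connected dashboard as the "Deploy Completed" real-time event.
    r.publish(f"ws:{user_id}", json.dumps({
        "event": "deploy_completed",
        "contract_address": contract_address,
    }))
```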

C — Contract Event & Charting Flow

1. Dashboard widget → Contracts Router (v1): GET /contract/chart?freq=1d&contract_id=x
2. Router → GeneralChart / ContractChat Service: get_data_chart(freq='d')
3. Service → Redis: check for a cached chart state
4. Cache hit: format the cached payload and return it with Cache-Control: public
5. Cache miss: Postgres aggregates time-series blocks and events, the service primes the cache via set(chart_state), then returns the fresh payload with Cache-Control: no-cache

Charting flow — Redis cache-first with Cache-Control headers; on miss, Postgres aggregates time-series and primes the cache
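The cache-first read path in miniature; aggregate_chart stands in for the Postgres aggregation, and the TTL is illustrative:

```python
import json
import redis.asyncio as redis
from fastapi import APIRouter, Response

router = APIRouter(prefix="/contract")
cache = redis.Redis()

@router.get("/chart")
async def contract_chart(contract_id: str, freq: str, response: Response):
    key = f"chart:{contract_id}:{freq}"
    if (cached := await cache.get(key)) is not None:
        response.headers["Cache-Control"] = "public"   # cached response
        return json.loads(cached)
    data = await aggregate_chart(contract_id, freq)    # time-series aggregation in Postgres
    await cache.set(key, json.dumps(data), ex=60)      # prime the cache for the next reader
    response.headers["Cache-Control"] = "no-cache"     # fresh response
    return data
```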

Impact

Shipped

The numbers reflect the operational state of the system during QA.

- 100+ concurrent blockchain transactions sustained (Celery / gevent worker isolation)
- <0.5% RPC lockout rate under sustained load (exponential backoff + dynamic jitter)
- 390K PBKDF2 iterations per key derivation (AES-GCM cryptographic key management)
- 0 PoA consensus failures post-middleware fix (ExtraDataToPOAMiddleware injection)
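The PoA fix in the last line is the only one without its own section below; it amounts to a one-time middleware injection. A sketch with web3.py v7 naming, with an illustrative RPC endpoint:

```python
from web3 import Web3
from web3.middleware import ExtraDataToPOAMiddleware

w3 = Web3(Web3.HTTPProvider("https://polygon-rpc.example"))  # illustrative endpoint
# PoA chains like Polygon and BSC pad the block extraData field beyond Ethereum's
# 32-byte limit; injecting this middleware strips it so block parsing stops failing.
w3.middleware_onion.inject(ExtraDataToPOAMiddleware, layer=0)
```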

Technical Decisions

Solutions

Each of these was a failure mode or threat vector. None of them had an obvious default solution.

01 Celery / gevent — isolating blockchain latency

Async Architecture · Python · Celery · gevent

Deploying a contract via HTTP triggered a Web3.py call to an external RPC provider. Response times ranged from 200ms to 45s depending on congestion. While waiting, the FastAPI worker was blocked — unavailable to serve other requests. At scale, a few slow blockchain calls exhausted the worker pool and made the API unresponsive.

The solution was task queue isolation: the API enqueues a Celery task and immediately returns a job ID. Blockchain work runs asynchronously in a dedicated worker pool, so the HTTP tier is never blocked.

I used gevent as the Celery concurrency model. gevent patches Python I/O to be non‑blocking, so a worker waiting on an RPC response yields to other tasks instead of blocking a thread. This lets one worker handle many pending on‑chain calls without spawning OS threads, preserving performance at scale.
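The shape of that isolation, as a sketch; the broker URL and the signing helper's wiring are illustrative, with sign_and_send_transaction borrowed from the deploy flow above:

```python
# tasks.py
from celery import Celery

app = Celery("atlas", broker="redis://localhost:6379/0")

@app.task
def deploy_erc20(name: str, symbol: str) -> str:
    # All slow Web3 RPC work happens here, isolated from the HTTP tier.
    return sign_and_send_transaction(name, symbol)  # signing helper from the deploy flow

# FastAPI side: enqueue and return a job ID immediately.
#   job = deploy_erc20.delay(name, symbol)
#   return {"job_id": job.id, "status": "queued"}

# Workers run on the gevent pool, so one process can hold many pending RPC calls:
#   celery -A tasks worker --pool=gevent --concurrency=500
```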

One subtlety: Web3.py's AsyncHTTPProvider is not thread-safe. Creating a new instance per task exhausts connection pools. The solution was a Singleton pattern enforced via asyncio.Lock(): one provider instance shared across all workers, with initialization guarded by the lock.
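A sketch of that double-checked singleton (get_w3 is an illustrative name):

```python
import asyncio
from web3 import AsyncWeb3, AsyncHTTPProvider

_lock = asyncio.Lock()
_w3: AsyncWeb3 | None = None

async def get_w3(rpc_url: str) -> AsyncWeb3:
    global _w3
    if _w3 is None:                 # fast path: no lock once initialized
        async with _lock:
            if _w3 is None:         # re-check after acquiring the lock
                _w3 = AsyncWeb3(AsyncHTTPProvider(rpc_url))
    return _w3
```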

The constraint was that blockchain operations must never affect HTTP latency. The worker pool is the firewall between them.
02 AES-GCM + PBKDF2 — treating private keys as nuclear material

Cryptography · Key Management · EVM Wallets

Atlas holds private keys for users. A private key gives absolute wallet control — if compromised, all assets are lost with no recovery. Key storage is therefore the highest‑stakes engineering decision.

Two attack surfaces: data at rest, where an attacker with database access must not be able to use stored key material; and brute force, where an attacker holding an encrypted key and knowing the scheme must not be able to recover the plaintext in practical time.

AES-GCM secures data at rest with authenticated encryption: ciphertext is both encrypted and integrity‑verified, and dynamic nonces prevent pattern analysis.

PBKDF2 secures against brute force by deriving the AES key with 390,000 iterations and dynamic salts. Each guess requires 390,000 operations, making brute‑forcing impractical even with dedicated hardware.

The 390,000 iteration count follows OWASP’s 2023 guidance for PBKDF2‑HMAC‑SHA256, calibrated against modern GPU benchmarks.
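A minimal sketch of the scheme with Python's cryptography package; sizes and the storage layout are standard choices, not necessarily Atlas's exact format:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

ITERATIONS = 390_000  # OWASP 2023 guidance for PBKDF2-HMAC-SHA256

def _derive_key(passphrase: bytes, salt: bytes) -> bytes:
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                     salt=salt, iterations=ITERATIONS)
    return kdf.derive(passphrase)

def encrypt_private_key(plaintext: bytes, passphrase: bytes) -> bytes:
    salt = os.urandom(16)    # dynamic salt: same passphrase never derives the same key
    nonce = os.urandom(12)   # dynamic nonce: same plaintext never repeats as ciphertext
    ciphertext = AESGCM(_derive_key(passphrase, salt)).encrypt(nonce, plaintext, None)
    return salt + nonce + ciphertext   # salt and nonce are not secret; store them inline

def decrypt_private_key(blob: bytes, passphrase: bytes) -> bytes:
    salt, nonce, ciphertext = blob[:16], blob[16:28], blob[28:]
    # Raises InvalidTag if the ciphertext was tampered with (GCM authentication).
    return AESGCM(_derive_key(passphrase, salt)).decrypt(nonce, ciphertext, None)
```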

Database encryption alone is insufficient. The encryption key derivation must also be computationally expensive, so that access to the encrypted data yields nothing without infeasible compute effort.
| Attack Vector | Mitigation | Why It Works |
| --- | --- | --- |
| DB dump / data breach | AES-GCM encryption at rest | Ciphertext is useless without the derived key |
| Brute-force on ciphertext | PBKDF2 at 390,000 iterations | Each guess costs 390K hash operations; GPU cracking is impractical |
| Pattern analysis / replay | Dynamic nonces + dynamic salts | Same plaintext → different ciphertext every time |
| Ciphertext tampering | AES-GCM authentication tag | Any bit flip in ciphertext causes decryption to fail with an error |
03 Exponential backoff with jitter — the thundering herd problem

Reliability Engineering · RPC Providers · HTTP 429

RPC providers (Alchemy, QuickNode) enforce rate limits. Exceeding them returns HTTP 429. Naively retrying immediately creates a retry storm — repeated failures that lock you out indefinitely.

Exponential backoff fixes this: each failure doubles the wait before the next attempt (1s, 2s, 4s, 8s…), giving the rate-limit window time to reset.

But exponential backoff alone causes a thundering herd: multiple workers retry on the same schedule, creating another spike that trips the rate limit again.

Dynamic jitter solves it. Each worker adds randomness to its wait time, staggering retries across time. The jitter range scales with backoff duration, spreading retries more widely at longer waits when it matters most.
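A sketch of the retry schedule; RateLimitError is an illustrative wrapper for the provider's HTTP 429:

```python
import random
import time

class RateLimitError(Exception):
    """Illustrative: raised when the RPC provider returns HTTP 429."""

def backoff_with_jitter(attempt: int, base: float = 1.0, cap: float = 64.0) -> float:
    # Exponential schedule (1s, 2s, 4s, 8s, ...) capped, with a jitter range that
    # scales with the backoff so retries spread widest exactly when waits are longest.
    exp = min(cap, base * 2 ** attempt)
    return exp / 2 + random.uniform(0.0, exp / 2)

def call_with_retries(rpc_call, max_attempts: int = 6):
    for attempt in range(max_attempts):
        try:
            return rpc_call()
        except RateLimitError:
            time.sleep(backoff_with_jitter(attempt))  # each worker sleeps a different duration
    raise RuntimeError("RPC provider still rate-limiting after all retries")
```

For task-level retries, Celery's autoretry_for with retry_backoff=True and retry_jitter=True gives the same behavior without hand-rolled loops.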

With jitter in place, retry storms disappeared, herd spikes were gone, and the lockout rate under sustained load dropped below 0.5%.
