JAM ENGINE

Compress what storage can't.

Jam is a software data plane that compresses, deduplicates, and indexes enterprise data end-to-end. 3–8× smaller archives. NVMe-bound throughput. No GPU. Drop into S3, on-prem, or appliance.

01 — THE PROBLEM

The math doesn't work.

Storage gets 25–30% cheaper per year. Enterprise data grows 60–80%. Anyone running the numbers ends up paying more in real terms each cycle to hold what they already own. The hyperscalers turn that gap into margin. Everyone else turns it into burn.
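The gap compounds. A back-of-envelope sketch using the midpoints of the ranges above (price per TB falling 27% per year, data growing 70% per year; both figures illustrative midpoints, not measurements):

```python
# Year-over-year storage spend when data growth outruns price decline.
# Midpoints of the quoted ranges: price/TB falls 27%/yr, data grows 70%/yr.
price_decline = 0.27
data_growth = 0.70

spend_multiplier = (1 + data_growth) * (1 - price_decline)
print(round(spend_multiplier, 2))       # spend still grows ~24% per year

# Compounded over five years, real spend roughly triples:
print(round(spend_multiplier ** 5, 1))
```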

Codec-level compression — zstd, gzip, 7zip — hit a ceiling years ago. These tools squeeze bytes inside a single file. They don't understand layouts, they don't deduplicate across files, and they don't index for restore. You compress, you save the file, you forget where it is, and you pay to find it again.

Hyperscaler "cold tier" pricing hides the real cost. Storing the data is cheap; egress and restore are not. You don't pay until you need the data — then you pay a lot, and you wait. The problem is not raw bytes per dollar. It is what the bytes mean and what it costs to get them back.

02 — THE APPROACH

Three principles. One software layer.

Jam sits below the application stack. Customers keep their hardware, their buckets, and their audit trail. We compress, index, and restore — deterministically, end-to-end.

01

Compress end-to-end.

Not just bytes. Layouts, repetitions, and indexes too. Result: 3–8× on enterprise workloads, layered on top of any codec the customer already runs.
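A minimal sketch of the layering idea, with a fixed block size and zlib standing in for whatever codec the customer runs: deduplicate repeated blocks across files first, keep an index (the manifest), then compress only the surviving unique bytes. Function names, block size, and data are illustrative, not Jam's actual pipeline.

```python
import zlib

def dedupe_then_compress(files: dict, block: int = 4096):
    """Illustrative two-layer pipeline: cross-file block dedup, then a codec."""
    seen = {}        # block content -> block id
    unique = []      # unique blocks, in first-seen order
    manifest = {}    # filename -> list of block ids (the restore "index")
    for name, data in files.items():
        ids = []
        for i in range(0, len(data), block):
            chunk = data[i:i + block]
            if chunk not in seen:
                seen[chunk] = len(unique)
                unique.append(chunk)
            ids.append(seen[chunk])
        manifest[name] = ids
    payload = zlib.compress(b"".join(unique), 9)
    return manifest, payload

# Two snapshot-like files sharing most of their blocks:
base = bytes(range(256)) * 64   # 16 KiB of content common to both snapshots
files = {"snap1": base + b"A" * 4096, "snap2": base + b"B" * 4096}
manifest, payload = dedupe_then_compress(files)
raw = sum(len(d) for d in files.values())
print(raw, len(payload))        # shared blocks are stored once, then compressed
```

The dedup layer captures cross-file redundancy the codec alone never sees, which is where the wins beyond zstd's ceiling come from.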

02

NVMe-bound.

Encoder and decoder pin the underlying drive at hardware ceilings. Throughput, not algorithm cycles, is the limit. No GPU. Under 400 MB of RAM during encode.

03

Drop in.

S3-compatible gateway, customer-controlled storage, no rebuild required. Restore is deterministic and verified — every byte hash-checked on the way back.

02b — HOW THE TECH ACTUALLY WORKS

No GPU. No CPU bottleneck. No RAM bottleneck. Only the SSD.

Every other compression engine bottlenecks on CPU, GPU, or RAM. Jam writes directly to the storage device. The only ceiling is your SSD's read and write speed — if your drive can write 1 GB per second, Jam writes 1 GB per second. No other engine in the bracket gets out of the way like this.

01

No GPU dependency

Conventional codecs and accelerated compression frameworks pull a GPU into the data path. Jam doesn't. There is no GPU procurement cycle, no driver tax, no power-hungry accelerator on the BOM. Cold-storage pods stay cold.

02

CPU and RAM stay out of the way

Encode runs in under 400 MB of RAM and a fraction of a modern CPU core. The pipeline never serialises through CPU caches or main memory the way other engines do. That is why throughput scales with the storage device, not with how much compute you can throw at the encoder.
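The "slowest stage wins" arithmetic behind that claim, sketched with the measured drive figure from the April 2026 live test and an illustrative (not measured) CPU-stage rate:

```python
# End-to-end throughput of a streaming pipeline is the minimum of its
# per-stage rates (MB/s). Only the drive figure is from the live test;
# the encode rate is an illustrative placeholder.
stages = {
    "nvme_random_read": 281,    # measured, April 2026 live test
    "encode_compute":   2500,   # illustrative: a fraction of a core stays ahead
    "nvme_write":       1130,   # measured sequential-write ceiling
}
bottleneck = min(stages, key=stages.get)
print(bottleneck, stages[bottleneck])   # the disk, not compute, sets the pace
```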

03

SSD-speed in, SSD-speed out

Jam reads, compresses, indexes, envelopes, and writes straight to NVMe at the drive's own ceiling. The April 2026 live test kept an enterprise NVMe pinned throughout: 281 MB/s encode (random-read ceiling), 1.13 GB/s decode (sequential-write ceiling). Faster drive, faster Jam.

The architectural punchline for buyers: there is no part of the data path that is gated on a component you have to procure, scale up, or wait on. Cithorum is the only software data plane in the bracket whose throughput is purely a function of the disk you already own.

03 — LIVE TEST · APRIL 2026

Unedited footage. Real workload. Real numbers.

135 GB of VM snapshots. JAM+ZSTD layered. NVMe-bound throughout. The encoder is limited by the drive's random-read ceiling; the decoder by its sequential-write ceiling.

135 GB of VM snapshots · JAM+ZSTD · April 2026 · Watch the full clip →
Combined ratio · 135 GB → 17.2 GB · 7.85× (JAM + ZSTD)
JAM only · 135 GB → 35 GB · 3.8×
Encode · 281 MB/s · NVMe random-read ceiling
Decode · 1.13 GB/s · NVMe sequential-write ceiling
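The layered numbers compose multiplicatively, so the headline figures imply how much zstd adds on top of Jam's output:

```python
# Ratios from the April 2026 live test; ratios of layered stages multiply.
raw_gb = 135
jam_only_gb = 35
combined_gb = 17.2

jam_ratio = raw_gb / jam_only_gb          # reported as 3.8x
combined_ratio = raw_gb / combined_gb     # reported as 7.85x
zstd_on_top = combined_ratio / jam_ratio  # what zstd adds to Jam's output
print(round(jam_ratio, 2), round(combined_ratio, 2), round(zstd_on_top, 2))
# → 3.86 7.85 2.03
```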

04 — WHERE IT FITS

Workloads where bytes are the bill.

Virtual machine snapshots

The live-test workload. Snapshots dedupe heavily across versions; Jam captures the wins zstd alone leaves on the floor.

AI training datasets

Compress at rest, accelerate loading. Smaller objects mean fewer reads from object store and faster epoch times.

Genomics / FASTQ

Genome-sequencing partner trial: FASTQ files 12% smaller; alignment in 22 minutes vs 2.5 hours for the next-closest aligner.

IoT / M2M telemetry

Long-tail device archives with high inter-record redundancy. Live in production with M2M TechConnect on a per-TB / month basis — approximately 50% below commercial cloud list rates.

Defence simulations + sensor data

Sovereign workloads, customer-controlled storage, air-gapped deploys. Vetted within the European/NATO defence ecosystem.

Enterprise backup + cold archive

Replace the cold-tier egress tax with a software layer that keeps the data on-prem, indexed, and restorable on demand.

05 — HOW IT INTEGRATES

Three paths. Customer keeps the hardware.

PATH A · S3 GATEWAY

S3-compatible endpoint.

MinIO or Ceph-compatible gateway. Per-customer isolated buckets, quotas, credentials. Existing S3 clients hit a Jam endpoint and write compressed without code changes.

PATH B · DROP-IN SDK

Linux daemon + CLI.

Existing storage stays. Existing apps stay. Drop the Jam daemon next to your workload, point it at a directory, get compressed and indexed output back.

PATH C · APPLIANCE · ROADMAP

Jam Appliance.

Customer-premise hardware with Jam preloaded. For air-gapped, sovereign, and defence workloads where software-only isn't enough.

06 — SECURITY & COMPLIANCE

Architected so the engine never sees your data.

Jam runs on infrastructure telemetry — bytes in, bytes out, throughput, restore-event signal. Application payload stays in customer-controlled storage, encrypted in transit and at rest. The compliance posture is a property of the architecture, not a wrapper around it.

Telemetry-only operation

Encoder and decoder operate over the bytes the customer points at; they never reach into application logic and never index customer identifiers. Ratio, throughput, and restore events are the only signal Cithorum receives.

Encrypted everywhere

TLS in transit. AES-class encryption at rest on customer storage. Hash-verified bytes on every restore. Air-gapped deploy option for sovereign and defence workloads.
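A minimal sketch of hash-verified restore, the property claimed above. SHA-256 via Python's hashlib stands in here as an assumption; the source does not specify which digest Jam actually uses.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# At encode time: record a digest alongside each object's envelope.
original = b"vm-snapshot-bytes" * 1000
recorded = digest(original)

# At restore time: recompute and compare before handing bytes back.
def verified_restore(restored: bytes, expected: str) -> bytes:
    if digest(restored) != expected:
        raise ValueError("restore verification failed: digest mismatch")
    return restored

assert verified_restore(original, recorded) == original
```

Corrupted or truncated bytes fail the check rather than silently reaching the application.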

RBAC + audit log

Per-tenant access control, per-bucket isolation, full audit log of every operation. Exportable for regulator and procurement review.

SOC 2 operationally compliant · regulated-environment ready

SOC 2 + ISO 27001 operationally compliant via continuous control monitoring; formal Type I attestation in flight (Q4 2026), Type II Q3 2027. HIPAA-aligned for clinical data; PCI-DSS-scoped for payments-adjacent estates; CERT-In and DPDP Act 2023 ready for India deployments. See the full compliance surface →

07 — PRICING

Three tiers. One unit: dollars per terabyte.

All tiers include a benchmark report on the customer's own data, 100% restore verification, and customer-controlled storage. Numbers below are reference points, not list price.

STARTER

~50% below commercial cloud (e.g. AWS S3)

The M2M tier. For unattended workloads, OEM, and embedded deployments. Single-tenant daemon. Per-TB / month, priced roughly half of hyperscaler list. Self-serve onboarding once a benchmark is signed off.

Run a benchmark →

DEPLOY

From ~$3–4k / month per customer

Mid-tier with managed deployment, support, and dashboards. S3 gateway included. Heartbeat telemetry and restore-verification reports as standard.

Talk to us →

ENTERPRISE / GOV

Custom

Volume, sovereign data, defence ecosystem, India tender route. Air-gapped deploys, appliance option, named technical contact, scoped SLA.

Contact us →

We charge for compressed bytes, not raw — the customer's bill goes down as Jam's ratio goes up.
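What compressed-bytes billing means in practice, using the live-test workload and a hypothetical $10 per TB-month rate (illustrative only; actual pricing is per the tiers above):

```python
# Billing on compressed bytes vs raw bytes, live-test figures.
raw_tb = 0.135             # 135 GB of VM snapshots
ratio = 7.85               # combined JAM + ZSTD ratio
rate_per_tb_month = 10.0   # hypothetical rate, for illustration only

billed_tb = raw_tb / ratio
print(round(billed_tb * rate_per_tb_month, 3),
      "vs", round(raw_tb * rate_per_tb_month, 3))  # monthly $, compressed vs raw
```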

Run a pilot. Two-to-four weeks to a benchmark report on your own data.