Scaling ClickHouse Ingestion for Millions of Devices: Best Practices and Pitfalls

2026-02-19
11 min read

Operational guide for ingesting high-cardinality telemetry into ClickHouse: batching, backpressure, schema evolution, and resource planning for millions of devices.

Why your ClickHouse rollout stalls when device cardinality explodes

If you manage IoT fleets or industrial telemetry, you already know the problem: one million devices is a small proof-of-concept, ten million is a production nightmare. Ingesting high-cardinality telemetry into ClickHouse can break assumptions—small inserts turn into thousands of tiny parts, merges saturate I/O, schema churn causes long ALTER waits, and your dashboards lag behind real-world state. This operational guide cuts through theory and gives you battle-tested patterns for batching, backpressure, schema evolution, and resource planning when scaling ClickHouse for millions of devices in 2026.

Executive summary — the essentials up front

  • Batch writes into ClickHouse to reduce tiny parts and merge overhead — aim for compressed payloads of 1–10 MB or 10k–200k rows depending on row size.
  • Implement backpressure using Kafka/queue depth, ClickHouse Buffer/Kafka engines, and 429-based client throttling so ingestion spikes are smoothed.
  • Design for schema evolution with an ingest-normalized model (time, device_id, metric, value) and sparse columns or Map/JSON fields for new metrics.
  • Plan resources around merge costs, IOPS, replication factor, and SSD endurance—use tiered storage for long-term retention.
  • Monitor the right signals: system.merges, parts count, mutation queue, insert latency, and Kafka consumer lag.

Context: Why 2026 is different (and why ClickHouse)

In late 2025 and early 2026 the ClickHouse ecosystem matured faster: significant funding and cloud expansion accelerated managed offerings and feature work (query performance improvements, projections, and tighter cloud storage integrations). At the same time, hardware dynamics—new PLC NAND and shifting SSD pricing—changed cost calculations for hot storage. Those shifts make it possible and practical to operate ClickHouse at very large scale, but the operational challenges of high-cardinality device telemetry remain primarily architectural and procedural, not just hardware.

Section 1 — Data modeling: wide vs narrow (and why narrow wins at scale)

High-cardinality telemetry often tempts you to create a wide table with thousands of metric columns. That pattern becomes brittle: ALTERs are frequent and storage for many mostly-null columns is wasteful. Instead, prefer a narrow, event-centric schema and use schema patterns that tolerate evolution without heavy ALTER operations.

CREATE TABLE telemetry_events
(
  event_time DateTime64(3) CODEC(ZSTD(3)),
  device_id UInt64,
  metric String,
  value Float64,
  tags Map(String,String),
  event_version UInt8
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (device_id, event_time)
SETTINGS index_granularity = 8192;

Why this helps: storing metrics as rows keeps the schema stable as you introduce new metric names. It also reduces the need for ALTER TABLE ADD COLUMN and works well with materialized views for rollups or per-metric tables.

When to use wide tables

Use a wide table only when a bounded, stable set of metrics exists and low-latency retrieval of many metrics in a single query is required. Otherwise, the narrow model + materialized views/downsampled projections is more operationally resilient.

Section 2 — Batching strategies that prevent parts explosion

Tiny inserts are ClickHouse's biggest operational enemy. Every insert creates a part; too many small parts increase merge churn and I/O. Batch aggressively at the producer side or use buffering in front of ClickHouse.

Batch size guidance (practical rules of thumb)

  • Aim for compressed batches of 1–10 MB per insert for JSON/CSV/Native formats. For large numeric rows, smaller batches by rows are fine.
  • Row-count range: 10k–200k rows depending on row width. If each row is 200 bytes, 50k rows ≈ 10 MB raw (compressed smaller).
  • Max latency budget: batch only up to your acceptable end-to-end ingest latency (e.g., 1–5s for near-real-time dashboards, 30–60s for analytics-only systems).

Example: If a device reports once/sec and you have 2M devices, aggregate per ingestion shard by batching with a small buffering window (1–2s) to keep latency low while delivering chunky writes.

Implementation options

  • Producer-side batching: group messages into a buffer by time or size, then POST to ClickHouse HTTP insert using CSV/JSONEachRow or Native format.
  • Kafka + ClickHouse Kafka engine: let Kafka absorb bursts and configure multiple consumers to write larger batches to MergeTree (see the sketch after this list).
  • ClickHouse Buffer table: a lightweight, built-in buffer that accumulates and periodically flushes into MergeTree (good safety net; configure flush size/time carefully).
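
If you take the Kafka engine route, the wiring looks roughly like the sketch below. The broker address, topic, and consumer group names are placeholders; kafka_num_consumers should generally match the topic's partition count.

CREATE TABLE telemetry_kafka
(
  event_time DateTime64(3),
  device_id UInt64,
  metric String,
  value Float64
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',        -- placeholder broker
         kafka_topic_list = 'telemetry',          -- placeholder topic
         kafka_group_name = 'clickhouse-telemetry',
         kafka_format = 'JSONEachRow',
         kafka_num_consumers = 4;

-- The materialized view is what actually moves consumed rows into MergeTree;
-- columns missing from the SELECT (tags, event_version) fall back to their defaults.
CREATE MATERIALIZED VIEW telemetry_kafka_mv TO telemetry_events AS
SELECT event_time, device_id, metric, value FROM telemetry_kafka;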

HTTP insert sample (Python)

import requests

# Build one chunky CSV payload from the current batch of upstream messages
# (messages is assumed to come from your queue consumer or edge buffer).
rows = []
for msg in messages:
    rows.append(f"{msg['ts']},{msg['device_id']},{msg['metric']},{msg['value']}")

payload = "\n".join(rows)
resp = requests.post(
    'http://clickhouse:8123/',
    params={'query': 'INSERT INTO telemetry_events (event_time, device_id, metric, value) FORMAT CSV'},
    data=payload,
)
resp.raise_for_status()

Section 3 — Backpressure: stop bursts from overwhelming ClickHouse

When millions of devices spike (firmware rollouts, events, storms), you need a predictable throttling mechanism. Backpressure has three layers: ingress queuing, application-level throttling, and ClickHouse-side buffering.

Design patterns for backpressure

  1. Front-door queue (Kafka/RabbitMQ): absorb spikes and provide durable buffering with consumer scaling driven by backlog.
  2. Adaptive batching: consumers size batches from queue depth; when lag is high, switch to larger, less frequent writes, and when lag is low, shrink batches to keep latency down.
  3. HTTP 429 + Retry-After: if ClickHouse or the ingestion gateway is overloaded, return 429 to producers and have them retry with exponential backoff and jitter (patterns 2 and 3 are sketched after this list).
  4. ClickHouse Buffer/Kafka engine: Buffer provides smoothing; Kafka engine lets ClickHouse act as a consumer which you can autoscale separately.
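
A minimal consumer-side sketch of patterns 2 and 3, assuming a hypothetical ingestion gateway (or the ClickHouse HTTP endpoint) at clickhouse:8123; the lag thresholds, batch sizes, and backoff constants are illustrative, not tuned values.

import random
import time

import requests

CLICKHOUSE_URL = 'http://clickhouse:8123/'  # placeholder endpoint
INSERT_QUERY = 'INSERT INTO telemetry_events (event_time, device_id, metric, value) FORMAT JSONEachRow'

def pick_batch_size(consumer_lag):
    # Adaptive batching: larger, less frequent writes when the queue is backed up.
    if consumer_lag > 1_000_000:
        return 200_000
    if consumer_lag > 100_000:
        return 50_000
    return 10_000

def insert_with_backoff(payload, max_retries=5):
    # 429 + Retry-After handling with exponential backoff and jitter.
    for attempt in range(max_retries):
        resp = requests.post(CLICKHOUSE_URL, params={'query': INSERT_QUERY}, data=payload)
        if resp.status_code == 200:
            return
        if resp.status_code == 429:
            delay = float(resp.headers.get('Retry-After', 2 ** attempt))
            time.sleep(delay + random.uniform(0, 1))  # jitter avoids synchronized retries
            continue
        resp.raise_for_status()
    raise RuntimeError('insert failed after retries')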

Buffer table example

CREATE TABLE telemetry_buffer
(
  event_time DateTime64(3),
  device_id UInt64,
  metric String,
  value Float64
) ENGINE = Buffer(default, telemetry_events, 16, 10, 60, 100000, 100000, 100000, 100000);

-- Producers INSERT INTO telemetry_buffer; ClickHouse flushes the buffer
-- into telemetry_events automatically, so no separate materialized view is needed.

Tune the Buffer parameters: the number of layers and the min/max row, byte, and time thresholds. The goal is to avoid tiny periodic flushes under high cardinality.

Section 4 — Schema evolution: avoid costly ALTERs

Schema churn is a practical challenge for device fleets that add telemetry fields over time. ClickHouse supports online ALTERs, but frequent schema changes—especially to the primary key or index—hit performance and increase downtime windows for large tables.

Best practices for evolving telemetry schema

  • Prefer a stable, narrow schema where new metrics are rows or inside a Map(String,Float64) or JSON column. This minimizes ALTERs.
  • Use a schema registry for producer serialization (Protobuf/Avro) to control versions and avoid consumers sending unknown fields without a plan.
  • When a column must be added, make it Nullable with a DEFAULT so the ALTER stays a fast, metadata-only operation: ALTER TABLE telemetry_events ADD COLUMN x Nullable(Float64) DEFAULT NULL;
  • For large historical backfills, write to a staging table and INSERT SELECT into the target table during low-load windows instead of ALTER + UPDATE (both patterns are sketched after this list).
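
Both patterns look roughly like this sketch; the firmware_rev column and the staging table name are illustrative.

-- Metadata-only column addition: fast even on very large tables
ALTER TABLE telemetry_events ADD COLUMN firmware_rev Nullable(String) DEFAULT NULL;

-- Backfill via a staging table instead of ALTER ... UPDATE
CREATE TABLE telemetry_backfill AS telemetry_events;
-- load the corrected or extended history into telemetry_backfill, then:
INSERT INTO telemetry_events SELECT * FROM telemetry_backfill;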

When to rekey or redesign

If queries become dominated by different access patterns (from device-centric to metric-centric), create materialized views or secondary tables optimized for that access rather than altering the base table's ORDER BY. ORDER BY changes require table rewriting; avoid that on production-sized tables.

Section 5 — Resource planning and cost estimation

Proper resource planning accounts for writes, merges, replication, and retention. Below is a pragmatic approach to sizing and a worked example.

Step 1: Calculate raw ingest bandwidth

Example scenario: 5M devices sending 1 row/sec, row payload ~200 bytes (JSON) → 5M rows/sec, and 5,000,000 × 200 B = 1,000,000,000 B/s ≈ 1 GB/s raw. Compression ratios for telemetry often range 2x–10x depending on schema and codecs; with ZSTD you might see 3–5x for numeric telemetry. So expect roughly 200–500 MB/s compressed into ClickHouse.

Step 2: Account for replication and overhead

  • Replication factor multiplies raw storage (RF=3 → 3×).
  • Part overhead: many small parts increase metadata and merge I/O; aim to keep average part sizes in the 50–200 MB range to minimize merge pressure.
  • Retention: split hot and cold tiers; use tiered storage or ClickHouse cloud storage integrations to move data to object storage after X days (a TTL sketch follows this list).
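
A tiered variant of the base table might look like the sketch below, assuming a storage policy named 'tiered' with a 'cold' volume backed by object storage is already configured on the cluster; the 30- and 180-day thresholds are illustrative.

CREATE TABLE telemetry_events_tiered
(
  event_time DateTime64(3) CODEC(ZSTD(3)),
  device_id UInt64,
  metric String,
  value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (device_id, event_time)
TTL toDateTime(event_time) + INTERVAL 30 DAY TO VOLUME 'cold',   -- move to object storage
    toDateTime(event_time) + INTERVAL 180 DAY DELETE             -- drop after retention
SETTINGS storage_policy = 'tiered';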

Step 3: IOPS and CPU for merges

Merges are CPU and I/O heavy. Plan for sustained merge bandwidth of roughly 50–200% of write bandwidth depending on insert pattern. If writes are steady and batched, merges are manageable; if you have bursts, merges can spike and require temporary headroom.

Worked sizing example

  • Devices: 5,000,000
  • Messages per device: 1/sec → 5M rows/sec
  • Row compressed size: 0.04 KB (40 bytes) after tight schema & compression → 200 MB/s compressed
  • Replication: RF=3 → 600 MB/s write bandwidth across cluster
  • Nodes: design for roughly 100 MB/s of sustained writes per node → ~6 nodes cover the replicated write bandwidth alone; add headroom for merges, queries, and failure tolerance, so plan a cluster of roughly 12 nodes.

Note: numbers above are illustrative. Always profile your own message shape and compression.
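
A back-of-the-envelope helper like the sketch below makes it easy to rerun the arithmetic with your own measured row sizes and compression; the inputs shown are the illustrative numbers from the example above.

def estimate_cluster(devices, rows_per_device_per_sec, compressed_row_bytes,
                     replication_factor, per_node_write_mb_s):
    # Compressed ingest bandwidth before replication, in MB/s.
    ingest_mb_s = devices * rows_per_device_per_sec * compressed_row_bytes / 1e6
    # Total write bandwidth the cluster must absorb, including replicas.
    total_mb_s = ingest_mb_s * replication_factor
    # Nodes needed for write bandwidth alone; real clusters add headroom
    # for merges, queries, and failure tolerance.
    nodes = total_mb_s / per_node_write_mb_s
    return ingest_mb_s, total_mb_s, nodes

print(estimate_cluster(5_000_000, 1, 40, 3, 100))  # -> (200.0, 600.0, 6.0)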

Hardware and storage considerations (2026)

SSD endurance and price trends (PLC NAND becoming more viable) influence how much hot storage you can afford. For ingest-heavy workloads, prioritize high write endurance NVMe for hot partitions and use object storage for long-term cold data. Also, pick nodes with CPU headroom (many merges are CPU-bound) and 10GbE or higher networking between replicas.

Section 6 — Monitoring: metrics that matter

Keep these metrics instrumented and alerting:

  • system.parts: sudden spikes in part count → too many small inserts.
  • system.merges: the number of running and queued merges; long queues indicate an I/O or CPU bottleneck (example queries follow this list).
  • system.mutations: long-running schema or TTL mutations indicate deferred work.
  • Kafka consumer lag: sustained lag → need more consumers or faster writes.
  • Insert latency and 95/99th percentile: informs batching vs latency trade-offs.
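
The example queries referenced above, as a sketch (the table filter is illustrative):

-- Active part count per table; a rising trend usually means inserts are too small
SELECT database, table, count() AS active_parts
FROM system.parts
WHERE active AND table = 'telemetry_events'
GROUP BY database, table;

-- Currently running merges and how far along each one is
SELECT table, elapsed, progress, num_parts
FROM system.merges
ORDER BY elapsed DESC;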

Section 7 — Advanced strategies and patterns

1. Using ReplacingMergeTree for deduplication

If duplicates arrive (device retransmits), use ReplacingMergeTree with a version column or event timestamp to dedupe during merges. This shifts dedupe cost into merge time rather than insert time.
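
A minimal sketch; the ingest_seq version column is an assumption (any monotonically increasing value per event works), and queries that must see fully deduplicated data can use FINAL at extra read cost.

CREATE TABLE telemetry_events_dedup
(
  event_time DateTime64(3),
  device_id UInt64,
  metric String,
  value Float64,
  ingest_seq UInt64
) ENGINE = ReplacingMergeTree(ingest_seq)
PARTITION BY toYYYYMM(event_time)
ORDER BY (device_id, metric, event_time);

-- Deduplication happens at merge time; FINAL forces it at query time
SELECT * FROM telemetry_events_dedup FINAL WHERE device_id = 42;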

2. Pre-aggregation with AggregatingMergeTree

For dashboards and long-term analytics, use materialized views to pre-aggregate per-minute or per-hour metrics into AggregatingMergeTree tables. This reduces query load on raw tables and helps cold storage snapshots.
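
A per-minute rollup might look like the sketch below; the avg/max aggregate set and the 'temperature' metric in the query are illustrative.

CREATE TABLE telemetry_1m
(
  minute DateTime,
  device_id UInt64,
  metric String,
  avg_value AggregateFunction(avg, Float64),
  max_value AggregateFunction(max, Float64)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (device_id, metric, minute);

CREATE MATERIALIZED VIEW telemetry_1m_mv TO telemetry_1m AS
SELECT
  toStartOfMinute(event_time) AS minute,
  device_id,
  metric,
  avgState(value) AS avg_value,
  maxState(value) AS max_value
FROM telemetry_events
GROUP BY minute, device_id, metric;

-- Read back with the -Merge combinators
SELECT minute, avgMerge(avg_value) AS avg_value, maxMerge(max_value) AS max_value
FROM telemetry_1m
WHERE metric = 'temperature'
GROUP BY minute
ORDER BY minute;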

3. Partitioning strategy

Partition by time grain (monthly or daily) and avoid partitioning by device_id if device cardinality is in the millions. Partitions must remain moderate in count; too many partitions increases metadata overhead.

4. Projections and data skipping

Use data skipping indices and projections for common query patterns (e.g., device-centric time ranges). Projections can dramatically speed reads and reduce load on underlying storage if properly maintained.
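
For example, a metric-centric projection plus a skipping index on the base table might look like this sketch (names and granularity are illustrative); materialize the projection during a low-load window since it runs as a mutation over existing parts.

-- Projection ordered for metric-centric queries
ALTER TABLE telemetry_events
  ADD PROJECTION by_metric
  (
    SELECT event_time, device_id, metric, value
    ORDER BY metric, event_time
  );
ALTER TABLE telemetry_events MATERIALIZE PROJECTION by_metric;

-- Skipping index to prune granules when filtering on metric
ALTER TABLE telemetry_events ADD INDEX metric_idx metric TYPE bloom_filter GRANULARITY 4;
ALTER TABLE telemetry_events MATERIALIZE INDEX metric_idx;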

Section 8 — Pitfalls and how to avoid them

  • Pitfall: sending single-row inserts from millions of devices. Fix: implement edge buffering and batching, or funnel through Kafka.
  • Pitfall: wide schema with thousands of ALTERs. Fix: use narrow schema + Map/JSON columns and materialized views.
  • Pitfall: underestimating merge cost leading to saturated I/O. Fix: size for merge overhead and monitor system.merges.
  • Pitfall: choosing LowCardinality for massively unique device_id. Fix: use numeric device IDs or dictionary mapping if cardinality is extremely high; LowCardinality helps when the set is relatively small.
  • Pitfall: storing everything as String. Fix: use typed columns and optimal codecs (ZSTD for mixed numerical payloads).

Case study: scaling to 10M devices (operational playbook)

Scenario: 10M devices, 1 msg/min average, bursts up to 100 msg/min during events. Steps taken:

  1. Normalize event schema to narrow model with device_id as UInt64 and metric as String.
  2. Funnel device traffic into regional Kafka clusters; partition by device hash to spread load.
  3. Consumer pool sized dynamically by Kafka lag; consumers perform adaptive batching (1–5s windows or 10–100k rows).
  4. ClickHouse cluster: 12 shards × 2 replicas, NVMe hot tier, object storage for cold; Buffer tables used for extra smoothing.
  5. Materialized views roll up into per-minute AggregatingMergeTree for dashboards; raw events retained for 30 days then archived.

Outcome: steady-state ingest of ~150 MB/s compressed, with peak bursts absorbed by Kafka backlog and Buffer tables. Merge queues remained stable after optimizing partition size and index granularity.

Actionable checklist (operational runbook)

  • Implement producer batching with size/time windows; measure batch sizes and adjust to keep average part >50 MB.
  • Deploy Kafka (or durable queue) as ingress buffer; monitor consumer lag and autoscale consumers.
  • Use narrow schema; store sparsely-populated optional metrics in Map/String or JSON fields.
  • Use Buffer or Kafka engines in ClickHouse to smooth writes; configure flush thresholds deliberately.
  • Plan hardware for merge I/O and replication; use tiered storage for retention beyond hot window.
  • Track system.parts, system.merges, mutation queues, and insert latency—alert on anomalies.

Final notes and 2026-forward predictions

ClickHouse adoption continues to rise in 2026 as managed services and feature maturity reduce operational friction, but the core problems for telemetry ingestion—high cardinality, bursty traffic, and schema churn—remain. Expect more first-party integrations for tiered cloud storage and better automatic compaction heuristics throughout 2026, and increasing use of edge-to-cloud gateways that do pre-aggregation and adaptive batching. Infrastructure teams that combine robust queuing, careful batching, and a narrow ingest model will stay ahead of the curve.

"Invest in buffering and smart batching today—it's the simplest lever with the biggest operational payoff when device cardinality explodes."

Call to action

Ready to scale ClickHouse for millions of devices? Start with a short audit: measure your current average batch size, part count, and merge queue. If you want a tailored operational plan, contact our team for a capacity-sizing session and a 30-day ingest tuning playbook that includes producer batching, Buffer/Kafka config, and schema recommendations.
