Sizing and Scaling

How big to start, what to watch, when to scale. Sizing depends on how your application uses Flagsmith: pick a pattern, read your tier.

Quick start

  1. Pick a pattern: A (logged-in users), B (server-side local cache), or C (anonymous flag check).
  2. Estimate peak Flagsmith RPS using the worked examples.
  3. Read your tier from the tier reference.
  4. If you'll run any Pattern B (server-side local cache) traffic, set CACHE_ENVIRONMENT_DOCUMENT_SECONDS=60. Off by default; biggest single sizing lever.
  5. Watch the metrics and follow the decision tree when a threshold trips.

How to scale

API: add workers, keep each at 1 vCPU / 2 GB (2 vCPU / 4 GB at Large+). Database: bump CPU / memory / IOPS, add a read replica at Large.

Pick your workload pattern

Most deployments are one of A, B, or C. Mixed traffic: see Example 5.

A: App with logged-in users

App sends a user ID (plus traits like country, plan, role) and Flagsmith returns that user's personalised flags. Works the same whether the client is mobile, web, desktop, or a server acting for a user.

You're here if:

  • Web app with sign-in (React, Vue, Angular, server-rendered)
  • iOS, Android, React Native, Flutter app with user accounts
  • Backend evaluating flags for a known end-user in remote-evaluation mode
  • Targeting by user attribute, plan, region, cohort, or A/B bucket

Cost shape:

  • Each call: moderate work. Looks up user, evaluates segments, returns the flag set.
  • Response: usually a few KB. Many segments / traits can push it past 50 KB.
  • Volume scales with sessions per day. Baseline ≈ 2 calls per session (open + auth), plus any refetches your client triggers.

B: Server-side service with local cache

Backend polls Flagsmith every 60 seconds for the full environment snapshot, then evaluates flags locally. No round-trip per flag check.

You're here if:

  • Node.js, Python, Java, Go, .NET, Ruby, Elixir, or Rust backend using the SDK in local-evaluation mode
  • Batch jobs evaluating flags at high throughput
  • Microservices needing sub-millisecond flag checks

Cost shape:

  • Each poll is the heaviest thing Flagsmith does. It returns the entire environment and runs many database joins to build it.
  • Polling rate drives load, not user volume. 30 pods each polling every 60 s = 0.5 RPS to Flagsmith, regardless of how many user requests the backend handles.
  • Hardest on the database by default. Enabling the environment-document cache moves most of that cost off the database.

SDK polling defaults

| SDK | Default |
| --- | --- |
| Python, Node.js, Java, Ruby, .NET, Elixir, Rust | 60 s |
| Go | On-demand (no background poll unless opted in) |
| PHP | No local-evaluation polling |

C: Anonymous flag check

Flag check without a user identity: public pages, marketing experiments, default-vs-variant rollouts.

You're here if:

  • Marketing site with simple A/B tests
  • Public content varying by flag, not by user
  • SDK requests without identity context

Cost shape:

  • Each call: a flag-list lookup. Cheapest of the three.
  • Response: small (1–5 KB).
  • Volume scales with page views.

Worked examples

Example 1: small SaaS web app (Pattern A)

"100,000 monthly active users on our web product. Most users open the app once a day on average. About 5% of usage falls in our peak hour."

| Step | Calculation | Value |
| --- | --- | --- |
| Daily sessions | 100,000 MAU × 1 session/day | 100,000 |
| Peak-hour sessions | 100,000 × 5% | 5,000 |
| Peak Flagsmith RPS | 5,000 sessions × 2 calls/session ÷ 3,600 s | ≈ 2.8 |
| Tier | Pattern A: below 10 RPS | Small |
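
The arithmetic in this example can be sketched as a small helper. This is an illustration only, not part of Flagsmith: the function name `peak_rps` and its parameters are hypothetical, and it assumes traffic is spread evenly across the peak hour.

```python
def peak_rps(daily_sessions: float, peak_hour_share: float, calls_per_session: float) -> float:
    """Estimate peak requests per second to Flagsmith.

    Assumes calls are spread evenly across the peak hour (3,600 s).
    """
    peak_hour_sessions = daily_sessions * peak_hour_share
    return peak_hour_sessions * calls_per_session / 3600


# Example 1: 100,000 MAU x 1 session/day, 5% of usage in the peak hour, 2 calls/session.
example_1 = peak_rps(daily_sessions=100_000 * 1, peak_hour_share=0.05, calls_per_session=2)
print(f"{example_1:.1f} RPS")  # 2.8 RPS
```

The same helper covers Patterns A and C; only the inputs change (sessions or page views, and calls per session).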

Example 2: backend service polling Flagsmith (Pattern B)

"30 backend pods running the Node.js SDK in local-evaluation mode with the default 60-second polling interval, all sharing one Flagsmith environment."

| Step | Calculation | Value |
| --- | --- | --- |
| Polls per second to Flagsmith | 30 pods ÷ 60 s | 0.5 RPS |
| Tier | Below Pattern B's 1 RPS band | Small |

How the numbers move:

| If you change… | New RPS | New tier |
| --- | --- | --- |
| Pods scale up to 300 (same one environment) | 5 RPS | Medium |
| Poll interval dropped to 10 s (default is 60 s) | 3 RPS | Medium |
| Both, 300 pods polling every 10 s | 30 RPS | Large |

Watch poll rate, not pod count

A 10× faster poll has the same effect as 10× more pods. With server-side environment-document caching on, both controls matter much less: the database only sees one fetch per cache TTL regardless of how many pods are asking.
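
The polling arithmetic above can be reproduced in a few lines. This is a rough sketch: `polling_rps` is a hypothetical helper name, and it assumes every pod polls one shared environment at the same fixed interval.

```python
def polling_rps(pods: int, poll_interval_seconds: float) -> float:
    """Polls per second reaching Flagsmith from local-evaluation SDKs."""
    return pods / poll_interval_seconds


# The baseline from Example 2, then the three variations from the table.
for pods, interval in [(30, 60), (300, 60), (30, 10), (300, 10)]:
    print(f"{pods} pods every {interval} s -> {polling_rps(pods, interval):g} RPS")
# 30 pods every 60 s -> 0.5 RPS
# 300 pods every 60 s -> 5 RPS
# 30 pods every 10 s -> 3 RPS
# 300 pods every 10 s -> 30 RPS
```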

Example 3: large consumer app at scale (Pattern A)

"5 million MAU on our consumer app (web + mobile combined), 2 sessions per user per day on average, 5% peak-hour concentration. Our SDKs refresh flags after login and on user actions, so ≈ 4 Flagsmith calls per session."

| Step | Calculation | Value |
| --- | --- | --- |
| Daily sessions | 5,000,000 × 2 | 10 million |
| Peak-hour sessions | 10 M × 5% | 500,000 |
| Peak Flagsmith RPS | 500,000 × 4 ÷ 3,600 | ≈ 556 |
| Tier | Pattern A: above 200 RPS | Extra-Large |

Example 4: marketing landing page (Pattern C)

"Our marketing site gets 50,000 visits per day. Each visit does one anonymous flag check on the landing page."

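| Step | Calculation | Value |
| --- | --- | --- |
| Daily flag checks | 50,000 visits × 1 check/visit | 50,000 |
| Peak-hour calls | 50,000 × 5% (assumed peak-hour share) | 2,500 |
| Peak Flagsmith RPS | 2,500 ÷ 3,600 | ≈ 0.7 |
| Tier | Pattern C: well below 50 RPS | Small |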

Example 5: mixed traffic (Patterns A + B)

"We have a SaaS web app with 500,000 MAU (logged-in users) AND a back-end service running 10 pods in local-evaluation mode at the default 60-second poll. The web app makes ~3 calls per session, peak hour is 5% of daily."

Step 1: estimate each pattern separately:

| Pattern | Calculation | Peak RPS |
| --- | --- | --- |
| A (web sessions) | 500,000 MAU × 1 session/day × 5% peak × 3 calls/session ÷ 3,600 s | ≈ 21 RPS |
| B (polling) | 10 pods ÷ 60 s | ≈ 0.17 RPS |

Step 2: pick the tier on each axis:

| Axis | Numbers | Tier from that axis |
| --- | --- | --- |
| API tier (driven by total RPS) | 21 + 0.17 ≈ 21 RPS | Medium (A 10–50 RPS band) |
| Database tier (driven by combined load) | A is light per call; B is heavy per call | Medium |

Rule of thumb for mixed traffic

Estimate each pattern separately, add the RPS values, then pick the higher of the two tiers: whichever axis is more demanding sets your starting size. Almost always, the API tier is driven by total RPS and the database tier is driven by the heaviest pattern (B if you run any).
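
As a rough sketch of this rule of thumb, the snippet below recomputes the Example 5 numbers and takes the more demanding of the two axes. The names `TIER_ORDER` and `starting_tier` are hypothetical, and the per-axis tiers are assumed to have been read by hand from the tier reference.

```python
# Tiers ordered from least to most demanding.
TIER_ORDER = ["Small", "Medium", "Large", "Extra-Large"]


def starting_tier(api_tier: str, database_tier: str) -> str:
    """Return whichever of the two per-axis tiers is higher."""
    return max(api_tier, database_tier, key=TIER_ORDER.index)


# Example 5: Pattern A web sessions plus Pattern B polling.
rps_a = 500_000 * 1 * 0.05 * 3 / 3600  # MAU x sessions/day x peak share x calls/session
rps_b = 10 / 60                        # 10 pods polling every 60 s
print(f"A {rps_a:.1f} + B {rps_b:.2f} = {rps_a + rps_b:.1f} RPS")  # A 20.8 + B 0.17 = 21.0 RPS

# Both axes were read as Medium in Example 5, so the starting size is Medium.
print(starting_tier("Medium", "Medium"))  # Medium
```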

Tier reference

Choose the equivalent compute instance type in your cloud (AWS Aurora, Azure Database for PostgreSQL Flexible Server, Google Cloud SQL, or any self-managed PostgreSQL). The numbers below are minimum non-burstable specs; oversize if in doubt.

What's running in a Flagsmith deployment

| Component | What it does |
| --- | --- |
| API | Stateless Python web service. Serves SDK and dashboard requests. Each worker is a pod (Kubernetes) or task (ECS). Scaled horizontally with an autoscaler. |
| Database | PostgreSQL. Stores flags, segments, environments, identities, and audit data. Scaled vertically. Add a read replica at Large. |
| Task processor | Separate worker that runs background jobs (webhook delivery, audit log writes, scheduled tasks). Same image as the API, run with a different command. Sized similarly at every tier. |
| SSE (optional) | Server-Sent Events service, pushes real-time flag updates to connected SDKs. Only deployed if you use Flagsmith's real-time updates feature. |

Small

Workload bands: A ≤ 10 RPS · B ≤ 1 RPS · C ≤ 50 RPS

Entry-level production. A typical first-year self-hosted deployment.

| Component | Recommendation |
| --- | --- |
| API | 2 workers at 1 vCPU / 2 GB · Autoscale min 2 / max 5 / target 60% CPU · Gunicorn defaults are fine |
| Database | 2 vCPU / 8 GB · 1,000 IOPS provisioned · Non-burstable instance class · 30 GB storage · HA optional |
| Task processor | 1 worker at 1 vCPU / 2 GB |
| Load balancer | Standard cloud LB |

Medium

Workload bands: A 10–50 RPS · B 1–10 RPS · C 50–300 RPS

Standard production. Most self-hosted deployments serving active user populations land here.

| Component | Recommendation |
| --- | --- |
| API | 4–6 workers at 1 vCPU / 2 GB, or 3 workers at 2 vCPU / 4 GB · Autoscale min 4 / max 15 / target 60% CPU · Raise gunicorn worker count for large payloads |
| Database | 4 vCPU / 16 GB · 3,000 IOPS provisioned · Non-burstable · 50 GB storage · HA recommended (multi-AZ writer) · Env-document cache mandatory for Pattern B |
| Task processor | 1–2 workers at 1 vCPU / 2 GB |
| Load balancer | Standard cloud LB · Dedicated SSE pod (1–2) if using real-time updates |

Large

Workload bands: A 50–200 RPS · B 10–50 RPS · C 300–1,500 RPS

Heavy production. Customer-facing applications at scale.

| Component | Recommendation |
| --- | --- |
| API | 10–15 workers at 1 vCPU / 2 GB, or 5–8 at 2 vCPU / 4 GB · Autoscale min 6 / max 25 / target 60% CPU · Tune gunicorn workers + timeout for Pattern A large payloads |
| Database | 8 vCPU / 32 GB, memory-optimised preferred · 6,000 IOPS provisioned · HA mandatory · Read replica required for Pattern B · Cache in PERSISTENT mode |
| Task processor | 2 workers at 1 vCPU / 2 GB |
| Load balancer | Standard cloud LB · Dedicated SSE pods (2+) if using real-time updates |

Extra-Large

Workload bands: A > 200 RPS · B > 50 RPS · C > 1,500 RPS

Very heavy production. If you expect to operate at this scale, please contact the Flagsmith team so we can validate the configuration against your specific workload.

| Component | Recommendation |
| --- | --- |
| API | 20–30+ workers at 2 vCPU / 4 GB · Autoscale min 10 / max 50 / target 60% CPU |
| Database | 16 vCPU / 64 GB+ · 10,000+ IOPS · Connection pool required (PgBouncer / RDS Proxy / Cloud SQL Auth Proxy) · 2+ read replicas; consider cross-region · HA mandatory |
| Task processor | 2–4 workers at 1 vCPU / 2 GB |
| Load balancer | Dedicated SSE pods (3+) · Consider CDN / Edge Proxy in front of API for read-heavy paths |
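
The workload bands listed for each tier can be turned into a simple lookup. The sketch below is illustrative only: `BAND_LIMITS`, `TIERS`, and `tier_for` are hypothetical names, and it assumes a value that sits exactly on a band boundary belongs to the lower tier (the bands above share their endpoints).

```python
import bisect

# Upper RPS bounds of the Small, Medium, and Large bands for each pattern.
BAND_LIMITS = {
    "A": [10, 50, 200],
    "B": [1, 10, 50],
    "C": [50, 300, 1500],
}
TIERS = ["Small", "Medium", "Large", "Extra-Large"]


def tier_for(pattern: str, peak_rps: float) -> str:
    """Map a pattern and its peak RPS to a tier name."""
    # bisect_left keeps a value equal to a boundary in the lower tier (assumption).
    return TIERS[bisect.bisect_left(BAND_LIMITS[pattern], peak_rps)]


print(tier_for("A", 2.8))  # Small
print(tier_for("B", 30))   # Large
print(tier_for("A", 556))  # Extra-Large
```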

Headroom rules

Apply on top of the tier you've chosen. These are the safety margins that absorb spikes.

  • API: provision ≥ 2× your hourly peak RPS. Per-minute spikes typically run 1–2× the hourly average peak, so 2× headroom covers them.
  • Database CPU: target ≤ 50% peak. Leaves room for autovacuum, ad-hoc admin queries, and unexpected bursts.
  • IOPS: provision ≥ 2× your peak read+write IOPS. IOPS ceilings throttle silently, so it is better to overshoot.
  • Autoscale max: 4× the starting worker count is enough for most cases. Use a wider range if you expect spikes.
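
The four rules above reduce to simple multipliers. A minimal sketch, assuming you already have measured peaks; `headroom_targets` and its dictionary keys are hypothetical names, and the 1,200 IOPS input is a made-up figure for illustration.

```python
def headroom_targets(hourly_peak_rps: float, peak_iops: float, starting_workers: int) -> dict:
    """Apply the headroom rules to measured peaks."""
    return {
        "api_capacity_rps": 2 * hourly_peak_rps,        # API: at least 2x hourly peak RPS
        "provisioned_iops": 2 * peak_iops,              # IOPS: at least 2x peak read+write
        "autoscale_max_workers": 4 * starting_workers,  # Autoscale max: 4x starting count
        "db_cpu_peak_target_pct": 50,                   # Database CPU: at most 50% at peak
    }


# Example 5's ~21 RPS peak, 4 starting workers, and a hypothetical 1,200 peak IOPS.
targets = headroom_targets(hourly_peak_rps=21, peak_iops=1_200, starting_workers=4)
print(targets["api_capacity_rps"], targets["provisioned_iops"], targets["autoscale_max_workers"])
# 42 2400 16
```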

Cache configuration

Flagsmith ships with several caches, all disabled by default. Enabling them is the cheapest single change you can make to reduce database load, often by an order of magnitude.

Day-1 setting for any production deployment

CACHE_ENVIRONMENT_DOCUMENT_SECONDS=60

With Pattern B traffic, this typically drops database load by ~10× without any other change.
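
To see why the TTL matters, the sketch below models an idealised upper bound on how often the heavy environment-document build can reach PostgreSQL. It is a simplification, not Flagsmith's implementation: `heavy_fetches_per_second` is a hypothetical name, and the model assumes a cache shared by all pods with at most one rebuild per environment per TTL. Real-world savings depend on the backend and workload; the ~10× figure above is the typical observed effect, not an output of this model.

```python
def heavy_fetches_per_second(poll_rps: float, environments: int, cache_ttl_seconds: float) -> float:
    """Upper bound on environment-document rebuilds hitting the database per second."""
    if cache_ttl_seconds <= 0:
        # Cache off (the default): every poll rebuilds the document.
        return poll_rps
    # Cache on: at most one rebuild per environment per TTL, never more than the polls.
    return min(poll_rps, environments / cache_ttl_seconds)


# 30 pods polling one environment every 60 s (0.5 RPS).
print(f"cache off: {heavy_fetches_per_second(0.5, 1, 0):.3f}/s")
print(f"cache on:  {heavy_fetches_per_second(0.5, 1, 60):.3f}/s")
```

When the SDK polls less often than the TTL, `min` returns the poll rate unchanged, so the cache saves nothing in this model.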

Cache reference

| Environment variable | Default | Recommended (Medium+) | What it does |
| --- | --- | --- | --- |
| CACHE_ENVIRONMENT_DOCUMENT_SECONDS | 0 (off) | 60 | Cache the heavy server-side SDK environment-document fetch. PostgreSQL hit at most once per TTL per environment. |
| CACHE_ENVIRONMENT_DOCUMENT_BACKEND | Database | LocMemCache at Small / Medium, Redis / Memcached at Large+ | Default keeps the cache in PostgreSQL: cheap hits, but still touches the DB. Switch to pod-local memory or an external cache for true off-DB caching. |
| CACHE_ENVIRONMENT_DOCUMENT_MODE | EXPIRING | PERSISTENT at Large+ | Persistent mode survives pod restarts; warm-up cost amortised across the deployment. |
| GET_IDENTITIES_ENDPOINT_CACHE_SECONDS | 0 (off) | 30–60 | Cache the personalised response from a GET identity request. POST identity (which updates traits) always bypasses the cache. |

Cache backend trade-offs

  • Database (default). Shared across pods. Cache hits still touch PostgreSQL. Fine through Medium.
  • LocMemCache. Pod-local. Zero DB round-trip, but each pod warms separately and memory cost scales with pod count. Best at Small / Medium with a small number of pods.
  • Redis / Memcached. Shared, fast, off-DB. Adds a service you operate. Right at Large+.

When to keep TTL short or skip the cache

  • Kill-switch flags. Flagsmith invalidates the cache on flag changes, but TTL is the worst-case wait. For incidents, use TTL ≀ 10 s.
  • Compliance / access-control flags. Stale flags could expose protected functionality. Consider a non-cached path.
  • Apps mutating traits mid-session. The GET-identity cache returns the same response per identifier until TTL expires. Use POST identity (always fresh) or skip the cache.
  • SDKs polling slowly (5+ min). Server cache rarely helps. The SDK won't ask within the TTL anyway.

Metrics to monitor

Set alerts on these. The thresholds work as starting points; tighten or relax them based on your error budget and customer SLOs.

| Layer | Metric | What to watch |
| --- | --- | --- |
| API | CPU utilisation | Sustained > 80% for more than 5% of any 30-day window means you're at capacity. |
| API | Memory utilisation | Peak > 70% typically indicates a payload-size or worker-count tuning issue, not a sizing issue. |
| API | p99 request latency | Sustained > 1 second (excluding SSE long-poll endpoints) suggests gunicorn worker contention or slow downstream. |
| Database | CPU utilisation | Peak > 70% means you should scale the database tier. First check whether enabling cache fixes it. |
| Database | Provisioned IOPS | Sustained > 80% of your provisioned IOPS = silent throttling. Bump the storage tier (not the CPU SKU). |
| Database | Active connections | > 70% of max_connections = add a connection pool (PgBouncer / RDS Proxy / Cloud SQL Auth Proxy). |
| Database | Freeable memory | < 5% of instance RAM at peak = memory-bound; bump the instance class. |
| Load balancer | 5xx response rate | > 0.1% of requests over a 1-hour window is worth investigating. Separate target-side from LB-side errors. |
| Load balancer | Request count by status class | Watch the 2xx / 4xx / 5xx ratio for sudden shifts that aren't backed by traffic changes. |
| Burstable DB credits (if used) | Credit balance min | If your instance class is burstable (AWS t-class, Azure B-series, GCP shared-core) and credits regularly hit 0, you're silently throttled. Move to a non-burstable class. |
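
The numeric thresholds in the table can be encoded as a simple check. This is a sketch only: the metric keys below are hypothetical names rather than identifiers from any monitoring system, and the "sustained" and "peak" windows are not modelled, so feed it values your monitoring has already aggregated.

```python
# (comparison, limit) per metric; keys are hypothetical names for the table rows.
THRESHOLDS = {
    "api_cpu_pct": (">", 80),
    "api_memory_pct": (">", 70),
    "api_p99_latency_s": (">", 1.0),
    "db_cpu_pct": (">", 70),
    "db_iops_pct_of_provisioned": (">", 80),
    "db_connections_pct_of_max": (">", 70),
    "db_freeable_memory_pct": ("<", 5),
    "lb_5xx_rate_pct": (">", 0.1),
}


def tripped(sample: dict) -> list:
    """Return the names of metrics in `sample` that cross their threshold."""
    crossed = []
    for name, (comparison, limit) in THRESHOLDS.items():
        if name not in sample:
            continue  # metric not collected; skip rather than guess
        value = sample[name]
        if (comparison == ">" and value > limit) or (comparison == "<" and value < limit):
            crossed.append(name)
    return crossed


print(tripped({"api_cpu_pct": 60, "db_cpu_pct": 75, "db_freeable_memory_pct": 3}))
# ['db_cpu_pct', 'db_freeable_memory_pct']
```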

Scaling decision tree

When a metric crosses its threshold, follow the action below before reaching for a bigger SKU.

| Symptom | First action | If that doesn't help |
| --- | --- | --- |
| API CPU sustained > 80% | Increase worker count by 50% (or bump HorizontalPodAutoscaler min) | Move to next API tier |
| API memory > 70% | Increase gunicorn worker count per pod, or bump pod memory if your response payloads are large | Trim segment / trait payloads. Large responses inflate worker memory. |
| Many 5xx at the load balancer with no corresponding target-side errors | Likely gunicorn worker exhaustion. Raise worker count + timeout per pod. | Investigate response payload size and segment / trait fan-out |
| p99 latency > 1 s | Check gunicorn worker timeout vs payload size; check database CPU + IOPS | Move to next tier on whichever layer is bottlenecked |
| Database CPU > 70% peak | Turn on CACHE_ENVIRONMENT_DOCUMENT_SECONDS=60 if it isn't already. Often drops load by 10×. | Move to next database tier |
| Database IOPS > 80% provisioned | Bump storage tier / provisioned IOPS, not the CPU SKU | Move to next database tier |
| Burstable database credit min = 0 | Move to a non-burstable instance with the same vCPU / RAM | n/a |
| Database connections > 70% max_connections | Add a connection pool (PgBouncer / RDS Proxy / Cloud SQL Auth Proxy) | Bump max_connections alongside RAM |
| SDK polling rate too high for current tier | Enable env-document cache, or raise SDK polling interval | Move to next database tier |

What not to do

  • Don't run Medium+ without env-document caching. CACHE_ENVIRONMENT_DOCUMENT_SECONDS defaults to 0. Turning it on drops database load ~10× for Pattern B traffic.
  • Don't use burstable database classes at Medium+. AWS t3 / t4g, Azure B-series, Google Cloud shared-core. They mask sizing problems until CPU credits hit zero, then throttle silently.
  • Don't size the database by HTTP RPS alone. A Pattern B deployment at 2 RPS can produce more database load than a Pattern A deployment at 100 RPS.
  • Don't ignore response payload size. Pattern A responses with many segments / traits can reach tens of kilobytes. Large payloads exhaust gunicorn workers and cause LB-level 5xx. Trim payloads or raise gunicorn worker count + timeout.
  • Don't oversize the task processor. 1 vCPU / 2 GB handles every tier; two replicas for redundancy.

Geographic deployments

Most Flagsmith deployments operate in a single region. If you need to serve users across regions with lower latency or stricter data-residency requirements, there are two patterns to consider:

  • Flagsmith Edge Proxy. Cache flag evaluations closer to end users without operating a full second Flagsmith deployment. Best when you have many edge locations and a single source-of-truth Flagsmith.
  • Separate Flagsmith deployment per region. Strongest isolation, simplest operational model per region, but trades off central control of flags / segments.

Detailed geographic-expansion guidance is beyond the scope of this page. If you're planning a multi-region deployment, please contact the Flagsmith team so we can validate the trade-offs against your specific requirements.