Swarm deployment

Docker Swarm is Docker's built-in clustering and orchestration layer. SPIRENS ships a parallel set of stack files under compose/swarm/ that mirror the single-host compose tree.

If you're new to Swarm, the short version: it takes the same compose concepts (services, networks, volumes) and adds multi-host scheduling, overlay networks, a routing mesh, and zero-downtime service updates. The API surface is close to compose but differs in a few important ways.

Don't choose Swarm because 'it's more production-y'

On a single box, single-host Compose is strictly simpler and gives you everything SPIRENS needs. Swarm only earns its complexity once you have either (a) multiple hosts that you want Swarm's routing mesh to load-balance across, or (b) state that must stay reachable from any node, which requires shared storage (NFS-backed volumes).

When Swarm wins

  • Multi-host ingress. One Traefik replica per node, routing mesh load-balances traffic across all nodes regardless of where a target service happens to run.
  • Shared state via NFS volumes. A service (e.g. Traefik with its acme.json) can land on any node and find its state.
  • Rolling updates. docker service update replaces containers one at a time; single-host docker compose up recreates them in one go.
  • Per-service scaling. docker service scale spirens-erpc_erpc=3 works (Swarm service names take the form <stack>_<service>); Compose needs --scale on up and is less predictable.

What changes vs single-host

| Concern | Single-host | Swarm |
| --- | --- | --- |
| Entry point | docker compose -f compose.yml up -d | docker stack deploy -c stack.*.yml <stack-name> |
| Deploy granularity | One compose.yml includes all modules | One stack.*.yml per service; deploy each separately |
| Traefik provider | providers.docker | providers.swarm |
| Service labels | traefik.docker.network=… | traefik.swarm.network=… |
| Secrets | File-backed (secrets: - file: …) | external: true via docker secret create |
| Configs | Volume-mounted files | Usually docker config create + external: true |
| Updates | spirens up single [-s service] | docker service update or stack deploy again |
| Scale | One replica per service by design | docker service scale foo=N |
| Volumes | Local named volumes | Swap in NFS driver for shared state |
| Placement | n/a | deploy.placement.constraints / preferences |
| Networks | Bridge (external: true) | Overlay (external: true, attachable: true) |

Bringing it up

First-time setup

One manager node bootstraps the cluster:

docker swarm init --advertise-addr <manager-ip>

Join workers with the join token the init command prints (run docker swarm join-token worker on the manager if you've lost it):

docker swarm join --token SWMTKN-1-... <manager-ip>:2377

Verify:

docker node ls
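
To make that check scriptable, here is a minimal sketch that prints any node whose status is not Ready. The helper name nodes_not_ready is hypothetical; it relies only on the --format template fields of docker node ls and must run on a manager.

```shell
# Print the hostname of every node whose status is not "Ready".
# nodes_not_ready is a hypothetical helper name, not part of spirens.
nodes_not_ready() {
  docker node ls --format '{{.Hostname}} {{.Status}}' |
    awk '$2 != "Ready" { print $1 }'
}
```

Calling nodes_not_ready on a healthy cluster prints nothing; any hostname it does print is worth inspecting before deploying.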

Bootstrap secrets and configs

spirens bootstrap --swarm

This creates:

  • The two overlay networks (spirens_frontend, spirens_backend)
  • docker secret entries from the secrets/ directory
  • docker config entries from config/

All of these live outside any stack. The stack files reference them with external: true, so a deploy never re-creates them.
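
For orientation, the same resources could be created by hand roughly as follows. This is a sketch under assumptions, not the actual spirens implementation: it assumes each file in secrets/ and config/ becomes a secret or config named after the file, and all function names are hypothetical. Each step checks for an existing resource first so the script can be re-run safely.

```shell
# Hypothetical hand-rolled equivalent of a Swarm bootstrap.
# Assumption: secret/config names equal the file names; spirens may differ.
ensure_network() {   # $1 = network name
  docker network inspect "$1" >/dev/null 2>&1 ||
    docker network create --driver overlay --attachable "$1"
}

ensure_secret() {    # $1 = path to the secret file
  name=$(basename "$1")
  docker secret inspect "$name" >/dev/null 2>&1 ||
    docker secret create "$name" "$1"
}

ensure_config() {    # $1 = path to the config file
  name=$(basename "$1")
  docker config inspect "$name" >/dev/null 2>&1 ||
    docker config create "$name" "$1"
}

bootstrap_swarm_resources() {
  ensure_network spirens_frontend
  ensure_network spirens_backend
  for f in secrets/*; do [ -f "$f" ] && ensure_secret "$f"; done
  for f in config/*;  do [ -f "$f" ] && ensure_config "$f"; done
  return 0
}
```

Run bootstrap_swarm_resources from the repository root on a manager node. Secrets and configs are immutable in Swarm, so changing one means creating it under a new name rather than overwriting it.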

Deploy the stacks

spirens up swarm

Under the hood this runs one docker stack deploy per shipped stack.*.yml. The stacks are deployed separately on purpose — each is independently updatable, and removing a stack doesn't disturb the others.

Equivalent by hand:

docker stack deploy -c compose/swarm/stack.traefik.yml spirens-traefik
docker stack deploy -c compose/swarm/stack.redis.yml spirens-redis
docker stack deploy -c compose/swarm/stack.erpc.yml spirens-erpc
docker stack deploy -c compose/swarm/stack.ipfs.yml spirens-ipfs
docker stack deploy -c compose/swarm/stack.dweb-proxy.yml spirens-dweb-proxy
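
The per-file pattern above can also be sketched as a loop that derives each stack name from its filename. The helper deploy_all_stacks is hypothetical and is not how spirens up swarm is necessarily implemented.

```shell
# Deploy every stack.<module>.yml in a directory as stack spirens-<module>.
# deploy_all_stacks is a hypothetical helper, not a spirens command.
deploy_all_stacks() {
  dir=${1:-compose/swarm}
  for file in "$dir"/stack.*.yml; do
    [ -f "$file" ] || continue
    module=$(basename "$file" .yml)   # stack.erpc.yml -> stack.erpc
    module=${module#stack.}           # stack.erpc     -> erpc
    docker stack deploy -c "$file" "spirens-$module" || return 1
  done
}
```

The glob expands alphabetically, so this deploys dweb-proxy first and traefik last. Because the stacks are independent that should not matter, but list the files explicitly if you want a specific order.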

Day-two operations

Updating one service

No single-host -s equivalent is needed — on Swarm every stack is independent. To update just one service:

# Option A: re-deploy its stack (picks up image + config changes)
docker stack deploy -c compose/swarm/stack.erpc.yml spirens-erpc

# Option B: force-restart the tasks without spec changes (re-resolving a moving tag needs --image, as in Option C)
docker service update --force spirens-erpc_erpc

# Option C: change the image tag inline
docker service update --image ghcr.io/erpc/erpc:v0.0.42 spirens-erpc_erpc
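
If a script needs to block until one of these updates settles, a polling loop over the service's UpdateStatus is one option. This is a minimal sketch: wait_for_update and the POLL_INTERVAL variable are hypothetical names, and the 2-second default is arbitrary.

```shell
# Poll a service until its most recent update finishes or fails.
# wait_for_update and POLL_INTERVAL are hypothetical names.
wait_for_update() {   # $1 = service name, e.g. spirens-erpc_erpc
  service=$1
  while :; do
    # UpdateStatus is absent on a service that has never been updated,
    # so guard the field access inside the template.
    state=$(docker service inspect \
      --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' \
      "$service") || return 1
    case "$state" in
      ""|completed) return 0 ;;
      paused|rollback_*) echo "update state: $state" >&2; return 1 ;;
      *) sleep "${POLL_INTERVAL:-2}" ;;
    esac
  done
}
```

A non-zero return means the update paused or was rolled back; docker service ps --no-trunc on the same service shows why.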

Checking status

docker stack ls                              # all stacks
docker stack services spirens-traefik        # services in a stack
docker service ps spirens-traefik_traefik --no-trunc   # replicas, where they run
docker service logs spirens-traefik_traefik          # aggregated logs

Scaling

docker service scale spirens-erpc_erpc=3

For eRPC this is usually fine — it's stateless. For Traefik, you typically want one replica per node via mode: global in the stack file rather than a fixed count.
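
After scaling, it can help to confirm that the running count caught up with the desired count. The sketch below lists services that are still short of replicas; under_replicated is a hypothetical helper built on the Name and Replicas fields of docker service ls.

```shell
# List services whose running replica count is below the desired count.
# under_replicated is a hypothetical helper name.
under_replicated() {
  docker service ls --format '{{.Name}} {{.Replicas}}' |
    awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1, $2 }'
}
```

An empty result means every service has converged. A service that stays listed is usually blocked on placement constraints or an image pull.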

Rolling updates

By default Docker Swarm does rolling updates with parallelism: 1 — one container at a time. Tune in the stack file:

services:
  erpc:
    deploy:
      update_config:
        parallelism: 2
        delay: 10s
        order: start-first # zero-downtime; start new before stopping old
      rollback_config:
        parallelism: 1
        delay: 5s

Shared state: NFS volumes

Services with durable state (Traefik's acme.json, IPFS's datastore) need their volume to be reachable from whichever node the container lands on. Two common approaches:

Pin the service to one node

Simplest. Put a placement constraint in the stack:

deploy:
  placement:
    constraints:
      - node.hostname == my-ingress-01

Downside: you've undone some of Swarm's HA story. If that node dies, the service is down until you remove the constraint and let it move.
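
A variation on the hostname constraint is to pin by node label, so that moving the service becomes a label change instead of a stack-file edit. This is a sketch only: the label key ingress and both function names are hypothetical.

```shell
# Mark a node as the ingress host. The label key "ingress" is hypothetical.
label_ingress_node() {     # $1 = node hostname
  docker node update --label-add ingress=true "$1"
}

# Constrain a service to nodes carrying that label.
pin_to_ingress_label() {   # $1 = service name
  docker service update \
    --constraint-add 'node.labels.ingress == true' "$1"
}
```

Relabeling a different node moves the service but not its data: with local volumes, the state still has to be copied to the new node first.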

NFS-backed volume

The proper Swarm-native answer. An NFS server (one of your nodes, a NAS, or a cloud file share) exports the state, and the Docker engine on each node mounts that export whenever a task using the volume starts there. The volume is declared with the local driver plus NFS options:

volumes:
  letsencrypt:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.5,rw,nfsvers=4"
      device: ":/export/spirens/letsencrypt"
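
Before wiring the volume into a stack, it may be worth confirming that a node can actually mount the export. The sketch below creates an equivalent volume from the CLI; create_nfs_volume is a hypothetical helper, and the address and path simply reuse the example values above.

```shell
# Create a local-driver volume backed by NFS, mirroring the YAML above.
# create_nfs_volume is a hypothetical helper name.
create_nfs_volume() {   # $1 = volume name, $2 = NFS server address, $3 = export path
  docker volume create --driver local \
    --opt type=nfs \
    --opt "o=addr=$2,rw,nfsvers=4" \
    --opt "device=:$3" \
    "$1"
}
```

Creating the volume does not mount anything yet; the mount happens when a container first uses it. After create_nfs_volume letsencrypt-test 10.0.0.5 /export/spirens/letsencrypt, starting any throwaway container with the volume attached will surface mount errors on that node.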

Trade-off: another piece of infrastructure to run. For a two-node cluster this is often heavier than it's worth — pinning to one node is fine. Three or more nodes with real HA requirements start to earn NFS.

Profiles on Swarm

All three deployment profiles from 04 — Deployment Profiles work on Swarm unchanged. What differs is the network plumbing:

| Profile | Works on Swarm? | Notes |
| --- | --- | --- |
| Internal | Yes | Same story: local DNS points at the cluster VIP or any node's IP |
| Public | Yes | Cloudflare A records can point at any node; routing mesh handles it |
| Tunnel | Yes | Cloudflared runs on one node; mesh routes to services regardless |

The routing mesh is Swarm's killer feature here: a client hitting port 443 on any node gets routed to whichever node is running Traefik. Combined with mode: global on Traefik (one replica per node), you get N-way HA ingress for free.

Swarm-specific troubleshooting

Service stuck in "preparing"

Usually an image pull issue on one or more nodes. docker service ps --no-trunc <service> shows the error per replica.

Routing mesh not routing

Check that the node you're hitting is actually in the swarm (docker node ls, run on a manager) and that the service publishes the port (docker service inspect --pretty <service>). On some cloud providers, security-group rules block the ports Swarm uses between nodes: 7946/tcp+udp for node-to-node gossip and 4789/udp for overlay (VXLAN) traffic. Both must be open between all nodes, in addition to 2377/tcp towards the managers.
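
A quick way to test the TCP side of this from one node to another is sketched below. It assumes nc (netcat) is installed, and check_swarm_tcp_ports is a hypothetical helper. It probes 2377 (the port used by docker swarm join, open on managers only) and 7946. The UDP ports cannot be probed reliably this way and still need a firewall-rule review.

```shell
# Probe the TCP ports Swarm needs on a peer node.
# Assumes nc is installed; check_swarm_tcp_ports is a hypothetical helper.
check_swarm_tcp_ports() {   # $1 = peer node address
  status=0
  for port in 2377 7946; do
    if nc -z -w 2 "$1" "$port"; then
      echo "$1:$port open"
    else
      echo "$1:$port blocked or closed"
      status=1
    fi
  done
  return "$status"
}
```

When the peer is a worker, 2377 is expected to report as closed, since only managers listen on it.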

external: true references fail

Every Swarm stack references pre-created networks, secrets, and configs with external: true. If a deploy fails with "network not found" or "secret not found", re-run spirens bootstrap --swarm to create the missing external resources.
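
To see exactly which external resource is missing before re-running the bootstrap, a small check like the following can help. missing_resources is a hypothetical helper; it relies only on the inspect subcommand that docker network, docker secret, and docker config all provide.

```shell
# Print each named resource that does not exist on this swarm.
# missing_resources is a hypothetical helper name.
missing_resources() {   # $1 = network|secret|config, remaining args = names
  kind=$1
  shift
  for name in "$@"; do
    docker "$kind" inspect "$name" >/dev/null 2>&1 || echo "$kind $name"
  done
  return 0
}
```

For example, missing_resources network spirens_frontend spirens_backend prints one line per absent network and nothing when both exist.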

Stack lingering after removal

docker stack rm spirens-erpc
# wait ~10 seconds for containers to drain
docker stack ls                     # verify gone

If networks persist because another stack references them, that's expected — the two SPIRENS networks are shared across stacks by design.
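
Instead of a fixed wait, the removal can be polled. The sketch below is one way to do that; remove_stack_and_wait and POLL_INTERVAL are hypothetical names, and the limit of 30 attempts is arbitrary. It only checks that the stack has left docker stack ls, so containers on other nodes may still take a moment longer to stop.

```shell
# Remove a stack, then poll until it no longer appears in docker stack ls.
# remove_stack_and_wait and POLL_INTERVAL are hypothetical names.
remove_stack_and_wait() {   # $1 = stack name
  docker stack rm "$1" || return 1
  tries=0
  while docker stack ls --format '{{.Name}}' | grep -qx "$1"; do
    tries=$((tries + 1))
    if [ "$tries" -ge 30 ]; then
      echo "stack $1 still listed after $tries checks" >&2
      return 1
    fi
    sleep "${POLL_INTERVAL:-1}"
  done
  return 0
}
```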

When to just use Kubernetes

If you outgrow Swarm — you need PVCs, richer scheduling, cross-node networking policies, or a real operator ecosystem — that's when you move to Kubernetes. SPIRENS doesn't ship Kubernetes manifests, but the services (Traefik, Kubo, eRPC, dweb-proxy, Redis) all have mature Helm charts. The SPIRENS configs (config/, .env shape) port over; the compose files don't.

That migration is out of scope for this repo. If you do it, the Traefik Helm chart and Kubo Helm chart are the right starting points.