Deploying Stirling PDF on EKS with Helm, SSO, and Persistent Storage

Introduction

The client needed a self-hosted PDF processing platform — controllable, auditable, and integrated with their existing identity provider. SaaS PDF tools were off the table for compliance reasons. Stirling PDF was the right fit: open-source, feature-rich, and containerized.

The challenge was not the application itself. It was operationalizing it correctly on EKS: SSO integration, durable storage across restarts, shared ALB ingress, and a Helm chart structure that scales across environments without duplication.

This article covers the architecture I designed and delivered for that deployment.

System Architecture

The deployment is managed by a custom Helm chart that declares the upstream stirling-pdf-chart as a dependency in its Chart.yaml:

dependencies:
  - name: stirling-pdf-chart
    alias: stirling-pdf
    version: "3.1.0"
    repository: "https://stirling-tools.github.io/Stirling-PDF-chart"

Aliasing the upstream chart as stirling-pdf means all of its overrides live under a stirling-pdf: key in the wrapper's values. They stay separate from the wrapper's own keys and independent of the upstream chart's internal name. The wrapper chart owns all platform-level concerns: PVCs, NetworkPolicy, SecretStore, and ingress configuration.
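A minimal sketch of how the wrapper's values.yaml might be split under this arrangement. Top-level keys feed the wrapper's own templates, and everything under the stirling-pdf: alias is handed to the upstream subchart. The persistence.storageClass and networkPolicy.enabled keys are hypothetical illustrations, not confirmed from this deployment.

```yaml
# values.yaml of the wrapper chart (sketch; key layout is illustrative)

# Top-level keys are consumed by the wrapper's own templates.
persistence:
  storageClass: gp3            # hypothetical key read by the PVC template
  additionalVolumes:
    - name: configs
      size: 1Gi
    - name: pipeline
      size: 1Gi
    - name: tessdata
      size: 1Gi

networkPolicy:
  enabled: true                # hypothetical toggle for the wrapper's NetworkPolicy

# Everything under the alias is passed to the upstream stirling-pdf-chart.
# Without the alias, this block would have to be keyed "stirling-pdf-chart".
stirling-pdf:
  envsFrom:
    - secretRef:
        name: stirling-pdf-sso-secret
```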

Values are layered: the chart's base values.yaml defines defaults, and environment-specific files (e.g. one for stage) override only what differs. This keeps configuration drift between environments to a minimum.
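A sketch of what a stage override file might contain, assuming hypothetical wrapper keys ingress.annotations and networkPolicy.enabled. The release name and chart path in the comment are also illustrative. One Helm behavior matters here: maps are merged key by key, but lists are replaced wholesale. Overriding a single entry of a list such as additionalVolumes therefore means restating the whole list.

```yaml
# values-stage.yaml (sketch): only the keys that differ from values.yaml.
# Layered on top of the chart's built-in values.yaml, e.g.:
#   helm upgrade --install stirling-pdf . -f values-stage.yaml
# Later -f files win; maps merge key by key, lists are replaced entirely.

ingress:                       # hypothetical wrapper keys
  annotations:
    alb.ingress.kubernetes.io/load-balancer-name: "stage-shared-alb"
    alb.ingress.kubernetes.io/group.name: "stage"

networkPolicy:
  enabled: false               # stage relaxes network isolation for debugging
```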

Core Components

Persistent Volume Claims

The wrapper chart provisions three dedicated PVCs by looping over a list in its values:

persistence:
  additionalVolumes:
    - name: configs
      size: 1Gi
    - name: pipeline
      size: 1Gi
    - name: tessdata
      size: 1Gi

Each PVC is named stirling-pdf-<volume-name> and mounted into specific paths inside the container: /configs, /pipeline, and /usr/share/tessdata. The tessdata mount is particularly important — it persists OCR language data downloaded at runtime, preventing re-download on every pod restart.
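The templated loop could look roughly like the sketch below. It assumes a hypothetical persistence.storageClass value that falls back to gp3 and uses illustrative labels. Mounting each claim at /configs, /pipeline, or /usr/share/tessdata is handled through the upstream chart's volume settings and is not shown.

```yaml
# templates/pvc.yaml (sketch): one PVC per entry in persistence.additionalVolumes.
{{- range .Values.persistence.additionalVolumes }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Renders stirling-pdf-configs, stirling-pdf-pipeline, stirling-pdf-tessdata.
  name: stirling-pdf-{{ .name }}
  labels:
    app.kubernetes.io/name: stirling-pdf
    app.kubernetes.io/instance: {{ $.Release.Name | quote }}
spec:
  accessModes:
    - ReadWriteOnce            # mountable read-write by a single node
  # Inside range, "." is the current list item, so chart values need "$".
  storageClassName: {{ $.Values.persistence.storageClass | default "gp3" }}
  resources:
    requests:
      storage: {{ .size }}
{{- end }}
```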

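The storage class defaults to gp3. gp3 provides a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size, which suits workloads with moderate I/O needs. These volumes also never need to be attached to more than one node, which matters because gp3 does not support EBS Multi-Attach.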
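Unless the cluster already defines one, a gp3 StorageClass has to be created explicitly. The sketch below assumes the Amazon EBS CSI driver is installed. The encryption, Retain reclaim policy, and volume expansion settings are illustrative choices rather than details of this deployment.

```yaml
# StorageClass sketch for EBS gp3 volumes via the EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                          # 3,000 IOPS / 125 MiB/s baseline
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"                  # illustrative; suits a compliance-driven setup
reclaimPolicy: Retain                # illustrative: keep the EBS volume if the PVC is deleted
allowVolumeExpansion: true
# Delay provisioning until a pod is scheduled, so the zonal EBS volume
# is created in the same Availability Zone as the node running the pod.
volumeBindingMode: WaitForFirstConsumer
```

WaitForFirstConsumer matters for EBS in particular. Without it, a volume can be provisioned in one AZ while the scheduler later prefers a node in another.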

SSO via OAuth2

Authentication is enforced through OAuth2 SSO. The OAuth2 client credentials are injected as environment variables from an externally managed Secret:

envsFrom:
  - secretRef:
      name: stirling-pdf-sso-secret

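The secret is not managed inline. An ExternalSecret resource references the chart's SecretStore, which is backed by AWS Secrets Manager. External Secrets Operator then syncs the credentials into the Kubernetes Secret named stirling-pdf-sso-secret. This keeps credentials out of the Helm values entirely.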
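A minimal sketch of the two External Secrets Operator resources involved. The region, service account name, Secrets Manager entry name, and JSON key layout are all assumptions. The API version (external-secrets.io/v1beta1 here) depends on the installed ESO release.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-central-1             # assumption
      auth:
        jwt:
          serviceAccountRef:
            name: stirling-pdf         # assumption: service account set up for IRSA
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: stirling-pdf-sso
spec:
  refreshInterval: 1h                  # re-sync from Secrets Manager every hour
  secretStoreRef:
    kind: SecretStore
    name: aws-secrets-manager
  target:
    name: stirling-pdf-sso-secret      # the Secret referenced by envsFrom
    creationPolicy: Owner
  dataFrom:
    # Assumes the Secrets Manager entry is a JSON object whose keys are the
    # environment variable names the app expects (hypothetical layout);
    # each JSON key becomes one key of the Kubernetes Secret.
    - extract:
        key: stage/stirling-pdf/sso    # hypothetical entry name
```

Because envsFrom exposes every key of the Secret as an environment variable, naming the JSON keys after the application's settings avoids a separate mapping step.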

The application is configured with DOCKER_ENABLE_SECURITY=true, SECURITY_OAUTH2_ENABLED=true, and SECURITY_ENABLELOGIN=true. With these settings, login is required and CSRF protection is active.
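A sketch of these flags in the wrapper's values. It assumes the upstream chart accepts a plain env list under an envs key, which is not confirmed here. Kubernetes requires env values to be strings, so the booleans are quoted. An unquoted true would be rejected by the API server.

```yaml
# Sketch: security flags as container env vars ("envs" key is an assumption).
stirling-pdf:
  envs:
    - name: DOCKER_ENABLE_SECURITY
      value: "true"
    - name: SECURITY_ENABLELOGIN
      value: "true"
    - name: SECURITY_OAUTH2_ENABLED
      value: "true"
```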

ALB Ingress with Path-Based Routing

The client runs a shared ALB across multiple services in the same environment. Stirling PDF joins it through two Ingress annotations:

alb.ingress.kubernetes.io/load-balancer-name: "stage-shared-alb"
alb.ingress.kubernetes.io/group.name: "stage"

Path-based rules expose only a specific surface area: static assets, the API, login routes, and a catch-all. Internal endpoints are not exposed directly. The ALB is internal-facing (scheme: internal), consistent with the client's compliance posture.
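A minimal sketch of an Ingress carrying these settings, assuming the AWS Load Balancer Controller with an IngressClass named alb. The hostname, path prefixes, Service name, and port are hypothetical, and the static-asset rule is omitted for brevity.

```yaml
# Ingress sketch for the AWS Load Balancer Controller (illustrative values).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stirling-pdf
  annotations:
    # All Ingresses with group.name "stage" are merged into one ALB.
    alb.ingress.kubernetes.io/group.name: "stage"
    alb.ingress.kubernetes.io/load-balancer-name: "stage-shared-alb"
    # Internal scheme: the ALB gets private IPs only.
    alb.ingress.kubernetes.io/scheme: internal
    # Register pod IPs directly as ALB targets.
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb                      # assumed IngressClass name
  rules:
    - host: pdf.stage.example.internal       # hypothetical hostname
      http:
        paths:
          - path: /api                       # hypothetical API prefix
            pathType: Prefix
            backend:
              service:
                name: stirling-pdf           # assumed Service name
                port:
                  number: 8080
          - path: /login                     # hypothetical login prefix
            pathType: Prefix
            backend:
              service:
                name: stirling-pdf
                port:
                  number: 8080
          - path: /                          # catch-all
            pathType: Prefix
            backend:
              service:
                name: stirling-pdf
                port:
                  number: 8080
```

Because load-balancer-name and scheme describe the shared ALB itself, every Ingress in the stage group should carry the same values for them.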

Runtime Behavior

Pod startup is deliberately conservative. Stirling PDF — especially with OCR and security features enabled — takes time to initialize. Probes reflect this:

Liveness: initial delay of 120s, checked every 30s
Readiness: initial delay of 90s, checked every 15s

Both probes target /api/v1/info/status. Failure thresholds are generous (5 for liveness) to avoid premature restarts while OCR data is still loading.
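The probe settings above map onto standard pod-spec fields as shown in this sketch. Port 8080 is Stirling PDF's default HTTP port and is assumed to apply here. The readiness failure threshold is not stated, so it is left at the Kubernetes default of 3.

```yaml
# Probe sketch matching the timings described above.
livenessProbe:
  httpGet:
    path: /api/v1/info/status
    port: 8080                 # assumption: default Stirling PDF port
  initialDelaySeconds: 120
  periodSeconds: 30
  failureThreshold: 5          # restart only after 5 consecutive failures
readinessProbe:
  httpGet:
    path: /api/v1/info/status
    port: 8080
  initialDelaySeconds: 90
  periodSeconds: 15
  # failureThreshold omitted: Kubernetes defaults to 3.
```

With these numbers, a pod that never becomes healthy is restarted roughly 240 seconds after its container starts. The first liveness probe runs at 120 s and the fifth consecutive failure lands at 120 + 4 × 30 = 240 s. A healthy pod receives traffic no earlier than about 90 s in.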

The maximum HTTP header size (SERVER_TOMCAT_MAX_HTTP_HEADER_SIZE) and the maximum form post size are explicitly raised, to 65536 bytes and 10MB respectively. This is necessary for large PDF uploads and for OAuth2 login flows, where callback URLs and session cookies can push request headers past Tomcat's default 8 KB limit.
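A sketch of these limits as environment variables, again assuming an envs key on the upstream chart. The header variable uses the name given above. The form-post variable name is an assumption: it is Spring Boot's relaxed-binding form of server.tomcat.max-http-form-post-size.

```yaml
stirling-pdf:
  envs:
    - name: SERVER_TOMCAT_MAX_HTTP_HEADER_SIZE
      value: "65536"           # 64 KiB; Tomcat's default is 8 KiB
    # Assumed name: relaxed-binding form of server.tomcat.max-http-form-post-size
    # (Spring Boot default 2MB).
    - name: SERVER_TOMCAT_MAXHTTPFORMPOSTSIZE
      value: "10MB"
```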

Key Engineering Decisions

Wrapping upstream chart as a dependency

Rather than forking the upstream chart, I wrapped it. This means upstream fixes and new features can be adopted by bumping a version number. The platform-level concerns remain in the wrapper, cleanly separated.

External secrets over inline credentials

SSO credentials are never committed or templated into values. The SecretStore pointing to AWS Secrets Manager is provisioned as part of the chart. The ExternalSecret is managed separately and keeps the credentials synced into the namespace. This satisfies the client's secret management requirements without introducing Vault or another secrets backend.

Shared ALB group

Running a dedicated ALB per service at this scale would be wasteful. The AWS Load Balancer Controller's IngressGroup feature lets multiple Ingress objects, even across namespaces, contribute rules to a single ALB, reducing cost and operational surface area.

Trade-offs

Optimized for: operational simplicity, compliance alignment, cost efficiency on shared infrastructure.

Sacrificed:

Pod startup speed: conservative probe delays mean slow rollouts. Acceptable for a PDF utility, not for latency-sensitive services.
Multi-AZ storage resilience: EBS volumes are zonal, so these ReadWriteOnce PVCs cannot fail over to another Availability Zone. The pod can only be rescheduled onto a node in the same AZ as its volumes. If the node hosting the pod fails, the volumes cannot be attached elsewhere until Kubernetes confirms the original node is gone. For a stateful utility service, this is an accepted risk.
NetworkPolicy enforcement: disabled in stage. This simplifies debugging during validation but should be revisited before production promotion.
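Re-enabling isolation could take the form of a toggled wrapper template like this sketch. It assumes a hypothetical networkPolicy.enabled flag and a hypothetical networkPolicy.albSourceCidr value for the VPC range the internal ALB connects from; with target-type ip, ALB traffic reaches pods from inside the VPC. It also assumes the pod label shown and a CNI that enforces NetworkPolicy. On EKS that means enabling the VPC CNI's network policy support or installing an add-on such as Calico.

```yaml
# templates/networkpolicy.yaml (sketch)
{{- if .Values.networkPolicy.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: stirling-pdf
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: stirling-pdf     # assumed pod label
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only the VPC range the internal ALB connects from may reach the app port.
    - from:
        - ipBlock:
            cidr: {{ required "networkPolicy.albSourceCidr is required" .Values.networkPolicy.albSourceCidr }}
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # DNS lookups against cluster DNS in kube-system.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # HTTPS to any destination, e.g. the identity provider's token endpoint.
    - ports:
        - protocol: TCP
          port: 443
{{- end }}
```

Using required instead of a default CIDR makes the chart fail fast at render time if an environment enables the policy without stating where ALB traffic originates.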

Conclusion

This deployment delivers a production-ready, SSO-enforced, self-hosted PDF processing platform on EKS. The architecture separates upstream chart management from platform concerns, keeps credentials out of version control, and integrates cleanly into shared ingress infrastructure.

The key insight: wrapping upstream Helm charts rather than forking them keeps maintenance overhead low while retaining full control over platform-level behavior.

If you’re working on similar infrastructure challenges — self-hosted tooling, EKS platform design, or Helm chart architecture — feel free to reach out at hello@jakops.cloud.