Improve and simplify AWS and Kubernetes infrastructure management

How we organized infrastructure management for a cloud-based system using Pulumi, GitHub Actions and Argo CD
Company name:
Erisyon
Industry:
Biotechnology
R&D size:
5-10 Engineers
Scale:

1. Initial state

Erisyon is a life science tools company developing a single-molecule protein sequencer.

They are commercializing the world’s first single-molecule protein sequencer that promises to transform the way we detect, treat, and track disease.

The state of their infrastructure when we started working together:

  • Two Kubernetes clusters provisioned by Pulumi (development and production)
  • Parts of the system not yet imported into Pulumi
  • No pipeline to execute and review Pulumi infrastructure changes
  • Complex JSON files for Helm charts applied by Pulumi, hard to read and manage
  • No staging cluster to test high-risk infrastructure changes without affecting ongoing development
  • No strategy for tracking and regularly upgrading all platform components

Managing the infrastructure became complex and risky.


3. Project goals

  • Make it easier and safer to provision and manage infrastructure without expanding the software team

4. Decisions

To achieve the goals, we made the following decisions:

  1. Use GitHub Actions to run “Pulumi Preview” and validate its output, with a manual approval step before “Pulumi Up” applies infrastructure changes.
  2. Create a STAGING environment (cluster) and deploy test instances of the components running on the development and production clusters.
  3. Use the new environment to test both infrastructure and application changes before modifying production.
  4. Use Argo CD for Helm chart deployment: Pulumi is limited to managing AWS cloud resources, while Argo CD takes over deploying Helm charts.
  5. Use Renovate to track new Helm chart versions and propose upgrades automatically by opening Pull Requests.
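
The preview-then-approve gate from decision 1 can be sketched in Python. This is a minimal illustration, not Erisyon's actual workflow. It assumes the preview was produced with `pulumi preview --json` and that the output contains a `steps` list whose entries carry an `op` and a `urn` field. The sample document, the resource URNs, the set of operations treated as risky, and the function names are all hypothetical.

```python
import json
from collections import Counter

# Operations that destroy or recreate resources (assumption: these are the
# ones a reviewer wants called out before approving "Pulumi Up").
RISKY_OPS = {"delete", "replace", "create-replacement", "delete-replaced"}


def summarize_preview(preview_json: str) -> dict:
    """Count planned operations and collect the URNs of risky ones."""
    preview = json.loads(preview_json)
    steps = preview.get("steps", [])
    counts = Counter(step["op"] for step in steps)
    risky = [step["urn"] for step in steps if step["op"] in RISKY_OPS]
    return {"counts": dict(counts), "risky": risky}


def issue_body(summary: dict) -> str:
    """Render the text a reviewer would read in the approval issue."""
    lines = ["Planned infrastructure changes:"]
    for op, n in sorted(summary["counts"].items()):
        lines.append(f"- {op}: {n}")
    if summary["risky"]:
        lines.append("Needs careful review (destructive operations):")
        lines.extend(f"- {urn}" for urn in summary["risky"])
    return "\n".join(lines)


# Hypothetical preview output, trimmed to the fields used above.
SAMPLE = json.dumps({
    "steps": [
        {"op": "same", "urn": "urn:pulumi:staging::network::aws:ec2/vpc:Vpc::main"},
        {"op": "update", "urn": "urn:pulumi:staging::cluster::aws:eks/cluster:Cluster::eks"},
        {"op": "delete", "urn": "urn:pulumi:staging::cluster::aws:eks/nodeGroup:NodeGroup::old"},
    ]
})

summary = summarize_preview(SAMPLE)
print(issue_body(summary))
```

Listing destructive operations separately lets the reviewer see immediately which resources would be removed or recreated, instead of reading the full preview log.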

5. Restrictions

  • Strict dependencies between the Pulumi projects require changes to be applied across infrastructure projects in a specific order.
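
One way to make such an ordering constraint explicit is to record the dependencies and derive the apply order from them. The sketch below uses Python's standard `graphlib` module. The project names and their dependencies are hypothetical, since the case study does not list the actual Pulumi projects.

```python
from graphlib import TopologicalSorter

# Hypothetical Pulumi projects, each mapped to the projects it depends on.
DEPENDENCIES = {
    "network": set(),
    "eks-cluster": {"network"},
    "iam-roles": {"eks-cluster"},
    "argocd-bootstrap": {"eks-cluster", "iam-roles"},
}


def apply_order(dependencies: dict) -> list:
    """Return projects ordered so each one comes after everything it depends on.

    TopologicalSorter raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = apply_order(DEPENDENCIES)
print(order)
```

A CI pipeline could walk this list and run the preview and apply steps one project at a time, so that a project is never applied before the projects it reads outputs from.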

6. Strategy

  • Safeguard changes with Pulumi previews run in GitHub Actions, manual approvals, and a staging cluster that gates promotion to dev and prod.
  • Automate upkeep through CI pipelines, on-demand workflows that provision and terminate staging nodes, and Renovate PRs that track Helm chart upgrades.
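
The check behind an automated upgrade PR can be illustrated with a small Python sketch. This is not Renovate's implementation, only the underlying idea: compare the pinned chart version with the published ones and propose the newest. The chart name, the version numbers, and the PR title format are hypothetical, and pre-release tags are deliberately not handled.

```python
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (an optional leading 'v' is ignored)."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def newest_upgrade(pinned: str, available: List[str]) -> Optional[str]:
    """Return the highest available version above the pinned one, or None."""
    newer = [v for v in available if parse_version(v) > parse_version(pinned)]
    return max(newer, key=parse_version) if newer else None


def pr_title(chart: str, pinned: str, target: str) -> str:
    """Build a PR title that flags major bumps, which usually need closer review."""
    is_major = parse_version(target)[0] > parse_version(pinned)[0]
    kind = "major" if is_major else "non-major"
    return f"Update Helm chart {chart} from {pinned} to {target} ({kind})"


# Hypothetical chart and versions.
available = ["4.9.0", "4.10.1", "4.10.0"]
target = newest_upgrade("4.9.1", available)
if target is not None:
    print(pr_title("ingress-nginx", "4.9.1", target))
```

Versions are compared as integer tuples rather than strings, because a plain string comparison would rank `4.9.1` above `4.10.1`.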

7. The process

We transformed Erisyon's infrastructure in four steps:

  1. Create the following GitHub Actions workflows:
    1. Run “Pulumi Preview” on each Pull Request so the change can be reviewed before merging. Keep iterating on the change until it is ready.
    2. Run “Pulumi Preview” on each merge to the main branch and create a GitHub Issue with the change details for review.
    3. Run “Pulumi Up” once the issue (the Pulumi infrastructure change) has been reviewed and approved.
  2. Deploy Argo CD to each cluster via Pulumi and migrate all application and platform Helm chart deployments to Argo CD, using Argo CD ApplicationSets.
  3. Create the STAGING cluster. Update the Argo CD manifests to deploy application and platform services to it. Create GitHub Actions workflows that provision and terminate the EKS node infrastructure to save cost (staging cluster services can be started within minutes when needed).
  4. Set up the Renovate configuration and dashboard to track new Helm chart versions and automatically create Pull Requests for them, each with a detailed changelog for review.
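
To illustrate step 2, the sketch below builds an Argo CD ApplicationSet manifest in Python, using a list generator to produce one Application per cluster from a single template. The chart name, repository URL, cluster API URLs, versions, and namespaces are hypothetical placeholders, and the actual manifests used for Erisyon may be structured differently.

```python
import json


def application_set(chart: str, repo_url: str, clusters: list) -> dict:
    """Build an ApplicationSet that deploys one Helm chart to every listed cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": chart, "namespace": "argocd"},
        "spec": {
            # The list generator renders the template once per element,
            # substituting {{cluster}}, {{url}} and {{version}}.
            "generators": [{"list": {"elements": clusters}}],
            "template": {
                "metadata": {"name": chart + "-{{cluster}}"},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": repo_url,
                        "chart": chart,
                        "targetRevision": "{{version}}",
                    },
                    "destination": {"server": "{{url}}", "namespace": chart},
                },
            },
        },
    }


# Hypothetical clusters: staging runs the newer chart version first.
CLUSTERS = [
    {"cluster": "staging", "url": "https://staging.example.invalid", "version": "3.12.1"},
    {"cluster": "development", "url": "https://dev.example.invalid", "version": "3.12.0"},
    {"cluster": "production", "url": "https://prod.example.invalid", "version": "3.12.0"},
]

manifest = application_set("metrics-server", "https://charts.example.invalid", CLUSTERS)
print(json.dumps(manifest, indent=2))
```

Keeping the chart version as a per-cluster element means an upgrade can be merged for staging first and promoted to the other clusters in a later Pull Request.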

8. Results

  1. The entire AWS infrastructure is managed with Pulumi, which runs from GitHub Actions.
  2. Helm chart deployments are managed by Argo CD (applied when a Pull Request is merged).
  3. There is a new STAGING environment to test both infrastructure and application changes before modifying development and production.
  4. There is an automated process to track new Helm chart versions and raise Pull Requests to upgrade them.

Worth mentioning:

We also helped Erisyon reduce overall infrastructure cost, improve monitoring, streamline Kubernetes cluster upgrades, and handle Kubernetes deprecations, among other things.


9. Before & After

  • Before ❌: Manual management of infrastructure changes, with Pulumi Preview and Up commands executed from a developer machine
    After ✅: Automated infrastructure management with Pulumi and GitHub Actions

  • Before ❌: Complex JSON files for Helm charts deployed by Pulumi
    After ✅: Easy management and deployment of Helm charts using a GitOps approach with Argo CD

  • Before ❌: Risky infrastructure upgrades on development and production clusters
    After ✅: Staging environment to test both infrastructure and application changes
