Marcus Chen
Senior DevOps / Site Reliability Engineer
Denver, CO · marcus.chen@email.com · (555) 348-2207 · linkedin.com/in/marcuschen-sre
Professional Summary
Senior DevOps/SRE with 8 years running production Kubernetes at scale. Cut deploy time from 38 minutes to 4 minutes on a 22-service platform and led the response to a Sev-1 cascading control-plane incident. Deep experience with Terraform, EKS, and Prometheus.
Experience
Senior Site Reliability Engineer
Feb 2022 — Present
Tessera Cloud · Denver, CO
- Designed and rolled out a Kubernetes-based CI/CD platform across 22 services and 4 dev teams, reducing average deploy time from 38 minutes to 4 minutes and eliminating manual deploys entirely.
- Led the response to a 4-hour Sev-1 incident (origin: a misconfigured PodDisruptionBudget cascading to control-plane saturation) and authored the postmortem and follow-up automation that has prevented recurrence for 14 months.
- Built a self-service developer portal (Backstage) for service creation, secrets, dashboards, and runbooks — onboarded 19 services in the first quarter and cut average new-service-bootstrap time from 5 days to 2 hours.
DevOps Engineer
Aug 2018 — Jan 2022
Northbeam Systems · Remote
- Authored Terraform modules for AWS networking, IAM, and EKS adopted as the company's default infra primitive — used by all new services and replacing ~6K lines of legacy CloudFormation.
- Cut cloud spend 28% ($510K/yr) through right-sizing, spot/preemptible adoption, and unused-resource reclamation, while improving reliability SLOs in the same quarter.
- Owned the on-call rotation for a 14-service tier; reduced pages/week from 22 to 6 by writing automation for the three most-paged classes of alert.
Education
Skills
Kubernetes · Terraform · AWS (EKS, IAM, VPC, RDS, S3) · Docker · Helm · ArgoCD · GitHub Actions · Prometheus · Grafana · OpenTelemetry · Python · Go · Bash · Linux · SRE · Incident Response