Site Reliability Engineer
Company: Pelago
Location: San Francisco
Posted on: May 4, 2024
Job Description:
Role Overview:
At Pelago, we run a serverless architecture on AWS, with
infrastructure managed using Terraform. Our system has been built
to deliver our virtual clinic for Substance Use Management, and we
are looking for a talented Site Reliability Engineer to join the
engineering team supporting Pelago. As a HIPAA compliant, HITRUST
certified organization, it is essential that our architecture is
built in compliance with information security and data privacy
requirements. Experience with and knowledge of security best
practices in the context of AWS are essential.
In this role, you will...
- Maintain Pelago's system built on AWS
- Develop a deep understanding of the development workflow at
Pelago
- Be responsible for the planning, implementation, and growth of
the AWS cloud infrastructure
- Troubleshoot issues on our platform, find the root cause, and,
if required, interface with engineering teams to resolve them
- Monitor application metrics to proactively raise issues to the
relevant engineering functional team
- Own the reliability, availability, latency, performance and
capacity planning of the Pelago environment
- Perform incident response and blameless post-mortems
- Implement infrastructure as code for provisioning,
configuration and deployment using Terraform
- Build, release, and manage the configuration of all production
systems
- Conduct load testing to identify bottlenecks before they impact
customers
- Work alongside our developers to drive automation, maximize
efficiency and improve reliability
- Take an occasional on-call shift on a rotational basis
The background we're looking for...
- 2+ years of experience working in SRE or reliability-focused
production engineering roles, identifying application problems from
monitoring, health checks and application performance
- A solid understanding of supporting AWS, serverless
architecture, and Terraform
- Experience with building / maintaining platforms that adhere to
security standards
- Experience working on a system that has scaled to 50m+ users
- Proficiency in script development and scripting
languages
- Strong troubleshooting background with experience in
identifying and remediating issues
- Team player mentality with strong communication and
collaboration skills