Site Reliability Engineer P-051
Company: Smash CR
Location: San Jose
Posted on: April 24, 2024
Job Description:
The role
As a Site Reliability Engineer (SRE) at Smash CR, your
mandate is to ensure the availability and reliability of our most
critical services, and ensure that they meet the requirements of
our customers. Our SRE team is growing, so you'll be a crucial
early member to help establish the team, processes, and best
practices. Success in this role looks like collaborating with other
teams to build and run sustainable production systems that can
evolve and adapt to the changes in our fast-paced environment.
This role is responsible for:
- Working proactively with engineering teams to help them set
SLOs and implement best practices for logging and telemetry
collection
- Designing, implementing, and maintaining the tools and systems that
support service reliability, monitoring, and alerting
- Participating in a 24x7 on-call rotation supporting the health
of our services
- Driving the incident management process and supporting a blameless
post-mortem culture
- Participating in application design consulting and capacity
planning
- Defining and formalizing SRE practices and helping guide the
overall reliability engineering direction
- Providing mentorship both formally and informally to
engineers
- Continuously optimizing systems and workflows by improving
architecture, infrastructure, automation, CI/CD, and
observability
- Combining software and systems knowledge to engineer
high-volume distributed systems in a reliable, scalable, and
fault-tolerant manner
You bring
- 5+ years of relevant industry experience with a focus on
distributed cloud native systems design, observability, operation,
maintenance, and troubleshooting
- 5+ years operational experience with an observability platform
like Datadog, Splunk, Prometheus/Grafana, or AppDynamics
- Fluency in one or more programming languages (e.g. Python,
TypeScript, Go)
- A strong conviction in software development best practices,
including version control, automated testing, and continuous
integration and delivery
- Self-motivation, curiosity, and a drive to keep learning
new technologies
- A collaborative approach and clear, transparent
communication
- The Triple H Factor: Humble, Hungry and Honest
- An act-like-an-owner mentality. We have a bias toward taking
action.