Senior Site Reliability Engineer

Angi
New York City, NY Remote Full Time
POSTED ON 5/23/2022 CLOSED ON 7/1/2022

What are the responsibilities and job description for the Senior Site Reliability Engineer position at Angi?

About the Role

Site Reliability Engineers (SREs) on the Telemetry team are responsible for ensuring that Angi’s Insights Platform can be relied upon to support the needs of our mission-critical systems. The SRE role at Angi differs from the role at many other organizations: you will work on a team of SREs tasked with completing company objectives instead of being embedded in development teams. The team works together to address client needs as any development group would, which makes it easier to share knowledge between team members and gives clients a more consistent experience. We build all of our solutions on EKS in AWS with Terraform, and we use Weave Flux, Prometheus, Cortex, Loki, Tempo, and Grafana to provide telemetry services for our clients. Every day you’ll find yourself managing these services, providing solutions based on their data, or working with clients on how to properly use our Telemetry Platform.

We are looking for experienced Site Reliability Engineers who meet the following criteria:

Technical: 

  1. A working knowledge of metrics, logs, and distributed tracing practices.
  2. Depth of knowledge in at least one of those practices. 
  3. Comfortable contributing to a shared codebase.
  4. Understand Kubernetes and the container orchestration concepts it uses.
  5. Passionate about process automation and familiar with enough approaches to weigh several before deciding which to pursue.
  6. A healthy amount of curiosity for containerized technology and how it works.

Execution: 

  1. Experience identifying changes that improve processes from a reliability and performance perspective.
  2. Enjoy finding solutions in low information situations.
  3. Comfortable using telemetry data to spot parts of a system that do not scale, research solutions, and implement a migration plan that mitigates the situation.
  4. Enjoy working to determine what service information is important enough to drive service levels and creating the means for teams to use that data.

Collaboration and Communication: 

  1. Have a curiosity for current and new practices that lead to collaboration and process change.
  2. Enjoy documenting and sharing solutions to interesting challenges with others. 
  3. Have participated in post-mortems and have definite opinions on how they serve the organization.
  4. Experience working as a team to support a critical core system. 

As an SRE you will: 

  • Determine what information is important enough to drive service levels for our services.
  • Use service level information to determine reliability on our Telemetry Platform. 
  • Participate in an on-call rotation that responds to incidents concerning the Telemetry Platform.
  • Contribute to solutions defined in GitLab projects and GitHub repositories.
  • Maintain AWS EKS clusters using our Terraform modules.
  • Automate solutions to complex business challenges that require your specific skill set.
  • Contribute to core infrastructure pieces that allow Angi to scale to meet the needs of its clients.
  • Use the Telemetry Platform to assist in investigations that happen across the organization.
  • Plan and shape the growth of Angi’s infrastructure as we iterate it over time.

You may be a fit for this role if you: 

  • Think about systems - edge cases, failure modes, behaviors, specific implementations. 
  • Have an understanding of large scale system design, monitoring, observability, and operational practices. 
  • Have strong programming skills in Go, Python, and/or Ruby.
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it. 
  • Have experience with Weave Flux, Nginx, Kubernetes, Terraform, Prometheus, Loki, Cortex, Tempo, or similar technologies.
  • Are compelled to keep a constant eye on the observability space, identifying changes in practices and technologies as they arise and planning ahead for them.

Projects you could work on: 

  • Contribute to our team’s Telemetry Platform that consists of Prometheus, Cortex, Loki, Tempo, and Grafana deployed in EKS using Terraform and Weave Flux on AWS. 
  • Contribute to projects across the organization to address challenges that call for your skill set.
  • Work with our dev teams to determine how to make their paging strategy more meaningful and less problematic.
  • Develop ways to aid our development teams in instrumenting their services to collect the information about our applications that investigations depend on.
  • Work to reduce the level of effort needed to use the instrumentation that the teams are creating.
  • Provide valuable feedback and collaborate with the teams whose products we use as we iterate on our own infrastructure.

Compensation & Benefits: 

  • The salary band for this position ranges from 140k - 200k, commensurate with experience and performance. Compensation may vary based on factors such as cost of living. 
  • This position will be eligible for a competitive year-end performance bonus and equity package.
  • Full medical, dental, and vision package to fit your needs.
  • Flexible vacation policy; work hard and take time when you need it.
  • Pet discount plans and a retirement plan with company match (401(k)).
  • The rare opportunity to work with sharp, motivated teammates solving unique challenges and changing the world.

