Site Reliability Engineer (SRE) Lead

Roostify
San Francisco, CA Full Time
POSTED ON 11/22/2021 CLOSED ON 12/22/2021

What are the responsibilities and job description for the Site Reliability Engineer (SRE) Lead position at Roostify?

Roostify is transforming the mortgage industry with an innovative and integrated platform that’s streamlining the entire digital lending experience. We believe that home lending should be a fair, fast, and transparent experience. Our software is used by banks across the country to improve lending experiences every day. We are a team of innovative thinkers on a mission to reinvent the lending experience so people can accelerate their future.

We are looking for a Senior Leader to head Site Reliability Engineering for the Roostify platform and suite of services. In this role, you will join the Client Services Delivery Team, whose mission is to deliver world-class services to our clients. As the SRE Lead, you will help us continue to raise the bar for excellence in Production Operations.

ROLE

  • Join us as a leader, with a strategic goal of establishing and independently running the Site Reliability function

  • The role is responsible for the reliability and availability of all Production environments, including their health, ongoing monitoring, and proactive and preventive health assessments.

  • Transform Operations & influence Engineering practices to achieve the strategic goal of deploying new code to Production frequently via Continuous Delivery (CD) pipelines

  • The role encompasses handling complex and varied product platforms and multiple Cloud deployment platforms.

RESPONSIBILITIES

  • Design the SRE function with the goal of providing 24x7x365 coverage

  • Build and evolve an Operations Model that can handle complexities spanning various cloud-based deployment models, and technology partner integrations.

  • Create & Support a delivery ecosystem that thrives on demonstrating value to stakeholders by adopting highly iterative & Continuous delivery models

  • Work with the product management team to define Service Level Agreements (SLAs) and Service Level Objectives (SLOs), and implement Service Level Indicators (SLIs) for core capabilities

  • Collaborate with product and engineering to drive and improve the whole lifecycle of operational readiness - from inception to design, through deployment, operations, and proactive refinement

  • Influence Architectural and Product decisions with a bias towards Scale, Observability, Monitoring & Stability, and Security

  • Drive incident management process and support a blameless post-mortem culture

  • Own and drive high profile customer escalations

  • Drive and implement a lean-ops culture by applying self-service, self-healing, and automation.

  • Advocate for SRE Principles and collaborate with all Engineering teams to create a DevOps mindset

  • Responsible for Capacity forecasting, Budget & Cost optimization

  • Define and deliver KPIs and Metrics for Operations & Quality to stakeholders, such as Deployment Frequency, MTTR, and Lead Time

  • Adopt and evolve internal processes based on industry best practices in SRE

  • Grow team members' careers through coaching and mentoring of junior engineers, and foster leadership principles and behaviors to groom the next generation of leaders.

SKILLS & EXPERIENCE

  • Excellent academic background with a Bachelor’s Degree in Engineering

  • Minimum 10 years of experience in Software Engineering and/or Infrastructure Operations, with 4 years in an SRE role

  • Ability to work with distributed, multicultural, and diverse teams

  • Expertise in deploying and supporting microservice-based applications, Containerization, and Cloud Technologies

  • Experience with CI/CD tooling: Concourse, Jenkins, Azure DevOps, etc.

  • Proven experience troubleshooting complex and large cloud environments

  • Experience designing, deploying, and maintaining monitoring solutions such as New Relic, Datadog, Splunk, Prometheus, etc.

  • Experience developing, running, and/or consuming cloud technologies such as AWS, Azure, and Google Cloud Platform, and related tooling: Terraform, configuration management, etc.

  • Experience with customer escalations and/or operations war room.

  • Strong understanding of modern monitoring and logging technologies

  • Strong analytical skills with a data-driven approach to solving problems

  • The ability to partner and influence product, engineering, and operations teams is a must

  • Strong organizational planning and development, business judgment, influential skills, and technical leadership

  • Experience with Agile methodologies – Scrum, Kanban, etc.

BENEFITS & PERKS

At Roostify we know that people do their best when they feel their best; we care about our people and want them to thrive. Here are some of the benefits we’re proud to offer:

  • Competitive Salary & Equity Packages
  • Health, Dental, and Vision Plans
  • 401(K)
  • Flexible Vacation Time


Roostify is an Equal Opportunity Employer 

At Roostify we have a value of People First. We strive to provide the best experiences to our employees and candidates. We consider applicants without regard to race, color, national origin, sex, age, religion, sexual orientation, gender identity, veteran status, marital status, physical or mental disability, or other protected classes under all local, state, and federal laws and ordinances. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

While Roostify HQ is located in San Francisco, CA, we are open to remote work within the USA for this role.
