What are the responsibilities and job description for the Site Reliability Engineer position at Socure?
What The Role Is
The Senior SRE will be a key member of the Platform Engineering team, working alongside Product, Data Science and Infrastructure engineering teams to improve the observability and reliability of our core infrastructure.
What You’ll Do:
- Scale and build tooling for the observability platform across Socure products and infrastructure.
- Advise engineering teams on observability topics.
- Participate in a 24x7 on-call rotation for production incidents.
- Drive production incidents to resolution.
- Drive blameless postmortems.
- Build dashboards to provide insights and visibility into critical metrics.
- Maintain the alerting pipeline to ensure alerts fire on the right things.
What You'll Bring:
- 5 years of experience in Software or Platform Engineering.
- Expert understanding of Linux systems.
- Significant experience setting up observability products such as Datadog, Grafana and Prometheus.
- Expert understanding of and experience with Kubernetes (K8s), Terraform and one of the major public clouds, preferably AWS.
- Experience with modern CI/CD platforms.
- Experience troubleshooting and driving production incidents.
- Proficient in at least one programming language.
- Good understanding of SLIs, SLOs and SLAs.
- Good understanding of complex architectures.
- Comfortable working with multiple teams.
We are an equal opportunity employer and value diversity of all kinds at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.