What are the responsibilities and job description for the SRE - Observability position at sunday?
🚀 Mission
At sunday we are creating the future of payments. In 3 years, no customer in a restaurant will raise their hand after a meal to ask for the bill. In 5 years, nobody will remember how we paid when there were no QR codes.
- $124M raised in less than six months
- 300 employees in 5 countries (France, UK, Spain, US and Canada)
- An amazing tech team of 100 top performers
- 2800 restaurants signed: 150 cities in 5 markets
- $4B addressable GTV
We are building our tech team from scratch and are looking for an SRE - Observability to build and scale our production services for performance, reliability and security. You'll collaborate with cross-functional product & engineering teams to identify, design, and implement platforms that allow our applications to be scalable and observable. You're passionate about distributed systems and you'll find ways to optimize the logs, metrics and signals they can generate.
What you will be doing:
- Automate and industrialize monitoring, logging, alerting and insights to observe and understand sunday's technical stack and increase sunday's availability rate
- Establish and maintain Service-Level Objectives (SLOs) for production services
- Implement and tune alerts for critical production services
- Assist with incident response, troubleshooting, root-cause analysis and postmortems
- Apply a strong understanding of network routing and load-balancing technologies
- Ensure sufficient service capacity
- Contribute to software and systems architecture design
- Identify and automate manual tasks related to operations & event remediation
🛡 💻 The stack
- We run on GCP
- UI: TypeScript with React
- Platform: Java 16/17 with Spring Boot and some parts in Node.js
- JVM orientation (Java, Kafka), reactive, hexagonal architecture, CQRS, gRPC for APIs and Firebase
- Persistence: mainly PostgreSQL, with some Redis servers and a BigQuery data store for analytics purposes
🤗 Skills & Experience
Requires 2-3 years of demonstrated hands-on experience:
- Configuration & Integration: Datadog, Kafka & Grafana
- Application Performance Monitoring, synthetic transactions
- Observability of integrations with external third parties
- Experience supporting Java-based distributed services in production
- Observability of capacity
- Logging and tracing tools
- The ability to communicate and collaborate effectively with Product Owners, Developers, QA, Operations and Security Engineers
Preferred:
- Integration of monitoring tools with Jira Service Management (or another ITSM platform)
- Automation such as alert-ticket generation, scripted response & autohealing
- CMDB experience: deployment, configuration and/or integration
🏖️ What we offer: Benefits
- Great impact: you will contribute to the development of the new payment solution in the hospitality industry and beyond
- Autonomy & responsibilities - A challenging environment with supportive colleagues and unlimited resources: a place to grow both technically and as an individual.
- An inclusive environment
- Stock options for everybody
- Full flexibility -- Full remote, partial remote or 100% in office: your choice!
- 100% health coverage
- Free vacation policy
- 4-month equal parental leave policy -- Why should it be otherwise?
Keywords
#SRE, #APM, #monitoring, #observability, #LI-Remote