What are the responsibilities and job description for the Site Reliability Engineer position at OANDA?
OANDA is a global leader in online multi-asset trading services, currency data, corporate payments and FX services.
Everyone at OANDA is focused on our vision to transform how our customers can meet all their currency needs. From our roots in 1996 providing free currency exchange information, to launching a multi-award-winning global FX and CFD trading business, to our recent venture into money transfer, OANDA is now a major global player.
OANDA is looking for a passionate Site Reliability Engineer to apply software development principles and practices to solve difficult operations problems.
As an SRE, you will be embedded in one of our development teams, acting as the champion for reliability best practices including observability, automation, high availability, fault tolerance, and full-lifecycle ownership. The perfect candidate for this role has a strong data-driven approach to improving the performance of our products in on-premise and cloud environments.
This is a full-time role, and we would like to meet you in the office at least 4 times per month.
Primary Duties:
- Tap into your passion for eliminating repetitive manual processes (toil) using automation. Draft playbooks, and conduct tabletop and chaos engineering exercises to avoid operational underload and identify opportunities for improvement.
- Solve difficult performance and reliability problems. Perform code review for your development peers to ensure reliability, observability, and security are key pillars of our work.
- Collaborate with product managers and business stakeholders to set and maintain Service Level Objectives (SLOs) and metrics that are representative of our customer experience.
- Participate in a cross-functional on-call rotation to support the team's code in production, lead the blameless post-mortem process, and feed remediation tasks back into the development pipeline.
- Articulate the SRE ethos to your peers and stakeholders, and help educate your colleagues on the application of SRE principles to achieve a healthy balance of new feature development and reliability initiatives.
Requirements:
- Minimum 3 years of experience in a similar position
- Strong experience (minimum 3 years) working in cloud-native and on-premise environments, across bare-metal, virtualized (VM), and containerized/orchestrated deployments (AWS, GCP, Docker, Kubernetes, Anthos, Cloudflare)
- Experience working with any infrastructure-as-code and configuration management tools (Ansible, Terraform, Helm, etc.)
- Minimum 3 years of programming experience with any of Python, C, JavaScript, or Go
- Experience with incident response and a security-focused, production-ready mindset
OANDA Global Corporation is a diverse and global team with offices around the world. We value the unique skills and experiences each individual brings to OANDA. We are committed to creating and sustaining a collegial work environment in which all individuals are treated with dignity and respect and one which reflects the diversity of the community in which we operate. We provide an inclusive and accessible environment for everyone.
Candidates selected for an interview will be contacted directly. If you require accommodation during the recruitment and selection process, please let us know. We will work with you to provide as seamless a recruitment experience as possible.