What are the responsibilities and job description for the Senior Site Reliability Engineer position at Circle?
You will aspire to our four core values:
- Multistakeholder - you have dedication and commitment to our customers, shareholders, employees and their families, and local communities.
- Mindful - you seek to be respectful, to listen actively and to pay attention to detail.
- Driven by Excellence - you are driven by our mission and our passion for customer success, which means you relentlessly pursue excellence, you do not tolerate mediocrity and you work intensely to achieve your goals.
- High Integrity - you seek open and honest communication, and you hold yourself to very high moral and ethical standards. You reject manipulation, dishonesty and intolerance.
Here is our team hierarchy for individual contributors:
Principal Site Reliability Engineer (VI)
Senior Staff Site Reliability Engineer (V)
Staff Site Reliability Engineer (IV)
Senior Site Reliability Engineer (III)
What you’ll be responsible for:
As a Senior Site Reliability Engineer at Circle, you’ll build out and maintain Circle’s infrastructure estate to serve a growing worldwide customer base across multiple regions on public cloud providers. You’ll use your experience, knowledge and skills to ensure Circle’s products and core systems are running in a consistent, reliable and performant manner. You’ll get the opportunity to develop your skills, collaborate across Circle teams and continue to learn in a fun, collaborative, iterative, and rapidly changing environment.
What you’ll work on:
- Support multiple development teams with an agile, responsive CI/CD platform to deliver high-quality builds with measurable performance and quality
- Build, maintain, improve, scale and secure cloud infrastructure and resources using IaC tools (Terraform, CloudFormation, Pulumi)
- Automate operational tasks via Go, Python and serverless solutions (AWS Lambda, Kubernetes Jobs)
- Design, manage and monitor Kubernetes clusters for multiple production workloads
- Drive forward our blockchain infrastructure by creating and managing blockchain nodes across a wide variety of blockchains, including Algorand, Ethereum, Hedera, Flow, Solana, Stellar and Tron
- Participate in an on-call rotation to mitigate disruption for any production systems and conduct root cause analysis
- Plan and test disaster recovery scenarios for a highly available microservices architecture
- Collaborate with the Security team to create and maintain security-focused tools and frameworks and maintain a top-class security posture
- Engage and mentor team members and help grow and scale the team
What you’ll bring to Circle (not all required):
Senior Site Reliability Engineer (III)
- 4 years in DevOps or SRE roles, with a focus on tooling, automation and infrastructure on a major public cloud provider
- Proficiency in coding and/or scripting in Python, Shell or Go
- At least 3 years of combined experience building and maintaining CI/CD platforms and supporting agile engineering teams building microservices
- Experience with:
  - building Docker images and deploying containers in Kubernetes clusters
  - any modern CI/CD platform with complex gates and workflows
  - Blue-Green, Canary and A/B Testing deployment strategies
  - distributed blockchain systems, running and maintaining blockchain full nodes
  - database technologies (PostgreSQL, Redis, Elasticsearch)
  - migrating and transforming large, complex datasets from diverse sources, structures and formats
  - data warehousing services (AWS Redshift, Databricks, Snowflake)
- Knowledge of network routing, DNS, load balancing and edge networking
- Knowledge of APM, RUM, monitoring and telemetry tools
- Experience with Helm charts and with deploying and maintaining Kubernetes clusters
- Experience writing infrastructure as code with Terraform or CloudFormation and using IaC to deploy resources in AWS, Azure, GCP or any other public cloud provider
- Strong skills in observability, troubleshooting and performance optimization
- Ability and eagerness to deep dive into understanding, debugging and improving any layer of the tech stack
- Strong communication skills and the ability to explain technical concepts to peers and stakeholders
Staff Site Reliability Engineer (IV)
All the requirements of a Senior Site Reliability Engineer and:
- 7 years in DevOps or SRE roles, with a focus on tooling, automation and infrastructure on a major public cloud provider
- Experience leading teams technically on architecture and system design
- Deep understanding of and experience with:
  - API design and REST principles
  - Cloud services (AWS, Google Cloud, Microsoft Azure, etc.)
  - Container orchestration systems such as Kubernetes, EKS or ECS
  - SQL databases and schema design
- Deep focus on coding standards and code quality, with a desire for great test coverage
Senior Staff Site Reliability Engineer (V)
All the requirements of a Staff Site Reliability Engineer and:
- 10 years in DevOps or SRE roles, with a focus on tooling, automation and infrastructure on a major public cloud provider
- Expertise in many areas of system availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
- Command of best practices in automation, system design, and improvements to system resilience.
Principal Site Reliability Engineer (VI)
All the requirements of a Senior Staff Site Reliability Engineer and:
- 12 years in DevOps or SRE roles, with a focus on tooling, automation and infrastructure on a major public cloud provider
- Experience building cross-functional partnerships to align teams with external stakeholders.
- A champion for setting coding standards and code quality, with a desire for great test coverage to enable continuous delivery.