What are the responsibilities and job description for the Staff Platform Engineer position at Upbound?
As a Staff Platform Engineer, you’ll build, deploy, and operate Upbound’s Internal Platform. It consists of both our internal developer platform and the cloud control planes that serve as the foundation of the Upbound Cloud SaaS offering. Our control planes are built on the open source Crossplane (https://crossplane.io/) project, the open, cloud native control plane based on Kubernetes. Upbound created Crossplane, and it is now a Cloud Native Computing Foundation (CNCF) Sandbox project.
In this role, you will...
- Serve as a senior technical leader and contributor to Upbound’s platforms
- Take ownership of business-critical features and product innovations that will delight and amaze our customers
- Support the full project lifecycle - discovery, analysis, architecture, design, documentation, building, migration, automation, and production-readiness
- Evaluate and identify appropriate technology platforms, including frameworks and technology stacks for delivering Upbound products
- Collaborate with the development teams to assess and recommend technologies that support the company’s organizational needs
- Take ownership of the health and reliability of the live production service and infrastructure, ensuring that SLOs/SLAs are consistently met
- Be entrusted to make technology decisions for the business, procuring the right technology and designing and implementing a self-service solution for the teams that consume Upbound infrastructure
- Contribute to the Kubernetes/Crossplane platform that Upbound Cloud is built on
- Communicate thoughtful and thorough designs and architecture for new initiatives
- Deliver high-quality, well-tested, and reliable functionality and services to production environments
- Demonstrate a strong operational understanding of service behavior and trade-offs in production with regard to scale, reliability, security, availability, etc.
- Troubleshoot and problem-solve to remediate production issues with your (and related) services
- Create clever solutions to complex problems by designing, building, and automating critical portions of the Upbound Cloud service infrastructure
- Report and fix bugs in private and public projects
You are a good fit if you have...
- 8 years of software engineering experience writing high-quality, reliable, and maintainable software
- 2 years of working experience with Kubernetes and its internals
- Hands-on programming experience with Go
- Worked on teams that have deeply internalized site reliability engineering philosophies in their culture, environments, and processes
- Productive familiarity with public cloud infrastructure: AWS, Azure, and GCP
- Advanced knowledge and experience with distributed systems design, principles, algorithms, and trade-offs
- Managed production Kubernetes deployments or have been responsible for deploying/managing workloads running on Kubernetes in production
- Architected and deployed highly scaled and reliable services, solutions, and infrastructure in multiple major cloud providers
- Incorporated modern operational and application delivery tools and methodologies into your production deployment workflows, such as HashiCorp tools (e.g., Terraform), CI/CD, infrastructure as code (IaC), and GitOps
- Full software lifecycle experience, having successfully taken multiple projects from early design to production deployment