What are the responsibilities and job description for the Senior Site Reliability Engineer position at Atlas Search?
This firm is a leading provider of alternative investment solutions with approximately $63 billion of assets under management and over 600 employees worldwide. They believe in leveraging technology and data to maintain cutting-edge solutions, making it an exciting environment for tech enthusiasts like yourself.
They’re currently seeking an enthusiastic Senior Site Reliability Engineer to join the team and play a key part in leading the implementation of their tooling, applications, infrastructure, and processes. You will work with the latest infrastructure-as-code, containerization, configuration management, and monitoring solutions at an innovative firm that offers an excellent culture and great perks.
They offer a range of benefits, including top-tier medical, dental, vision, and life insurance plans, strong 401(k) matching contributions, daily onsite breakfast and lunch, an onsite gym, fitness reimbursement, generous PTO, a $5K stipend for training and development, and more.
Base: $150-200K plus a bonus (total compensation $200-500K), depending on interview performance, years of experience, level of education, and skill set.
Location: Midtown Manhattan
Hybrid: 4 days a week onsite
The three main elements of the position are:
- System observability
- Engineering initiatives
- Production operations
Responsibilities:
- Proactively monitor system performance, address issues, identify gaps, and implement effective solutions.
- Work with system stakeholders to establish best practices for observability, ensuring business-critical systems and applications are reliable and resilient.
- Develop tools, monitoring systems, and automation using languages such as PowerShell, Python, Go, and SQL.
- Collaborate closely with development and platform teams, both locally and globally, to resolve problems and provide solutions.
Requirements:
- Strong understanding of observability tools for microservices, such as Prometheus, Grafana, and OpenTelemetry.
- Experience with metrics and tracing instrumentation; familiarity with the LGTM stack and PromQL is a plus.
- Proven experience working within a large, complex technical environment, encompassing both modern and legacy systems built with various languages and architectures.
- Strong familiarity with CI/CD tools.
- Highly proficient in coding with Python, PowerShell, and Go.