What are the responsibilities and job description for the SRE Engineer position at ISmile Technologies?
Job Title: Senior Site Reliability Engineer (SRE)
Location: Hybrid - 181 West Madison Street, Chicago, Illinois, 60602, United States (Must currently live in the area)
Duration: 12 months
Job Description
We are seeking a highly experienced Senior Site Reliability Engineer (SRE) to join our team. The ideal candidate will have extensive experience in troubleshooting and monitoring applications, with a strong background in SRE practices and tools. This role requires close collaboration with both application and infrastructure teams to ensure reliability goals are met and maintained.
Major Duties
- Develop, collaborate on, and implement SRE best practices across the entire technology stack.
- Collaborate with both application and infrastructure teams to establish reliability goals.
- Manage and maintain observability solutions for comprehensive monitoring.
- Play an important part in incident response and post-incident reviews, driving continuous improvement.
- Manage Disaster Recovery exercises.
- Conduct access review/audit activities.
- Participate in capacity planning and performance optimization efforts.
- Work with senior staff and management on service delivery improvements.
Qualifications
- Strong advocate for SRE principles and culture.
- Broad understanding of application and infrastructure components.
- Understanding of enterprise architecture patterns and best practices.
- Proficiency in incident response and post-mortem analysis.
- Experience in Terraform, Java, SpringBoot, Spring Framework, SQL/NoSQL databases.
- Knowledge of monitoring tools such as Datadog, Prometheus, and Grafana.
- Excellent oral and written communication skills.
- Highly flexible and adaptable to change.
- Strong analytical and problem-solving skills.
- A positive, goal-oriented attitude with a focus on service delivery.
- Strong interest in working with business teams on issue resolution.
- SRE Sr Engineer Skillset – Datadog, Prometheus, and Grafana.
- Azure Cloud.
- Java – Troubleshooting application code and monitoring apps.
- SpringBoot and Spring Framework.
- SQL/NoSQL.
- UNIX.
- Banking or Financial Services experience is a huge plus.
- Terraform - nice to have.
- A college or university degree in Computer Science.
- 8-10 years of proven work experience in a relevant field.