What are the responsibilities and job description for the SRE Lead / Architect position at IT-SCIENT?
Job Description
10 Years of Experience Minimum
A DevOps-SRE will greatly benefit both IT operations and software development teams. Not only can a DevOps-SRE drive deeper reliability into systems in production, but the role will likely help IT, support and development teams
spend less time working on support escalations and give them more time to build new features and services.
Mandatory Skills:
Scripting: Python
DevOps Tools: Git, Jenkins, Ansible, Terraform, Docker & Kubernetes
Cloud: AWS or OpenStack
Monitoring: Prometheus, plus any logging tool (EFK, Splunk…)
Preferred Certifications: AWS Solutions Architect & CKA
Roles & Responsibilities:
Optimizing on-call rotations and processes: SREs will need to take on on-call responsibilities. The DevOps-SRE
role has a lot of say in how the team can improve system reliability through the optimization of on-call
processes. DevOps-SRE teams help add automation and context to alerts, leading to better real-time
collaborative response from on-call responders. Additionally, SREs can update runbooks, tools and
documentation to help prepare on-call teams for future incidents.
Fixing support escalation issues: DevOps-SREs can expect to spend time fixing support escalation cases. But as
your DevOps-SRE operations mature, your systems will become more reliable and you will see fewer critical
incidents in production, leading to fewer support escalations.
Conducting post-incident reviews: DevOps-SREs need to conduct post-incident reviews, document their findings and
act on what they learn. They are then often tasked with action items for building
or optimizing some part of the SDLC or incident lifecycle.
Building software to help operations and support teams: DevOps-SREs are in charge of proactively building and
implementing services that help IT and support teams work more effectively in their environments. This can be anything from
adjustments to monitoring and alerting to code changes in production.