What are the responsibilities and job description for the SRE engineer (REMOTE) position at HawkB Inc?
Job Description
Focus is on Azure, Kubernetes, Ansible and monitoring tools.
Should have experience coding in at least one programming language.
Client is using Datadog and Splunk for monitoring.
No DevOps engineers, systems engineers, or scripters.
Need software developers who have a passion for infrastructure.
Certs are nice.
They are trying to focus on the main areas of the platforms. They will need to maintain the platforms and also be able to make changes to them. Best practices are not being applied across the life cycle. He wants his team to go in and automate his systems.
We would like to find a highly engaged engineer who is obsessed with code quality and self-healing infrastructure. You should be able to identify, troubleshoot, and resolve issues quickly and develop strategies to ensure our environment never experiences the same problem twice. Responsibilities include capacity planning, performance tuning, and automation/tools development. Expect to spend 50% of your time on software engineering activities, and be willing to approach any and all code changes using a test-driven development model. As a Site Reliability Engineer, you will have great influence on the way we design and deploy our services and infrastructure across the enterprise.
This position will be part of a great team that is developing exciting products and solutions and playing a key part in driving forward the electrification of transportation.
What you'll do:
- Apply an everything-as-code philosophy across configuration, infrastructure, and orchestration methodologies to ensure our production systems are fault tolerant and resilient.
- Work with product development teams to define and implement enhanced monitoring and logging solutions that improve observability and enable Service-Level Objectives (SLOs), ensuring a world-class customer experience.
- Participate in ground-up infrastructure design and planning for all future products and services.
- Be part of an on-call rotation and act as incident commander, helping drive incidents to resolution.
- Host blameless postmortems to share learnings, discover gaps, embrace transparency, and improve reliability across our services.
- Design and implement improvements to existing systems after reviewing past incidents and employ your systems knowledge to triage problems and tune resource usage.
What We're Looking For:
Basic Qualifications
- Bachelor's degree in Computer Science/Engineering or equivalent work experience required.
- 7 years of experience working as a software engineer focused on product or feature development.
- 3 years of experience with AWS.
- Must be comfortable reading and writing in any of the following: Go, C#/C++, Java, Python
Preferred Qualifications
- Experience with IaC/CM tools (Terraform, CloudFormation, Ansible, Chef, Puppet, Salt)
- Comfortable with one or more cloud service provider offerings: AWS (preferred), Azure, Google Cloud Platform
- Understanding of containers and related technologies such as Docker, Podman, Swarm, and Kubernetes in a production environment
- Understanding of networks, protocols, servers, storage systems, and the Linux operating system.
- Familiarity with common application- and system-level health monitoring systems (New Relic, Datadog, etc.)