What are the responsibilities and job description for the SRE position at Experis?
Job Description
Experis' client is currently seeking an SRE / Site Reliability Engineer for a long-term contract. The role is hybrid: 3 days per week on-site in Jersey City, NJ, Plano, TX, or Charlotte, NC, and 2 days remote.
Due to client rules, Experis is unable to work with 3rd parties. All candidates must be able to work on Experis' W2 directly.
Main skill sets: Azure, AWS, OpenShift, Splunk, Ansible
Details:
- Responsible for reliability and support of the Cloud Platform, including Public Cloud (Azure/AWS/Google) services.
- Monitor and troubleshoot Azure/AWS/Google environment performance issues, connectivity issues, security issues, etc.
- Perform deep dives into systemic and latent reliability issues, incident management, and problem management.
- Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
- Perform blameless RCA, partner with engineering and operation teams across the organization to roll out fixes.
- Identify and drive opportunities to improve automation for the cloud services; scope and create automation for deployment, management, and visibility of our services.
- Evaluate and automate the scaling and capacity requirements within Azure environments.
- Engage with engineering teams throughout the full lifecycle from design, engineering, deployment, & operations.
- Partner with risk and compliance teams to bring visibility and implement the right controls and policies in the Cloud Platform.
- Ensure resiliency during implementation and identify/fix resiliency problems by collaborating with engineering teams
- Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams.
- Participate in 24x7 on-call coverage under a follow-the-sun model.
- BS /MS degree in Computer Science or related technical field involving systems or equivalent practical experience.
- Minimum 4 years of hands-on experience maintaining cloud platforms on a major cloud service provider.
- Experience working on Azure operations and Administration.
- Azure/Terraform/AWS/Google certifications are a plus.
- Strong experience in implementing, monitoring, and maintaining Microsoft Azure solutions, including major services related to Compute, Storage, Network and Security
- Experience with monitoring tools such as Prometheus or Dynatrace, as well as cloud native tools like Azure Monitor and Log Analytics
- Understanding of cost management, inventory management, and the FinOps model.
- Strong understanding of and background in working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and SSO solutions such as PingIdentity or Okta.
- Advanced knowledge of DNS, DHCP, Kerberos and Windows Authentication
- Experience with IaC with Terraform
- Python, Ansible and shell scripting
- Experience with CI/CD tools such as Git and Jenkins, and familiarity with using a GitOps model.
- Excellent understanding of Linux /Windows operating systems administration
- Systematic problem-solving approach, sense of ownership and drive
- Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.