Site Reliability Engineer (SRE)

Edison Systems
Alpharetta, GA | Full Time
Posted on 8/1/2024 | Closed on 8/23/2024


Job Details

Site Reliability Engineer (SRE)
Location: Alpharetta, GA (onsite, local candidates only)
Experience: 10 years
Client: Equifax
Nice-to-Have Skills
Big Data Processing: ETL/ELT experience
Scripting Languages: Groovy, Python
Cloud Certification: Relevant certifications in cloud technologies
Job Description
Seeking an experienced Site Reliability Engineer who can operate independently with limited guidance and oversight. This individual will be passionate about end-user experience and will be part of a tight-knit, distributed engineering team developing and delivering a comprehensive data operations management solution for Equifax's Data Fabric Platform. The SRE plays a critical role across the entire SDLC, from coding and scaling to ensuring production stability, including responding to on-call incidents.
Must-Have Skills
General Experience: 10 years of experience in software engineering, systems administration, database administration, and networking. System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes), and shell scripting.
Cloud-Native Application Development: 3 years of solid experience developing and supporting cloud-native applications. Experience with cloud-based security: IAM, AuthZ.
End-User Application Experience: 4 years of experience as an SRE supporting an end-user-facing application, e.g., a web/mobile/desktop app that includes UI, APIs, and backend systems.
Development Experience: 2 years of general proficiency with Java or JavaScript/NodeJS
Frontend Experience: Experience with Angular, JavaScript, TypeScript, or modern web application development frameworks
Architecture Knowledge: Understanding of modular systems, performance, scalability, and security
Agile Experience: Agile development mindset and experience
Service-Oriented Architecture: Knowledge of RESTful web services, JSON, Avro
Application Troubleshooting: Debugging, performance tuning, production support
Documentation Skills: Strong written and verbal communication
General SDLC: Experience with CI/CD and release management concepts, and the ability to use tools such as Jenkins/Bamboo. Understanding of Google Cloud Platform big data services such as BigQuery, Dataflow, Pub/Sub, GCS, and Composer/Airflow, or similar AWS solutions such as Redshift, SNS, SQS, S3, and Kinesis.
Tech Stack
Frontend: Angular 17, JavaScript, TypeScript, HTML, SCSS, Webpack Module Federation, Tailwind CSS, Angular Material, Angular Elements
Backend: Java (JDK 17), Spring Framework 6.x, Spring Boot 3.x, NestJS 10.x, REST and GraphQL microservices, NodeJS
Tools & Frameworks: Nx build management, monorepo architecture, Jenkins CI/CD, Fortify, Sonar, GitHub
Cloud & Data: Google Cloud Platform (GKE, Composer/Airflow, Dataflow/Apache Beam, BigQuery, Bigtable, Firestore, GCS, Pub/Sub, Vertex AI), Terraform, Helm charts, GitOps
Other Technologies: WebSockets, SSE, event-driven architecture
Data Fabric is a modern, cloud-native data management platform on Google Cloud Platform that allows Equifax to acquire and curate data, perform entity resolution, and ingest it into a single environment. It is deployed globally across multiple regions, is highly secured, and complies with regional and internal regulatory controls under strict governance and oversight. Business units, data scientists, and many other stakeholders use APIs to consume data managed by Data Fabric and operate data exchanges that monetize data through B2B and B2C channels.
The data operations management solution consists of:
A web portal UI/UX that provides a single point of access for all data management and data reliability engineering
A suite of backend API services that serve the UI and integrate with low-level Data Fabric and other third-party system APIs
A modern data lakehouse (data lake, data warehouse, batch and streaming ELT pipelines)
The data operations roadmap envisions a set of rich management capabilities, including:
Serving a large community of geographically dispersed data operations stakeholders
Data quality and observability management to detect, alert on, and prevent data anomalies
Troubleshooting, triaging, and resolving data and data pipeline issues
OLAP, batch and streaming big data processing, and BI reporting
MLOps
Real-time dashboards, alerting and notifications, case management, user/group management, AuthZ, and many other foundational capabilities
Environment
Culture: Fast-paced, creative, results-oriented
Team Structure: Agile, working in 2-week sprints using Aha and Jira for project management
Expectations: Self-starters who can work independently with limited guidance, delivering solutions that end-users value and love
General Responsibilities
Contribute to Development Activities: The SRE is expected to participate in SDLC activities, including design, development, testing, deployment, and operations, covering both frontend and backend.
Cross-Functional Work: Collaborate with global teams to integrate with existing internal systems and Google Cloud Platform.
Issue Resolution: Triage and resolve product or system issues, ensuring quality and performance
Documentation: Write technical documentation, support guides, and runbooks
Agile Practices: Participate in sprint planning, retrospectives, and other agile activities
Compliance: Ensure software meets secure development guidelines and engineering standards
SRE Accountability
General: Use coding, automation, and software engineering principles to ensure scalability, performance, and reliability in an efficient, toil-free way
IaC: Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with the cloud CLI, and programming with the cloud SDK).
CI/CD: Build CI/CD pipelines for the build, test, and deployment of application and cloud architecture patterns, using platform (Jenkins) and cloud-native toolchains.
Automation: Build automated tooling that deploys service requests to push changes into production. Build comprehensive, detailed runbooks to manage the detection, remediation, and restoration of services.
Change Management: Work closely with the dev team to ensure all DevSecOps issues are addressed in a timely manner, in compliance with Equifax security policies and the Engineering Handbook.
Incident Management: Solve problems and triage complex distributed architecture service maps. Be on call for high-severity application incidents and improve runbooks to reduce MTTR.
RCA and Postmortems: Lead root cause analysis and blameless postmortems, and own the call to action to prevent recurrence.
Customer Focus: Address service disruptions and downtime to ensure end-customer needs are met, and drive processes for a flawless customer experience.
Reliability and Availability: Ensure SRE golden signals and SLIs are monitored and that SLOs and SLAs are honored within error budgets. Work closely with devs, QE, POs, and other stakeholders, providing continuous feedback on uptime, scalability, and reliability, and influence best practices with the aim of providing excellent operational experiences.
Reliability Roadmap: Own the reliability roadmap by taking a holistic view of all data operations management capabilities, including participating in Production Readiness Reviews (PRRs) and working with stakeholders to ensure DR plans are in place.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
This job has expired.