What are the responsibilities and job description for the Data Architect I position at UST Global?
Description:
UST Global® is looking for a talented Database Reliability Engineer (Site Reliability / Cloud Operations) to work with one of the leading technology providers in the US. The ideal candidate can work creatively and analytically in a problem-solving environment and has excellent written and verbal communication skills, with the know-how to collaborate effectively with domain experts and the IT leadership team.
Job Responsibility:
Site Reliability Engineering, part of Cloud Operations, has an exciting mission: build, deploy, operate, scale and maintain company-wide platforms (PlaaS) for customer-facing SaaS solutions. While various development groups focus on building our platforms, Cloud Ops provides operational and engineering support for both the platforms and the product teams that use them.
Database Reliability Engineers (DBREs) are responsible for keeping the database systems that support all user-facing services running efficiently 24/7/365. DBREs combine database engineering, database administration and software craftsmanship, applying sound engineering principles, operational discipline and automation to databases. In that capacity, DBREs are peers to SREs and bring database expertise to the SRE and Infrastructure teams as well as our engineering teams.
We're supporting over 25 products across different regions in the public clouds, using a mix of database technologies: Cassandra, MongoDB, MySQL, Postgres.
We are looking for an experienced Database Engineer, passionate about SRE principles, who will work closely with the various engineering teams that are building cloud-native, customer-facing services.
Areas of Responsibility:
- Work on database reliability and performance aspects for core database infrastructure pieces that allow products to scale
- Ensure the highest level of uptime and Quality of Service (QoS) to customers through operational excellence
- Work with engineering teams on database architectural designs, performance optimization, environment build-out
- Apply SRE principles in your day-to-day activities
- Act as the main point of contact for production incidents, perform root cause analysis, and identify and resolve underlying problem patterns, while working towards developing automated, self-healing solutions
- Identify areas to improve service resiliency through techniques such as chaos engineering, performance & load testing, etc.
- Support and maintain globally distributed, multi-cloud database environments
- Document and automate common, repeatable tasks at large scale to streamline operational procedures and reduce the human footprint
- Adhere to Change Control policy requirements and availability mandates/requirements.
- Work in a multifaceted, fast-paced environment with distributed teams and inter-dependent services
- Cross-train and collaborate with other members of the distributed team
- Participate in a cross-regional on-call rotation using a follow-the-sun model.
Job Requirements:
- At least 3 years of relevant production experience supporting at-scale, highly available, mission-critical environments running at least one of the following open-source database management systems:
- NoSQL - MongoDB, CouchDB, Cassandra
- RDBMS - Percona XtraDB Cluster, MariaDB Galera Cluster, PostgreSQL
- Experience running database environments at scale in public clouds (AWS and/or Azure)
- Experience with infrastructure automation and configuration management tools such as Chef, Ansible, Puppet, Terraform
- Deep understanding of cluster management areas, such as adding/bootstrapping/removing nodes, scaling, consistency tuning, replication, and multi-datacenter configuration
- Experience in securing, monitoring, capacity planning, foolproof DR, and backup & recovery for distributed database systems
- Strong understanding of HA strategies including replication, clustering, sharding
- Experience in performance monitoring and storage performance optimization, tuning database server configurations, queries, and indexes
- Strong data modeling and data structure design skills
- Good understanding of Linux OS concepts and of the Linux/Unix shell
- Proficiency in at least one scripting language (e.g. Python, PHP, Perl, Ruby)
Desirable:
- Exposure to any other open-source DBMS solutions
- B.S. degree in Computer Science or related technical field
- Experience with monitoring software such as Prometheus, Grafana, New Relic
- Experience with containers and orchestration technologies such as Docker, K8s
- Experience working within software development or Internet-related industries, particularly in the context of a SaaS offering.
Qualities:
- Good interpersonal, verbal and written communication skills.
- The ability to communicate technical knowledge efficiently in a clear, concise and easy-to-understand manner.
- Able to work independently with minimal need for supervision.
- Participation in technical blogging, PoCs and community projects.
- Willing to learn new technologies and to adapt quickly
- Strong sense of humor