What are the responsibilities and job description for the Senior Infrastructure and Platform Engineer position at MPG Operations LLC?
Senior Infrastructure and Platform Engineer

We are looking for a Senior Infrastructure/Platform Engineer to join our WorldQuant-aligned Infrastructure team. The team is composed of multidisciplinary individuals with unrestricted access across a large environment. We believe that one cannot build a truly great service without the ability to make changes across the stack. We take great care to focus on solving real business problems, reducing operational overhead and working together as a team.

The platform and infrastructure team is responsible for the following areas, covering both engineering and operations:
- data modelling, database tuning & query optimization
- HPC job scheduling
- workflow management and batch processing
- container orchestration
- service discovery
- POSIX and object storage systems
- on premise:
  - bare metal compute (Linux) system tuning
  - configuration management and drift management
  - performance tuning
  - network configuration management
  - compute, storage, network system purchases / evaluations
- cloud(s): environment provisioning and management

Qualifications/Skills Required

We are looking for individuals with experience in two or more of the following areas:

HPC job scheduling
- Experience in environments at scale (e.g. billions of jobs per week/month)
- Understanding of cost metrics, preemption, job types, queuing, scheduler and optimizations
- Experience with products like HTCondor, Slurm, Spectrum LSF, Nomad, AWS Batch

Container Orchestration (Kubernetes)
- Experience with PSPs, Helm, admission/mutation controllers, PVs/PVCs, kube-router and BGP; more generally, a demonstrated ability to dig deep into the k8s projects to solve hard problems
- Experience with Docker & registries (e.g. Harbor, Artifactory, GCP container registry, AWS container registry)
- Mature approach to dealing with operational complexities and gaps of the Kubernetes platform

Storage Systems
- Experience deploying and managing petabyte-scale systems supporting varied workloads
- Mature approach to assessing price/performance, tiering and backup requirements
- Experience with products like GPFS, NetApp, Pure, Lightbits, Ceph, GCP PDs or other NVMe-specific products
- Familiarity with NVMe over Fabrics, POSIX, object storage and various modes of permissioning data

Linux
- Experience using configuration management systems (e.g. SaltStack, Ansible)
- Understanding of Linux kernel components (e.g. VFS, scheduler, memory management, network)
- Solid troubleshooting experience using gdb and OS & application tracing/profiling mechanisms
- Experience with some of Docker, LXD/LXC, Kerberos, eBPF and virtualization technologies

Workflow management and batch processing
- Experience with the challenges of workflow management in heavily multi-tenant environments
- Mature approach to dealing with/avoiding task failure and system failure
- Experience with products like Airflow, NiFi, GNUbatch, GCP Cloud Composer, AWS SageMaker

Software Engineering
- Proficient in OO development (we use Python), Git and CI/CD concepts
- Comfortable contributing to a large code-base with varied technologies

In addition to the above, the following qualifications always apply:
- Ability to review and/or extend open source platforms to satisfy business requirements
- A passion for technology and automation, a deep sense of curiosity and a willingness to always question
- A passion for in-depth understanding of technology, and for building large-scale systems
- Excellent verbal and written communication skills