What are the responsibilities and job description for the Principal Data Engineer position at KCF Technologies?
Where You Come In:
We are currently seeking a Principal Data Engineer to join our growing team! At KCF, you will operate within and influence several cross-functional squads alongside other engineers and stakeholders (Software, Hardware, DevOps, Sales, Technical Operations, Product Owners, and Machine Learning). As a Principal Data Engineer, you will provide high-level individual technical contributions, lead a team of data scientists and analysts, and help build and maintain KCF’s SMARTdiagnostics machine health platform, which stores and processes industrial IoT sensor data to provide analytics and insights to our users. This work supports our goal of zero waste, zero downtime, and zero safety incidents for all of industry.
This role can be 100% remote-based. With our Work From Home, Work From Anywhere model, KCF employees are spread across 27 different U.S. states. We advocate for owning your work - you define how you do it and where you do it. Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time.
This is starting to sound like your next challenge, right? Read on for more info!
Essential Responsibilities:
- Lead the design, development, and deployment of artificial intelligence, machine learning, and statistical models.
- Lead the design and optimization of high-performance data pipelines and analytical tools, ensuring scalability and efficiency to support growing data needs.
- Build and manage databases and data warehouses, and assist in implementing our new data platform to ensure scalability, reliability, and performance.
- Drive innovation by exploring emerging technologies and staying abreast of industry trends, applying this knowledge to enhance analytics solutions and methodologies.
- Manage a team of data scientists and analysts, provide technical mentorship to direct reports and guidance for external contractors, and foster growth while ensuring alignment with industry best practices.
- Collaborate with team members to understand their data requirements and provide them with access to clean, reliable data.
- Partner closely with stakeholders to understand business challenges and devise technical solutions using data-driven approaches, ensuring alignment with organizational goals.
- Lead working groups, defining scope requirements and steering projects through the entire lifecycle, from inception to implementation and maintenance.
- Maintain comprehensive and detailed documentation of model source code, data pipelines, processes, and best practices, ensuring clarity and consistency for the team and stakeholders.
- Ensure compliance with data privacy regulations and ethical standards in data handling while implementing governance practices to maintain data integrity and security.
- Design and conduct hypothesis testing to validate assumptions and assess the impact of changes or new features within the product (an illustrative sketch follows this list).
- Work with domain experts, engineers, and other teams to understand business needs, gather domain-specific knowledge, and integrate diverse perspectives into data-driven solutions.
- Optimize existing algorithms and processes for efficiency, scalability, and performance in handling large-scale datasets and real-time applications.
- Analyze customer behavior patterns and market trends, providing insights to drive strategic decision-making and product enhancements.
- Architect machine learning platforms and workflows, designing scalable, efficient systems that facilitate the development and deployment of advanced analytics and AI solutions.
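
To make the hypothesis-testing responsibility above concrete, here is a minimal, illustrative Python sketch using SciPy. It assumes a hypothetical before/after comparison of a sensor-derived metric; the data, variable names, and significance threshold are invented for the example and do not describe KCF's actual tooling.

```python
# Illustrative sketch only: a simple two-sample hypothesis test comparing a
# hypothetical machine-health metric before and after a product change.
# All data and names are synthetic and invented for this example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical metric sampled before and after a feature rollout
baseline = rng.normal(loc=1.00, scale=0.15, size=500)
treatment = rng.normal(loc=0.95, scale=0.15, size=500)

# Welch's t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(treatment, baseline, equal_var=False)

alpha = 0.05  # assumed significance threshold for the example
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: the change had a measurable effect.")
else:
    print("Fail to reject the null hypothesis: no measurable effect detected.")
```

In practice the test design (sample size, metric choice, correction for multiple comparisons) would depend on the specific product question being asked.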
Qualifications:
- MSc or PhD in Statistics, Machine Learning, Computer Science, or an equivalent field, with 7 years of applied ML experience.
- Extensive experience with advanced statistical and machine learning techniques, including Bayesian methods, regression analysis, multi-label classification, and other predictive modeling approaches, with a proven ability to apply these methods to solve real-world problems effectively and innovatively.
- Demonstrated leadership and management skills, with a proven track record of successfully leading data science teams through complex projects from conception to completion, fostering collaboration, innovation, and professional growth among team members.
- Demonstrated knowledge of and ability to work with AWS, Google Cloud, or other cloud-based solutions to train models, set up data pipelines, and set up inference engines. Professional certifications such as AWS Certified Solutions Architect or a Google Cloud certification are preferred.
- Experience with microservices and deployment of ML models.
- Experience working with tools like AWS SageMaker, Hadoop, Spark, and Delta Lake.
- Deep understanding of Python's data libraries and frameworks for analytics.
- Understanding of the following concepts: feature stores, data lineage, A/B testing, model scoring/feedback.
- Experience in feature engineering, including expertise in identifying, developing, and implementing innovative features that significantly enhance model performance and data utility.
- Experience applying digital signal processing techniques such as Fourier transforms, filtering, and wavelet analysis to time series data (see the sketch after this list).
- Excellent problem-solving skills with a focus on architecting robust and scalable analytics solutions.
- Ability to optimize SQL for high-performance data operations.
- Demonstrated strong technical communication skills, effectively conveying complex data and analytics concepts to diverse audiences, both technical and non-technical.
- Ability to design and execute data experiments to test hypotheses and validate insights quickly and effectively.
- Proficiency in validating and adopting emerging technologies and methodologies in analytics, data processing, and cloud computing.
- Strong background in data visualization and ability to communicate complex data insights effectively.
- Familiarity with concurrent programming concepts such as threads and asynchronous I/O.
- Experience in the IIoT (Industrial Internet of Things) space preferred.
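
As an illustration of the digital signal processing qualification above, here is a minimal Python/SciPy sketch that applies an FFT and a Butterworth band-pass filter to a synthetic time series. The sampling rate, signal frequencies, and filter band are assumptions chosen for the example, not details of KCF's SMARTdiagnostics pipeline.

```python
# Minimal sketch of the DSP skills referenced above: an FFT and a band-pass
# filter applied to a synthetic, vibration-like signal. All parameters here
# (sampling rate, frequencies, filter band) are invented for illustration.
import numpy as np
from scipy import signal

fs = 1000.0                      # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1.0 / fs)  # one second of samples

# Synthetic signal: 50 Hz and 120 Hz components plus noise
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
x += 0.2 * np.random.default_rng(0).standard_normal(t.size)

# Inspect frequency content via the FFT
freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
spectrum = np.abs(np.fft.rfft(x))
dominant = freqs[np.argmax(spectrum)]
print(f"Dominant frequency: {dominant:.1f} Hz")

# Isolate the 120 Hz component with a 4th-order Butterworth band-pass filter
sos = signal.butter(4, [100, 140], btype="bandpass", fs=fs, output="sos")
filtered = signal.sosfiltfilt(sos, x)
print(f"Filtered signal RMS: {np.sqrt(np.mean(filtered**2)):.3f}")
```

The same pattern (transform, inspect spectrum, filter) generalizes to real sensor streams, where window choice, sampling rate, and filter design would be driven by the physics of the monitored equipment.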