What are the responsibilities and job description for the Data Engineer position at Vantage Bank Texas?
Description
JOB CLASSIFICATION
Full / Part-time: Full-time
Hours Per Week: 40
Location: 1401 19th Street, Hondo TX 78861
The Data Engineer is part of a high-performing Data Engineering team that will play a crucial role in designing, developing, and maintaining scalable data pipelines and infrastructure to support our banking operations. You will collaborate with cross-functional teams to ensure efficient data flow and implement robust data solutions to drive business insights and decision-making. This role offers an exciting opportunity to work at the intersection of finance and technology, leveraging cutting-edge tools and techniques to optimize data management processes within a dynamic banking environment.
The Data Engineer will be responsible for architecting and deploying a highly performant data platform that supports analytics-related initiatives across the enterprise. They will build complex ETL and ELT pipelines to consolidate data from multiple disparate sources into the bank’s enterprise data warehouse and take ownership of the bank’s enterprise analytics data infrastructure.
ESSENTIAL DUTIES
The duties listed below are not exhaustive and may not encompass all responsibilities assigned to individuals in this role. The incumbent may also be tasked with performing other related duties as assigned.
1. Serves as a subject matter expert on the bank’s enterprise data architecture.
2. Provides support for the bank’s analytics and data-driven initiatives.
3. Translates complex engineering and technical concepts to senior management and non-technical employees to enable understanding and drive informed business decisions.
4. Design, develop, and maintain scalable data pipelines and ETL processes to collect, process, and store large volumes of structured and unstructured data.
5. Uphold data governance principles and standards by defining and enforcing data policies, procedures, and standards across the organization.
6. Ensure compliance with regulatory requirements and industry standards related to data privacy, security, and confidentiality.
7. Develop and maintain metadata management processes and data dictionaries to facilitate data discovery, lineage, and governance.
8. Take complete ownership of the data engineering processes, overseeing the entire lifecycle from conceptualization through design, development, and deployment.
9. Stay abreast of emerging technologies, industry trends, and best practices in data engineering and data governance, and apply them to enhance existing systems and processes.
10. Designs and builds large, complex data sets; integrates and extracts relevant information from large volumes of structured and unstructured data (internal and external) to enable analytical solutions.
11. Leads efforts to develop scalable, efficient, automated solutions for large scale data analyses, model development, model validation and model implementation.
12. Provides guidance regarding data management approaches and data pipeline methodologies to team members.
13. Contributes to documentation efforts centered around data governance, data management, and data architecture.
14. Demonstrates strong individual planning and project management skills, with the ability to juggle multiple tasks and priorities.
15. Provide technical guidance and support to team members, fostering a culture of collaboration, learning, and innovation.
16. Work closely with contractors and external vendors to enhance the bank’s data architecture.
Requirements
These specifications are general guidelines based on the minimum experience normally considered essential to the satisfactory performance of this position. The requirements listed below are representative of the knowledge, skill and/or ability required to perform the position in a satisfactory manner. Individual abilities may result in some deviation from these guidelines.
1. Bachelor's degree in computer science, mathematics, statistics, economics, or other quantitative discipline.
2. A minimum of 3 years of proven experience designing, building, and maintaining data pipelines and infrastructure in a production environment.
3. Ability to translate business requirements into database models, data processing pipelines, application programming interfaces (APIs), and data tooling.
4. Proficiency in Python with a strong ability to write efficient and optimized code for data processing and manipulation with data frames. Experience coding in notebook environments such as JupyterLab, Zeppelin, or Databricks for interactive data exploration, analysis, and collaboration.
5. Demonstrated expertise and thorough familiarity with SQL, covering query optimization, database design, and data manipulation across various relational database management systems.
6. Proficient knowledge of relational databases such as Microsoft SQL Server, MySQL, or PostgreSQL, or of NoSQL databases, coupled with proficiency in data warehousing technologies such as Azure SQL Data Warehouse, Databricks, Snowflake, BigQuery, or Redshift.
7. Strong understanding of data modeling, schema design, and database optimization techniques, with demonstrated experience implementing these concepts in real-world projects to ensure efficient data storage, retrieval, and analysis.
8. Strong written, oral, and presentation communication skills with the ability to shape messages and content to audiences of widely varying roles and technical backgrounds.
9. Strong problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment.
PREFERRED SKILLS
· Experience working in regulated environments (e.g., Financial Services).
· 4-6 years of demonstrated experience in ETL/ELT development, database administration, software engineering, or analytics.
· Experience with data governance frameworks, tools, and practices, including data lineage, data cataloging, and metadata management.
· Experience with any of the following cloud data and analytics technologies:
o (Most Preferred) Azure – Azure Databricks, Azure Data Lake, Event Hubs, SQL Data Warehouse, Azure Data Factory
o AWS – Elastic MapReduce (EMR), Athena, Redshift, Kinesis, Glue, S3
o GCP – BigQuery, Bigtable, Dataflow, Dataproc, Google Cloud Storage
· Experience with machine learning, generative AI, and LLMs.