What are the responsibilities and job description for the Data Engineer position at XFORIA Inc?
Job Description
Title: Data Engineer
Location: McLean, VA (Hybrid onsite - 3 days a week)
Important notes:
- We prefer a local candidate, since this opportunity is hybrid onsite in McLean, VA (3 days a week) and an in-person interview is required.
- Please send resumes only for candidates on your own W2 (no H1B transfers at this time).
Manager's call notes
Location: Must be on-site 3x a week in McLean, VA from Day 1
Duration: 6-month contract with opportunity for extension
Schedule: 40 hours/week
Required Skills: 3 years of Spark, 2 years of SQL, Python
Years of experience: 5
Interview Information -
Rounds: 2 rounds
Duration: 30 min | 2 hr. in-person interview
Additional Notes: MS Teams, Video Mandatory
Scheduling: 6/01 1:00-3:30, 6/02 10:00 & 11:30
Data Engineer Senior
Qualifications:
• Bachelor’s degree in Computer Science, Engineering, Data science or a related quantitative field.
• 5-6 years of relevant experience in the design and development of data pipelines processing large volumes and varieties of data (structured and unstructured data, writing code for parallel processing, XMLs, JSONs, PDFs)
• Hands-on programming experience in Hadoop, Spark, Python and SQL for data processing and analysis.
• Demonstrated ability to manage competing demands, prioritize work, and manage customer expectations.
• Strong verbal and written communication skills.
Required Technical Skills
• Advanced Python, SQL and Spark; very good familiarity with big data technologies like Hadoop, Sqoop, Hive, Ambari
• Prior experience working with AWS and Snowflake technologies
• Unix Shell script, Autosys batch scheduling
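As a point of reference for the Autosys batch-scheduling requirement, a minimal JIL job definition might look like the sketch below (the job name, machine, owner, and script path are all hypothetical):

```
insert_job: daily_data_ingest   job_type: CMD
command: /opt/etl/run_ingest.sh
machine: hadoop-edge-01
owner: etluser
start_times: "02:00"
description: "Nightly ingestion job (illustrative only)"
```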
Responsibilities
• Cleanse, manipulate and analyze large datasets (Structured and Unstructured data – XMLs, JSONs, PDFs) using Hadoop platform.
• Develop Python, PySpark, Spark scripts to filter/cleanse/map/aggregate data.
• Build dashboards in R/Shiny for end-user consumption
• Manage and implement data processes (Data Quality reports)
• Develop data profiling, deduping logic, matching logic for analysis
• Use Python, PySpark and Spark for data ingestion.
• Develop big data programs on the Hadoop platform.
• Present ideas and recommendations to management on the best use of Hadoop and other technologies.
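To give a concrete flavor of the deduping/matching logic mentioned above, here is a minimal sketch in plain Python (a real implementation would typically run as a PySpark job; the record fields are hypothetical):

```python
# Illustrative sketch of deduping via a normalized match key.
# In production this logic would run at scale in PySpark;
# plain Python is used here so the example is self-contained.

def normalize(record):
    """Build a match key by lowercasing and trimming name and email."""
    return (record["name"].strip().lower(), record["email"].strip().lower())

def dedupe(records):
    """Keep the first record seen for each normalized match key."""
    seen = {}
    for rec in records:
        key = normalize(rec)
        if key not in seen:
            seen[key] = rec
    return list(seen.values())

customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada lovelace ", "email": "ADA@example.com"},  # duplicate after normalization
    {"name": "Grace Hopper", "email": "grace@example.com"},
]

unique = dedupe(customers)
print(len(unique))  # 2 distinct customers
```

The same idea maps directly onto Spark's `dropDuplicates` over a derived key column when the data no longer fits in memory.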
- 3 years of Python and Spark in Hadoop/AWS platforms
- 3 years of SQL
- Should communicate well, be able to develop unit tests, and be comfortable working with users
- Python for building data pipelines
- Unix shell scripting and Autosys for automation
- Currently on AWS; experience on either the HDP or AWS platform is needed
- In-person interview is required. Candidates will be asked to write sample code for many questions; a whiteboard is fine, and we will provide pen and paper.
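Since the role calls for developing unit tests, a small sketch of what that looks like with Python's standard unittest module follows (the `cleanse_amount` helper is hypothetical, invented here to have something to test):

```python
# Hypothetical unit-test sketch: testing a small data-cleansing helper
# with the standard unittest module.

import unittest

def cleanse_amount(raw):
    """Strip currency symbols and commas; return a float, or None if empty."""
    text = raw.strip().lstrip("$").replace(",", "")
    return float(text) if text else None

class CleanseAmountTest(unittest.TestCase):
    def test_strips_currency_formatting(self):
        self.assertEqual(cleanse_amount("$1,234.50"), 1234.5)

    def test_empty_input_returns_none(self):
        self.assertIsNone(cleanse_amount("  "))

# Run the tests programmatically (unittest.main() would also work from the CLI).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanseAmountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```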