What are the responsibilities and job description for the Data Scientist position at Flagship Pioneering, Inc.?
Who We Are
Generate Biomedicines, Inc. is a Flagship-backed, privately held biotechnology company on a mission to reimagine the drug discovery process through the use of cutting-edge machine learning techniques. Core to Generate’s approach is the development and application of novel machine learning algorithms to solve foundational problems in molecular and protein biology. Generate’s unique platform seeks to drive innovation at the intersection of machine learning and biology through deep collaborations between wet lab and dry lab scientists and engineers. We are seeking collaborative, relentless problem solvers who share our passion for impact to join us!
Since its inception in 2018, Generate has received over $50 million in venture funding, and its board of directors includes scientific and entrepreneurial luminaries such as Frances Arnold (Nobel Prize in Chemistry, 2018), Stéphane Bancel (CEO, Moderna), and Noubar Afeyan (Founder and CEO, Flagship Pioneering). Generate was founded by Flagship Pioneering, a venture creation firm based in Cambridge, MA that conceives, creates, resources, and develops first-in-category life sciences companies to transform human health and sustainability. Since Flagship’s launch in 2000, the firm has applied a unique hypothesis-driven innovation process to originate and foster more than 100 scientific ventures, including Moderna Therapeutics (NASDAQ: MRNA), Rubius Therapeutics (NASDAQ: RUBY), Indigo Agriculture, and Sana Biotechnology.
Position Summary
Generate is a data-driven and data-first biotech company that is productionizing computational protein design to deliver more effective therapeutics to patients at an unprecedented speed. Our generation platform is producing ever-increasing amounts of experimental data on the properties and functions of diverse in-silico generated and native proteins. We are seeking a talented and creative data scientist to maximize the utility of these data towards more effective generation methodologies and more potent medicines.
The successful candidate will build and apply new data analysis frameworks that will help guide our technology development process across the platform. This will involve a combination of statistical modeling, exploratory data analysis, automated dashboarding, and use of our in-house machine learning models. She/he will design, build, and deploy tools and reports that will be used by cross-functional teams of machine learning scientists, protein designers, and experimental scientists in a highly interdisciplinary setting.
Key responsibilities:
- Extract, transform, and analyze large protein design experimental datasets to determine how to improve our protein generation methodology
- Flexibly model and analyze multiple kinds of protein data including protein structure, sequence, and the results of biophysical and functional assays for interpretation in the context of project and platform goals
- Communicate key insights through presentations, visualizations, and reports to an interdisciplinary audience of protein scientists, protein designers, and machine learning scientists
- Contribute production-ready code to data analysis pipelines and deploy them into regular use
- Build dashboards and other automated tools to make protein generation results highly visible and accessible
Qualifications:
- MS in Computational Biology, Data Science, or a related field with demonstrated experience analyzing large biological datasets
- Proficiency in Python, the data science stack, and database querying (pandas, Jupyter Notebook, scikit-learn, DVC, Matplotlib, SQL, seaborn, datapane, etc.)
- Expertise in structural biology and published works in bioinformatics or computational biology journals are a plus
- Ability to work in a fast-paced environment and strong technical communication skills
- A self-starter attitude and willingness to dive into complicated biological dataset challenges