What are the responsibilities and job description for the Research Scientist, Speech, Audio, & Video (PhD) position at Meta?

We are looking for a Research Scientist to join FAIR's AudioBox team within the FAIR Communication Pillar to study speech and audio generation with video conditioning.
Our Pillar performs foundational AI research across speech, audio, machine translation, video, and EMG.
We build AI-powered spoken language technology to make it faster and easier for people to build community, connect with others around the world, and create compelling speech, audio, and video generative content.
Our mission is to develop state-of-the-art algorithms, and we focus on open research and scientific novelty.
We work on all aspects of AI for speech and audio processing, including speech recognition, speech synthesis, acoustic event detection and generation, and video-conditioned speech and acoustic synthesis.
As a Research Scientist, you will help us develop innovative models and algorithms at the cutting edge of AI Research.
We are looking for a research scientist with a focus on speech and audio synthesis plus experience with ML research in video or audiovisual modeling.
The ideal candidate has a strong background in automatic speech recognition, speech synthesis, acoustic modeling and generation, speech and audio generation with video conditioning, and general machine learning, along with a passion for speech technology and voice and audio interfaces.

Research Scientist, Speech, Audio, & Video (PhD) Responsibilities:

Perform research to tackle unsolved real-world problems and push the state of the art in speech and audio synthesis with video conditioning.
Independently design and implement algorithms, train state-of-the-art speech and audio synthesis models on large datasets, and evaluate their performance.

Minimum Qualifications:

Currently has or is in the process of obtaining a PhD in the field of Artificial Intelligence, a related field, or equivalent practical experience.
Degree must be completed prior to joining Meta.
Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
Research experience in one or more of these areas: speech recognition, speech synthesis, acoustic event modeling, speech generation with video conditioning, audiovisual understanding or generation, machine learning, deep learning, or related fields.
Experience working with machine learning libraries such as PyTorch or TensorFlow.
Knowledge of deep learning and neural networks.
Experience with scripting languages such as Python and shell scripts.
Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.

Preferred Qualifications:

Experience with developing scalable machine learning models in at least one of the following areas: automatic speech recognition, end-to-end ASR, unsupervised/semi-supervised training, on-device modeling, acoustic modeling, or relevant areas.
Experience with developing machine learning models for both server-based and embedded solutions.
Experience with large-scale model training, implementing algorithms, and evaluating ASR systems.
Proven track record of achieving significant results as demonstrated by publications at leading workshops, journals or conferences such as ICASSP, INTERSPEECH, or similar.
Experience taking ideas from research to production.
Experience solving complex problems and comparing alternative solutions, tradeoffs, and diverse points of view to determine a path forward.
Experience working and communicating cross-functionally in a team environment.

About Meta:

Meta builds technologies that help people connect, find communities, and grow businesses.
When Facebook launched in 2004, it changed the way people connect.
Apps like Messenger, Instagram and WhatsApp further empowered billions around the world.
Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology.
People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today, beyond the constraints of screens, the limits of distance, and even the rules of physics.

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer.
We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.
We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law.
Meta participates in the E-Verify program in certain locations, as required by law.
Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process.
If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.

Location/Region: Menlo Park, California
