<h1>Machine Learning Researcher, Audio</h1><p style="min-height:1.5em"><strong>Location:</strong> San Francisco, CA or Remote (US)</p><p style="min-height:1.5em"></p><p style="min-height:1.5em"></p><h2>About Bland</h2><p style="min-height:1.5em">At Bland.com, our mission is to empower enterprises to build AI phone agents at scale. Based in San Francisco, we are a fast-growing team reimagining how customers interact with businesses through voice. We have raised $65 million from leading Silicon Valley investors, including Emergence Capital, Scale Venture Partners, Y Combinator, and founders of Twilio, Affirm, and ElevenLabs.</p><p style="min-height:1.5em"></p><p style="min-height:1.5em">Voice is quickly becoming the primary interface between businesses and their customers. We are building the models and infrastructure that make those interactions feel natural, reliable, and genuinely human.</p><p style="min-height:1.5em"></p><p style="min-height:1.5em"></p><h2>The Role: Machine Learning Researcher, Audio</h2><p style="min-height:1.5em">As a Machine Learning Researcher at Bland, you'll be working on foundational research and development across the core components of our voice stack: speech-to-text, large language models, neural audio codecs, and text-to-speech. Your work will define how our agents understand, reason, and speak in real time at enterprise scale.</p><p style="min-height:1.5em"></p><p style="min-height:1.5em">This is not a narrow research role. You will take ideas from theory to large-scale training to production inference systems serving millions of calls per day. 
You will design new modeling approaches, validate them with rigorous experimentation, and collaborate with engineering teams to deploy them into real customer environments.</p><p style="min-height:1.5em"></p><p style="min-height:1.5em"></p><h2>What You Will Do</h2><p style="min-height:1.5em"><strong>Build and Scale Next-Generation TTS Systems</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Design and train large-scale text-to-speech models capable of expressive, controllable, human-sounding output.</p></li><li><p style="min-height:1.5em">Develop neural audio codec-based TTS architectures for efficient, high-fidelity generation.</p></li><li><p style="min-height:1.5em">Improve prosody modeling, question inflection, emotional expression, and multi-speaker robustness.</p></li><li><p style="min-height:1.5em">Optimize for real-time, low-latency inference in production.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"><strong>Advance Speech-to-Text Modeling</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Build and fine-tune large-scale ASR systems that are robust to accents, noise, telephony artifacts, and code-switching.</p></li><li><p style="min-height:1.5em">Leverage self-supervised pretraining and large-scale weak supervision.</p></li><li><p style="min-height:1.5em">Improve transcription accuracy for real-world enterprise scenarios, including structured extraction and conversational nuance.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"><strong>Pioneer Neural Audio Codecs</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Research and implement neural audio codecs that achieve extreme compression with minimal perceptual loss.</p></li><li><p style="min-height:1.5em">Explore discrete and continuous latent representations for scalable speech modeling.</p></li><li><p style="min-height:1.5em">Design codec architectures that enable downstream generative modeling 
and controllable synthesis.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"><strong>Develop Scalable Training Pipelines</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Curate and process massive audio datasets across languages, speakers, and environments.</p></li><li><p style="min-height:1.5em">Design staged training curricula and data filtering strategies.</p></li><li><p style="min-height:1.5em">Scale training across distributed GPU clusters with a focus on cost, throughput, and reliability.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"><strong>Run Rigorous Experiments</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Design ablation studies that isolate the impact of architectural changes.</p></li><li><p style="min-height:1.5em">Measure improvements using both objective metrics and perceptual evaluations.</p></li><li><p style="min-height:1.5em">Validate ideas quickly through focused experiments that confirm or eliminate hypotheses.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"></p><h2>What Makes You a Great Fit</h2><p style="min-height:1.5em"><strong>Deep Research Foundations</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Experience with self-supervised learning, multimodal modeling, or generative modeling.</p></li><li><p style="min-height:1.5em">Ability to derive new formulations and implement them efficiently.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"><strong>Expertise in Voice Modeling</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Hands-on experience building or scaling TTS, STT, or neural audio codec systems.</p></li><li><p style="min-height:1.5em">Familiarity with large-scale speech datasets and real-world audio variability.</p></li><li><p style="min-height:1.5em">Strong intuition for audio quality, prosody, and conversational dynamics.</p></li></ul><p 
style="min-height:1.5em"></p><p style="min-height:1.5em"><strong>Systems and Hardware Awareness</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Experience training and serving large models on modern accelerators.</p></li><li><p style="min-height:1.5em">Knowledge of inference optimization techniques, including quantization, kernel optimization, and memory efficiency.</p></li><li><p style="min-height:1.5em">Understanding of real-time constraints in telephony or streaming environments.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"><strong>Experimental Rigor</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Track record of designing controlled experiments and meaningful ablations.</p></li><li><p style="min-height:1.5em">Comfortable working with both offline benchmarks and live production metrics.</p></li><li><p style="min-height:1.5em">Ability to move quickly from hypothesis to validation.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"><strong>Builder Mentality</strong></p><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Comfortable in fast-moving startup environments.</p></li><li><p style="min-height:1.5em">Strong ownership mindset from research through deployment.</p></li><li><p style="min-height:1.5em">Excited by ambiguous, unsolved problems.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"></p><h2>How You Show Up</h2><ul style="min-height:1.5em"><li><p style="min-height:1.5em">You treat unsolved problems as opportunities to invent new paradigms.</p></li><li><p style="min-height:1.5em">You identify the single experiment that can validate an idea in days, not months.</p></li><li><p style="min-height:1.5em">You measure everything and let data drive decisions.</p></li><li><p style="min-height:1.5em">You are obsessed with making voice agents sound truly human.</p></li><li><p style="min-height:1.5em">You use AI tools aggressively to 
amplify your own impact and accelerate research cycles.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"></p><h2>Bonus Points</h2><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Experience with large-scale distributed training.</p></li><li><p style="min-height:1.5em">Research publications or open-source contributions in speech or language AI.</p></li><li><p style="min-height:1.5em">Background in real-time speech systems or telephony.</p></li><li><p style="min-height:1.5em">PhD in ML, AI, or a related field, or equivalent research impact.</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em"></p><h2>Benefits and Compensation</h2><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Healthcare, dental, vision, all the good stuff</p></li><li><p style="min-height:1.5em">Meaningful equity in a fast-growing company</p></li><li><p style="min-height:1.5em">Every tool you need to succeed</p></li><li><p style="min-height:1.5em">Beautiful office in Jackson Square, SF, with rooftop views</p></li><li><p style="min-height:1.5em">Competitive salary: $160,000 to $250,000</p></li></ul><p style="min-height:1.5em"></p><p style="min-height:1.5em">If you are energized by building and scaling TTS models, pioneering neural audio codecs, and pushing the boundaries of speech-to-text systems, we would love to hear from you.</p>
