Data Curation Scientist

At Dialpad, we're a team of do-ers. A team that thinks outside the box and when that doesn't work, we reinvent it. We don't settle for the status quo and neither do the things we build. Led by the same minds behind Google Voice, we build products that get businesses talking—whether it's across the hall, street, or country.

With $70 million in funding from Google Ventures, Andreessen Horowitz, and other top VCs, along with engineers from companies like Microsoft and Google, every member of our team plays an essential role in creating a voice product that doesn’t just combine design and mobility but works with you wherever productivity may strike.

 

Responsibilities

  • Design and own strategies and pipelines for acquiring high-quality training data. Optimize the quality, latency, and cost of data acquired through crowdsourced labelling or internal labellers.

  • Manage large quantities of text and audio data. Typical tasks include extracting samples from databases, writing scripts to trim and clean data, and making datasets available on cloud services.

  • Develop standards for text data. Typical tasks include creating processes to infer pronunciations for words, ensuring that spellings and capitalizations are consistent across data, and standardizing incoming data from human transcribers.

  • Manage human labellers. Typical tasks include writing instructions for labellers, directing data to the interface that labellers will use, and creating tests to ensure quality.

  • Interact with world-class speech recognition and NLP specialists to help them meet their models’ needs for labelled data.

Required

  • Master's or Ph.D. degree in a technical or linguistic field

  • 5+ years' experience in data management

  • 5+ years' experience in text processing

  • 5+ years' experience using labelled data (for example, in a machine learning context)

  • 3+ years' experience labelling data using crowdsourcing

Preferred  

  • Excellent attention to detail

  • Creative, resourceful problem solver

  • Excellent data management skills with various platforms and languages

  • Comfortable using Python for data cleaning and management

  • Shell scripting skills

  • Strong SQL

  • Proven ability to handle big data

  • Fluency in English and excellent understanding of the English language from a phonetic, grammatical, and linguistic perspective

  • Some experience with machine learning

  • Bonus: Multiple spoken languages (particularly Spanish and Japanese)

  • Bonus: Advanced programming skills in other programming languages

  • Bonus: Data presentation and analysis skills

 

 

About Us

Joining our team means collaborating with people that aren’t just passionate about their work but about Argentine tango, musicals, sushi burritos, comic books - you name it. Because if you’re going to redefine the status quo, you need a group of people hungry to do more, to see more, and be more than where they started.

There is no idea too crazy and no task too small — we work together to make things we’re proud of.

Compensation & Equity 

Teamwork makes the dream work. We recognize that our dedicated team members are what make us successful. That’s why we offer competitive salaries in addition to stock options.

Healthcare 

An apple a day keeps the doctor away - and it doesn’t hurt that we offer 100% paid Medical, Dental and Vision Plan employee coverage.

Reimbursements

We offer a monthly stipend to help cover your cell phone, home internet, and even gym membership costs.

Location, Location, Location

San Francisco <> Raleigh <> Vancouver <> Tokyo <> San Antonio <> San Jose. From coast to coast, our offices are nestled in active and growing downtown areas.