Data Curation Scientist

Neural networks and machine learning models need data! If you love clean, beautiful data, finding creative solutions (there’s no handbook) and owning the strategy for data labelling, read on!

At Talk IQ, our aim is to help businesses truly understand and react to what their customers are saying. To do this, we’ve built best-in-class proprietary enterprise speech recognition (ESR) and natural language processing (NLP) capabilities that enable client-facing teams to understand and act on key moments with customers, including recognizing purchase intent, handling objections, responding to competitors, pricing, building rapport, closing, implementation, troubleshooting, renewal, and more. All these neural networks are hungry for more data, and this is where you come in!

As a Data Curation Scientist:

What You’ll Do - Function

  • Design and own strategies and pipelines for acquiring high-quality training data. Optimize the quality, latency, and cost of data acquired through crowdsourced labelling or internal labellers.

  • Manage large quantities of text and audio data. Typical tasks include extracting samples from databases, writing scripts to trim and clean data, and making datasets available on cloud services.

  • Develop standards for text data. Typical tasks include creating processes to infer pronunciations for words, ensuring that spellings and capitalizations are consistent across data, and standardizing incoming data from human transcribers.

  • Manage human labellers. Typical tasks include writing instructions for labellers, directing data to the interface that labellers will use, and creating tests to ensure quality.

  • Interact with world-class speech recognition and NLP specialists to help them meet their models’ needs for labelled data.

What You Bring - Experience

  • Master’s or Ph.D. degree in a technical or linguistic field required

  • 5+ years' experience in data management

  • 5+ years' experience in text processing

  • 5+ years' experience using labelled data, for example in a machine learning context

  • 3+ years' experience labelling data through crowdsourcing

What You Have - Skill

  • Excellent attention to detail

  • Creative, resourceful problem solver

  • Excellent data management skills with various platforms and languages

  • Comfortable using Python for data cleaning and management

  • Shell scripting skills

  • Strong SQL

  • Proven ability to handle big data

  • Fluency in English and excellent understanding of the English language from a phonetic, grammatical, and linguistic perspective

  • Some experience with machine learning

  • Bonus: Multiple spoken languages (particularly Spanish and Japanese)

  • Bonus: Advanced programming skills in other programming languages

  • Bonus: Data presentation and analysis skills

Talk IQ is partnering with Terminal to grow and scale in the Canadian market.

At Terminal, we identify emerging tech hubs around the globe, and connect the top engineers with the most compelling companies. We provide complete operations and services to give companies all the benefits of a new office without any of the hassle.

We are focused on building a diverse and inclusive workforce. Terminal is an Equal Opportunity Employer and considers applicants for employment without regard to race, colour, religion, sex, orientation, national origin, age, disability, genetics or any other basis forbidden under federal, provincial, or local law.

We thank all applicants for their interest.