
Indian Language Call Center Voice Dataset for a Renowned NLP Business


A leading NLP business wanted to extend its speech assistant IVR solution to customers who speak Indian languages. The company had previously developed voice bots for the call center sector that assist customers over a call in various native languages. We supplied a total of 250 hours of speech data in five languages.


Overview

This company's NLP solutions are well known in the industry. It specializes in speech and offers services in voice bots, natural language processing, and automatic speech recognition. Its voice bot solutions, with approximately 90% accuracy, are currently available in English and a few other languages. The company now aims to extend its AI model to understand Indian languages as well.

The Challenge

The customer had very precise requirements: 250 hours of call center speech data in Tamil, Hindi, Malayalam, Marathi, and Gujarati, drawn from a variety of industries, including travel, retail and e-commerce, delivery and logistics, BFSI, and healthcare, with each conversation lasting 10–15 minutes. The calls had to cover various service and issue scenarios relevant to each industry, with both positive and negative customer experiences. The client also requested a range of accents in every language, which meant collecting data from the different states and regions where each language is natively spoken.
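To give a sense of scale, the figures above can be turned into a rough conversation count. The short Python sketch below is only an illustration: it assumes the 250 hours are split evenly across the five languages and that every call falls within the 10–15 minute range. The function name `conversation_range` is our own, not part of any tool.

```python
# Rough sizing of the request, using only the figures stated above.
TOTAL_HOURS = 250
LANGUAGES = ["Tamil", "Hindi", "Malayalam", "Marathi", "Gujarati"]
MIN_CALL_MINUTES = 10
MAX_CALL_MINUTES = 15


def conversation_range(hours, shortest_min, longest_min):
    """Return (fewest, most) conversations needed to fill `hours` of audio."""
    total_minutes = hours * 60
    fewest = -(-total_minutes // longest_min)  # ceiling division: all calls at max length
    most = total_minutes // shortest_min       # all calls at min length
    return fewest, most


# Assumption: an even split of hours across languages.
hours_per_language = TOTAL_HOURS // len(LANGUAGES)

overall = conversation_range(TOTAL_HOURS, MIN_CALL_MINUTES, MAX_CALL_MINUTES)
per_language = conversation_range(hours_per_language, MIN_CALL_MINUTES, MAX_CALL_MINUTES)

print(f"Overall: {overall[0]} to {overall[1]} conversations")
print(f"Per language: {per_language[0]} to {per_language[1]} conversations")
```

Under these assumptions, the request works out to roughly 1,000 to 1,500 conversations overall, or 200 to 300 per language, each of which also has to be spread across industries, scenarios, and accents.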


The Solution

Our experience with comparable projects prepared us to handle this request. We first supplied the client with our readily available off-the-shelf (OTS) voice data, which covers practically all languages. Using OTS speech data with transcription, the client was able to evaluate the data quickly and run preliminary tests toward the requirement. The client wanted custom data for the remaining batch, so we involved our global community. Within three weeks, with one layer of QA, we collected 50 hours of speech data in each of Gujarati, Hindi, Tamil, Malayalam, and Marathi.

To make the data more useful, we delivered all audio files in individual channels as well as composite channels, together with speaker-specific metadata such as age, gender, location, and accent. All of this was made possible by our in-house YUGO voice data collection tool.
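As an illustration of what such a delivery might look like per call, here is a minimal Python sketch of a recording record. It is not the YUGO tool's actual schema: the class names, field names, file names, and sample values are all hypothetical placeholders. Only the kinds of information (separate and composite channels, plus age, gender, location, and accent) come from the description above.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SpeakerMetadata:
    # The four speaker attributes named in the case study.
    age: int
    gender: str
    location: str
    accent: str


@dataclass
class CallRecording:
    # Hypothetical layout for one delivered conversation.
    language: str
    industry: str
    composite_audio: str        # both speakers mixed into one file
    channel_audio: List[str]    # one file per speaker channel
    speakers: List[SpeakerMetadata]


# Placeholder values for illustration only; not real dataset entries.
example = CallRecording(
    language="Hindi",
    industry="healthcare",
    composite_audio="call_0001_mixed.wav",
    channel_audio=["call_0001_agent.wav", "call_0001_customer.wav"],
    speakers=[
        SpeakerMetadata(age=31, gender="female", location="Delhi", accent="Delhi Hindi"),
        SpeakerMetadata(age=45, gender="male", location="Lucknow", accent="Awadhi-influenced Hindi"),
    ],
)

# Serialize the record so it can sit alongside the audio files.
record_json = json.dumps(asdict(example), indent=2, ensure_ascii=False)
print(record_json)
```

Keeping one channel file per speaker next to the composite file lets a team train on clean single-speaker audio while still testing on the mixed signal a live call would produce.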

The Results

250 Hrs

Total Speech Hours

5

Languages

21 Days

Collection Time

Services Used

Service

Custom Data collection

[Unscripted spontaneous call center conversation]

With a global network of 10,000+ people from 30+ countries that supports 50+ languages, you can collect unbiased and structured voice data of any kind.

Service

OTS datasets

[speech datasets]

Get a variety of OTS voice datasets with transcription in more than 50 languages and from more than ten industries.  

 


Start your AI/ML model creation journey with FutureBeeAI!
