English (India) General Conversation Speech Dataset

The audio dataset consists of general conversations between native English speakers from India, along with metadata and transcriptions.

Category

Unscripted General Conversations

Total Volume

90 Speech Hours

Last updated

July 2023

Number of participants

110

About this Off-the-shelf Speech Dataset

What's Included

Welcome to the English Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of English language speech recognition models, with a particular focus on Indian accents and dialects.

With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the English language spoken in India.

Speech Data:

This training dataset comprises 100 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 110 native English speakers from different parts of India. This collaborative effort guarantees a balanced representation of Indian accents, dialects, and demographics, reducing biases and promoting inclusivity.

Each audio recording captures a spontaneous, unscripted conversation between two individuals, with durations ranging from 15 to 60 minutes. The speech data is available in WAV format as stereo (two-channel) files with a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise or echo.
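
As a quick sanity check on delivered files, the stated format (two channels, 16 bits, 8 kHz) can be compared against each WAV header. The following is a minimal sketch using only Python's standard `wave` module; the function names `describe_wav` and `find_mismatches` are illustrative and are not part of any FutureBeeAI tooling.

```python
import wave

# Expected properties, taken from the format description above.
EXPECTED = {"channels": 2, "sample_width_bytes": 2, "sample_rate_hz": 8000}


def describe_wav(path):
    """Read the header of a PCM WAV file and return its basic properties."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes = 16 bits
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def find_mismatches(props, expected=EXPECTED):
    """Return {field: (actual, expected)} for every field that differs."""
    return {k: (props[k], v) for k, v in expected.items() if props[k] != v}
```

Calling `find_mismatches(describe_wav(path))` on a file returns an empty dictionary when the header matches the description, which makes it easy to flag files that were resampled or downmixed along the way.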

Metadata:

In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Additional metadata, such as recording device details, topic of recording, bit depth, and sample rate, is also provided.

The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of English language speech recognition models.

Transcription:

This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format. Each transcription is segmented by speaker, with time codes for every segment, and includes non-speech labels and tags.
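
To illustrate how speaker-wise, time-coded segments might be consumed, here is a minimal sketch that sums talk time per speaker. The JSON schema is not documented on this page, so the field names (`segments`, `speaker`, `start`, `end`, `text`) and the speaker labels "A" and "B" are assumptions made for illustration; only the times and text are taken from the dataset sample.

```python
import json
from collections import defaultdict

# Hypothetical layout: the field names and the speaker labels are assumed,
# not taken from the dataset's actual JSON files.
SAMPLE_JSON = """
{"segments": [
  {"speaker": "A", "start": 3.274, "end": 4.674, "text": "Hello Futurebee."},
  {"speaker": "B", "start": 5.174, "end": 6.573, "text": "Hello Futurebee."},
  {"speaker": "A", "start": 8.349, "end": 9.774, "text": "Hi <PII>Sushmita</PII>."}
]}
"""


def load_segments(raw):
    """Parse a transcription document and return its list of segments."""
    return json.loads(raw)["segments"]


def talk_time_by_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)
```

The same loop adapts to whatever keys the delivered files actually use; only the dictionary lookups would need to change.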

Our goal is to expedite the deployment of English language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.

Updates and Customization:

We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.

If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8 kHz to 48 kHz, allowing you to fine-tune your models for different audio recording setups. We can also customize the transcription to follow your specific guidelines and requirements, to further support your ASR development process.

License:

This audio dataset, created by FutureBeeAI, is now available for commercial use.

Conclusion:

Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.

Use Cases

Automatic Speech Recognition (ASR)

Conversational AI

Chatbot and voicebot creation

Language modeling

Text-to-speech (TTS)

Speech analytics

Dataset Sample(s)

TIME               TRANSCRIPT
3.274 - 4.674      Hello Futurebee.
5.174 - 6.573      Hello Futurebee.
5.599 - 5.974      -
8.349 - 9.774      Hi <PII>Sushmita</PII>.
10.298 - 14.022    Did you catch the cricket match yesterday. It was incredible.
12.073 - 12.624    (())
16.117 - 19.643    oh I missed it. What happened? Tell me all about it.
20.771 - 25.312    Well it was a thrilling match between our home team and the rival.
26.469 - 27.893    Our team battle
28.178 - 31.553    first and set a challenging target of three hundred run.
32.063 - 39.137    They started of really well with our opening batsman scoring quick run and building a solid foundation.
42.377 - 46.350    That sounds promising. Did they manage to maintain the momentum
46.752 - 47.850    through out the inning?
49.195 - 56.045    Yes. They did initially. But the rival teams baller made a strong comeback in the middle over.
56.697 - 70.870    Our batsman struggled a bit to rotate the strike and find boundary. However one of our middle order batsman played a spectacular inning, hitting some massive sixes and stabilizing the inning.
74.881 - 78.605    That's great to hear. Did they eventually reach the target?
79.855 - 85.105    They came close but unfortunately, they failed short by just ten runs.
85.513 - 92.063    It was a nail biting finish with our team leading twelve run in the last over.
93.049 - 100.299   Our lower order batsman fought hard but the rival teams baller balled a fantastic final over.
100.924 - 104.899  (()) only one run and taking two crucial wicket.
108.382 - 109.533  That must have
109.831 - 110.881  been disappointing
111.090 - 112.992  but it sounds like an intense match.
113.390 - 115.265  How was the rivals team batting?
116.605 - 122.105  The rival teams batsman started over aggressively scoring boundaries from the beginning.
123.049 - 127.924  They maintain a good run rate and kept the required run rate under control.
128.258 - 132.036  However our ballers made a strong comeback in the middle over
132.479 - 135.401  taking some crucial wickets and building pressure.
138.722 - 141.169  So it was a close contest till the end.
142.056 - 150.205  Absolutely. The rival team leaded fifteen runs in the last over and it seem like they might chase (())
150.431 - 160.131  But our baller balled an exceptional over taking two wickets and considering only four run. In the end our team won the match by ten run.
162.074 - 163.924  -
162.735 - 168.336  Wow. What a comeback. It sounds like a memorable match. I wish I could have seen it live.
170.258 - 173.383  Definitely, it was one of those matches
173.651 - 176.377  where the momentum shifted back and forth
176.627 - 182.929  The atmosphere in the stadium was electric and the crowd was on their feet through out the match.
183.442 - 187.491  You should definitely catch the highlight. they are worth watching.
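
Two features of the sample are worth handling in code: some time ranges overlap (for example 5.174 - 6.573 and 5.599 - 5.974), and personal information is wrapped in `<PII>` tags. The sketch below, which assumes segments have already been read into `(start, end, text)` tuples, finds overlapping pairs and masks tagged spans. The function names and the `[PII]` placeholder are illustrative choices, not part of the dataset's tooling.

```python
import re

# A few rows copied from the sample above as (start_s, end_s, text).
ROWS = [
    (3.274, 4.674, "Hello Futurebee."),
    (5.174, 6.573, "Hello Futurebee."),
    (5.599, 5.974, "-"),
    (8.349, 9.774, "Hi <PII>Sushmita</PII>."),
    (10.298, 14.022, "Did you catch the cricket match yesterday. It was incredible."),
    (12.073, 12.624, "(())"),
]

# Non-greedy match so that each tagged span is masked separately.
PII_TAG = re.compile(r"<PII>.*?</PII>")


def mask_pii(text, placeholder="[PII]"):
    """Replace every <PII>...</PII> span with a placeholder."""
    return PII_TAG.sub(placeholder, text)


def overlapping_pairs(rows):
    """Return index pairs (i, j), i < j, whose time ranges intersect."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            # Two ranges intersect when each starts before the other ends.
            if rows[i][0] < rows[j][1] and rows[j][0] < rows[i][1]:
                pairs.append((i, j))
    return pairs


print(overlapping_pairs(ROWS))      # [(1, 2), (4, 5)]
print(mask_pii(ROWS[3][2]))         # Hi [PII].
```

The pairwise comparison is quadratic, which is fine for a single conversation; for large batches, sorting by start time and sweeping once would scale better.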

Dataset Details

Language

English

Language code

en-IN

Country

India

Accents

Chandigarh, Chhattisgarh ...more

Gender Distribution

M:55, F:45

Age Group

18-70

File Details

Environment

Silent, Noisy

Bit Depth

16 bit

Format

wav

Sample rate

8 kHz

Channel

Dual separate channel

Audio file duration

15-60 minutes
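
Because the files are listed as dual separate channel, it may be convenient to split each recording into two mono files before per-speaker processing. The sketch below does this with the standard library only. It assumes uncompressed PCM data and that each channel carries one side of the conversation, which the listing suggests but does not state outright; `split_stereo_frames` and `split_wav` are hypothetical helper names.

```python
import wave


def split_stereo_frames(frames, sample_width=2):
    """Deinterleave raw stereo PCM bytes into (left, right) mono byte strings."""
    step = 2 * sample_width  # one stereo frame = one sample per channel
    if len(frames) % step:
        raise ValueError("frame data is not a whole number of stereo frames")
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), step):
        left += frames[i:i + sample_width]
        right += frames[i + sample_width:i + step]
    return bytes(left), bytes(right)


def split_wav(path, left_path, right_path):
    """Write the two channels of a stereo WAV file to separate mono files."""
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a two-channel file")
        width, rate = src.getsampwidth(), src.getframerate()
        frames = src.readframes(src.getnframes())
    left, right = split_stereo_frames(frames, width)
    for out_path, data in ((left_path, left), (right_path, right)):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(data)
```

The pure-Python loop keeps the example dependency-free; for hour-long recordings a NumPy reshape would likely be much faster.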
