The audio dataset consists of general conversations between native English speakers from India, along with metadata and transcriptions.
Unscripted General Conversations
90 Speech Hours
July 2023
110 Participants
Welcome to the English Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of English language speech recognition models, with a particular focus on Indian accents and dialects.
With high-quality audio recordings, detailed metadata, and accurate transcriptions, the dataset helps researchers and developers improve natural language processing, conversational AI, and generative voice AI algorithms. It also supports the creation of voice assistants and voice bots tailored to the linguistic nuances of English as spoken in India.
Speech Data:
This training dataset comprises 100 hours of audio recordings covering a wide range of topics and scenarios, intended to support robust and accurate speech technology applications. To build it, we collaborated with a diverse network of 110 native English speakers from different parts of India. This collaboration is designed to give a balanced representation of Indian accents, dialects, and demographics, reducing bias and promoting inclusivity.
Each audio recording captures a spontaneous, unscripted conversation between two individuals, with durations ranging from 15 to 60 minutes. The speech data is provided as stereo WAV files with a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise or echo.
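A quick way to confirm that a downloaded file matches this specification is Python's standard-library `wave` module. The sketch below is illustrative only: `check_and_split` is a hypothetical helper, and it assumes each file is uncompressed PCM with the two speakers interleaved as the first and second channels.

```python
import array
import sys
import wave

# Expected format taken from the dataset description: 2 channels, 16-bit, 8 kHz.
EXPECTED_CHANNELS = 2
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16 bits)
EXPECTED_RATE = 8000       # Hz


def check_and_split(path):
    """Validate the WAV header and return (channel_1, channel_2) sample arrays.

    Hypothetical helper: assumes interleaved 16-bit PCM, one speaker per channel.
    """
    with wave.open(path, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH, EXPECTED_RATE):
            raise ValueError(f"unexpected format {fmt} in {path}")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Interleaved layout: ch1[0], ch2[0], ch1[1], ch2[1], ...
    return samples[0::2], samples[1::2]
```

Splitting the channels this way gives one mono signal per speaker, which is convenient when the transcription is organized speaker by speaker.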
Metadata:
In addition to the audio recordings, the dataset provides comprehensive metadata for each participant, including age, gender, country, state, and dialect. Additional metadata such as recording device details, recording topic, bit depth, and sample rate is also provided.
The metadata helps you understand and characterize the data, for example by demographic group or topic, supporting informed decisions when developing English language speech recognition models.
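As a sketch of how the participant metadata might be used to select a balanced subset, the example below models one participant record with the fields listed above. The class name, field names, and helper functions are hypothetical; the delivered metadata schema may use different keys.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """One participant's metadata (illustrative field names, not the delivered schema)."""
    participant_id: str
    age: int
    gender: str
    country: str
    state: str
    dialect: str


def select(participants, min_age, max_age, states=None):
    """Return participants within an inclusive age range, optionally limited to some states."""
    return [
        p for p in participants
        if min_age <= p.age <= max_age and (states is None or p.state in states)
    ]


def gender_counts(participants):
    """Count participants per gender label, e.g. to check balance after filtering."""
    return Counter(p.gender for p in participants)
```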
Transcription:
Each audio file comes with a manual verbatim transcription in JSON format to streamline your workflow. The transcriptions are speaker-wise, with time-coded segmentation, non-speech labels (for example, Noise), and tags (for example, <PII> around personally identifiable information).
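The JSON schema is not published on this page, so the sketch below assumes each file is a list of segment objects whose keys mirror the sample table columns (`label`, `start`, `end`, `channel`, `transcript`), with times in seconds. Under that assumption, it totals speaking time per speaker and finds overlapping speech from different speakers.

```python
import json
from collections import defaultdict


def load_segments(text):
    """Parse a transcription assumed to be a JSON list of segment objects."""
    return json.loads(text)


def talk_time(segments):
    """Total seconds of labelled speech per speaker/channel."""
    totals = defaultdict(float)
    for seg in segments:
        if seg["label"] == "Speech":
            totals[seg["channel"]] += seg["end"] - seg["start"]
    return dict(totals)


def overlaps(segments):
    """Pairs of speech segments from different speakers whose time spans intersect."""
    speech = sorted(
        (s for s in segments if s["label"] == "Speech"), key=lambda s: s["start"]
    )
    pairs = []
    for i, a in enumerate(speech):
        for b in speech[i + 1:]:
            if b["start"] >= a["end"]:
                break  # sorted by start, so no later segment can overlap a
            if b["channel"] != a["channel"]:
                pairs.append((a, b))
    return pairs
```

Because segments are sorted by start time, the inner loop can stop at the first segment that begins after the current one ends.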
Our goal is to expedite the deployment of English language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.
Updates and Customization:
We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.
If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with custom sample rates from 8 kHz to 48 kHz, allowing you to fine-tune your models for different audio recording setups. We can also customize the transcriptions to follow your specific guidelines and requirements, further supporting your ASR development process.
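Higher sample rates increase storage roughly in proportion. The sketch below computes the uncompressed PCM payload for one hour of audio, assuming the same 16-bit, two-channel layout as the standard dataset and ignoring the small WAV header; `pcm_bytes_per_hour` is a hypothetical helper.

```python
def pcm_bytes_per_hour(sample_rate_hz, bit_depth=16, channels=2):
    """Uncompressed PCM payload size in bytes for one hour (WAV header ignored)."""
    return sample_rate_hz * (bit_depth // 8) * channels * 3600


if __name__ == "__main__":
    # A few rates within the 8 kHz to 48 kHz range mentioned above.
    for rate in (8000, 16000, 44100, 48000):
        size_mb = pcm_bytes_per_hour(rate) / 1e6
        print(f"{rate / 1000:g} kHz: {size_mb:.1f} MB per hour")
```

At 8 kHz this works out to 115.2 MB per hour of two-channel audio, while 48 kHz needs six times as much (691.2 MB).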
License:
This audio dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.
Sample Conversation:
Channel 1 speaker | Channel 2 speaker | Format |
---|---|---|
Female (22) | Female (21) | wav, json |
LABEL | START | END | CHANNEL | TRANSCRIPT |
---|---|---|---|---|
Speech | 3.274 | 4.674 | Speaker 1 | Hello Futurebee. |
Speech | 5.174 | 6.573 | Speaker 2 | Hello Futurebee. |
Noise | 5.599 | 5.974 | - | - |
Speech | 8.349 | 9.774 | Speaker 1 | Hi <PII>Sushmita</PII>. |
Speech | 10.298 | 14.022 | Speaker 1 | Did you catch the cricket match yesterday. It was incredible. |
Speech | 12.073 | 12.624 | Speaker 2 | (()) |
Speech | 16.117 | 19.643 | Speaker 2 | oh I missed it. What happened? Tell me all about it. |
Speech | 20.771 | 25.312 | Speaker 1 | Well it was a thrilling match between our home team and the rival. |
Speech | 26.469 | 27.893 | Speaker 1 | Our team battle |
Speech | 28.178 | 31.553 | Speaker 1 | first and set a challenging target of three hundred run. |
Speech | 32.063 | 39.137 | Speaker 1 | They started of really well with our opening batsman scoring quick run and building a solid foundation. |
Speech | 42.377 | 46.350 | Speaker 2 | That sounds promising. Did they manage to maintain the momentum |
Speech | 46.752 | 47.850 | Speaker 2 | through out the inning? |
Speech | 49.195 | 56.045 | Speaker 1 | Yes. They did initially. But the rival teams baller made a strong comeback in the middle over. |
Speech | 56.697 | 70.870 | Speaker 1 | Our batsman struggled a bit to rotate the strike and find boundary. However one of our middle order batsman played a spectacular inning, hitting some massive sixes and stabilizing the inning. |
Speech | 74.881 | 78.605 | Speaker 2 | That's great to hear. Did they eventually reach the target? |
Speech | 79.855 | 85.105 | Speaker 1 | They came close but unfortunately, they failed short by just ten runs. |
Speech | 85.513 | 92.063 | Speaker 1 | It was a nail biting finish with our team leading twelve run in the last over. |
Speech | 93.049 | 100.299 | Speaker 1 | Our lower order batsman fought hard but the rival teams baller balled a fantastic final over. |
Speech | 100.924 | 104.899 | Speaker 1 | (()) only one run and taking two crucial wicket. |
Speech | 108.382 | 109.533 | Speaker 2 | That must have |
Speech | 109.831 | 110.881 | Speaker 2 | been disappointing |
Speech | 111.090 | 112.992 | Speaker 2 | but it sounds like an intense match. |
Speech | 113.390 | 115.265 | Speaker 2 | How was the rivals team batting? |
Speech | 116.605 | 122.105 | Speaker 1 | The rival teams batsman started over aggressively scoring boundaries from the beginning. |
Speech | 123.049 | 127.924 | Speaker 1 | They maintain a good run rate and kept the required run rate under control. |
Speech | 128.258 | 132.036 | Speaker 1 | However our ballers made a strong comeback in the middle over |
Speech | 132.479 | 135.401 | Speaker 1 | taking some crucial wickets and building pressure. |
Speech | 138.722 | 141.169 | Speaker 2 | So it was a close contest till the end. |
Speech | 142.056 | 150.205 | Speaker 1 | Absolutely. The rival team leaded fifteen runs in the last over and it seem like they might chase (()) |
Speech | 150.431 | 160.131 | Speaker 1 | But our baller balled an exceptional over taking two wickets and considering only four run. In the end our team won the match by ten run. |
Noise | 162.074 | 163.924 | - | - |
Speech | 162.735 | 168.336 | Speaker 2 | Wow. What a comeback. It sounds like a memorable match. I wish I could have seen it live. |
Speech | 170.258 | 173.383 | Speaker 1 | Definitely, it was one of those matches |
Speech | 173.651 | 176.377 | Speaker 1 | where the momentum shifted back and forth |
Speech | 176.627 | 182.929 | Speaker 1 | The atmosphere in the stadium was electric and the crowd was on their feet through out the match. |
Speech | 183.442 | 187.491 | Speaker 1 | You should definitely catch the highlight. they are worth watching. |
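The sample above uses two inline conventions: `<PII>...</PII>` around personal names and `(())` for audio that could not be transcribed. The sketch below is a hypothetical normalization step for ASR training; treating `(())` as "unintelligible" is an assumption based on common transcription practice, and the `[PII]` placeholder is an arbitrary choice.

```python
import re

PII_SPAN = re.compile(r"<PII>(.*?)</PII>")
UNINTELLIGIBLE = "(())"  # assumed to mark unintelligible speech


def normalize(transcript, mask_pii=True):
    """Prepare one segment's text for ASR training.

    Returns None for segments that are entirely unintelligible.
    """
    text = transcript.strip()
    if text == UNINTELLIGIBLE:
        return None
    # Replace tagged personal information with a placeholder, or keep the spoken words.
    text = PII_SPAN.sub("[PII]" if mask_pii else r"\1", text)
    # Drop inline unintelligible markers and collapse the leftover whitespace.
    text = text.replace(UNINTELLIGIBLE, " ")
    return " ".join(text.split())
```

Whether to mask or keep PII spans depends on your privacy requirements; keeping the words preserves the acoustic-text alignment, while masking avoids exposing names in training text.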
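Specifications:
ATTRIBUTE | VALUE |
---|---|
Language | English |
Locale | en-IN |
Country | India |
States | Chandigarh, … |
Gender split | M:55, F:45 |
Age range | 18-70 |
Recording environment | Silent, Noisy |
Bit depth | 16 bit |
Format | wav |
Sample rate | 8 kHz |
Channels | Dual, separate channels |
Recording duration | 15-60 minutes |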