English (US) General Conversation Speech Dataset

This audio dataset consists of general conversations between native English speakers from the US, along with metadata and transcriptions.

Category

Unscripted General Conversations

Total Volume

25 Speech Hours

Last updated

July 2023

Number of participants

45

About this Off-the-shelf Speech Dataset

What's Included

Welcome to the English Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of English language speech recognition models, with a particular focus on US accents and dialects.

With high-quality audio recordings, detailed metadata, and accurate transcriptions, it helps researchers and developers improve natural language processing, conversational AI, and generative voice AI algorithms. It also supports the creation of voice assistants and voice bots tailored to the linguistic nuances of English as spoken in the United States.

Speech Data:

This training dataset comprises 30 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 40 native English speakers from different states across the United States. This collaborative effort guarantees a balanced representation of US accents, dialects, and demographics, reducing biases and promoting inclusivity.

Each audio recording captures a spontaneous, unscripted conversation between two individuals, with durations ranging from 15 to 60 minutes. The speech data is available as stereo WAV files with a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise or echo.
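As a minimal sketch, a downloaded file can be checked against the format described above using Python's standard `wave` module. The function names `describe_wav` and `mismatches` are illustrative and not part of any FutureBeeAI tooling.

```python
import wave

# Expected values taken from the format described above:
# stereo, 16-bit (2 bytes per sample), 8 kHz.
EXPECTED = {"channels": 2, "sample_width_bytes": 2, "sample_rate_hz": 8000}


def describe_wav(path):
    """Read the WAV header and return its basic parameters."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def mismatches(info):
    """Return {field: (actual, expected)} for fields that differ."""
    return {
        key: (info[key], expected)
        for key, expected in EXPECTED.items()
        if info[key] != expected
    }
```

An empty dictionary from `mismatches(describe_wav(path))` means the header matches the documented format. The `duration_s` value can be compared against the stated 15 to 60 minute range.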

Metadata:

In addition to the audio recordings, our dataset provides comprehensive metadata for each participant, including age, gender, country, state, and dialect. Additional metadata, such as recording device details, topic of the recording, bit depth, and sample rate, is also provided.

The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of English language speech recognition models.

Transcription:

This dataset provides a manual verbatim transcription of each audio file, available in JSON format. Each transcription is segmented by speaker with time codes and includes non-speech labels and tags.
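The exact JSON schema is not given here, so the following is only a sketch of how speaker-wise, time-coded segments might be processed. The keys `speaker`, `start`, `end`, and `text` are assumptions, and the speaker labels "A" and "B" are invented for illustration. Only the timings and text are taken from the published sample.

```python
import json
from collections import defaultdict

# Hypothetical layout: a flat list of segments. The real files may use
# different key names or nesting; adjust accordingly.
SAMPLE_JSON = """
[
  {"speaker": "A", "start": 0.626, "end": 1.597, "text": "Hello Futurebee."},
  {"speaker": "B", "start": 1.895, "end": 3.028, "text": "Hello Futurebee."},
  {"speaker": "A", "start": 6.586, "end": 7.107, "text": "Okay"}
]
"""


def talk_time_by_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


segments = json.loads(SAMPLE_JSON)
totals = talk_time_by_speaker(segments)
```

Per-speaker totals like these are one way to check how evenly a conversation is split between the two participants before using it for training.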

Our goal is to expedite the deployment of English language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.

Updates and Customization:

We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.

If you require a custom training dataset with specific environmental conditions, such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8 kHz to 48 kHz, allowing you to fine-tune your models for different audio recording setups. We can also customize the transcription to follow your specific guidelines and requirements, to further support your ASR development process.

License:

This audio dataset, created by FutureBeeAI, is now available for commercial use.

Conclusion:

Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.

Use Cases

Automatic Speech Recognition (ASR)

Conversational AI

Chatbot and voicebot creation

Language Modeling

Text-to-Speech (TTS)

Speech Analytics

Dataset Sample(s)

TRANSCRIPTION

TIME (s)            TRANSCRIPT
0.626 - 1.597       Hello Futurebee.
1.895 - 3.028       Hello Futurebee.
6.586 - 7.107       Okay
6.751 - 12.791      [filler]so (()) you can tell me what kind of vacation things you want to do?
13.336 - 14.646     (()).
13.698 - 14.330     Yeah
15.649 - 20.942     [filler]when Veronica comes in, [filler] I was gonna start with (())
18.454 - 18.968     [filler]
22.295 - 26.100     Because I am pretty they are gonna get here first. I am pretty sure we are gonna see.
23.945 - 24.439     [filler]
26.663 - 29.669     (()) and then towards the end of the trip
31.001 - 34.554     there will be, there will be doing the all the thing once Veronica and grandma come.
32.310 - 32.490     -
32.902 - 33.279     -
35.261 - 37.088     I don't know. I just, I am more
37.502 - 38.962     interested in the (()) stuff.
38.277 - 38.813     Yeah sure
39.381 - 40.658     I don't know. I am kind of looking at
41.146 - 42.121     whatever I guess.
42.972 - 45.640     [filler]but right now I am looking at
45.405 - 46.658     Okay [filler]
46.158 - 49.161     Buses from Bangkok to Phuket.
50.317 - 54.066     I don't know if I want to go to Phuket or no we are not going to Phuket. We are going to go visit (()).
54.432 - 55.456     Where is Tulip?
56.393 - 62.920     Yeah we are going to, we are going to visit (()). [filler] let me, let me look up where he lives (()).
63.887 - 66.322     You know what, I can, do you want to message him?
68.069 - 69.123     Me message him?
69.387 - 70.697     Do you want me to message him?
70.887 - 72.290     Yeah, yeah go ahead and message him.
72.977 - 73.750     Because I will look very.
73.427 - 74.736     Okay I will find out where he lives.
75.501 - 75.899     Okay.
77.215 - 79.727     I will find out where he lives (()).
78.129 - 80.709     Yeah where we can, we can, we can go with him.
81.191 - 81.977     [filler]
84.197 - 86.146     I think if we go visit to
86.566 - 87.795     I think it will be
90.474 - 97.456     I don't know if there is going to be so many (()) things to do. I am not sure (()) like what (()) or kind of.
98.278 - 99.495     Hoping to do
100.266 - 105.938   like may be they are, because like Phuket is very very (()) but it is also pretty cute.
107.140 - 108.105   And like it has
109.007 - 110.736   like cute buildings and shops and stuff.
111.462 - 116.358   [filler]but if we go with the (()) you know probably cheaper because it is not a tourist area.
114.162 - 114.587   You could.
117.019 - 119.140   Here have been living there for
119.635 - 122.013   a couple of months already. So he can, you know
122.629 - 125.278   show us around. Also, we could get to visit our friend.
126.250 - 127.227   [filler]
128.639 - 129.554   and
129.973 - 132.127   its even on a small
133.943 - 134.729   counts
135.405 - 137.347   if they are on the water
137.782 - 139.294   or like near the islands.
140.566 - 145.985   I am pretty sure they still have like, little tourist things you can like rent [filler]
146.431 - 147.294   tourist (())
148.031 - 151.048   to go take it to the different islands because thats what we did
151.812 - 154.479   when we were (()) and (()) is really
154.905 - 157.229   small town too. It is not small place.
157.905 - 159.554   But we were still able to rent
160.413 - 161.359   like a tourist
162.163 - 162.859   tour boat
163.387 - 166.473   and go to the islands and (()) and stuff so
167.393 - 168.709   it's, it could be
170.020 - 174.794   even nicer because its not super touristy and we have a person we already know.
175.580 - 176.715   And
179.175 - 180.520   the stuff will probably be cheaper.
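The sample above pairs "start - end" time ranges with text that carries tags such as `[filler]` and `(())`. A rough sketch of preparing such rows for ASR training text follows. It assumes the times are in seconds and that both tags should simply be dropped. The meaning of `(())` is not stated here, so treating it as removable is an assumption, and all function names are illustrative.

```python
import re

# A few rows copied from the sample above: (time range, transcript text).
ROWS = [
    ("6.586 - 7.107", "Okay"),
    ("6.751 - 12.791",
     "[filler]so (()) you can tell me what kind of vacation things you want to do?"),
    ("13.336 - 14.646", "(())."),
    ("13.698 - 14.330", "Yeah"),
]

# Matches the two tags visible in the sample.
TAG = re.compile(r"\[filler\]|\(\(\)\)")


def parse_range(text):
    """Turn 'start - end' into a (start, end) tuple of floats."""
    start, end = (float(part) for part in text.split(" - "))
    return start, end


def clean(text):
    """Drop tags and collapse the whitespace they leave behind."""
    return " ".join(TAG.sub(" ", text).split())


def overlapping_pairs(rows):
    """Index pairs of consecutive rows whose time ranges overlap."""
    spans = [parse_range(time_range) for time_range, _ in rows]
    return [
        (i, i + 1)
        for i in range(len(spans) - 1)
        if spans[i + 1][0] < spans[i][1]
    ]
```

Overlapping ranges, such as the first two rows here, are where a segment starts before the previous one has ended. Rows that contain only a tag, like `(()).`, reduce to bare punctuation after cleaning and may need to be filtered out separately.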

Dataset Details

Language

English

Language code

en-us

Country

USA

Accents

Arizona, California, and more

Gender Distribution

M:55, F:45

Age Group

18-70

File Details

Environment

Silent, Noisy

Bit Depth

16 bit

Format

wav

Sample rate

8 kHz

Channel

Dual separate channel

Audio file duration

15-60 minutes
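Because the files are 16-bit WAV with two separate channels, each channel can be written out as its own mono file. The sketch below uses only the Python standard library. It assumes each channel holds one side of the conversation, which "dual separate channel" suggests but this page does not confirm, and `split_channels` is an illustrative name.

```python
import wave
from array import array


def split_channels(src_path, left_path, right_path):
    """Write each channel of a 16-bit stereo WAV to its own mono file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = src.getframerate()
        # Stereo frames are interleaved: left, right, left, right, ...
        samples = array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    channels = ((left_path, samples[0::2]), (right_path, samples[1::2]))
    for path, channel in channels:
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(channel.tobytes())
```

This reads the whole file into memory, which is workable at 8 kHz even for a 60-minute recording. For much larger files, reading and writing in chunks would be the safer choice.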

Need datasets for a specific AI/ML use case?
Don't worry, we've got you covered! 👍

Contact Us