Canadian English Wake Words & Voice Command Speech Dataset

This dataset features high-quality audio recordings of wake words and voice commands spoken by native English speakers from Canada. Each recording is paired with detailed metadata and precise transcriptions, making it ideal for training and evaluating speech recognition and voice assistant models.

About this Off-the-shelf Speech Dataset

Introduction

The Canadian English Wake Word & Voice Command Dataset is curated to support the training and development of voice-activated systems. It includes a large collection of wake words and command phrases, which voice assistants and other speech-enabled technologies need for reliable user interaction. It is designed to support accurate wake word detection and voice command recognition.

Speech Data

This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:

•Wake words alone

•Wake words followed by command phrases

Participant Diversity

•Speakers: 50 native Canadian English speakers from the FutureBeeAI community

•Regions: Participants from various Canadian provinces, for broad coverage of accents and dialects

•Demographics: Ages 18–70; 60% male and 40% female participants
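The figures above imply 50 speakers × 400 recordings = 20,000 recordings, which matches the stated "20,000+" total as a lower bound. A minimal sketch of how a delivered file list could be checked against those figures follows. It assumes each recording carries a speaker ID; the function name and the synthetic IDs are illustrative only and are not part of the dataset.

```python
from collections import Counter

# Figures taken from this page.
EXPECTED_SPEAKERS = 50
EXPECTED_PER_SPEAKER = 400


def summarize_speakers(speaker_ids):
    """Count recordings per speaker and flag speakers that deviate from 400."""
    counts = Counter(speaker_ids)
    off_target = {s: n for s, n in counts.items() if n != EXPECTED_PER_SPEAKER}
    return {
        "total_recordings": sum(counts.values()),
        "speaker_count": len(counts),
        "speakers_off_target": off_target,
    }


# Synthetic IDs standing in for a real manifest (hypothetical naming scheme).
ids = [
    f"spk_{i:02d}"
    for i in range(EXPECTED_SPEAKERS)
    for _ in range(EXPECTED_PER_SPEAKER)
]
summary = summarize_speakers(ids)
print(summary["total_recordings"], summary["speaker_count"])
```

On a real delivery, a non-empty `speakers_off_target` would simply indicate speakers with more or fewer than 400 clips, which may be expected given the "20,000+" wording.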

Recording Details

•Type: Scripted wake words and command phrases

•Duration: 1 to 15 seconds per clip

•Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz
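The recording details above can be checked on a per-file basis. The sketch below uses Python's standard `wave` module to read a clip's header and compare it with the stated bit depth, sample-rate range, and duration range. The function name and returned field names are hypothetical. Because this page lists both "stereo" and "Monologue" for the channel layout, the sketch reports the channel count without enforcing it.

```python
import wave

# Limits taken from the Recording Details on this page.
MIN_RATE_HZ, MAX_RATE_HZ = 16_000, 48_000
MIN_SECONDS, MAX_SECONDS = 1.0, 15.0
EXPECTED_BIT_DEPTH = 16


def inspect_clip(path):
    """Return (info, problems) for one PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        info = {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / rate,
        }

    problems = []
    if info["bit_depth"] != EXPECTED_BIT_DEPTH:
        problems.append(f"bit depth is {info['bit_depth']}, expected 16")
    if not MIN_RATE_HZ <= info["sample_rate_hz"] <= MAX_RATE_HZ:
        problems.append(f"sample rate {info['sample_rate_hz']} Hz is outside 16-48 kHz")
    if not MIN_SECONDS <= info["duration_s"] <= MAX_SECONDS:
        problems.append(f"duration {info['duration_s']:.2f} s is outside 1-15 s")
    return info, problems
```

Calling `inspect_clip("some_clip.wav")` on a conforming file should return an empty problem list; the path here is a placeholder.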

Dataset Diversity

•Wake Word Types

  •Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.

  •Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.

  •Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more

•Command Types by Use Case

  •Automobile: Play music, check directions, voice search, provide feedback, and more

  •Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more

  •Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.

•Recording Environments

  •No background noise

  •Background traffic noise

  •People talking in the background

•Speaking Pace

  •Normal speed

  •Fast speed

This variety of wake words, commands, recording environments, and speaking paces is intended to support robust training for real-world voice assistant applications.

Metadata

Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.

•Participant Metadata: Unique ID, age, gender, region, accent, dialect

•Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format
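As an illustration of the filtering this metadata allows, the sketch below loads a small CSV and selects clips by environment and pace. The CSV layout, column names, file names, and sample rows are all made up for the example; the actual delivery format is not described on this page and may differ.

```python
import csv
import io

# Hypothetical metadata export: columns and rows are invented for illustration.
SAMPLE_METADATA = """\
file,speaker_id,age,gender,region,environment,pace,sample_rate_hz,transcript
clip_0001.wav,spk_01,34,female,Ontario,silent,normal,16000,Hey Siri
clip_0002.wav,spk_01,34,female,Ontario,traffic,fast,16000,Ok Google play music
clip_0003.wav,spk_02,52,male,Alberta,babble,normal,48000,Hey Mercedes
"""


def load_records(text):
    """Parse CSV text into a list of dicts, one per recording."""
    return list(csv.DictReader(io.StringIO(text)))


def select(records, **criteria):
    """Keep records whose fields equal every given criterion (compared as strings)."""
    return [
        r for r in records
        if all(r.get(key) == str(value) for key, value in criteria.items())
    ]


records = load_records(SAMPLE_METADATA)
noisy_fast = select(records, environment="traffic", pace="fast")
print([r["file"] for r in noisy_fast])
```

The same `select` call could combine participant fields and recording fields, for example `select(records, gender="male", sample_rate_hz=48000)`.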

Use Cases & Applications

•Voice Assistant Activation: Train models to accurately detect and trigger based on wake words

•Smart Home Devices: Enable responsive voice control in smart appliances

•Automotive Voice Control: Power voice-based commands for navigation, entertainment, and system control

•Wearables: Enhance hands-free operation with precise wake word recognition

•Consumer Electronics: Improve voice interactivity across TVs, IoT devices, and more

•Generative AI Integration: Use wake words to trigger context-aware conversational AI systems

Data Security & Ethics

•Collected via FutureBeeAI’s proprietary Yugo platform

•Maintained in a secure and confidential environment

•Full participant consent ensured; no personally identifiable information included

•Compliant with ethical data collection standards

Customization Options

We offer continuous updates and flexible customization to suit your project needs:

•Environmental Customization: Recordings in specific background conditions

•Sampling Rate Options: Custom data at 8 kHz, 16 kHz, 44.1 kHz, or 48 kHz

•Pace Adjustments: Slow, normal, or fast speech

•Device-Specific Recording: Capture using specific brands or operating systems

•Custom Wake Words/Commands: Record your custom prompts using our community network

License

This dataset is developed by FutureBeeAI and is available for commercial use.

Use Cases

•Wake Word Detection

•Command Recognition

•Voice Assistant

Dataset Sample(s)

Samples will be available soon!

Dataset Details

•Language: English

•Language code: en-ca

•Country: Canada

•Gender Distribution: M: 60%, F: 40%

•Age Group: 18–70 years

File Details

•Environment: Silent, Noisy

•Bit Depth: 16-bit

•Sample Rate: 16 kHz

•Channel: Monologue

•Audio File Duration: 1 to 15 seconds


Similar Wake Word & Voice Command Datasets

Each of the following datasets contains 20,000+ recordings from 50+ people and targets wake word detection and command recognition.

•German (Germany): German Wake Word & Command Audio Data. German audio dataset featuring wake words and short commands.

•Spanish (Spain): Spanish Wake Word & Command Audio Data. Spanish audio dataset featuring wake words and short commands.

•Bengali (Bangladesh): Bangladesh Bengali Wake Word & Command Audio Data. Bangladesh Bengali audio dataset featuring wake words and short commands.

•Vietnamese (Vietnam): Vietnamese Wake Word & Command Audio Data. Vietnamese audio dataset featuring wake words and short commands.

•US English Wake Word & Command Audio Data. US English audio dataset featuring wake words and short commands.

•English (India): Indian English Wake Word & Command Audio Data. Indian English audio dataset featuring wake words and short commands.

•English (New Zealand): New Zealand English Wake Word & Command Audio Data. New Zealand English audio dataset featuring wake words and short commands.

•English (Philippines): Philippines English Wake Word & Command Audio Data. Philippines English audio dataset featuring wake words and short commands.
