Algerian Arabic Wake Words & Voice Command Speech Dataset

This dataset features high-quality audio recordings of wake words and voice commands spoken by native Arabic speakers from Algeria. Each recording is paired with detailed metadata and precise transcriptions, making it ideal for training and evaluating speech recognition and voice assistant models.

About this Off-the-shelf Speech Dataset

Introduction

The Algerian Arabic Wake Word & Voice Command Dataset is curated to support the training and development of voice-activated systems. It contains a large collection of wake words and command phrases. Voice assistants and other speech-enabled technologies rely on these inputs to detect when a user is addressing them and what the user is asking for. The dataset is designed to improve the accuracy of wake word detection and voice command recognition, and with it overall system performance and user experience.

Speech Data

This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:

•Wake words alone

•Wake words followed by command phrases
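Because some clips contain a wake word alone and others contain a wake word followed by a command, a training pipeline usually has to separate the two parts of each transcript. The sketch below does this by prefix matching. It assumes that the wake word is transcribed in Latin script at the start of the transcript, and that the wake-word list is the hypothetical `WAKE_WORDS` built from the examples named on this page. The real transcript conventions may differ.

```python
from typing import Optional, Tuple

# Hypothetical wake-word list, built from the examples named on this page.
WAKE_WORDS = [
    "hey mercedes", "hey bmw", "hey porsche", "hey volvo", "hey audi",
    "hi genesis", "ok ford", "hey siri", "ok google", "alexa",
    "hey cortana", "hi bixby", "hey celia", "hi lg", "ok lg", "hello lloyd",
]


def split_transcript(transcript: str) -> Tuple[Optional[str], str]:
    """Split a transcript into (wake_word, command).

    Returns (None, transcript) when no known wake word starts the text.
    Assumes (hypothetically) that the wake word is written in Latin script
    at the very start of the transcript.
    """
    text = " ".join(transcript.split())  # collapse runs of whitespace
    lowered = text.lower()
    # Try longer wake words first so a longer phrase is never shadowed
    # by a shorter wake word that happens to be its prefix.
    for wake in sorted(WAKE_WORDS, key=len, reverse=True):
        at_boundary = (
            lowered == wake
            or lowered.startswith(wake + " ")
            or lowered.startswith(wake + ",")
        )
        if at_boundary:
            command = text[len(wake):].lstrip(" ,")
            return text[:len(wake)], command
    return None, text
```

For example, `split_transcript("Hey Siri, call mom")` separates the wake word "Hey Siri" from the command "call mom". A clip that contains only "Alexa" yields an empty command string.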

Participant Diversity

•Speakers: 50 native Algerian Arabic speakers from the FutureBeeAI community

•Regions: Participants from various Algerian provinces, ensuring broad coverage of accents and dialects

•Demographics: Ages 18–70; 60% male and 40% female participants
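The 20,000+ clips come from only 50 speakers, with 400 clips each. A random clip-level split would therefore put the same voices in both training and test data and overstate accuracy on unseen speakers. The sketch below splits by participant instead. The `participant_id` key is a hypothetical name for the "Unique ID" metadata field, and the 20% test fraction is an arbitrary choice.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(records, test_fraction=0.2, seed=0):
    """Split clip records so that no participant appears in both sets.

    Each record is a dict with a (hypothetical) "participant_id" key.
    """
    by_speaker = defaultdict(list)
    for rec in records:
        by_speaker[rec["participant_id"]].append(rec)

    # Sort before shuffling so the split depends only on the seed.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [r for s in speakers if s not in test_speakers for r in by_speaker[s]]
    test = [r for s in speakers if s in test_speakers for r in by_speaker[s]]
    return train, test
```

With 50 speakers and the default fraction, 10 speakers are held out. The same idea could be extended to stratify the held-out speakers by gender or region, which this sketch does not do.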

Recording Details

•Type: Scripted wake words and command phrases

•Duration: 1 to 15 seconds per clip

•Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz
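The format details above can be checked on delivered files with Python's standard `wave` module. This is a minimal sketch that reads each clip's header and flags deviations from the stated 16-bit depth, 16–48 kHz sample-rate range and 1–15 second duration. The channel count is reported but not enforced. The `ClipInfo`, `inspect_clip` and `spec_problems` names are illustrative.

```python
import wave
from dataclasses import dataclass
from typing import List


@dataclass
class ClipInfo:
    channels: int
    sample_rate: int
    bit_depth: int
    duration_s: float


def inspect_clip(path_or_file) -> ClipInfo:
    """Read the WAV header of one clip (a path or a binary file object)."""
    with wave.open(path_or_file, "rb") as wf:
        rate = wf.getframerate()
        return ClipInfo(
            channels=wf.getnchannels(),
            sample_rate=rate,
            bit_depth=8 * wf.getsampwidth(),  # sample width is given in bytes
            duration_s=wf.getnframes() / rate,
        )


def spec_problems(info: ClipInfo) -> List[str]:
    """List deviations from the stated format; channels are not checked."""
    problems = []
    if info.bit_depth != 16:
        problems.append(f"bit depth {info.bit_depth} is not 16")
    if not 16000 <= info.sample_rate <= 48000:
        problems.append(f"sample rate {info.sample_rate} Hz is outside 16-48 kHz")
    if not 1.0 <= info.duration_s <= 15.0:
        problems.append(f"duration {info.duration_s:.2f} s is outside 1-15 s")
    return problems
```

Running `spec_problems(inspect_clip(path))` over every file returns an empty list for conforming clips. Any other result points to a clip worth re-checking before training.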

Dataset Diversity

•Wake Word Types

  •Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.

  •Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.

  •Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more

•Command Types by Use Case

  •Automobile: Play music, check directions, voice search, provide feedback, and more

  •Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more

  •Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.

•Recording Environments

  •No background noise

  •Background traffic noise

  •People talking in the background

•Speaking Pace

  •Normal speed

  •Fast speed

This diversity ensures robust training for real-world voice assistant applications.
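The recording conditions above form a small grid of 3 environments × 2 speaking paces = 6 combinations. Before training on a filtered subset, it can help to confirm that every combination is still represented. The sketch below assumes that each clip's metadata carries `environment` and `pace` fields. Both the field names and the label values (`no_noise`, `traffic_noise`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical label values for the recording conditions listed above.
ENVIRONMENTS = ("no_noise", "traffic_noise", "background_speech")
PACES = ("normal", "fast")


def condition_counts(records):
    """Count clips per (environment, pace) cell, including empty cells."""
    counts = Counter((r["environment"], r["pace"]) for r in records)
    return {cell: counts[cell] for cell in product(ENVIRONMENTS, PACES)}


def empty_conditions(records):
    """Return the (environment, pace) cells that have no clips at all."""
    return [cell for cell, n in condition_counts(records).items() if n == 0]
```

Reporting empty cells explicitly, rather than only the non-zero counts, makes a missing condition visible before it silently weakens a model's robustness.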

Metadata

Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.

•Participant Metadata: Unique ID, age, gender, region, accent, dialect

•Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format
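The page does not say how the metadata is delivered. Assuming a CSV file with one row per clip, filtering on the fields above can be sketched with the standard `csv` module. The column names used here (`file`, `participant_id`, `age`, `sample_rate`, and so on) are hypothetical stand-ins for the fields listed above.

```python
import csv


def load_metadata(fileobj):
    """Read one metadata row per clip from a (hypothetical) CSV layout."""
    rows = []
    for row in csv.DictReader(fileobj):
        # Convert the numeric columns so they can be compared as numbers.
        row["age"] = int(row["age"])
        row["sample_rate"] = int(row["sample_rate"])
        rows.append(row)
    return rows


def select(rows, **criteria):
    """Keep rows whose fields equal every given criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]
```

For instance, `select(rows, environment="traffic_noise", pace="fast")` would keep only the fast-paced clips recorded with traffic noise.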

Use Cases & Applications

•Voice Assistant Activation: Train models to accurately detect and trigger based on wake words

•Smart Home Devices: Enable responsive voice control in smart appliances

•Automotive Voice Control: Power voice-based commands for navigation, entertainment, and system control

•Wearables: Enhance hands-free operation with precise wake word recognition

•Consumer Electronics: Improve voice interactivity across TVs, IoT devices, and more

•Generative AI Integration: Use wake words to trigger context-aware conversational AI systems
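Wake word detectors are commonly evaluated by two error rates: the false-reject rate (the share of true wake-word clips that were missed) and the false-accept rate (the share of non-wake-word clips that triggered the detector). Every clip in this dataset contains some wake word. One possible approach, assumed here, is to treat clips with a *different* wake word as negatives for a given target wake word. The sketch computes both rates per clip. Deployed systems often report false accepts per hour of audio instead, which this sketch does not do.

```python
def target_labels(clip_wake_words, target):
    """Label a clip 1 if it carries the target wake word, else 0."""
    return [1 if w == target else 0 for w in clip_wake_words]


def detection_rates(labels, predictions):
    """Return (false_reject_rate, false_accept_rate) over clips.

    labels:      1 = target wake word present, 0 = absent
    predictions: 1 = detector fired,           0 = detector stayed silent
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    positives = sum(labels)
    negatives = len(labels) - positives
    misses = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    false_alarms = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    frr = misses / positives if positives else 0.0
    far = false_alarms / negatives if negatives else 0.0
    return frr, far
```

Lowering the detection threshold typically trades a lower false-reject rate for a higher false-accept rate, so both numbers are usually reported together.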

Data Security & Ethics

•Collected via FutureBeeAI’s proprietary Yugo platform

•Maintained in a secure and confidential environment

•Full participant consent ensured; no personally identifiable information included

•Compliant with ethical data collection standards

Customization Options

We offer continuous updates and flexible customization to suit your project needs:

•Environmental Customization: Recordings in specific background conditions

•Sampling Rate Options: Custom data at 8 kHz, 16 kHz, 44.1 kHz, or 48 kHz

•Pace Adjustments: Slow, normal, or fast speech

•Device-Specific Recording: Capture using specific brands or operating systems

•Custom Wake Words/Commands: Record your custom prompts using our community network
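Recordings arrive at several sample rates (16–48 kHz in the standard set, and 8, 16, 44.1 or 48 kHz as custom options). Many wake word models, however, expect a single input rate. Polyphase resamplers such as SciPy's `scipy.signal.resample_poly(x, up, down)` take integer up- and down-sampling factors. The sketch below derives those factors from a source and target rate. Targeting 16 kHz is an illustrative assumption, not a requirement of the dataset.

```python
from fractions import Fraction
from typing import Tuple


def resample_factors(src_rate: int, dst_rate: int) -> Tuple[int, int]:
    """Return the smallest (up, down) with src_rate * up / down == dst_rate."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    # Fraction reduces dst/src to lowest terms, e.g. 16000/44100 -> 160/441.
    ratio = Fraction(dst_rate, src_rate)
    return ratio.numerator, ratio.denominator
```

For 48 kHz audio converted to 16 kHz, the factors reduce to a simple 1:3 decimation. For 44.1 kHz the reduced ratio is 160:441, which is why polyphase filtering is preferred over naive sample dropping.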

License

This dataset is developed by FutureBeeAI and is available for commercial use.

Use Cases

Wake Word Detection

Command Recognition

Voice Assistant

Dataset Sample(s)

Samples will be available soon!

Dataset Details

Language

Arabic

Language code

ar-dz

Country

Algeria

Accents

Eastern Hilal, Central Hilal, Mâqil

Gender Distribution

Male: 60%, Female: 40%

Age Group

18-70 Years

File Details

Environment

Silent, Noisy

Bit Depth

16 bit

Sample rate

16 kHz

Channel

Monologue

Audio file duration

1 to 15 seconds

Similar Wake Word & Voice Command Datasets

German (Germany)

German Wake Word & Command Audio Data

German audio dataset featuring wake words and short commands.

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition

Thai (Thailand)

Thai Wake Word & Command Audio Data

Thai audio dataset featuring wake words and short commands.

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition

English (Australia)

Australian English Wake Word & Command Audio Data

Australian English audio dataset featuring wake words and short commands.

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition

English (Canada)

Canadian English Wake Word & Command Audio Data

Canadian English audio dataset featuring wake words and short commands.

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition

Arabic (Saudi Arabia)

Saudi Arabian Arabic Wake Word & Command Audio Data

Saudi Arabian Arabic audio dataset featuring wake words and short commands.

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition

Arabic (Egypt)

Egyptian Arabic Wake Word & Command Audio Data

Egyptian Arabic audio dataset featuring wake words and short commands.

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition
