Argentine Spanish Wake Words & Voice Command Speech Dataset

This dataset features high-quality audio recordings of wake words and voice commands spoken by native Spanish speakers from Argentina. Each recording is paired with detailed metadata and precise transcriptions, making it ideal for training and evaluating speech recognition and voice assistant models.

  • Category: Wake Words & Command Recordings
  • Total Volume: 20,000+ recordings
  • Last updated: July 2025
  • Number of participants: 50+

Wake word and command dataset for training and fine-tuning voice assistants in Argentine Spanish

About this Off-the-shelf Speech Dataset

Introduction

The Argentine Spanish Wake Word & Voice Command Dataset is curated to support the training and development of voice-activated systems. It contains a large collection of wake words and command phrases, the two building blocks of user interaction with voice assistants and other speech-enabled technologies. It is designed to support accurate wake word detection and voice command recognition across varied speakers and recording conditions.

Speech Data

This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:

  • Wake words alone
  • Wake words followed by command phrases

Participant Diversity

  • Speakers: 50 native Argentine Spanish speakers from the FutureBeeAI community
  • Regions: Participants from various Argentine provinces, ensuring broad coverage of accents and dialects
  • Demographics: Ages 18–70; 60% male and 40% female participants

Recording Details

  • Type: Scripted wake words and command phrases
  • Duration: 1 to 15 seconds per clip
  • Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz
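The recording details above can be turned into a quick sanity check on delivered clips. The following is a minimal sketch using only Python's standard `wave` module; the names `check_clip` and `make_silent_clip` are hypothetical helpers introduced here for illustration, not part of any FutureBeeAI tooling. It checks bit depth, sample rate, and duration against the listed ranges and does not check the channel count.

```python
import io
import wave

# Limits taken from the "Recording Details" list; the function names and
# return shape are illustrative assumptions, not vendor tooling.
MIN_RATE_HZ, MAX_RATE_HZ = 16_000, 48_000
MIN_SECONDS, MAX_SECONDS = 1.0, 15.0
SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM


def check_clip(source):
    """Return a list of problems found in one WAV clip (empty if none).

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        seconds = wav.getnframes() / rate
    if width != SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth is {8 * width}, expected 16")
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate} Hz outside 16-48 kHz")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.2f} s outside 1-15 s")
    return problems


def make_silent_clip(rate, seconds, channels=2):
    """Build an in-memory 16-bit WAV of silence, used only for the demo."""
    buffer = io.BytesIO()
    frame_count = int(rate * seconds)
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (SAMPLE_WIDTH_BYTES * channels * frame_count))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_clip(make_silent_clip(16_000, 2.0)))  # in-range clip
    print(check_clip(make_silent_clip(8_000, 0.5)))   # out-of-range clip
```

In practice `check_clip` would be called with the path of each delivered file; the synthetic clips only keep the sketch self-contained.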

Dataset Diversity

  • Wake Word Types
      • Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.
      • Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.
      • Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more
  • Command Types by Use Case
      • Automobile: Play music, check directions, voice search, provide feedback, and more
      • Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more
      • Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.
  • Recording Environments
      • No background noise
      • Background traffic noise
      • People talking in the background
  • Speaking Pace
      • Normal speed
      • Fast speed

This diversity ensures robust training for real-world voice assistant applications.

Metadata

Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.

  • Participant Metadata: Unique ID, age, gender, region, accent, dialect
  • Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format
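As a rough illustration of how the metadata above might be used, the sketch below filters recordings by environment and pace and builds a speaker-disjoint train/test split from the participant ID. The field names, value labels (`"silent"`, `"traffic"`), and sample rows are assumptions made up for this example; the delivered metadata may be structured and labelled differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    # Field names are assumptions modelled on the metadata list above;
    # the delivered files may label them differently.
    participant_id: str
    transcript: str
    environment: str      # e.g. "silent", "traffic", "babble" (assumed labels)
    pace: str             # "normal" or "fast"
    sample_rate_hz: int


def select(recordings, *, environment=None, pace=None, min_rate_hz=0):
    """Keep recordings that match every filter that was supplied."""
    return [
        r for r in recordings
        if (environment is None or r.environment == environment)
        and (pace is None or r.pace == pace)
        and r.sample_rate_hz >= min_rate_hz
    ]


def split_by_speaker(recordings, held_out_ids):
    """Split so that no participant appears in both train and test."""
    held_out = set(held_out_ids)
    train = [r for r in recordings if r.participant_id not in held_out]
    test = [r for r in recordings if r.participant_id in held_out]
    return train, test


# Made-up rows for illustration only; not taken from the dataset.
SAMPLE = [
    Recording("P001", "Ok Google, poné música", "silent", "normal", 48_000),
    Recording("P001", "Alexa", "traffic", "fast", 16_000),
    Recording("P002", "Hey Siri, llamá a mamá", "traffic", "normal", 44_100),
]

if __name__ == "__main__":
    noisy = select(SAMPLE, environment="traffic")
    train, test = split_by_speaker(SAMPLE, held_out_ids=["P002"])
    print(len(noisy), len(train), len(test))
```

Splitting on participant ID rather than on individual clips keeps the same voice out of both sets, which matters when each speaker contributes hundreds of recordings.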

Use Cases & Applications

  • Voice Assistant Activation: Train models to accurately detect and trigger based on wake words
  • Smart Home Devices: Enable responsive voice control in smart appliances
  • Automotive Voice Control: Power voice-based commands for navigation, entertainment, and system control
  • Wearables: Enhance hands-free operation with precise wake word recognition
  • Consumer Electronics: Improve voice interactivity across TVs, IoT devices, and more
  • Generative AI Integration: Use wake words to trigger context-aware conversational AI systems

Data Security & Ethics

  • Collected via FutureBeeAI’s proprietary Yugo platform
  • Maintained in a secure and confidential environment
  • Full participant consent ensured; no personally identifiable information included
  • Compliant with ethical data collection standards

Customization Options

We offer continuous updates and flexible customization to suit your project needs:

  • Environmental Customization: Recordings in specific background conditions
  • Sampling Rate Options: Custom data at 8 kHz, 16 kHz, 44.1 kHz, or 48 kHz
  • Pace Adjustments: Slow, normal, or fast speech
  • Device-Specific Recording: Capture using specific brands or operating systems
  • Custom Wake Words/Commands: Record your custom prompts using our community network

License

This dataset is developed by FutureBeeAI and is available for commercial use.

Use Cases

  • Voice Detection
  • Wake Word Detection
  • Command Recognition
  • Voice Assistants

Dataset Details

  • Language: Spanish
  • Language code: es-ar
  • Country: Argentina
  • Accents: Rioplatense Castilian
  • Gender Distribution: 60% male, 40% female
  • Age Group: 18–70 years

File Details

  • Environment: Silent, Noisy
  • Bit Depth: 16-bit
  • Sample rate: 16 kHz
  • Channel: Monologue
  • Audio file duration: 1 to 15 seconds
