Mexican Spanish Call Center Speech Dataset for Delivery & Logistics

This Mexican Spanish speech dataset features real-world call center conversations from the Delivery & Logistics domain. With detailed metadata and accurate transcriptions, it’s designed to power ASR systems, voice AI, and conversational agents.

About this Off-the-shelf Speech Dataset

Introduction

This Mexican Spanish Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.

Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.

Speech Data

The dataset contains 30 hours of dual-channel call center recordings between native Mexican Spanish speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed delivery resolutions, offering a rich, real-world training base for AI models.

•Participant Diversity:

•Speakers: 60 native Mexican Spanish speakers from our verified contributor pool.

•Regions: Multiple states of Mexico for accent and dialect diversity.

•Participant Profile: Gender distribution of 60% male and 40% female, with ages ranging from 18 to 70.

•Recording Details:

•Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.

•Call Duration: Typically 5 to 15 minutes per call.

•Audio Format: Stereo WAV, 16-bit depth, recorded at 8 kHz and 16 kHz.

•Recording Environment: Captured in clean, noise-free, echo-free conditions.
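As a rough illustration of working with the audio format described above, the following Python sketch checks that a file is 16-bit stereo WAV and separates its two channels using only the standard library. The function name `split_stereo_wav` is hypothetical, and the listing does not say which channel carries the agent and which the customer, so the code makes no assumption about that.

```python
import struct
import wave


def split_stereo_wav(path):
    """Return (sample_rate, left, right) for a 16-bit stereo PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a stereo (2-channel) file")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # 16-bit PCM WAV data is little-endian signed, interleaved L, R, L, R, ...
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    left = list(samples[0::2])
    right = list(samples[1::2])
    return sample_rate, left, right
```

Calling `split_stereo_wav("call.wav")` on a file from the dataset (the file name here is a placeholder) should return a sample rate of 8000 or 16000 along with one list of samples per speaker channel.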

Topic Diversity

This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.

•Inbound Calls:

•Order Tracking

•Delivery Complaints

•Undeliverable Addresses

•Return Process Enquiries

•Delivery Method Selection

•Order Modifications, and more

•Outbound Calls:

•Delivery Confirmations

•Subscription Offer Calls

•Incorrect Address Follow-ups

•Missed Delivery Notifications

•Delivery Feedback Surveys

•Out-of-Stock Alerts, and others

This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.

Transcription

All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.

•Transcription Includes:

•Speaker-Segmented Dialogues

•Time-coded Segments

•Non-speech Tags (e.g., pauses, noise)

•High transcription accuracy, with a word error rate under 5% verified through dual-layer quality checks.
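For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The sketch below is a minimal, generic implementation of that definition; how the 5% figure above was actually measured is not stated in this listing, and the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

With a five-word reference such as "mi paquete no ha llegado" and a hypothesis that drops one word, the function returns 0.2, that is, a 20% WER.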

These transcriptions support fast, reliable model development for Spanish voice AI applications in the delivery sector.
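The exact JSON schema is not published on this page, so the following Python sketch assumes a hypothetical layout: a top-level `segments` list whose entries carry `speaker`, `start`, `end`, and `text` fields, with non-speech events written as bracketed tags such as `[noise]`. The sample transcript is invented for illustration. Under those assumptions, it shows how speaker-segmented, time-coded transcripts might be summarized and cleaned.

```python
import json
import re
from collections import defaultdict

# Made-up example; field names and the "[noise]" tag style are assumptions.
SAMPLE_JSON = """
{"segments": [
  {"speaker": "agent", "start": 0.0, "end": 2.5,
   "text": "Buenas tardes, ¿en qué le puedo ayudar?"},
  {"speaker": "customer", "start": 2.5, "end": 6.0,
   "text": "[noise] Quiero rastrear mi pedido."}
]}
"""

NON_SPEECH_TAG = re.compile(r"\[[^\]]*\]")


def talk_time_by_speaker(transcript):
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for segment in transcript["segments"]:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


def strip_non_speech_tags(text):
    """Remove bracketed tags and collapse the leftover whitespace."""
    return " ".join(NON_SPEECH_TAG.sub(" ", text).split())


transcript = json.loads(SAMPLE_JSON)
```

The real files may use different key names or tag conventions, in which case only the field lookups and the regular expression would need adjusting.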

Metadata

Detailed metadata is included for each participant and conversation:

•Participant Metadata: ID, age, gender, region, accent, dialect.

•Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

This metadata aids in training specialized models, filtering demographics, and running advanced analytics.
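As one way to picture metadata-based filtering, the sketch below selects calls by conversation attributes. The record layout, key names (`call_id`, `topic`, `call_type`, `sentiment`, `sample_rate`), and values are hypothetical and invented for illustration; the delivered metadata files may be organized differently.

```python
# Made-up records; key names and values are assumptions, not the real schema.
CALLS = [
    {"call_id": "c001", "topic": "Order Tracking", "call_type": "inbound",
     "sentiment": "neutral", "sample_rate": 8000},
    {"call_id": "c002", "topic": "Delivery Complaints", "call_type": "inbound",
     "sentiment": "negative", "sample_rate": 16000},
    {"call_id": "c003", "topic": "Delivery Confirmations", "call_type": "outbound",
     "sentiment": "positive", "sample_rate": 16000},
]


def filter_calls(records, **criteria):
    """Keep records whose fields equal every given keyword criterion."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


negative_inbound = filter_calls(CALLS, call_type="inbound", sentiment="negative")
wideband = filter_calls(CALLS, sample_rate=16000)
```

The same pattern extends to participant attributes such as age range or accent once the actual field names are known.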

Usage and Applications

This dataset is ideal for a range of AI and NLP use cases in the delivery and logistics industry:

•Automatic Speech Recognition (ASR): Build or fine-tune Mexican Spanish speech-to-text systems.

•Speech Analytics: Gain insights from customer feedback and logistics-related interactions.

•Voice Assistants & Chatbots: Enable automated support for deliveries, returns, and updates.

•Sentiment Analysis: Detect frustration, urgency, or satisfaction in delivery-related calls.

•Generative AI: Train Spanish generative models for summarization, call simulation, or support scripts.

Secure and Ethical Collection

•Data collected via FutureBeeAI’s secure platform, “Yugo,” under strict ethical standards.

•No personally identifiable information is included.

•Compliant with global data privacy regulations and copyright-free.

Updates and Customization

We regularly update this dataset with fresh audio and offer full customization:

•Customization Options:

•Acoustic Conditions: Silent or noisy environments on request.

•Sample Rate: Configurable between 8 kHz and 48 kHz.

•Transcription Format: Custom guidelines or formatting accepted.

License

This Delivery and Logistics domain dataset is commercially licensed and ready for use in ASR, NLP, and voice automation projects in Spanish.

Use Cases

Call Center Conversational AI

Use of speech data for Automatic Speech Recognition

ASR

Chatbot

Language Modelling

TTS

Speech Analytics

Dataset Sample(s)

Samples will be available soon!

Dataset Details

•Language: Spanish

•Language code: es-mx

•Country: Mexico

•Accents: Bajío, Chiapaneco, Costeño, Norteño del este, Norteño del oeste, Occidental, Sureño Central, Altiplano, Bajacaliforniano

•Gender Distribution: 60% male, 40% female

•Age Group: 18-70 years

File Details

•Environment: Silent, Noisy

•Bit Depth: 16-bit

•Format: WAV

•Sample Rate: 8 kHz and 16 kHz

•Channel: Stereo (dual-channel, separated speakers)

•Audio File Duration: 5-15 minutes


Similar to Call Center Conversation Speech Datasets

Italian (Italy)

Italian Delivery & Lgc CC Speech Data

Delivery & Logistics call center audio data in Italian.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Hindi (India)

Hindi Delivery & Lgc CC Speech Data

Delivery & Logistics call center audio data in Hindi.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Russian (Russia)

Russian Delivery & Lgc CC Speech Data

Delivery & Logistics call center audio data in Russian.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Arabic (Algeria)

Algerian Arabic Delivery & Lgc CC Speech Data

Delivery & Logistics call center audio data in Algerian Arabic.

30 Speech Hours

60 People

ASR

Conversational AI

Colombian Spanish Telecom CC Speech Data

Telecom call center audio data in Colombian Spanish.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Spanish (Mexico)

Mexican Spanish Travel CC Speech Data

Travel call center audio data in Mexican Spanish.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Spanish (USA)

US Spanish Real Estate CC Speech Data

Real Estate call center audio data in US Spanish

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Spanish (Colombia)

Colombian Spanish Retail & E-com CC Speech Data

Retail & E-commerce call center audio data in Colombian Spanish.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

