Spanish (Spain) Call Center Speech Dataset for Delivery & Logistics

This Spanish (Spain) speech dataset features real-world call center conversations from the Delivery & Logistics domain. With detailed metadata and accurate transcriptions, it’s designed to power ASR systems, voice AI, and conversational agents.

About this Off-the-shelf Speech Dataset

Introduction

This Spanish Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.

Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.

Speech Data

The dataset contains 30 hours of dual-channel call center recordings between native Spanish speakers. Captured across a range of delivery and logistics service scenarios, these conversations cover everything from order tracking to missed-delivery resolution, offering a rich, real-world training base for AI models.

•Participant Diversity:

•Speakers: 60 native Spanish speakers from our verified contributor pool.

•Regions: Multiple provinces of Spain, for accent and dialect diversity.

•Participant Profile: 60% male and 40% female, with ages ranging from 18 to 70.

•Recording Details:

•Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.

•Call Duration: Typically 5 to 15 minutes per call.

•Audio Format: Stereo WAV, 16-bit depth, at 8 kHz and 16 kHz sample rates.

•Recording Environment: Captured in clean conditions, free of background noise and echo.
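Because the recordings are dual-channel stereo WAV files, a common first step is to split each file into its two channels before passing them to an ASR pipeline. The Python sketch below does this with the standard-library `wave` and `array` modules. The function name `split_stereo_wav` is hypothetical, and the assumption that each channel holds one side of the call (agent or customer) should be verified against the dataset's own documentation.

```python
import array
import io
import sys
import wave


def split_stereo_wav(source):
    """Read a 16-bit stereo WAV and return (sample_rate, channel_0, channel_1).

    `source` may be a filename or a binary file-like object.
    Assumption: each channel carries one side of the call (for example,
    agent vs. customer); confirm the real mapping in the dataset docs.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM audio")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is stored little-endian

    # Stereo frames are interleaved: c0, c1, c0, c1, ...
    return rate, samples[0::2], samples[1::2]


if __name__ == "__main__":
    # Build a tiny 8 kHz stereo file in memory to exercise the function.
    pcm = array.array("h", [100, -100, 200, -200, 300, -300])
    if sys.byteorder == "big":
        pcm.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(8000)
        out.writeframes(pcm.tobytes())
    buf.seek(0)
    rate, ch0, ch1 = split_stereo_wav(buf)
    print(rate, list(ch0), list(ch1))
```

Each returned channel can then be written out or transcribed as mono audio at the file's original sample rate.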

Topic Diversity

This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.

•Inbound Calls:

•Order Tracking

•Delivery Complaints

•Undeliverable Addresses

•Return Process Enquiries

•Delivery Method Selection

•Order Modifications, and more

•Outbound Calls:

•Delivery Confirmations

•Subscription Offer Calls

•Incorrect Address Follow-ups

•Missed Delivery Notifications

•Delivery Feedback Surveys

•Out-of-Stock Alerts, and others

This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.

Transcription

All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.

•Transcription Includes:

•Speaker-Segmented Dialogues

•Time-coded Segments

•Non-speech Tags (e.g., pauses, noise)

•High transcription accuracy with word error rate under 5% via dual-layer quality checks.
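Word error rate is conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to align a hypothesis transcript with the reference, and N is the number of words in the reference. A minimal Python sketch of that definition follows, using word-level edit distance. The function name is hypothetical; it splits on whitespace only, applies no case or punctuation normalization, and is not a description of FutureBeeAI's own quality-check procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing "el paquete llega mañana por la tarde" with "el paquete llega mañana en la tarde" gives one substitution over seven reference words, a WER of about 14%. Real evaluations usually normalize casing and punctuation first, since otherwise `Mañana` and `mañana,` count as different words.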

These transcriptions support fast, reliable model development for Spanish voice AI applications in the delivery sector.
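The exact JSON schema is not described on this page, so the following Python sketch assumes a hypothetical layout in which each segment carries a `speaker`, `start` and `end` time in seconds, and a `text` field with bracketed non-speech tags such as `[pause]`. It computes per-speaker talk time over segments that contain speech and returns the utterances with tags stripped. The key names, the tag pattern and the `summarize_transcript` helper are all assumptions to be adapted to the delivered files.

```python
import json
import re
from collections import defaultdict

# Hypothetical tag syntax for non-speech events such as "[pause]" or "[noise]".
NON_SPEECH_TAG = re.compile(r"\[[^\]]*\]")


def summarize_transcript(raw_json):
    """Return (speech time in seconds per speaker, list of (speaker, clean text)).

    Assumes a hypothetical schema:
    {"segments": [{"speaker": str, "start": float, "end": float, "text": str}, ...]}
    """
    data = json.loads(raw_json)
    talk_time = defaultdict(float)
    utterances = []
    for seg in data["segments"]:
        # Drop non-speech tags and collapse the leftover whitespace.
        text = " ".join(NON_SPEECH_TAG.sub(" ", seg["text"]).split())
        if text:  # segments containing only tags are not counted as speech
            talk_time[seg["speaker"]] += seg["end"] - seg["start"]
            utterances.append((seg["speaker"], text))
    return dict(talk_time), utterances
```

Keeping the tag filter in one compiled pattern makes it easy to switch to whatever tag convention the real transcription guidelines use.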

Metadata

Detailed metadata is included for each participant and conversation:

•Participant Metadata: ID, age, gender, region, accent, and dialect.

•Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

This metadata aids in training specialized models, filtering demographics, and running advanced analytics.
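As an illustration of metadata-based filtering, the Python sketch below selects conversations by call type, sample rate and participant accent. The `Participant` and `Conversation` classes, their field names and value conventions, and the `select_conversations` helper are hypothetical stand-ins for whatever keys the delivered metadata files actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Field names are illustrative; map them to the dataset's real metadata keys.
    participant_id: str
    age: int
    gender: str
    region: str
    accent: str


@dataclass(frozen=True)
class Conversation:
    conversation_id: str
    topic: str
    call_type: str         # assumed values: "inbound" / "outbound"
    sentiment: str         # assumed values: "positive" / "negative" / "neutral"
    sample_rate: int       # in Hz, e.g. 8000 or 16000
    participant_ids: tuple


def select_conversations(conversations, participants, *, call_type=None,
                         sample_rate=None, accents=None):
    """Keep conversations that match every filter that is not None.

    `accents` keeps a conversation if at least one of its participants has
    an accent in the given collection.
    """
    by_id = {p.participant_id: p for p in participants}
    selected = []
    for conv in conversations:
        if call_type is not None and conv.call_type != call_type:
            continue
        if sample_rate is not None and conv.sample_rate != sample_rate:
            continue
        if accents is not None:
            speaker_accents = {by_id[pid].accent
                               for pid in conv.participant_ids if pid in by_id}
            if not speaker_accents & set(accents):
                continue
        selected.append(conv)
    return selected
```

A call such as `select_conversations(convs, people, call_type="inbound", sample_rate=8000, accents={"Andaluz (Andalucía)"})` would, under these assumptions, isolate inbound narrowband calls with an Andalusian speaker for accent-specific evaluation.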

Usage and Applications

This dataset is ideal for a range of AI and NLP use cases in the delivery and logistics industry:

•Automatic Speech Recognition (ASR): Build or fine-tune Spanish speech-to-text systems.

•Speech Analytics: Gain insights from customer feedback and logistics-related interactions.

•Voice Assistants & Chatbots: Enable automated support for deliveries, returns, and updates.

•Sentiment Analysis: Detect frustration, urgency, or satisfaction in delivery-related calls.

•Generative AI: Train Spanish generative models for summarization, call simulation, or support scripts.

Secure and Ethical Collection

•Data collected via FutureBeeAI’s secure platform, “Yugo,” under strict ethical standards.

•No personally identifiable information is included.

•Compliant with global data privacy regulations and copyright-free.

Updates and Customization

We regularly update this dataset with fresh audio and offer full customization:

•Customization Options:

•Acoustic Conditions: Silent or noisy environments on request.

•Sample Rate: Configurable from 8 kHz to 48 kHz.

•Transcription Format: Custom guidelines or formatting accepted.

License

This Delivery and Logistics domain dataset is commercially licensed and ready for use in ASR, NLP, and voice automation projects in Spanish.

Use Cases

Call Center Conversational AI


ASR

Chatbot

Language Modelling

TTS

Speech Analytics


Dataset Details

Language

Spanish

Language code

Country

Spain

Accents

Castellano del Norte, Castellano del Sur, Español de Asturias, Español de Galicia, Español de Valencia/Cataluña, Extremaduran (Suroeste de España), Islas Baleares, Islas Canarias, Leonés (León), Andaluz (Andalucía), Aragonés (Aragón)

Gender Distribution

Male: 60%, Female: 40%

Age Group

18-70 Years