Bulgarian Call Center Speech Dataset for Delivery & Logistics

This Bulgarian speech dataset features real-world call center conversations from the Delivery & Logistics domain. With detailed metadata and accurate transcriptions, it’s designed to power ASR systems, voice AI, and conversational agents.

About this Off-the-shelf Speech Dataset

Introduction

This Bulgarian Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Bulgarian-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.

Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.

Speech Data

The dataset contains 30 hours of dual-channel call center recordings between native Bulgarian speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed delivery resolutions, offering a rich, real-world training base for AI models.

•Participant Diversity:

•Speakers: 60 native Bulgarian speakers from our verified contributor pool.

•Regions: Multiple provinces of Bulgaria for accent and dialect diversity.

•Participant Profile: 60% male and 40% female speakers, with ages ranging from 18 to 70.

•Recording Details:

•Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.

•Call Duration: 5 to 15 minutes on average.

•Audio Format: Stereo WAV, 16-bit depth, recorded at 8kHz and 16kHz.

•Recording Environment: Captured in clean, noise-free, echo-free conditions.
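Because the recordings are stereo with one speaker per channel, a common first step is to split each file into two mono streams. The following is a minimal sketch using only the Python standard library. It assumes 16-bit stereo WAV input as described above; the helper names (`split_stereo`, `make_demo_wav`) are illustrative, and which channel carries the agent and which the customer is not stated here, so verify that against the delivered metadata.

```python
import array
import io
import sys
import wave


def split_stereo(wav_file):
    """Split a 16-bit stereo WAV into two mono sample arrays.

    Returns (sample_rate, channel_0, channel_1).
    """
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Stereo frames are interleaved: ch0, ch1, ch0, ch1, ...
    return sample_rate, samples[0::2], samples[1::2]


def make_demo_wav(sample_rate=8000):
    """Build a tiny synthetic stereo WAV in memory (stand-in for a real call)."""
    interleaved = array.array("h", [100, -1, 200, -2, 300, -3])
    if sys.byteorder == "big":
        interleaved.byteswap()
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(interleaved.tobytes())
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    rate, channel_0, channel_1 = split_stereo(make_demo_wav())
    print(rate, list(channel_0), list(channel_1))
```

With a real file, pass its path to `split_stereo` in place of the in-memory demo buffer.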

Topic Diversity

This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.

•Inbound Calls:

•Order Tracking

•Delivery Complaints

•Undeliverable Addresses

•Return Process Enquiries

•Delivery Method Selection

•Order Modifications, and more

•Outbound Calls:

•Delivery Confirmations

•Subscription Offer Calls

•Incorrect Address Follow-ups

•Missed Delivery Notifications

•Delivery Feedback Surveys

•Out-of-Stock Alerts, and others

This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.

Transcription

All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.

•Transcription Includes:

•Speaker-Segmented Dialogues

•Time-coded Segments

•Non-speech Tags (e.g., pauses, noise)

•High transcription accuracy with word error rate under 5% via dual-layer quality checks.
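For context, word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference. The sketch below implements that standard definition with a word-level edit distance. How the figure above was measured is not described here, so treat this as an illustration of the metric rather than the vendor's procedure; the function name and the simple whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)


if __name__ == "__main__":
    reference = "пратката е доставена днес"
    hypothesis = "пратката е доставена вчера"
    print(word_error_rate(reference, hypothesis))  # one substitution in four words
```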

These transcriptions support fast, reliable model development for Bulgarian voice AI applications in the delivery sector.
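As an illustration of how speaker-segmented, time-coded JSON transcripts might be consumed, here is a minimal sketch. The schema (`segments`, `speaker`, `start`, `end`, `text`) and the bracketed tag style (`[noise]`, `[pause]`) are hypothetical, since the actual field names and tag conventions are not given here; adjust them to match the delivered files.

```python
import json
import re
from collections import defaultdict

# Hypothetical transcript layout; real field names may differ.
SAMPLE_TRANSCRIPT = """
{
  "segments": [
    {"speaker": "agent", "start": 0.0, "end": 2.5,
     "text": "Здравейте, [noise] с какво мога да помогна?"},
    {"speaker": "customer", "start": 2.5, "end": 6.0,
     "text": "Искам да проследя пратката си."},
    {"speaker": "agent", "start": 6.0, "end": 7.0,
     "text": "[pause] Разбира се."}
  ]
}
"""

# Assumed tag convention: non-speech events wrapped in square brackets.
NON_SPEECH_TAG = re.compile(r"\[[^\]]+\]")


def strip_non_speech(text):
    """Remove bracketed non-speech tags and normalize whitespace."""
    return " ".join(NON_SPEECH_TAG.sub(" ", text).split())


def talk_time_by_speaker(transcript):
    """Sum segment durations (in seconds) per speaker label."""
    totals = defaultdict(float)
    for segment in transcript["segments"]:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


if __name__ == "__main__":
    transcript = json.loads(SAMPLE_TRANSCRIPT)
    for segment in transcript["segments"]:
        print(segment["speaker"], strip_non_speech(segment["text"]))
    print(talk_time_by_speaker(transcript))
```

Stripping the tags gives clean text for language modelling, while keeping them may be preferable when training ASR models that should learn to handle pauses and noise.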

Metadata

Detailed metadata is included for each participant and conversation:

•Participant Metadata: ID, age, gender, region, accent, dialect.

•Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

This metadata aids in training specialized models, filtering demographics, and running advanced analytics.
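A short sketch of metadata-driven filtering follows. The record layout, field names, and values are hypothetical, chosen to mirror the conversation attributes listed above (topic, call type, sentiment, sample rate); the real metadata files may be organized differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    # Hypothetical fields mirroring the conversation metadata described above.
    conversation_id: str
    topic: str
    call_type: str
    sentiment: str
    sample_rate_hz: int


def select(conversations, **criteria):
    """Return conversations whose fields equal every given criterion."""
    return [
        conversation
        for conversation in conversations
        if all(getattr(conversation, field) == value
               for field, value in criteria.items())
    ]


# Made-up example records, not taken from the dataset.
CATALOG = [
    Conversation("c001", "Order Tracking", "inbound", "neutral", 8000),
    Conversation("c002", "Delivery Complaints", "inbound", "negative", 16000),
    Conversation("c003", "Delivery Confirmations", "outbound", "positive", 8000),
]

if __name__ == "__main__":
    for conversation in select(CATALOG, call_type="inbound", sample_rate_hz=8000):
        print(conversation.conversation_id, conversation.topic)
```

The same pattern extends to participant metadata, for example selecting speakers by region or age range to build a demographically targeted training subset.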

Usage and Applications

This dataset is ideal for a range of AI and NLP use cases in the delivery and logistics industry:

•Automatic Speech Recognition (ASR): Build or fine-tune Bulgarian speech-to-text systems.

•Speech Analytics: Gain insights from customer feedback and logistics-related interactions.

•Voice Assistants & Chatbots: Enable automated support for deliveries, returns, and updates.

•Sentiment Analysis: Detect frustration, urgency, or satisfaction in delivery-related calls.

•Generative AI: Train Bulgarian generative models for summarization, call simulation, or support scripts.

Secure and Ethical Collection

•Data collected via FutureBeeAI’s secure platform, “Yugo,” under strict ethical standards.

•No personally identifiable information is included.

•Compliant with global data privacy regulations and copyright-free.

Updates and Customization

We regularly update this dataset with fresh audio and offer full customization:

•Customization Options:

•Acoustic Conditions: Silent or noisy environments on request.

•Sample Rate: Configurable between 8kHz and 48kHz.

•Transcription Format: Custom guidelines or formatting accepted.

License

This Delivery and Logistics domain dataset is commercially licensed and ready for use in ASR, NLP, and voice automation projects in Bulgarian.

Use Cases

Call Center Conversational AI

ASR

Chatbot

Language Modelling

TTS

Speech Analytics

Dataset Sample(s)

Samples will be available soon!

Dataset Details

Language: Bulgarian

Language code:

Country: Bulgaria

Accents: Eastern Bulgarian, Western Bulgarian, Rhodopean, Shopski, Strandzha, Dobrudzha, Balkan, Pirin, Thracian, Rup/Danube

Gender Distribution: 60% male, 40% female

Age Group: 18-70 years

File Details

Environment: Silent, Noisy

Bit Depth: 16-bit

Format: WAV

Sample Rate: 8kHz and 16kHz

Channel: Stereo (dual-channel, separated speakers)

Audio File Duration: 5-15 minutes
