Bahasa Call Center Speech Dataset for Travel

This Bahasa speech dataset features real-world call center conversations from the Travel domain. With detailed metadata and accurate transcriptions, it’s designed to power ASR systems, voice AI, and conversational agents.

About this Off-the-shelf Speech Dataset

Introduction

This Bahasa Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With 40 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Bahasa-speaking travelers.

Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

Speech Data

The dataset includes 40 hours of dual-channel audio recordings between native Bahasa speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

•Participant Diversity:

•Speakers: 80 native Bahasa contributors from our verified pool.

•Regions: Multiple Indonesian provinces, to capture accent and dialectal variation.

•Participant Profile: Ages 18–70; gender split of 60% male and 40% female.

•Recording Details:

•Conversation Nature: Naturally flowing, spontaneous customer-agent calls.

•Call Duration: Between 5 and 15 minutes per session.

•Audio Format: Stereo WAV, 16-bit depth, at 8 kHz and 16 kHz sample rates.

•Recording Environment: Captured in controlled, noise-free, echo-free settings.
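Because each recording is dual-channel with separated speakers, a common first step is to split a stereo file into one mono track per speaker. The sketch below uses only Python's standard `wave` and `array` modules and assumes 16-bit PCM, as listed above. Which channel carries the agent and which carries the customer is not stated here, so that mapping is left to the caller; the function names are hypothetical.

```python
import array
import io
import sys
import wave


def split_stereo_wav(source):
    """Split a 16-bit stereo WAV into (sample_rate, left, right).

    `source` may be a file path, an open binary file, or raw WAV bytes.
    Assumption: one speaker per channel; the agent/customer mapping is unknown.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit, interleaved L, R, L, R, ...
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    left = samples[0::2]   # every even sample belongs to channel 0
    right = samples[1::2]  # every odd sample belongs to channel 1
    return rate, left, right


def mono_wav_bytes(rate, samples):
    """Encode one channel as a 16-bit mono WAV and return the file bytes."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()  # convert back to little-endian before writing
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(data.tobytes())
    return buf.getvalue()
```

Each returned channel can then be written to its own file (for example with `open(path, "wb").write(mono_wav_bytes(rate, left))`) and fed to a single-speaker ASR pipeline; the sample rate is preserved, so 8 kHz and 16 kHz files stay at their original rates.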

Topic Diversity

Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

•Inbound Calls:

•Booking Assistance

•Destination Information

•Flight Delays or Cancellations

•Support for Disabled Passengers

•Health and Safety Travel Inquiries

•Lost or Delayed Luggage, and more

•Outbound Calls:

•Promotional Travel Offers

•Customer Feedback Surveys

•Booking Confirmations

•Flight Rescheduling Alerts

•Visa Expiry Notifications, and others

These scenarios help models understand and respond to diverse traveler needs in real time.

Transcription

Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

•Transcription Includes:

•Speaker-Segmented Dialogues

•Time-Stamped Segments

•Non-speech Markers (e.g., pauses, coughs)

•Accuracy: A dual-layer transcription review keeps the word error rate (WER) under 5%.
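Word error rate is conventionally computed as WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis, and N is the number of words in the reference. The following is a minimal sketch of that calculation via word-level edit distance. It applies no text normalization (case, punctuation, non-speech markers), and the exact normalization used in the dual-layer review is not documented here, so treat it as an illustration rather than the vendor's scoring procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, one substituted word in a 20-word reference gives a WER of 1/20 = 5%, which is exactly the stated threshold.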

Metadata

Extensive metadata enriches each call and speaker for better filtering and AI training:

•Participant Metadata: ID, age, gender, region, accent, and dialect.

•Conversation Metadata: Topic, domain, call type, sentiment, and audio specifications.
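The metadata is intended for selecting subsets before training. As a hedged illustration, the sketch below models a few of the listed conversation fields (topic, call type, sentiment, audio specs) as a hypothetical `CallMetadata` record. The real field names, value vocabularies, and file layout of the delivered metadata are assumptions here, not the dataset's documented schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CallMetadata:
    """Hypothetical per-call record; the delivered schema may differ."""
    call_id: str
    topic: str
    call_type: str       # assumed values: "inbound" or "outbound"
    sentiment: str       # assumed values: "positive", "neutral", "negative"
    sample_rate_hz: int  # 8000 or 16000 per the file details


def select_calls(
    calls: Iterable[CallMetadata],
    call_type: Optional[str] = None,
    sentiment: Optional[str] = None,
    sample_rate_hz: Optional[int] = None,
) -> List[str]:
    """Return IDs of calls that match every filter that is not None."""
    selected = []
    for call in calls:
        if call_type is not None and call.call_type != call_type:
            continue
        if sentiment is not None and call.sentiment != sentiment:
            continue
        if sample_rate_hz is not None and call.sample_rate_hz != sample_rate_hz:
            continue
        selected.append(call.call_id)
    return selected
```

Participant fields such as region, accent, or age could be added in the same way and joined to calls by speaker ID, for example to hold out one region as an accent-robustness test set.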

Usage and Applications

This dataset is ideal for a variety of AI use cases in the travel and tourism space:

•ASR Systems: Train Bahasa speech-to-text engines for travel platforms.

•Speech Analytics: Uncover customer insights and travel behavior patterns.

•Chatbots & Voice Assistants: Develop Bahasa-speaking virtual travel agents.

•Sentiment Detection: Analyze customer tone for better service delivery.

•Generative AI: Fine-tune LLMs to summarize or respond to traveler requests.

Secure and Ethical Collection

•All data is collected via FutureBeeAI’s secure platform, “Yugo.”

•No personally identifiable information is captured.

•Compliant with data protection regulations and copyright-safe.

Updates and Customization

We regularly expand this dataset with fresh audio and provide custom options:

•Customization Options:

•Environment: Silent, noisy, or varied real-world conditions on request.

•Sample Rate: Adjustable from 8 kHz to 48 kHz.

•Transcription: Custom formats and QA guidelines available.

License

This travel-focused Bahasa call center dataset is commercially licensed and ready for enterprise or research deployment.

Use Cases

Call Center Conversational AI

ASR

Chatbot

Language Modelling

TTS

Speech Analytics

Dataset Sample(s)

Samples will be available soon!

Dataset Details

Language: Bahasa

Language code:

Country: Indonesia

Gender Distribution: 60% male, 40% female

Age Group: 18–70 years

File Details

Environment: Silent, Noisy

Bit Depth: 16-bit

Format: WAV

Sample Rate: 8 kHz & 16 kHz

Channel: Stereo (dual-channel, separated speakers)

Audio File Duration: 5–15 minutes

Similar Call Center Conversation Speech Datasets

Bengali (Bangladesh)

Bengali (Bangladesh) Travel CC Speech Data

Travel call center audio data in Bengali (Bangladesh).

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Thai (Thailand)

Thai Travel CC Speech Data

Travel call center audio data in Thai.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

English (New Zealand)

New Zealand Travel CC Speech Data

Travel call center audio data in New Zealand English.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Norwegian (Norway)

Norwegian Travel CC Speech Data

Travel call center audio data in Norwegian.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Bahasa (Indonesia)

Bahasa Telecom CC Speech Data

Telecom call center audio data in Bahasa.

40 Speech Hours

80 People

Call Center Conversational AI

ASR

Bahasa (Indonesia)

Bahasa Delivery & Lgc CC Speech Data

Delivery & Logistics call center audio data in Bahasa.

40 Speech Hours

80 People

Call Center Conversational AI

ASR

Bahasa (Indonesia)

Bahasa BFSI CC Speech Data

BFSI call center audio data in Bahasa.

40 Speech Hours

80 People

Call Center Conversational AI

ASR

Bahasa (Indonesia)

Bahasa Retail & E-com CC Speech Data

Retail & E-commerce call center audio data in Bahasa.

40 Speech Hours

80 People

Call Center Conversational AI

ASR
