Bahasa Scripted Monologue Speech Dataset for Travel Domain

The audio dataset comprises scripted monologue speech data in the Travel domain, featuring native Bahasa speakers from Indonesia. It includes speech data, detailed metadata, and accurate transcriptions.

About this Off-the-shelf Speech Dataset

Introduction

Welcome to the Bahasa Scripted Monologue Speech Dataset for the Travel domain, a carefully constructed resource created to support the development of Bahasa speech recognition technologies, particularly for applications in travel, tourism, and customer service automation.

Speech Data

This training dataset features 6,000+ high-quality scripted prompt recordings in Bahasa, written to reflect real-world Travel industry interactions. It’s well suited to building robust ASR systems, virtual assistants, and customer interaction tools.

•Participant Diversity

  •Speakers: 60 native Bahasa speakers.

  •Geographic Coverage: Participants from multiple regions across Indonesia to ensure rich diversity in dialects and accents.

  •Demographics: Age range from 18 to 70 years, with a gender ratio of approximately 60% male and 40% female.

•Recording Details

  •Prompt Type: Scripted monologue-style prompts.

  •Duration: Each audio sample ranges from 5 to 30 seconds.

  •Audio Format: WAV files with mono channels, 16-bit depth, and 8 kHz / 16 kHz sample rates.

  •Environment: Clean, quiet, echo-free spaces to ensure high-quality recordings.
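The recording specification above can be checked on delivery with a few lines of Python. The sketch below is only an illustration, not part of the dataset tooling: the function name `check_wav` is hypothetical, and the thresholds are copied from the figures listed above (mono, 16-bit, 8 kHz or 16 kHz, 5 to 30 seconds).

```python
import wave

# Values taken from the listing above; adjust if your delivery differs.
ALLOWED_SAMPLE_RATES = {8000, 16000}   # 8 kHz / 16 kHz
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0   # stated duration range


def check_wav(path):
    """Return a list of problems found; an empty list means the file matches the spec."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        if channels != 1:
            problems.append(f"expected mono, found {channels} channels")

        sample_width = wav.getsampwidth()  # bytes per sample; 2 bytes = 16-bit
        if sample_width != 2:
            problems.append(f"expected 16-bit, found {8 * sample_width}-bit")

        rate = wav.getframerate()
        if rate not in ALLOWED_SAMPLE_RATES:
            problems.append(f"unexpected sample rate {rate} Hz")

        duration = wav.getnframes() / rate if rate else 0.0
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append(f"duration {duration:.1f}s is outside 5-30s")
    return problems
```

Calling `check_wav("some_prompt.wav")` (a placeholder file name) returns an empty list for a conforming file, so the function can be run over a whole delivery to flag outliers before training.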

Topic Coverage

The dataset includes a wide spectrum of travel-related interactions to reflect diverse real-world scenarios:

•Booking and reservation dialogues

•Customer support and general inquiries

•Destination-specific guidance

•Technical and login help

•Promotional offers and travel deals

•Service availability and policy information

•Domain-specific statements

Context Elements

To boost contextual realism, the scripted prompts integrate frequently encountered travel terms and variables:

•Names: Common Indonesian male and female names

•Addresses: Regional address formats and locality names

•Dates & Times: Booking dates, travel periods, and time-based interactions

•Destinations: Mentions of cities, countries, airports, and tourist landmarks

•Prices & Numbers: Cost of flights, hotel rates, promotional discounts, etc.

•Booking & Confirmation Codes: Typical ticketing and travel identifiers

Transcription

Every audio file is paired with a verbatim transcription in .TXT format.

•Consistency: Each transcript matches its corresponding audio file exactly.

•Accuracy: Transcriptions are reviewed and verified by native Bahasa speakers.

•Usability: File names are synced across audio and text for easy integration.
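Because audio and transcript files share names, the two can be paired by file stem. The sketch below shows one way to do that; the function name `pair_by_stem` and the idea that everything sits under a single root folder are assumptions for illustration, since the listing does not describe the delivered directory layout.

```python
from pathlib import Path


def pair_by_stem(root):
    """Pair .wav and .txt files under `root` that share the same file stem.

    Returns (pairs, unmatched): `pairs` maps stem -> (audio_path, text_path),
    and `unmatched` lists stems that have only one of the two files.
    Extensions are compared case-insensitively, so .TXT and .txt both match.
    """
    root = Path(root)
    audio = {p.stem: p for p in root.rglob("*") if p.suffix.lower() == ".wav"}
    text = {p.stem: p for p in root.rglob("*") if p.suffix.lower() == ".txt"}

    shared = sorted(audio.keys() & text.keys())
    pairs = {stem: (audio[stem], text[stem]) for stem in shared}
    unmatched = sorted(audio.keys() ^ text.keys())
    return pairs, unmatched
```

An empty `unmatched` list is a quick confirmation that every recording has its transcript before the data is fed into a training pipeline.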

Metadata

Each audio file is enriched with detailed metadata to support advanced analytics and filtering:

•Participant Metadata: Unique ID, age, gender, region/state, dialect

•Recording Metadata: Transcript text, device used, bit depth, sample rate, and environmental settings
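As a rough illustration of metadata-based filtering, the sketch below selects recordings by field value. The metadata file format is not specified in this listing, so the CSV layout, the column names (`speaker_id`, `age`, `gender`, `region`, `sample_rate`), and the sample rows are all hypothetical placeholders rather than real dataset contents.

```python
import csv
import io

# Hypothetical metadata rows, for illustration only (not taken from the dataset).
SAMPLE_METADATA = """speaker_id,age,gender,region,sample_rate
spk001,24,female,region_a,16000
spk002,51,male,region_b,8000
spk003,37,female,region_b,16000
"""


def select(rows, **criteria):
    """Keep rows whose fields equal every given criterion.

    csv.DictReader yields all values as strings, so criteria are strings too.
    """
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


rows = list(csv.DictReader(io.StringIO(SAMPLE_METADATA)))
subset = select(rows, region="region_b", sample_rate="16000")
```

The same pattern extends to age bands or device type once the actual field names in the delivered metadata are known.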

Applications & Use Cases

This dataset is a powerful resource for a wide range of voice AI and NLP applications:

•Automatic Speech Recognition (ASR): Train and fine-tune Bahasa ASR systems tailored to the Travel domain

•Voice Synthesis: Enable realistic synthetic voice generation for travel agents and guides

•Chatbots & Virtual Assistants: Build intelligent travel booking and support bots

•Named Entity Recognition (NER): Train models to extract travel-specific entities like dates, destinations, and prices

•Sentiment & Intent Analysis: Analyze customer tone and feedback to improve service quality

•Multilingual NLP: Support cross-dialect and regional generalization in Bahasa

Secure & Ethical Collection

All data collection and transcription were conducted using FutureBeeAI’s proprietary platform, Yugo, under strict ethical and security protocols:

•The entire dataset creation process was securely managed end-to-end

•All participants gave informed consent

•No personally identifiable information (PII) is included in any form

•The dataset is compliant, anonymized, and safe for commercial use

License

This dataset is the intellectual property of FutureBeeAI and is available for commercial licensing across research and product development use cases.

Use Cases

•ASR

•Conversational AI

•Chatbot

•TTS

•Speech Analytics

•Mobile Speech


Dataset Details

•Language: Bahasa

•Language code:

•Country: Indonesia

•Gender Distribution: 60% male, 40% female

•Age Group: 18-70 years

File Details

•Environment: Silent

•Bit Depth: 16-bit

•Sample Rate: 8 kHz and 16 kHz

•Channel: Mono

•Audio File Duration: 5 to 30 seconds


