Malayalam Scripted Monologue Speech Dataset for BFSI Domain

The audio dataset comprises scripted monologue speech data in the BFSI domain, featuring native Malayalam speakers from India. It includes speech data, detailed metadata, and accurate transcriptions.

About this Off-the-shelf Speech Dataset

Introduction

Welcome to the Malayalam Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Malayalam speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.

Speech Data

This dataset includes over 6,000 scripted prompt recordings in Malayalam, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.

•Participant Diversity

  •Speakers: 60 native Malayalam speakers.

  •Regions: Diverse representation from various regions of Kerala to ensure dialect and accent coverage.

  •Demographics: Age range of 18–70, with a male-to-female ratio of 60:40.

•Recording Details

  •Nature: Scripted monologues and domain-specific prompt recordings.

  •Duration: 5 to 30 seconds per audio file.

  •Audio Format: WAV, mono channel, 16-bit depth, recorded at 8 kHz and 16 kHz sample rates.

  •Environment: Clean, echo-free, and noise-free environments.
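
A quick way to sanity-check delivered files against the audio format stated above is sketched below. It uses only Python's standard `wave` module; the function name `check_wav` and the constant names are illustrative assumptions, not part of any FutureBeeAI tooling.

```python
import wave

# Values taken from the format described above: mono, 16-bit, 8 kHz or 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples are 2 bytes wide
EXPECTED_SAMPLE_RATES = {8000, 16000}


def check_wav(path):
    """Return a list of problems found; an empty list means the file matches."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"expected 16-bit, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() not in EXPECTED_SAMPLE_RATES:
            problems.append(f"unexpected sample rate {wav.getframerate()} Hz")
    return problems
```

Calling `check_wav` on each file and collecting the non-empty results gives a simple report of recordings that deviate from the stated format.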

Topic & Context Diversity

This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:

•Customer service interactions

•Financial transactions & balance inquiries

•Banking and insurance product queries

•Loan & credit support

•Regulatory and compliance questions

•Technical help and password resets

•Promotional campaigns and service updates

Contextual Elements

To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:

•Names: Region-specific names in multiple formats

•Addresses: Local address structures and pronunciations

•Dates & Times: Typical time expressions used in banking

•Organization Names: Names of banks, financial firms, and institutions

•Currencies & Amounts: Spoken currency formats, prices, and numeric data

•IDs & Transaction Numbers: For authentic service simulation

Transcription

Every audio file is paired with a verbatim transcription to streamline ASR and NLP model development.

•Content: Exact match of each prompt

•Format: Clean .TXT files, mapped to audio file names

•Accuracy: Reviewed and validated by native Malayalam linguists
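
Since transcripts are mapped to audio file names, pairing the two can be sketched as below. This is a minimal example under several assumptions that the page does not confirm: that a transcript shares its audio file's base name, that the extension is `.txt` or `.TXT`, and that the text is UTF-8 encoded. The function name and directory arguments are hypothetical.

```python
from pathlib import Path


def pair_audio_with_transcripts(audio_dir, transcript_dir):
    """Match each .wav file to a transcript with the same base name.

    Returns (pairs, missing): pairs maps a wav Path to its transcript text,
    missing lists wav Paths for which no transcript file was found.
    """
    pairs, missing = {}, []
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        # Assumption: transcript is "<same stem>.txt" or "<same stem>.TXT".
        candidates = [Path(transcript_dir) / (wav_path.stem + ext)
                      for ext in (".txt", ".TXT")]
        txt_path = next((p for p in candidates if p.is_file()), None)
        if txt_path is None:
            missing.append(wav_path)
        else:
            # Assumption: Malayalam transcripts are stored as UTF-8.
            pairs[wav_path] = txt_path.read_text(encoding="utf-8").strip()
    return pairs, missing
```

Checking that `missing` is empty before training is a cheap way to catch incomplete downloads or naming mismatches.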

Metadata

Each data point is enriched with detailed metadata for advanced training and analysis:

•Participant Metadata: Unique ID, age, gender, state, country, dialect

•Recording Metadata: Transcript, recording setup, sample rate, bit depth, device, file format
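
The metadata items listed above could be held in a simple record type for filtering, as in the sketch below. The page does not specify the delivery format or field names, so every attribute name here is a hypothetical stand-in that mirrors the listed items, and `select` is an illustrative helper only.

```python
from dataclasses import dataclass


@dataclass
class RecordingInfo:
    # Hypothetical field names mirroring the participant metadata items.
    participant_id: str
    age: int
    gender: str
    state: str
    country: str
    dialect: str
    # Hypothetical field names mirroring the recording metadata items.
    transcript: str
    recording_setup: str
    sample_rate_hz: int
    bit_depth: int
    device: str
    file_format: str


def select(records, *, dialect=None, sample_rate_hz=None):
    """Keep records that match every filter that is not None."""
    return [
        r for r in records
        if (dialect is None or r.dialect == dialect)
        and (sample_rate_hz is None or r.sample_rate_hz == sample_rate_hz)
    ]
```

A filter like this makes it straightforward to build dialect-specific or sample-rate-specific training subsets once the metadata has been loaded.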

Applications and Use Cases

This BFSI-focused dataset is ideal for:

•Speech Recognition Training: Build or fine-tune ASR models in Malayalam

•Voice Synthesis Models: Create realistic synthetic banking voices

•Voice Assistants & IVR: Power smart assistants and bots for finance workflows

•Chatbot Training: Build virtual agents for financial services

•NER & Entity Extraction: Train NLP models with real-world financial terms

•Language Understanding: Improve intent detection, sentiment analysis, and topic modeling

Secure & Ethical Data Collection

All data was collected via FutureBeeAI’s proprietary platform, Yugo:

•Entire workflow conducted within a secure, controlled environment

•Participants gave full consent under strict ethical protocols

•No PII (Personally Identifiable Information) is included

•Fully compliant and safe for commercial use

License

This dataset is created and owned by FutureBeeAI and is available for commercial licensing.

Use Cases

•ASR

•Conversational AI

•Chatbot

•TTS

•Speech Analytics

•Mobile Speech

Dataset Details

Language: Malayalam

Language code: ml-in

Country: India

Accents: Kasaragod, North Malabar, Wayanad, Kozhikode, Eranad, Valluvanad (South Malabar), Palakkad, Thrissur-Koch

Gender Distribution: M:60, F:40

Age Group: 18-70 Years

File Details

Environment: Silent

Bit Depth: 16 bit

Sample rate: 8 kHz & 16 kHz

Channel: Mono

Audio file duration: 5 to 30 seconds

Similar to Industry Specific Scripted Monologue Speech Datasets

Telugu (India)

Telugu BFSI Scripted Monologue Speech Data

Audio recordings of scripted prompts in Telugu language for BFSI domain.

6000+ prompts

60+ people

ASR

Conversational AI

Portuguese (Brazil)

Brazilian Portuguese BFSI Scripted Monologue Data

Audio recordings of scripted prompts in Brazilian Portuguese for BFSI domain.

6000+ prompts

60+ people

ASR

Conversational AI

Italian (Italy)

Italian BFSI Scripted Monologue Speech Data

Audio recordings of scripted prompts in Italian language for BFSI domain.

6000+ prompts

60+ people

ASR

Conversational AI

Arabic (Saudi Arabia)

Saudi Arabian Arabic BFSI Scripted Monologue Data

Audio recordings of scripted prompts in Saudi Arabian Arabic for BFSI domain.

6000+ prompts

60+ people

ASR

Conversational AI

Malayalam Travel Scripted Monologue Speech Data

Audio recordings of scripted prompts in Malayalam language for Travel domain.

6000+ prompts

60+ people

ASR

Conversational AI

Malayalam (India)

Malayalam Real Estate Scripted Monologue Data

Audio recordings of scripted prompts in Malayalam language for Real Estate domain.

6000+ prompts

60+ people

ASR

Conversational AI

Malayalam (India)

Malayalam Retail Scripted Monologue Speech Data

Recordings of scripted prompts in Malayalam language for Retail & E-commerce.

6000+ prompts

60+ people

ASR

Conversational AI

Malayalam (India)

Malayalam Delivery domain Monologue Data

Recordings of scripted prompts in Malayalam language for Delivery & Logistics.

6000+ prompts

60+ people

ASR

Conversational AI
