Gujarati Scripted Monologue Speech Dataset for BFSI Domain

The audio dataset comprises scripted monologue speech data in the BFSI domain, featuring native Gujarati speakers from India. It includes speech data, detailed metadata, and accurate transcriptions.

About this Off-the-shelf Speech Dataset

Introduction

Welcome to the Gujarati Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Gujarati speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.

Speech Data

This dataset includes over 6,000 scripted prompt recordings in Gujarati, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.

•Participant Diversity

•Speakers: 60 native Gujarati speakers.

•Regions: Diverse representation from various regions of Gujarat to ensure dialect and accent coverage.

•Demographics: Age range of 18–70, with a male-to-female ratio of 60:40.

•Recording Details

•Nature: Scripted monologues and domain-specific prompt recordings.

•Duration:

•Audio Format: WAV, mono channel, 16-bit depth, recorded at 8 kHz and 16 kHz sample rates.

•Environment: Clean, echo-free, and noise-free environments.
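As a minimal sketch of how a buyer might confirm that delivered files match the audio format listed above (mono, 16-bit, 8 kHz or 16 kHz), the check below uses Python's standard `wave` module. The function name `check_wav` and the constants are illustrative assumptions, not part of the dataset or any FutureBeeAI tooling.

```python
import wave

# Expected values taken from the "Audio Format" line above.
EXPECTED_CHANNELS = 1                   # mono
EXPECTED_SAMPLE_WIDTH_BYTES = 2         # 16-bit depth = 2 bytes per sample
EXPECTED_SAMPLE_RATES = {8000, 16000}   # 8 kHz and 16 kHz


def check_wav(path):
    """Return a list of mismatches against the stated format (empty if none).

    Hypothetical helper: the name and return shape are assumptions.
    """
    problems = []
    with wave.open(str(path), "rb") as wf:
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={wf.getnchannels()}, expected 1")
        if wf.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width={wf.getsampwidth()} bytes, expected 2")
        if wf.getframerate() not in EXPECTED_SAMPLE_RATES:
            problems.append(f"sample rate={wf.getframerate()} Hz, expected 8000 or 16000")
    return problems
```

Calling `check_wav("some_file.wav")` on each delivered recording and logging any non-empty result is one simple way to catch files that were resampled or re-encoded in transit.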

Topic & Context Diversity

This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:

•Customer service interactions

•Financial transactions & balance inquiries

•Banking and insurance product queries

•Loan & credit support

•Regulatory and compliance questions

•Technical help and password resets

•Promotional campaigns and service updates

Contextual Elements

To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:

•Names: Region-specific names in multiple formats

•Addresses: Local address structures and pronunciations

•Dates & Times: Typical time expressions used in banking

•Organization Names: Names of banks, financial firms, and institutions

•Currencies & Amounts: Spoken currency formats, prices, and numeric data

•IDs & Transaction Numbers: For authentic service simulation

Transcription

Every audio file is paired with a verbatim transcription to streamline ASR and NLP model development.

•Content: Exact match of each prompt

•Format: Clean .TXT files, mapped to audio file names

•Accuracy: Reviewed and validated by native Gujarati linguists
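Because transcripts are mapped to audio file names, a loader can pair them by file stem. The sketch below assumes each `.wav` file has a transcript with the same stem in the same folder, and that transcripts are UTF-8 encoded. Neither the folder layout nor the encoding is stated above, and `load_pairs` is a hypothetical helper name.

```python
from pathlib import Path


def load_pairs(root):
    """Pair each .wav under `root` with a same-stem transcript file.

    Assumptions (not confirmed by the dataset description): transcripts sit
    next to their audio files and are UTF-8 encoded.
    Returns (pairs, missing): pairs is a list of (wav_path, transcript_text),
    missing is a list of wav paths that have no transcript.
    """
    pairs, missing = [], []
    for wav_path in sorted(Path(root).rglob("*.wav")):
        # Accept either .txt or .TXT, since the description writes ".TXT".
        candidates = [wav_path.with_suffix(ext) for ext in (".txt", ".TXT")]
        txt_path = next((p for p in candidates if p.exists()), None)
        if txt_path is None:
            missing.append(wav_path)
        else:
            text = txt_path.read_text(encoding="utf-8").strip()
            pairs.append((wav_path, text))
    return pairs, missing
```

Returning the unmatched audio files separately, rather than silently skipping them, makes gaps in a delivery visible before training starts.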

Metadata

Each data point is enriched with detailed metadata for advanced training and analysis:

•Participant Metadata: Unique ID, age, gender, state, country, dialect

•Recording Metadata: Transcript, recording setup, sample rate, bit depth, device, file format
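The metadata fields listed above could be modeled roughly as follows. This is only a sketch: the delivery format (CSV, JSON, or other), the exact field names, and the `participant_id` link between a recording and its speaker are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Field names are hypothetical; they mirror the listed participant metadata.
    participant_id: str
    age: int
    gender: str
    state: str
    country: str
    dialect: str


@dataclass(frozen=True)
class Recording:
    # participant_id as a link to the speaker is an assumption.
    participant_id: str
    transcript: str
    recording_setup: str
    sample_rate_hz: int
    bit_depth: int
    device: str
    file_format: str


def recordings_by_participant(recordings):
    """Group recordings by speaker ID, preserving input order within each group."""
    grouped = defaultdict(list)
    for rec in recordings:
        grouped[rec.participant_id].append(rec)
    return dict(grouped)
```

Grouping by speaker is a common first step when building train and test splits that do not share speakers.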

Applications and Use Cases

This BFSI-focused dataset is ideal for:

•Speech Recognition Training: Build or fine-tune ASR models in Gujarati

•Voice Synthesis Models: Create realistic synthetic banking voices

•Voice Assistants & IVR: Power smart assistants and bots for finance workflows

•Chatbot Training: Build virtual agents for financial services

•NER & Entity Extraction: Train NLP models with real-world financial terms

•Language Understanding: Improve intent detection, sentiment analysis, and topic modeling

Secure & Ethical Data Collection

•All data was collected via FutureBeeAI’s proprietary platform, Yugo

•Entire workflow conducted within a secure, controlled environment

•Participants gave full consent under strict ethical protocols

•No PII (Personally Identifiable Information) is included

•Fully compliant and safe for commercial use

License

This dataset is created and owned by FutureBeeAI and is available for commercial licensing.

Use Cases

ASR

Conversational AI

Chatbot

TTS

Speech Analytics

Mobile Speech

Dataset Sample(s)

TRANSCRIPTION

SPEAKER	DURATION	TRANSCRIPT
Male(25)	0:00:05	ચેક રીટર્ન થતાં કોર્ટમાં ફરિયાદ નોંધાવાઇ હતી.
Female(50)	0:00:07	બેંકોએ 3-4 મહિનામાં એફડીના વ્યાજ દરોમાં ઘણો વધારો કર્યો છે.
Female(23)	0:00:06	મોબાઈલ બેકિંગ અને નેટ બેંકિંગે ગ્રાહકોની અનેક મુશ્કેલીઓ ને સરળ બનાવી છે.
Male(24)	0:00:04	RBI દ્વારા રેપો રેટમાં વધારો કરવામાં આવ્યો છે.