European Portuguese Scripted Monologue Speech Dataset for BFSI Domain

The audio dataset comprises scripted monologue speech data in the BFSI domain, featuring native Portuguese speakers from Portugal. It includes speech data, detailed metadata, and accurate transcriptions.

About this Off-the-shelf Speech Dataset

Introduction

Welcome to the European Portuguese Scripted Monologue Speech Dataset for the BFSI (Banking, Financial Services, and Insurance) domain. It is designed to support the development of Portuguese speech recognition systems, natural language understanding models, and conversational AI solutions for the BFSI sector.

Speech Data

This dataset includes over 6,000 scripted prompt recordings in Portuguese, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.

•Participant Diversity

•Speakers: 60 native Portuguese speakers.

•Regions: Speakers from various regions of Portugal, to ensure dialect and accent coverage.

•Demographics: Age range of 18–70, with a male-to-female ratio of 60:40.

•Recording Details

•Nature: Scripted monologues and domain-specific prompt recordings.

•Duration: 5 to 30 seconds per audio file.

•Audio Format: WAV, mono channel, 16-bit depth, recorded at 8 kHz and 16 kHz sample rates.

•Environment: Clean, echo-free, and noise-free environments.
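
The stated audio format can be checked on delivery before any training run. The following is a minimal sketch, not part of the dataset tooling: it uses Python's standard `wave` module, and the function name `check_wav` and the constant names are hypothetical. It compares one file against the format described above (mono, 16-bit, 8 kHz or 16 kHz).

```python
import wave

# Values taken from the format stated in the dataset description.
EXPECTED_CHANNELS = 1             # mono
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit depth
EXPECTED_SAMPLE_RATES = {8000, 16000}


def check_wav(path):
    """Return a list of mismatches between a WAV file and the stated format.

    An empty list means the file matches. The function name and the
    message wording are illustrative assumptions.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={wav.getnchannels()}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width={wav.getsampwidth()} bytes")
        if wav.getframerate() not in EXPECTED_SAMPLE_RATES:
            problems.append(f"sample rate={wav.getframerate()} Hz")
    return problems
```

Running `check_wav` over every file and collecting the non-empty results gives a quick list of recordings that would need resampling or conversion.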

Topic & Context Diversity

This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:

•Customer service interactions

•Financial transactions & balance inquiries

•Banking and insurance product queries

•Loan & credit support

•Regulatory and compliance questions

•Technical help and password resets

•Promotional campaigns and service updates

Contextual Elements

To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:

•Names: Region-specific names in multiple formats

•Addresses: Local address structures and pronunciations

•Dates & Times: Typical time expressions used in banking

•Organization Names: Names of banks, financial firms, and institutions

•Currencies & Amounts: Spoken currency formats, prices, and numeric data

•IDs & Transaction Numbers: For authentic service simulation

Transcription

Every audio file is paired with a verbatim transcription to streamline ASR and NLP model development.

•Content: Exact match of each prompt

•Format: Clean .TXT files, mapped to audio file names

•Accuracy: Reviewed and validated by native Portuguese linguists
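
Because transcripts are mapped to audio file names, the two can be joined by file name when building a training manifest. The sketch below assumes that each transcript shares its base name with its WAV file (for example `0001.wav` and `0001.txt`) and that audio and transcripts sit in two flat folders. Neither detail is confirmed by the description, and the function name is hypothetical.

```python
from pathlib import Path


def pair_audio_with_transcripts(audio_dir, transcript_dir):
    """Pair each WAV file with the text of the transcript sharing its base name.

    Assumption: matching is by file stem; the real naming scheme may differ.
    Returns (pairs, missing), where pairs is a list of (wav_path, text) and
    missing lists the WAV file names that have no transcript.
    """
    # Index transcripts by stem; accept both .txt and .TXT extensions.
    transcripts = {
        p.stem: p
        for p in Path(transcript_dir).iterdir()
        if p.suffix.lower() == ".txt"
    }
    pairs, missing = [], []
    for wav_path in sorted(Path(audio_dir).iterdir()):
        if wav_path.suffix.lower() != ".wav":
            continue
        txt_path = transcripts.get(wav_path.stem)
        if txt_path is None:
            missing.append(wav_path.name)
        else:
            text = txt_path.read_text(encoding="utf-8").strip()
            pairs.append((wav_path, text))
    return pairs, missing
```

Returning the unmatched file names separately makes gaps visible instead of silently dropping recordings.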

Metadata

Each data point is enriched with detailed metadata for advanced training and analysis:

•Participant Metadata: Unique ID, age, gender, state, country, dialect

•Recording Metadata: Transcript, recording setup, sample rate, bit depth, device, file format
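
The participant fields listed above map naturally onto a small record type. The sketch below is illustrative only: the description does not say how metadata is delivered (CSV, JSON, or otherwise), so the field names, types, and the `gender_share` helper are all assumptions. It shows one way to confirm a demographic split, such as the stated 60:40 ratio, from loaded records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantMetadata:
    # Field names and types are assumptions; only the list of fields
    # (ID, age, gender, state, country, dialect) comes from the description.
    participant_id: str
    age: int
    gender: str
    state: str
    country: str
    dialect: str


def gender_share(participants):
    """Return each gender's share of participants as a percentage."""
    counts = Counter(p.gender for p in participants)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: 100 * n / total for gender, n in counts.items()}
```

The same pattern extends to age bands or dialects by changing the attribute that is counted.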

Applications and Use Cases

This BFSI-focused dataset is ideal for:

•Speech Recognition Training: Build or fine-tune ASR models in Portuguese

•Voice Synthesis Models: Create realistic synthetic banking voices

•Voice Assistants & IVR: Power smart assistants and bots for finance workflows

•Chatbot Training: Build virtual agents for financial services

•NER & Entity Extraction: Train NLP models with real-world financial terms

•Language Understanding: Improve intent detection, sentiment analysis, and topic modeling

Secure & Ethical Data Collection

•All data was collected via FutureBeeAI’s proprietary platform, Yugo

•Entire workflow conducted within a secure, controlled environment

•Participants gave full consent under strict ethical protocols

•No PII (Personally Identifiable Information) is included

•Fully compliant and safe for commercial use

License

This dataset is created and owned by FutureBeeAI and is available for commercial licensing.

Use Cases

•ASR

•Conversational AI

•Chatbot

•TTS

•Speech Analytics

•Mobile Speech

Dataset Sample(s)

Samples will be available soon!

Dataset Details

•Language: Portuguese

•Language code:

•Country: Portugal

•Accents: Southern, central, and northern dialects

•Gender Distribution: 60% male, 40% female

•Age Group: 18-70 years

File Details

•Environment: Silent

•Bit Depth: 16-bit

•Sample rate: 8 kHz and 16 kHz

•Channel: Mono

•Audio file duration: 5 to 30 seconds


Similar Industry-Specific Scripted Monologue Speech Datasets

Tamil (India)

Tamil BFSI Scripted Monologue Speech Data

Audio recordings of scripted prompts in Tamil language for BFSI domain.

6000+ prompts

60+ people

ASR

Conversational AI

Romanian (Romania)

Romanian BFSI Scripted Monologue Speech Data

Audio recordings of scripted prompts in Romanian language for BFSI domain.

6000+ prompts

60+ people

ASR

Conversational AI

Czech (Czech Republic)

Czech BFSI Scripted Monologue Speech Data

Audio recordings of scripted prompts in Czech language for BFSI domain.

6000+ prompts

60+ people

ASR

Conversational AI

English (Australia)

Australian English BFSI Scripted Monologue Data

Audio recordings of scripted prompts in Australian English for BFSI domain.

6000+ prompts

60+ people

ASR

Conversational AI


Portuguese Retail Scripted Monologue Speech Data

Recordings of scripted prompts in Portuguese for Retail & E-commerce.

6000+ prompts

60+ people

ASR

Conversational AI

Portuguese (Brazil)

Brazilian Portuguese BFSI Scripted Monologue Data

Audio recordings of scripted prompts in Brazilian Portuguese for BFSI domain.

6000+ prompts

60+ people

ASR

Conversational AI

Portuguese (Portugal)

European Portuguese Delivery domain Monologue Data

Recordings of scripted prompts in Portuguese for Delivery & Logistics.

6000+ prompts

60+ people

ASR

Conversational AI

Portuguese (Portugal)

Portuguese Telecom Scripted Monologue Data

Audio recordings of scripted prompts in Portuguese for the Telecom domain.

6000+ prompts

60+ people

ASR

Conversational AI


