British English Scripted Monologue Speech Dataset for General Domain

This audio dataset comprises scripted monologue speech in the general domain, recorded by native English speakers from the UK. It includes audio recordings, detailed metadata, and accurate transcriptions.

Category: Scripted Prompt Recordings
Total Volume: 5000+ prompts
Last updated: July 2025
Number of participants: 40+

About this Off-the-shelf Speech Dataset

Introduction

The UK English Scripted Monologue Speech Dataset for the General Domain is a carefully curated resource designed to support the development of English language speech recognition systems. This dataset focuses on general-purpose conversational topics and is ideal for a wide range of AI applications requiring natural, domain-agnostic English speech data.

Speech Data

This dataset features over 6,000 high-quality scripted monologue recordings in UK English. The prompts span diverse real-life topics commonly encountered in general conversations and are intended to help train robust and accurate speech-enabled technologies.

Participant Diversity

  • Speakers: 60 native UK English speakers
  • Regions: Broad regional coverage ensures diverse accents and dialects
  • Demographics: Participants aged 18 to 70, with a 60:40 male-to-female ratio

Recording Specifications

  • Recording Type: Scripted monologues and prompt-based recordings
  • Audio Duration: 5 to 30 seconds per file
  • Format: WAV, mono channel, 16-bit, 8 kHz & 16 kHz sample rates
  • Environment: Clean, noise-free conditions to ensure clarity and usability
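
The recording specifications above can be checked mechanically before training. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the idea of returning a list of problems are illustrative assumptions, not part of the dataset's tooling.

```python
import wave

# Expected values taken from the recording specifications listed above.
EXPECTED_CHANNELS = 1            # mono
EXPECTED_SAMPLE_WIDTH = 2        # 16-bit audio = 2 bytes per sample
EXPECTED_RATES = {8000, 16000}   # 8 kHz and 16 kHz
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0


def check_wav(path):
    """Return a list of problems found; an empty list means the file matches."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected mono, found {channels} channels")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"expected 16-bit samples, found {8 * width}-bit")
    if rate not in EXPECTED_RATES:
        problems.append(f"unexpected sample rate: {rate} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.2f} s is outside 5 to 30 s")
    return problems
```

Calling `check_wav("some_file.wav")` on each file (the file name here is a placeholder) gives a quick way to flag recordings that deviate from the stated format.
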
Topic Coverage

The dataset covers a wide variety of general conversation scenarios, including:

  • Daily Conversations
  • Topic-Specific Discussions
  • General Knowledge and Advice
  • Idioms and Sayings

Contextual Features

To enhance authenticity, the prompts include:

  • Names: Male and female names specific to different United Kingdom regions
  • Addresses: Commonly used address formats in daily UK English speech
  • Dates & Times: References used in general scheduling and time expressions
  • Organization Names: Names of businesses, institutions, and other entities
  • Numbers & Currencies: Mentions of quantities, prices, and monetary values

Each prompt is designed to reflect everyday use cases, making it suitable for developing generalized NLP and ASR solutions.

Transcription

Every audio file in the dataset is accompanied by a verbatim text transcription, ensuring accurate training and evaluation of speech models.

  • Content: Exact match to the spoken audio
  • Format: Plain text (.TXT), named identically to the corresponding audio file
  • Quality Control: All transcripts are validated by native English transcribers
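
Because each transcript shares its file name with the corresponding audio file, the two can be paired by file stem. The sketch below assumes a hypothetical layout with one directory of WAV files and one directory of transcripts (they may be the same directory), and assumes the transcripts are UTF-8 encoded; neither detail is stated by the dataset description.

```python
from pathlib import Path


def pair_audio_with_transcripts(audio_dir, transcript_dir):
    """Match each .wav file to the .txt file with the same stem.

    Returns (pairs, missing): pairs maps a stem to (wav_path, transcript_text),
    and missing lists the audio file names that have no transcript.
    """
    # Index transcripts by stem; the extension check is case-insensitive
    # so that both ".txt" and ".TXT" are accepted.
    transcripts = {
        p.stem: p
        for p in Path(transcript_dir).iterdir()
        if p.suffix.lower() == ".txt"
    }

    pairs, missing = {}, []
    for wav_path in sorted(Path(audio_dir).iterdir()):
        if wav_path.suffix.lower() != ".wav":
            continue
        txt_path = transcripts.get(wav_path.stem)
        if txt_path is None:
            missing.append(wav_path.name)
        else:
            # Assumption: transcripts are UTF-8 plain text.
            text = txt_path.read_text(encoding="utf-8").strip()
            pairs[wav_path.stem] = (wav_path, text)
    return pairs, missing
```

A non-empty `missing` list is a useful sanity check after download, since every audio file is expected to have a transcript.
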
Metadata

Rich metadata is included for detailed filtering and analysis:

  • Speaker Metadata: Unique speaker ID, age, gender, region, and dialect
  • Audio Metadata: Prompt transcript, recording setup, device specs, sample rate, bit depth, and format
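
As an illustration of the filtering that the speaker metadata allows, here is a small sketch. The metadata file format is not specified here, so the code assumes the records have already been loaded as dictionaries; the keys `speaker_id`, `age`, `gender`, and `region`, and the sample records themselves, are made-up placeholders rather than real dataset entries.

```python
def filter_speakers(records, gender=None, region=None, min_age=None, max_age=None):
    """Keep the records that satisfy every criterion that was supplied."""
    selected = []
    for rec in records:
        if gender is not None and rec["gender"] != gender:
            continue
        if region is not None and rec["region"] != region:
            continue
        if min_age is not None and rec["age"] < min_age:
            continue
        if max_age is not None and rec["age"] > max_age:
            continue
        selected.append(rec)
    return selected


# Placeholder records with hypothetical field names, for illustration only.
records = [
    {"speaker_id": "spk_001", "age": 24, "gender": "F", "region": "East Anglia"},
    {"speaker_id": "spk_002", "age": 57, "gender": "M", "region": "East and Central Midlands"},
    {"speaker_id": "spk_003", "age": 35, "gender": "F", "region": "East and Central Midlands"},
]

subset = filter_speakers(records, gender="F", min_age=30)
print([rec["speaker_id"] for rec in subset])
```

The same pattern extends to the audio metadata fields, for example selecting only the 16 kHz recordings.
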
Applications & Use Cases

This dataset can power a variety of English language AI technologies, including:

  • Speech Recognition Training: ASR model development and fine-tuning
  • Voice Synthesis: Training data for TTS and voice cloning models
  • Voice Assistants: Building general-purpose English voice assistants
  • Entity Recognition: Identifying names, numbers, and key terms
  • Language Understanding: Training models for tasks like sentiment analysis, topic classification, and semantic parsing

Ethical & Secure Data Collection

All data was collected using FutureBeeAI’s proprietary Yugo platform.

  • Data remained secure and was never shared externally during the process
  • Participant consent and ethical guidelines were strictly followed
  • No personally identifiable information (PII) is included in the dataset

License

This dataset is developed and owned by FutureBeeAI and is available for commercial use, offering high-value resources for enterprises and research organizations developing English speech technologies.

Use Cases

  • ASR
  • Conversational AI
  • Chatbot
  • TTS
  • Speech Analytics
  • Mobile Speech

Dataset Details

  • Language: English
  • Language code: en-gb
  • Country: UK
  • Accents: English - East and Central Midlands, English - East Anglia ...
  • Gender Distribution: M:60, F:40
  • Age Group: 18-70 Years

File Details

  • Environment: Silent
  • Bit Depth: 16 bit
  • Sample rate: 8 kHz & 16 kHz
  • Channel: Mono
  • Audio file duration: 5 to 30 seconds
