Danish Scripted Monologue Speech Dataset for Delivery & Logistics Domain

This audio dataset contains scripted monologue recordings in the Delivery & Logistics domain, spoken by native Danish speakers from Denmark. Each recording comes with detailed metadata and an accurate transcription.

Category: Scripted Prompt Recordings
Total Volume: 6000+ prompts
Last updated: July 2025
Number of participants: 60+

About this Off-the-shelf Speech Dataset

Introduction

The Danish Scripted Monologue Speech Dataset for the Delivery & Logistics Domain is a curated resource for building Danish-language speech recognition technologies, with a focus on real-world delivery and logistics applications.

Speech Data

This dataset includes 6,000+ high-quality scripted monologue recordings in Danish, crafted to simulate practical scenarios in the delivery and logistics industry. These prompts are ideal for building robust, domain-specific conversational AI and customer support systems.

Participant Diversity

  • Speakers: 60 native Danish speakers
  • Regional Representation: Covers diverse dialects and accents from multiple regions of Denmark
  • Demographics: Participants aged 18–70, with a 60:40 male-to-female ratio

Recording Specifications

  • Nature of Recordings: Scripted prompts and monologues
  • Average Duration: 5–30 seconds per clip
  • Format: WAV files, mono channel, 16-bit depth, 8 kHz and 16 kHz sample rates
  • Environment: Noise-free, echo-free, quiet recording settings
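
The recording specifications above can be checked mechanically once the files are delivered. The following is a minimal sketch using Python's standard `wave` module; the function names `check_wav` and `duration_seconds` are illustrative, not part of the dataset or of any vendor tooling.

```python
import wave

# Expected values, taken from the recording specifications listed above.
EXPECTED_CHANNELS = 1                  # mono
EXPECTED_SAMPLE_WIDTH = 2              # 16-bit audio = 2 bytes per sample
EXPECTED_SAMPLE_RATES = {8000, 16000}  # 8 kHz and 16 kHz


def check_wav(path):
    """Return a list of spec violations for one WAV file (empty list if none)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"expected 16-bit, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() not in EXPECTED_SAMPLE_RATES:
            problems.append(f"unexpected sample rate {wav.getframerate()} Hz")
    return problems


def duration_seconds(path):
    """Clip length in seconds, computed from frame count and sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Running `check_wav` over every file and flagging clips whose `duration_seconds` falls outside the stated 5 to 30 second range is one way to spot-check a delivery before training.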

Topic & Scenario Coverage

The dataset captures a wide variety of realistic delivery and logistics situations, including:

  • Customer service dialogues
  • Order processing and status inquiries
  • Shipping, delivery, and tracking updates
  • Returns, refunds, and complaint handling
  • Technical assistance for delivery issues
  • Regulatory questions and operational policies
  • General advisory and domain-specific statements

Linguistic Features

To simulate authentic conversations, prompts include:

  • Names: Regional male and female names in natural formats
  • Addresses: Diverse location references including street names and regions
  • Dates & Times: Common references for delivery slots, pickups, and ETAs
  • Order Numbers: Tracking IDs, invoice numbers, and order references
  • Quantities & Weights: Units related to shipments and packaging
  • Logistics Providers: Mentions of real or fictional courier and logistics services

Transcription

Each audio file is paired with a verbatim transcription, enhancing usability for training and validation:

  • Content: Exact match of the audio prompt
  • Format: Plain text (.TXT) with filenames aligned to audio files
  • Quality Assurance: All transcripts are reviewed by native Danish linguists for precision and consistency
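
Because transcript filenames are aligned to the audio filenames, audio and text can be paired by base name. The sketch below assumes that "aligned" means an identical base name (for example `clip.wav` and `clip.txt`), that transcripts are UTF-8 encoded, and that audio and transcripts sit in two directories; none of these details are confirmed by the listing, and `pair_audio_with_transcripts` is a hypothetical helper.

```python
from pathlib import Path


def pair_audio_with_transcripts(audio_dir, transcript_dir):
    """Match each .wav file to the .txt file that shares its base name.

    Returns (pairs, missing): pairs is a list of (wav_path, transcript_text),
    missing lists the names of .wav files that have no transcript.
    """
    # Index transcripts by base name (assumption: clip.wav <-> clip.txt).
    transcripts = {
        p.stem: p
        for p in Path(transcript_dir).iterdir()
        if p.suffix.lower() == ".txt"
    }
    pairs, missing = [], []
    for wav_path in sorted(Path(audio_dir).iterdir()):
        if wav_path.suffix.lower() != ".wav":
            continue
        txt_path = transcripts.get(wav_path.stem)
        if txt_path is None:
            missing.append(wav_path.name)
        else:
            # Assumption: transcripts are stored as UTF-8 text.
            text = txt_path.read_text(encoding="utf-8").strip()
            pairs.append((wav_path, text))
    return pairs, missing
```

Reporting the `missing` list separately, rather than silently skipping files, makes gaps in a delivery visible before they turn into training errors.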

Metadata

Comprehensive metadata accompanies every audio file and participant profile, supporting flexible filtering and model adaptation:

  • Participant Metadata: Unique speaker ID, age, gender, region, and dialect
  • Audio Metadata: Prompt transcript, recording environment, device used, sample rate, bit depth, and file format
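
As an illustration of the filtering this metadata allows, the sketch below selects speakers by age, gender, and dialect. The listing does not state the metadata file format or field names, so the `SpeakerRecord` fields and the `filter_speakers` helper are hypothetical and simply mirror the participant attributes listed above.

```python
from dataclasses import dataclass


@dataclass
class SpeakerRecord:
    # Hypothetical field names mirroring the participant metadata listed above.
    speaker_id: str
    age: int
    gender: str
    region: str
    dialect: str


def filter_speakers(records, min_age=18, max_age=70, gender=None, dialect=None):
    """Keep speakers in the age range, optionally restricted by gender or dialect."""
    selected = []
    for rec in records:
        if not (min_age <= rec.age <= max_age):
            continue
        if gender is not None and rec.gender != gender:
            continue
        if dialect is not None and rec.dialect != dialect:
            continue
        selected.append(rec)
    return selected


# Made-up sample rows for illustration only; not taken from the dataset.
speakers = [
    SpeakerRecord("spk_001", 25, "female", "Jutland", "Jutlandic"),
    SpeakerRecord("spk_002", 64, "male", "Zealand", "Insular Danish"),
]
younger = filter_speakers(speakers, max_age=40)
```

A filter like this could be used, for example, to hold out one dialect group as an evaluation set when adapting a model.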

Use Cases & Applications

This dataset is a versatile asset for developing various AI-powered voice and language solutions in the Delivery & Logistics sector:

  • Speech Recognition Training: Build ASR systems tailored for logistics workflows
  • Voice Synthesis: Train TTS models for courier updates and customer support bots
  • Voice Assistants: Enable natural-sounding, logistics-specific voice assistants
  • Chatbots: Build dialogue models to automate tracking, complaint handling, and more
  • NER Models: Extract structured data like names, order IDs, dates, and prices
  • Language Understanding: Develop models for sentiment analysis, topic detection, and intent classification in delivery-related contexts

Ethical & Secure Data Collection

All data was collected using Yugo, FutureBeeAI’s proprietary and secure data collection and transcription platform.

  • Data privacy and participant consent were ensured through strict ethical practices
  • No personally identifiable information (PII) is present, making the dataset secure for commercial and research use

License

This dataset is published by FutureBeeAI and is available for commercial licensing. It serves as a high-quality resource for enterprises and researchers building Danish speech solutions for the Delivery & Logistics sector.

    Use Cases

    ASR

    Conversational AI

    Chatbot

    TTS

    Speech Analytics

    Mobile Speech

    Dataset Details

    Language: Danish
    Language code: da
    Country: Denmark
    Accents: Insular Danish, Jutlandic, and more
    Gender Distribution: 60:40 (male:female)
    Age Group: 18–70 years

    File Details

    Environment: Silent
    Bit Depth: 16-bit
    Sample rate: 8 kHz and 16 kHz
    Channel: Mono
    Audio file duration: 5 to 30 seconds
