Mandarin Call Center Speech Dataset for Delivery & Logistics

This Mandarin speech dataset features real-world call center conversations from the Delivery & Logistics domain. With detailed metadata and accurate transcriptions, it’s designed to power ASR systems, voice AI, and conversational agents.

Category: Unscripted Call Center Conversations
Total volume: 30 hours of speech
Last updated: June 2025
Number of participants: 60

About this Off-the-shelf Speech Dataset

Introduction

This Mandarin Chinese Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems for Mandarin-speaking customers. Its 30 hours of real-world, unscripted call center audio capture authentic delivery-related conversations for training ASR models.

Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.

Speech Data

The dataset contains 30 hours of dual-channel call center recordings between native Mandarin Chinese speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed delivery resolutions, offering a rich, real-world training base for AI models.

  • Participant Diversity:
      • Speakers: 60 native Mandarin Chinese speakers from our verified contributor pool.
      • Regions: Multiple provinces of China for accent and dialect diversity.
      • Participant Profile: Gender distribution of 60% male and 40% female, with ages ranging from 18 to 70.
  • Recording Details:
      • Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.
      • Call Duration: 5 to 15 minutes on average.
      • Audio Format: Stereo WAV, 16-bit depth, recorded at 8 kHz and 16 kHz.
      • Recording Environment: Captured in clean, noise-free, echo-free conditions.
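
Because the recordings are stereo with the two speakers on separate channels, each side of a call can be processed on its own. The following is a minimal sketch using Python's standard `wave` and `array` modules, assuming a 16-bit stereo WAV file as described above. The function name and file paths are illustrative, and which channel carries the agent versus the customer is an assumption to verify against the delivered files.

```python
import array
import wave


def split_stereo_wav(src_path, left_path, right_path):
    """Write each channel of a 16-bit stereo WAV to its own mono file.

    Returns the sample rate of the source file.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Samples are interleaved: left, right, left, right, ...
    samples = array.array("h")
    samples.frombytes(frames)
    channels = (samples[0::2], samples[1::2])

    for path, data in zip((left_path, right_path), channels):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(data.tobytes())
    return rate
```

A call such as `split_stereo_wav("call.wav", "left.wav", "right.wav")` (hypothetical file names) would produce two mono files at the original sample rate, which is often convenient for per-speaker ASR training or diarization checks.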
Topic Diversity

This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.

  • Inbound Calls:
      • Order Tracking
      • Delivery Complaints
      • Undeliverable Addresses
      • Return Process Enquiries
      • Delivery Method Selection
      • Order Modifications, and more
  • Outbound Calls:
      • Delivery Confirmations
      • Subscription Offer Calls
      • Incorrect Address Follow-ups
      • Missed Delivery Notifications
      • Delivery Feedback Surveys
      • Out-of-Stock Alerts, and others

This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.

Transcription

All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.

  • Transcription Includes:
      • Speaker-Segmented Dialogues
      • Time-coded Segments
      • Non-speech Tags (e.g., pauses, noise)
  • Accuracy: Word error rate under 5%, verified through dual-layer quality checks.

These transcriptions support fast, reliable model development for Mandarin voice AI applications in the delivery sector.
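
For readers who want to check an error-rate figure on their own data, the standard definition is (substitutions + deletions + insertions) divided by the number of reference tokens, which equals the edit distance between the two token sequences over the reference length. The sketch below is a generic illustration, not the vendor's quality-check procedure. It assumes tokens are individual characters, a common choice for Mandarin; segmented words could be passed instead.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance divided by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Illustrative sentences (not taken from the dataset): one wrong character in eight.
reference = list("我的快递到哪里了")
hypothesis = list("我的快递到那里了")
print(error_rate(reference, hypothesis))  # 0.125
```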

Metadata

Detailed metadata is included for each participant and conversation:

  • Participant Metadata: ID, age, gender, region, accent, dialect.
  • Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

This metadata aids in training specialized models, filtering demographics, and running advanced analytics.
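
As an illustration of metadata-based filtering, the sketch below selects conversations by exact field match. The record layout, key names (`conversation_id`, `call_type`, `topic`, `sentiment`, `sample_rate`), and values are hypothetical and chosen only to mirror the attributes listed above; the actual metadata schema may differ.

```python
def filter_conversations(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical records for illustration only (not real dataset entries).
records = [
    {"conversation_id": "c001", "call_type": "inbound",
     "topic": "Order Tracking", "sentiment": "neutral", "sample_rate": 8000},
    {"conversation_id": "c002", "call_type": "outbound",
     "topic": "Delivery Confirmations", "sentiment": "positive", "sample_rate": 16000},
    {"conversation_id": "c003", "call_type": "inbound",
     "topic": "Delivery Complaints", "sentiment": "negative", "sample_rate": 16000},
]

# Select inbound calls recorded at 16 kHz.
subset = filter_conversations(records, call_type="inbound", sample_rate=16000)
```

The same pattern extends to participant-level fields such as region or gender when building demographic subsets for evaluation.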

Usage and Applications

This dataset is ideal for a range of AI and NLP use cases in the delivery and logistics industry:

  • Automatic Speech Recognition (ASR): Build or fine-tune Mandarin Chinese speech-to-text systems.
  • Speech Analytics: Gain insights from customer feedback and logistics-related interactions.
  • Voice Assistants & Chatbots: Enable automated support for deliveries, returns, and updates.
  • Sentiment Analysis: Detect frustration, urgency, or satisfaction in delivery-related calls.
  • Generative AI: Train Mandarin generative models for summarization, call simulation, or support scripts.

Secure and Ethical Collection

  • Data collected via FutureBeeAI’s secure platform, “Yugo,” under strict ethical standards.
  • No personally identifiable information is included.
  • Compliant with global data privacy regulations and copyright-free.

Updates and Customization

We regularly update this dataset with fresh audio and offer full customization:

  • Customization Options:
      • Acoustic Conditions: Silent or noisy environments on request.
      • Sample Rate: Configurable between 8 kHz and 48 kHz.
      • Transcription Format: Custom guidelines or formatting accepted.

License

This Delivery and Logistics domain dataset is commercially licensed and ready for use in ASR, NLP, and voice automation projects in Mandarin.

Use Cases

  • Call Center Conversational AI
  • Automatic Speech Recognition (ASR)
  • Chatbot and voicebot creation
  • Language Modelling
  • Text-to-speech (TTS)
  • Speech Analytics

Dataset Details

Language: Mandarin
Language code: zh-CN
Country: China
Accents: Anhui Sheng, Beijing Shi, and more
Gender distribution: 60% male, 40% female
Age group: 18 to 70 years

File Details

Environment: Silent, Noisy
Bit depth: 16-bit
Format: WAV
Sample rate: 8 kHz and 16 kHz
Channel: Stereo (dual-channel, separated speakers)
Audio file duration: 5 to 15 minutes
