Algerian Arabic Call Center Speech Dataset for Retail & E-commerce

This Algerian Arabic speech dataset features real-world call center conversations from the Retail and E-commerce domain. With detailed metadata and accurate transcriptions, it’s designed to power ASR systems, voice AI, and conversational agents.

About this Off-the-shelf Speech Dataset

Introduction

This Algerian Arabic Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Arabic speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.

Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.

Speech Data

The dataset contains 30 hours of dual-channel call center recordings between native Algerian Arabic speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.

•Participant Diversity:

•Speakers: 60 native Algerian Arabic speakers from our verified contributor pool.

•Regions: Multiple provinces across Algeria, ensuring coverage of various accents and dialects.

•Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70 years.

•Recording Details:

•Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.

•Call Duration: Ranges from 5 to 15 minutes.

•Audio Format: Stereo WAV files, 16-bit depth, at 8 kHz and 16 kHz sample rates.

•Recording Environment: Captured in clean conditions with no echo or background noise.
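Because the recordings are dual-channel, each speaker can be pulled out of the stereo file before training. The following is a minimal sketch using only Python's standard library, assuming 16-bit stereo WAV input as described above. The function name `split_stereo_wav` is illustrative, and which channel carries the agent versus the customer is not stated here, so that mapping should be confirmed against the delivered files.

```python
import array
import sys
import wave


def split_stereo_wav(path):
    """Split a 16-bit stereo WAV into two per-channel sample arrays.

    Returns (sample_rate, first_channel, second_channel). Which speaker sits
    on which channel is an assumption to verify against the actual dataset.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    # WAV samples are little-endian signed 16-bit integers,
    # interleaved as ch0, ch1, ch0, ch1, ...
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()

    return sample_rate, samples[0::2], samples[1::2]
```

Keeping the channels separate lets each side of the call be paired with its own transcript segments, which is usually simpler than separating overlapping speech after the fact.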

Topic Diversity

This speech corpus includes both inbound and outbound calls with positive, negative, and neutral outcomes, covering a realistic range of scenarios.

•Inbound Calls:

•Product Inquiries

•Order Cancellations

•Refund & Exchange Requests

•Subscription Queries, and more

•Outbound Calls:

•Order Confirmations

•Upselling & Promotions

•Account Updates

•Loyalty Program Offers

•Customer Verifications, and others

Such variety enhances your model’s ability to generalize across retail-specific voice interactions.

Transcription

All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

•Transcription Includes:

•Speaker-Segmented Dialogues

•Time-coded Segments

•Non-speech Tags (e.g., pauses, cough)

•High transcription accuracy with word error rate < 5% due to double-layered quality checks.

These transcriptions are production-ready, making model training faster and more accurate.
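As an illustration of how speaker-segmented, time-coded JSON might be consumed, here is a small sketch that totals talk time per speaker. The exact schema is not specified on this page, so every key name (`speaker`, `start`, `end`, `text`), the bracketed non-speech tag style, and the sample values are assumptions for illustration only, not the dataset's actual format.

```python
import json
from collections import defaultdict

# Hypothetical transcript: all keys and values are placeholders,
# not taken from the real dataset.
SAMPLE_TRANSCRIPT = """
[
  {"speaker": "agent",    "start": 0.0, "end": 2.5, "text": "<utterance 1>"},
  {"speaker": "customer", "start": 2.5, "end": 6.0, "text": "<utterance 2> [cough]"},
  {"speaker": "agent",    "start": 6.0, "end": 7.5, "text": "<utterance 3>"}
]
"""


def talk_time_by_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


segments = json.loads(SAMPLE_TRANSCRIPT)
print(talk_time_by_speaker(segments))
```

The same loop can be adapted to slice audio by segment boundaries once the real field names are known.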

Metadata

Rich metadata is available for each participant and conversation:

•Participant Metadata: ID, age, gender, accent, dialect, and location.

•Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.
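A short sketch of metadata-based filtering follows. The conversation attributes mirror those listed above (topic, sentiment, call type, sample rate), but the field spellings, the `Conversation` class, the `select_conversations` helper, and the sample records are all hypothetical, since the delivered metadata layout is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    # Field names are assumptions; the real metadata keys may differ.
    conversation_id: str
    topic: str
    sentiment: str
    call_type: str
    sample_rate_hz: int


def select_conversations(conversations, *, call_type=None, sentiment=None,
                         sample_rate_hz=None):
    """Keep conversations matching every criterion that is not None."""
    wanted = {
        "call_type": call_type,
        "sentiment": sentiment,
        "sample_rate_hz": sample_rate_hz,
    }
    return [
        conv for conv in conversations
        if all(value is None or getattr(conv, name) == value
               for name, value in wanted.items())
    ]


# Placeholder records for illustration only.
EXAMPLE = [
    Conversation("c1", "Product Inquiries", "positive", "inbound", 8000),
    Conversation("c2", "Order Confirmations", "neutral", "outbound", 16000),
    Conversation("c3", "Order Cancellations", "negative", "inbound", 16000),
]

inbound_16k = select_conversations(EXAMPLE, call_type="inbound",
                                   sample_rate_hz=16000)
print([conv.conversation_id for conv in inbound_16k])
```

Filtering of this kind makes it straightforward to build evaluation subsets, for example scoring a model separately on inbound and outbound calls.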

Usage and Applications

This dataset is ideal for a range of voice AI and NLP applications:

•Automatic Speech Recognition (ASR): Fine-tune Arabic speech-to-text systems.

•Speech Analytics: Extract customer insights and behavior patterns.

•Voice Assistants & Chatbots: Train natural-sounding Arabic voice interfaces.

•Sentiment Analysis: Detect emotion and intent from customer calls.

•Generative AI: Use in training dialogue generation and summarization models.

Secure and Ethical Collection

•All data was collected using “Yugo,” FutureBeeAI’s proprietary platform under strict ethical and security standards.

•No personally identifiable information is included.

•Dataset complies with global data privacy guidelines and is copyright-free.

Updates and Customization

We regularly expand this dataset with fresh recordings and offer tailored options:

•Customization Options:

•Acoustic Environment: Silent or noisy upon request.

•Sample Rate: Customizable from 8 kHz to 48 kHz.

•Transcription Format: Can follow your QA and formatting requirements.

License

This dataset is commercially licensed and ready for integration into your ASR, NLP, or voice AI pipeline.

Use Cases

Call Center Conversational AI

ASR

Chatbot

Language Modelling

TTS

Speech Analytics

Dataset Details

•Language: Arabic

•Language code: ar-dz

•Country: Algeria

•Accents: Eastern Hilal, Central Hilal, Mâqil Dialects

•Gender Distribution: M: 60%, F: 40%

•Age Group: 18-70 years

File Details

•Environment: Silent, Noisy

•Bit Depth: 16-bit

•Format: WAV

•Sample Rate: 8 kHz & 16 kHz

•Channel: Stereo (dual-channel, separated speakers)

•Audio File Duration: 5-15 minutes
