Egyptian Arabic Call Center Speech Dataset for Telecom

This Egyptian Arabic speech dataset features real-world call center conversations from the Telecom domain. With detailed metadata and accurate transcriptions, it’s designed to power ASR systems, voice AI, and conversational agents.

About this Off-the-shelf Speech Dataset

Introduction

This Egyptian Arabic Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Arabic-speaking telecom customers. Featuring over 40 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.

Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.

Speech Data

The dataset contains 40 hours of dual-channel call center recordings between native Egyptian Arabic speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.

•Participant Diversity:
  •Speakers: 80 native Egyptian Arabic speakers from our verified contributor pool.
  •Regions: Representing multiple provinces across Egypt to ensure coverage of various accents and dialects.
  •Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
•Recording Details:
  •Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
  •Call Duration: Ranges from 5 to 15 minutes.
  •Audio Format: Stereo WAV files, 16-bit depth, at 8 kHz and 16 kHz sample rates.
  •Recording Environment: Captured in clean conditions with no echo or background noise.
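As a minimal sketch of working with the audio format described above, the Python snippet below checks that a file is 16-bit stereo and separates its two channels using only the standard library. The function names and the tiny in-memory demo file are illustrative assumptions, not part of the dataset tooling. Which channel carries the agent and which the customer is not stated here, so the channels are labeled neutrally.

```python
import io
import struct
import wave


def split_stereo_wav(wav_file):
    """Return (sample_rate, channel_a, channel_b) for a 16-bit stereo WAV.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a stereo (dual-channel) recording")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # Samples are interleaved little-endian int16 values: A0, B0, A1, B1, ...
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return sample_rate, list(samples[0::2]), list(samples[1::2])


def make_demo_wav(sample_rate=8000):
    """Build a three-frame stereo WAV in memory so the sketch runs without dataset files."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<6h", 1, -1, 2, -2, 3, -3))
    buffer.seek(0)
    return buffer


rate, channel_a, channel_b = split_stereo_wav(make_demo_wav())
print(rate, channel_a, channel_b)
```

Because each speaker sits on a separate channel, splitting the file this way gives two single-speaker streams that can be processed independently.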

Topic Diversity

This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.

•Inbound Calls:
  •Phone Number Porting
  •Network Connectivity Issues
  •Billing and Payments
  •Technical Support
  •Service Activation
  •International Roaming Enquiry
  •Refund Requests and Billing Adjustments
  •Emergency Service Access, and others
•Outbound Calls:
  •Welcome Calls & Onboarding
  •Payment Reminders
  •Customer Satisfaction Surveys
  •Technical Updates
  •Service Usage Reviews
  •Network Complaint Status Calls, and more

This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.

Transcription

All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

•Transcription Includes:
  •Speaker-Segmented Dialogues
  •Time-coded Segments
  •Non-speech Tags (e.g., pauses, coughs)
•High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.
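The exact JSON schema is not specified on this page. Purely as an illustration, the sketch below assumes a hypothetical layout: a "segments" list whose entries carry "speaker", "start", "end" (in seconds), and "text" fields, with non-speech tags written in square brackets. The real field names and tag syntax may differ, and the sample text is invented for the demo.

```python
import json
import re

# Invented sample in an ASSUMED schema; not taken from the dataset.
SAMPLE_TRANSCRIPT = """
{
  "segments": [
    {"speaker": "agent", "start": 0.0, "end": 2.5, "text": "hello [pause] how can I help"},
    {"speaker": "customer", "start": 2.5, "end": 6.0, "text": "my bill looks wrong [cough]"},
    {"speaker": "agent", "start": 6.0, "end": 7.5, "text": "let me check"}
  ]
}
"""


def speaking_time(segments):
    """Sum segment durations (end - start) per speaker label."""
    totals = {}
    for segment in segments:
        duration = segment["end"] - segment["start"]
        totals[segment["speaker"]] = totals.get(segment["speaker"], 0.0) + duration
    return totals


def strip_non_speech_tags(text):
    """Remove bracketed tags such as [pause] and collapse leftover whitespace."""
    return " ".join(re.sub(r"\[[^\]]*\]", " ", text).split())


segments = json.loads(SAMPLE_TRANSCRIPT)["segments"]
totals = speaking_time(segments)
clean_text = [strip_non_speech_tags(segment["text"]) for segment in segments]
print(totals)
print(clean_text)
```

Keeping the tags in the stored transcript and stripping them only at training time leaves both options open: some ASR setups model non-speech events explicitly, while others expect plain text.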

These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
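Word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference. How the figure quoted above was measured is not described on this page, so the following is only a generic sketch of the standard calculation, using a word-level edit distance. The function name is an assumption for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


# One substitution (b -> x) and one deletion (d) against four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a rate below 5% corresponds to fewer than one word-level error per 20 reference words.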

Metadata

Rich metadata is available for each participant and conversation:

•Participant Metadata: ID, age, gender, accent, dialect, and location.
•Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

This metadata supports fine-grained analysis, dialect-specific tuning, and precise dataset segmentation.
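As a rough sketch of the segmentation this enables, the snippet below filters conversation records by metadata fields and counts sentiment labels in the result. The record layout, key names, and values are invented for illustration; the actual metadata files may be organized differently.

```python
from collections import Counter

# Invented records using ASSUMED key names; not taken from the dataset.
CONVERSATIONS = [
    {"id": "conv-001", "topic": "Billing and Payments", "call_type": "inbound",
     "sentiment": "negative", "sample_rate": 8000},
    {"id": "conv-002", "topic": "Payment Reminders", "call_type": "outbound",
     "sentiment": "neutral", "sample_rate": 16000},
    {"id": "conv-003", "topic": "Technical Support", "call_type": "inbound",
     "sentiment": "positive", "sample_rate": 16000},
]


def select(records, **criteria):
    """Keep the records whose fields equal every given key=value criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


inbound = select(CONVERSATIONS, call_type="inbound")
sentiment_counts = Counter(record["sentiment"] for record in inbound)
print([record["id"] for record in inbound], dict(sentiment_counts))
```

The same pattern extends to participant fields such as accent or age group, for example to build a dialect-specific evaluation subset.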

Usage and Applications

This dataset is ideal for a range of telecom AI and NLP applications:

•Automatic Speech Recognition (ASR): Fine-tune Arabic speech-to-text systems for telecom interactions.
•Speech Analytics: Identify user pain points and improve telecom service delivery.
•Voice Assistants & Chatbots: Build telecom virtual assistants for customer self-service.
•Sentiment Analysis: Detect customer frustration or satisfaction in support calls.
•Generative AI: Train telecom-specific summarization and response generation models.

Secure and Ethical Collection

•All data was collected using “Yugo,” FutureBeeAI’s proprietary platform under strict ethical and security standards.

•No personally identifiable information is included.

•The dataset complies with global data privacy guidelines and is copyright-free.

Updates and Customization

We regularly expand this dataset with new telecom voice data and support full customization:

•Customization Options:
  •Acoustic Environment: Silent or noisy upon request.
  •Sample Rate: Customizable from 8 kHz to 48 kHz.
  •Transcription Format: Can follow your QA and formatting requirements.

License

This Telecom domain dataset is commercially licensed and ready for integration into Arabic ASR, NLP, and voice AI solutions.

Use Cases

Call Center Conversational AI

Use of speech data for Automatic Speech Recognition

ASR

Chatbot

Language Modelling

TTS

Speech Analytics

Dataset Details

•Language: Arabic
•Language code: ar-eg
•Country: Egypt
•Accents: Damietta, Al Sharqia, Kafr el-Sheikh, Dakahlia, Alexandria, Asyut, Beni Suef, Cairo, Gharbia, Giza
•Gender Distribution: Male 60%, Female 40%
•Age Group: 18-70 years

File Details

•Environment: Silent, Noisy
•Bit Depth: 16 bit
•Format: WAV
•Sample rate: 8 kHz and 16 kHz
•Channel: Stereo (dual-channel, separated speakers)
•Audio file duration: 5-15 minutes

Similar to Call Center Conversation Speech Datasets

•Gujarati Telecom CC Speech Data (Gujarati, India): Telecom call center audio data in Gujarati. 30 speech hours, 60 people. Use cases: Call Center Conversational AI, ASR.
•Telugu Telecom CC Speech Data (Telugu, India): Telecom call center audio data in Telugu. 30 speech hours, 60 people. Use cases: Call Center Conversational AI, ASR.
•Danish Telecom CC Speech Data (Danish, Denmark): Telecom call center audio data in Danish. 30 speech hours, 60 people. Use cases: Call Center Conversational AI, ASR.
•American English Telecom CC Speech Data (English, US): Telecom call center audio data in American English. 30 speech hours, 60 people. Use cases: Call Center Conversational AI, ASR.

•Egyptian Arabic BFSI CC Speech Data: BFSI call center audio data in Egyptian Arabic. 40 speech hours, 80 people. Use cases: Call Center Conversational AI, ASR.
•Saudi Arabian Delivery & Lgc CC Speech Data (Arabic, Saudi Arabia): Delivery & Logistics call center audio data in Saudi Arabian Arabic. 40 speech hours, 80 people. Use cases: Call Center Conversational AI, ASR.
•Saudi Arabian Real Estate CC Speech Data (Arabic, Saudi Arabia): Real Estate call center audio data in Saudi Arabian Arabic. 40 speech hours, 80 people. Use cases: Call Center Conversational AI, ASR.
•Algerian Arabic Real Estate CC Speech Data (Arabic, Algeria): Real Estate call center audio data in Algerian Arabic. 30 speech hours, 60 people. Use cases: Call Center Conversational AI, ASR.
