Introduction
The Thai Telecom Chat Dataset is a comprehensive collection of over 10,000 text-based conversations between telecom customers and call center agents. This dataset captures real-world service interactions and domain-specific language in Thai, enabling the development of intelligent conversational AI and NLP systems tailored for the telecommunications sector.
Participant & Chat Overview
•Participants: 150+ native Thai speakers from the FutureBeeAI Crowd Community
•Conversation Length: 300–700 words per chat
•Turns per Chat: 50–150 dialogue turns across both participants
•Chat Types: Inbound and outbound
•Sentiment Coverage: A mix of positive, neutral, and negative interactions
Topic Diversity
This dataset spans a wide range of telecom customer service scenarios:
•Inbound Chats (Customer-Initiated)
  •Network connectivity issues
  •Billing inquiries and adjustments
  •Technical support requests
  •Service activations and upgrades
  •International roaming inquiries
  •Refunds and complaint resolution
  •Emergency service access
•Outbound Chats (Agent-Initiated)
  •Welcome and onboarding calls
  •Payment reminders and due alerts
  •Customer satisfaction surveys
  •Technical issue follow-ups
  •Usage reviews and service feedback
  •Promotions and service offers
Language Nuance & Realism
The conversations reflect real-life telecom interactions in Thai, incorporating:
•Naming Patterns: Realistic Thai personal, business, and telecom brand names
•Localized Content: Phone numbers, email addresses, and locations consistent with regional norms
•Time & Number Formats: Thai representations of dates, times, currencies, and service numbers
•Informal Language & Slang: Common Thai expressions, idioms, and conversational shortcuts found in telecom discussions
Conversational Flow & Structure
Conversations follow the natural flow of telecom customer service exchanges, including:
•Dialogue Types:
  •Simple service inquiries
  •Detailed problem-solving discussions
  •Plan explanations and upgrades
  •Feedback collection and status updates
•Interaction Stages:
  •Initial greetings and verification
  •Data or issue collection
  •Clarification and troubleshooting
  •Resolution and action steps
Data Format & Structure
The dataset is available in TXT, CSV, and JSON formats and includes:
•Full dialogue history with speaker turns
•Participant IDs or anonymized identifiers
•Optional metadata: Chat type, sentiment, topic tag
•Designed for seamless integration with major NLP and machine learning pipelines
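As a rough illustration of how the JSON format might be consumed, the sketch below parses a single chat record and summarizes its speaker turns and metadata. The record layout is an assumption made for this example only: the field names chat_id, chat_type, sentiment, topic, turns, speaker, and text are hypothetical and are not taken from the dataset's documentation, so they should be adjusted to the delivered schema.

```python
import json
from collections import Counter

# Hypothetical record layout: every field name below is an assumption made
# for illustration, not the dataset's documented schema.
SAMPLE_RECORD = """
{
  "chat_id": "chat_0001",
  "chat_type": "inbound",
  "sentiment": "neutral",
  "topic": "billing_inquiry",
  "turns": [
    {"speaker": "customer", "text": "สวัสดีครับ"},
    {"speaker": "agent", "text": "สวัสดีค่ะ"},
    {"speaker": "customer", "text": "ขอบคุณครับ"}
  ]
}
"""


def summarize_chat(record: dict) -> dict:
    """Return basic statistics and metadata for one chat record."""
    turns = record.get("turns", [])
    # Count how many turns each speaker contributed.
    per_speaker = Counter(turn["speaker"] for turn in turns)
    return {
        "chat_id": record.get("chat_id"),
        "chat_type": record.get("chat_type"),
        "sentiment": record.get("sentiment"),
        "topic": record.get("topic"),
        "num_turns": len(turns),
        "turns_per_speaker": dict(per_speaker),
    }


record = json.loads(SAMPLE_RECORD)
summary = summarize_chat(record)
print(summary)
```

The optional metadata is read with dict.get so that records without a chat type, sentiment, or topic tag still load, with those fields reported as None.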
Applications
This dataset is ideal for a wide range of telecom-focused AI and NLP tasks, including:
•Telecom Chatbots and Smart Assistants
•Intent Classification and Dialogue Management
•NER and Entity Extraction: Phone numbers, plan IDs
•Customer Support Automation
•Text Summarization and Response Generation
•Thai NLP Model Fine-tuning
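To make the entity extraction use case concrete, here is a minimal rule-based baseline for pulling phone numbers out of a chat turn. It is a sketch, not a trained NER model and not a method supplied with the dataset. It assumes Thai mobile numbers written as ten Western digits starting with 06, 08, or 09, optionally separated by hyphens or spaces. Landline numbers, numbers written in Thai numerals, and plan IDs (whose format is not specified here) are out of scope. The names PHONE_PATTERN and extract_phone_numbers are hypothetical.

```python
import re

# Assumption: Thai mobile numbers are ten digits beginning with 06, 08, or 09,
# optionally grouped as 0XX-XXX-XXXX with hyphens or spaces.
# The lookarounds stop the pattern from matching inside a longer digit run.
PHONE_PATTERN = re.compile(r"(?<!\d)0[689]\d[- ]?\d{3}[- ]?\d{4}(?!\d)")


def extract_phone_numbers(text: str) -> list[str]:
    """Return the mobile numbers found in text, with separators removed."""
    return [re.sub(r"[- ]", "", m.group()) for m in PHONE_PATTERN.finditer(text)]


numbers = extract_phone_numbers("โทร 081-234-5678 หรือ 0698765432")
print(numbers)
```

A rule like this can serve as a sanity check or as a source of weak labels before fine-tuning a model on annotated entities.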
Ethical Collection & Data Security
•Consent-Based Participation: All contributors participated with informed consent
•PII-Free: No personally identifiable information is shared
•Secure Infrastructure: Collected and stored within FutureBeeAI’s secure environment
•Responsible AI Practices: Adheres to industry standards in data governance and ethics
Continuous Updates & Customization
The dataset is regularly updated and can be tailored to meet your needs:
•Custom Annotations: Sentiment, NER, intent tags, domain-specific labeling
•Topic Expansion: Custom chat collection for new telecom use cases
•Region-Specific Variants: Country- or dialect-specific data collection in Thai
•Multilingual Extension: Option to collect similar data in additional languages
Licensing
This dataset is developed and owned by FutureBeeAI and is available for commercial use under flexible licensing terms for enterprises, startups, and research institutions.