Marathi Retail & E-Commerce Conversational Chat Dataset

This dataset features Marathi text-based chat conversations between customers and call center agents, specifically focused on Retail and E-Commerce interactions. Covering a wide range of real-world topics, the dataset captures the authentic language, tone, and flow of Marathi customer service dialogues. It is ideal for training chatbots, virtual assistants, and NLP models for retail-focused applications.

  • Category: Conversational Chat Dataset
  • Total volume: 10K+ chats
  • Last updated: July 2025
  • Number of participants: 150 people

About This OTS Dataset

Introduction

The Marathi Retail & E-Commerce Chat Dataset is a large-scale, high-quality collection of over 12,000 chat conversations between customers and call center agents, focused exclusively on Retail and E-Commerce domains. Designed to reflect real-world service interactions, this dataset supports the development of robust conversational AI and NLP models tailored for Marathi-speaking audiences.

Participant & Chat Overview

  • Contributors: 200 native Marathi speakers from the FutureBeeAI Crowd Community
  • Chat Length: 300–700 words per conversation
  • Turn Count: 50–150 dialogue turns across both participants
  • Chat Types: Inbound and outbound
  • Sentiment Coverage: Positive, neutral, and negative interaction outcomes

    Topic Diversity

    This dataset spans a wide range of Retail and E-Commerce conversation types:

    Inbound Chats (Customer-Initiated):

  • Product inquiries
  • Return or exchange requests
  • Order cancellations
  • Refunds and payment issues
  • Membership or subscription queries
  • Shipping, delivery, and more

    Outbound Chats (Agent-Initiated):

  • Order confirmation and verification
  • Cross-selling and upselling
  • Loyalty program promotions
  • Account updates
  • Special offers and discounts
  • Customer feedback and verification

    This diversity enables training of models that handle varied intents, scenarios, and outcomes within customer service workflows.

    Language Nuance & Realism

    The dataset is rich in linguistic diversity and mirrors real conversational tone and structure used in Marathi-speaking regions:

  • Personal & Brand Names: Culturally accurate naming conventions
  • Local Elements: Realistic addresses, phone numbers, emails, currency references, and time/date formats
  • Slang & Idioms: Local expressions, informal phrases, and customer service jargon
  • Cultural Specificity: Region-aware vocabulary and tone

    This linguistic authenticity supports the development of culturally fluent AI models for Marathi Retail & E-Commerce use cases.

    Conversational Structure & Flow

    The conversations reflect natural dialogue dynamics and are organized into various types of interaction styles:

  • Simple inquiries
  • Detailed problem-solving discussions
  • Transactional exchanges
  • Follow-ups and status updates
  • Advisory and assistance sessions

    Each conversation includes common dialogue stages such as:

  • Greetings
  • Customer authentication
  • Information gathering
  • Issue resolution
  • Closing remarks
  • Feedback collection

    This structured flow helps train models to manage real-world service interactions from start to finish.

    Data Format & Structure

    Available in TXT, CSV, and JSON formats, each conversation is structured with fields such as:

  • Participant identifiers
  • Message timestamps (optional)
  • Full chat history
  • Topic tags and metadata

    This flexible formatting ensures compatibility with major NLP frameworks and workflows.
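
    The exact schema is not spelled out on this page, so the following is only a minimal sketch of how a JSON conversation carrying the fields listed above might be read. Every field name (`conversation_id`, `chat_type`, `topic`, `messages`, `speaker`, `timestamp`, `text`) and the sample record itself are illustrative assumptions, not the dataset's documented layout.

```python
import json

# Hypothetical record: all field names below are assumptions for illustration,
# not the dataset's documented schema.
SAMPLE = """
{
  "conversation_id": "chat_0001",
  "chat_type": "inbound",
  "topic": "return_request",
  "messages": [
    {"speaker": "customer", "timestamp": null, "text": "नमस्कार, मला माझी ऑर्डर परत करायची आहे."},
    {"speaker": "agent", "timestamp": null, "text": "नमस्कार! कृपया तुमचा ऑर्डर क्रमांक सांगा."}
  ]
}
"""


def load_turns(raw_json):
    """Parse one conversation and return (metadata, list of (speaker, text))."""
    record = json.loads(raw_json)
    # Everything except the message list is treated as conversation metadata.
    metadata = {k: v for k, v in record.items() if k != "messages"}
    turns = [(m["speaker"], m["text"]) for m in record.get("messages", [])]
    return metadata, turns


metadata, turns = load_turns(SAMPLE)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

    Flattening each chat into (speaker, text) pairs keeps the example independent of any particular NLP framework; the real field names should be taken from the delivered files.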

    Applications

    This dataset can be used across a wide range of commercial and research applications:

  • Retail Chatbots & Voicebots
  • Customer Service Automation
  • Sentiment Analysis & Intent Detection
  • NER & Information Extraction
  • Text Generation & Prediction Models
  • Multilingual Retail NLP Research
  • Domain-specific Smart Assistants

    Secure & Ethical Collection

  • Informed Consent: All participants contributed with full awareness and written consent
  • Privacy Protected: No personally identifiable information (PII) is included
  • Secure Pipeline: All data was collected, processed, and stored within FutureBeeAI’s secure platform environment

    Updates & Customization

    This dataset is regularly updated with fresh conversations and offers extensive customization capabilities:

  • Annotation Options: Add NER, sentiment, intent, or custom tags
  • Topic Expansion: Custom chat collection for specific product categories or services
  • Language Flexibility: Custom chat datasets available in Marathi or other languages upon request

    Licensing

    This dataset is developed and owned by FutureBeeAI and is available for commercial licensing. Flexible licensing options can be provided for enterprises, academic researchers, and solution providers.

    Use Cases

  • Chatbot
  • Text Recognition
  • Text Analysis
  • Text Prediction
  • Smart Assistants

    Dataset Details

  • Dataset type: Retail & E-commerce Conversational Chats
  • Volume: 12K+ chats
  • Media type: Text only
  • Language: Marathi
  • Topics: 100+

    File Details

  • Turns per chat: 50-150
  • Word count: 300-700 words
  • Format: TXT, DOCS, JSON or CSV
  • Annotation: On request
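
    As a rough illustration, a delivered chat could be checked against the turn and word ranges listed above. This sketch assumes each conversation is already available as a list of (speaker, text) pairs and that words are counted by whitespace splitting; both are assumptions, since the page does not say how the counts were measured. The demo conversation is a synthetic placeholder, not dataset content.

```python
# Ranges as listed in the file details; the record layout is assumed.
TURN_RANGE = (50, 150)
WORD_RANGE = (300, 700)


def chat_stats(turns):
    """Return (turn_count, word_count) for a list of (speaker, text) pairs."""
    # Whitespace splitting is an assumed word-counting rule.
    word_count = sum(len(text.split()) for _, text in turns)
    return len(turns), word_count


def within_documented_ranges(turns):
    """True if both turn count and word count fall inside the listed ranges."""
    n_turns, n_words = chat_stats(turns)
    return (TURN_RANGE[0] <= n_turns <= TURN_RANGE[1]
            and WORD_RANGE[0] <= n_words <= WORD_RANGE[1])


# Synthetic placeholder conversation: 80 turns of 5 whitespace-separated tokens.
demo = [("customer" if i % 2 == 0 else "agent", "हा एक नमुना संदेश आहे")
        for i in range(80)]

print(chat_stats(demo))
print(within_documented_ranges(demo))
```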
