Tagalog Conversational Chat Dataset for BFSI Domain

This text dataset consists of chats between two native Tagalog speakers on diverse topics, specifically tailored to the BFSI (banking, financial services, and insurance) domain.

Category: Text-based conversational dataset
Total volume: 10K+ chats
Last updated: Jun 2024
Number of participants: 150 people

About This OTS Dataset

Introduction

The dataset comprises over 10,000 chat conversations, each focusing on specific BFSI-related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.

  • Participant Details: 150+ native Tagalog participants from the FutureBeeAI community.
  • Word Count & Length: Chats vary in length, typically running 300 to 700 words and 50 to 150 turns across both speakers.

    Topic Diversity

    The chat dataset covers a wide range of conversations on BFSI topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various BFSI use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.

  • Inbound Chats:
      • Account Opening
      • Account Management
      • Transactions
      • Loan Inquiries & Applications
      • Credit Card Services, and many more
  • Outbound Chats:
      • Product & Service Promotions
      • Cross-selling & Upselling
      • Customer Retention & Loyalty Programs
      • Loan Application Follow-ups
      • Insurance Policy Renewals/Reminders, and many more

    Language Variety & Nuances

    The conversations in this dataset capture the diverse language styles and expressions prevalent in Tagalog BFSI interactions. This diversity ensures the dataset accurately represents the language used by Tagalog speakers in BFSI contexts.

    The dataset encompasses a wide array of language elements, including:

  • Naming Conventions: Chats include a variety of Tagalog personal and business names.
  • Localized Details: Real-world addresses, emails, phone numbers, and other contact information formatted according to different Tagalog-speaking regions.
  • Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Tagalog forms, adhering to local conventions.
  • Idiomatic Expressions and Slang: Local slang, idioms, and informal phrases found in Tagalog BFSI conversations.

    This linguistic authenticity gives researchers and developers a comprehensive view of the intricate language patterns, cultural references, and communication styles inherent to Tagalog BFSI interactions.

    Conversational Flow and Interaction Types

    The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of BFSI customer-agent interactions.

  • Simple Inquiries
  • Detailed Discussions
  • Transactional Interactions
  • Problem-Solving Dialogues
  • Advisory Sessions
  • Routine Checks and Follow-Ups

    Each of these conversations includes various stages of conversational flow, such as:

  • Greetings
  • Authentication
  • Information gathering
  • Resolution identification
  • Solution Delivery
  • Closing and Follow-ups
  • Feedback, etc.

    This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.

    Data Format and Structure

    The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat messages, designed to be easily accessible and compatible with popular NLP frameworks.
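
    As a rough illustration of working with the JSON format, the sketch below parses one conversation and reports its turn count, word count, and participants. The page only states that records carry "participant identifiers and chat messages", so every field name here (`conversation_id`, `messages`, `participant_id`, `role`, `text`) is an assumption, and the sample chat is an invented placeholder rather than an excerpt from the dataset.

```python
import json

# Hypothetical record layout: the field names ("conversation_id", "messages",
# "participant_id", "role", "text") are assumptions for illustration only,
# not the dataset's documented schema. The chat text is made up.
SAMPLE_JSON = """
{
  "conversation_id": "chat_0001",
  "messages": [
    {"participant_id": "P001", "role": "agent", "text": "Magandang araw po! Paano ko kayo matutulungan?"},
    {"participant_id": "P002", "role": "customer", "text": "Gusto ko pong magbukas ng savings account."},
    {"participant_id": "P001", "role": "agent", "text": "Sige po, tutulungan ko kayo."}
  ]
}
"""


def summarize_chat(record):
    """Return turn count, whitespace-delimited word count, and participants."""
    messages = record["messages"]
    word_count = sum(len(m["text"].split()) for m in messages)
    participants = sorted({m["participant_id"] for m in messages})
    return {
        "conversation_id": record["conversation_id"],
        "turns": len(messages),
        "words": word_count,
        "participants": participants,
    }


def to_plain_text(record):
    """Flatten a chat into 'role: text' lines, one line per message."""
    return "\n".join(f'{m["role"]}: {m["text"]}' for m in record["messages"])


record = json.loads(SAMPLE_JSON)
summary = summarize_chat(record)
print(summary)
print(to_plain_text(record))
```

    A summary like this could serve as a quick sanity check that delivered files fall within the stated ranges (300 to 700 words, 50 to 150 turns) once the actual field names are known.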

    Usage and Application

    This dataset is useful for various applications in NLP and conversational AI, including:

  • Conversational AI Development: Building of BFSI-specific conversational AI models for automated customer service.
  • Natural Language Processing (NLP) Research: Advancement of Tagalog NLP research in sentiment analysis and language generation.
  • Smart Assistants and Chatbots: Development of smart assistants and chatbots for complex BFSI interactions.
  • Text Recognition and Analytics: Training algorithms for text recognition and analytics in BFSI document processing.
  • Text Prediction and Generation: Improvement of text prediction and generation models for customer-agent interactions.

    Secure and Ethical Collection

  • The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
  • The dataset does not include any personally identifiable information about participants, which makes it safe to use.
  • Throughout the data collection process, the data remained within our secure platform and did not leave our environment, ensuring data security and confidentiality.

    Updates and Customization

    The dataset is regularly updated with new chat data. Customization options are available to meet specific needs, including:

  • Customization & Custom Collection Options:
      • Annotation: Annotations such as Named Entity Recognition (NER), Sentiment Analysis, Intent Classification, or other application-specific labels can be made available upon request.
      • Different Topics: Custom collection can be carried out for specific requirements in any language and domain.
      • Region-Specific Collection: Country- or region-specific terminology can be added, or a custom collection can be carried out.

    License

    This Tagalog Conversational Chat Dataset for BFSI was created by FutureBeeAI and is available for commercial use.

    Use Cases

  • Chatbot
  • Text Recognition
  • Text Analysis
  • Text Prediction
  • Smart Assistants

    Dataset Sample(s)

    Samples will be available soon!

    Contact us to get the samples immediately for this dataset.

    Dataset Details

    Dataset type: BFSI Chats
    Volume: 10K+ chats
    Media type: Text
    Language: Tagalog
    Topics: 100+

    File Details

    Number of turns: 50-150
    Word count: 300-700 words
    Format: JSON, TXT, CSV
    Annotation: NA
