English (India) Call Center Speech Dataset for Telecom

The audio dataset comprises call center conversations for the Telecom domain, featuring native English speakers from India. It includes speech data, detailed metadata and accurate transcriptions.

Category: Unscripted Call Center Conversations
Total Volume: 30 Speech Hours
Last updated: Jun 2024
Number of participants: 60

About this Off-the-shelf Speech Dataset

Introduction

Welcome to the Indian English Call Center Speech Dataset for the Telecom domain, designed to support the development of call center speech recognition models for the Telecom industry. The dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

Speech Data

This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Telecom domain. It is designed to help build robust and accurate customer service speech technology.

  • Participant Diversity:
      • Speakers: 60 expert native Indian English speakers from the FutureBeeAI Community.
      • Regions: Different states/provinces of India, ensuring a balanced representation of Indian accents, dialects, and demographics.
      • Participant Profile: Participants range from 18 to 70 years old, with a male-to-female ratio of 60:40.
  • Recording Details:
      • Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
      • Call Duration: Typically 5 to 15 minutes per call.
      • Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
      • Environment: No background noise or echo.
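
As a quick sanity check on delivered files, the format stated above (stereo, 16-bit, 8 kHz or 16 kHz WAV) can be verified with Python's standard `wave` module. The sketch below is illustrative only: the helper names `describe_wav`, `matches_listing`, and `make_silent_wav` are hypothetical, and the demo reads a synthetic silent clip rather than a real dataset file.

```python
import io
import wave


def describe_wav(source):
    """Return channel count, bit depth, sample rate and duration of a WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def matches_listing(info):
    """Check the properties against the format stated in the listing."""
    return (
        info["channels"] == 2
        and info["bit_depth"] == 16
        and info["sample_rate_hz"] in (8000, 16000)
    )


def make_silent_wav(sample_rate_hz, seconds):
    """Build an in-memory stereo 16-bit WAV of silence (demo input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(sample_rate_hz)
        # 2 channels * 2 bytes per sample * frames
        wav.writeframes(b"\x00" * (2 * 2 * sample_rate_hz * seconds))
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav(8000, 1))
print(info, matches_listing(info))
```

For a real file, pass its path to `describe_wav` in place of the synthetic buffer.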

Topic Diversity

This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

  • Inbound Calls:
      • Phone Number Porting
      • Network Connectivity Issues
      • Billing and Payments
      • Technical Support
      • Service Activation
      • International Roaming Enquiry
      • Refunds and Billing Adjustments
      • Emergency Service Access, and many more
  • Outbound Calls:
      • Welcome Calls / Onboarding Process
      • Payment Reminders
      • Customer Surveys
      • Technical Updates
      • Service Usage Reviews
      • Network Complaint Status Call, and many more

This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

Transcription

To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

  • Speaker-wise Segmentation: Time-coded segments for both agents and customers.
  • Non-Speech Labels: Tags and labels for non-speech elements.
  • Word Error Rate: Below 5%, achieved through a dual layer of QA.

These ready-to-use transcriptions accelerate the development of Telecom-domain call center conversational AI and ASR models for Indian English.
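
Word error rate is conventionally defined as the number of word substitutions, deletions, and insertions divided by the number of words in the reference. How the figure quoted above was measured is not stated in this listing, so the following is only a minimal sketch of the standard word-level edit-distance calculation; the function name and the example strings are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "would you like to go for prepaid or postpaid"
hypothesis = "would you like to go for prepaid postpaid"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}")  # one deleted word out of nine reference words
```

In practice, both strings are usually normalized first (case, punctuation, annotation tags) so that formatting differences are not counted as errors.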

Metadata

The dataset provides comprehensive metadata for each conversation and participant:

  • Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
  • Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.

This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Indian English call center speech recognition models.
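
To illustrate how such metadata might be used, the sketch below checks the gender balance of a participant list and selects conversations by call type and outcome. The records, the key names (`gender`, `call_type`, `outcome`, and so on), and the helper functions are all hypothetical: they mirror the fields named above, but the actual file layout is not described in this listing.

```python
from collections import Counter

# Hypothetical records; real key names and values may differ.
participants = [
    {"id": "P001", "age": 24, "gender": "male", "state": "Chandigarh"},
    {"id": "P002", "age": 37, "gender": "female", "state": "Chhattisgarh"},
    {"id": "P003", "age": 52, "gender": "male", "state": "Karnataka"},
    {"id": "P004", "age": 29, "gender": "female", "state": "Kerala"},
    {"id": "P005", "age": 61, "gender": "male", "state": "Gujarat"},
]

conversations = [
    {"id": "C001", "topic": "Billing and Payments", "call_type": "inbound",
     "outcome": "positive", "sample_rate_hz": 8000},
    {"id": "C002", "topic": "Network Connectivity Issues", "call_type": "inbound",
     "outcome": "negative", "sample_rate_hz": 16000},
    {"id": "C003", "topic": "Payment Reminders", "call_type": "outbound",
     "outcome": "neutral", "sample_rate_hz": 8000},
    {"id": "C004", "topic": "Service Activation", "call_type": "inbound",
     "outcome": "positive", "sample_rate_hz": 16000},
]


def share_by(records, key):
    """Return the fraction of records for each distinct value of `key`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def select(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


gender_share = share_by(participants, "gender")
negative_inbound = select(conversations, call_type="inbound", outcome="negative")
print(gender_share)
print([c["id"] for c in negative_inbound])
```

The same two helpers can be reused to stratify a train/test split by accent, topic, or sample rate.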

Usage and Applications

This dataset can be used for various applications in the fields of speech recognition, natural language processing, and conversational AI, specifically tailored to the Telecom domain. Potential use cases include:

  • Speech Recognition Models: Training and fine-tuning speech recognition models for Indian English.
  • Speech Analytics Models: Building speech analytics models that extract insights, identify patterns, and surface valuable information from customer conversations, enabling data-driven decision-making and process optimization within the Telecom sector.
  • Smart Assistants and Chatbots: Developing conversational agents and virtual assistants for customer service in the Telecom industry.
  • Sentiment Analysis: Analyzing customer sentiment and improving customer experience based on call center interactions.
  • Generative AI: Training generative AI models capable of generating human-like responses, summaries, or content tailored to the Telecom domain.

Secure and Ethical Collection

  • Our proprietary data collection and transcription platform, “Yugo”, was used throughout the creation of this dataset.
  • Throughout the data collection process, the data remained within our secure platform and did not leave our environment, ensuring data security and confidentiality.
  • The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
  • The dataset does not include any personally identifiable information about any participant, which makes it safe to use.
  • The dataset does not contain any copyrighted content.

Updates and Customization

Understanding the importance of diverse environments for robust ASR models, our call center voice dataset is regularly updated with new audio data captured in various real-world conditions.

  • Customization & Custom Collection Options:
      • Environmental Conditions: Custom collection in specific environmental conditions upon request.
      • Sample Rates: Customizable from 8 kHz to 48 kHz.
      • Transcription Customization: Tailored to specific guidelines and requirements.

License

This Telecom domain call center audio dataset was created by FutureBeeAI and is available for commercial use.

Use Cases

  • Call Center Conversational AI
  • Automatic Speech Recognition (ASR)
  • Chatbot and Voicebot Creation
  • Language Modeling
  • Text-to-Speech (TTS)
  • Speech Analytics

Dataset Sample(s)

TRANSCRIPTION

TIME | TRANSCRIPT
0.048 - 1.199 | [noise] Hello future bee.
3.096 - 4.012 | Hello future bee.
4.518 - 4.586 | -
5.052 - 7.363 | Thank you for calling a Jio fibre sir. How may I help you?
11.514 - 11.709 | -
12.219 - 12.311 | -
13.701 - 16.267 | So, what are the all the plans available?
18.076 - 18.789 | For what? sir
19.733 - 20.199 | -
20.247 - 22.641 | #Ah I want to install a (()).
25.000 - 25.492 | so
25.586 - 26.510 | Jio fibre sir?
26.450 - 27.187 | please help me.
28.426 - 30.183 | #Ah Yes, sir. Sure sir thank you very much.
28.988 - 30.163 | #Ah Yes Jio fibre.
30.884 - 36.462 | #Ah Would you like to go for prepaid or postpaid? So, we have different plans in prepaid. We also have different plans in postpaid.
39.558 - 41.382 | #Ah Prepaid, prepaid will be best.
42.191 - 54.665 | Okay, sir. And in prepaid, we are currently having four plans. #Ah The best selling one is our four ninety nine plan, which fourteen ninety nine plan, sorry, #Ah for thirty days in which you're having three hundred <initial>MBPS</initial>
53.462 - 54.645 | Actually I am
56.382 - 56.793 | Yes, sir.
59.665 - 68.975 | Currently, I am in a village area. So, can you please check in your systems? My area pin code is three two one two zero one. So, is there #Ah Wi-Fi available like
69.525 - 75.321 | the Jio fibre team can help me in installing my village area because there is a lot of internet issue.
75.851 - 78.808 | So, I'm planning to #Ah install a Wi-Fi.
79.271 - 81.490 | One second sir. Can you please give me your pin code again?
80.757 - 81.633 | It's okay.
84.318 - 85.800 | #Ah One two zero one. [noise]
88.434 - 94.083 | Yes, sir. It's available. So, you can go ahead. #Ah Our Jio fibre is, #Ah connection is available for there also.
89.645 - 89.745 | -
94.577 - 96.868 | #Ah Would you like to know further about the plan, sir?
97.924 - 99.908 | Okay, well done. Okay, okay and.
100.936 - 102.669 | Yeah, sure. Please go ahead ma'am.
101.828 - 106.780 | Yeah, so, for our fourteen ninety nine every month plan, you're going to get around #Ah
107.302 - 123.693 | #Amm fifteen #Ah channel subscriptions for free. Along with that, you're going to get a three hundred <initial>MBPS</initial> speed there in which three hundred <initial>MBPS</initial> upload and three hundred <initial>MBPS</initial> download, which is, I think, #Ah It's more than enough for your working capacity.
127.533 - 131.450 | Yeah, but I also want to enjoy. So is there any subscription available?
130.549 - 135.070 | [noise] Yes sir, we have subscriptions for fifteen channels sir like we have the Netflix basic
135.458 - 137.282 | #Ah we have Amazon Prime.
137.944 - 149.461 | We have Disney Plus hotstar. We have Sony live. We have Jio cinema premium. We have Zee five. We have hot joy, Sun next, Alt Balaji, discovery plus, Euros now.
149.975 - 153.457 | We have a epic on Dooku Bay. shar~ shemaroo Me.
154.633 - 155.872 | And like it plays.
158.059 - 161.087 | Okay. #Ah Almost all the subscription available right?
160.808 - 162.517 | Yes sir. It'll be along with this.
164.330 - 169.454 | [noise] So please ma'am please book a slot with your clients so that they can install.
169.838 - 174.194 | #Ah Jio fibre in my locality and I can enjoy my work life.
172.497 - 173.163 | Okay sir.
175.398 - 181.250 | Thank you very much. #Oh sir you'll be going for the complete one year package or you'll go for the month monthly package sir.
182.653 - 184.099 | Complete one year package.
184.178 - 186.884 | [noise] Of three hundred <initial>MBPS</initial> hundred <initial>MBPS</initial> or
187.378 - 191.322 | Thirty <initial>MBPS</initial> or one fifty <initial>MBPS</initial> We have different plans for each of them sir.
193.640 - 194.931 | So which one is best.
195.306 - 209.063 | Sir I would I would like to know what is the kind of work that you work on. Do you need a lot of data like you work on more of the videos and games or data collection or you or you're working on the basic website and web pages in the (()) [noise]
195.800 - 196.469 | -
197.091 - 197.158 | -
206.270 - 206.453 | -
208.943 - 209.664 | Minimum
210.234 - 223.403 | Actually, #Ah I used to work on a high speed data. So I do not want any kind of in~ interruption in between. So any plans in which the plan provided me the high speed internet speed data. [noise] I will be ready to invest.
222.943 - 237.025 | That will be our #Ah the best one is a six ninety nine for twelve months that is eight three eight eight plus <initial>GST</initial> plan for your early plan in which you have you're having hundred <initial>MBPS</initial> of hundred <initial>MBPS</initial> upload and download. #Ah also
237.147 - 245.751 | it's not just twelve months along with this you will also get thirty days extra sir. That is you can #Ah think that #Ah we are have we are giving kind of #Ah
246.178 - 246.532 | #Ah
247.484 - 251.412 | bundle free for you for one next month. That is thirteen months package.
254.525 - 257.310 | That the benefit that you are going to get by using this plan sir.
254.716 - 257.246 | Okay okay. Its seems very and lucrative.
259.541 - 261.701 | Yeah. It is very lucrative offer. So
262.138 - 263.843 | when can I expect my installation?
264.058 - 275.297 | Sir can you please give your detailed address #Ah [noise] by in in coming to business working days our technician will visit your house along with your fibre kit and he will install and he will completely set up.
275.827 - 280.070 | And he will help you out and how to use it later and change the password and all of it sir.
283.285 - 288.150 | #Ah Please write down the address it is a Kamakshipuram <initial>XSR</initial> layout Bangalore.
288.544 - 288.999 | Yes, sir.
292.493 - 292.588 | -
292.684 - 293.572 | pin code sir.
294.782 - 296.654 | [noise] #Ah pin code <PII>three two one two zero one</PII>
296.990 - 297.411 | -
297.377 - 297.803 | Yes sir.
297.752 - 297.987 | -
298.365 - 307.138 | All right, very well noted sir. We will be come the team. Our technician will visit you house and next coming two days and you going to receive a message now and confirmation. Thank you very much for calling Jio fibre.
307.337 - 307.819 | Thank you.
308.779 - 308.959 | Thank.
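
The sample uses several inline annotations: `[noise]`, `#Ah`-style tokens, `(())`, a trailing `~`, `<initial>…</initial>`, `<PII>…</PII>`, and segments containing only `-`. Their exact meaning is not documented in this listing. The sketch below assumes they mark non-speech sounds, fillers, unintelligible speech, truncated words, spelled-out initialisms, personal information, and empty segments respectively, and shows one possible way to turn a verbatim segment into plain text for ASR training. The function name `normalize_segment` and the `[PII]` placeholder are illustrative choices.

```python
import re


def normalize_segment(text):
    """Strip annotation tags from one verbatim transcript segment.

    The tag meanings are assumptions inferred from the sample above.
    """
    text = re.sub(r"\[[a-z]+\]", " ", text)            # non-speech, e.g. [noise]
    text = re.sub(r"\(\(\)\)", " ", text)              # unintelligible speech
    text = re.sub(r"#\w+", " ", text)                  # fillers, e.g. #Ah, #Amm
    text = re.sub(r"\b\w+~", " ", text)                # truncated words, e.g. in~
    text = re.sub(r"<PII>.*?</PII>", "[PII]", text)    # mask tagged personal data
    text = re.sub(r"</?initial>", "", text)            # keep initialisms, drop tags
    text = " ".join(text.split())                      # collapse whitespace
    text = re.sub(r"\s+([.,?!])", r"\1", text)         # no space before punctuation
    if text.strip("- ") == "":                         # segments holding only "-"
        return ""
    return text


examples = [
    "#Ah I want to install a (()).",
    "[noise] #Ah pin code <PII>three two one two zero one</PII>",
    "you're having three hundred <initial>MBPS</initial>",
    "So I do not want any kind of in~ interruption in between.",
    "-",
]
for segment in examples:
    print(repr(normalize_segment(segment)))
```

Whether fillers and truncated words should be removed or kept depends on the target model; a verbatim ASR system may want them preserved as tokens instead.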

Dataset Details

Language: English
Language code: en-IN
Country: India
Accents: Chandigarh, Chhattisgarh, and more
Gender Distribution: Male:Female = 60:40
Age Group: 18-70

File Details

Environment: Silent, Noisy
Bit Depth: 16-bit
Format: WAV
Sample rate: 8 kHz and 16 kHz
Channel: Stereo
Audio file duration: 5-15 minutes
