English (India) Call Center Speech Dataset for Healthcare

The audio dataset comprises call center conversations for the Healthcare domain, featuring native English speakers from India. It includes speech data, detailed metadata and accurate transcriptions.

Category: Unscripted Call Center Conversations
Total Volume: 30 Speech Hours
Last updated: Jun 2024
Number of participants: 60


About this Off-the-shelf Speech Dataset

Introduction

Welcome to the Indian English Call Center Speech Dataset for the Healthcare domain, designed to support the development of call center speech recognition models for the Healthcare industry. The dataset is curated to support speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

Speech Data

This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Healthcare domain, designed to build robust and accurate customer service speech technology.

  • Participant Diversity:
      • Speakers: 60 expert native Indian English speakers from the FutureBeeAI Community.
      • Regions: Different states/provinces of India, ensuring a balanced representation of Indian accents, dialects, and demographics.
      • Participant Profile: Participants range from 18 to 70 years old, with males and females in a 60:40 ratio, respectively.
  • Recording Details:
      • Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
      • Call Duration: Calls last 5 to 15 minutes on average.
      • Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
      • Environment: Without background noise and without echo.

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

  • Inbound Calls:
      • Appointment Scheduling
      • New Patient Registration
      • Surgery Consultation
      • Consultation regarding Diet, and many more
  • Outbound Calls:
      • Appointment Reminder
      • Health and Wellness Subscription Programs
      • Lab Test Results
      • Health Risk Assessments
      • Preventive Care Reminders, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

  • Speaker-wise Segmentation: Time-coded segments for both agents and customers.
  • Non-Speech Labels: Tags and labels for non-speech elements.
  • Word Error Rate: The transcriptions have a word error rate below 5%, achieved through a dual layer of QA.

    These ready-to-use transcriptions accelerate the development of Healthcare domain call center conversational AI and ASR models for Indian English.
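For readers who want to check a word error rate figure on their own data, the metric is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The following is a minimal sketch only: the function name `word_error_rate` and the example strings are illustrative assumptions and do not describe the QA tooling used for this dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings (assumed for the example, not dataset content).
reference = "so can you suggest me some good products about it"
hypothesis = "so can you suggest some good product about it"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

In practice both strings would usually be normalized first (case, punctuation, non-speech tags) so that formatting differences are not counted as errors.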

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

  • Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
  • Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.
    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Indian English call center speech recognition models.
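As an illustration of how conversation metadata could drive data selection, the sketch below filters a handful of made-up records by call type and sample rate and tallies outcomes. The key names (`call_type`, `outcome`, `sample_rate_hz`, and so on) and the records themselves are assumptions for the example; the delivered metadata files may use a different schema.

```python
from collections import Counter

# Hypothetical records: key names and values are assumptions,
# not the delivered metadata schema.
conversations = [
    {"id": "c1", "topic": "Appointment Scheduling", "call_type": "inbound",
     "outcome": "positive", "sample_rate_hz": 8000},
    {"id": "c2", "topic": "Lab Test Results", "call_type": "outbound",
     "outcome": "neutral", "sample_rate_hz": 16000},
    {"id": "c3", "topic": "New Patient Registration", "call_type": "inbound",
     "outcome": "negative", "sample_rate_hz": 16000},
]


def select(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Pick inbound calls recorded at 8 kHz, e.g. for a telephony ASR subset.
inbound_8k = select(conversations, call_type="inbound", sample_rate_hz=8000)

# Count outcomes to check class balance before sentiment modelling.
outcome_counts = Counter(record["outcome"] for record in conversations)

print([record["id"] for record in inbound_8k])
print(dict(outcome_counts))
```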

    Usage and Applications

    This dataset can be used for various applications in the fields of speech recognition, natural language processing, and conversational AI, specifically tailored to the Healthcare domain. Potential use cases include:

  • Speech Recognition Models: Training and fine-tuning speech recognition models for Indian English.
  • Speech Analytics Models: Building speech analytics models that extract insights and identify patterns in customer conversations, enabling data-driven decision-making and process optimization within the Healthcare sector.
  • Smart Assistants and Chatbots: Developing conversational agents and virtual assistants for customer service in the Healthcare industry.
  • Sentiment Analysis: Analyzing customer sentiment and improving customer experience based on call center interactions.
  • Generative AI: Training generative AI models capable of generating human-like responses, summaries, or content tailored to the Healthcare domain.

    Secure and Ethical Collection

  • Our proprietary data collection and transcription platform, “Yugo”, was used throughout the creation of this dataset.
  • Throughout the data collection process, the data remained within our secure platform and did not leave our environment, ensuring data security and confidentiality.
  • The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
  • The dataset does not include any personally identifiable information about any participant, which makes it safe to use.
  • The dataset does not contain any copyrighted content.

    Updates and Customization

    Understanding the importance of diverse environments for robust ASR models, our call center voice dataset is regularly updated with new audio data captured in various real-world conditions.

  • Customization & Custom Collection Options:
      • Environmental Conditions: Custom collection in specific environmental conditions upon request.
      • Sample Rates: Customizable from 8 kHz to 48 kHz.
      • Transcription Customization: Tailored to specific guidelines and requirements.

    License

    This Healthcare domain call center audio dataset is created by FutureBeeAI and is available for commercial use.

    Use Cases

  • Call Center Conversational AI
  • ASR (Automatic Speech Recognition)
  • Chatbot and voicebot creation
  • Language Modelling
  • TTS (Text-to-speech)
  • Speech Analytics

    Dataset Sample(s)

    TRANSCRIPTION

    TIME (start - end, in seconds)
    TRANSCRIPT
    0.075 - 1.375
    Hello Futurebee.
    2.875 - 4.250
    Hello Futurebee.
    5.974 - 7.025
    Good morning.
    7.450 - 8.525
    Good morning madam.
    11.175 - 14.775
    [filler] Can you help me for some health care things?
    17.600 - 18.524
    Yes madam.
    19.500 - 21.875
    Am a mediclaim [filler] agent.
    23.300 - 23.975
    <lang:Foreign>हाँ</lang:Foreign>
    24.625 - 29.024
    okay <lang:Foreign>तो</lang:Foreign> can you suggest me some good
    29.699 - 32.500
    [filler] health care products because
    33.600 - 37.725
    nowdays als~ I see all uncertainaty everywhere
    38.450 - 39.423
    so
    41.100 - 46.548
    [filler] like I get scared what things what will happen sometimes.
    41.200 - 42.125
    true madam
    47.750 - 52.899
    so can you suggest me some good products [filler] about it?
    53.548 - 54.100
    yes
    54.600 - 57.524
    no worries madam. there is nothing to get scared of because
    58.048 - 62.076
    one day eventually everyone has to go from this earth, that is
    62.850 - 64.049
    the certanity.
    63.526 - 64.549
    that is (())
    65.126 - 66.251
    that is sure
    65.349 - 70.974
    so what yes sure sure so who~ whose may I know how you got my reference madam?
    73.400 - 79.551
    I ha~ I was going through [filler] means health care products, so I saw a video
    78.275 - 79.001
    [filler]
    80.575 - 83.200
    <lang:Foreign>हाँ</lang:Foreign> okay okay I recently posted (())
    81.224 - 87.400
    one is two three mini [filler] videos I had seen so I liked your's more
    86.575 - 87.200
    [filler]
    87.974 - 96.575
    so I thought of calling you and I could not get your number first but later I could search it and I got your number also
    89.525 - 90.150
    got it
    90.849 - 91.676
    thank you mam
    93.974 - 94.676
    [filler]
    97.525 - 102.900
    so I wanted to know some health care, good health care products.
    99.525 - 100.400
    okay
    104.176 - 107.849
    so whatever queries you have you can ask me.
    106.501 - 107.150
    sure
    110.450 - 116.677
    yes madam. basically what I will need is how many persons are there in your family?
    113.599 - 114.349
    [filler]
    118.126 - 118.902
    okay
    118.626 - 126.052
    and I will need birth date of each. You can just send it by [filler] whatsapp later on madam. I I will explain you the
    120.075 - 120.777
    [filler]
    126.777 - 127.602
    okay
    126.802 - 130.002
    various plans what we have. the best plans what we have
    131.127 - 133.026
    because our company is
    133.699 - 134.502
    [filler]
    133.727 - 136.776
    one of the top most three companies. if you can see.
    138.324 - 140.252
    on [filler] web also
    140.852 - 141.574
    okay
    141.526 - 145.227
    so we take pride in the (()) this company. yes yes.
    142.627 - 144.800
    yah I have seen on website.
    145.852 - 146.300
    good
    146.352 - 147.127
    okay
    149.401 - 152.602
    so what we have is a family floater type of thing.
    153.449 - 155.502
    in which we cover
    155.199 - 155.877
    <lang:Foreign>आ हा</lang:Foreign>
    156.199 - 159.026
    offer cover health cover for entire family
    156.776 - 157.401
    [filler]
    160.574 - 166.449
    because what happened is that as you rightly said there is uncertainty everywhere specially from
    162.477 - 163.276
    okay
    166.300 - 166.901
    [filler]
    167.574 - 168.602
    corona
    168.727 - 169.574
    [filler]
    172.925 - 173.727
    yes
    173.002 - 178.550
    so in that any family [filler] those who were having health [filler] mediclaim
    175.002 - 175.953
    that is true
    179.425 - 183.853
    that they could easy [filler] they could i would say not easily but they could
    180.276 - 180.978
    [filler]
    184.526 - 185.603
    pass through it
    186.276 - 191.800
    with not much of [filler] money problem but those who didn't have it they had big problem
    192.776 - 196.002
    with the money because it costed heavily
    196.026 - 196.703
    yes
    197.203 - 198.352
    during corona (())
    197.953 - 198.675
    yes
    199.175 - 200.703
    and every [filler] people
    199.300 - 206.953
    yes actually after that only after that only we thought of taking care of our full family.
    207.328 - 215.078
    because till that time we used to think there is no need there is no need only cough and cold like [filler] so it's not very big issue
    208.925 - 209.925
    yah understand
    214.901 - 215.651
    [filler]
    215.776 - 221.103
    but now days like after corona only we got all all of us got worried about it
    217.453 - 218.377
    [filler]
    222.703 - 223.627
    so
    222.877 - 228.828
    yes yes yes we should thank god that we every [filler] one who is alive presently
    230.675 - 232.002
    we should thank god for that
    232.651 - 238.276
    so what I was saying that it's a family floater it offers protection for to the entire family
    234.526 - 235.229
    yes
    236.854 - 237.377
    [filler]
    238.329 - 239.104
    [filler]
    239.679 - 243.026
    and [filler] and to work out (())
    241.776 - 242.627
    okay
    243.604 - 250.329
    exact what to say the money part of it I will need the full data from you like names and [filler] date of births from
    251.151 - 252.679
    all the family members
    254.979 - 260.052
    you can also include. are you earning member madam or your husband will be [filler]?
    255.829 - 256.629
    okay
    261.653 - 262.879
    to pay (())
    262.427 - 267.853
    no I am not an earning member we are dependent I am dependent on my husband
    264.177 - 265.353
    okay I understand
    268.052 - 270.204
    okay fine so he can cover
    270.754 - 274.329
    elderly [filler] parents also in the same floater
    276.004 - 277.829
    okay that is nice
    277.653 - 286.129
    yes and upto [filler] children upto twenty five age are covered by husband and after twenty five child has to take his
    287.103 - 289.004
    independent health plan
    291.204 - 292.180
    okay
    291.478 - 294.204
    kay and what we offer is
    296.480 - 298.855
    I would say o~ one of the best
    300.204 - 308.355
    offers you you can find in the market like we pay even for the consumables like cotton waste or syringes or injections
    301.653 - 302.379
    [filler]
    309.254 - 310.153
    from that
    310.680 - 315.180
    from that we take care of [filler] so that we you don't have to pay anything
    311.480 - 312.329
    okay
    316.129 - 319.778
    it's completely cashless type of insurance you will get
    321.403 - 324.379
    and we also cover one of [filler] few topmost
    324.028 - 324.930
    okay
    325.129 - 326.153
    hospitals
    326.855 - 328.105
    in every city
    328.754 - 332.403
    you can find our presence in all over india
    332.379 - 333.079
    okay
    335.305 - 347.504
    okay one more query I have I want to ask because when we take this healthcare products that time every body tells yes this we provide that we provide
    338.605 - 339.555
    sure madam
    344.129 - 344.930
    [filler]
    347.730 - 351.805
    and when the time comes that is what I have heard I have not gone through that
    349.629 - 350.504
    [filler]
    352.754 - 365.656
    [filler] they last <lang:Foreign>में</lang:Foreign> they will say no this this [filler] this thing you have not provided this certificate you didn't provide that certificate you didn't provide and then last minute they don't give anything
    354.706 - 355.480
    [filler]
    366.180 - 371.254
    or they make us [filler] life [filler] like too much [filler]
    367.355 - 367.906
    [filler]
    370.177 - 370.850
    oh
    371.906 - 376.879
    (()) we have to the yah yah that I don't want
    374.831 - 376.480
    run around from pole to pole
    378.031 - 390.180
    and mainly main big big [filler] diseases are also taken care of in this product or again we have to pay some extra for that is
    378.129 - 378.879
    yes madam
    379.805 - 380.781
    correct madam
    391.879 - 392.581
    can you
    392.156 - 401.004
    madam certain diseases like cancer you have to have add on to that at a very little cost but you have to pay extra but it will be evry less
    398.105 - 398.930
    okay
    402.206 - 405.879
    and [filler] see our settlement claim settlement ratio
    406.581 - 408.456
    is about ninety five percent
    408.382 - 409.182
    [filler]
    410.307 - 413.581
    which is very good considering that's why we are
    411.607 - 412.357
    okay
    413.982 - 414.656
    [filler]
    415.007 - 417.982
    featuring among three companies of India
    418.781 - 427.831
    and as you I say there are claims are not paid see madam there are certain [filler] formalities to be completed which are essentials
    420.932 - 421.807
    okay
    428.906 - 433.331
    without which even we also will not be able to let me be very frank with you
    429.807 - 430.557
    [filler]
    431.257 - 431.956
    [filler]
    433.932 - 438.482
    [filler] we also will not [filler] settle your claim unless we have basic
    437.406 - 438.257
    okay
    439.057 - 440.607
    minimum documents
    443.456 - 443.857
    yah
    444.307 - 445.007
    okay
    445.706 - 446.857
    (())
    445.807 - 454.206
    but see as I said it will be total cashless type of treatments so you will not have to worry about paying anything at the hospital
    447.331 - 448.007
    (())
    454.807 - 456.357
    we will take care of that
    457.706 - 466.757
    okay and one more thing I am distur~ interrupting again [filler] do we have to pay yearly or six monthly because the amount may [filler] like
    461.257 - 461.857
    yah
    467.283 - 475.932
    it will be a quiet big amount so if we pays [filler] by [filler] every six months is it okay or you need yearly payments?
    478.583 - 488.408
    madam we are flexible in that and we can accept quarterly payment also, we can have half yearly payment also, yearly payment also. We even provide (())
    487.358 - 495.307
    No half yearly is, half yearly is okay [filler] but not quarterly. Half yearly like six months after six months we can pay.
    490.783 - 491.408
    fine.
    493.783 - 494.382
    oh.
    497.658 - 498.358
    Okay madam.
    498.658 - 500.033
    Okay so,
    499.932 - 500.983
    So, what I have
    501.158 - 507.608
    [filler] will you come and explain, will you come and explain to us at home like all of us will be there
    502.757 - 503.858
    yah (()) me madam
    508.158 - 521.633
    then they if anyone of us (()) anyone wants to know something more they can ask you there because I will be asking you here on the phone they you will explain me
    510.507 - 511.156
    exact
    517.633 - 518.831
    more yah correct.
    520.331 - 520.932
    then
    522.033 - 523.932
    then I won't able to
    524.533 - 528.682
    explain same way to the my family members.
    524.682 - 525.331
    [filler]
    529.408 - 537.182
    so I request you to come home and then you explain the all the products or you can till then
    531.932 - 532.609
    (())
    537.884 - 539.634
    whatever you have in mind.
    541.484 - 553.158
    It will be my pleasure madam I was about to suggest same thing that if you can spare about half an hour and give an appointment so everybody is present that time I can very well come and explain you whole thing.
    547.581 - 548.331
    [filler]
    551.932 - 552.759
    [filler]
    554.234 - 559.182
    and any queries you or any other person has you can settle it there and there.
    556.109 - 556.884
    okay.
    560.134 - 565.158
    okay madam? So, what I would request you to send me this your home address.
    560.734 - 561.283
    [filler]
    562.706 - 563.307
    okay.
    563.884 - 564.557
    okay.
    566.984 - 572.359
    and and as I said in the beginning that names and date of births for all the family members
    569.033 - 569.884
    okay.
    572.932 - 576.359
    so I will come prepared in that [filler] sense.
    576.734 - 577.432
    okay.
    576.884 - 577.609
    okay madam?
    578.932 - 582.908
    okay okay tomorrow then you can come. okay thank
    579.033 - 580.384
    thank you madam [noise]
    581.682 - 584.432
    What time madam? Eleven o' clock is okay for you?
    585.884 - 586.759
    (())
    586.182 - 587.533
    yah that is better.
    587.634 - 589.009
    everybody will be available?
    588.307 - 592.134
    No morning morning eleven o' clock is okay. That is better.
    591.485 - 601.057
    yah tomorrow being Sunday we can, I hope everybody will be at home. Okay then madam. I'll see you tomorrow at ten o' clock. Please send me your address and date of births and names of everything. Thank you.
    595.682 - 597.134
    Yes. Oh.
    598.533 - 599.335
    Okay.
    601.888 - 602.759
    Okay.
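The sample above carries several inline annotations: bracketed tags such as `[filler]` and `[noise]`, `<lang:Foreign>…</lang:Foreign>` spans, `(())` markers, and words ending in `~`. When preparing training text, these are often stripped or handled separately. The sketch below is one possible normalization pass. Reading `(())` as unintelligible speech and `~` as a truncated word is an inference from the sample rather than documented tag semantics, and the function name `normalize` is illustrative.

```python
import re

BRACKET_TAG = re.compile(r"\[[a-z]+\]")     # e.g. [filler], [noise]
UNCLEAR = re.compile(r"\(\(\)\)")           # (()) marker, assumed unintelligible
LANG_SPAN = re.compile(r"<lang:[^>]+>.*?</lang:[^>]+>")
LANG_TAG = re.compile(r"</?lang:[^>]+>")    # opening or closing language tag
TRUNCATED = re.compile(r"\S+~")             # e.g. als~, ha~ (assumed false starts)


def normalize(text: str, keep_foreign: bool = True) -> str:
    """Strip annotation tags from one transcript segment.

    With keep_foreign=False, the words inside <lang:...> spans are
    dropped together with their tags.
    """
    if not keep_foreign:
        text = LANG_SPAN.sub(" ", text)
    text = LANG_TAG.sub(" ", text)
    text = BRACKET_TAG.sub(" ", text)
    text = UNCLEAR.sub(" ", text)
    text = TRUNCATED.sub(" ", text)
    # Collapse the whitespace left behind by removed tags.
    return " ".join(text.split())


segments = [
    "[filler] Can you help me for some health care things?",
    "okay <lang:Foreign>तो</lang:Foreign> can you suggest me some good",
    "nowdays als~ I see all uncertainaty everywhere",
    "thank you madam [noise]",
]
for segment in segments:
    print(normalize(segment))
```

The verbatim spellings in the transcript (for example "uncertainaty") are left untouched here, since correcting them is a separate decision from tag removal.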

    Dataset Details

  • Language: English
  • Language code: en-IN
  • Country: India
  • Accents: Chandigarh, Chhattisgarh, and more
  • Gender Distribution: M:60, F:40
  • Age Group: 18-70

    File Details

  • Environment: Silent, Noisy
  • Bit Depth: 16 bit
  • Format: WAV
  • Sample rate: 8 kHz & 16 kHz
  • Channel: Stereo
  • Audio file duration: 5-15 minutes
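The file properties listed here (stereo, 16 bit, 8 kHz or 16 kHz WAV) can be checked on delivery with Python's standard `wave` module. The following is a minimal sketch: the helper name `check_wav` is illustrative, and because no dataset file is assumed to be available, the example first writes one second of silence as a stand-in file.

```python
import os
import tempfile
import wave

EXPECTED_CHANNELS = 2            # stereo
EXPECTED_BIT_DEPTH = 16          # bits per sample
EXPECTED_RATES = {8000, 16000}   # Hz


def check_wav(path):
    """Return (is_conformant, properties) for one WAV file."""
    with wave.open(path, "rb") as wav:
        props = {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    ok = (
        props["channels"] == EXPECTED_CHANNELS
        and props["bit_depth"] == EXPECTED_BIT_DEPTH
        and props["sample_rate"] in EXPECTED_RATES
    )
    return ok, props


# Stand-in file: one second of stereo 16-bit silence at 8 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_call.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)       # 2 bytes per sample = 16 bit
    out.setframerate(8000)
    out.writeframes(b"\x00" * (8000 * 2 * 2))  # frames * channels * bytes

ok, props = check_wav(demo_path)
print(ok, props)
```

The same check could be extended with a duration range if files outside the stated 5-15 minute window should be flagged.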
