English to Gujarati Medical Domain Parallel Corpora

This dataset consists of bilingual, sentence-aligned English-to-Gujarati corpora for the medical domain.


Parallel Corpora


50K+ corpus

Last Updated

Aug 2022

Number of participants

200+ people

Get this AI Dataset


About This OTS Dataset


What’s Included

This bilingual parallel corpus consists of more than 50K sentences translated from English to Gujarati by more than 200 native translators in the medical domain. The domain-specific corpus includes native slang, phrases, and language-specific words, and follows natural native usage, which makes it more information-rich. Many sentences are translated by several native translators, which makes it possible to compare how different groups interpret the same text.
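Because one source sentence may carry several translations, it can help to group the rows by source sentence before comparing them. The following is a minimal sketch, assuming the data has already been loaded as (English, Gujarati) string pairs; the function name `group_translations` and the placeholder strings in the usage lines are hypothetical and are not part of the dataset.

```python
from collections import defaultdict


def group_translations(pairs):
    """Map each source sentence to its distinct translations.

    `pairs` is an iterable of (source, target) strings. Only source
    sentences with more than one distinct translation are returned.
    """
    grouped = defaultdict(list)
    for src, tgt in pairs:
        # Collapse whitespace so trivially different copies of the
        # same source sentence land in the same group.
        key = " ".join(src.split())
        if tgt not in grouped[key]:
            grouped[key].append(tgt)
    return {src: tgts for src, tgts in grouped.items() if len(tgts) > 1}


# Hypothetical rows; the real corpus would supply Gujarati text.
rows = [
    ("Sentence one.", "translation A"),
    ("Sentence  one.", "translation B"),
    ("Sentence two.", "translation C"),
]
variants = group_translations(rows)
```

Here `variants` holds only the first sentence, since it is the only one with two distinct translations.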

The sentences in this parallel corpus range in length from 7 to 15 words. The data is available in Excel format and can be converted into TMX, XML, XLIFF, or other equivalent formats.
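As an illustration of the TMX conversion, the sketch below builds a small TMX 1.4 document from sentence pairs using only the Python standard library. It assumes the Excel rows have already been read into (English, Gujarati) tuples, for example with a spreadsheet library of your choice; the function name `pairs_to_tmx` and the header values such as `example-script` are placeholders, not part of the dataset or of any vendor tooling.

```python
import xml.etree.ElementTree as ET


def pairs_to_tmx(pairs, src_lang="en", tgt_lang="gu"):
    """Serialize (source, target) sentence pairs as a TMX 1.4 string."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "xlsx",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        # One translation unit per aligned sentence pair.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


sample = [(
    "Smoking and drinking alcohol is injurious to health.",
    "ધૂમ્રપાન અને દારૂ પીવું સ્વાસ્થ્ય માટે હાનિકારક છે.",
)]
tmx_text = pairs_to_tmx(sample)
```

The language codes `en` and `gu` are the ISO 639-1 codes for English and Gujarati. Writing `tmx_text` to a UTF-8 file gives a document that translation-memory tools can typically import, though individual tools may expect additional header fields.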

These parallel bilingual corpora can be used for the research and development of bilingual lexicography and machine translation engines. They can also be used to build language databases for NLP-based applications such as predictive keyboards, spell checkers, grammar checkers, text/speech understanding systems, and text-to-speech modules.

More translated sentences are constantly being added to this parallel corpus. Depending on your unique requirements, we can curate numerous parallel corpora in various languages. For synthetic custom curation, do not forget to check out the FutureBeeAI community.

The license for this parallel corpus dataset belongs to FutureBeeAI!

Use Cases

MT Engine

Language model

Predictive keyboards

Spell check

Grammar correction

Text/speech systems

Dataset Sample(s)



Source Language | Target Language
Smoking and drinking alcohol is injurious to health. | ધૂમ્રપાન અને દારૂ પીવું સ્વાસ્થ્ય માટે હાનિકારક છે.
The organs of two brain dead patients were donated on the same day in Surat. | સુરતમાં એક જ દિવસે બે બ્રેનડેડ દર્દીના અંગોનું દાન કરવામાં આવ્યું.
The patient underwent a heart transplant at a hospital 273 km in 90 minutes Far away from Ahmedabad . | 90 મિનિટમાં 273 કિ.મી. દૂર અમદાવાદની હોસ્પિટલમાં દર્દીનું હાર્ટ ટ્રાન્સપ્લાન્ટ કરાયું.
Swine flu became more deadly than Corona. | કોરોના કરતાં પણ સ્વાઇન ફ્લૂ વધુ ઘાતક બન્યો.
The highest number of swine flu cases were reported this year. | આ વર્ષે સ્વાઇન ફ્લૂના સૌથી વધુ કેસ નોધાયા.
Gujarat ranks second in the highest number of deaths due to swine flu. | સ્વાઇન ફ્લૂથી સૌથી વધુ મૃત્યુમાં ગુજરાત બીજા સ્થાને.
Gujarat reported 1315 cases of swine flu in a month out of which 34 died. | ગુજરાતમાં એક મહિનામાં સ્વાઇન ફ્લૂના ૧૩૧૫ કેસ, જેમાંથી ૩૪ નું મૃત્યુ થયું.
Alzheimer's disease, which cripples the elderly even though the body is healthy. | શરીરે સ્વસ્થ હોવા છતાં વૃદ્ધોને પાંગળા બનાવી દેતી બીમારી, અલ્ઝાઈમર.
The number of people suffering from Alzheimer's in India is around 3.5 million. | ભારતમાં અલ્ઝાઈમરથી ૫ીડાતા લોકોની સંખ્યા ૩૫ લાખ જેટલી છે.
More than two and a half crore people in world suffer from the problem of amnesia. | દૂનિયામાં અઢી કરોડથી પણ વધુ લોકો સ્મૃતિભ્રંશની સમસ્યા ભોગવે છે.



Dataset Details


Dataset type

Corpus data


50K+ corpus

Media type


Language pair


File Details




Word count

7 to 12 words/line





Download data Sample

Download a free sample of this dataset to get more clarity about it, or get in touch with one of our experts for hands-on experience 📨

Download Free Dataset


Need datasets for a specific AI/ML use case? Don’t worry, we’ve got you covered! 👍

Contact Us
