English to Tamil Political Domain Parallel Corpora

This dataset consists of bilingual, sentence-aligned English-to-Tamil corpora for the political domain.

Category

Parallel Corpora

Volume

50K+ sentences

Last Updated

Aug 2022

Number of participants

200+ people

Get this AI Dataset

About This OTS Dataset

What’s Included

This bilingual parallel corpus contains more than 50K political-domain sentences translated from English into Tamil by more than 200 native translators. The domain-specific translations include native slang, idiomatic phrases, and language-specific vocabulary, and they follow natural Tamil usage, which makes the corpus richer in information. Many sentences were translated independently by several native translators, which makes it possible to compare how different groups interpret the same text.
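One way to work with sentences that have several translations is to group the target sentences by their source sentence. The sketch below is only an illustration: it assumes the corpus has already been loaded as a list of (English, Tamil) string pairs, and the function names and placeholder strings are hypothetical rather than part of the dataset.

```python
from collections import defaultdict


def group_translations(pairs):
    """Map each source sentence to the distinct translations found for it.

    `pairs` is assumed to be an iterable of (source, target) strings.
    """
    grouped = defaultdict(list)
    for source, target in pairs:
        key = source.strip()
        # Keep each distinct translation once, in first-seen order.
        if target not in grouped[key]:
            grouped[key].append(target)
    return dict(grouped)


def with_multiple_translations(grouped):
    """Keep only the source sentences that have more than one translation."""
    return {src: tgts for src, tgts in grouped.items() if len(tgts) > 1}


# Placeholder strings stand in for real Tamil translations.
example_pairs = [
    ("DMK Nominations for internal party elections have started", "TAMIL_VARIANT_A"),
    ("DMK Nominations for internal party elections have started", "TAMIL_VARIANT_B"),
    ("Today evening there is a meeting of ADMK MLAs in Chennai", "TAMIL_VARIANT_C"),
]
variants = with_multiple_translations(group_translations(example_pairs))
```

Here `variants` keeps only the first source sentence, since it is the only one with two distinct translations.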



The sentences in this parallel corpus range in length from 7 to 15 words. The data is available in Excel format and can be converted into TMX, XML, XLIFF, or other equivalent formats.
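As a rough illustration of such a conversion, the sketch below writes sentence pairs into a minimal TMX 1.4 document using only the Python standard library. It assumes the pairs have already been read from the Excel file (for example with a spreadsheet library); the column layout of the real file is not documented here. The function name, the `creationtool` value, the language codes `en` and `ta`, and the placeholder target string are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the attribute "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang="en", tgt_lang="ta"):
    """Return a TMX 1.4 string with one translation unit per (source, target) pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "datatype": "plaintext",
        "segtype": "sentence",
        "adminlang": "en",
        "srclang": src_lang,
        "o-tmf": "xlsx",  # original format, assumed from the description above
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# The target string is a placeholder, not a real translation from the dataset.
tmx_text = pairs_to_tmx([
    ("Today evening there is a meeting of ADMK MLAs in Chennai",
     "TAMIL_TRANSLATION_PLACEHOLDER"),
])
```

The same list of pairs could feed an XLIFF or custom XML writer; only the element names and header attributes would change.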



These parallel bilingual corpora can be used for research and development in bilingual lexicography and machine translation engines. They can also be used to build language resources for NLP-based applications such as predictive keyboards, spell checkers, grammar checkers, text/speech understanding systems, text-to-speech modules, and many others.



More translated sentences are continually being added to this parallel corpus. Depending on your requirements, we can curate parallel corpora in various other languages. For custom curation, check out the FutureBeeAI community.


FutureBeeAI holds the license for this parallel corpus dataset.


Use Cases

MT Engine

Language model

Predictive keyboards

Spell check

Grammar correction

Text/speech systems

Dataset Sample(s)

SAMPLE

Source Language    Target Language
Bihar Chief Minister Nitish Kumar's confirmed: No more alliance with BJP forever
Today evening there is a meeting of ADMK MLAs in Chennai
Congress President Election tomorrow: 4 polling centers in Sathyamurthy Bhavan, Chennai
A.D.M.K. Golden Jubilee Anniversary: Respect to MGR, Jayalalitha Statues
Public Meetings of 51st ADMK's Annual Inaugural : Edappadi will deliver keynote speech at Namakkal on 20th
Congress President Election: Mallikarjuna Kharge resigns from Rajya Sabha post
Sudden visit to the headquarters: E.P.S. Emergency discussion with A.D.M.K. Administrators
Ghulam Nabi Azad's new party is called 'Democratic Freedom Party’
3-day hiking from Chennai to Sriperumbudur from 25th to protect Constitution: K.S. Alagiri
DMK Nominations for internal party elections have started

ATTRIBUTES

target_language: Tamil
source_language: English
domain: Political

Dataset Details

Dataset type

Corpus data

Volume

50K+ sentences

Media type

Text

Language pair

English-Tamil

File Details

Type

Bilingual

Word count

7 to 12 words/line

Format

XLSX, TMX, XML, XLIFF

Annotation

NA

Download data Sample

Download a free sample of this dataset to get a clearer picture of it, or get in touch with one of our experts for hands-on experience 📨

Download Free Dataset


Need datasets for a specific AI/ML use case? Don’t worry, we’ve got you covered! 👍

Contact Us
