English to Tamil Legal Domain Parallel Corpora
This dataset is a bilingual, sentence-aligned English-to-Tamil parallel corpus for the legal domain.
Category: Parallel Corpora
Volume: 50K+ corpus
Last Updated: Aug 2022
Number of participants: 200+ people
About This OTS Dataset
What’s Included
This bilingual parallel corpus contains 50K+ legal-domain sentences translated from English into Tamil by more than 200 native translators. The translations retain native slang, idiomatic phrases, and language-specific vocabulary, and they follow natural Tamil usage, which makes the corpus richer in information. Many source sentences were translated by several native translators, so the corpus also lets you compare how different groups interpret the same text.
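As an illustration of comparing several translations of the same sentence, here is a minimal Python sketch that groups Tamil translations by their English source. It assumes the rows have already been loaded as (English, Tamil) pairs; the function names and the placeholder strings are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def group_translations(rows):
    """Map each English sentence to its distinct Tamil translations.

    `rows` is an iterable of (english, tamil) pairs (assumed layout).
    """
    grouped = defaultdict(list)
    for english, tamil in rows:
        key = " ".join(english.split())  # collapse stray whitespace
        if tamil not in grouped[key]:    # keep each variant once, in order
            grouped[key].append(tamil)
    return dict(grouped)


def multi_translated(grouped):
    """Keep only the sentences that have more than one translation."""
    return {src: tgts for src, tgts in grouped.items() if len(tgts) > 1}


# Hypothetical placeholder rows, not real corpus content.
rows = [
    ("The court adjourned the hearing.", "TAMIL_VARIANT_A"),
    ("The court  adjourned the hearing.", "TAMIL_VARIANT_B"),
    ("The appeal was dismissed.", "TAMIL_VARIANT_C"),
]
variants = multi_translated(group_translations(rows))
```

Sentences that appear in `variants` are the ones where translator choices can be compared side by side.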
The sentences in this parallel corpus range in length from 7 to 15 words. The data is delivered in Excel (XLSX) format and can be converted into TMX, XML, XLIFF, or other equivalent formats.
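A conversion to TMX could look roughly like the Python sketch below. It uses only the standard library and starts from sentence pairs already held in memory; reading the XLSX file itself would need a spreadsheet library, and the column layout of the delivered file is not described here, so that step is left out. The function name `pairs_to_tmx`, the header values, and the sample pair are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name as the attribute `xml:lang`.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang="en", tgt_lang="ta"):
    """Build a minimal TMX 1.4 document from (source, target) pairs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "xlsx",                   # assumed original format label
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Hypothetical pair; the Tamil side is a single word used as a placeholder.
tmx_text = pairs_to_tmx([("The court adjourned the hearing.", "நீதிமன்றம்")])
```

The returned string has no XML declaration; when writing it to disk, saving as UTF-8 keeps the Tamil text intact.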
These parallel bilingual corpora can be used for research and development in bilingual lexicography and machine translation engines. They can also be used to build language resources for NLP-based applications such as predictive keyboards, spell checkers, grammar checkers, text/speech understanding systems, and text-to-speech modules.
We are continually adding translated sentences to this parallel corpus. Depending on your requirements, we can also curate parallel corpora in other languages. For synthetic custom curation, check out the FutureBeeAI community.
FutureBeeAI holds the license for this parallel corpus dataset.
Use Cases
MT Engine
Language model
Predictive keyboards
Spell check
Grammar correction
Text/speech systems
Dataset Sample(s)
Samples will be available soon!
Contact us to get the samples immediately for this dataset.
Dataset Details
Dataset type: Corpus data
Volume: 50K+ corpus
Media type: Text
Language pair: English-Tamil
File Details
Type: Bilingual
Word count: 7 to 12 words/line
Format: XLSX, TMX, XML, XLIFF
Annotation: NA