Handwritten Shopping List Image OCR Dataset with Tamil Text

This OCR dataset consists of diverse images of shopping lists with handwritten text in the Tamil language. Along with the images, the dataset includes detailed metadata.

Category: OCR & NER
Volume: 2K+ images
Last Updated: Sep 2023
Number of participants: 200+ people

About This OTS Dataset

What’s Included

Introducing the Tamil Shopping List Image Dataset: a diverse collection of handwritten text images curated to support the development of text recognition and optical character recognition (OCR) models for the Tamil language.

Dataset Content & Diversity:

Containing more than 2,000 images, this Tamil OCR dataset covers a wide range of shopping list images. The handwritten text includes full sentences as well as individual item names, quantities, and comments. The images showcase distinct handwriting styles, fonts, font sizes, and writing variations.

To keep the handwriting diverse and make OCR model training more robust, we allow fewer than three unique images per handwriting. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that at least 80% of the area of each image contains visible Tamil text.

The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

All shopping lists were written, and all images were captured, by native Tamil speakers to ensure text quality, prevent toxic content, and exclude PII. We used the latest iOS and Android mobile devices with cameras above 5 MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
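Because the images come in two formats, a training pipeline may need to route them differently (many image libraries need an extra plugin to decode HEIC). The following is a minimal sketch, not part of the dataset tooling, that groups file names by extension using only the Python standard library. The file names and the `split_by_format` helper are hypothetical.

```python
from pathlib import Path

# Extensions treated as each format (an assumption; adjust to the delivered files).
JPEG_SUFFIXES = {".jpg", ".jpeg"}
HEIC_SUFFIXES = {".heic"}


def split_by_format(file_names):
    """Group file names into 'jpeg', 'heic', and 'other' by extension."""
    groups = {"jpeg": [], "heic": [], "other": []}
    for name in file_names:
        suffix = Path(name).suffix.lower()  # case-insensitive match
        if suffix in JPEG_SUFFIXES:
            groups["jpeg"].append(name)
        elif suffix in HEIC_SUFFIXES:
            groups["heic"].append(name)
        else:
            groups["other"].append(name)
    return groups


# Hypothetical file names, for illustration only.
groups = split_by_format(["list_01.JPG", "list_02.heic", "metadata.csv"])
print(groups)
```

The `heic` group can then be passed to whichever converter or decoder your pipeline already uses.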

Metadata:

In addition to the image data, you will receive structured metadata in CSV format. For each image, the metadata records image orientation, country, language, and device details. Each image file is named to correspond with its metadata entry.

This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Tamil text recognition models.
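As an illustration of how the CSV metadata might be used, the sketch below loads it with Python's standard `csv` module, indexes rows by image file name, and checks which images have no matching row. The column names (`file_name`, `image_orientation`, and so on) and the sample rows are assumptions made for this example. Check the header row of the CSV you actually receive.

```python
import csv
import io
from collections import Counter
from pathlib import Path

# Hypothetical column names; the real CSV header may differ.
FILE_COL = "file_name"
ORIENTATION_COL = "image_orientation"


def load_metadata(csv_text):
    """Parse metadata CSV text into a dict keyed by image file stem."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {Path(row[FILE_COL]).stem: row for row in reader}


def unmatched_images(image_names, metadata):
    """Return image names that have no metadata row."""
    return [name for name in image_names if Path(name).stem not in metadata]


def orientation_counts(metadata):
    """Count images per orientation value."""
    return Counter(row[ORIENTATION_COL] for row in metadata.values())


# Made-up rows for illustration only (not taken from the dataset).
SAMPLE_CSV = """file_name,image_orientation,country,language,device_details
img_0001.jpg,Landscape,India,Tamil,OnePlus-EB2101
img_0002.heic,Portrait,India,Tamil,ExamplePhone-X
"""

metadata = load_metadata(SAMPLE_CSV)
print(orientation_counts(metadata))
print(unmatched_images(["img_0001.jpg", "img_0003.jpg"], metadata))
```

With a real file, the text could be read with `Path("metadata.csv").read_text(encoding="utf-8")` (the path is a placeholder) and passed to `load_metadata`.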

Update & Custom Collection:

We are committed to continually expanding this dataset by adding more images with the help of our native Tamil crowd community.

If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

License:

This image dataset, created by FutureBeeAI, is now available for commercial use.

Conclusion:

Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Tamil language. Your journey to improved language understanding and processing begins here.

Use Cases

Data extraction

OCR

Text Recognition

Document processing

Dataset Sample(s)

Image Type: Shopping list
Image Orientation: Landscape
Language: Tamil
Country: India
Device Details: OnePlus-EB2101

Dataset Details

Dataset type: Handwritten Shopping Lists
Volume: 2K+ images
Media type: Image
Language: Tamil
Type: Diverse types

Image File Details

Environment: Indoor & Outdoor
Diversity: Different lighting...
Format: JPEG, HEIC
Device: Android & iOS
Annotation: NA
Type: Handwritten

Download Data Sample

Download a free sample of this dataset to get a clearer picture of its contents, or get in touch with one of our experts for a hands-on walkthrough 📨

Download Free Dataset

Need datasets for a specific AI/ML use case? Don’t worry, we’ve got you covered! 👍

Contact Us
