Product Image OCR Dataset with Norwegian Text

This OCR dataset contains a diverse set of product images featuring Norwegian text, along with detailed metadata for each image.

Category

OCR & NER

Volume

2K+ images

Last Updated

Aug 2023

Types

Diverse types

About This Off-the-Shelf (OTS) Dataset

Introducing the Norwegian Product Image Dataset: a diverse and comprehensive collection of images curated to advance text recognition and optical character recognition (OCR) models designed specifically for the Norwegian language.

Dataset Content & Diversity

Containing a total of 2,000 images, this Norwegian OCR dataset offers a diverse distribution of front-facing product images. The text in these images includes product names, taglines, logos, company names, addresses, product contents, and more. The images showcase distinct fonts, writing formats, colors, designs, and layouts.

To ensure the diversity of the dataset and support robust text recognition models, we allow only a limited number of unique images (fewer than five) from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that visible Norwegian text occupies at least 80% of each image.

Images were captured under varying lighting conditions (both day and night), from different angles, and against different backgrounds to build a balanced OCR dataset. The collection includes images in both portrait and landscape orientation.

All images were captured by native Norwegians to ensure text quality and to avoid toxic content and PII. To maintain image quality, all images were taken with recent iOS and Android mobile devices with cameras above 5 MP. Images in this training dataset are available in both JPEG and HEIC formats.

Metadata

Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, the metadata includes fields such as image orientation, county, language, and device information. Each image file is renamed to correspond to its metadata entry.

The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Norwegian text recognition models.
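As an illustration of how the metadata CSV might be used, the following is a minimal sketch that loads the metadata and tallies images by a given field, for example to check the portrait/landscape balance or the iOS/Android split before training. It assumes hypothetical column names (`file_name`, `orientation`, `county`, `language`, `device`) and made-up sample rows; the actual header and values in the delivered CSV may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration only; the real CSV header
# and values delivered with the dataset may differ.
SAMPLE_CSV = """file_name,orientation,county,language,device
NO_0001.jpg,portrait,Oslo,Norwegian,iOS
NO_0002.heic,landscape,Vestland,Norwegian,Android
NO_0003.jpg,portrait,Oslo,Norwegian,Android
"""


def load_metadata(text):
    """Parse metadata CSV text into a list of dicts, one per image."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by(rows, column):
    """Count how many images share each value of the given metadata column."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    rows = load_metadata(SAMPLE_CSV)
    print(count_by(rows, "orientation"))
    print(count_by(rows, "device"))
```

With a real file, `load_metadata(open(path, encoding="utf-8").read())` would replace the inline sample, and any column actually present in the header can be passed to `count_by`.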

Update & Custom Collection

We're committed to expanding this dataset by continuously adding more images with the assistance of our native Norwegian crowd community.

If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

Furthermore, using our crowd community, we can annotate the images with bounding boxes or transcribe the text they contain to match your specific project requirements.

License

This image dataset, created by FutureBeeAI, is available for commercial use.

Conclusion

Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Norwegian language. Your journey to enhanced language understanding and processing starts here.

Use Cases

Data Extraction

OCR

Text Recognition

Dataset Details

Dataset type

Printed Text on Product

Volume

2K+ images

Media type

Image

Language

Norwegian

Type

Diverse types

File Details

Environment

Indoor & Outdoor

Diversity

Different lighting conditions, image types & capture angles

Format

JPEG, HEIC

Device

Android & iOS

Annotation

NA

Type

Printed
