Exploring Training Datasets for Document Processing 2024