Preprocessing TTS Datasets for Effective Model Training