Aligning Text and Audio Samples in TTS Data