Multimodal Models Integrating Speech and EHR Data