Module 5: Vision Language Action
Chapter 26: Multi-Modal Interaction Speech Gesture Vision
cross-modal-attention