Module 5: Vision Language Action
Chapter 26: Multi-Modal Interaction Speech Gesture Vision
consistent-behavior-generation