Qualitative Result

Text Prompt: “a man does a push-up and then uses his arms to balance himself back to his feet”

Our Result:


Here the text prompt does not appear in the training set and is relatively complex. The baseline method fails to generate the “push-up” motion correctly. However, when a Large Language Model is used to further explain “push-up” and “balance himself” to the model, our framework successfully generates the correct “push-up” motion and the subsequent motions.