Scene description reduces the pressure of inventing a topic. The video already gives you people, objects, actions and locations.
Build from small units
nouns → actions → locations → complete sentences → short retelling
Pause the video on a single frame and name what you see. Then say what each person is doing and where the action happens. Finally, connect several of these sentences into a simple description.
Use the original Chinese as feedback
After describing the scene, listen to the original sentence. Compare its vocabulary, word order and useful chunks with your own version. Shadow the original, then describe the scene again in your own Chinese.
This keeps speaking connected to real language without requiring you to copy the video exactly.