Multimodal input
Feed Seedance 2.5 text, images, video clips, and audio together, with up to 12 inputs in a single generation. It reads all of the inputs together and treats them as one creative brief, so a reference photo, a style note, and a line of dialogue all shape the same final video. That is how a single model covers text-to-video, image-to-video, and native audio generation without leaving the page.