“Low angle static shot: A teddy bear sitting on a picnic blanket in a park, eating a slice of pizza. The teddy bear is brown and fluffy, with a red bowtie, and the pizza slice is gooey with cheese and pepperoni. The sun is setting, casting a golden glow over the scene”
“High angle static shot: A hacker in the 1980s wearing a gray hoodie, hunched over an Apple II computer in a dimly lit room with scattered cables and monitors. The screen displays lines of green code as the hacker types furiously, attempting to break into the Pentagon’s network. The room is bathed in the eerie glow of the computer screen and a small desk lamp”
“Wide-angle shot, starting with the Sasquatch at the center of the stage giving a TED talk about mushrooms, then slowly zooming in to capture its expressive face and gestures, before panning to the attentive audience.”
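The prompts above all follow the same rough pattern: shot type, subject, visual details, lighting. For anyone who wants to experiment with that structure programmatically, here is a minimal sketch that assembles a prompt from those pieces and submits it over HTTP. The endpoint URL, request fields, and API key variable are hypothetical placeholders for illustration, not Runway's actual API; consult the vendor's documentation for the real interface.

```python
import os
import requests

def build_prompt(shot: str, subject: str, details: str, lighting: str) -> str:
    # Mirror the structure of the example prompts: shot type, subject, details, lighting.
    return f"{shot}: {subject}. {details}. {lighting}"

prompt = build_prompt(
    shot="Low angle static shot",
    subject="A teddy bear sitting on a picnic blanket in a park, eating a slice of pizza",
    details="The teddy bear is brown and fluffy, with a red bowtie, and the pizza slice is gooey with cheese and pepperoni",
    lighting="The sun is setting, casting a golden glow over the scene",
)

# Hypothetical endpoint and payload shape, used only to show how a structured prompt
# might be submitted to a text-to-video service.
response = requests.post(
    "https://api.example.com/v1/video/generate",  # placeholder URL
    headers={"Authorization": f"Bearer {os.environ['VIDEO_API_KEY']}"},
    json={"prompt": prompt, "duration_seconds": 10},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```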
In the end, the fancy prompts didn’t really help. Runway Gen-3 Alpha is a psychedelic toy at the moment and can be entertaining if you can afford the credits. But it generally lacks the coherence to generate what might be called “useful video,” although your mileage may vary depending on the project. Even if the results were flawless, using a video synthesis model trained on an unknown dataset raises ethical questions that might spawn some backlash.
What could improve Runway’s AI models? Among other things, more training data with better annotations. The AI model needs as many varied, well-labeled examples as possible to learn from so it can do a better job of translating prompts into things a user would like to see. One of the reasons OpenAI’s GPT-4 turned heads in text synthesis is that the model finally reached a size where it had absorbed enough information (in training data) to give the impression that it might genuinely understand and model the world. In reality, a key aspect of its success is that it “knows” far more than most humans and can impress us by combining existing concepts in novel ways.
With enough training data and computation, the AI industry will likely reach what you might call “the illusion of understanding” with AI video synthesis eventually—but people who work in the TV and film production industries might not like it.