After researching and beating around the bush for months, it’s finally time for me to actually do some AI-based animation. But first, some more researching and beating around the bush by watching some tutorials! Luckily, it seems that the most recent version of Pika, 1.0, has especially good anime-style capabilities, which should work well with Midjourney generations, since Midjourney has an excellent grasp of style, in my opinion. So things are looking promising in theory.
For this test I want to try to animate the AI anime trailer I made last semester. The trailer featured essentially no animation but moderately good consistency in my main character, and keeping that consistency is something I’m still a bit worried about for the master’s thesis project. I want to see what the best method is for getting my character moving in Pika, trying out both the entire frame and the isolated character on plain backgrounds, since my generated backgrounds don’t need animating.
I will be using scene 200 as a sort of playground to try all of this out. Here’s what the original version looks like, as it could be seen at the CMSI exhibition:
Video to video
Starting with maybe the most complex approach, I rendered out a clean version of the scene without subtitles or post processing and then told Pika which region to modify. And well, I think the results speak for themselves.
Prompt: Old man looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
Prompt: Old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
After immediately noticing that the first character looks nothing like my generated one, I tried adding ‘in a red coat’ to the prompt, but honestly got an even worse result. Pika understands the lighting conditions well, but just about everything else is unusable: bad anatomy and faces, too drastic a change in pose and look, and uninspired, weird movement. Video to video generation also does not support motion control, leaving me with no control over the animation. Yikes. But it did leave the parts of the video outside of the specified region intact, which is something, I guess.
Image to video
Since that did not work in the slightest and limits my control even more, let’s move on to stills.
Entire frame
I uploaded a still of the scene with the character and the foreground and background elements visible. I also made use of the Motion control feature, setting the camera to a pan to the left with a strength of 0 to replicate the original motion.
Prompt: Old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
Well at least it got the pan right, kind of – even at a motion strength of 0 (which controls both the animation within the frame as well as the camera motion) it’s still way too fast. My character stayed kind of the same this time, but only because for most of the animation, he didn’t actually move. When he did, he quickly distorted strangely, despite my negative prompts.
Subject only
Starting with a still of just the isolated character on a green background, I hoped for the best results, still more or less using the same prompts:
Prompt: Old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
Ok so green doesn’t work, period. From what I can tell through the clutter, the animation actually looks somewhat decent, but this is obviously unusable, especially since the green background clutter spills into my subject. I don’t understand why Pika won’t allow the user to specify a region when uploading a reference image. I feel like that could have helped a lot here.
Prompt: Old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
Progress is being made! A black background seems to work pretty well. Pika perfectly understands that the black parts of the image need to stay black, making compositing easier, at least in theory. It still exhibits quite heavy flickering and the resolution is pretty bad. But the motion is quite nice! This is by far the most promising result so far.
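Why would the black background make compositing easier? Because the background can be keyed out by brightness alone. Here is a minimal Python sketch of such a luma key on single pixels. It is an illustration of the idea, not what After Effects does internally, and the `low` / `high` thresholds are guesses that would need tuning per shot.

```python
def luma(r, g, b):
    """Rec. 709 luma of an 8-bit RGB pixel, in the range 0..255."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def luma_key_alpha(pixel, low=16.0, high=48.0):
    """Map a pixel to an alpha between 0.0 (transparent) and 1.0 (opaque).

    Pixels darker than `low` count as background, pixels brighter than
    `high` as subject, with a linear ramp in between to soften the edge.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    y = luma(*pixel)
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


def composite(fg, bg, alpha):
    """Blend one foreground pixel over one background pixel."""
    return tuple(round(f * alpha + b * (1.0 - alpha)) for f, b in zip(fg, bg))


# A red coat pixel stays opaque, a pure black pixel lets the background through.
coat = composite((200, 30, 30), (10, 20, 30), luma_key_alpha((200, 30, 30)))
hole = composite((0, 0, 0), (10, 20, 30), luma_key_alpha((0, 0, 0)))
```

One caveat with this approach: very dark parts of the character (deep shadows, black hair) would be keyed out along with the background, so a garbage matte or a manual mask is probably still needed.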
Prompt: Old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
The white background version works great, too! It still has the same flickering issue, though, and while I hoped that the white background would help preserve the outline of the character, the result is so blurry that this advantage is lost. I’m thinking the black background is going to be better.
Prompt: Old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
Just for fun, I also tried uploading a PNG with actual transparency to see what I would get, and the result looks REALLY COOL but is not very useful for a clean approach. The artefacts around the subject are 100% created by Pika; the PNG had a clean alpha, I promise. But maybe this is something to bear in mind in case I want to go into a “data corruption / technology / AI is bad and evil” type of thematic direction, could be very fun.
Prompt: Old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
Finally, I wanted to see what would happen if I gave Pika just the background and told it to create the old man and the camera move. Unfortunately, no character is to be seen and the pan is, again, far too aggressive even at a motion strength of 0. The video is also incredibly choppy, at around 6-7 fps despite the frame rate being set to 24. The paper flying around is nice, though – maybe this could be useful for animating miscellaneous stuff in the background? But then again, that’s easily and quickly done in After Effects with minimal effort and maximum control.
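The "6-7 fps" figure is an eyeballed estimate. If the choppiness comes from frames being held (the same image repeated several times inside a 24 fps file), it could be measured rather than guessed. The Python sketch below assumes exactly that, and assumes the frames have already been decoded and reduced to one comparable value each (for example a hash of the pixel data); `effective_fps` is a made-up helper name.

```python
def effective_fps(frame_signatures, nominal_fps):
    """Estimate the visible frame rate of a clip.

    `frame_signatures` holds one comparable value per decoded frame.
    Consecutive identical signatures are counted as a single held frame.
    """
    if not frame_signatures:
        return 0.0
    unique = 1
    for prev, cur in zip(frame_signatures, frame_signatures[1:]):
        if cur != prev:
            unique += 1
    return nominal_fps * unique / len(frame_signatures)


# Half a second of a 24 fps file in which every image is held for 4 frames.
held = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
rate = effective_fps(held, 24)
```

With compressed video, held frames are rarely bit-identical, so a real check would need a similarity threshold instead of strict equality.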
Text to video
Fortunately the isolated animation looks promising, especially on the black background, but maybe I will want to generate some assets directly in Pika. Honestly, probably not, because it takes away a LOT of control, but I wanted to try it out anyway by giving Pika a prompt similar to the one I used for the original Midjourney generation and seeing if I was happy with anything.
Prompt: 80s anime still of an old man in a red coat sitting in a dimly lit archive going through files, wide shot, studio ghibli style, red and blue color grade
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
Consistency with the text: 25
Camera control: pan right, strength of motion: 4
Ok so none of these are really usable. Pika ignored the ‘wide shot’ prompt most of the time, the framing is all over the place, as is the motion, and the setting and background in general are very, very messy. From what I’ve seen in tutorials and other showcases of good anime generations by Pika, the prompts are much, much simpler. I’ll try again with something less specific.
Prompt: old man in a red coat
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
Consistency with the text: 15
Camera control: strength of motion: 1
Prompt: old man in a red coat
Negative Prompts: oversaturated, ugly, bad anatomy, morphing, flickering
Consistency with the text: 15
Camera control: pan right, strength of motion: 2
Well the results look acceptable, but obviously unusable for my trailer. Note the extreme differences in the two results that had the same prompts; I’m not even going to bother listing everything that changed, the only thing that remained kind of the same was the color of the character’s jacket.
These shots do show that the AI is powerful, but not very versatile – it can’t adapt to more specific prompts as well as Midjourney can. Style is no issue, but the results are so drastically different that any story with characters that should appear in more than one shot is out of the question.
Looks like Midjourney will stay as the tool of choice for still creation, but I’ll have to tackle the issue of character consistency there, too. I just got lucky that the only reference for an anime old man in a red coat seems to be Stan Lee.
Optimisation (for now)
I want to get at least one shot done today, so I’ll try to get the best result out of the version with the black background. First of all, I want to get rid of the flickering; after that I can worry about the animation and the resolution. I’m doing this mainly by adding negative prompts and telling Pika to stay more consistent with my text input.
Prompt: an old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, distortion, inaccurate limbs, morphing, flickering, flicker, strobe effect
Consistency with the text: 25 (max)
Camera control: strength of motion: 1
Pretty good start, the flicker seems to be mostly gone, but I think that the high consistency caused the ‘oversaturated’ negative prompt to go a little crazy, giving us this washed out look. I put that prompt in there since most image to video generations apparently get oversaturated quickly, but let’s try removing that:
Prompt: an old man in a red coat looking at files, studio ghibli
Negative Prompts: ugly, bad anatomy, distortion, inaccurate limbs, morphing, flickering, flicker, strobe effect
Consistency with the text: 25 (max)
Camera control: strength of motion: 1
Well that’s weird – removing the ‘oversaturated’ negative prompt has increased the flicker drastically. Hmm, let’s add that back in and try to turn down the consistency.
Prompt: an old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, distortion, inaccurate limbs, morphing, flickering, flicker, strobe effect
Consistency with the text: 20
Camera control: strength of motion: 1
Well, it looks like the flicker is back – I think I’ll go back to a consistency of 25 since that looked the best and washed out colors are easier to fix than flickering. However, I started to notice that there is a slight zoom in with all of these generations, despite my camera controls being empty. I’ll try adding a negative prompt to prevent zooms, and if that doesn’t help I’ll have to reduce the strength of motion to 0, which I want to avoid since that will also affect my character animation.
Prompt: an old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, distortion, inaccurate limbs, morphing, flickering, flicker, strobe effect, camera zoom, camera tilt, camera pan, camera rotation
Consistency with the text: 25
Camera control: strength of motion: 1
God damn it, I don’t know what it is but my negative prompts don’t seem to line up with what’s actually happening in the generations – the flicker is gone, sure. But the zoom is still there, and what’s worse is that the animation is terrible this time. I think the number of negative prompts may be too much for Pika, leaving it wondering which ones to weigh more than others.
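Part of the confusion here may be that two things changed between the last two runs: the consistency went from 20 back to 25, and four camera terms were added to the negative prompts. A small run log makes that kind of overlap visible before drawing conclusions. The Python sketch below is just one way to do it; `RunSettings` and `diff_settings` are made-up names, and the values are copied from the two runs above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunSettings:
    prompt: str
    negative: frozenset
    consistency: int
    motion: int


def diff_settings(before, after):
    """Return {field name: (old, new)} for every setting that differs."""
    return {
        f.name: (getattr(before, f.name), getattr(after, f.name))
        for f in fields(before)
        if getattr(before, f.name) != getattr(after, f.name)
    }


base_negative = frozenset({
    "oversaturated", "ugly", "bad anatomy", "distortion", "inaccurate limbs",
    "morphing", "flickering", "flicker", "strobe effect",
})
camera_terms = frozenset({
    "camera zoom", "camera tilt", "camera pan", "camera rotation",
})
prompt = "an old man in a red coat looking at files, studio ghibli"

run_a = RunSettings(prompt, base_negative, consistency=20, motion=1)
run_b = RunSettings(prompt, base_negative | camera_terms, consistency=25, motion=1)

# Two settings differ, so the worse animation can't be pinned on either alone.
changes = diff_settings(run_a, run_b)
```

Changing one setting per run would make the comparison cleaner, although with results this random even that only helps when the seed is held fixed.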
Prompt: an old man in a red coat looking at files, studio ghibli
Negative Prompts: oversaturated, ugly, bad anatomy, distortion, inaccurate limbs, morphing, flickering
Consistency with the text: 25
Camera control: strength of motion: 0
This time, I reduced the number of negative prompts and set the strength of motion to 0, but as suspected, this reduced the animation of my character to almost nothing 🙁
I went back to the first settings but found that my prompts didn’t matter that much: iterations of the exact same prompts resulted in different outcomes, randomly adhering to or ignoring negative prompts and specifications.
However, what did make a big difference was the seed. I tried running the exact same prompt again with the same seed and got a much more similar result. I think that’s the method here – not worrying too much about the negative prompts and iterating until you are happy with a certain seed, then iterating further with that specific seed if needed.
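As a rough sketch, that seed-first workflow looks something like the Python below. To be clear about the assumptions: `fake_generate` is a stand-in that returns a list of numbers instead of video, it says nothing about how Pika works internally, and `score` stands for "how happy I am with this clip", which in practice is a human judgment rather than a function.

```python
import random


def fake_generate(prompt, seed):
    """Stand-in generator: the same prompt and seed always give the same 'clip'."""
    rng = random.Random(f"{prompt}|{seed}")
    return [round(rng.random(), 3) for _ in range(4)]


def search_seeds(generate, prompt, seeds, score):
    """Try each seed once and return (best_seed, best_clip) according to `score`."""
    best = None
    for seed in seeds:
        clip = generate(prompt, seed)
        if best is None or score(clip) > score(best[1]):
            best = (seed, clip)
    return best


prompt = "an old man in a red coat looking at files, studio ghibli"

# Step 1: roll through seeds with a fixed prompt and keep the best result.
seed, clip = search_seeds(fake_generate, prompt, range(10), score=sum)

# Step 2: reuse that seed, so later tweaks start from a known result.
again = fake_generate(prompt, seed)
```

The point of the second step is that once the seed is fixed, a change in the output can be attributed to the change in settings instead of to chance.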
After I chose a version I was happy with, I added four seconds to the animation, since my original shot is around 5 seconds long, making sure to include the seed for that as well.
Prompt: an old man in a red coat looking at files, studio ghibli, +4 seconds
Negative Prompts: N/A
Consistency with the text: 25
Camera control: strength of motion: 1
I did not, however, notice that the ‘add four seconds’ feature does not carry over your negative prompts, for no apparent reason, leaving me with this abomination. Gotta fix that real quick.
Prompt: an old man in a red coat looking at files, studio ghibli, +4 seconds
Negative Prompts: oversaturated, ugly, bad anatomy, distortion, inaccurate limbs, morphing, flickering
Consistency with the text: 25
Camera control: strength of motion: 1
Ok, this finally works somewhat, though it’s still quite rough. Moving on to After Effects. And the next blog post because this already took hours.