An 8-year-old girl begins sketching unfamiliar buildings and street layouts with uncanny accuracy. Her mother assumes it’s just her imagination - until she draws a detailed map of a fishing village in coastal Japan, complete with landmarks that no longer exist.

The process for making this AI-Generated Movie actually mirrored the Traditional Filmmaking Pipeline far more than it did a simplified ‘push a button and generate a movie’ process.

In many ways, it still required the exact same Creative Disciplines as conventional filmmaking in terms of craft: Script Writing, Production and Set Design, Cinematography, Lighting, Camera Movement, Directing Performance, Editing, Sound Design, Visual Effects, and Overall Storytelling.

The obvious difference was that the execution of those crafts was completed through the use of Generative AI Tools rather than traditional physical production methods.

I started with a broad synopsis and story treatment, which I then broke down into what I call “Mini Movies.” These are essentially smaller narrative chapters within the larger film, each with its own emotional objective, conflict, and obstacles to overcome. By the end of each mini-movie, some new discovery or revelation occurs that naturally launches the audience into the next section of the story.

That structure helped keep the emotional momentum moving forward while allowing each sequence to feel purposeful and self-contained.

As with any decent motion picture, everything began with writing a traditional screenplay.

The story itself went through multiple rounds of outlining, rewriting, emotional refinement, pacing adjustments, and scene restructuring just like any conventional film project would. The goal was always to create something emotionally authentic first, rather than simply experimenting with technology.

The writing process alone took roughly two months. During that time, I focused heavily on outlining the narrative structure, emotional arcs, pacing, themes, character development, and overall scene progression.

Once the story structure was locked, the next major phase was “Casting” the film. This involved generating hundreds and eventually thousands of AI-generated character images until I found the exact look, emotional presence, age range, facial structure, and personality that felt right for each character. Because AI image generation is probabilistic, consistency is by far one of the biggest challenges. The same character might subtly change age, facial proportions, hairstyle, ethnicity, or clothing from generation to generation.

To overcome this, I created what essentially became Digital Actor Reference Libraries for each main character. I would repeatedly generate and refine “hero images” of the cast until their identities stabilized enough to use across multiple scenes. Then I would choose the wardrobe and hairstyle for each character for each day of the story and save my “hero shots” with the correct look as images. These would become my master source files and the foundation for maintaining continuity throughout the movie.

Once the characters were established, I had to Design the World of the Movie. This involved generating and refining environments, homes, classrooms, city streets, hospitals, temples, trains, airports, and landscapes that matched the emotional tone of each sequence. Every environment had to be explored visually before scenes could be generated.

Additionally, the fictional coastal village in the story was based heavily on the real Japanese town of Tomonoura, which became the visual anchor for the film. I used Google Earth to research and match specific architectural layouts, harbor curvature, temples, streets, and landmarks to create a believable sense of place. I did this for every location and set in the story so that each one was as authentic as possible.

On top of getting the locations just right, I also had to create the Child-like Drawings of the major settings featured throughout the film. These drawings became an important story device because they appear at various moments where Anna discovers them and slowly begins realizing that Lila somehow has knowledge she shouldn’t possibly possess.

This required designing simplified crayon and colored-pencil style illustrations that visually matched the real locations and environments shown later in the movie. So there was another layer of visual continuity work involved in making sure the drawings emotionally and geographically connected back to the actual places they represented.

Another aspect of this project that surprised me was how much time went into designing and maintaining consistency for all of the Props, Vehicles, and Background Elements that make the world feel believable.

In a traditional production, a Production Designer, Art Director, Set Decorator, and Props Department would be responsible for sourcing or building these elements. In an AI workflow, I found myself performing all of those same functions.

As an example, when Anna and Lila travel to Japan, I couldn’t simply ask the AI to generate “a taxi.” I needed a Japanese Taxi that looked authentic to the region, matched the time period and visual style of the film, and remained reasonably consistent across multiple scenes and camera angles.

The same challenge applied to Airplanes, Trains, Hotel Interiors, Harbor Boats, Luggage, Street Signs, School Materials, family photographs, sketchbooks, toys, furniture, and countless other objects that appear throughout the story.

One example was the Boeing 787 airliner used during Anna and Lila’s journey to Japan. I spent a surprising amount of time generating and refining both exterior and interior aircraft imagery so that the travel sequences felt grounded and believable. Similar work went into designing train stations, taxis, ferries, and other transportation elements that helped support the illusion of a real journey.

I also spent a considerable amount of time researching and recreating locations such as Hiroshima Airport and the various Train Stations featured throughout the trip. The goal wasn’t simply to generate attractive imagery, but to capture the feeling of actually being there. I found myself studying reference photos, architectural details, signage, platform layouts, waiting areas, ticket gates, and regional design cues so that the environments felt authentically Japanese.

Many of these details are subtle and may go unnoticed by the audience, but collectively they help create the sense that Anna and Lila are truly traveling through a real place rather than a generic AI-generated environment.

What I discovered is that audiences may not consciously notice these details, but they absolutely feel them. Just as in traditional filmmaking, the authenticity of the world is often built from hundreds of small decisions that work together beneath the surface.

In many ways, AI filmmaking still requires the same Attention to Detail as traditional production design. The difference is that instead of physically building, sourcing, or renting these elements, you’re designing and generating them digitally, and then constantly working to maintain continuity from one scene to the next.

PART ONE: CHARACTER + LOCATION

Once all of the characters had been created, wearing the desired wardrobe and hairstyles for each day of the story, and all of the locations and sets had been designed, it was finally time to begin “Shooting” the movie.

Using Google’s Nano Banana image generation model through the Runway platform, I uploaded both Character Reference Images and Location Reference Images for every scene. These reference materials served much like a traditional film production’s cast photos, wardrobe continuity guides, and production design references, helping the AI maintain consistency from shot to shot.

From there, I used detailed text prompts to stage each scene, placing the characters into the environment and directing their actions much like a filmmaker working with actors on a set: “Character - Day 1,” wearing “Wardrobe - Day 1,” sits on the living room couch holding an iPad inside the “Living Room Set.”

Each prompt defined not only who was in the scene, but where they were positioned, what they were doing, what props they were interacting with, and often the emotional tone of the moment. This process was repeated hundreds of times throughout the film, generating the foundational images that would later become individual shots and sequences.
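
The staging prompts described above follow a repeatable structure, which can be sketched as a small template. This is only an illustration, not the actual tooling used on the film: the StagingPrompt class and its field names are hypothetical, and the code only assembles the prompt text. It does not call Runway or any image model.

```python
from dataclasses import dataclass, field


@dataclass
class StagingPrompt:
    """Hypothetical container for the elements of one staging prompt."""
    character: str   # label of the character reference image
    wardrobe: str    # label of the wardrobe reference
    location: str    # label of the location/set reference
    action: str      # what the character is doing
    props: list[str] = field(default_factory=list)  # props being handled
    tone: str = ""   # optional emotional tone of the moment

    def render(self) -> str:
        """Assemble the prompt text in a fixed, repeatable order."""
        text = f'"{self.character}," wearing "{self.wardrobe}," {self.action}'
        if self.props:
            text += " holding " + " and ".join(self.props)
        text += f' inside the "{self.location}."'
        if self.tone:
            text += f" Emotional tone: {self.tone}."
        return text


prompt = StagingPrompt(
    character="Character - Day 1",
    wardrobe="Wardrobe - Day 1",
    location="Living Room Set",
    action="sits on the living room couch",
    props=["an iPad"],
)
print(prompt.render())
```

Keeping the reference labels in fixed fields makes it harder to accidentally pair a character with the wrong day's wardrobe when the same pattern is repeated hundreds of times.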

PART TWO: GETTING COVERAGE

Once the still imagery had been finalized, the next phase was moving into Animation.

Essentially, every traditional coverage angle you would capture on a real film set still had to be created individually, and each image required detailed prompting that described the shot size, character position, lighting, and mood. The Prompting Process became less like coding and more like Directing a Cinematographer and Production Designer simultaneously. This process created the series of ‘shots’ that would be used in each scene, and ultimately these became my storyboards.

In many ways, this stage felt remarkably similar to traditional filmmaking. Instead of directing actors and camera crews on a physical set, I was directing an AI system by combining visual references and detailed creative instructions.

The Goal was always the same:

To Translate the Screenplay into Compelling Visual Storytelling, One Shot at a Time.

From there, every individual sequence had to be “shot” manually. This is where the process became surprisingly similar to live-action filmmaking. For each scene, I had to generate the following series of images: Wide Establishing Shots, Medium Shots, Close-Ups, Over-the-Shoulder Angles, POV Shots, Environmental Inserts, Reaction Shots, Transitional Imagery, etc.

This process was extremely iterative because the first generated result was almost never usable. Sometimes a single usable five-second shot required dozens of generations before the performance felt emotionally authentic.

For every shot, I would typically generate multiple performance variations, different facial expressions, different pacing, different camera movements, alternate emotional interpretations, variations in body language and eye movement, etc.
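
To give a rough sense of how quickly the generation count grows, the coverage list and per-shot variations can be sketched as a simple shot-list builder. This is a hypothetical planning aid, not part of the actual workflow: the function name, the scene label, and the default of three takes per angle are all assumptions chosen for illustration.

```python
from itertools import product

# Coverage angles named in the text, one entry per type of shot.
COVERAGE = [
    "Wide Establishing Shot",
    "Medium Shot",
    "Close-Up",
    "Over-the-Shoulder Angle",
    "POV Shot",
    "Environmental Insert",
    "Reaction Shot",
    "Transitional Image",
]


def build_shot_list(scene: str, takes: int = 3) -> list[str]:
    """Pair every coverage angle with a numbered take for one scene."""
    return [
        f"{scene} | {angle} | take {take}"
        for angle, take in product(COVERAGE, range(1, takes + 1))
    ]


shots = build_shot_list("Living Room - Day 1")  # hypothetical scene label
print(len(shots))  # 8 coverage angles x 3 takes = 24
print(shots[0])
```

Even at three takes per angle, a single scene already calls for 24 generations, and in practice a usable shot often needed far more than three.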

Every shot in the movie began as a still image that then had to be converted into moving video using Runway’s Image-to-Video models.

THE MOST IMPORTANT PERFORMANCE SECRET

YOU MUST SLOW EVERYTHING DOWN.

AI tends to: rush, over-perform, over-gesture, speak too quickly, and behave theatrically.

So in your Text Prompting, you should constantly reinforce the following terms: restrained, grounded, naturalistic, subtle, emotionally suppressed, observational, minimal movement, quiet realism, micro-expressions, etc.
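
One way to make that reinforcement habitual is to append the restraint terms to every performance prompt automatically. The sketch below is a hypothetical helper, not a feature of any AI video tool: the function name, the "Performance:" label, the limit of four terms, and the example prompt are all assumptions.

```python
# Restraint vocabulary taken from the list above.
RESTRAINT_TERMS = [
    "restrained",
    "grounded",
    "naturalistic",
    "subtle",
    "emotionally suppressed",
    "observational",
    "minimal movement",
    "quiet realism",
    "micro-expressions",
]


def reinforce(prompt: str, terms: list[str] = RESTRAINT_TERMS, limit: int = 4) -> str:
    """Append up to `limit` restraint terms to a performance prompt."""
    base = prompt.rstrip().rstrip(".")  # avoid a doubled period
    return f"{base}. Performance: {', '.join(terms[:limit])}."


# Hypothetical example prompt, for illustration only.
print(reinforce("Anna listens as Lila explains the drawing"))
```

Which terms work best, and how many to include, will likely vary by model, so the limit is left as a parameter to experiment with.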

I would often begin this process by creating the ‘Listening Shots’ just to give myself some coverage options in editorial.

Dialogue Performance itself presented another major challenge. AI-generated voices are often created one line at a time, meaning there is no natural interaction between performers. Achieving believable conversation required extensive iteration to match pacing, pauses, interruptions, emotional emphasis, breathing patterns, and conversational rhythm.

In many ways, this process felt less like directing actors on a set and more like assembling a performance puzzle from hundreds of independent pieces. The goal was not simply to create individual shots, but to create the illusion that real people were listening, reacting, thinking, and feeling together within a shared emotional moment.

Ironically, the more dialogue-heavy and emotionally nuanced a scene became, the more difficult it was to create. Some of the simplest conversations in the film ultimately required the largest amount of work because human beings are remarkably good at detecting even the smallest inconsistencies in performance, timing, and emotional truth.

In fact, the complexity of generating believable multi-character scenes had a direct impact on the screenplay itself. Early versions of the story included Anna’s ex-husband, Stephen, as well as Lila’s younger brother, Ben. While these characters worked well on the page, they introduced a significant production challenge once the film moved into AI generation.

Unlike traditional filmmaking, where multiple actors can naturally perform together within the same scene, current AI systems struggle when several characters must interact simultaneously. Every additional character dramatically increases the complexity of maintaining continuity, eye-lines, emotional reactions, body positioning, and conversational flow across multiple shots.

A simple family conversation involving four people around a table could require dozens of individual generations, all of which needed to feel as though they were occurring within the same moment. Even small inconsistencies in where a character was looking, how they reacted, or their emotional state could break the illusion of a believable scene.

As production progressed, I realized that managing four interacting characters across dozens of dialogue-heavy sequences would significantly increase both the complexity and the amount of generation work required.

Ultimately, I made the creative decision to simplify the family structure by removing Stephen and Ben from the story entirely. What began as a production necessity became a storytelling advantage. Anna evolved into a single mother, and Lila became an only child, which allowed the narrative to focus more intimately on their relationship and emotional journey together.

It was a good reminder that, just as budget limitations influence traditional filmmaking, the strengths and limitations of AI tools can shape creative decisions as well. In this case, a technical constraint ultimately led to a stronger and more focused story.

Because every shot is generated independently, even a simple conversation between two characters might require:

  • A wide two-shot

  • An over-the-shoulder shot of Character A

  • An over-the-shoulder shot of Character B

  • Multiple close-ups

  • Reaction shots

  • Inserts and cutaways

The challenge becomes ensuring that every generated performance feels like part of the same continuous conversation. As an example:

  • If Character A smiles during one shot, Character B’s response must feel appropriate in the next.

  • Eye-lines must appear to connect correctly.

  • Emotional intensity must build naturally from shot to shot.

  • Dialogue pacing must feel consistent.

  • Character energy levels must remain coherent.

  • Body positioning and screen direction must match.

  • Lighting, wardrobe, and environmental details must remain consistent.

Often, a shot would look excellent on its own but fail once placed into the edit because the performance no longer matched the surrounding shots.

As a result, many scenes required multiple rounds of regeneration and editorial refinement. I would frequently discover that changing a single reaction shot would require reworking several surrounding shots in order to preserve the emotional continuity of the scene.

One of the most challenging aspects of the entire production involved Creating Longer Dialogue-Driven Scenes Between Characters. Most current AI video systems excel at generating short visual moments, but maintaining believable human conversation across an entire scene remains extraordinarily difficult.

In a traditional film, actors perform together, reacting to one another in real time. Their timing, eye contact, emotional responses, body language, interruptions, and subtle expressions naturally influence the flow of the scene.

With AI filmmaking, every shot is typically generated independently.

Once all of the visual and audio elements were complete, the project moved into a more traditional post-production workflow. Everything was imported into Adobe Premiere, where the movie was assembled shot by shot just like a conventional film edit.

The editing work included scene construction, pacing refinement, performance selection, music editing, sound design, ambient layering, foley, visual effects, color correction, transitions, emotional timing adjustments, etc.

The score and sound design ended up becoming enormously important because AI-generated visuals alone can sometimes feel emotionally distant. Carefully layering music, environmental audio, silence, reverb, and subtle sound effects helped create emotional cohesion and immersion.

There was also a substantial amount of additional AI cleanup work throughout the process using Topaz Video AI. This included Upscaling Footage to Higher Resolution, Enhancing Facial Consistency, Removing Visual Artifacts, Stabilizing Flickering, Refining Motion Quality, etc.

What emerges is less ‘automated filmmaking’ and more of a hybrid of Filmmaking, Animation, Editorial Design, Prompt Engineering, Visual Effects, Creative Direction, and Interactive Iteration.

It is still a Deeply Human-Driven Process Creatively…

But the tools fundamentally change what an individual or very small team can accomplish.

In the end, what surprised me most about this process was that AI did not eliminate Filmmaking Craftsmanship. Instead, it redistributed it. The role shifted away from physically capturing reality with cameras and crews, and toward curating, directing, iterating, refining, and emotionally shaping an enormous volume of generated material into something coherent and meaningful.

WATCH A ROUGH CUT OF THE MOVIE

PW: WayBackHome!!