AI-ography: How AI Technology is Changing The Way We Create Motion Picture Imagery

by James Mathers
Cinematographer and Founder of the Digital Cinema Society
(Excerpted from the July 2024 Digital Cinema Society eNewsletter)

 

Other than some focus-assist features, what we commonly refer to as AI has not yet reached very deeply into the professional cinematography field, but that is about to change. The technology is as impressive as it is frightening, and which end of the industry you come from will probably determine where on that spectrum you land. It is also a matter of your level of optimism about new technology, but whether you see the glass as half full or half empty, it is extremely fluid. I would like to share a few observations, from my perspective as a cinematographer, on some recent advancements in AI for motion imagery creation.

“World’s First AI-Powered Movie Camera”

If I had seen this headline on April 1st instead of in mid-June 2024, I might have suspected an April Fools’ prank. However, the CMR-M1 (short for Camera Model 1) debuted at the Cannes Lions International Festival of Creativity, an advertising convention, and it’s no joke. Developed by SpecialGuestX and 1stAveMachine, it is an experimental filmmaking innovation, purportedly the first movie camera designed to integrate generative AI technology directly into the digital capture process.

Looking more like a Kodak box camera from the 1920s than a cutting-edge digital motion picture camera, it incorporates a FLIR sensor, a Snapdragon CPU, and a viewport to capture real-world footage and transform it using generative AI. The current prototype barely records HD, at only 1368×768 resolution, is limited to 12 fps, and because processing happens in the cloud via a Stable Diffusion workflow, there is a minor delay. However, the plan is to reduce latency and beef up the electronics for real-time processing at higher resolutions. The goal is to bring AI into the physical filmmaking process rather than doing everything at a prompt in post, allowing live-action footage to be instantly integrated as animation is simultaneously created around it.
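
To make that workflow concrete, here is a minimal sketch of the kind of capture-then-stylize loop such a camera implies, using the open-source diffusers library. The model name, prompt, resolution, and strength setting are illustrative assumptions on my part, not details of the CMR-M1’s actual pipeline.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Load a base Stable Diffusion model; fp16 keeps per-frame latency down
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def stylize_frame(frame: Image.Image, prompt: str) -> Image.Image:
    # strength sets how far the output departs from the captured frame;
    # lower values preserve more of the live-action image
    return pipe(prompt=prompt, image=frame, strength=0.5,
                num_inference_steps=20).images[0]

# One captured frame in, one AI-styled frame out
frame = Image.open("captured_frame.png").convert("RGB").resize((768, 512))
stylize_frame(frame, "lush jungle foliage, cinematic lighting").save("styled_frame.png")
```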

The CMR-M1 features professional camera accessories such as a matte box and tripod base, as well as interchangeable lenses. It also has in-camera editing capabilities. This initial prototype is not for commercial purposes; it was designed as part of a research process to create physical interfaces for generative AI. The system is currently equipped with five Stable Diffusion “LoRAs” (an acronym for Low-Rank Adaptation, a technique designed to refine and optimize large AI models). These help style images into a variety of forms, in this case everything from colorful jungle integrations to scenes with opulent decor, tuxedos, and gold coins. The camera also has a slot for a “style card,” a chip that lets filmmakers define unique styles and workflows using models trained on their own images and personalized prompts. According to Miguel Espada, co-founder and executive creative technologist of SpecialGuestX, they have designed a camera that serves as a physical interface to AI models. More details and an impressive sample reel are available at:
https://www.specialguestx.com/project/camera/
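
For readers curious what a LoRA looks like in practice, here is a hedged sketch of loading one into a Stable Diffusion pipeline with the diffusers library; the LoRA weight name is a hypothetical stand-in for whatever a style card might carry.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A LoRA is a small set of low-rank weight updates trained on a user's own
# images; loading one restyles the base model without retraining it.
pipe.load_lora_weights("my-style-card-lora")  # hypothetical weights

image = pipe("a director on set, opulent decor, gold coins").images[0]
image.save("styled_still.png")
```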

1 Camera + 1 Crew Member + AI = Comprehensive Sporting Event Coverage

Another example of AI stepping into the camera department is in the coverage of sports. An early commercial example comes from a company called Pixellot, which combines artificial intelligence, machine learning, stationary camera systems, software, and cloud computing for complex coverage of live sporting events. The Pixellot Show S3 system features a 12K triple-camera array in a single unit designed to cover an entire field at a sports venue, with additional capture units available as an option for a greater variety of POVs. The scary part for those of us in the camera department is that no camera operators are employed: the system identifies the players, follows the action, and cuts the program via AI.
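
Pixellot’s pipeline is proprietary, but the core idea, detecting the players in a wide master frame and cropping a “virtual camera” around the action, can be sketched in a few lines. This toy version uses OpenCV’s stock person detector; the crop size and detector choice are my own assumptions, not Pixellot’s.

```python
import cv2
import numpy as np

# Stock OpenCV pedestrian detector stands in for a trained player tracker
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def virtual_camera(frame: np.ndarray, out_w=1920, out_h=1080) -> np.ndarray:
    # Assumes the wide master frame is larger than the output crop
    boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    if len(boxes) == 0:
        return cv2.resize(frame, (out_w, out_h))  # no players: stay wide
    # Center the crop on the mean position of all detected players
    cx = int(np.mean([x + w / 2 for x, y, w, h in boxes]))
    cy = int(np.mean([y + h / 2 for x, y, w, h in boxes]))
    x0 = int(np.clip(cx - out_w // 2, 0, frame.shape[1] - out_w))
    y0 = int(np.clip(cy - out_h // 2, 0, frame.shape[0] - out_h))
    return frame[y0:y0 + out_h, x0:x0 + out_w]
```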

Traditional sports coverage, even for a smaller live event, requires several camera operators, as well as an engineering and directing team, to cover the action on the field or court. The cameras feed into a production studio or mobile broadcast unit where picture, sound, special effects, graphics, and commentary are mixed live as the game occurs. It is a very expensive proposition that, until now, has only been justified for the very high end of sporting events.

The Pixellot system proposes to provide coverage with a one-man-band type of operation. The current business model is not to tackle events like the Super Bowl, but the many, many games of lower leagues, niche sports, lesser-known college games, even high school matches and little leagues. However, the capabilities of AI are accelerating at an exponential rate, and it may not be long before these systems begin covering more mainstream professional matches.

You can be sure that AI will get some trial runs at the 2024 Olympics, coming up quickly in Paris. There are so many contests happening simultaneously, and not all are popular enough to justify round-the-clock coverage, which makes a great use case for such technology. AI voice generators also make it possible to convert text into speech almost instantaneously, in a wide variety of languages and voices. In fact, NBC has announced they’ll be using AI to replicate the voice of top sportscaster Al Michaels to complement their traditional coverage of the games. A tool called “Your Daily Olympic Recap on Peacock” will use AI software to create a 10-minute personalized playlist of event highlights from the previous day, narrated by an AI recreation of Michaels’ voice, which will, they claim, match “his signature expertise and elocution.”
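
NBC’s voice clone is built on proprietary models, but basic text-to-speech is now a commodity. As a simple illustration only, the open-source pyttsx3 library will read a recap script aloud in a generic synthetic voice; it does not clone anyone’s voice.

```python
# Illustrative only: generic text-to-speech with the open-source pyttsx3
# library, not the proprietary voice-cloning behind NBC's Al Michaels tool
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 170)  # speaking rate in words per minute
engine.say("Here is your recap of yesterday's highlights from Paris.")
engine.runAndWait()  # blocks until the audio has finished playing
```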

I’m thankful they didn’t have this kind of technology when I worked on the 1984 Olympics here in Los Angeles. I had the wonderful job of covering New Zealand and Mainland China, which kept me traveling from venue to venue for 28 straight days, to whichever events those two countries were competitive in. It was certainly preferable to being stuck at a single venue on the same shot, which would have quickly become tiresome. It was the best-paying job I had that year, and probably that decade. The Olympics will return to L.A. in four years, and I feel for the up-and-coming camera personnel whose similar job opportunities will surely be diminished by the time 2028 rolls around.

No Cameras Needed Here

Generative AI has also been quickly invading the domain of filmmaking, able to create photorealistic moving images that could put a real dent in the livelihoods of those who sell stock footage or provide aerial and drone services, as well as animators. If you have any doubts, just take a look at some of the user-generated samples on the OpenAI Sora demo page: https://openai.com/index/sora/

Be sure to check out the aerial following of an SUV on a mountain road, the “drone shots” circling an ancient church on the Amalfi Coast, the panoramic of the Big Sur coastline with waves crashing on the rugged rocks, the golden retriever puppies playing in the snow, or the woolly mammoths charging toward the “camera.” These were all created by adding text prompts into Sora, OpenAI’s generative text-to-video model.

There is no need to go out and shoot custom elements; Sora can generate complex scenes with accurate detail in both subject and background. The model also understands how those things exist in the physical world, having been trained with the detail and context needed to create the imagery. The term “script to screen,” which used to connote a sometimes years-long process involving hundreds of artists, can now describe typing prompts into an AI image generator. The above samples are from OpenAI’s Sora, but several competing companies are constantly pushing the AI envelope, including Stability AI, Pika, Runway, and Luma Labs.
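
Sora itself has no public API as of this writing, so as a stand-in, here is a hedged sketch of prompt-to-video generation using an open-source text-to-video model served through the diffusers library; the model choice and prompt are assumptions for illustration.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# ModelScope's open text-to-video model, loaded through diffusers
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16, variant="fp16"
).to("cuda")

prompt = "drone shot circling an ancient church on a rocky coastline"
frames = pipe(prompt, num_inference_steps=25).frames[0]  # video frames
export_to_video(frames, "clip.mp4")
```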

You will not see too many long sequences, or many close-ups of human interaction. It is still challenging for most generative video tools to maintain consistency over a longer sequence, and the models also struggle with anatomical details like hands and faces, but it is amazing what they can already do, and they are only going to improve with time. A short narrative made with Sora, called “Air Head,” could pass for real footage if it didn’t feature a man with a balloon for a face. https://www.youtube.com/watch?v=9oryIMNVtto

New AI Tools Being Used To Enhance Traditional Narrative Techniques

An example of AI working in concert with traditional filmmaking techniques is Here, an upcoming feature directed by Robert Zemeckis, starring Tom Hanks and Robin Wright, and due for release in November. The story revolves around the events on a single spot of land, following its inhabitants from the past into the future. An AI technology known as “Metaphysic Live” is employed to face-swap and de-age the actors in real time as they perform, instead of relying on additional post-production processing. While films such as The Irishman and Indiana Jones and the Dial of Destiny previously used post-production techniques to de-age their characters, it has been an extremely long and expensive process. With AI-generated imagery integrated live on set, the performers can test their acting choices for their characters’ various ages and have that feedback loop with the director. To see the trailer for Here, visit: https://www.youtube.com/watch?v=I_id-SkGU2k
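
Metaphysic Live is proprietary, but the structural difference from post-production work is easy to see in code: the model sits inside the live capture loop rather than in a render farm. In this skeleton, de_age() is a hypothetical placeholder, not Metaphysic’s actual system.

```python
# Skeleton of a live, per-frame processing loop; de_age() is a hypothetical
# stand-in for a real-time face-swap/de-aging model
import cv2

def de_age(frame):
    """Placeholder: a real system would run a neural face model here."""
    return frame

cap = cv2.VideoCapture(0)  # on-set camera feed
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("director's monitor", de_age(frame))  # cast sees results live
    if cv2.waitKey(1) == 27:  # press Esc to stop
        break
cap.release()
cv2.destroyAllWindows()
```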

Of course, I’m only scratching the surface here; so many new iterations and use cases for AI are constantly being developed. If we plan to stay active and relevant in the entertainment industry, we can’t fear new technology or simply sit back and wait for it to usurp our professional roles. Instead, we need to learn to employ AI, as we have with other technological advancements and disruptions in our industry. We need to transition and adapt as we have before: from analog to digital tools, from standard def to HD, then 4K and beyond, from tape to disk and now to the cloud. However, employing new tech doesn’t mean we have to abandon our filmmaking skill set. Whatever the technology, it is still all about storytelling. If we find new ways to tell those stories, we will only add to our filmmaking abilities. And there will always be a place for “organic” content creation and our trusted tools, like capturing on celluloid.

Motion picture technology is an evolutionary process and the rate of change is increasing exponentially. So buckle up and be ready to continue the journey. The Digital Cinema Society will be along for the ride, helping where we can to keep you current on new technology while honoring the tools and techniques that have been developed over more than a century of motion picture production.

Resources:

The term “AIography” has been proposed by veteran Editor and Technologist Lawrence Jordan, ACE. He has created a website dedicated to keeping up with AI technology in the motion picture industry. You can follow along at: https://aiography.beehiiv.com/

Another great resource is Curious Refuge, founded by the husband-and-wife team of AI educators Caleb and Shelby Ward. They offer classes on mastering the latest AI tools and host a collection of user-generated short film samples on the Curious Refuge website: https://curiousrefuge.com/ai-film-gallery

4 Comments

  1. H Jay Margolis

    A very interesting article, Jim. And I hasten to note that our Cine Gear LA interview hit on some of these factors (people can find that interview at DCS). I noted that, in my opinion, the Infinity lenses, being the only ones not based on conventional optical design, may well be “AI ready.” That is, they could open up the “envelope” of one’s creativity beyond what conventional lenses can and have done. Very possibly, they can be considered “an insurance policy” for continued human creativity, whereas the envelope of present conventional lenses has been set for quite some time. They may well be the first lenses to expand the human aspects of cinematic creativity in the age of AI. I can direct people to ts-160system.com for information on these lenses, which are based on microscope principles first enunciated by EM Nelson. That is why we call their configuration Nelsonian®.

    Reply
  2. Lawrence Jordan

    James, what a fantastic piece! You really nailed it with this article. I love how you’ve broken down the AI revolution in our industry without going all doom and gloom. The part where you said, “Whatever the technology, it is still all about storytelling” – that’s really the bottom line.

    As you know, I too am encouraging our fellow industry professionals to do their best to adapt to the changes and embrace finding new ways to tell our stories. We’ve been through tech shakeups before; AI is just the latest remix (although this feels more like an earthquake than a shaker).

    And of course, thanks very much for the shoutout about my AIography newsletter! Greatly appreciate it. Your article is sage wisdom for anyone trying to wrap their head around what’s happening with AI in film. Keep dropping these knowledge bombs, James – we need them!

    Reply
  3. Eric Wenocur

    Yes, yes, okay, there’s upheaval (again). But we are also deep into the hype cycle. I really wish anyone writing about “AI” would make the clear distinction between machine learning, which has been around for a while whether we realize it or not, and generative AI.

    And we should continue to stress that GenAI does not create anything original. What it “knows” is what it is trained on, which is other people’s original work. Apart from questions of ownership, it’s all borrowed and repurposed. Very handy for some things, but also limited. And when GenAI starts learning from its own output it may converge to some kind of lowest common denominator crap.

    The immediate danger for real people with jobs is that the ones paying the bills will try to cut corners using GenAI before finding out the limitations. Hopefully an equilibrium will be reached before younger generations think that AI-generated content is all there is.

    Reply
  4. James Mathers

    Eric Wenocur
    You make some very good points; thank you for sharing your perspective and participating in DCS.

    Reply
