What is AI good for?

Over the last year or so, the world has gone AI-mad, and many people across the video industry are worried that their jobs will be replaced by a computer. Here, I’ll tell you why that — with a few exceptions — probably won’t happen. Along the way, we’ll find out what AI can do today, what its strengths are, and what it’s really not good at.

Defining several kinds of AI

Rather than listing a ton of acronyms, here I’ll focus on the uses you’re likely to find for AI today and in the near future. They are:

  • Generative AI, which creates new content based on existing content or prompts. This encompasses text output from ChatGPT, videos created from prompts by Sora, scripts written without human input, Content-Aware Fill in stills and video in Adobe apps, and artificial voiceovers that imitate humans, to name just a few examples.
  • Recognition AI, which recognizes data or patterns and then presents them in some useful form. This is one of the main tasks chatbots and AI-enhanced search engines perform today.
  • Automation AI, which performs actions on your behalf. Tesla’s (not really) Full Self Driving cars fall into this category, as do the promised future services that may eventually be provided by the Rabbit R1 or Humane AI Pin.

There is of course some crossover between these categories, such as when you ask ChatGPT to make an image for you, or the generative component involved in preparing an English-language summary of a recognition task. Still, the categories are useful for understanding the broad capabilities and limitations of AI.

You may have also heard the term Machine Learning, a more specific term referring to how computers are trained to perform tasks. When a computer is trained to recognize what a person looks like, or how a good paragraph of text should be structured, that process uses machine learning. Since most of the new AI-based tools use machine learning, the terms are used somewhat interchangeably, but I won’t dwell on the distinction here.

Generative AI attracts the most attention, both positive and negative, so let’s start there.

Generative AI

Adobe implemented Generative AI with images a long time ago, and the most obvious related breakthrough was the Content-Aware Fill feature (also found in the Spot Healing brush) back in Photoshop CS5. The ability to simply paint over part of an image and have it filled in automatically with “something that looks right” most of the time was a big deal back then, and has changed the way images are processed. With video-friendly Content-aware fill in After Effects, and equivalent features in other plug-ins and apps, we can quickly accomplish what would have seemed like dark sorcery a decade ago. 

Fast forward to today’s Photoshop, embracing GenAI more fully, and just about anyone can create novel content simply by telling it what they want. Content-aware fill was no less magical on its debut, but it wasn’t nearly so accessible — generative AI is widespread and available from many providers. The next Premiere will include not just Adobe’s own GenAI tech, but links to third-party offerings.

The main reason GenAI has so captivated the public is that it’s approachable, it’s novel, and it enables people who aren’t skilled in creating images to create something — even if it’s not perfect. GenAI is still much better at stills than it is at video, but its weaknesses are exposed more easily when used to create a moving image. Here’s an assessment of the state of the art, and it’s not always pretty.

One massive problem is due to the way these models work: they are predictive. If you train a model by showing it thousands of pictures of dogs, and then ask for a picture of a dog, it’s going to be able to do it. But when you ask for the same dog moving at 24 frames per second, issues around persistence crop right up. The models don’t really understand how the world works; they’re just trying to imitate what they’ve seen before. The more you ask of them, the more they fail. The more you know what an image should look like, the more you see wrong with it.

This is not a photo of a friendly dog

GenAI outputs today are usually superficial, looking OK to the untrained eye but not to anyone paying attention. They feel undirected and empty. People in these models don’t have emotions, or a reason they’re doing what they’re doing. There’s no structure beneath the fakery, and that’s going to be a difficult or impossible problem to fix. The Sora-made diving example that plays first in the embedded post below is proudly shown off, but the diver isn’t moving remotely correctly, and there are no bubbles being released. It’s a poor imitation made by an AI that doesn’t understand reality, and it doesn’t stand up to deeper scrutiny.

[Embedded TikTok post from @openai: videos generated by Sora from prompts including a scuba diver discovering a futuristic shipwreck, a man BASE jumping over Hawaii with his pet macaw, a close-up of a white dragon, a red panda and a toucan strolling through Santorini, and a neon rainforest. OpenAI notes that Sora is not yet available to the public.]

But it’s not all bad. If you focus a GenAI model on a specific task, clearly defining what to change and how to change it, a model can do really well. On the other hand, creating an entire video from a text prompt is a party trick that doesn’t scale well, because an AI doesn’t even really understand what’s going on in a single frame. We can certainly expect a host of AI-generated short films and low-quality ads, but it’s not going to replace actual long-form video.

Leaving aside the capabilities, ethics is another issue which most GenAI creators have put to one side. If you’re training your models on copyrighted material without the express permission of its creators, many people already don’t want to watch the resulting content. And while it’s certainly possible to train a model exclusively on copyright-cleared material (as Adobe have claimed), if you’re putting people out of work, you’ll still put some people offside. Unions have power. Hollywood went on strike for good reason, and there will be a line drawn in the sand somewhere.

Recognition AI

Placed firmly in the middle of what Michael Cioni’s Strada calls “Utility AI” is AI-based transcription, and tasks like this are where AI excels. I’ve used AI-based transcription to re-do captions created by live human captioners, and there’s simply no contest — the AI did a better job in far less time than a human could. A human is definitely still needed, because you can’t tell how someone spells their name just by hearing it, but I can use another AI-based recognition engine (built in on the Mac) to copy text that’s written in a video or a photo.
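
If you want to try this yourself, here’s a minimal sketch using the open-source Whisper model to turn a clip into an SRT caption file. Whisper is simply my pick for the example rather than a tool this workflow depends on, and the file names are placeholders.

```python
# pip install openai-whisper   (also requires ffmpeg on your PATH)
import whisper

def clip_to_srt(media_path: str, srt_path: str) -> None:
    # "base" is quick; larger models ("medium", "large") are slower but more accurate.
    model = whisper.load_model("base")
    result = model.transcribe(media_path)

    def fmt(t: float) -> str:
        # SRT timestamps look like 00:01:23,456
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{int((s % 1) * 1000):03d}"

    with open(srt_path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(result["segments"], start=1):
            f.write(f"{i}\n{fmt(seg['start'])} --> {fmt(seg['end'])}\n{seg['text'].strip()}\n\n")

clip_to_srt("interview.mov", "interview.srt")
```

You’d still proofread the result, of course; names and specialist jargon are exactly where these engines trip up.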

I would love to see more automatic AI-based classification of video content, very much like the “Tanalyze” that Strada is showing off, but most NLEs simply aren’t ready for the time-based metadata that these systems can generate. Final Cut Pro’s keyword system has been ready since 10.0, but the other major players don’t have a good UI to show this. And fair play to Strada — their system does a better job of showing exactly where all the detected keywords are than any NLE.

Color correction and grading with help from AI is possible today using Colourlab.ai, making a complex job easier. Similarly, AI-based people recognition algorithms in plug-ins like Keyper or Resolve’s Magic Mask are making new keying workflows possible. Similar tech is even in modern cameras, helping them to recognize and then auto-focus on people or animals.

Another boring job that few people are paid to do is summarizing, and AIs are great at it. Again, they don’t need to do a perfect job, because no human is going to do that job better, and certainly not in the few seconds that an AI takes. The speed at which an AI can process information and connect dots that humans simply cannot means that this isn’t going away any time soon.
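
As a rough illustration of the kind of summarizing I mean, here’s a minimal sketch that hands a transcript to a hosted model and asks for a summary. The OpenAI client, model name and prompt are assumptions for the example, not a recommendation of any particular service.

```python
# pip install openai   (assumes an OPENAI_API_KEY environment variable is set)
from openai import OpenAI

client = OpenAI()

def summarize(transcript: str) -> str:
    # The model name is a placeholder; swap in whatever you have access to.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize interview transcripts in five short bullet points."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

with open("interview.srt", encoding="utf-8") as f:
    print(summarize(f.read()))
```

Feed it the transcript from the previous step and you get a rough overview in seconds, which you then check, for all the reasons discussed below.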

Accessibility is one place in which nobody’s going to question the utility of AI. If a blind person can now hold up a camera to a scene, and ask an AI model to describe it to them, that’s wonderful. If an AI model provides automatic, accurate, live captioning for a Deaf person, that’s similarly life-changing.

There are many tasks in the post-production space which Recognition AI can help with: color correction, audio cleanup, classification, clip syncing, and more. Many of these are tasks an Assistant Editor might perform, but an AI will make their job easier, allowing them to do more in less time rather than replacing them.

Automation AI

Siri, Alexa and Google Assistant are three examples of some form of AI acting as your assistant. They’re often useful for specific tasks, but because they were built on an older model, their capabilities are limited and they often fail to complete tasks. As the newer chat-based AIs appear to be far more capable, there’s an expectation that we’ll soon have much smarter assistants.

So far, that hasn’t happened. ChatGPT and models like it don’t connect to the internet, don’t have access to your personal data, and won’t do things for you. Partly, this is because Apple and Google hold the keys to your data, and they’re going to be the ones to ask your permission to connect an AI to it. But partly, it’s because this is a hard problem that hasn’t yet been solved.

Lots of promises, not all kept — a common thread in AI

The Rabbit R1 is a device which promised to be able to do many kinds of automation. Without a direct link to your computer, the advertised plan at launch was to use the device’s camera to take a picture of your screen or even a hand-drawn table on a piece of paper, then do something smart with that data, and output a new spreadsheet. None of that works yet. Of course, the process would work so much more smoothly if it was all performed on-device, with local access to your data, but that’s not possible.

In such a system, the potential for faster workflows is huge, but so too is the potential for an AI to make a mistake with your data, or worse, with your money. Apple and others have plans to teach an AI to navigate a UI on your behalf, so we could yet connect these dots, but this is a more difficult problem than many think it is.

Why? Because AIs make mistakes that humans generally don’t.

Imperfection is the common thread

If an AI is imperfect, and they’re all imperfect, you can only trust them on limited tasks. Trusting an AI to plan your holiday would be like trusting a junior travel agent without supervision, and yet “booking your holiday for you” is precisely the kind of example being used to sell AI assistants today. Worse, because AIs don’t learn like humans do, it’s often impossible to find out why a mistake was made, or to stop that mistake being made again. Sometimes these mistakes are trivial, but often they’re more serious.

For example, from time to time I pop into a chatbot to ask it the answer to a question I know the answer to, such as (please excuse the plug) “Who wrote Final Cut Pro Efficient Editing?”. This is a book I wrote in 2020, well before the 2022 knowledge cutoff of many of these models, and this is information easily found by Google. But ChatGPT fails almost every time, confidently presenting nonsense as truth, returning a random collection of other authors, colleagues and strangers as the author of my book. So far, so weird, and though Google’s latest Gemini 1.5 Pro does actually get this right, if you can’t be sure you’re being told the truth, there’s no point asking the question.

This is the right answer, but chatbots have told me that Diana Weynand, Michael Wohl, Jeff Greenberg and others wrote my book

Generative AI for images and videos often creates nonsense, too. The examples posted online are the best of the best, picked by humans from a huge collection of mostly-bad outputs. If you’re a consumer wanting to make a cartoon image of your kid, great — you’ll be able to create a “good enough” option pretty quickly. For concept art and previz, GenAI will probably save you time. But if you need something perfect, photo real, and quite specific, you may never get there.

Worse, while AI has improved a whole lot recently, the pace of change has slowed markedly. It’s become very expensive to train AI models, and as a model grows, you need more and more examples to improve that model. Eventually, this becomes impossible, due to the expense, or simply because it’s too hard to keep teaching new concepts to the same machine learning model. We are reaching hard limits. That’s bad news for a self-driving car, bad news for a client who wants to create a TV series by asking for it, but good news for video professionals. AI can still be useful.

What’s an imperfect AI good for?

Automation AIs that handle boring, repetitive tasks are great, because not everyone can write a macro or a batch script. Generative AIs that can replace unwanted objects in shots are great; this is something we can do already, and AI would just make it faster and better. AIs that can translate text or speech into other languages improve accessibility for most humans. AIs that improve image quality can restore films that most people don’t want to watch any more, or remove glitches from damaged recordings. AIs that can categorise and classify huge data sets can make them far more accessible — and there’s way too much data out there for humans to do the job.

Happily, these things are happening already. I’m using Peakto’s latest AI update to classify hundreds of thousands of images based on their contents, making my family’s life in photos far easier to explore. A host of AI features are turning up in NLEs, including audio transcription, classification and extension, speech enhancement and soon content-aware fill in Premiere Pro. DaVinci Resolve has transcription, magic mask, noise reduction, voice isolation and audio remixing, which can isolate stems in music beds. Final Cut Pro has voice isolation, automatic color correction, noise reduction, object tracking and (soon!) super slow-mo — and of course, all these apps will evolve as they compete with one another over time. Third-party apps like TopazAI and plug-ins from many companies will fill in the gaps, using AI a little or a lot; whatever gets the job done.

Do we have to be worried?

AI will definitely be used to make low-end content, but it can’t do a good job on its own. Flashy effects don’t write stories, and the AIs that do write stories don’t write good ones. In every creative field, we will still need experts to make good content. Consistency over time is important. A clear direction is important. If you need to tell a story about a real person or a real thing for a client, that’s going to be best done by a human, because there’s a huge gulf between creating a 10-second concept and anything longer. Consumers and scummy advertisers will continue to make junk content (AKA slop) with AI, but these are not jobs we were going to get anyway. It’s a whole new low end for the market.

Disruption from progress isn’t new, and while it absolutely happens, it rarely happens in the way people fear it will. Canva templates have made it easier for anyone to create better-looking birthday party invitations, but they haven’t made it any easier to make annual reports, and designers are still employed to create them. 

However, if your job is boring, or repetitive, or you’re making content that people don’t value, then yes, AI could disrupt it. If your job is to create concept art that people won’t see in a finished product, then yes, I would expect the industry to need fewer people to do that job. Pre-viz is a perfect example of where “near enough” probably is “good enough”, but it’s also something more people might do if it was more affordable. More people will be making pre-viz, but they won’t be drawing each frame by hand.

Voice generation sits in a similar place. Real voices are better, because even good AI voices make occasional weird mistakes, but if you use them to fill in a few missing words, nobody will notice, not even Mr Beast. Again, the low end of the market might use artificial voices for entire videos, but they were never going to pay you to help them.

One area that’s at least a brief shining light for generative AI is abstract video art, like music videos. This music video is pretty interesting, but it doesn’t show techniques that would be useful in a less abstract context. Music videos are meant to explore crazy concepts like this, and if you’re trying to make art, yes, AI can help with that. AI is good at remixing, but it’s not particularly smart about how it remixes things. It will lack direction and soul, which you’ll need to add yourself. The same creator used AI as a tool to create this:

Using real shots of real people composited into AI-generated environments, the VFX breakdown shows how AI was useful in pre and post-production, largely to help match the on-set lighting to a pre-planned AI-made shot:

As an art piece, this works. It showcases what AI is good at (concepts and pre-viz) and extends the capabilities of an artist.

Conclusion

Mundane tasks will not be missed, and as they’re one of AI’s strengths, we can expect more from AIs focused on recognition and automation — as just happened at Apple’s WWDC. Some of these newly demonstrated features will probably help with post-production tasks, and while there’s room for innovation on this front, the goal is to assist us, not put us out of work. Change is certain, and some jobs will change a lot.

However, generative AIs won’t continue to develop at their current pace, and clients will not replace us completely. We will keep using AI-based tools, we will get faster at our jobs, and we will gain new capabilities, with and without AI, as it has ever been. I think it’s easy to draw false conclusions based on recent progress, and anyone making good videos should be able to keep making them, as long as people see their value. At the end of the day, AI is just another tool for professionals to use. Wield it wisely.

AI-generated images in this article were created with Photoshop’s generative fill, which Adobe have said was trained ethically.

Photogrammetry for video workflows

The world of 3D is a scary black box for most video professionals. We’re very used to capturing our 3D world in a 2D format, cropping off the edges, then preparing those 2D images for viewing on 2D devices. And though there’s a growing audience for 3D content, and a few ways to start working towards that future if you want to, it’s easy to use 3D objects when delivering to 2D as well.

Even better, you don’t have to understand 3D modelling to incorporate 3D elements with today’s 2D workflows, because photogrammetry — making 3D models from photos or videos of real-world objects — has recently become much, much easier. And happily, you can use the lighting and camera gear you already have to get great results. Photogrammetry doesn’t always produce perfect results, but it can be an excellent starting point, and it’s not difficult.

Here, I’ll give you a quick introduction to photogrammetry with some entirely free apps — follow along if you wish.

What kinds of objects can be scanned?

Photogrammetry is a great approach if you need a relatively realistic model of a specific object. Many common objects (such as balls used in sports) have been modelled already, and you can probably find free or inexpensive models on sites like Sketchfab.

If you need a model of a car, you’ll find plenty of free options online

But if you need a model of a specific object, and especially if that object is unique, photogrammetry may be a good solution. Some techniques work best with smaller objects, and some work best with room-size objects, or indeed whole rooms.

What’s an easy way to get started with my phone?

On iPhone, a simple way to start is with the free 3d Scanner App. Another popular cross-platform option is called Polycam, and there are many others. The front-facing camera on an iPhone has a high-resolution TrueDepth scanner used by Face ID, and you can use that for scans of smaller, more detailed objects. On Pro-class iPhones, there’s also a LiDAR sensor in the rear camera array, which is more suitable for room-scale scans that don’t require as much detail. Any phone can use its rear-facing camera to gather photos, though.

Behind the scenes, Apple has provided APIs to make photogrammetry easier for a few years now, so you’ll have a pretty good experience on iOS. On Android? That’s fine — you can skip straight to the “use a real camera” section below.

How does it work?

In 3d Scanner App, it’s easiest to start by capturing a larger object like a chair or statue, hopefully one without really thin details. If you can find a suitable human volunteer who can stand still, that’ll work well. Find a large space with clear ambient lighting, and place the person or object in the middle. From the menu in the bottom right, choose LiDAR to use the rear LiDAR sensor. Press the big red record button, then walk slowly around the object, keeping the camera pointed at it. The phone’s position, camera and LiDAR sensors will all kick in, and you’ll see the object being recognized as you proceed. Overlays will indicate areas that haven’t yet been scanned, and you can keep moving around until you’ve captured everything.

Here’s a quick 3D scan of the sign outside Apple’s old HQ at 1 Infinite Loop

For smaller, more detailed objects, you can instead use the front-facing TrueDepth camera, which has higher resolution, though it doesn’t reach nearly as far.


One issue: since the scanning camera is facing out from the screen, you may find it challenging to see what you’re doing without appearing in the scan yourself. While scanning your own head this way is pretty easy, you might want to use Screen Mirroring to cast your iPhone’s screen to a nearby Mac to make it easier to see what’s happening.

Just move in the edges of the cropping box (or sphere, or plane) to get rid of the noise around it

Either way, after the scan, you’ll need to process it, which will take a minute or so, entirely on-device. Next, you’ll need to crop off any unwanted mesh details from around the model by moving the edges of a 3D cropping box. When complete, you can preview and then export a 3D model directly from the app.

While using an iPhone app produces results quickly, it’s not great for highly detailed objects, nor for those with thin edges. If you use a “real” camera and take a little longer, you can get better results with a little more work.

How do I use a real camera to do this?

Broadly, you’ll need to take a series of still images or a short video with a camera of your choice, and then process them with an app. While photo-based capture doesn’t include position data as you move your camera around, today’s algorithms are still smart enough to figure out how to construct a model. In many cases, this can produce more detailed results, though of course it takes a little longer to capture the object in the first place.

Flat, consistent lighting is a great idea, and you’ll need to capture images from all sides of your object, potentially by walking around, shooting from high, medium and low angles. If you’re filming a video, use your NLE to edit these orbits together into a single video file. For smaller objects, you may be able to do just two passes — one with the object oriented normally, and one on its side.


For better results with smaller objects, consider shooting in a white box with a motorized turntable to ensure flat, consistent lighting. Now you don’t have to move around or move your camera — just record high-quality video of your object as it spins around, then turn it on its side and repeat. Know that not every object can be scanned like this, and you’ll be sure to get some amusing failures along the way. In particular, this avocado has reflected a little too much light, so the resulting model is a little shiny.

These settings will use 20 frames from each second of video, and give you a high quality USDZ

For the processing, you can use 3d Scanner App for Mac, and since it accepts videos as well as photos, the process can be pretty simple. Note that if you send it video, it won’t use every frame, but will look at up to 20 frames per second. You should use a lower rate than this if you give it real-time footage, but the maximum of 20fps makes sense if you speed up your source clips in your NLE first. Ideally, you should provide 20-200 individual frames, and if you’d prefer to take individual shots rather than shoot video, that’s fine.
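
If you’d rather hand the processing app a folder of stills than a video file, one quick way to sample an orbit video is to pull evenly spaced frames out with ffmpeg. A minimal sketch, assuming ffmpeg is installed; the two-frames-per-second rate and the file names are placeholders you’d tune to land in that 20-200 frame range.

```python
# Requires ffmpeg on your PATH.
import subprocess
from pathlib import Path

def extract_frames(video: str, out_dir: str, fps: float = 2.0) -> None:
    """Pull evenly spaced, high-quality stills from an orbit video."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "ffmpeg", "-i", video,
            "-vf", f"fps={fps}",       # sample at a fixed rate instead of keeping every frame
            "-qscale:v", "2",          # high-quality JPEG output
            f"{out_dir}/frame_%04d.jpg",
        ],
        check=True,
    )

# A 60-second orbit sampled at 2 fps gives 120 stills, comfortably inside the 20-200 range.
extract_frames("orbit.mov", "orbit_frames")
```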

A few of the built-in tips for this process

The app has plenty of tips as to how to achieve the best results, and a few quality options to choose from too. Experiment with Object Masking to try to remove stray background elements from the resulting model; I’ve had mixed results. If it fails, you’ll need to deactivate this feature and clean up any loose mesh data in an app like Blender.

In Blender, manual cleanup is best done in Edit mode, with the Lasso selection tool, and with X-Ray mode on

Other free apps like EasyPhotogrammetry are also available, and many of the free options are built on Apple’s APIs. Processing will take a few minutes; a little longer than an iPhone usually takes, but the results are often better. If you want more control, there’s a world of higher-end options across all platforms.

What file formats should I be using?

This depends entirely on the apps you want to use. While the Alliance for OpenUSD (founded by Pixar, Adobe, Apple, Autodesk and Nvidia, also including Intel, Meta, Sony and others) has endorsed USDZ as the One True Format™ for 3D going forward, After Effects prefers GLTF and OBJ right now. Apple’s apps prefer USDZ, and Resolve accepts many formats. Eventually, USDZ support will be universal, but until then, you’ll have to experiment with workflows to find a straight path between a real-world object and a model in the right format for your apps. I’ve found the best path to a clean USDZ for Apple platforms is to convert from another format (like GLB) with Apple’s free Reality Converter app.

What can you use a 3D model for in video apps?

Animation apps like Motion and After Effects can import 3D models, so you can fly a camera around or through them, and incorporate them into titles or general animations.

Motion makes it trivial to replicate any 3D object, scatter them in a 3D box, and then make them orbit another object

Motion and After Effects have some limits on what you can do with 3D objects, so if you need higher quality or more control, look to a dedicated 3D app. The free Blender is very capable, and can create animations or renders that can be incorporated into your video projects. Any shot you want, with any lens, any lighting, and an animated camera.
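
To give a sense of what scripted rendering looks like, here’s a minimal sketch you could paste into Blender’s Scripting workspace. It simply imports a scanned GLB, adds a camera and a sun lamp, and renders a single still; the paths, camera position and resolution are placeholder assumptions, not a recommended setup.

```python
# Run inside Blender's Scripting workspace (Blender bundles its own Python with the bpy module).
import bpy

# Start from an empty scene, then bring in the scanned model.
bpy.ops.wm.read_factory_settings(use_empty=True)
bpy.ops.import_scene.gltf(filepath="/path/to/scan.glb")

# Add a camera a few metres back, tilted to look toward the origin, plus simple sun lighting.
bpy.ops.object.camera_add(location=(0.0, -3.0, 1.5), rotation=(1.2, 0.0, 0.0))
bpy.context.scene.camera = bpy.context.object
bpy.ops.object.light_add(type='SUN', location=(2.0, -2.0, 4.0))

# Render a single still to disk.
scene = bpy.context.scene
scene.render.resolution_x = 1920
scene.render.resolution_y = 1080
scene.render.filepath = "/path/to/render.png"
bpy.ops.render.render(write_still=True)
```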

This is not a photo — it’s a 3D render of a free 3D model rendered in free 3D software

On the more traditional video editing side of things, while Final Cut Pro and Premiere Pro need plug-ins to accept 3D models, DaVinci Resolve can import them directly. For maximum control and quality, you might want to work with 3D models in a dedicated animation app rather than an NLE.

What else can I do with these models?

Every Apple platform supports USDZ models, and every handheld or face-based Apple device lets you place those USDZ models in your real world using something as simple as the Files app. Augmented reality apps are of course available for other platforms, and all this works on the Vision Pro too. Third-party apps like Simulon promise to integrate models with the real world with fantastic levels of realism.

One of these avocados is not real. Lighting does give it away, but AR can deliver a very convincing effect

3D models can also be integrated with websites, where they provide a live, rotatable 3D object — and this can work on any platform. In fact, Best Buy recently announced a new app for the Apple Vision Pro that incorporates 3D models of their products, and IKEA’s iOS app lets you place their products in your own room with AR today.

Conclusion

Like 360° photos and videos, 3D models aren’t quite mainstream, but they still have definite uses. While the most common use for 360° photos and videos is virtual tours for real estate, anyone selling real-world objects could use 3D models in their webstores. Photogrammetry is of course not the only way to make 3D models, but it’s one of the most accessible for photo and video professionals, and an obvious starting point. If your clients currently engage you for video and photo services, this could be something new you can offer them, and you’ll have fun along the way.

Next? All you need to do is learn Blender, but that’s another story for another time.

SSDs — beyond basic benchmarks

Solid State Drives, or SSDs, have become a crucial part of many professional video workflows, but there’s more to picking an SSD than just running a quick test. I’ve written about this here before, and if you haven’t read “Why your new SSD is slow” yet, give it a quick read first. For predictability and reliability, my focus here is squarely on easy-to-find external SSDs used for video applications. Some of the points here will apply to those purchasing an internal SSD for an editing PC, but not all.

Yet, you don’t have to go with a pre-built mass-market option. If you want the absolute best speeds and know what you’re looking for, feel free to buy something nice from OWC or SanDisk Professional, or even build your own with a TB3/USB4 enclosure. But sadly, Thunderbolt is still a premium option at premium prices, and the extra speed it provides can often be wasted — only the best flash storage can sustain those speeds for a reasonable duration. Also, as much as I love the technology, an empty Thunderbolt enclosure can (in many parts of the world) cost the same as a 2TB USB-C SSD. Today, if you want small, reliable, reasonably priced, manufacturer-tested, portable storage, you’re probably looking at USB-C first, and there are limited options worth recommending. 

Here, I’ll point you in the right direction as to which uses are most demanding on an SSD, and answer the question: What would Apple do? Let’s dig in by starting with those potentially misleading “basic benchmarks”.

Basic benchmarks

The number one tool which most reviewers use to assess an SSD’s speed is, of course, the Blackmagic Disk Speed Test. On Mac, you can grab it from the Mac App Store, but for Windows or Linux, head to the Blackmagic Design support page, search “Desktop”, look for Desktop Video a little down the page, and click the button for your platform. Don’t confuse the Blackmagic Disk Speed Test with the similarly named Blackmagic RAW Speed Test app, which tests CPU and GPU power.

The main strength of this app is that it’s quick, it’s got nice big dials that spin around in a satisfying manner, and it shows absolute numbers for read and write when it’s done. Those numbers are matched up with the data rates for various video codecs and resolutions below, which is handy for anyone working with Blackmagic RAW, ProRes HQ, or HEVC. But because the test keeps repeating, a screenshot records just a moment in time. There’s no definitive result or average reported, and because numbers can fluctuate between runs, it’s easy to gather misleading results.

Here’s a result from the LaCie Mobile Secure 1TB SSD, connected directly to my MacBook Pro M3 Max

Still, if you’re trying to diagnose a simple problem, such as which of your USB-C cables supports fast data connections, the Blackmagic Disk Speed Test is totally fine. It’s also a good way to measure the difference between connecting an SSD directly to a computer and connecting via a Thunderbolt dock, which we’ll touch on later. If you simply want to see how well an SSD will perform when editing video, it’s great.

However, this test only sends small amounts of data at a time (1GB to 5GB). That test cannot measure how well a particular SSD will perform at copying large amounts of data, nor does it tell you how well an SSD might perform when recording direct from a USB-equipped camera. Drives that perform well on the Blackmagic Disk Speed Test can fail horribly in those two other scenarios, and since they’re both common video production tasks, that’s a problem.
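
You can see this behaviour for yourself without any special tools, because a crude sustained-write test is easy to script. Here’s a minimal sketch: the mount path and the 50GB total are assumptions, it needs about 1GB of RAM for the write buffer, and it will temporarily fill that much space on the drive, so point it at a drive you’re happy to abuse.

```python
import os
import time

TARGET = "/Volumes/MySSD/write_test.bin"   # placeholder mount point -- substitute your own drive
CHUNK = 1024**3                            # write in 1 GiB chunks
TOTAL_GB = 50                              # enough to blow past most drives' fast cache

buf = os.urandom(CHUNK)                    # incompressible data, so the controller can't cheat
with open(TARGET, "wb") as f:
    for i in range(TOTAL_GB):
        start = time.perf_counter()
        f.write(buf)
        f.flush()
        os.fsync(f.fileno())               # force the chunk out of the OS cache and onto the drive
        elapsed = time.perf_counter() - start
        print(f"GB {i + 1:3d}: {1 / elapsed:.2f} GB/s")

os.remove(TARGET)
```

Watch the per-gigabyte figure: on a cache-limited drive it falls off a cliff partway through, which is exactly the behaviour described in the next section.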

Video production is not a normal workflow

The common factor between large copying operations and camera-to-SSD recording is that most SSDs cannot deal with large write operations at full speed. Most modern SSDs, especially cheaper ones, are instead optimized for general usage with lots of small read and write operations. Normal people don’t transfer hundreds of GB of footage like we do, and they’ll have a different experience.

Why? Internally, most SSDs have a larger, slower main storage area, and a smaller, much faster cache area. This cache size varies widely, but at the low end it could deliver as little as 1-2 minutes of writing at full speed. On some SSDs, the size of this cache is fixed. On others, some of the slower storage is written to in a faster, but less efficient way, meaning that as the drive fills up, the cache becomes smaller. 

Either way, cache size sets a limit to how much data can be written at full speed, before the drive starts writing directly to the slower main flash storage. The drive then needs time to recover, moving all the data from its fast cache to the slower main storage, before it can take more writes at full speed.

What does all that mean in practice? A copy that’s expected to take 15 minutes could slow down halfway through, and end up taking much longer — not what you need on set with a client over your shoulder. I’ve beaten this drum before, but it’s really important to check more detailed benchmarks like the Sustained Sequential Write tests run by Tom’s Hardware and the Disk-to-Disk Write Bandwidth test run by Anandtech.

Ideally, you want a consistent horizontal line, not one that dips down a long way

One clear example is with the Samsung X5, an SSD with a Thunderbolt interface. While it starts out writing at 1.75GB/s, it can only sustain that write speed for 25 seconds before it slows down to 1.25GB/s. That’s still fast, but well short of the potential of a Thunderbolt interface. Still, many other drives fare far worse on sustained write tests.

Look beyond the basic numbers

Plenty of other things can go wrong when connecting an SSD, and you should be wary of accepting quoted speeds as reality. Every manufacturer quotes the maximum speed of the connection rather than actual real-world throughput speed, and this has been an issue for the entire history of USB devices. But different sizes of the same drive can exhibit entirely different performance characteristics. For example, check out how the 1TB, 2TB and 4TB versions of the Crucial X10 Pro can perform on large writes that include many small files among the larger ones:

Editing creates many small cache files, so transferring an existing job can be slower than the initial large-files-only copy — and speeds vary

While an SSD’s true capabilities are important, it’s also important to realize the limits of your computer. Thunderbolt 4 can handle up to 40Gbps connections (potentially ~4GB/s after accounting for overheads) while USB 3.2 Gen 2 can handle only 10Gbps (~1GB/s). Many PCs don’t have Thunderbolt or USB4, and so there’s not much point in most PC users buying an external Thunderbolt SSD. Conversely, Macs and many PCs don’t support USB 3.2 Gen2x2, which supports 20Gbps connections (~2GB/s) so those drives are largely a waste for many of us. 

Note that some older SSDs can be faster when connected to a Thunderbolt dock rather than connected directly to an Apple Silicon Mac. While I can verify this today with an older Samsung T5 SSD, which shows about 14% higher write speeds and 30% higher read speeds when connected through a Thunderbolt dock, other drives tested show very similar speeds either way. Your mileage may vary, but it looks like Samsung have fixed their older drives’ connection problems on more recent models. Huzzah!

As an example of how tricky assessing drives can be, the LaCie Rugged Mini SSD is one of the fastest USB 3.2 2×2 drives available, until you write 600GB of data in a single operation, and performance falters. You can’t spot this by running Blackmagic Disk Speed Test.

When speed doesn’t matter

To be really clear, an SSD is often not the slowest link in the chain, and therefore not the factor slowing you down. If you’re copying from an SD card, you will be limited to its top read speed, likely much slower than an SSD’s write speed. Speed does count when copying from one fast SSD to another, but you’re not always under time pressure.

Also, while an SSD’s top speed will be tested during the initial write, when it’s used for editing, it just needs to keep up with the data rate of your media. Sure, if you’re working with multiple streams of 8K ProRes, read speed can still be tested, but it’s that initial sustained write where problems are more likely.

When speed really matters

An in-field backup between two SSDs that takes longer than expected can cause delays, and is less than ideal. While on a shoot, I’ve seen a colleague’s Samsung T7 slow down noticeably while halfway through a large copy operation, and it’s not a drive I’d want to use in the field. In my experience, the T7 Shield is good, and the older T5 has a slower top speed but is at least consistent. Still, the base T7 just suffers a slowdown, not data loss.

However, if you use a sub-standard SSD to record directly from a camera, you’re asking for trouble. Although the data rate of most video streams may not seem large enough to cause problems, it’s not the maximum write speed of the SSD that’s the issue, but the minimum write speed, and that’s not a number that’s usually quoted. If an SSD can’t handle the data being thrown down the cable, at any point during a recording operation, frames will be lost, or potentially even an entire clip. Both are (obviously) disastrous.
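
To put those data rates in perspective, converting a codec’s bitrate into the sustained write speed a drive has to hold is simple arithmetic. A quick sketch, using the roughly 737Mbps figure for 4K 25p ProRes HQ that comes up later in this article:

```python
def required_write_speed(bitrate_mbps: float) -> dict:
    """Convert a codec bitrate into rough sustained-write requirements."""
    mb_per_second = bitrate_mbps / 8            # megabits -> megabytes
    return {
        "MB_per_second": round(mb_per_second, 1),
        "GB_per_minute": round(mb_per_second * 60 / 1000, 2),
        "GB_per_hour": round(mb_per_second * 3600 / 1000, 1),
    }

# Roughly 737 Mbps for 4K 25p ProRes HQ:
print(required_write_speed(737))
# ~92 MB/s, ~5.5 GB per minute, ~332 GB per hour
```

Around 92MB/s sounds trivial next to a 1GB/s interface, but it has to be held without a single dip for the entire recording, which is exactly the minimum-write-speed problem described above.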

SSDs to consider for video transfer, storage and editing

There are physical form factor considerations here around personal taste and needs. Are you taking a drive on a walk in the wilderness, or will it live on a DIT (Digital Imaging Technician) cart? Do you need drives in different colors for easy identification? Do you want all SSDs to be the same brand, or to split your loyalties in case there’s a problem in production? Reliability is usually good with most SSDs, but a bad batch of SanDisk Extreme and Extreme Pro SSDs (drives failing) has recently given them a bad reputation. I’ve personally had no problems with my SanDisk Extreme 4TB (bought well before the problems were reported) so I can’t comment if that reputation is deserved or not.

While we haven’t personally tested all these drives, either Tom’s Hardware or Anandtech have, and many drives indeed can accept large file transfers without slowing down too drastically. While this is absolutely not an exhaustive list, you can follow the links on each drive listed to a page where a sustained write graph featuring that drive is found (not necessarily a review of that drive):

Note that some of these drives won’t see their maximum performance on Macs or on many PCs, as they use a rare USB 3.2 Gen 2 2×2 (20Gbps) interface. These can still connect at 10Gbps and top out around 1GB/s, though.

Here are a few drives that can significantly slow down during a large write operation (follow the links to the graphs):

These drives may well get the job done, and they might be fine for editing, but don’t expect consistent speed during a long copy. Also remember: because cache sizes often vary depending on how full the drive is, your mileage can vary a lot. For example, the Sabrent Rocket STRM-Q can write at 1.8GB/s for the first 2TB of data — but only if you write to an empty 8TB SSD.

SSDs to consider for direct recording

Finding out which drives are suitable for large, reliable write operations can be a tricky business, and you’ll have to look harder to find solid answers. It’s striking to discover that a data rate of 1.1Gbps can’t be reliably, consistently written to many drives which can connect at 10Gbps or more — but that’s how it goes. In the world of SSDs, we video people are outliers with extreme needs.

Helpfully, some camera manufacturers will provide a list of SSDs they’ve tested, but these tests are not exhaustive, nor likely to be frequently updated. Here’s Panasonic’s list of drives tested with the GH6 while recording ProRes HQ, which includes only the Samsung T5, Samsung T7 Shield, SanDisk Extreme and SanDisk Extreme Pro. None of these drives are 4TB, but there’s no indication of whether the 4TB variants are unsuitable or simply untested.

Sigma recommend the Samsung T5, SanDisk Extreme Pro, Delkin Juggler cinema SSD, and the LaCie Rugged SSD with their fp camera.

Blackmagic recommend many SSDs, CF cards and more with their cameras, but here are their recommendations for the HyperDeck Extreme 8K, URSA Mini Pro 12K and the URSA Broadcast G2. 

If you’re recording to an Atomos recorder, you’ll be writing to a bare SSD rather than an external SSD, but they’ve done a lot of testing for you. Head here and pick your product at the top.

Today, the most popular camera that now supports SSD recording is one that you may already have in your pocket — the iPhone 15 Pro or Pro Max. While Apple don’t publish a list of recommended SSDs, the only SSD Apple showed as an iPhone-friendly solution at the FCP Creative Summit last year was the LaCie Mobile SSD Secure, and the unique shape is enough to identify it as the same SSD Apple used to record their keynote. You can see this drive throughout their behind-the-scenes video, but at 1:10 there’s a stack of them on a DIT cart:

What Would Apple Do? Check out their BTS.

LaCie were kind enough to send me a model of this drive for testing, and I’ve had good experiences so far. It’s light, it has a small LED to indicate activity, and in a real-world test, it successfully recorded continuous 4K 25fps footage in ProRes HQ from my iPhone, for over 80 minutes, without dropping a frame. The data rate for the file was 686Mbps (just under the expected 737Mbps) but evidently the drive never dipped below that. To find out if you’ve dropped frames, check the Video Info in QuickTime Player after recording, and if it’s lower than you were expecting, dropped frames will be why.
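
If you’d rather script that check than open QuickTime Player, ffprobe can report the same number. A minimal sketch, assuming ffprobe is installed and you know the recording’s nominal frame rate; the file name and threshold are placeholders.

```python
# Requires ffprobe (installed alongside ffmpeg).
import json
import subprocess

def average_fps(path: str) -> float:
    """Return the clip's average frame rate as reported by ffprobe."""
    out = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=avg_frame_rate",
            "-of", "json", path,
        ],
        capture_output=True, text=True, check=True,
    ).stdout
    num, den = json.loads(out)["streams"][0]["avg_frame_rate"].split("/")
    return float(num) / float(den)

fps = average_fps("recording.mov")
print(f"Average frame rate: {fps:.2f}")
if fps < 59.9:   # assuming a nominal 59.94fps recording
    print("Lower than expected, so frames were probably dropped.")
```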

Mounting an iPhone on a magnetic mount above a more traditional phone cage can help with comfort while hand-holding, and if you want to test 60p capture, capture real life or a “live” TV show

To really stress this drive out, I then backed up the data twice, erased the existing clips, then plugged it back in and started recording ProRes HQ 4K 60p footage. Unfortunately it was a hot day here in Australia, and though the drive didn’t fail, the iPhone overheated; after 45 minutes, a couple of frames were dropped, then recording stopped altogether within a few seconds.

As a frame rate, 59.79 is a bit less than 59.94, and indicates dropped frames

A day later, I tried again with air conditioning, and this time I could fill the hard drive with 4K 60p, without dropping frames, for over an hour and a half. (The data rate of 1.31Gbps was, again, lower than expected as it was tripod-mounted.)

This time, 1TB of data and no dropped frames, thanks to the wonder of air conditioning

While I’m going to use my iPhone only as a mobile, floating B-cam, if you want to take iPhone videography further and use one as your A-cam, you may want to rig it up with a hub for power and audio, which I covered here in a Q&A last October. Because not all hubs are equal, here’s a useful video featuring a hub that does it all.

Also, remember that if you use the Blackmagic Camera app (and you probably should) you can choose to use a different codec than ProRes HQ. Here’s an excellent comparison of the compromises that come with using HEVC or a less hungry flavor of ProRes:

Conclusion

SSDs have changed quite a bit over the last few years, but if you make videos professionally, your needs are not the same as a gamer, a regular home user, or a hobbyist filmmaker. Reliability, consistency and predictability matter.

So, if you dump large amounts of footage onto your drives regularly, and you might sometimes perform that copy under some kind of time pressure, be sure to buy a drive that can always keep up with those needs. And if you want to record direct to SSD with any kind of reliability, be sure to buy a drive that’s fit for purpose.

If you have any experiences, good or bad, with any specific SSDs, feel free to share them in the comments, and if you’re a manufacturer, we’d love to test whatever you’d like to send our way. Happy writing!

OOPH! GET THAT CARBON FOOTPRINT OFF MY NECK!


All of us progressive, mindful, climate-change advocates in Film/TV careers know that our industry is one of the most energy-intensive in the world. But you might not know exactly how much energy we gobble up every day in the production of a single hour of TV content. According to the British Academy of Film and Television Arts (BAFTA), around 28,660 pounds of CO2 are emitted in the production of a single hour of TV content. A blockbuster film with a $70m budget produces 2,840 tons of CO2, the equivalent of 11 trips between the earth and the moon (it would take 3,700 acres of forest a year to absorb that much). With productions often filmed across several countries, 51% of these vast CO2 emissions are transport-related.

Think about that for a second.

If your one second is up and you have finished picking yourself up off the floor, be thankful that there are activists among us who figured this out a few years ago, and that some great strides are taking place in the international filmmaking world. Of course, much more needs to be done, but this is a start: some consciousness-raising, and resources for some of the amazing and impactful sustainability initiatives and organizations happening right now.

By following sustainable practices, filmmakers can reduce their environmental impact, save money, and create a better world for future generations. One of the most significant ways to make film production more sustainable is to reduce energy consumption.

It’s happening, baby!


Accelerated by the pandemic, virtual production technologies are reducing and sometimes eliminating travel associated with film and TV productions. Virtual production studios and sound stages, powered by virtual reality (VR) and augmented reality (AR), use LED walls and green screens to bring many locations to a single stage, replacing sets and physical props of all shapes and sizes to help reduce material waste. I recently directed on a Volume Wall, and any concerns I had about jobs being replaced evaporated when I saw that all departments were fully engaged, and we even had the addition of the Pixomondo ‘Brain Bar’ – approximately 12 computer programmers hard at work on a daily basis.

The Netflix smash hit Bridgerton was able to reduce emissions on Season 2 in part by piloting a GeoPura emission-free, green hydrogen power unit in place of multiple diesel generators. Power from the unit was enough to replace several diesel generators, while by-products included clean drinking water and silence. Silence on a film set? Whodathunk it?

All this is great news for eagle-eyed Production Managers, always anxious to save money and deliver on budget.

The blockbuster ‘The Amazing Spider-Man 2’ reportedly saved $400,000 by implementing a recycling program started by the wardrobe department, who sourced textiles from farmers’ markets every week; over 49 tons of materials were recovered from its sets for donation or reuse on future films. All that leftover food we always have? 5,862 meals were donated to local shelters. From biodegradable snow and water-based smoke to eliminating plastic water bottles and using eco-friendly detergents on set, the filmmakers achieved a 52% waste diversion rate from landfills to become the most eco-friendly and financially successful film in the history of Sony Pictures. Kudos!!

In Canada, the CBC has hired a sustainability coordinator to oversee sustainable practices in filmmaking. Using the Albert Calculator (more info below) on all CBC shows is now a requirement.

The national funding source, Telefilm Canada, has updated its production budget model through the lens of sustainability and EDI (equity, diversity and inclusion). The updated budget model stems from consultations with various industry stakeholders, which have resulted in changes to the names of budget item categories, the addition of jobs, and the insertion of lines that serve as reminders for more sustainable and inclusive film shoots.

And there’s more…

Makeovers, property and consumer programming provide a good platform for spreading the word. The BBC’s Shop Well for the Planet, Channel 4’s Grand Designs, and Fremantle’s Naked have been praised for highlighting the use of recycled materials. Eco-friendly behaviour is increasingly being knitted into popular programming. Even Jeremy Clarkson’s softening views about climate change have been shown during Amazon hit Clarkson’s Farm, from Expectation. As The Handmaid’s Tale screenwriter Dorothy Fortenberry has said, “If you’re making a story about anything, in any place, and you don’t have climate change in it, that’s a science-fiction story.”

Drama production has some of the biggest production challenges around emissions. At Carnival Films, head of production Charlotte Ashby has been a leading force behind the development of a Green is Universal production toolkit, which is used across the Universal International Studios group. The toolkit, which includes guidelines, start-up slides and templates, is aligned with the Albert Calculator and shared with the industry through Creative England. The approach helped the team to achieve Albert certification for both season 5 of The Last Kingdom and the new BBC drama Dodger.

What can I do?


Measure Twice, Cut Once

This old saying definitely applies to achieving sustainability in our industry. If you don’t understand exactly how much energy you’re consuming in the multiple layers and departments in the filmmaking process, how the heck can you figure out what to do about it? From research through pre-production, production, and post, it’s necessary to figure out a way to measure the carbon footprint of any production in order to come up with a plan to lower impact and increase sustainability. 

There are several measuring systems out there now, but Albert’s Carbon Calculator, originally developed by the BBC in 2011, has been critical for the TV industry. The next step is for all productions to be Albert-certified. “We’re there to hold the industry to account on science-based targets,” says Katy Tallon, Albert industry sustainability manager. “The certification is a good thing to strive for on the route to net zero.” In 2020, while certified production numbers were up to nearly 500, this was just one-quarter of the number of shows using the calculator.

For more information on Albert:

https://wearealbert.org/production-handbook/production-tools/

In Canada, the National Climate Action Sustainability Committee (NCASC) created a website called DGC Green. DGC Green has produced one of the first Emissions and Generator Usage Calculators to reduce the use of diesel and encourage alternative choices of green energy. The calculator reports Fuel Used, Fuel Costs and Emissions.

More information:

dgcgreen.ca
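
To give a sense of what these calculators are doing under the hood, here’s a toy sketch that converts generator fuel use into cost and CO2. The emission factor of roughly 2.7kg of CO2 per litre of diesel is a commonly cited approximation, and the fuel price and run time are made up for the example; it’s an illustration, not a substitute for the DGC or Albert tools.

```python
# A toy emissions estimate, not a substitute for the DGC or Albert calculators.
DIESEL_KG_CO2_PER_LITRE = 2.7   # commonly cited approximation for burning one litre of diesel
DIESEL_COST_PER_LITRE = 1.60    # placeholder fuel price in local currency

def generator_footprint(litres_per_hour: float, hours: float) -> dict:
    litres = litres_per_hour * hours
    return {
        "fuel_litres": round(litres, 1),
        "fuel_cost": round(litres * DIESEL_COST_PER_LITRE, 2),
        "co2_kg": round(litres * DIESEL_KG_CO2_PER_LITRE, 1),
    }

# Example: one generator burning 20 litres per hour across a 12-hour shooting day.
print(generator_footprint(20, 12))
# roughly 240 litres of fuel, 384 in fuel cost, and 648 kg of CO2
```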

In Europe, the Green Film pre-assessment tool provides a simulation of the score an audiovisual project can achieve on the Green Film checklist and allows a production to predict the economic and environmental costs or benefits.

https://www.green.film/

Energy Use Begone!

One of the best ways we can make film production more sustainable is to reduce energy consumption. Start by switching lighting to energy-efficient equipment and practices. Using LED lights, for example, can reduce energy use by up to 90% compared to traditional incandescent lighting. You can also make use of natural lighting as much as possible: shooting scenes during the day or in locations with large windows reduces the need for artificial lighting. Where appropriate, make the most of our natural assets.


Here’s a video on sustainable practices by Production Manager Mary Anne Waterhouse with the Director’s Guild of Ontario’s Sustainability Committee:

Make Great Choices

The call has gone out and the materials industry is responding to the need to use sustainable materials for set design, costumes, and props.  Plastic and synthetic fabrics are not biodegradable and can have a significant impact on the environment. So, if props are needed, try using sustainable materials, such as bamboo, recycled materials, or natural fabrics like cotton or wool.


Looking for ECO FRIENDLY WALLPAPER? At Spoonflower Inc., all of their wallpapers are made with water-based pigment inks and dyes. Spoonflower offers three different types of eco-friendly wallpaper. Both the pre-pasted and peel and stick options are formaldehyde-, phthalate- and PVC-free and do not contain flame retardants. https://www.spoonflower.com/

Here’s a great list of vendors in Canada and the U.S. who provide alternative sustainable services to the TV/Film industry:

https://dgcgreen.ca/resources/green-vendors-suppliers/

In the U.S., The Good Plastic Company is just one example of a company providing products that are good candidates to replace Sintra, which is used prolifically in sign manufacturing.

The Art Directors Guild in the US also has a list of sustainable vendors, which you can obtain by reaching out to them: https://adg.org/

Reduce, reduce, reduce!

Think about all the waste film production generates: plastic water bottles, food packaging, and materials used for set construction, to name a few. Imagine what would happen if you stopped using disposable items like plastic cups and plates and instead used glass and ceramic ware, or biodegradable and compostable alternatives from companies like Reusables.com. The results are huge.

Film sets go through a ridiculously high amount of water for personal consumption. Though you can’t ask people not to drink water while filming, providing them with reusable or recyclable water bottles can make a significant impact. 

A new company providing a unique recycle/reuse service in the industry is Urbanjacks. Not only do they remove wood products from movie sets (so the wood doesn't get burned, buried, or end up in landfills), they then treat and repurpose it into lumber that is stronger, straighter and more sustainable than virgin stock, and that can be used over and over again on your next project.

Their wood is perfect for a wide range of non-structural projects, from crafting exquisite sets and cabinets to designing intricate wall paneling, DIY crafts, and so on. The possibilities are endless, and remember: practical sets and props are still part of the Volume Wall experience.

For more information: www.theurbanjacks.com

Construction waste is a major contributor to the carbon footprint produced by Art Department and Construction crews. Ontario Green Screen has developed a list of material-sharing resources on its website:

https://digitallibrary.ontariocreates.ca/DigitalLibrary/OGS/LinksandResources.aspx

Habitat for Humanity is happy to take your leftover construction materials (https://www.habitat.org/), and several brands of recycled paint are now available at Home Depot and other suppliers.

Brilliant: the DGC Ontario Sustainability Committee is working on a material exchange for film students who need materials for making their films.

https://thewiderlens.ca/series/members-making-waves/members-making-waves-stacy-morris/

Location, Location, Location!

Where you shoot can impact sustainability in several ways. For example, shooting on location in a city can mean less need for transportation, reducing vehicle emissions. Conversely, shooting in a remote location can carry a significant carbon footprint because crew, equipment, and supplies all have to be transported there. We also suggest limiting the number of locations where possible, so there's less need for transport. Shooting in a controlled environment like a sound stage can also increase sustainability. Do some cost comparisons between location and studio shooting; you might be surprised, depending on the size and scope of your needs.

The DGC Ontario Sustainability Committee has supported Ontario Green Screen in creating a Grid Tie-in Map similar to British Columbia's. Knowing where Ontario locations can tie in to existing energy sources reduces the carbon footprint. The map is currently being upgraded to provide more accurate data.

https://digitallibrary.ontariocreates.ca/DigitalLibrary/OGS/AboutUs.aspx

Fly? Train? Drive? Bicycle? Walk?

Film productions require a lot of transportation, from moving equipment to transporting actors and crew. By using sustainable transportation, like electric or hybrid vehicles, and traveling by train, we can reduce emissions and save on fuel costs. Carpooling is another way to reduce emissions and save money. In some cities, pre-production teams can bicycle to work or use public transit.
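To get a feel for the difference those choices make, here's a rough comparison sketch. The emission factors are illustrative assumptions (real figures vary by vehicle, grid and rail operator), not official numbers:

```python
# Rough per-person commute comparison for a 40 km round trip (illustrative factors only).
FACTORS_KG_CO2E_PER_KM = {
    "solo_car": 0.17,        # assumed average petrol car with one occupant
    "carpool_x4": 0.17 / 4,  # same car shared by four crew members
    "train": 0.035,          # assumed per-passenger rail factor
    "bicycle": 0.0,
}

def daily_commute_kg(mode: str, round_trip_km: float = 40) -> float:
    return FACTORS_KG_CO2E_PER_KM[mode] * round_trip_km

for mode in FACTORS_KG_CO2E_PER_KM:
    print(f"{mode:>10}: {daily_commute_kg(mode):.2f} kg CO2e per person per day")
```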

Education is the Key

In order to keep making strides on this important issue, it's essential to educate the cast and crew about sustainable practices and encourage them to participate in sustainability efforts.

By following sustainable practices, filmmakers, whether part of a social enterprise or not, can reduce their environmental impact, save money, and create a better world for future generations. Heck, I think there should be awards given for the most sustainable departments or productions (we do like our award shows ;))

There’s lots more going on out there than I could cover in this article. Here are links to more information:

https://www.linkedin.com/pulse/sustainability-film-production-inside-job-productions/

https://www.researchgate.net/publication/353652694_Sustainability_and_eco-friendly_movement_in_movie_production

Thanks to:

Astra Burka, Andrew Gainor, and the DGC Canada/Ontario Sustainability Committee, as well as Dawn Snyder at the Art Directors Guild, for their invaluable help and research. Both organizations are constantly updating their lists of sustainable vendors and new ideas.

Finally, I leave you with a link to a video about a straw experiment that Director/Host Auri Jackson created at BuzzFeed. It's amazing how something as small as replacing plastic straws with more sustainable alternatives can have such an impact:

This doesn't have to be painful. It's important to keep the ball moving forward, so stay light, informed, and proactive; that way we can keep growing our industry and keep our jobs sustainable for the long haul.

AI Tools Part 4: The Latest for Video, Film Production, 3D, Audio and more… https://www.provideocoalition.com/ai-tools-part-4/ https://www.provideocoalition.com/ai-tools-part-4/#comments Mon, 28 Aug 2023 15:40:20 +0000 https://www.provideocoalition.com/?p=269852 Read More... from AI Tools Part 4: The Latest for Video, Film Production, 3D, Audio and more…

Here we are at the end of August 2023. So much is happening so fast in Generative AI development that I obviously can't keep up with everything, but I'll do my best to give you the latest updates on the most impressive AI tools and techniques I've had a chance to take a good look at. Starting with this fun example made with ElevenLabs AI TTS and a HeyGen AI animated avatar template:

While the animated avatar is “fun,” it obviously still has a ways to go to serve any real application for video or film purposes, but see the first segment below for a peek into the near future of just how realistic they're going to be!

But let’s focus on that cloned voice recording for a minute! I’ll go into more detail about cloning in the AI Generative Audio segment in this article, but just to show you that this was completely AI generated, here’s a screenshot of that script:

Exciting times for independent media creators, trainers and marketers to be empowered with these tools to generate compelling multimedia content!

I’m not going to add a lot of hypothetical fluff and opinion this time – you can get plenty of that on social media and continue the ethics debates, etc. on your own (including who’s on strike now and how AI is going to take away jobs, etc.). I’m just here to show you some cool AI Tools and their capabilities for prosumer/content creators.

So in case this is your first time reading one of my AI Tools articles, go back in time and read these as well and see just how far we’ve come in only 8 months this year!

AI Tools Part 1: Why We Need Them

AI Tools Part 2: A Deeper Dive

AI Tools Part 3: The Current State of Generative AI Tools

And ALWAYS keep an eye on the UPDATED AI TOOLS: THE LIST YOU NEED NOW!

AI Tools Categories:

Generative AI Text to Video

In this segment we're going to look at one of the leading AI tools that can produce video or animated content simply from text. Yes, there are several that claim to be the best (just ask them), but from what I've seen to date, one stands out above them all. And the best part is that if you have some editing/mograph production skills, you can do much more than just use the standard templates, as I outline below. And it's only going to get much better very soon.


HeyGen (formerly Movio.ai)
https://app.heygen.com/home

I’ve seen several AI text to video generators over the past several months and frankly, most of them seem rather silly, robotic and downright creepy – straight out of the Uncanny Valley. And granted, some of the basic default avatars and templates in HeyGen can take on that same look and feel as well. For now, that is.

I tried a couple of their templates to test the capabilities, and what I found might be okay for some generic marketing and messaging on a customer-facing website for a small business or service; that definitely serves the needs of some businesses on a budget. But I found their TTS engine wasn't all that great, and the avatar's robotic look is only made more distracting by the robotic voice. I then ran the same script through ElevenLabs AI to produce a much clearer, more human-like voice, applied it to the template, and the results are more acceptable.

In this following example, you’ll hear the HeyGen voice on the first pass, then the ElevenLabs synthesized voice on the second pass.

Of course, that was the same process I used in my opening example video at the top of the article, but I used my own cloned voice in ElevenLabs to produce the audio.

To take this production process a step further, I created a short demo for our marketing team at my day gig (a biotech company) and included one extra step at the beginning: using ChatGPT to give me a 30-second VO script on the topic of genome sequencing, written for an academic audience. I then ran through the process, rendering the avatar on a green background and editing it with text/graphics in Premiere.

The result was encouraging for potential use on social media to get quick informative info out to potential customers and educators. The total time from concept to completion was just a few hours – but could be much faster if you used templates in Premiere for commonly-used graphics, music, etc.

But just wait until you see what’s in development now…

The next wave of AI generated short videos is going to be much harder to tell they aren’t real. Check out this teaser from the co-founder and CEO of HeyGen, Joshua Xu, as he demos this technology in this quick video:

So yes, I've applied for the beta and will be sharing my results in another article soon! In the meantime, I was sent this auto-generated, personalized welcome video from the HeyGen team, which serves as yet another great use for these videos. This is starting to get really interesting!

Stay tuned…

Generative AI Outpainting

Since Adobe launched their Photoshop Beta with Firefly AI, there have been a lot of cool experimental projects shared through social media – and some tips are pretty useful, as I’ve shown in previous AI Tools articles – but we’ll look at one particularly useful workflow here. In addition, Midjourney AI has been adding new features to its v5.2 such as panning, zoom out and now vary (region).

Expanding the scene from Lawrence of Arabia with Generative AI

Adobe Photoshop

Adobe Photoshop Beta's AI Generative Fill allows you to extend backgrounds for upscaling, reformatting and set extensions in production for locked-off shots. (You can also use it for pans and dolly shots if you can motion track the “frame” to the footage, but avoid anything with motion out at the edges or parallax in the shot.)

For my examples here, I simply grabbed a single frame from a piece of footage and opened it up in Photoshop Beta, then set the canvas size larger than the original. I then made a selection just inside the live image area and inverted it, so Generative Fill could paint in the outside areas.
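If you'd rather script the prep work outside of Photoshop, the grab-a-frame-and-enlarge-the-canvas step is easy to automate. This is just a sketch of that prep (the generative fill itself still happens in Photoshop Beta); the file names, frame time and target size are arbitrary examples:

```python
# Sketch: pull one frame from a clip with ffmpeg, then pad the canvas so a
# generative fill tool has empty borders to paint into. Paths and sizes are examples.
import subprocess
from PIL import Image

# Grab the frame at the 10-second mark (requires ffmpeg on your PATH).
subprocess.run(
    ["ffmpeg", "-y", "-ss", "10", "-i", "source_clip.mov", "-frames:v", "1", "frame.png"],
    check=True,
)

src = Image.open("frame.png")                      # e.g. 1920x1080
canvas = Image.new("RGB", (2880, 1620), "black")   # 1.5x larger canvas to fill
canvas.paste(src, ((canvas.width - src.width) // 2,
                   (canvas.height - src.height) // 2))
canvas.save("frame_padded.png")  # open this in Photoshop Beta and fill the borders
```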

Photoshop Beta typically produces three variations of the AI-generated content to choose from, so make sure you look at each one to find the one you like best. In this example, I liked the hill the best of the choices offered, but wanted to get rid of the two trees, as they distract from the original shot. That's easy with Generative Fill “inpainting.”

Just like selecting areas around an image frame to Outpaint extensions, you can select objects or areas within the newly-generated image to remove them as well. Just select with a marquee tool and Generate.

The resulting image looks complete. Simply save the generated “frame” as a PNG file and use it in your editor to expand the shot.

I really wish I'd had this technology over the past couple of years while working on a feature film where I needed to create extensions for parts of certain shots. I can only imagine it's going to get better in time – especially once they can produce Generative Fill in After Effects.

Here are a few examples of famous shots from various movies you may recognize, showing the results of generated expansion in 2K.
(For demonstration purposes only. All rights to the film examples are property of the studios that hold rights to them.)


Midjourney (Zoom-out)

In case you missed my full article on this new feature from Midjourney AI, go check it out now and read more on how I created this animation using 20 layers of rendered “outpainted” Zoom-out frames, animated in After Effects.

There are also two new features added since for pan/extend up/down/left/right and Vary (Region) for doing inpainting of rendered elements. I’ll cover those in more detail in an upcoming article.

 

Generative AI 3D

You’re already familiar with traditional 3D modeling, animation and rendering – and we have a great tool to share for that. But we’re also talking about NeRFs here.

What is a NeRF and how does it work?

A neural radiance field (NeRF) is a fully-connected neural network that can generate novel views of complex 3D scenes based on a partial set of 2D images. It is trained to use a rendering loss to reproduce input views of a scene. It works by taking input images representing a scene and interpolating between them to render one complete scene. NeRF is a highly effective way to generate images for synthetic data. (Excerpted from the Datagen website)

A NeRF network is trained to map directly from viewing direction and spatial location (5D input) to opacity and color (4D output), using volume rendering to render new views. NeRF is a computationally-intensive algorithm, and processing of complex scenes can take hours or days. However, new algorithms are available that dramatically improve performance.
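To make the “5D in, 4D out” idea a little more tangible, here's a heavily simplified numpy sketch of the volume-rendering step that turns per-sample densities and colors into a single pixel color along one camera ray. It ignores the network, positional encoding and everything else a real NeRF implementation needs:

```python
# Minimal volume-rendering compositing step along one camera ray (numpy sketch).
# sigma: per-sample density, rgb: per-sample color, deltas: spacing between samples.
import numpy as np

def render_ray(sigma: np.ndarray, rgb: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    alpha = 1.0 - np.exp(-sigma * deltas)                            # opacity of each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))    # light surviving to each sample
    weights = trans * alpha                                          # contribution of each sample
    return (weights[:, None] * rgb).sum(axis=0)                      # final RGB for this ray/pixel

# Toy example: 64 random samples along one ray
n = 64
color = render_ray(np.random.rand(n), np.random.rand(n, 3), np.full(n, 0.05))
print(color)  # one RGB value in [0, 1]
```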

This is a great video that explains it and shows some good examples of how this tech is progressing:


Luma AI
https://lumalabs.ai/

Luma AI uses your camera to capture imagery from all angles around a subject to generate a NeRF render. You can download the iOS app on your iPhone and try it out for yourself.

I tested the app while waiting for the water to boil for my morning cup of coffee in our kitchen and was pleasantly surprised at how quick and easy it was from just an iPhone! Here’s a clip of the different views from the render:

View the NeRF render here and click on the various modes in the bottom right of the screen: https://lumalabs.ai/capture/96368DE7-DF4D-4B4B-87EB-5B85D5BDEA37?mode=lf

It doesn't always work out as planned, depending on the scale of your object, the environment you're capturing in, and things like harsh lighting, reflections, shadows and transparency. Working on a level surface is also more advantageous than trying to capture my car on our steep driveway in front of the house on our farm.

View the NeRF render yourself here to see what I mean – the car actually falls apart and there’s a section where it jumped in position during the mesh generation: https://lumalabs.ai/capture/8AE1BF7D-5368-427F-8CC4-D02B1B887D31?mode=lf

For some really nice results from power users, check out this link to Luma’s Featured Captures page.


Flythroughs (LumaLabs)
https://lumalabs.ai/flythroughs

To simplify generating a NeRF flythrough video, LumaLabs has created a new dedicated app that automates the process. It just came out and I've not had a chance to really test it yet, but you can learn more about the technology and see several results on their website. It's pretty cool and I can only imagine how great it's going to be in the near future!


Video to 3D API (LumaLabs)
https://lumalabs.ai/luma-api

Luma's NeRF and meshing models are now available via its API, giving developers access to the world's best 3D modeling and reconstruction capabilities – at a dollar a scene or object. The API expects video walkthroughs of objects or scenes, looking from the outside in, at 2-3 levels. The output is an interactive 3D scene that can be embedded directly, coarse textured models to build interactions on in traditional 3D pipelines, and pre-rendered 360 images and videos.

For games, ecommerce and VFX – check out the use case examples here.


Spline AI
https://spline.design/ai

So what makes this AI-based online 3D tool so different from any other 3D modeler/stager/designer/animator app? Well, first of all, you don't need any prior 3D experience – anyone can start creating with some very simple prompts and steps. Of course, the more experience you have with other tools – even just Adobe Illustrator or Photoshop – the more intuitive working with it will be.

Oh, and not to mention – it’s FREE and works from any web browser!

You’re not going to be making complex 3D models and textures or anything cinematic with this tool, but for fun, quick and easy animations and interactive scenes and games, literally anyone can create in 3D with this AI tool.

This is a great overview of how Spline works and how you can get up to speed quickly with all of its capabilities, by the folks at School of Motion:

Be sure to check out all the tutorials on their YouTube channel as well – with step-by-step instructions for just about anything you can imagine wanting to do.

Here's a great tutorial for getting started with the Spline tool. It's so intuitive and easy to use:

 

Generative AI Voiceover (TTS)

There are several AI TTS (Text To Speech) generators on the market, and some are built into other video and animation tools, but one stands out above all the rest at the moment – which is why I've focused only on ElevenLabs AI.


ElevenLabs Multilingual v2

It’s amazing how fast this technology is advancing. In less than a year we’ve seen the results of early demos I shared in my AI Tools Part 1 article in January 2023, to now having so much more control over custom voices, cloning, accents and now even multiple languages!

With the announcement this week that Multilingual v2 is out of beta, the supported languages now include Bulgarian, Classical Arabic, Chinese, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hindi, Indonesian, Japanese, Italian, Korean, Malay, Polish, Portuguese, Romanian, Slovak, Spanish, Swedish, Tamil, Turkish and Ukrainian (with many more in development), joining the previously available English, Polish, German, Spanish, French, Italian, Hindi and Portuguese. Check out this demo of voices/languages:

Not only does the Multilingual v2 model provide different language voices, but it also makes your English voices sound much more realistic with emphasis and emotion that is a lot less robotic.

As I’ve mentioned in my previous AI Tools articles, we’re using ElevenLabs AI exclusively for all of our marketing How-to videos in my day gig at the biotech company. What I’m most impressed with is the correct pronunciation of technical and scientific terminology and even most acronyms. I’ve rarely had to phonetically spell out key words or phrases, though changing the synthesized voice parameters can change a lot of inflection and tone from casual to formal. But retakes/edits are a breeze when editing! Besides, some of the AI voices sound more human than some scientists anyway (j/k) 😉
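If you want to script those retakes instead of clicking through the web UI, ElevenLabs also exposes a REST API. The sketch below reflects the v1 text-to-speech endpoint as I understand it at the time of writing; the voice ID, environment variable name and voice settings are placeholders, and parameter names may have changed, so check their current docs:

```python
# Sketch of a text-to-speech request against the ElevenLabs v1 API.
# VOICE_ID, ELEVEN_API_KEY and the voice_settings values are placeholders; verify against current docs.
import os
import requests

VOICE_ID = "your-voice-id-here"  # e.g. a cloned or Voice Library voice
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

resp = requests.post(
    url,
    headers={"xi-api-key": os.environ["ELEVEN_API_KEY"], "accept": "audio/mpeg"},
    json={
        "text": "Welcome to our genome sequencing workflow overview.",
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
)
resp.raise_for_status()

with open("vo_take_01.mp3", "wb") as f:
    f.write(resp.content)  # drop this straight into your edit timeline
```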

Here’s an example of a recent video published using AI for the VO.

 

Cloning for Possible use with ADR?

As you can hear in my opening demo video (well, those that know me and my voice that is), the Cloning feature in ElevenLabs is pretty amazing. Even dangerously so if used without permission. That’s why I’ve opted to NOT include an example from another source in this article, but only to point out the accuracy of the tone and natural phrasing it produces on cloned voices.

For film and video productions, this means you can clone the actor’s voice (with their permission of course) and do dialogue replacement for on-air word censoring, line replacements and even use the actor’s own voice to produce translations for dubbing!

I recorded a series of statements to train the AI and selected a few for this next video. Here is a comparison of my actual recorded voice compared to the cloned voice side-by-side. Can you guess which one is the recorded voice and which one is AI?

You can probably tell that the takes where my pronunciation was a bit sloppy or lazy were my original recorded voice, which makes me think this would be a better way to record my voice for tutorials and VO projects, so I always have a clear, understandable, consistent voice – regardless of my environment or how I'm feeling on any given day.

So to test the different languages, I used Google Translate to translate my short example script from English to various languages that are supported by ElevenLabs, plugged that into Multilingual v2 and was able to give myself several translations using my own cloned voice. So much potential for this AI technology – I sure wish I could actually speak these languages so fluently!

Again – this would be great for localization efforts when you need to translate your training and tutorial videos and still stay in your own voice!

JOIN ME at the AI Creative Summit – Sept 14-15

I’ll be presenting virtually at the AI Creative Summit on September 14-15, produced by FMC in partnership with NABshow NY.

I’ll be going into more detail on some of these technologies and workflows in my virtual sessions along with a stellar cast of my colleagues who are boldly tackling the world of Generative AI production!

Register NOW! Hope to see you there!

AI Tools Part 3: The Current State of Generative AI Tools https://www.provideocoalition.com/ai-tools-part-3/ https://www.provideocoalition.com/ai-tools-part-3/#comments Thu, 29 Jun 2023 04:18:56 +0000 https://www.provideocoalition.com/?p=267326 Read More... from AI Tools Part 3: The Current State of Generative AI Tools

As expected, by mid-year we're much further along with AI tool development (and Generative AI tools in particular) than one could have imagined back in January 2023, when I first launched this series with AI Tools Part 1: Why We Need Them, followed in March by AI Tools Part 2: A Deeper Dive. Generative AI has grown to a point of practical usability in many cases, and it has advanced at a rate where we can more clearly see the path it's heading down in aiding production and post processes for the video, imaging and creative content industries.

But as much as I’m excited about sharing the latest technological updates with our readers this month, I also need to open up this forum to address the elephant in the room: “Is AI going to take my job?” All we have are facts and opinions… and the line that divides them is pretty blurry. Mostly because we can’t really predict the future of AI development as it’s happening so fast. But for now, let’s take a minute and look at where we are – and what AI really is.

I've seen a lot of talk about what AI is, exactly, and why it's called “Artificial Intelligence” if it requires human interaction to make it work correctly.

The above quoted response from Stephen Ford on Quora is probably the most succinct response I’ve seen to this question – fueled with a bit of speculation and sci-fi novel appeal. But in the end, we really don’t know the outcome of what we’re developing right now. You can read his whole response in the link.

Most everyone in the developed world has already been “feeding the machine” for the past several decades, in one form or another. Ever since we started communicating electronically, our clicks, words, images and opinions have been collected, harvested and marketed back to us. At least since the early 90s, everything you buy at the store, that Crapaccino you get at Starbux, the shows you've watched on cable or the Dish network, even the content you shared on AOL and the searches you made in Yahoo – all of it was being collected and used to target messaging back to you in the form of direct mail or other advertising materials. (I used to create much of it for ad agencies back in the day.) The only difference is that, since everyone is now connected to the Internet many times over (the IoT included), it's happening at lightning speed.

And while it might feel like it, none of this has happened overnight. It's just that developers are now taking all this data, applying it to machine learning models and spitting it out in various forms. And yes, the machines are learning faster and producing increasingly accurate results.

But I think it’s equally important to understand HOW all this works, to best answer the “Whys”.

How does Generative AI work?

A good explanation in layman's terms is provided by Pinar Seyhan Demirdag of Seyhan Lee in her brilliant AI course on LinkedIn:
https://www.linkedin.com/learning/what-is-generative-ai/how-generative-ai-works

Pinar Seyhan Demirdag of Seyhan Lee

Generative AI enables users to quickly generate new content based on a variety of inputs. Inputs and outputs to these models can include text, images, sounds, animation, 3D models, or other types of data.

NVIDIA’s website explains how Generative AI models work:

Generative AI models use neural networks to identify the patterns and structures within existing data to generate new and original content.

One of the breakthroughs with generative AI models is the ability to leverage different learning approaches, including unsupervised or semi-supervised learning for training. This has given organizations the ability to more easily and quickly leverage a large amount of unlabeled data to create foundation models. As the name suggests, foundation models can be used as a base for AI systems that can perform multiple tasks.

Examples of foundation models include GPT-3 and Stable Diffusion, which allow users to leverage the power of language. For example, popular applications like ChatGPT, which draws from GPT-3, allow users to generate an essay based on a short text request. On the other hand, Stable Diffusion allows users to generate photorealistic images given a text input.

The three key requirements of a successful generative AI model are:

  1. Quality: Especially for applications that interact directly with users, having high-quality generation outputs is key. For example, in speech generation, poor speech quality is difficult to understand. Similarly, in image generation, the desired outputs should be visually indistinguishable from natural images.
  2. Diversity: A good generative model captures the minority modes in its data distribution without sacrificing generation quality. This helps reduce undesired biases in the learned models.
  3. Speed: Many interactive applications require fast generation, such as real-time image editing to allow use in content creation workflows.

So just to be clear – this isn’t a case of “search/copy/paste” from collected content harvested across the internet. It’s much more complex, and will continue to be.

Regarding the development of diffusion models (Midjourney, DALL-E, Stable Diffusion, etc.), NVIDIA provides this explanation:

  • Diffusion models: Also known as denoising diffusion probabilistic models (DDPMs), diffusion models are generative models that determine vectors in latent space through a two-step process during training. The two steps are forward diffusion and reverse diffusion. The forward diffusion process slowly adds random noise to training data, while the reverse process reverses the noise to reconstruct the data samples. Novel data can be generated by running the reverse denoising process starting from entirely random noise.
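As a tiny illustration of that forward (noising) step, here's what “slowly adds random noise to training data” looks like in code. The beta schedule and array sizes are arbitrary toy choices; a real DDPM trains a network to undo this process:

```python
# Forward diffusion in one line of math: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * noise
# (illustrative only; the linear beta schedule here is an arbitrary toy choice)
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # toy noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # cumulative "signal kept" at each timestep

def add_noise(x0: np.ndarray, t: int, rng=np.random.default_rng()):
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

image = np.random.rand(64, 64, 3)        # stand-in for a training image
slightly_noisy = add_noise(image, t=10)  # mostly signal
mostly_noise = add_noise(image, t=999)   # nearly pure noise
```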

But what about generative text models like ChatGPT?

An explanation in layman's terms from Zapier.com's blog helps it all make sense:

ChatGPT works by attempting to understand your prompt and then spitting out strings of words that it predicts will best answer your question, based on the data it was trained on.

Let’s actually talk about that training. It’s a process where the nascent AI is given some ground rules, and then it’s either put in situations or given loads of data to work through in order to develop its own algorithms.

GPT-3 was trained on roughly 500 billion “tokens,” which allow its language models to more easily assign meaning and predict plausible follow-on text. Many words map to single tokens, though longer or more complex words often break down into multiple tokens. On average, tokens are roughly four characters long. OpenAI has stayed quiet about the inner workings of GPT-4, but we can safely assume it was trained on much the same dataset since it’s even more powerful.

All the tokens came from a massive corpus of data written by humans. That includes books, articles, and other documents across all different topics, styles, and genres—and an unbelievable amount of content scraped from the open internet. Basically, it was allowed to crunch through the sum total of human knowledge.

This humongous dataset was used to form a deep learning neural network—a complex, many-layered, weighted algorithm modeled after the human brain—which allowed ChatGPT to learn patterns and relationships in the text data and tap into the ability to create human-like responses by predicting what text should come next in any given sentence.

Though really, that massively undersells things. ChatGPT doesn’t work on a sentence level—instead, it’s generating text of what words, sentences, and even paragraphs or stanzas could follow. It’s not the predictive text on your phone bluntly guessing the next word; it’s attempting to create fully coherent responses to any prompt. (follow the link to read further)
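If you're curious how your own text breaks into those tokens, OpenAI's tiktoken library will show you. Here's a quick sketch (the "roughly four characters per token" figure is an average, so individual strings vary):

```python
# Count tokens the way OpenAI's models do (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # encoding used by recent OpenAI chat models
text = "Generative AI is changing video production workflows."
tokens = enc.encode(text)

print(len(text), "characters ->", len(tokens), "tokens")
print([enc.decode([t]) for t in tokens])     # see where the token boundaries fall
```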

But the bottom line is that HOW you use ChatGPT, Bard or any other generative text model determines the results you get. In most cases, if you blindly ask them for help or information on a subject like you might with Google – without providing any kind of context or “training” on the topic – you'll get sketchy, incomplete or just plain wrong information.

Here’s a great start to understanding “Prompt Engineering” from the All About AI YouTube channel:

So is AI really going to take my job?

That depends on what you consider your job to be.

Are you doing only ONE TASK in your job, such as creating generic graphics or editing someone else's marketing copy or scripts? Then chances are, eventually, yes. Anyone whose work centers on writing or editing, analysis, basic programming, design, content creation/conceptualization or even V.O. work is already at risk. (Ref: Business Insider, 4 June 2023)

You must adapt, diversify your abilities and embrace the change, or you will be made redundant.

My experiences so far have been quite the opposite – a newfound creative vigor and excitement for what these new tools and technologies have to offer. I've reinvented myself so many times over the past 40+ years of my career (starting as an airbrush illustrator/mechanical engineering draftsman) and changed what I “do” all along the way as the technology changed course. And I'm still looking ahead to see how it's going to change much more before I finally (if ever) decide to retire!

So make sure you’re constantly diversifying your capabilities and get in front of the wave NOW. Don’t wait for the inevitable before you make changes in your career/income stream.

Leave your comments at the end of this article and tell us your thoughts about AI and the industry.

Major AI Tool Updates

I’ve been working on getting you all a deeper dive with some updates on the major players in Generative AI tool development, and this month is NOT disappointing!

Adobe Photoshop (Beta) & Firefly AI Generative Fill

The public beta of Photoshop 2023 was released last month, and I covered some details in a short article on the release, so there might be a little redundancy in this section if you've read that already. But it's worth pointing out what a game changer this is.

Accessible from your Creative Cloud app, download the latest Photoshop (Beta) and start the fun of exploring new capabilities within your own images using the Adobe Firefly Generative Fill.

Here’s more info and be sure to watch the demo video: https://www.adobe.com/products/photoshop/generative-fill.html

AI Remove Tool

I’m really excited about this new Remove Tool that is part of the healing brush palette.

It's pretty straightforward. Apply it as you would the Spot Healing Brush tool over an object you want removed, and presto!

Check out this city scene where I removed the cars and people in less than a minute! It’s quite amazing!
(Click image to see details)

Firefly AI Generative Fill

It’s quite easy to use the Generative Fill – either to “fix” images, place new objects into a scene or expand the boundaries of an existing image. This is also called “outpainting” or “zooming-out” of an image.

When working in marketing, you often need stock images for ads and brochures. Since I work for a biotech company at my day gig, I thought it was appropriate to look for some bioscience images in Adobe Stock, and found one that's a good example of an issue we often run into with stock photos. The image is perfect, except: what if the client wants to run an ad with lots of type down one side, or needs tradeshow booth graphics that fit a specific dimension? Usually we'd generate a fade off to the side or crop the top/bottom to accommodate, and then it's too busy to put text on top.

The solution is really simple now. Expand the canvas size of the image, select that negative space with just a slight overlap onto the image layer, then hit Generative Fill and let it do its magic.

The resulting image is about 95% of the way there – including the extension of the scientist's gloves, arm and lab coat sleeve, as well as the DNA strand that runs vertically down the page. Very little additional work would be needed to smooth it out for use in various marketing materials.

Another example used a photo our team shot a few years ago of a scientist leaning against the railing at one of our HQ buildings on campus. I simply used the Object Selection Tool to mask her out.

I inverted the selection, entered “on a balcony in a modern office building” in the Generative Fill panel, and it produced a remarkable image – complete with reflections of her lab coat in the chrome and glass, and matching light direction and shadows. Literally two clicks and a little text prompt away from the original image.

Working with AI Generated Images

I took a Midjourney image and outpainted the edges to fill an iPhone screen. (You can read more about how the image was initially generated using ChatGPT for the prompt in the Midjourney section below.)

I opened the image in Photoshop (Beta) and increased the Canvas size to the pixel dimensions of my iPhone. I then selected the “blank space” around the image and just let Photoshop do an unprompted Generative Fill and it created the results below:

Original image

Using Photoshop (Beta) Generative Fill to zoom out to the dimensions of the iPhone screen:

Expanded Zoom in v5.2

I was really impressed that it maintained the style of the original AI generated art and embellished on that “fantasy” imagery as it more than doubled in size.

For a good tutorial on using both the Remove Tool and Generative Fill to modify your images in the Photoshop (Beta), check out this video from my friend and colleague, Colin Smith from PhotoshopCAFE:

Midjourney 5.2 (with Zoom Out fill)

Some big changes have come to Midjourney with its latest build, v5.2, and they're worthy of a spotlight here.

With v5.2, the image results are much more photorealistic – perhaps at the sacrifice of the “fantasy art” and super-creative images we've been used to in previous versions. But let's just look at that quality for a moment.

The first is from a year ago, June 2022, in Midjourney, when the quality of faces and hands – and even the size of the rendered results – was much lower; then v4 in February 2023; and then today, in June 2023. All straight out of Midjourney (Discord).
My prompt has remained the same for all three of these images: “Uma Thurman making a sandwich Tarantino style”. (Don’t ask me why – it was just some obscure prompt that I thought would be funny at the time. I can’t recall if I was sober or not.) 😛

Midjourney 6/22 – click image to see full size render
Midjourney 2/23 – click image to see full size render
Midjourney 6/23 – click image to see full size render

For the image I created for my first article in this series, AI Tools Part 1: Why We Need Them, back in January of this year, I utilized a prompt that was generated by ChatGPT.

“As the spaceship hurtled through the vast expanse of space, the crew faced a sudden malfunction that threatened to derail their mission to save humanity from a dying Earth. With limited resources and time running out, they must come together and use their unique skills and expertise to overcome the obstacles and ensure their survival.”

One of several results was the image I used. (I added the text, of course – Midjourney still messes up text horribly.)

Midjourney image from Jan 2023

Using the exact same prompt today in v5.2 gave a very different result:

Midjourney image from June 2023

v5.2 Zoom-out Feature (Outpainting)

One of the most significant updates in v5.2 is the ability to “Zoom Out” of an image. It's similar to how Generative Fill works in Photoshop (Beta), only the process isn't as selective, and you're still guessing at what Midjourney is going to create outside the boundaries of your initial image.

Currently it only works with images generated in v5.2, and I couldn't get it to work with uploaded images. This image was based on another prompt I did months ago, using another image to generate a more stylized result. I got a range of variations and chose this one to upscale, providing the Original pictured below on the left. I then used the Zoom Out 1.5x option to create the image in the center, and Zoom Out 2x from there to generate the image on the right.

Midjourney v5.2 with Zoom Out Generative Fill

Here’s a great tutorial from Olivio Sarikas on how to use MJ v5.2 with the new Zoom-out options:

Taking other images created with v5.2, I used the Zoom Out option to expand the image further and create something completely different. These were a variety of images generated from the original up above. Click on the image to see details.

Another example with the original on the left – and the multiple-stage Zoom out on the right:

It's often surprising where Midjourney decides to take an image, with multiple options to drill down further into variations and additional zooming.


Wonder Studio

Wonder Dynamics has completed the closed private beta of Wonder Studio that we've been playing with for the past few months. It will open its doors to new users on June 29th, and the pricing structure will be announced soon as well.

Wonder Studio is an AI tool that automatically animates, lights and composites CG characters into a live-action scene. Currently you can only choose from pre-made characters, but in the future you'll be able to add your own rigged 3D characters to animate on screen in place of your on-camera actors – all without motion tracking markers and massive roto work.

It can produce a number of outputs for your production pipeline as well, such as a Clean Plate, Alpha Mask, Camera Track, MoCap, and even a Blender 3D file. I could immediately see the character alpha being useful for color work, at a minimum.

The workflow is really simple: upload your video to the Wonder Studio web portal and select an actor to track from your scene. Then select a 3D character from the library and set it to render. Wonder Studio does all the work for you in the cloud.

I used this viral video from @DanielLaBelle on YouTube, Various Ways People Walk, as a test to see how closely Wonder Studio could match his motion. Judge the side-by-side results for yourself. Note that this video took approximately 30-40 minutes to render fully.

You can see some artifacts and blurriness at times where the original actor was removed – or action scenes that didn’t quite capture the original actor completely and they “pop on screen” for a moment, like in this action scene posted by beta tester Solomon Jagwe on YouTube.

It’s still super impressive this was done all in Wonder Studio on a web browser! I can only imagine how this AI technology will improve over time.

ElevenLabs AI

A couple of big updates from ElevenLabs: the AI Speech Classifier and the Voice Library.

AI Speech Classifier: A Step Towards Transparency (from their website)

Today, we are thrilled to introduce our authentication tool: the AI Speech Classifier. This first-of-its-kind verification mechanism lets you upload any audio sample to identify if it contains ElevenLabs AI-generated audio.

The AI Speech Classifier is a critical step forward in our mission to develop efficient tracking for AI-generated media. With today’s launch, we seek to further reinforce our commitment to transparency in the generative media space. Our AI Speech Classifier lets you detect whether an audio clip was created using ElevenLabs. Please upload your sample below. If your file is over 1 minute long, only the first minute will be analysed.

A Proactive Stand against Malicious Use of AI

As creators of AI technologies, we see it as our responsibility to foster education, promote safe use, and ensure transparency in the generative audio space. We want to make sure that these technologies are not only universally accessible, but also secure. With the launch of the AI Speech Classifier, we seek to provide software to supplement our wider educational efforts in the space, like our guide on the safe and legal use of Voice Cloning.

Our goal at ElevenLabs is to produce safe tools that can create remarkable content. We believe that our status as an organization gives us the ability to build and enforce the safeguards which are often lacking in open source models. With today’s launch we also aim to empower businesses and institutions to leverage our research and technology to bolster their respective safeguards.

Community Voice Library

Voice Library is a community space for generating, sharing, and exploring a virtually infinite range of voices. Leveraging our proprietary Voice Design tool, Voice Library brings together a global collection of vocal styles for countless applications.

You can equally browse and use synthetic voices shared by others to uncover possibilities for your own use-cases. Whether you’re crafting an audiobook, designing a video game character, or adding a new dimension to your content, Voice Library offers unbounded potential for discovery. Hear a voice you like? Simply add it to your VoiceLab.

All the voices you find in Voice Library are purely artificial and come with a free commercial use license.

In addition to making your generated voices shareable, they can now be part of the extensive Voice Library on the ElevenLabs site (provided you have a paid account at any level).

Sharing via Voice Library is easy:

  1. Go to VoiceLab
  2. Click the share icon on the voice panel
  3. Toggle enable sharing
  4. Toggle allow discovery in Voice Library

You can disable sharing at any time. When you do, your voice will no longer be visible in Voice Library, but users who already added it to their VoiceLab will keep their access.

Here are just a few examples currently available in the library of hundreds of voices generated and shared:


Okay – I made that last one… and provided the sample text for her to speak, of course! You can now find it in the Voice Library too 😀

AI Tools: The List You Need Now!

Be sure to check in with the ongoing thread AI Tools: The List You Need Now! as I update it regularly to keep it current and relevant.

The HPA Tech Retreat and the Avatar II Deep Dive https://www.provideocoalition.com/the-hpa-tech-retreat-and-the-avatar-ii-deep-dive/ https://www.provideocoalition.com/the-hpa-tech-retreat-and-the-avatar-ii-deep-dive/#comments Thu, 23 Feb 2023 06:23:49 +0000 https://www.provideocoalition.com/?p=263626 Read More... from The HPA Tech Retreat and the Avatar II Deep Dive

In Palm Springs every February, a group of Hollywood industry professionals gets together without the fanfare and noise of a tradeshow. Many of the industry's leading luminaries, along with the techs in the trenches, assemble for the Hollywood Professional Association's Tech Retreat: four days of discussion about where the industry is and where it's going.

Day 1: TR-X

The retreat began Monday with what’s called TR-X (Tech Retreat Extra). It’s a less formal structure with a lot of panels and general discussion around current trends. What really was interesting here were the themes that kept showing up in all the discussions: the dizzying speed of AI/ML encroachment into creative endeavors, the persistence of WFH post-Covid, cloud everything, the challenge of navigating an overly complex global supply chain, the vocational challenge of keeping job skills in line with the speed of technology advancement and replacement turnover, and the struggle to recruit talent in a streaming market desperate for more content.

The conversation traveled in many directions, but here are a few random pieces that struck me as particularly interesting:

Atul Phadris of Vitrina AI is pursuing the lofty goal of using AI to map the global supply chain of the film and television industries. His company's goal is to create a system that dynamically updates supply chain availability in real time. Perhaps an overambitious goal, but he had some interesting facts and stats in presenting the problem he's trying to address. To his point, he quoted a Netflix executive as saying, “A supply chain directory is stale the day after it is launched.” Some interesting takeaways from his talk: the film industry has a $255 billion supply chain; the emergence of globally sourcing companies like Disney, Netflix, and other streamers has caused major destruction of traditional, local supply chain relationships; and the result is a supply chain that's almost impossible to predict, producing chaos in an industry of hard project deadlines. He also emphasized the emergence of localization as a massive new sector in the supply chain. Where localization was previously relegated to basic subtitles and poor overdubs, the advent of AI techniques and the demand for content mean a major increase in business for those repurposing content for other language markets.

Paul Debevec discussed the emerging technology. He focused on the fact that AI lacks creative control. The ability to “art direct” the results of AI effectively will be pivotal to its advancement in the film industry. He also addressed the fact that machine learning-trained data scientists are going to be pivotal members of the film industry community going forward. Finally, he pointed out that existing technologies like the Lightstage are not made obsolete by AI; they provide the data needed to train the machine learning models.

Tuesday Supersession: All Things Avatar

Tuesday’s sessions were entirely devoted to the production of Avatar: the Way of Water, with lengthy discussions by key members of the production team, including a rare shop-talk interview with producer Jon Landau. To date the film has generated over $2.2 billion in revenue.

While few films will approach the “full VFX” production of Avatar, many aspects of the groundbreaking production are already filtering down into standard filmmaking practice. As such, these sessions were an extremely valuable window into the practical workflows involved in bringing the tentpole to screen. Let's take a walk through some of the salient information shared in these sessions.

The first session was a panel interview with key members of the production and postproduction team: Robin Charters, 3D CamSys engineer; Russell Carpenter, ASC, DP; Tashi Trieu, colorist; and Simon Marsh, camera specialist.

Russell Carpenter

The session began with DP Russell Carpenter describing some of the Herculean efforts expended in principal photography. While this seems a little odd for a movie that's almost entirely CG, it turns out there was actually a lot involved in creating a camera system and pipeline that could integrate the live-action sets and actors with the performance capture.

Out of the gate, Russell was asked by the moderator to address the production timeline: pre-production began in 2013 and the film was released in December 2022. He quoted James Cameron as saying something to the effect of, “We don't know how to make it, but we'll know how we made it after we made it.” Russell explained that the film was made like a layer cake, with the second layer being built before the first layer was even done. In fact, the entire movie was essentially in a state of flux throughout production. As a basic timeline: in 2013 the script was being developed simultaneously with the production designers' work on imagery for the scenes and characters. At that point it was decided that the entire arc of the story needed four movies, not two.

A year after concept design was completed, the virtual sets were at a place where Jim Cameron could enter the volume with his virtual camera. This was—in Russell Carpenter’s words—“perfect for Jim as a control freak.” Jim would call attention to specific features of the sets and their positioning, then designers would custom-build the locations based on his feedback.

By 2017 the actors came in for performance capture. Since the camera work occurred separately (thanks to the virtual camera system), the entire focus was on performance. When it came to principal photography, Russell needed to match visuals with the existing virtual production footage generated from the performance capture sessions (stitched together from multiple takes), and with the virtual camera work created by James Cameron. In some cases a technodolly was used to perfectly line up camera moves with the virtual production. A depth camera was used to composite the live-action footage with the previously generated virtual camera work in real time.

A large part of the challenge in cinematography was the development of a camera system that could handle stereoscopic shooting at high resolution while being light enough to be maneuverable by a single operator. This is where Simon Marsh comes into the picture.

Simon Marsh and the Sony Rialto

Simon is the product manager for the Sony Venice and Rialto camera ecosystem. In October of 2017, Marsh met with the Lightstorm team (James Cameron's production company). At the meeting, Lightstorm stated that they loved the Venice camera system but needed 48 frames per second and wanted to separate the sensor block from the body by at least 2 meters. This was essential to get the lightweight maneuverability that Jim needed in a stereoscopic camera rig.

The 48 frames per second was already anticipated and within reach. The idea of separating the sensor block, however, was a whole other feat of engineering. After initial tests they managed to get 18 inches of separation. By NAB in 2018 that had expanded to 9 feet, and by a third meeting with Lightstorm in May of 2018 they had achieved 18 feet of separation on a stereo rig, thanks in part to repeaters along the length of the cable. When you consider just how much raw data needs to be pumped from a camera sensor to its processing hardware, this was no trivial achievement. By September of the same year they had delivered preproduction units.

Russell Carpenter was extremely impressed with the color depth of the camera system; in fact, they were unable to find a field monitor capable of displaying the full dynamic range of the output. What Russell found most impressive was the performance at 2500 ISO – so much so that he decided to shoot the entire picture at 2500 ISO. This allowed the use of what he called surgical lights, more akin to rock and roll stage lighting than conventional feature production lighting. At the high ISO, what would normally have required 100 foot-lamberts only required 20. It enabled him to perfectly match the lighting of the live-action character Spider with his virtual co-stars, right down to the dappling of tree foliage.
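That tracks with basic exposure math. Here's a quick sanity check, assuming a notional 500 ISO starting point (the Venice's other base ISO); the figures in the talk weren't given to this precision, so treat it as an illustration rather than their actual calculation:

```python
# Sanity check: a 5x ISO bump buys roughly the same stops as a 5x drop in light level.
import math

stops_from_iso = math.log2(2500 / 500)     # assumed 500 ISO baseline for comparison
stops_from_light = math.log2(100 / 20)     # 100 fL down to 20 fL

print(f"ISO gain: {stops_from_iso:.2f} stops, light reduction: {stops_from_light:.2f} stops")
# -> both come out to about 2.32 stops
```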

The Spider Challenges

Since Spider is the only main non-CG character, scenes with him were the most significant for live-action cinematography. Initially he was motion captured with his co-stars. Two years later he re-enacted his part to mesh with the final motion performances of the others (those performances being composites of multiple takes carefully edited together).

What turned out to be a significant issue was the fact that Spider actor Jack Champion had grown around a foot between the initial capture and his live action performance. This obviously caused issues with things like eyelines, but it also meant the live action crew had to be extremely accurate when lining their work up with the motion capture content. The speed of Jack's growth also necessitated that all of film three and some of film four in the Avatar series be captured at the same time.

Another production challenge was Spider's face mask. In the final film the glass was added digitally; the curvature of the wrap-around design meant that practical glass or plastic would have reflected all the stage lighting, crew and rigging. The result would have been a massive VFX paint job, hence the decision to add the glass and scene reflections in post.

A tank for all water conditions

A massive all-purpose water tank was built at Manhattan Beach to accommodate all the filming done underwater, including significant amounts of underwater motion capture. The heated tank was several stories high, held close to 900,000 gallons of water, and could create both 10-knot currents and 2-meter waves. Large modified earth movers were used to generate the waves, while a powerful pump generated the currents. The current was necessary to create accurate and plausible motion capture as the actors swam against it.

The water had to be prepared each day to ensure that it was clear enough to shoot through. Ultimately the size of the tank meant that it could be adapted to just about any scene.

One interesting note with respect to virtual production techniques: LED panels were used to play back realistic fire as reflections for the end battle scene. However, it was impossible to get the panels low enough relative to the water surface without creating a dark band at the waterline. (LED electronics and water don't tend to work well together for long, and the combination can lead to electrocution of crew and cast. Not good for insurance rates…) The solution was to bounce the LED light off mirrors placed in the water: great reflections, no electricals in the water.

Tashi Trieu

Next up to the mic was colorist Tashi Trieu, talking about the color workflow to and through Blackmagic Design's DaVinci Resolve. A common theme throughout the retreat was just how heavily Blackmagic Design is used across the industry; here in 2023, Blackmagic gear has become an essential first-tier staple. Both Avatar and The Lord of the Rings: The Rings of Power featured Blackmagic Design hardware in their infrastructure and software in their color pipelines.

Takeaways from Tashi's talk include the attention to detail given to acquisition quality. The effects of different beam splitters and polarizers on the color of the image were evaluated in the stereo rigs to ensure minimal artifacts between the left and right eyes. (Getting two cameras close enough to mimic the interpupillary distance of the average human eye requires a beam splitter.) Robin Charters later explained that new mirrors were actually developed by, and sourced from, an aerospace company.

An important concern for James Cameron in the grading process was that the water "held its volume." A lot of time was spent working on saturation and increasing gamma so that elements didn't feel like they were floating in space, but instead were suspended in substantive ocean water.

Throughout the grading process in New Zealand, Tashi graded in 3D directly on a 38-foot screen with a Dolby Vision laser projector. In fact, 95% of the grading was done in 3D, with perhaps 5% spent grading the 2D version. Tashi noted that despite anticipating fatigue from the lengthy stereoscopic viewing sessions, he experienced none that he could discern. (The audience only sees the movie in 3D in a single viewing session; Tashi as a colorist was viewing it for several hours each day.) He attributes this largely to Jim Cameron's belief that when cutting 3D, the subject of interest should almost always be placed at the screen plane, so that the viewer's eyes focus and converge at the same distance. In typical 3D viewing, the viewer's eyes are often led to converge in front of or behind the screen while still focusing on the screen itself, forcing convergence and focus to different distances. This is something the eyes almost never do in natural viewing, and it is often a source of eye strain.
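
For readers who want the geometry behind that, here's a simplified model (my own illustration, not anything presented at the retreat): the eyes always focus at the screen, because that's where the light actually comes from, while the on-screen parallax between the left- and right-eye images determines where they converge. Zero parallax puts convergence exactly at the screen plane, so focus and convergence agree.

```python
# Simplified stereo-viewing geometry (illustrative only; assumed values).
# For eye separation e, viewing distance D and on-screen parallax p
# (horizontal offset between a point's left- and right-eye images),
# the perceived/convergence distance is Z = e * D / (e - p),
# while accommodation (focus) stays at the screen distance D.
def convergence_distance(parallax_m, eye_sep_m=0.065, screen_dist_m=10.0):
    return eye_sep_m * screen_dist_m / (eye_sep_m - parallax_m)

print(convergence_distance(0.0))    # 10.0  -> zero parallax: converge at the screen plane
print(convergence_distance(0.03))   # ~18.6 -> positive parallax: converge behind the screen
print(convergence_distance(-0.03))  # ~6.8  -> negative parallax: converge in front of it
```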

Tashi also posited that the dynamic range of the Dolby projector may have alleviated eye strain.

As for shot delivery, only three shots in the film weren't visual effects shots, and every shot in the movie arrived at the grade as a preprocessed, debayered EXR.

One interesting side-note: Tashi worked on remastering Avatar 1 at the same time he was grading Avatar 2. He was able (admittedly with a little help from Blackmagic Design) to upgrade the Avatar 1 grading sessions to Resolve 18 from their original version 13 years prior. Thirteen years is an eternity in project file structure, so that in itself is a testament to the Resolve engineers’ efforts toward backwards compatibility.

And a final lesson Tashi learned working with James Cameron: If there’s something that takes 10 seconds, you’d better figure out how to shave 9 seconds off that time before showing it to Jim.

Robin Charters

Robin's official title on the project was "3D Systems Engineer Architect," but it's pretty clear from his talk and the other speakers' deference toward him that he was involved at a deep level in general pipeline implementation. One of the mandates for this film was to buy everything, rent nothing. And if it couldn't be bought, build it in-house. This might sound extravagant given the price of high-end cameras, but due to the duration of production, rental fees would have significantly exceeded the cost of purchasing over the course of the project.

The team created their own assembly space, the "Coral Circle Skunkworks," down the road from the Manhattan Beach studio, complete with everything from CNC mills to custom SoC development boards. They designed and built three custom stereoscopic underwater rigs: a "traditional" rig, one built around a 15mm Nikonos wet lens that allowed for a wet mirror, and a nano camera using Sony sensors mounted in a housing called the Dream Chip. The design requirements were that the camera rigs had to be handheld and come in under 30 lbs, in addition to shooting at 47.952 fps and delivering a dynamic range that would satisfy the final Dolby Vision master. All up, the manufacturing took a team of about 8 people 9 months.
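
If that 47.952 figure looks oddly specific, it is almost certainly the NTSC-compatible fractional version of 48 fps (my inference; the speakers didn't spell it out), in the same family as 23.976 and 29.97:

```python
# Assumption on my part: 47.952 fps is the NTSC-family fractional rate
# 48000/1001, the 48 fps analogue of 23.976 (24000/1001) and 29.97 (30000/1001).
print(48000 / 1001)   # 47.952047952..., which rounds to the quoted 47.952
print(24000 / 1001)   # 23.976023976...
```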

One interesting anecdote with respect to attention to detail: Robin and his team tested several monitors before getting them in front of Jim, and the signal latency of each monitor was of paramount importance. Since depth compositing of the virtual elements over the live action cinematography needed to happen in real time, anything that added latency had to be avoided. This was followed so religiously that the crew would physically turn monitors upside down rather than incur the frame buffer needed to invert the image digitally.

Modified Blackmagic Teranex units were used as the backbone of the signal chain, and Robin noted the great collaboration between Sony and Blackmagic, with Sony even providing a complimentary Venice to Blackmagic's team in Melbourne to help them work with the data pipeline.

In all, 17 Venice camera bodies were used, along with two additional sensors, which allowed a body to be repurposed without having to remove sensors from the rigs. A total of 46 Teranex format converters made their way into the signal chain. USB depth cameras were used, with the USB 3 camera signals delivered over fiber, and Ubiquiti UniFi networking equipment handled the PoE networking.

One unexpectedly crucial element was the quality of the cabling needed to get power and data where they had to go. Improving the cable connections turned out to be the best way to improve the cameras, with cables needing to sustain a 600W power draw over 500-foot runs.

Lastly, Robin’s team had a hard time sourcing 3D preview monitors. The stereoscopic home theatre craze had run its course and it was impossible to find any newly-manufactured stereo monitors. Finally they discovered that a medical division of Sony still made a 3D monitor for surgery that was available for purchase.

Jon Landau

A highlight of the sessions was a video call-in from legendary producer Jon Landau. Jon started by discussing the philosophy behind stereoscopic 3D at 48 frames per second. 3D was chosen for its sense of immersion. It's interesting to note that while Jon appreciates VR technology, he has no interest in making a film for VR. His argument is that in VR the viewer directs their own attention, whereas in filmmaking the audience is fed the point of view of the narrative. Stereoscopic 3D therefore adds immersion without losing the ability to control and direct the audience's attention.

Why 48 frames per second? About 10 years prior to release, they internally ran tests at different frame rates. They shot actors at frame rates up to 60 frames per second and determined that for certain shots, 48 frames per second was ideal. It wasn't necessarily the best for all shots, but it was for action sequences. (During the super session we were actually shown action sequences at both frame rates as a comparison, and there was definitely a jarring difference between 3D action at 48 frames per second versus 24.)

Jon then outlined four key areas where Avatar 2 pushed filmmaking technology:

  • Underwater performance capture
  • Water simulation advancement
  • Facial capture and performance
  • Real-time depth compositing system

In particular he feels that the depth compositing has the broadest application for filmmaking in general. While not every film is going to require underwater capture or facial performance capture, the ability to composite CG elements in real-time over live action cinematography is useful for any production that integrates visual effects.
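
To make the idea concrete, here's a deliberately minimal sketch of what per-pixel depth compositing means in principle. This is my own illustrative example, not Lightstorm's or Weta's actual system, which also has to deal with calibration, noisy depth data, soft edges and holdout mattes: wherever the CG element's depth is closer to camera than the live-action depth reported by the depth camera, the CG pixel wins; otherwise the live-action pixel shows through.

```python
import numpy as np

def depth_composite(live_rgb, live_depth, cg_rgb, cg_depth):
    """Minimal per-pixel depth composite.

    live_rgb, cg_rgb     : HxWx3 float arrays (the two image layers)
    live_depth, cg_depth : HxW float arrays (distance from camera per pixel)

    Wherever the CG element is closer to camera than the live-action
    pixel, show the CG; otherwise keep the live action.
    """
    cg_in_front = cg_depth < live_depth               # HxW boolean mask
    return np.where(cg_in_front[..., None], cg_rgb, live_rgb)
```

The hard part on set isn't this comparison; it's doing it on every frame with clean, low-latency depth data, which is why monitor latency and depth-camera quality came up so often in the other talks.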

In a post-mortem of Avatar 1, depth compositing and facial capture were two of the technologies that emerged as important to improve upon. Working with Weta, the team developed a radically more sophisticated facial capture system. A deep learning algorithm was trained on a library of custom takes of actors delivering lines and emoting, and was then used to drive not just the skin deformation of the virtual characters, but the underlying muscle movements as well.

Obviously water was a huge component of the second movie, and the production team went to extreme lengths to achieve realism. That began with training the actors and crew in “breath hold” diving. Since not even the crew could be “on air” (the bubbles would have affected the capture), the cast and crew all had to learn to stay down for lengthy periods on breath alone. They all became certified divers and even went to Hawaii on a night dive where they encountered giant manta rays—a very “Pandoran” experience.

Weta worked tirelessly on research and development for water simulation. The trickiest effects came when actors breached the water surface: working out how hair, skin, and clothing should transition from full immersion to above water. For many of the water effects they didn't "crack the code" on realism until the very end of production. It often wasn't until looking at final renders that the creative team could identify shots that "didn't feel right," then determine what was lacking or not performing correctly.

Landau also discussed editorial a little, but since we have comprehensive coverage of the editorial process in our interview with Jim Cameron, we’ll eschew the details here.

Cloud computing became a major part of production toward the end. As Weta attempted to render final shots, they actually maxed out the New Zealand power grid in the region where the studio was located, so they had to turn their attention to the cloud. They ended up consuming 3.3 billion core hours of rendering on Amazon Web Services.

Finally, Jon drew attention to the production's efforts toward sustainability, everything from solar panels to craft services. Cast and crew were served vegan meals and given reusable water bottles, along with refilling stations. In general the efforts are commendable, but it does raise the question: just how much energy was burned on those 3.3 billion core hours of cloud rendering, not to mention the energy consumed by Weta locally…

Delivering Avatar

The day rounded out with a discussion of what it took to actually deliver Avatar: The Way of Water. It turns out it took a staggering amount of work. There were 1,065 different builds of the movie: color and contrast were evaluated for different screen brightnesses and aspect ratios to ensure that every local cinema had an optimal viewing experience. By the time all of the various subtitles were added, there were 6,338 total deliverables, with dubs in 28 languages and subtitles in 51. It was a 15-reel movie with a total runtime of 3 hours, 12 minutes, and every permutation and combination of subtitles, dubs, aspect ratio, and output level had to be QC'd.

If that weren't enough, the creatives in New Zealand manually placed each subtitle in screen X, Y, and Z space for the best possible experience, based on where the on-screen action was. They even subtly animated the depth of the subtitles to optimize for audience comfort. I left with a new, deep appreciation for titling, dubbing, and the distribution supply chain in general.

That's it for the first two days' coverage of the HPA Tech Retreat. Stay tuned for day three…

]]>
https://www.provideocoalition.com/the-hpa-tech-retreat-and-the-avatar-ii-deep-dive/feed/ 1
AI Tools: The List You Need Now! (Final Installment 2023) https://www.provideocoalition.com/ai-tools-the-list-you-need-now/ https://www.provideocoalition.com/ai-tools-the-list-you-need-now/#comments Wed, 22 Feb 2023 02:55:44 +0000 https://www.provideocoalition.com/?p=262935 Read More... from AI Tools: The List You Need Now! (Final Installment 2023)

]]>
FINAL UPDATE: December 2023  (most recent updates in RED)

Wrapping up the year's list of AI tools and updates this month. Be sure to look for my final article in the series, AI Tools Part 5: 2023 A Year in Review, to see just how far this technology has come in just 12 months! Going into 2024 we'll continue with feature articles and tutorials under the AI Tools moniker, but I won't be able to keep up this list or the larger industry overview articles, as the technology is developing faster than we can write about it. Literally. We're talking daily updates now, as I predicted!

So I’m going through this list tonight and updating all the latest tools that are still developing. Today. And tomorrow. Until Dec 31st.

*Original article text:
We're just taking a moment to catch our breath a bit in all the AI fury and start compiling a reference list, by category, of the most popular and useful AI tools, portals and apps available to create, modify, enhance or edit your projects and creative content (I'll add in a few that might be just for fun, too). 🙂

I've purposely left off several that are either scammy in their model, not relevant to our industry or just plain garbage. If you know of any you'd like to see added to the list, please comment below and I'll research them further. (Or I may tell you why they're not on the list already.)

Since my first article, AI Tools Part 1, went live, there's been a lot of development in the tools and technology, along with concerns and opinions about the legality, ethics and practical usefulness of these tools in the creative content communities. *UPDATE: Be sure to also check out these in-depth articles on this emerging technology!

AI Tools Part 2: A Deeper Dive

AI Tools Part 3: The Current State of Generative AI Tools

AI Tools Part 4: The Latest for Video, Film Production, 3D, Audio and more…

AI Tools Part 5: The Year in Review

 

So why The List You Need Now?

Before proceeding with the more detailed deep dives into the various tools, I wanted to have one central location that everyone can reference for the latest versions of tools for various uses of AI technology, specifically related to the film, video and imaging production, editing and creative content creation industries. I will try to update this list about once a week until things begin to die down (yeah, right!), but always check the "updated" date below, as I will mark tools that have had major updates even if they were previously listed.

I'm also only adding tools and links that I've personally checked out, passing along whatever info I can, and I'll provide more details in subsequent articles on the various AI tool categories, like image generation, enhancement, video & animation tools, text generation/writing tools, audio and music generation/enhancement tools and more.

NOTE: There are several Apps in the Google Play and Apple App store that carry the names of some of these tools or are just a shell that uses some of the core tools that may or may not work. Proceed at your own risk and costs.

Also note that I’m always learning, researching and testing this stuff as I go. Some information provided will be incomplete or I may use incorrect terminology or explanations of certain technologies. Admittedly, I don’t really know what I’m doing yet! If you see something that needs correction, please let me know and I’ll get it fixed! This is an evolving list of resources and ultimately, we want it to be right.

————

The List

FINAL UPDATE: December 2023 (most recent updates in RED)

Category Menu:

AI Image Generators

AI Image & Video Enhancement Tools

AI Audio Tools

AI TTS (Text To Speech) Tools

AI Text/Scripting Tools

AI Music Generators

AI Video Generators

AI 3D Tools

AI Utilities & Productivity Tools

AI Tool Resources and Articles

————

AI Image Generators

^ main menu

Midjourney AI v5.2 with Zoom Out (outpainting), Upscaling and Vary (Region) (inpainting)
https://midjourney.com/ (Sign up page) uses Discord to access servers. Paid plan gives you more images and a private MJ bot to develop your prompts in. Currently, this is the best way to get into generating AI images online and achieving great results with minimal prompting.

COST: Free trial then $10/30/60 monthly plans for use. You can find out more about pricing plans here.

TOOLS: Online through Discord/Discord Mobile App

DALL-E 2 (Open AI)
https://labs.openai.com/ Popular and fun AI image generator. Also provides outpainting and repainting of generated images and photos.

COST: Free first month (50 credits/200 images) Then 15 credits/60 images per subsequent mo. ($15 buys additional 115 credits/460 images).

TOOLS: Online

Stable Diffusion Playground (Online)
https://stablediffusionweb.com/ A public portal for generating images on the Stable Diffusion engine in a user-friendly (and FREE) website.

(SD is available to run on many private servers and platforms if you’re into doing some coding and are wanting to explore – and we will too when we dig deeper. Get more info from the parent site at Stability.AI) See Resources below for download and installation info and instruction videos.

COST: Free

TOOLS: Online

Adobe Firefly (Online)
https://firefly.adobe.com/ It's what you'd expect from Adobe. It will eventually be integrated into the Creative Cloud suite of tools, to produce images, text, inpainting/outpainting, image enhancements, vector tools and templates and more. The current beta only has limited text-to-image generation and creative type creation. Updated Public Beta – new features like Generative Fill, Recolor and more.

COST: Free Public Beta

TOOLS: Online

BlueWillow
https://www.bluewillow.ai/ From logos, landscapes and characters to digital artwork,  just describe the image you want and our AI image generating tool will create the perfect graphic for your project. It’s all FREE.

COST: Free Beta

TOOLS: Online

Simplified
Simplified.com A complete set of AI marketing tools for graphic design, social media, copywriting and video editing/generation in one online app.

COST: Free demo/Pricing varies across the tools/terms

TOOLS: Online

Ideogram AI
https://ideogram.ai/ an image generator that delivers results in various styles and actually produces pretty good text and clean graphic illustrations

COST: Free

TOOLS: Online

NOWAI (Alpha)
https://nowai.ai/ NOWAI is a community-centered brand in a quest to close the gap between humanity and AI. Access NOWAI from your favorite platforms, enjoying a seamless experience across all of them.

COST: Free trial (Discord)/$8mo (Basic)/$16mo (Pro)/$40mo (Elite) *Annual pricing

TOOLS: Online/Discord/

Disco Diffusion v5.7
https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb A clip-guided diffusion model for generating detailed and abstract images written in Python. The code is free to run on your own server or colab and you can get the resources online here.

Note that DreamStudio Lite beta (below), Stability AI's own hosted portal, runs on Stable Diffusion.

COST: Free

TOOLS: Server/Colab/3rd party tools

DreamStudio Lite (beta)
https://beta.dreamstudio.ai/ Stability AI's online text-to-image generator portal, running on the Stable Diffusion model.

COST: Free (beta)

TOOLS: Online

DeepAI
https://deepai.org/ Online text to image generator, image editor and outpainter, image colorizing.

COST: Free/$5mo (Pro)

TOOLS: Online

OpenArt
https://openart.ai/ A complete range of generative AI tools, including text-image, face swap, animated GIFs, sketch to image, image blender, sticker generator, magic brush, background removal, stock image generator/modifier, image QR codes and more.

COST: Free /$12mo Starter/$24mo Hobbyist/$56mo Pro and Annual plans

TOOLS: Online

Craiyon
https://www.craiyon.com/ Basic online text to image generator and re-imager (ad based). Formerly known as DALL-E Mini.

COST: Free with ads/$5/$20mo plans

TOOLS: Online

Leonardo AI (now featuring Motion)
https://leonardo.ai/ Create production-quality visual assets for your projects with unprecedented quality, speed, and style-consistency. NEW: Motion produces stunning animated scenes from your text prompts.

COST: Free (whitelist)/$10mo Apprentice/$24mo Artisan/$48mo Maestro (annual sub pricing)

TOOLS: Online/Discord

Lexica
https://lexica.art/ AI text to image generator and search engine using Stable Diffusion. Shows results from other creators’ images as well as generates your own.

COST: Free

TOOLS: Online portal

Mage.space
https://www.mage.space/ Online text to image generator portal running on the Stable Diffusion model.

COST: Free/$8 Basic/$15 Pro/$30 Pro Plus

TOOLS: Online

NightCafe Studio
https://creator.nightcafe.studio/ Online text to image generator portal running on multiple diffusion models; Stable Diffusion, DALL-E 2, CLIP-Guided Diffusion, VQGAN+CLIP and Neural Style Transfer are all available on NightCafe. There is also a Discord group here in the NightCafe Lounge.

COST: Free

TOOLS: Online

ShutterStock AI
https://www.shutterstock.com/generate Online text to image generator portal from stock Image provider, ShutterStock. Seems legit and it’s free, but there are some licensing restrictions in the FAQ so please read and understand.

COST: Free

TOOLS: Online

InvokeAI 2.3.0
Watch the YouTube video first. (download installer link in description) Server-based AI text to image generator. InvokeAI now supports the brand new Diffusers Models and now has Model Merge and Textual Inversion Training.

COST: Free (public)

TOOLS: Server/Colab

Stability AI Photoshop Plugin
https://christiancantrell.com/#ai-ml The Stability Photoshop plugin allows you to generate and edit images using both Stable Diffusion and DALL•E 2 right inside Photoshop.

COST: Free

TOOLS: Photoshop Plugin (Adobe Exchange)

Blockade Labs Skybox Lab (with new sketch to image tools)
https://www.blockadelabs.com/ Skybox Lab – text to 360 panorama generator – the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. (Also see the Nadir Patch tool to make your 360 pano work in social media and online.)

COST: Free

TOOLS: Online portal

Bing Image Creator
Announcing the next wave of AI innovation with Microsoft Bing and Edge – The Official Microsoft Blog – We recently announced the integration of Bing Image Creator into the new Bing chat experience making Bing the only search experience with the ability to generate both written and visual content in one place, from within chat. And today, I’m excited to share that we are expanding Image Creator to all languages in Bing – that’s more than 100 languages – so now you can create images in your native language.

COST: Free

TOOLS: Online (Microsoft Edge)

Krea (beta)
https://www.krea.ai/home Amazing real-time text/shape/color to image generator plus image enhancement.

COST: Free (beta)

TOOLS: Online

Playground
https://playgroundai.com/ Pre-generated galleries from various contributing artists with their prompts/negative prompts. Allows you to copy the prompts or regenerate the images to your own liking.

COST: Free

TOOLS: Online

Artbreeder
https://www.artbreeder.com/ Collager: Make a simple collage from shapes and images, describe it with a prompt, and watch as Artbreeder brings it to life. Splicer: lets you create images by mixing them together and editing their genes. Create portraits, landscapes, paintings, and more — then watch as others take your art in exciting new directions.

COST: Free/$7.50mo (Starter)/$16mo (Advanced)/$31mo (Advanced)

TOOLS: Online

Magenta (Google AI)
https://magenta.tensorflow.org Make Music and Art using Machine Learning. An open source research project exploring the role of machine learning as a tool in the creative process. Review the blog, papers and sample demos. You can try the Magenta Studio plugins for Ableton Live or the standalone apps.

COST: Free beta

TOOLS: Desktop/Colab/Plugins/Apps

Flair (beta)
https://flair.ai/ This tool looks quite promising (I’ve applied for the beta). The concept is to take your product images and the AI develops slick environments/settings/lighting and renders finished images for your marketing materials.

COST: Free (Private beta)

TOOLS: Online

Calligrapher AI
https://www.calligrapher.ai/ A very simple AI generated handwriting script designer tool. Some limitations such as kerning, etc. but quick results.

COST: Free

TOOLS: Online

Patterned AI
https://www.patterned.ai/ AI seamless pattern generator for textures, backgrounds and graphics. Create new patterns or explore generated patterns you can modify further. Royalty free with attribution (Free plan).

COST: Free (10 patterns)/$10 (100)

TOOLS: Online

Scribble Diffusion
https://scribblediffusion.com/ I’d have to classify this as more like entertainment than actual useful AI generation, but it can be fun to see how messed-up the results often are! Good for stirring up your creativity at least.

COST: Free

TOOLS: Online

PromptBase
https://promptbase.com/ An extensive library/store of professionally generated prompts for only a few bucks. A great starting point to build on for the looks you want to achieve. While not an actual image generator, it's still a very helpful tool for getting what you REALLY want to create in the Midjourney, DALL-E and Leonardo AI generators listed above.

COST: Varies between the prompts

TOOLS: ONLINE Resource

————

AI Image & Video Enhancement Tools

^ main menu

Remini Web
https://app.remini.ai/ AI Photo and Video enhancer. Absolutely the best results for photo restoration on faces. I’ve yet to try the newest video enhancement options, but watch for those updates in my subsequent articles. New enhancements, options for background treatments and now works with Video.

COST: $18/mo (Personal)/$25/mo (Business) and more plans

TOOLS: Online

Magnific
https://magnific.ai/ AI Photo upscaler and enhancer. Advanced AI tech to achieve insanely high-res upscaling. Not only upscale, but enhance. Magnific can hallucinate and reimagine as many details as you wish guided by your own prompt and parameters.

COST: $39mo (Pro)/$99mo (Premium)/$299mo Enterprise

TOOLS: Online

Runway
https://runwayml.com/ Dozens of AI-powered creative tools to help you ideate, generate and edit content like never before. With new tools being added every week. There’s really nothing else like it – which is why it’s featured in two separate categories.

COST: Free/$12mo (Pro)/$28 (Team) and custom Enterprise plans

TOOLS: Online

Adobe Photoshop 2024
https://helpx.adobe.com/photoshop/using/generative-fill.html New AI tools inside of Photoshop such as updated Neural Filters, Intelligent Refine Edge, Generative Fill & Expand, AI Remove Tool, Sky Replacement and more.

COST: Beta available through your Adobe Creative Cloud portal

TOOLS: Desktop beta

DaVinci Resolve Studio 18
https://www.blackmagicdesign.com/products/davinciresolve A major update adds new AI-based tools, new Resolve FX, text-based editing, speech-to-text and audio classification, new Cut page menus and Fusion USD support for faster workflows.

COST: Free (DaVinci Resolve 18)/$295 (DaVinci Resolve Studio 18)

TOOLS: Desktop App

Picture This (Intelligent Assistance)
https://intelligentassistance.com/picture-this-pp.html Call out stunning AI text-to-image artwork from Red Sequence  Markers in your Premiere Pro Sequences. Picture This… takes those markers and generates multiple image options for each prompt, and adds the selections directly to a new Sequence, with the image options on higher tracks, for Premiere Pro and FCP.

COST: $24.99/Mac App Store

TOOLS: MacOS

Simplified
Simplified.com A complete set of AI marketing tools for graphic design, social media, copywriting and video editing/generation in one online app.

COST: Free demo/Pricing varies across the tools/terms

TOOLS: Online

Latte
https://www.latte.social/ Uses AI to select the most engaging parts of your video, then adds subtitles and reformats into vertical, saving you hours of editing and thousands in editor costs. (This one belongs in two categories as well, along with AI Video Generators below.)

COST: Free trial/£10mo (Starter)/£40mo (Grow)/£90mo (Pro)

TOOLS: Online/iOS Mobile

AImages
https://aimages.ai/ Online AI video enhancement/clean/color and upres tool.

COST: Free 720p previews/€11mo (Standard)/€25mo (Premium)/€61mo (Elite) plans

TOOLS: Online

Deep Dream Generator
https://deepdreamgenerator.com/ Our AI image generator makes it easy to turn your photos or text into mesmerizing masterpieces in a variety of styles. With our advanced neural network, you’ll get stunning results instantly that keep getting better and better. Plus, our user-friendly interface is perfect for artists of all levels.

COST: Free trial/$19mo (Advanced)/$39mo (Pro)/$99mo (Ultra)

TOOLS: Online

Cleanup.pictures
https://cleanup.pictures/ It does one thing really well for such a simple tool – removes objects from photos. Using only a paintbrush tool (similar to the heal brush in Photoshop) it fills-in with AI to eliminate the selected subjects.

COST: Free (up to 720p)/$3mo/$7mo plans

TOOLS: Online/Mobile Apps (ClipDropPro)

Palette
https://palette.fm/ AI colorize B&W photos with various filters and/or textual input to determine the final look and feel of the colorized image.

COST: Free 500px/$6mo-$28mo HD plans

TOOLS: Online

Topaz Photo/Video AI
https://www.topazlabs.com/ A suite of AI desktop apps for upscaling, noise removal, sharpening and restoration of details and faces. There's also a Video AI app that works great.

COST: $199 for the new Topaz Autopilot app. Others listed by product on website.

TOOLS: Desktop Apps

Vidyo.ai
https://vidyo.ai/ Edits down long videos into short-form videos using AI.

COST: Free/$30mo (Pro)/$50 (Pro+)

TOOLS: Online

ColorGPT.app
https://chrome.google.com/webstore/detail/colorgpt/kmfgbheojkabgbebnimgibjkdpkenhhm

Generate, preview and copy beautiful colour palettes with the help of ChatGPT

COST: Free

TOOLS: Online

————

AI Audio Tools

^ main menu

Adobe Podcast
https://podcast.adobe.com/ several AI powered tools to create high-quality podcasts and professional voiceovers.

COST: Free

TOOLS: Online

Cleanvoice
https://cleanvoice.ai/ AI recorded-speech scrubber that removes unwanted pauses, filler words like uhh/umm, and mouth sounds.

COST: Free trial/subscriptions and pay-as-you-go plans

TOOLS: Online

Krisp
https://krisp.ai/ Real time AI noise cancelling, echo cancellation, HD voice quality (great for recording Zoom interviews and podcasts).

COST: Free demo/$8mo (Pro)/$10mo (Business)

TOOLS: Online

Podcastle
https://podcastle.ai/ Studio-quality AI-enhanced podcast recording/editing/sharing. Also has TTS & voice synthesis.

COST: Free basic/$12mo (Storyteller)/$24mo (Pro)

TOOLS: Online

FineVoice
https://www.fineshare.com/finevoice/ FineVoice is an AI-powered digital voice assistant that comes with a range of features, including a real-time voice changer, unlimited audio & sound effects, and a studio-quality voice recorder. FineVoice utilizes advanced speech processing algorithms to help you enhance and change your voice in real time, generate lifelike voices, and transcribe recordings.

COST: Free trial/$10mo/$40yr/$70 Lifetime

TOOLS: Online/Mobile

Voicemod
https://www.voicemod.net/ Voicemod adds real-time voice changing and custom sound effects to games and desktop communication apps. Several apps/modules offer Voice Changer, Soundboard SFX, Text to Song and Tuna, a sound library.

COST: Free

TOOLS: Online/Desktop (Win)/Mobile

Vocal Remover & Isolation
https://vocalremover.org Separate voice from music out of a song free with powerful AI algorithms (It REALLY WORKS!) – Also, more tools for changing pitch and tempo of the song by adjusting musical key and bpm sliders, audio editing (cut & join), BPM & song key finder, and a Karaoke recorder using your mic.

COST: Free

TOOLS: Online

Moises.ai
Moises.ai Take recorded music and remove vocals and key instruments, change timing and key, all using AI in their web portal, desktop apps and mobile apps. Great for musicians trying to learn a part or practice with the band! Use their AI tools to make your voice sound great, plus mastering capabilities and lyric-writing assistance. Plugins for your DAW are coming soon.

COST: Free trial

TOOLS: Online/Desktop (MacOS-PC)/Mobile Apps

Seamless Communication Translation Demo (Meta AI)
https://seamless.metademolab.com/ Uses your microphone to record and translate text in both written and audio formats. This is a research demo supporting translation of nearly 100 input languages and 35 output languages.

COST: Free

TOOLS: Online

————

AI TTS (Text To Speech) Tools

^ main menu

ElevenLabs AI with Multilingual v2
https://beta.elevenlabs.io/ The best AI TTS tool out there with voice cloning and speech modifications & synthesis. Now available in several languages, Speech Classifier & synthesized Voice Library PLUS: New Diverse Default Voices and Professional Voice Cloning unlocked for Creator plans. Multilingual v2 comes with enhanced conversational capability, higher output quality, and the ability to better preserve each speaker’s unique voice characteristics across all languages. Supported languages: Bulgarian, Classical Arabic, Chinese, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hindi, Indonesian, Japanese, Italian, Korean, Malay, Polish, Portuguese, Romanian, Slovak, Spanish, Swedish, Tamil, Turkish & Ukrainian!

COST: Free/$5mo-$22mo plans

TOOLS: Online

WellSaid
https://wellsaidlabs.com/ Realistic sounding AI voices in a natural delivery for commercial, marketing and training productions.

COST: Free trial/$49mo-$199mo + Enterprise plans

TOOLS: Online

Murf.AI
https://murf.ai/ AI TTS tool with 100s of voices, styles, speed and editing options. Preview with your video.

COST: Free trial/$19mo (Basic)/$26mo (Pro)/$59mo (Enterprise)

TOOLS: Online

Play.HT
https://play.ht/ AI TTS tool with 100s of voices, styles, Voice Cloning and more. Emotions can be applied to any voice script on playback.

COST: Free trial/$32mo (Creator/Annual)/$80mo (Unlimited/Annual)/Enterprise (Custom)

TOOLS: Online

Revoicer
https://revoicer.com/ Emotion based natural human sounding AI TTS with controls of delivery and speed. Various languages and voices available.

COST: Free trial/$49mo-$199mo + Enterprise plans

TOOLS: Online

Paradiso AI
https://www.paradiso.ai/ With Paradiso AI Media Studio you can easily convert text to speech, transcribe speech to text, record your screen, translate text into multiple languages, and even create AI videos with human-like avatars.

COST: Free trial/$25mo-$99mo plans/$250-$990 annual plans

TOOLS: Online

————

AI Text & Scripting Tools

^ main menu

ChatGPT
https://chat.openai.com/chat A natural language processing AI tool that responds to your questions or commands and delivers written content in the form of explanations/answers, writing letters/email, essays/white papers, stories and scripts and even code.

COST: Free basic/$20mo (ChatGPT Plus)

TOOLS: Online

Jasper AI
https://www.jasper.ai/ A generative AI platform for creative writing, ads, scripts, letters – all in stylized and human derived written form.

COST: Free trial/$40-$100 (usage plans) plus Custom plan for business

TOOLS: Online

Google Bard
https://bard.google.com/ Similar to ChatGPT, a natural language processing AI tool that responds to your questions or commands and delivers written content in the form of explanations/answers, writing letters/email, essays/white papers, stories and scripts and even code.

COST: Free

TOOLS: Online

Copy.AI
https://www.copy.ai/ A generative AI platform for creative writing, ads, scripts, letters – all in stylized and human derived written form.

COST: Free trial/Customizable usage rate plans

TOOLS: Online

Quillbot
https://quillbot.com/ AI Paraphrasing Tool for shortening long text blocks or articles/papers and stories.

COST: Free trial/$20mo or $140 yr (Premium)

TOOLS: Online

Writesonic
https://writesonic.com/ Creative writing AI tool – from generating ideas to refining your thoughts/stories or writing articles.

COST: Free trial/$13-$600mo (sliding scale based on content) – Plus custom plan option

TOOLS: Online

Moonbeam
https://www.gomoonbeam.com/ AI long-form writing assistant for essays, blog posts, articles and other long-form content.

COST: Free trial/$15mo (Pro)/$50mo ea seat (Team)

TOOLS: Online

Thundercontent/Thunderchat 
https://thundercontent.com/ AI Writing & Text chat tools and AI audio tool options.

COST: Free trial/$39mo-$63mo and Annual plans

TOOLS: Online

————

AI Music Generators

^ main menu

Musicfy
https://musicfy.lol/ Use AI to create music with your voice, leveraging the latest in AI technology to supercharge your music.

COST: Free trial/$10mo (Starter)/$25mo (Professional)

TOOLS: Online

Suno.ai
Suno.ai This is the first Generative AI music creator I’ve seen to date that actually sounds like something! Type in a text prompt that includes the style and the topic and wait to be impressed!

COST: Free trial/$8mo (Pro)/$24mo (Premier) plus Enterprise

TOOLS: Online

Beatoven
https://www.beatoven.ai/ Royalty-free AI music track generator.

COST: Free basic/$20mo (Pro)

TOOLS: Online

Epidemic Sound
https://www.epidemicsound.com/ Royalty-free AI music track generator with unlimited access to our music and sound effects catalog for your videos, streams and podcasts. License comes with all necessary rights included.

COST: Free trial/$9mo (Personal)/$19mo (Commercial) plus Enterprise plans

TOOLS: Online

Soundraw
https://soundraw.io/ Royalty-free AI music track generator.

COST: Free trial/$17mo (Personal)

TOOLS: Online

SmartSound
https://www.smartsound.com/ Multilayer customizable music track generator.

COST: Pay per library/song (several packs/plans available)

TOOLS: Online/Desktop App

Aiva
https://aiva.ai/ Royalty-free AI music track generator.

COST: Free trial/€15mo (Standard)/€49mo (Pro) + annual plans

TOOLS: Online/Desktop App

Lyrical Labs
https://lyricallabs.io/ Write song lyrics with AI prompts in the style of song and artist.

COST: Free trial/$10mo (Hobbyist/$30mo (Songwriter)/$100mo (Pro)

TOOLS: Online

Jukebox (OpenAI)
https://openai.com/research/jukebox A neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. We’re releasing the model weights and code, along with a tool to explore the generated samples.

COST: Free (Exploration)

TOOLS: Online

MusicFlow
https://www.trymusicflow.com/ With musicflow, you don’t need to be a musician or producer. All you need is imagination. Simply type in your text prompt, and let our advanced AI transform it into a symphony, a rocking beat, or the next chart-topping melody.

COST: $49mo (Creator)/$99mo (Pro)

TOOLS: Online

SoundStripe
https://app.soundstripe.com/ AI Music, Video and SFX Search engine.

COST: $9.99mo (Creator)/$19.99mo (Pro) – Annual plan pricing (Enterprise avail)

TOOLS: Online

The Orb & David Gilmour (experimental)
https://metallicspheres.io/ Collaborate with AI & other fans to create unique album art and remix tracks based on Metallic Spheres in Colour from The Orb & David Gilmour, with original artwork by Simon Ghahary

COST: Free (Exploration)

TOOLS: Online

MusicLM (Google AI)
https://google-research.github.io/seanet/musiclm/examples/ A model generating high-fidelity music from text descriptions such as “a calming violin melody backed by a distorted guitar riff”. MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes.

COST: Free (Exploration)

TOOLS: Online

Magenta (Google AI)
https://magenta.tensorflow.org Make Music and Art using Machine Learning. An open source research project exploring the role of machine learning as a tool in the creative process. Review the blog, papers and sample demos. You can try the Magenta Studio plugins for Ableton Live or the standalone apps.

COST: Free beta

TOOLS: Desktop/Colab/Plugins/Apps

————

AI Video Generators

^ main menu

Runway
https://runwayml.com/ Dozens of AI-powered creative tools to help you ideate, generate and edit content like never before. With new tools being added every week. There’s really nothing else like it – which is why it’s featured in two separate categories.

COST: Free/$12mo (Pro)/$28 (Team) and custom Enterprise plans

TOOLS: Online

Pika 1.0
https://pika.art/ Text to video clips/animation. Stunning results from the images/inputs you give it to create.

COST: Free (beta)

TOOLS: Online

Simplified
Simplified.com A complete set of AI marketing tools for graphic design, social media, copywriting and video editing/generation in one online app.

COST: Free demo/Pricing varies across the tools/terms

TOOLS: Online

Synthesia
https://www.synthesia.io/ AI Avatar based talking head video creator – including TTS and text editor or use your own video and graphics.

COST: Free/$30mo (Personal) and custom corporate plans

TOOLS: Online

D-ID Creative Reality Studio
https://www.d-id.com/ AI talking pictures (photos turned to talking avatars) as seen on MyHeritage. Uses TTS and template avatars, or models your photo to create a talking head video. (It’s a little creepy, but cool)

COST: Free trial/$6mo-$300mo plus custom corporate plans

TOOLS: Online

Invideo
https://invideo.io/ InVideo simplifies video creation with ready-made templates that you can quickly customize even if you’ve never done it before.

COST: Free Demo/$30mo (Business)/$60mo (Unlimited) – Save 50% for annual sub.

TOOLS: Online

Vidyo.ai
https://vidyo.ai/ Edits down long videos into short-form videos using AI.

COST: Free/$30mo (Pro)/$50 (Pro+)

TOOLS: Online

FlexClip
https://www.flexclip.com/ Multi-purpose AI content creation and editing tool, including text to video & image generation, animated text/GIFs, text to speech, subtitles and more. Very intuitive and lots of templates and help to get started right away.

COST: Free Trial/$14mo (Plus)/$21mo (Business)

TOOLS: Online

Maverick
https://www.trymaverick.com/ Create personalized short video messages from your original video clip and use AI to replace the names.

COST: N/A without signup

TOOLS: Online

Descript
https://www.descript.com/ Full production AI video scripting, recording, editing and mixing tool. Script and record your talking head video, edit out the ums/uhs and add audio enhancements, mix in B-roll and graphics as needed and take edit notes from your team.

COST: Free trial/$12mo (Creator)/$24 (Pro) and custom corporate plans

TOOLS: Online

HeyGen
https://app.heygen.com Create AI animated avatars, video translations with lip sync, video templates and more, with TTS voices. UPDATE: Video & voice-cloned translations and 2.0 avatar generation. *SEE MY REVIEW HERE: https://www.provideocoalition.com/ai-tools-heygen-software-review/

COST: Free Demo/$48mo Creator/$144mo Business/Enterprise

TOOLS: Online

Colossyan Creator
https://www.colossyan.com/ Uses AI to generate TTS plus mouth movement on moving actors in either 1/2 body or face only avatars, plus allows you to design your backgrounds and text/graphics all on screen. Good idea but not the best TTS voices and they don’t really match the mouth movements as well as expected.

COST: Free trial/$21mo (Basic)/$70mo (Pro) and custom Enterprise plans

TOOLS: Online

Latte
https://www.latte.social/ Uses AI to select the most engaging parts of your video, then adds subtitles and reformats into vertical. Saving you hours of editing, and thousands in editor costs.

COST: Free trial/£10mo (Starter)/£40mo (Grow)/£90mo (Pro)

TOOLS: Online/iOS Mobile

Genmo v2 (Alpha)
https://alpha.genmo.ai/create This could actually be listed in the AI Image Generators category as it will take text to image generation requests, but more impressively, you can upload an existing image and give it a prompt to create a somewhat time-lapse animation between the original image state and an evolved/transformed state. This is Alpha so you need to sign up to test it.

COST: Free

TOOLS: Online

Adobe Express (Beta)
https://www.adobe.com/express/feature/video/animate/audio Animate stock characters by uploading a prerecorded voice track or directly record from your computer’s microphone.

COST: Free

TOOLS: Online

DeepBrain AI Studios
https://www.deepbrain.io/aistudios 99% Reality AI Avatar. Generate “realistic” AI simulation videos quickly and easily. Simply prepare your script and use our Text-to-Speech feature to receive your first AI video in 5 minutes or less.

COST: Free trial/$29mo (Starter)/Custom enterprise plans

TOOLS: Online

Lumen5
https://lumen5.com/ Creating with Lumen5 is just like making a presentation. Point and click, drag and drop, done and done.

COST: Free trial/$19mo (Basic)/$59mo (Starter)/$149mo (Pro)

TOOLS: Online

Kaiber
https://www.kaiber.ai/ Start with an uploaded image or text to create a short video animation.

COST: Free trial/$10mo (Pro) plus studio/enterprise plans

TOOLS: Online

Verbalate
https://verbalate.ai/ Video translation and lip sync software. Reach a global audience, unlock new revenue, and scale your video content production.

COST: $9mo (Starter)/$49mo (Creator)/$749mo (Producer)

TOOLS: Online

————

AI 3D Tools

^ main menu

Move.ai
https://www.move.ai/ Capture HD motion anywhere using standard cameras. This technology enables motion capture in any environment using off-the-shelf cameras, mobile phones, computer vision and deep learning. Set up in minutes, shoot anywhere and perfectly digitize any movement for export into every game engine.

COST: Beta/$1 day

TOOLS: Online/Mobile

Luma Imagine 3D
https://captures.lumalabs.ai/imagine Prototype and create 3D with text. 3D models created with iPhone captures.

COST: Free (access to generation expanding to waitlist)

TOOLS: Online/Mobile App

Video to 3D API (LumaLabs)
https://lumalabs.ai/luma-api for developers of apps, games and VFX plugins, this API tool will help you achieve photorealistic NeRF renders quickly.

COST: $1 per capture

TOOLS: API for developers

Flythroughs (LumaLabs)
https://lumalabs.ai/flythroughs Create 3D Generative NeRF environments and objects and animate a flythrough animation with your iPhone.

COST: Free

TOOLS: iOS

Make Image 3D
https://make3d.app/ Converts 2D image to embossed 3D depth image using your reference image.

COST: Free/Beta

TOOLS: Online

Kaedim
https://www.kaedim3d.com/ Uses machine learning to generate 3D models from 2D art and images. Edit and export your completed 3D models.

COST: $3 (30-day limited trial)/$30 (30-day premium trial) – $150mo and up!

TOOLS: Online

Point-E
https://github.com/openai/point-e Creates an AI generated point cloud model from text-to-3D prompting.

COST: Free/Beta download

TOOLS: Local/Server

Wonder Studio
https://wonderdynamics.com/ An AI tool that automatically animates, lights and composites CG characters into a live-action scene. Pretty amazing technology – watch the demo video here.

COST: Closed beta

TOOLS: Mixed

————

AI Utilities & Productivity Tools

^ main menu

CLIP Interrogator
https://huggingface.co/spaces/pharma/CLIP-Interrogator Want to figure out what a good prompt might be to create new images like an existing one? The CLIP Interrogator is here to get you answers! Upload a photo or illustration and it will provide text prompts you can use in your image generator.

COST: Free

TOOLS: Online

Namelix
https://namelix.com/ This clever AI tool helps you brainstorm a new business name (or product/service, etc.) by supplying the info prompts and delivers dozens of brandable ideas in return. I think it does a great job.

COST: Free to generate with multiple registration fees and professional services.

TOOLS: Online

Checksub
https://www.checksub.com/ Subtitle, Caption and Translate your videos in minutes. Reach more people with your videos thanks to the Checksub platform. Add subtitles, translate and dub your videos automatically.

COST: Free trial/$12mo (Junior)/Business plan available

TOOLS: Online

Brandmark
https://brandmark.io/ From the same people who make Namelix, Brandmark develops unique and brandable logos.

COST: Free to design/$25 basic (one-time)/$65 Designer/$175 Enterprise

TOOLS: Online

Looka
https://looka.com/ AI Logo design generator with customization and previews in use.

COST: Free to design/$20 (Basic logo package) to $192 (Brand Kit Web Subscription)

TOOLS: Online

Watermark Remover
https://www.watermarkremover.io/ Magically removes embedded watermarks using AI. (Useful for AI generated images that include watermarks or stray text). – I do question the ethical nature of this as a photographer though.

COST: Free/$15mo (1 member) up to $159mo (team up to 50)

TOOLS: Online

Ocoya
https://www.ocoya.com/ Uses AI to help you design, write and create content for social media including scheduling ad runs and posts all in one location.

COST: Free trial/$29mo/$89mo/$299mo

TOOLS: Online

PromptBase
https://promptbase.com/ An extensive library/store of professionally generated prompts for only a few bucks. A great starting point to build on for the looks you want to achieve. While not an actual image generator, it's still a very helpful tool for getting what you REALLY want to create in the Midjourney, DALL-E and Leonardo AI generators listed above.

COST: Varies between the prompts

TOOLS: ONLINE Resource

————

AI Resources

^ main menu

There’s an AI for that
https://theresanaiforthat.com/ An overwhelmingly comprehensive and current directory of all things AI. There's also a community page you can join to contribute or have discussions and ask questions.

A very unofficial Midjourney manual
https://docs.google.com/document/d/1ivAYy_JXJsGE-9Rh97iMyXkWlmF_MxO2NFshrIvuns4/edit Word Doc that’s always updated that has tons of info on using Midjourney, with prompting tips, instructions and more.

Emotion Prompts
https://rentry.org/yfcd5 A simple list of accepted emotional prompts and emojis.

Have I Been Trained?
https://haveibeentrained.com/ Are you an artist or photographer with lots of images online? Is your work part of an online stock library? Use this tool to add your image URL or upload it to see if it's been used in one of the machine learning training sets.

AI and Music: What Musicians Should Know!
https://diymusician.cdbaby.com/music-career/ai-music/ Can artificial intelligence help you find gigs, write a press release, or design your album cover? Could AI even finish a lyric you’ve been stuck on for months or plug holes in your music marketing strategy?

Alpha Signal – Weekly Summary of Machine Learning
https://alphasignal.ai/latest-summary Deep level news about what is hot, what’s new and what’s changed or updated in AI apps and technology.

Flawless AI
https://www.flawlessai.com/ A world-first system that unleashes the power of VISUAL DUBBING, opening up a new world of possibilities, from fast and efficient AI RESHOOTS to the creation of immersive, authentic, perfectly lip-synced VISUAL TRANSLATIONS. To keep a PG-13 rating, the creative team behind 2022's Fall had to digitally replace expletive-filled dialogue. The technology behind it is also crucial for the international market.

What is Google Colab: A Beginner’s Guide
https://bytexd.com/what-is-google-colab-a-beginner-guide/ In this article we’ll cover what Google Colab is, in a beginner friendly way, how to use it, and some of its most popular features so you can get started more easily with this platform.

How to use Stable Diffusion for Free on Google Colab
https://decentralizedcreator.com/use-stable-diffusion-for-free/ Unlike other AI text-to-image models, you can use Stable Diffusion on your own PC. However, it requires a basic knowledge of Git and Miniconda3 installation, and a 3xxx-series NVIDIA GPU with at least 6GB of RAM is required to get good results from Stable Diffusion.

How to use Disco Diffusion for Free on Google Colab
https://decentralizedcreator.com/use-disco-diffusion-for-free/ Disco Diffusion is one of the machine learning models that convert text to image. With Disco Diffusion, you can not only create images from prompts but also videos! This AI art generator is similar to Stable Diffusion.

Disco Diffusion Resources Download
https://github.com/alembics/disco-diffusion A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations.

Disco Diffusion v5.7 Updates Download
https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb

Vid2Avatar
https://moygcc.github.io/vid2avatar/ 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition

MuseNet (OpenAI)
https://openai.com/research/musenet A deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. MuseNet was not explicitly programmed with our understanding of music, but instead discovered patterns of harmony, rhythm, and style by learning to predict the next token in hundreds of thousands of MIDI files. MuseNet uses the same general-purpose unsupervised technology as GPT-2, a large-scale transformer model trained to predict the next token in a sequence, whether audio or text.

AI-Duet (Google AI Magenta)
https://experiments.withgoogle.com/ai-duet An interactive experiment from Google’s Magenta team that lets you play a duet with the computer: play a few notes and the AI responds with a melody of its own.

Google Deepdream
https://github.com/google/deepdream This repository contains an IPython Notebook with sample code, complementing the Google Research blog post about Neural Network art. See the original gallery for more examples. You can view “dream.ipynb” directly on GitHub, or clone the repository, install the dependencies listed in the notebook, and play with the code locally.

Gen-1 (Runway Research) The Next Step Forward for Generative AI
https://research.runwayml.com/gen1

OpenAI Whisper
https://openai.com/research/whisper OpenAI has trained and open-sourced a neural net called Whisper that approaches human-level robustness and accuracy on English speech recognition.
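For reference, a minimal sketch of the open-source Whisper package in use (the file name and model size below are placeholders):

```python
# A minimal sketch of the open-source Whisper package in use.
# Install with: pip install -U openai-whisper
# The file name and model size below are placeholders.
import whisper

model = whisper.load_model("base")           # "tiny" through "large" trade speed for accuracy
result = model.transcribe("interview.wav")   # hypothetical audio file
print(result["text"])
```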

Seyhan Lee – Cuebric 
https://seyhanlee.com/ Cuebric allows filmmakers, concept artists, and production studios to create stunning, fully segmented, inpainted, virtual environments in seconds. A quick and cost-effective alternative and partner to CGI VFX production processes.

————

New Technology & Video Resources

NEW! DreaMoving sandbox
https://huggingface.co/spaces/jiayong/Dreamoving

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
https://vcai.mpi-inf.mpg.de/projects/DragGAN/

Getting Started in Midjourney

Midjourney Settings Explained – EASY Guide with Sample Images

Midjourney AI – First Steps – Easy Beginner’s Guide

ChatGPT & Midjourney: One Prompt to Transform Imagination

Make Better Prompts – Midjourney AI

Stable Diffusion Version 2: Power to the People… for Free!

A Beginners Guide to Machine Learning with ml5.js

AI and Image Generation (Everything is a Remix Part 4)

Promptcraft: Learn the Language of Prompt Writing for Generative Creatives
https://amoration.square.site/

————

UPDATE: AI Tools Part 2: A Deeper Dive

UPDATE: AI Tools Part 3: The Current State of Generative AI Tools

UPDATE: AI Tools Part 4: The Latest for Video, Film Production, 3D, Audio and more…

UPDATE: AI Tools Part 5: 2023 A Year in Review

Exclusive: Interview with James Cameron and the Avatar 2 Editorial Team https://www.provideocoalition.com/exclusive-interview-with-james-cameron-and-the-avatar-2-editorial-team/ https://www.provideocoalition.com/exclusive-interview-with-james-cameron-and-the-avatar-2-editorial-team/#comments Thu, 19 Jan 2023 18:21:04 +0000 https://www.provideocoalition.com/?p=262289 Read More... from Exclusive: Interview with James Cameron and the Avatar 2 Editorial Team

In an exclusive edition of Editors on Editing, Glenn Garland interviews the editorial team behind the new blockbuster, Avatar: The Way of Water. Read the transcript or subscribe to our podcast, “Art of the Frame.”

AVATAR: THE WAY OF WATER


Glenn: Welcome to the Editors on Editing podcast in collaboration with American Cinema Editors and Pro Video Coalition. I’m Glenn Garland and I’m joined by James Cameron, Stephen Rivkin, and John Refoua.

James’ credits as director, writer, and producer are legendary. As an editor, his credits include Titanic, for which he was co-nominated for the BAFTA and won the Eddie and the Oscar, and Avatar, for which he was nominated with Steve and John for the Eddie, BAFTA, and Oscar.

Steve’s credits include My Cousin Vinny, The Hurricane, Ali, Pirates of the Caribbean: The Curse of the Black Pearl, for which he won the Eddie, the two Pirates sequels, Dead Man’s Chest and At World’s End, for which he was co-nominated for the Eddie, as well as Avatar and Alita: Battle Angel.

John’s credits include Reno 911, CSI: Miami, Avatar, Olympus has Fallen, The Equalizer, Southpaw, The Magnificent Seven, and Transformers: The Last Knight.

Now they have collaborated to bring one of the most mind-blowing epics of the year: Avatar: The Way of Water. Jim, John, Stephen, it’s such a pleasure to have you guys. I just loved what you guys did with Avatar. I was just blown away.

Jim: Oh, thanks.

Glenn: Thank you so much for joining me.

Jim: It’s our pleasure.

Stephen: I want to add: There’s a missing member of our crew, David Brenner, who we lost last February. And he is very much here with us today.

Glenn: Yes. I wanted to talk to you a little bit about that. There was a beautiful mention of him at the end of the film. He was an amazing editor, a beautiful man. It’s a real loss.

Jim: He was a critical member of the team. And actually, some of the most interesting and difficult scenes in the film just happened to fall on his plate.

It might be good to just get an overview of the process so that some of the things we mentioned or talk about have a context. I think it behooves us to talk about how our process diverges—and is similar.

Stephen: Yeah, I’ve got to say, there is a point where making a movie like this becomes like making other movies, but everything that comes before that is completely different. All of the performance capture that’s based on reference cameras of the actors, and the edit that goes along with it, and all the preparation for virtual photography is quite unique to our workflow. And that could be years in a preliminary process, but only then when Jim picks up the virtual camera, does it become more like a conventional edit.

The basics of performance capture and building a cut using the reference cameras was something we perfected by the time we finished Avatar one. Jim likes to joke that by the time we were finished, we figured out what we were doing.

Jim: Technically, the quote is, “The only time you’re qualified to make a movie is when you’ve just finished it.”

Concept art from 20th Century Studios’ AVATAR 2. Photo courtesy of 20th Century Studios. ©2021 20th Century Studios. All Rights Reserved.

Stephen: That’s right, right. So I thought we were going to take all that knowledge and things were going to be easier on Avatar 2. Well, of course, Jim wrote a script that involved many, many more characters. And combining live action with virtual characters, that complicated the process exponentially.

But that’s getting a little ahead of the game. So for now, I’ll just say that the basic performance edit is John, David, myself and Jim, looking very carefully at the reference cameras. Now, these are reference cameras of the actors’ performances. They are not shots, but they are reference cameras.

And most of the time, every camera would be assigned to a character. They would shoot a big close-up of their face because it is essential to have good facial reference of what’s being recorded in the volume. Now, the volume is a big studio stage that’s empty, and these actors are sort of black box acting.

It’s Jim and the actors not worried about shots or dollies or lighting or anything. It’s just purely director-actor relation. Right there. And when we get into virtual photography, it’s very liberating because he doesn’t have to concern himself with the performance because we have picked all of the very best from every actor.

Jim: From a directorial standpoint, I understood very early on that the reference cameras were absolutely critical. Our reference camera team on Avatar 2, I should point out, was 16 camera operators.

Glenn: Oh wow.

Jim: We always shot between 12 and 16—these were HD video cameras with very long zoom lenses so that these guys could zoom in. Typically they would be handheld and they would move around as a very agile crew, limited only by their cabling.

It was the funniest thing, because I’d come in in the morning, I’d rehearse with the actors. We’d just kind of noodle around for a while. And then once I started to lock in on a staging, and the second I defined an axis, they would all move like a flock of geese to one side of that line and start picking off their angles that they knew were gonna be needed. So somebody would grab what we called the stupid wide, just so that we had a reference point for everybody. But the key to it was that I knew standing there as a director that later wearing the editor hat, if I didn’t have that closeup I wouldn’t be able to evaluate the actor’s performance. And this is all about evaluating a performance that later would run through a big complicated system and come out the other end as the actor’s performance translated into a CG character.

So my role there as a director was not only to get the performances, but to make sure that they were shot in a way that the editorial department could use. And the critical thing to understand here is I’m not shooting the final images for the scene at all.

I’ve got that in the back of my mind. It’s informing what I’m doing. But that’s not the goal. The goal is to preserve those performances in an edit friendly way, so that we can look at them in the cutting room, see where everybody was, see what they were doing, and have a tight closeup of every single split second of the actor’s work.

Glenn: Yeah. And the performances are so excellent throughout. I know that sometimes people get worried about mocap not feeling real, but there were a lot of emotional scenes and you really felt for the characters.

Jim: Thank you.

Director James Cameron behind the scenes of 20th Century Studios’ AVATAR 2. Photo by Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

John: Yeah. I think one of the things that’s really important to recognize in this movie—all these movies—is that the actor’s performance is what drives everything. It’s a one-to-one—as much as possible—relationship between what the actor did on the empty stage and what you see on the screen with all the effects going and everything like that.

And, for everything that you see on the screen, there was an actor that did that.

Glenn: So nothing’s manufactured.

Jim: No, no.

John: It’s not animated. I mean, the animation is part of the process of all the backgrounds and all that, but the actors’ performances are a genuine one-to-one ratio with what the actors did.

Glenn: Fantastic.

Jim: Animation plays a huge role in the movie when it comes to the creatures: the flying creatures, the swimming creatures, how they jump, how they move, the forest creatures, all that.

That’s pure keyframe animation. But anything that one of the humanoid characters is doing, any one of the Na’vi or the Recoms, that was performed by either an actor, or a body double, or a stunt double, and it’s a human performance no matter how outlandish the action appears to be.

It’s a human performance.

Glenn: Yeah, that’s, I’m sure why it just feels so real and so emotional.

Jim: Exactly.

Stephen: After a very careful analysis of all the dailies of the reference cameras, we build an edit that represents that scene. And we’ll do it with a combination of just cutting the reference—as Jim said, the dumb wide shot—just to show where everybody is.

And then we’ll use those cowboy size shots and closeups to try and emulate an edit of the scene. And we put sound effects and music in and everything, and try to build something that Jim can look at and say, “Okay, let’s turn that over to our internal lab for processing.” And it may involve combining takes from one actor from Take one, one from Take five… It may involve stitching different takes together to create something that represents all the very best. There is one virtual camera that Jim usually shoots during the capture process that shows the environment, that shows the characters in their virtual form. So we do a combination of peppering all this stuff together to create an edit for each scene.

John: Yeah, I mean, it took us a few months to figure out what the hell was going on…

Jim: The thing is that you got 16 cameras operating. So what I’m seeing on stage is what we call a matrix.

And the matrix is basically a multi-screen of 16 images. I have to look all over that 16-image matrix to follow a given character, and then we have to go through that process again in the cutting room.

So, the very first step is to look at that matrix, and then the assistants will break it out into individual quads where you can see four images. And we’ll go through the quads and we’ll look at performance playback, basically. Right now we’re just looking at raw, performance coverage. So then I make all my selects. It’s like, “Okay, I like what Sam did there. I like what Zoe did there.” And then maybe we’ll have a discussion, and this will be done as a group thing with Steve and John and David. And we’ll say, “Well, we could, maybe stitch Sam to Sam.”

Like, “We really liked the way he did the first five lines. The scene started strong in take three, but then in take five he did this amazing thing. Let’s see if we can find a place to stitch Sam together across those.” And then of course it’s up to everybody to figure out the logistics of keeping it all in sync so that the other actors who might also be from different takes are in a reasonable sync with each other.

This is kind of similar to what you do when you cut normal picture. That you’re using an actor from different takes to create the full scene. But we’re doing it in a way that is shootable. That in theory, I could come in and do it as a one-er. Right? So in theory I could come in and do some steady cam shot that just never stops.

And Steve may say, “Well, that closeup is so good, that wants to be a closeup moment.” And if I’m willing, well before the fact, to commit to that being a closeup, that frees us up to create a sub-master on either side of that closeup that could be done as a one-er up to that point, and then as a one-er after that point. And that takes pressure off us trying to build these hellacious constructions.

All the constructions are truthful in the sense that we don’t mess with what the actor did, but we try to order it in a way that we’re getting the best out of the actor. And by the way, you might have six actors in a scene—like that beach scene that had all those characters in it.

Glenn: And those 16 cameras are capturing those six actors at that time.

Jim: Yep.

Stephen: You can imagine the logistics of accounting for every character in a section when they come from different takes and making sure that they relate to each other in a similar timing that they did in the original native take.

Imagine it as a multi-track, and each character has their own track and you’re trying to line them up to each other. Now, Jim has the ability to slip their relationships downstream. If he wants an answer to a line to come quicker, he can actually manipulate the time.

Jim: When you’re in an over and the actor took a long time to respond, you cheat a line onto the back of somebody’s head to tighten up the timing. It’s the same thing. You know, like taking a three second pause out of a scene.
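To make the multi-track idea concrete, here’s a toy sketch, purely illustrative and nothing like the production’s actual system, of per-character tracks that can be slipped in time without touching the performances themselves:

```python
# A toy sketch of the multi-track idea: each character carries their own take
# and a time offset, and "slipping" a character only changes the offset, never
# the performance itself. Purely illustrative; not the production's system.
from dataclasses import dataclass

@dataclass
class CharacterTrack:
    character: str
    take: str
    offset_frames: int = 0  # shift relative to the scene's master timeline

    def slip(self, frames: int) -> None:
        """Move this character earlier (negative) or later (positive)."""
        self.offset_frames += frames

scene = [CharacterTrack("Jake", take="T3"), CharacterTrack("Neytiri", take="T5")]
scene[1].slip(-12)  # bring the reply in half a second (12 frames at 24 fps) sooner
```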

Glenn: It almost sounds like three dimensional chess…

Stephen: Yeah.

Glenn: …to figure this out.

Jim: It’s four dimensional chess.

John: Yeah, we haven’t even gone through some of the other stuff that’s going to happen once we get past this stage of the process.

Stephen: So, downstream, let’s say months later, after all of this preliminary work is done and Jim is signed off on these performance edits, we’re building the film in performance-edited form, so there’s always something represented once we get the whole picture laid out. And then it’s a process of shooting these scenes for real with the virtual camera—and creating the shots that go in the movie.

And this is where Jim explores the scenes. And any reference shot can become a closeup, a medium, a wide shot, a crane shot, a dolly shot. And as he mentioned before, every actor has their best moment, that can be the closeup or a wide shot, so you don’t have to worry about, “This was their best take, but the dolly guy bumped the dolly,” or the lighting wasn’t great. None of that exists yet.

Director James Cameron and actor Sam Worthington behind the scenes of 20th Century Studios’ AVATAR 2. Photo by Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

Glenn: So are the actors then reperforming it?

Jim: They’re gone. They’re on other movies. They’re hanging out in Tahiti.

John: At that point you’re basically using the reference camera edit that you have to generate an EDL for the lab to pull up their files of the motion capture and assemble them in the order of the edit. The motion capture files are actually being accessed, and they’re put into the right environment and all that. And then at that point, Jim can come in and shoot his virtual camera, basically.
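For readers unfamiliar with turnovers, here’s a toy sketch of what generating a simple CMX3600-style EDL can look like in the abstract; it’s an illustration of the idea only, not the show’s actual tooling:

```python
# A toy CMX3600-style EDL writer, an illustration of the turnover idea only,
# not the actual Avatar pipeline. Each event names a source take plus source
# and record in/out points that a lab could match back to its capture files.
FPS = 24

def tc(frames: int) -> str:
    """Frame count to HH:MM:SS:FF timecode at 24 fps."""
    s, f = divmod(frames, FPS)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"

def write_edl(title, events):
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    rec = 0
    for i, (reel, src_in, dur) in enumerate(events, start=1):
        lines.append(
            f"{i:03d}  {reel:<8} V     C        "
            f"{tc(src_in)} {tc(src_in + dur)} {tc(rec)} {tc(rec + dur)}"
        )
        rec += dur
    return "\n".join(lines)

# (take/reel name, source in-point in frames, duration in frames)
print(write_edl("SC042_PERF_EDIT", [("TAKE03", 240, 96), ("TAKE05", 1032, 120)]))
```

The point is simply that each event carries enough source metadata for a lab to match it back to its own capture files.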

Jim: Right. So on the day that we capture with the actors, what we get is a bunch of data. We get body data and we get facial data.

So we just use the reference cameras as a kind of intermediate language between us and what we call our lab. But our lab are people that will take those performances and put them into a landscape or into a set that’s a virtual set, meaning it’s 3D geometry that the virtual art department has created.

It could be a forest, it could be a ship, right? It could be an underwater coral reef. So they’re taking the data that’s been captured and they’re putting it into the environment, and then they’re going to manipulate it so that feet and hand contacts are worked out properly, and it’s all lit beautifully.

And then I’ll go out into an empty volume. There’s literally nothing there. And I’ll just walk around with the virtual camera and handhold all the shots for the scene. And that’s what Steve called the exploration. Like I could do a helicopter shot that starts a thousand feet out and fly in and land on the bridge of Sam’s nose for a line of dialogue. I could do anything I want.

Glenn: Woh.

Jim: At that point it becomes the creative process of photography. So the lighting, the camera movement, all those things are now being thought of. Stuff that I would normally—as a live action director—be doing at the same time as my performance stuff. Basically what we’ve done is we’ve separated the act of performance with the actors, from the act of photography.

Sometimes this is months—or even years, literally years plural, later. I’ll go back into the scene with my virtual camera. Then it becomes the second stage of editing. And what we do is we have either Steve or John or David—when he was with us—would sit in an open Zoom session with me all day long while I go through these cameras and I pick them off one by one. And at this point, John would constantly be sort of tugging up my sleeve—metaphorically speaking—and saying, “You know, this load is really for the moment where Jake does this or that.” And he’d remind me because I’d get so fascinated by the image I’d forget what the performance moment was for. So it becomes a highly collaborative process with the editor right there at the moment that the shots for the movie are being made.

And even Steve or John or David would say, “Hey, you know, if you ended the shot on this character, that would make a better cut to this other character over here.” So I’m keeping my editor hat on while I’m directing.

They’re putting a little bit of a director hat on while they’re supporting me and there’s no giant crew and a thousand extras standing around. It’s just me and five or six people at what we call the brain bar who are running the system. And me and the virtual camera and the editor—who’s in by Zoom usually, just because I was in New Zealand and they were working out of LA. When we didn’t have COVID and we could all be in one place, the editor would literally just stay on the set with me and support me from 10 feet away.

Glenn: Wow. Fascinating.

Stephen: I want to interject something. This director-editor interaction is quite unique and happens very seldom in movie making, where you’re as an editor able to participate in the shots that are being created and actually assembling them as they’re being shot. And Jim is looking at the progress of these scenes and getting ideas of how he may approach shooting it, maybe in a different way.

And it’s quite amazing to see the evolution, the way these scenes develop during this virtual camera process. And it is very gratifying to be involved in the actual shot creation process.

Director James Cameron and crew behind the scenes of 20th Century Studios’ AVATAR: THE WAY OF WATER. Photo by Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

Jim: Well, sometimes stuff doesn’t cut the way I think it’s going to when I create two virtual shots. I’ll say, “Hey, just cut that together and make sure that works.” And then we try it and it’s not all we hoped and dreamed. And so then I’d go back out and I’d try a different approach that maybe cut better. And sometimes we’d wind up—as you always do—with competing cuts of different sections of a scene. Steve, do you remember the Quaritch self-briefing scene where Quaritch is watching the video of himself, and how many gyrations we went through to get that to cut smoothly?

Stephen: Yeah, absolutely. And one other thing is that during this process, there are many times where you will say, “I want to bifurcate my idea,” or even, you know, two, three, four different ideas and cutting patterns. We would develop all of these ideas and present a slew of alternate sections and edits for scenes for Jim to look at that would follow each of his explorations into how to cover and cut a scene, which is also very unique.

Glenn: Yeah. Almost choose your own adventure.

Jim: This is all happening in this kind of real-time collaborative way. That’s pretty cool.

Glenn: Yeah, I mean, it sounds a little bit like animation cutting in a way where you’ve got the storyboards and you keep altering them to find the very best film. But it’s happening immediately.

Jim: And it’s not storyboards, it’s actual shots. At this point, we’re looking at an image that’s probably, I would say about half real, like half or maybe one third of reality, reality being our final goal. But we have what we call Kabuki face.

And the Kabuki face is a projection map of the actor’s facial performance camera, projection-mapped onto a blank model of the face. So we actually see the expressions, we hear the dialogue, we see eyes flicking from one character to another. We see everything we need to see from an editorial standpoint, and it’s the actual characters, fully clothed, with all their props, in the environment, properly lit with shadows, with smoke, fire, rain, all that sort of thing. It’s just not photoreal yet, but it’s a shot. It’s not previs, it’s an actual shot and it’s the actual performance.

The editors are getting it as a stream within a couple of minutes, and then by the next day, those streams are cleaned up as proper stage renders, and that’s what we live with. Sometimes we live with that for a year or more until we start getting the stuff back from Weta. And it’s an excellent proxy. This is basically, very finitely—almost to the pixel—the final shot. It’s just not a photoreal render.

Glenn: Incredible.

John: And that is what we call that template, the shot that Jim comes up with on that stage. We carry that in our timeline all the way through. And Jim refers back to it every once in a while when shots come in from Weta as to, “Why is it four pixels over?”, to that precise a degree. Weta tries to match that template basically, that Jim shot on the virtual camera.

Glenn: Mmm.

John: And the great thing about that day when we’re shooting cameras is that the performance is the same. So we have a recording of the performance and you can play back precisely the same thing over and over again until you get the right shot the way you want it to be.

Glenn: Wow.

Jim: But the cool thing here, at that moment in time when I’m standing there on that stage working with the editor, going through the virtual cameras, we can literally say, “Hey, would it work better if this scene was a night scene? I know we conceived it as a day scene, but what if it was a night scene?”

And then they’ll go off for a couple hours and come back and then all of a sudden it’ll be a night scene. And it’s like, “Well, what if it was raining? Would that add something? Would that add a sense of somberness or whatever?”

So they’ll just add rain and so we have all this “authoring” capability at that stage. It’s highly creative. And then we gotta think about, “Okay, what are the consequences if we’ve just been at a dusk scene?” So it has a ripple effect through the film.

And we were making some pretty radical decisions. Doesn’t change the actor’s performance at all, but it definitely changes your perception of the movie.

Glenn: Mm-hmm.

Stephen: Try doing that with a live action film.

Glenn: Yeah. And then how do you deal with the water? I’m assuming that’s not CG…or is that CG? You guys are shooting in tanks? Or how’s that being done?

Director James Cameron on set of 20th Century Studios’ AVATAR 2. Photo by Mark Fellman. © 2021 20th Century Studios. All Rights Reserved.

Jim: Well, we would capture in a tank, and it’s critical to remember it’s capture not shooting. Although we are shooting reference angles, we’d capture in a tank. So everything that you saw somebody doing—jumping in the water, swimming at the surface, diving under, riding a creature—that was all performed. We had creatures that were machines that were piloted, machines that people could ride at high speed around the tank and surface and dive in the water and all that sort of thing. So everything you see a character doing was performed by a human being, in the tank. So we had a big capture tank that was a hundred feet long.

We could generate a six foot wave, we could crash it onto a shoreline. We could build coral sets underwater and all that sort of thing. David for some reason drew the short straw. John and Steve let David figure out the first big underwater scene, which was very daunting because we hadn’t done underwater capture before. And we had all kinds of different performance stuff. And sometimes we’d take a piece of swimming that was you know, Bailey Bass, the young actress that played Tsireya. And we might like one of the stunt doubles for one of the other characters.

And we’d be making these kind of “mashups.” But the difference is, [in a traditional capture volume] they’ve got floor marks and they’re located spatially by a physical set. When you start combo-ing people together that are in free space, sometimes they crash through each other, and the lab has to kind of sort it out so that it all works.

So oftentimes David would propose something. He’d say, “I think this could be really cool,” and then we’d give it to the lab and it would turn into kind of a mess, and then the lab would kick it back to David and he’d make some timing adjustments. Or he’d take somebody from a different take. So it became a refining, like a closed loop between editorial and the lab before it got to me. They would solve all these problems so that when it got to me, I wasn’t bogged down for hours and hours trying to solve technical problems.

Glenn: Wow.

Jim: Wouldn’t you say that’s accurate? In terms of the underwater stuff that you guys put together?

John: Yeah.

(L-R, Front to Back): Zoe Saldana, Sam Worthington, Kate Winslet, and Cliff Curtis on the set of 20th Century Studios’ AVATAR: THE WAY OF WATER. Photo by Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

Stephen: It should be noted Jim was pioneering something that I don’t believe had ever been done before. This idea of doing performance capture underwater. It was kind of an insane thing. And of course it was proposed to him shooting dry for wet and hanging actors on wires, moving them.

I think we did a low tech version of this in Avatar 1, where Sam was in a river and his character Jake was floating through some rapids and they put him on a chair with wheels and had him float down. And the capture worked great, but this involved swimming and diving and riding creatures.

And it was a challenge to get good facial reference. I mean, they had underwater reference camera operators and underwater cameras. They also had windows on the tank that they could shoot through to shoot the reference of the motion that was being captured.

There was a whole layer of complexity that went along with both capturing underwater and editing the reference from the underwater capture. This was something brand new to Avatar 2, and it was monumentally challenging for the entire crew.

Jim: One thing we haven’t talked about is the use of picture in pictures and how you guys had to become the masters of multi-screen formats with sometimes up to, I don’t know, maybe like eight or ten picture in pictures. In the same master image.

Stephen: Yeah. Sometimes accounting for all the characters and showing their facial performance… Because we didn’t know who Jim wanted to be on at any one given moment, so it was good to be able to look at those reference cameras. John, do you want to talk about building those PIPs for the scenes? When we built performance captures?

John: Yeah. The PIPs are very valuable because it’s hard to see what everybody’s doing. So there would be a picture-in-picture at a smaller size of every actor that’s in that scene. And if they’re from a different take, then that would be indicated. So a lot of times when we’re looking at things, I would say, “Jim, make sure you like this.”

And we’d look at one of those squares, either the whole performance of Sam throughout the whole thing, or the whole performance of Zoe. Because as you progress to the scene, you’re cutting back and forth and maybe you miss something while you’re cutting away, of something the actor did.

And so, those picture-in-pictures were useful in pinpointing the exact time that you want to be at. You don’t want to miss that, for example, if you saw something you liked.

Stephen: Yeah, well, it would remind Jim of the performances that we had selected together, months, sometimes years before. And he’d rewatch the performance edit and know what it was that was important to him.

And when he picked up the virtual camera he’d say, “Okay, I know I want to be on Neytiri for this closeup because I can see in that PIP that her performance is extraordinary.” So everything may revolve around a moment in a scene. And the picture-in-picture is important when we’re dealing with two different takes from different actors.

So you know that you want to be in take B5 for the Zoe moment where she says ‘this.’ So it really does inform what to shoot when we get to virtual. And again, I’ll say, “Any shot can become a close up,” but also Jim’s very big on having foregrounds and backgrounds, so we can never as editors depend on the fact that this will be a single close up. Because quite often that performance is going to be stacked against someone in the foreground. So we had to account for everyone.

Director James Cameron behind the scenes of 20th Century Studios’ AVATAR: THE WAY OF WATER. Photo courtesy of Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

Jim: When I do the virtual cameras, now, we got dailies. Now we got a ton of dailies. And basically it just exists as a big string out.

Glenn: And you cut that before it goes to Weta?

Jim: Oh, absolutely. Yeah. So what we call the template is a finely edited scene. And I mean to the frame, because Weta does not give us handles. So we have to know that our cut is a hundred percent frame accurate. That’s the movie that we’re making. Before we hand it to Weta, we have to love it.

Glenn: And why no handles?

Jim: Well, because it costs money.

Stephen: Yeah, it’s like something over a thousand dollars a frame.

Glenn: Oh, wow.

Stephen: You don’t really want to ask for 24 frame handles on each side.

Jim: Well, it’s over a million dollars a minute for their finished process. I won’t say how much over, but the point is that we made up our minds a long time ago that our template process could be precise enough that we didn’t need handles.
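As a quick sanity check of those figures (simple arithmetic, not numbers from the production):

```python
# A quick sanity check of the figures above (simple arithmetic, illustration only):
frames_per_minute = 24 * 60                 # 1,440 frames in a minute at 24 fps
print(frames_per_minute * 1_000)            # 1,440,000 -> "over a million dollars a minute"
```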

I mean, every once in a while when you see the final render, you see some tiny nuance of performance that makes you want to extend a shot by six frames just for an expression to resolve that was a bit too subtle when working in our proxy environment. But I think we might’ve done that six or eight times across the whole 3000 shots. And so typically what would happen is that the three editors—John, Stephen, David—would propose various versions of a scene and I would get into it and I’d play around. This is me now putting on my editing hat for the first time really, because I didn’t deal with the performance edits and getting it all teed up for the virtual process.

I’ll look at their proposed cuts and then I’ll pick and choose from different cuts. Every once in a while I’ll go right back to the dailies and I’ll do my own selects because there’ll be maybe things I know I’ve shot that I had an idea for, and I’ll do my own little couplets and triplets and sections and then I’ll mash it all up.

I’ll take their cuts and my cuts and I’ll mash it all up into a few-ish cuts, and then I’ll bring them in and I’ll say, “Well, what do you think of this? What do you think of that? What do you think of that?” And we’ll talk about it. And they’ll say, “I like this version better.”

And I’ll go, “Okay, all right, let’s do that.” So then I’ll take that version and then that’ll kinda stand for a while. But sometimes I’ll give it back to them and say, “Well, why don’t you try to perk this up? Or play with a music crescendo that maybe moves the picture cut a little bit.” So it becomes a back and forth looping process.

And then we’ll wind up with a fine cut out of that. And then that’s what gets turned over to Weta effects.

Stephen: I was just going to interject one thing. The fact that we had a principal character who would ultimately be human required an extra layer of capture and editing. So the Spider character was captured along with the other characters in a scene and acted his whole part.

And he was quite a bit younger, and his voice hadn’t changed yet, but Jim worked very closely with us signing off on the CG version of Spider in every scene that he was in. And that became a blueprint for the live action photography, so he actually acted every scene at least twice.

Jim: Right. So we already had a cut of the scene. We already knew all the shots, and now we just had to shoot Jack for real, instead of having Jack’s captured performance. So we had Jack’s captured performance at the age of 13, and we were shooting Jack at the age of 15. But it was the same moment, so these were the actual final performances of all the other actors that appeared as Na’vi or avatars. They were the final performances. So Jack had to hit his marks and he had to hit eyelines for performances that we weren’t going to change because we had captured those with our key cast a year and a half earlier.

It was an interesting process. Because it was extremely precise, they could take my virtual camera in that rough edit and actually program the techno dolly with it. And that could be our shot. But it was a very time-consuming process. I’d say we spent probably 40% of our time on 20% of the movie, doing the live action stuff.

There’s one other thing we haven’t talked about, which is FPR, which is kind of interesting. Do you want to mention that Steve?

Stephen: Well, you know, in traditional movie making, there’s ADR and we had the capability of doing what we call FPR, which was facial performance replacement. So there may be an instance where Jim decided that he wanted one of the principal actors to say something else, either because of story clarification, or we cut something out and we wanted to take a line that was removed elsewhere and put it in another scene so that the idea was still represented.

And the actor could come into our studio, put on a face rig—exactly the same face rig they wore when we did the capture. And they could say a different line and we could put it on their body. Unlike ADR, where if you were going to change a line, you’d have rubber sync, they could not only restate the line in its new form, but all the expressions, the eyes, the mouth, every facial nuance would be recorded and basically planted on their own body again. And then we’d turn a separate face track over to Weta and they would process it. And it would look like it was the native performance. It was an amazing process.

Glenn: Wow.

Jim: We figured, ‘Why try to sync to a piece of facial capture and then not know if it was good sync until we went through the whole process with Weta?’ Let’s just replace the whole face sometimes just to make sure that we had good sync.

The other thing is that when you do a live action movie with stunts, the stunt players gotta kind of hide their face. Well, we don’t have that problem. We can use FPR to have any actor put their actual facial performance, their grunts and groans, and efforts and yells and all that onto a stunt performance. So all they have to do is match the physicality and timing of the stunt player, and then it’s actually Sam, or it’s actually Zoe, or it’s actually Sigourney.

And that way I could literally go from a stunt right into a closeup in one shot.

Glenn: Wow.

Jim: FPR turned out to be a powerful tool—not one to be abused. Obviously you never want to go against the intention of the actor’s performance, but I won’t let them do action that might injure them. And we almost always do any kind of body doubling with the actor right there. The actor could say, “Hey, you know stunt person so-and-so, can you do it a little bit more like this?” So that it’s the way they would do it. I gave Sigourney a lot of latitude to instruct her body double, a performer named Alicia Vela-Bailey.

(L-R): Sigourney Weaver as Dr. Grace Augustine, Director James Cameron, and Joel David Moore as Norm Spellman behind the scenes of 20th Century Studios’ AVATAR: THE WAY OF WATER. Photo courtesy of Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

Because there were a lot of things at 70 that Sigourney couldn’t do like a 14 year old or like a 15 year old. I said, “Sigourney, you direct Alicia. And make sure that what she’s doing is your interpretation of the character.” And so Sigourney really liked that idea. It took the pressure off her to not necessarily try things that she couldn’t do.

Sigourney did maybe 85% of it herself, because she had a full body facial performance. She played the character, but every once in a while…I mean, she couldn’t lively hop up onto a log at 70. So she would tell Alicia what she wanted to do and then she would FPR that moment.

Stephen: One of the things I want to interject here: Traditionally in filmmaking when you have a stunt, you would have a wider shot show the stunt, have a little bit of overlap of action and cut to the actor, right? So in this instance, we could actually create a stitch to the actor, and have it all one piece with their face planted on the stunt body so that we don’t force a cut.

It could be all in one shot. Now that’s pretty revolutionary.

Glenn: Mm-hmm.

Jim: We might find later that a cut works fine there, but the way we built these loads was to try not to force a cut any more than we had to, because that gave me wearing my director’s hat a lot of freedom for how I shot the scene.

Stephen: By the way, if all of this sounds complicated, it is.

All: (Laughter)

Stephen: I gotta say I don’t think there’s any more complicated way to make a movie than what we’ve done. And hats off to everyone involved because you’ll look at it on the screen and you don’t know how it’s done and you appreciate it for whatever it is, but those of us who’ve been through this process know pretty much there is no more complicated way to make a movie than this.

Jim: I just want to say though, that for every time that it’s a giant pain in the ass, there’s a time when you’re doing something so cool that you could never have done in live action or in normal photography or even normal effects. So it balances out. I mean, we get pretty excited and we get pretty far down our own rabbit hole, but I would say when we discover a new way of doing something and then it actually works, we get pretty excited about it.

Glenn: Yeah. And then talk to me about the 3D, because that was so organic looking, and 3D is typically very tricky because it can make you feel dizzy and it can give you headaches. And this just felt so integrated.

Jim: Well, we made an interesting decision. I mean, Avid supports 3D and you can run 3D on your monitors and all that. We made a decision not to mess with it in the cutting process, to literally just leave it out completely and have it be a parallel track. So what we did was we had a cleanup team in LA that we called Wheels.

Wheels is a department that does some motion smoothing and some cleanup on camera movement and so on. And at that time, that’s when they look at the stereo. We have a guy named Jeff Burdick, who’s vice president at Light Storm, but he’s been through all of the 3D right from the beginning, from when we first started experimenting with the cameras back in 1999.

And he was the one that supervised the 3D conversion for Titanic and for Terminator 2. And he’s been right there, with every moment of live action photography, watching in a little projection space that we call the Pod. And he looks in real-time at 3D projection of the live action scenes.

On set of 20th Century Studios’ AVATAR 2. Photo by Mark Fellman. © 2021 20th Century Studios. All Rights Reserved.

And he’ll call me on the radio if he thinks the interocular is off, or some foreground branches are screwing up a shot. But he also is what I call my “golden eyes,” the guy that goes through and sets the interocular for every single 3D shot. So by the time it gets to me and I’m looking at finished 3D shots or proposed finished shots coming back from Weta, it’s done.

I very rarely make adjustments at that point. So in a funny way, it was the least of my problems. Now I compose for 3D when I’m doing live action. I look at the 3D monitors. I double check my 3D at the time. But when I’m doing virtual production, which was about 80% of the movie, I don’t look at the 3D.

I compose for it in my mind. I’ve got a pretty good visual spatial imagination. And I’ll often say to the team while I’m doing a shot, “This’ll be good in 3D,” and sometimes even when we’re working together as editors I’ll say, “Let’s hold on this shot a little longer because this will be a good 3D moment.”

Because part of good 3D is about holding and trying to let two shots do the job of three on a normal movie. And our overall cutting rhythm is far below the industry average for an action movie. And we do that by design.

Glenn: And you’re doing all this without the 3D while you’re cutting?

Jim: Yeah, but I can see it in my head. But the quick answer to your question is that it’s institutionalized. It’s part of our culture that we’ve developed for over 22 years at this point, and everybody up and down the chain knows their 3D so well that there are no surprises for me as the director. And I’ve encouraged everybody to be more aggressive with the 3D on this film. We were very conservative on the first Avatar because we didn’t know; a two hour plus 3D movie had never been done.

Glenn: Mm-hmm.

Jim: We didn’t even know if people could hold the glasses on their noses that long.

Glenn: Yeah. that’s another thing. The movie’s over three hours long, but it felt like it was two max. It just flew by. The pace that you guys created was incredible.

Stephen: Look, Jim made a decision he was going to deliver a film that was approximately three hours long. And of course, as editors, we can make suggestions, but he really felt that in this day and age with all of the Marvel-like films…

And he has always made the point, when the studios say, “Well, you’re going to lose a whole show on a Friday or Saturday,” that the most successful films of all time have all had long running times. You know? And it didn’t hurt Avatar 1, which ran some two hours and thirty-eight minutes.

And he believed that today more than ever with the way people binge watch things, that it was not going to be an issue. And of course, as I said, we would suggest cuts and some of them he’d say “Yes,” and some of them he’d say “No.” And as Jim rightly has said many times, it’s not about length. It’s about engagement of the viewer.

Glenn: Relating to the characters.

Stephen: If they’re engaged in the characters and the story, you won’t feel the time.

Glenn: And there’s a lot of world building too with the whole island culture, you had to build all that.

Jim: Yeah. So I like engagement better than pace. A lot of people would say, “Well, it’s not about length, it’s about pace.” But pace implies a kind of rapid image replacement that has kind of a rhythm to it. And I think it’s more about engagement. And sometimes engagement can be holding on a shot and letting the eye explore.

But I think it has to do with the replacement of ideas. Like, ‘What’s your new idea?’ And then what are you doing with that idea? And then how does that idea recur? You know? So we spent a lot of time talking about this stuff after we’d screen the picture for ourselves, and then we’d have these day-long discussions about where it’s flagging and what can we radically do to just take some big chunk, just haul it out bleeding, and see if it still works.

And we did stuff like that. But here’s one interesting thing: If you ask people, “Where was a movie slow?” You’ve put the answer to the question in the question, which is: People are going to think of something as slow.

And so the answer we always got was there was ‘too much swimming around,’ too much underwater, too much exploring. And yet, if you ask anybody what they really loved the most from the movie, it was the swimming around. It was the underwater, it was the exploring, it was the creatures. So if you literally just acted on the notes, you’d have cut out the part that people liked the best.

And we found exactly the same thing in the first movie. The studio is adamant that we had to cut the flying in half. I said, “We’re not going to cut a frame of the damn flying! The flying is what people are going to come out of the theater talking about.” That feeling…It’s about lingering in a space. It’s about lingering in an emotional state. Right?

John: Yeah. I’ve got to give a lot of credit to Jim for insisting that certain things stay in there. We had a lot of discussions about, “What can we take out?” and the things that Jim insisted should stay in. And now I’m hearing back from people saying, “Well, that part was a little slow, but I loved it. I didn’t want to get out of that scene.”

Stephen: Yeah, there’s also the experience of being immersed in a cinematic experience in 3D that people I think are hungry for after years of being shut in and perhaps not having seen a movie on the big screen. The timing of this and people coming out, it’s almost like the rain stopped, the sun came out, and now they’re going to the cinema. I think it’s encouraging.

Glenn: Oh, absolutely. And there haven’t really been any 3D movies in a while, so I think people are really excited to experience that. And also just to experience Pandora again, but in a totally different way with the whole water angle, which is incredible.

Jim: It’s interesting, the 3D, I mean, for me, from an authoring standpoint, it’s 10% of my consciousness, if that. You know? I’m focused on cinemascope composition versus 16:9.

Here’s an interesting thing that nobody commented on with the first movie—and to date, nobody has commented on with the new movie: We literally released in two formats simultaneously, Scope and 16:9, based on the individual theater and the nature of its mechanized masking. This is absolutely true. We contact every single theater in North America and we find out: if they pull their masking open left and right for a CinemaScope frame, we give them a CinemaScope DCP because it’s more screen size.

If they pull their masking open vertically, or leave it fixed at 16:9, we give them a 16:9 DCP—which I happen to like better for 3D, but I don’t like it as well for 2D. We literally deliver a DCP to each theater based on how their masking is set up.

I don’t think anybody’s ever done that in history and we did it 13 years ago on the first film, and nobody commented. I mean, they literally were seeing two different movies in two different theaters and nobody commented. It’s the strangest damn thing.
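The per-theater logic he describes boils down to a simple decision rule; here’s a toy sketch of it, as an illustration only:

```python
# A toy sketch of the per-theater decision rule described above: side masking
# that opens horizontally gets the CinemaScope DCP; fixed or vertically opening
# 16:9 masking gets the 16:9 DCP. Illustration only, not Lightstorm's system.
def pick_dcp(masking: str) -> str:
    if masking == "opens horizontally":
        return "2.39:1 (CinemaScope) DCP"
    return "1.78:1 (16:9) DCP"

print(pick_dcp("opens horizontally"))  # 2.39:1 (CinemaScope) DCP
print(pick_dcp("fixed 16:9"))          # 1.78:1 (16:9) DCP
```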

Glenn: It’s great that you do that because we’ve all been to theaters where they don’t open up the screen and part of the image is cut off. And I’m sure that when you created the 16:9 and the scope version, you watched those to see how they translated.

Jim: Right. So we’re literally authoring every step of the way, along the way, for both aspect ratios.

Glenn: Smart.

Jim: The other thing we did: We set up our viewing system such that we had an additional—I think it was 6% of overscan available. So Weta was actually delivering an image that was a 6% overscan image. And then we could actually use that to slightly reposition the shot vertically or horizontally right up to the end.

Literally right up to days before the final delivery. I was doing little tiny readjustments of the frame just for headroom and for composition and for cut-ability and things like that. It’s easy enough to blow up shots, but you don’t normally come wider on a shot. But sometimes there’d be a little bit of a slightly bumpy cut, and I could zoom in on the A side of the cut and come slightly wider on the B side of the cut, and then all of a sudden it flowed beautifully.
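The overscan trick amounts to rendering a slightly larger frame and choosing where to crop the delivery window; here’s a minimal sketch of that idea using Pillow, as an illustration only:

```python
# A minimal sketch, using Pillow purely as an illustration, of how a ~6%
# overscanned render allows small reframes: render larger, then choose where
# to crop the delivery-sized window.
from PIL import Image

OVERSCAN = 1.06  # roughly the 6% extra image described above

def crop_delivery(frame, out_w, out_h, dx=0, dy=0):
    """Crop an out_w x out_h window from the overscanned frame,
    nudged by (dx, dy) pixels from center: a reposition, not a blow-up."""
    cx, cy = frame.width // 2, frame.height // 2
    left, top = cx - out_w // 2 + dx, cy - out_h // 2 + dy
    return frame.crop((left, top, left + out_w, top + out_h))

frame = Image.new("RGB", (int(4096 * OVERSCAN), int(2160 * OVERSCAN)))  # stand-in render
reframed = crop_delivery(frame, 4096, 2160, dy=-20)  # nudge up 20 px for headroom
```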

Glenn: Interesting.

Stephen: You’re getting not only Jim Cameron as editor, but as virtual camera operator, director, writer… You’re getting a lot of information that is not just editing here.

Glenn: Fantastic.

Director James Cameron behind the scenes of 20th Century Studios’ AVATAR: THE WAY OF WATER. Photo courtesy of Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

Stephen: And it’ll be a great insight to a lot of people to have a window into the process that drove this entire project. And that primarily exists in Jim’s head and I think—what, did we have 3,800 people working on this film? Something like that.

Jim: Yeah. 3,800 people. Not all at the same time. It definitely peaked in the last year with Weta having about 1200 to 1300 running. And that’s across the virtual production: Our lab in LA—which was 300 people, the live action crew of about 500 down here in Wellington, and then the editorial team, which was not insignificant. I mean, I think we had about 10 or 12 assistants at our peak.

John: We had about 22.

Stephen: Yeah. More.

Glenn: And with those assistants, how were they helping you guys? As far as this process?

John: You know, the assistants were always busy. There was so much work to do just from when the first batch of reference cameras came in. You had to organize them and prep them so that we could edit them and we could look at them, all the way through to the virtual cameras coming in.

They all had to be organized. And then when it comes time to send things to the lab, they had to turn what we as the editors did into something that the lab could understand.

And so that was a very time consuming process also. They’re doing capture, the editors are editing, and stuff is being turned over to the lab all at the same time.

Stephen: The other factor is that when a scene is ready to go to Weta Effects, everything has to be vetted. So if it’s a 10 character scene, every single piece of motion and face track has to be accounted for in the count sheets. So there’s a very, very complicated turnover process. And the assistants did an amazing job vetting every piece of motion and face.

And, you know, there were times where Jim would move lines around and he’d say, “Match what I do in the audio.” So we don’t want to find out when a shot comes back from Weta that the face track was not in sync with a line that got moved 10 frames, you know what I mean?

Jim: Sometimes I’ll just slip the audio, especially on a background character. And then the assistants will catch that there’ll be a sync mismatch, and they’ll move the full character, face and body, into sync with the line.

Because you see, the thing is, it’s like multi-tracking. So we can move people around in the background all day long, change their timing, change their spatial position, change their temporal position, and change foreground characters that we’re using as an out-of-focus wipe across the foreground to improve a cut.

So I’d be constantly firing off these render requests and then the assistants would take them to the lab, bring them back, put them back in the cut, as part of the finishing process. So it’s kind of hideously complex, but it also gives you all these amazing choices.

Director James Cameron behind the scenes of 20th Century Studios’ AVATAR 2. Photo by Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

Glenn: Yeah, you’re not locked in. A lot of times we try all kinds of tricks when we’re editing live action to get around that, but we don’t have the kind of capability you’re talking about.

Jim: So we had this really experienced team and we had this culture around how we did things. So obviously retention was really important. We wanted people to enjoy their working environment and enjoy what they were doing. We didn’t have much turnover.

And that’s across a five-year span, and everybody was busy as hell the entire time. It’s amazing how labor intensive the process is. But it was really critical for us, once people knew how to do this. And it usually took—what do you think, Steve? Six months to figure it out?

Stephen: I have to say, I remember when David came on and you [James Cameron] said, “You work with Steve and John and try to figure this whole thing out, because they’ve been there before and you’re probably not going to be of much use for six months.” And he picked it up very quickly, maybe in half the time. He was an expert; it was phenomenal.

And it’s a testament to his skill, how smart he was, how adaptive he was. But I also want to add, we had well over 20 assistants and VFX editors on the LA side, and we probably had six or so in New Zealand.

And we were able to work remotely from LA, because once the pandemic hit, we were working from home and remoting into either our studio in Manhattan Beach or into Wellington, New Zealand. We could work in either place, which I think is amazing. I mean, with the technology of our time now, I can only imagine the disaster this would’ve been if it had happened 10 or 15 years ago.

Jim: Five years ago.

Stephen: …Or even five. Yeah, right. But the fact that we could actually jump on each other’s Avids regardless of whether we were here or in New Zealand…We could communicate through the various softwares like Slack, Blue Jeans, Zoom, Evercast…It gave us the ability to jump in and review sequences together. The timing was perfect that we could actually continue working.

Jim: Remote controlling the Avid in LA from here—I’m in Wellington, New Zealand right now—and I could actually drive the Avid on the stage in LA, and I could direct scenes in LA—capture scenes—and I could run the Avid to play back for an actor in LA to do FPR.

So we wound up doing something kind of crazy. You know the normal ADR booth kind of setup where you hear “beep, beep, beep,” and then you talk?

Glenn: Sure.

Jim: We didn’t do that at all. We did a hundred percent of our ADR on this movie using what I call the “Hear it, say it,” method.

Glenn: Mm. A loop.

Jim: Like old style looping, right. I literally would just put an in and an out mark on the line I wanted the actor to say, and I’d push play in Wellington and the Avid would play back in the recording room in LA, and the actor would hear the line and say the line. That’s how we looped the entire movie.

Glenn: I think that style of looping is really effective.

Jim: And especially working with the kids, right? We had a seven-year-old, we had four or five teens, and these kids had never looped before in their lives. And I wasn’t going to put them through ‘Beep, beep, beep…Okay, now act!” All they had to do was hear how they said it, you know?

Glenn: And was there any lag with your AVID in LA from Auckland?

Jim: It was like a quarter second, half second, something like that. I couldn’t see sync in real time. I’d have to wait until they played it back for me. Like it would be out just enough that I literally couldn’t see it in real time. But that’s okay.

Glenn: One of the things I thought was so great about this movie is sometimes you watch action scenes and it just seems like it’s chaos. And here there was a clarity of action. There were really good peaks and valleys that kept your interest and you always had a point of view as to who you were following and made you really care about the action scenes.

Exclusive: Interview with James Cameron and the Avatar 2 Editorial Team 182
Jake Sully (Sam Worthington) in 20th Century Studios’ AVATAR: THE WAY OF WATER. Photo courtesy of 20th Century Studios. ©2022 20th Century Studios. All Rights Reserved.

Jim: Well, we were blessed on this movie that we had three really good action editors. The thing that I like about Steve and John and David is that they’re all highly humanistic, emotional, full spectrum editors, but they can all cut action. I like cutting action myself.

Sometimes I’d break off a sequence and do my own riff on it, but then I’d go back and I’d look at like, the akula chase that David cut and I’d go, ‘You know, he juxtaposed something there that I wouldn’t have thought of.’

Then I’d do like a mashup. I think we’re all pretty good action editors, and I call myself an ‘Axis Nazi.’ Stuff has got to be on axis for us to understand it spatially. And I think a lot of these big multi-camera shoots, they get a lot of cameras that are over the line and stuff is going this way and then that way and I lose the geographical relationship of what’s going on.

Glenn: Geography’s huge.

Jim: Yeah, I’m probably pretty pedantic old-school in terms of axis. Steve, you are pretty much as well, I think.

Stephen: I really do love the fact that you are as dedicated to the orientation of axis. I agree with you: There are many, many movies that will have action scenes that are, in my mind, incoherent. They may still be very entertaining. It may be more like a montage of action, and it still works in its own right.

But I’m a classic kind of action guy. I have a need to understand the geography of the action, and I really do love that you feel that way too; and I know John does as well, and certainly David did. We were all classical cutters in that sense, you know? And it’s great that we were all on the same page.

John: We always talk about, you know, “The axis is wrong. We’re going to have to fix this.” To me, I just want to know what’s going on. I don’t want to get blasted with a bunch of fast cuts where I have no idea what happened.

Glenn: Sure.

John: And it was great that Jim was concerned with that, and that was always a topic of discussion when we were doing cameras, or quality control cameras to set things up for Jim.

Stephen: It wasn’t just in the action, by the way. It wasn’t just in the action. It was any scene. We couldn’t violate the basic rules of axis. And as John said, we knew that if something crossed axis and it was confusing, that Jim was going to need that fixed. Interesting to point out as well, that the reference cameras—although they tried—there were moments where we had a closeup in the wrong axis.

Now, it didn’t mean that that performance could be ruled out. Because remember, it’s just a reference of what’s actually recorded in the volume.

So any shot, as I said, could be a close-up/medium/wide shot, but also could be shot from any axis. There were times in a performance edit where we would put a performance in and—just for the sake of making the performance edit play—we’d flop the shot to the correct axis. Knowing full well that didn’t have to be that way during the virtual photography process; it would just represent the shot from a different axis.

Jim: Here’s where it gets interesting. If John or Steve encountered a problem that they couldn’t solve with one of my virtual cameras, I encouraged them to go to Richie Baneham—who was doing the QC cameras ahead of me, but he was also doing his own kind of second unit stuff and just feeding shots into the movie.

They would say, “Well, hey Richie, can you just bang me off a shot that’s on this axis or is a little looser, a little tighter?” Or whatever. And so they would literally just be ordering shots and sticking them in the cut. So it kind of went beyond editing in the classic sense.

Glenn: Like you said, they did a little bit of directing as well.

Stephen: We could also fix motion by the way. So if we needed to cut out of a shot early, but we missed an arm movement or something that may cross a cut and make a cut smoother to blend it, we could work with Richie to advance the A side of the cut so that we could retain that action across the cut. It’s an editor’s dream to be able to fix motion across cuts, and we were able to do that.

Glenn: Wow.

Exclusive: Interview with James Cameron and the Avatar 2 Editorial Team 183
Behind the scenes of 20th Century Studios’ AVATAR 2. Photo by Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

Stephen: Because we could compress the time. Like, let’s say you take three seconds off the end of a shot, but it had something that bridged to the next shot. So you could actually pull up the tail of the shot three seconds or back up the motion from the incoming shot into the outgoing shot, and have them blend the motion to retain what you cut out.

Jim: You gotta think of every shot as an approximation of what it could be, right? You could slip the sync of the camera, move relative to the action of the characters. Or you could slip the sync of an individual character relative to the other characters in the shot so that they completed an action such that it cut to the next shot as a proper flow. So there’s a lot of flexibility. You never lose a take because it’s out of focus, because it’ll always be in focus. You never lose a take because of something that happened that you don’t like, you just fix it. So when you have infinite choices, it really forces you to think, ‘All right, why do I want this shot?’ It kind of forces you to really think editorially in terms of the narrative. Like, ‘What is the shot doing? What is this shot doing that another shot doesn’t do quite as well? And if I really like this shot and I’m willing to build a structure around this shot, what am I sacrificing on either side of this shot by committing to this particular angle here?’

You run through this endless kind of ‘program’ in your mind of what is best for the scene, what is telling the story most clearly.

Because, you know, picture editing on a live action picture, I think is a process of, all right, you start by throwing out the bad performance takes. Then you start by throwing out the out-of-focus frame ranges. Then you start by throwing out the stuff where the extra dropped the tray in the background, or some stupid bullshit happened, and you get down to a relatively limited number of ways to try to tell your scene. And we don’t have any of that. So it forces us to be a lot more analytical about our narrative process. You keep mentioning kind of animation, and I think what you mean by that is an animator can do anything. They can work from thumbnails, from storyboards, from previs. You’ve got an infinite number of choices, and that’s kind of a problem and a blessing at the same time, but it forces you to be really analytical about your storytelling.

Glenn: Yeah.

Stephen: Yeah, there are no limits here. There’s nothing that has to be thrown out. It can be manipulated. So with the infinite possibility, to maximize the storytelling, the characters, the performances…You begin to understand why it takes so long to make a movie like this.

Jim: I think we go into it with open eyes, knowing how complex it is, knowing what the possibilities are, and believing that we can achieve something that’s dreamlike on the screen, that puts the audience into a state of cognitive dissonance where they’re looking at something that cannot possibly be real, and yet they can’t figure out why it looks so real. So that’s a cognitive dissonance. They’re looking at something truly impossible. Our characters have eyeballs that are volumetrically four times the size of a human eyeball.

It’s impossible and yet you see it. And there’s something compelling and empathetically triggering about that. And they have cat ears, and cat tails, and these lithe bodies and it’s not achievable any other way.

You couldn’t do it with makeup. You couldn’t do it with prosthetics. There’s no live action way to do it. If you did it with conventional animation, it would feel animated and your cognitive dissonance wouldn’t be triggered. So we exist in this narrow bandwidth where you’ve got human actors performing characters that are humanoid enough to justify that, and yet different enough from human to justify all of this process.

And we ask ourselves if it’s worth it, and then we put it up on the screen and we go, “Holy shit, we just did that.”

All: (Laughter)

Jim: And then we all know the answer.

Stephen: Yeah. We’re all crazy to embark on this. But it is extraordinary. And one thing I just want to go back to is: One of the most exciting things for me is to look at the final render and see every nuance, every eye movement, every little facial movement and expression of these actors—that we picked years ago—come back in full render. And every detail is evident on the screen. And to see that come back, all that full expression in every actor’s face that we saw years ago, is extremely rewarding.

And I just want to say that it is heartbreaking to me that David Brenner did not live to see all of this hard work come to fruition. And I’ve got to believe that somewhere, maybe he’s got the best seat in the house, but I really hope he can see it somehow.

Glenn: Yeah.

Jim: IMAX is brighter and bigger wherever he is watching from. Yeah. That’s kind of one of the heartbreaking aspects: He never got to complete the cycle and see, because every time these shots come in, it’s like Christmas morning. Suddenly seeing these characters popping to life in just shot after shot after shot.

And you know, as rigorous as our process is, we kind of live for that moment. I remember John particularly was a Nazi when we were working on the set, like, “Now don’t forget this little thing that Jake does here.” And I wouldn’t see it, I wouldn’t see it on my virtual camera because the image is too coarse.

As good as it is, it’s still coarse when it comes to very, very fine expression. But you know, all the editors really helped me through that process of really remembering what the moment was. And that all culminates when we see those final renders coming in from Weta.

John: It’s very satisfying. It is. It’s a lot of hard work, but when you see the final product, you go, “All right, that’s good stuff.”

Stephen: It’s amazing. It’s an amazing transformation between template and final render.

Exclusive: Interview with James Cameron and the Avatar 2 Editorial Team 184
(L-R): Spider and Director James Cameron behind the scenes of 20th Century Studios’ AVATAR: THE WAY OF WATER. Photo courtesy of Mark Fellman. © 2022 20th Century Studios. All Rights Reserved.

Jim: Well, I always say about Weta: First of all, that it’s extraordinary what they’re able to do with their toolset and their people. They’re unparalleled in the world. There’s a moment where we throw it over the fence to them, and then they start their magic.

And I always say excellence in yields excellence out. It’s like the garbage in, garbage out thing. But the corollary of that is if we put into them the maximum amount of excellence that we can muster, then they’re starting their process already at a very high level. And then what comes back from them is that much more extraordinary as a result.

Glenn: Yeah. That’s fantastic.

Stephen: Just want to give one last shout out to our amazing editorial crew, both in LA and New Zealand. And our additional editors—the next generation of Avatar editors: Jason Gaudio and Ian Silverstein. We work so wonderfully as a collaborative team, and we can never thank our assistants enough for the huge undertaking in helping us to get this film to the screen.

Glenn: Well, I love talking to you guys. This was fascinating. People are just going to be blown away by what you guys just described, and I love the movie. I love the cutting. And thank you so much for your time.

Jim: All right. Thanks, Glenn.

Stephen: Thanks.

Jim: I’m glad that we got to talk in detail.

 

 

]]>
https://www.provideocoalition.com/exclusive-interview-with-james-cameron-and-the-avatar-2-editorial-team/feed/ 1
AI Tools Part 1: Why We Need Them https://www.provideocoalition.com/ai-tools-part-1/ https://www.provideocoalition.com/ai-tools-part-1/#comments Wed, 18 Jan 2023 04:54:01 +0000 https://www.provideocoalition.com/?p=261729 Read More... from AI Tools Part 1: Why We Need Them

]]>
AI Tools? Aren’t you already exhausted from hearing “AI-this” and “AI-that” everywhere? We’re all constantly blasted with crazy AI generated fantasy images in our social media feeds these days (yes, I’m personally responsible for some of that… sorry!) but where is all this rapidly-growing AI technology really going? Is it just another fad? What is there besides just making crazy fantasy images and people with too many fingers?

Read on…

AI Tools Part 1: Why We Need Them 258

AI has been evolving for years already, but haven’t we all seen rapid growth recently? We now have access to AI image generation, text and content generation, AI voice generation and image/video enhancements. Where will it go and how will it affect our viability as content creators? Where does AI pull its resource material from in the machine learning models? Can the resulting works be copyrighted or used commercially? Why is AI viewed as such a threat to some artists and creative writers? What is the potential for ethical and IP infringement cases?

It’s a heated topic full of questions and speculation at the moment.

I was originally going to write only about AI image generation to follow up on my AI Photo Enhancement article last year, but with new AI technologies emerging almost monthly, I decided to break this out and start with an overview today, then follow up with deep dives into the various AIs as they appear. As I create subsequent articles on each particular AI topic, I’ll update this article to serve as an ongoing reference portal of sorts. This ongoing series will help us get under the hood to better understand what AI is, find the positives in this rapidly-evolving technology, and see how we might use it to our benefit as content creators, editors, animators and producers.

AI Tools Part 1: Why We Need Them 259
DAMN THESE HANDS!

AI descriptive image generation is still pretty far from being accurate in its depiction of certain details, although people and animals are looking much better overall than they did a year ago. We’ve seen a vast improvement in how AI depicts humans compared to only 6 months ago, but still, TOO DAMN MANY FINGERS!!

Voice AI is improving by leaps and bounds, and text generation has become pretty impressive recently. Everything right now is still in dev/beta, so we’re really just getting a tiny peek behind the curtain of what is possible.

You’re going to read arguments online that AI is taking away our jobs, in one way or another, and that may eventually be partially true. But not anytime soon. It’s another tool, like photography was in the beginning; as computer graphics and desktop publishing were to traditional layout, as NLEs and DAWs were to the video, film and audio industries, and as 2D/3D animation has been to traditional cel animation. Technology evolves and so must the artist.

But what exactly are the REAL issues creative people are concerned about regarding the future of AI? Is its resourcing legal and ethical? Is it going to replace our jobs as creatives?

AI Tools Part 1: Why We Need Them 260

 

    • AI will replace photographers
    • AI will replace image retouchers
    • AI will replace illustrators and graphic designers
    • AI will replace fine artists
    • AI will replace creative writers
    • AI will replace animators and VFX artists
    • AI will replace Voice Over artists
    • AI will replace music composers

 

But will they really? Or will they simply make these roles better and more efficient?

As Kevin Kelly, Senior Editor at Wired, recently wrote in his article Picture Limitless Creativity at Your Fingertips:

“AT ITS BIRTH, every new technology ignites a Tech Panic Cycle. There are seven phases:

  1. Don’t bother me with this nonsense. It will never work.
  2. OK, it is happening, but it’s dangerous, ’cause it doesn’t work well.
  3. Wait, it works too well. We need to hobble it. Do something!
  4. This stuff is so powerful that it’s not fair to those without access to it.
  5. Now it’s everywhere, and there is no way to escape it. Not fair.
  6. I am going to give it up. For a month.
  7. Let’s focus on the real problem—which is the next current thing.

Today, in the case of AI image generators, an emerging band of very tech-savvy artists and photographers are working out of a Level 3 panic. In a reactive, third-person, hypothetical way, they fear other people (but never themselves) might lose their jobs. Getty Images, the premier agency selling stock photos and illustrations for design and editorial use, has already banned AI-generated images; certain artists who post their work on DeviantArt have demanded a similar ban. There are well-intentioned demands to identify AI art with a label and to segregate it from “real” art.”

While those fears may be valid for some industries, it’s time to take a closer look at how AI can actually empower us in our work instead of replacing us entirely. In this article we’ll explore one of the ways AI has changed the game for professionals like ourselves: by becoming an indispensable tool and invaluable partner when tackling complex projects. By understanding what AI can do for us now and in the future, we’ll be able to capitalize on its advantages while still preserving our singularly human skills and creativity.

UPDATE: Be sure to check out Iain Anderson’s article from October 2022 about Artificial Intelligence in Video Production

So let’s look at what all this AI technology really means to existing artists, photographers, video/film productions, audio producers and actors before throwing stones. (I won’t get into a huge legal/ethical discussion here, as that’s a completely different conversation.) I’m only going to share what I’ve discovered in the AI communities and my practical application of some of it so far. And I’m going to miss a LOT of tools, so this is NOT an exhaustive list of everything out there currently!

It’s only just the beginning with these tools – and they’re only that. TOOLS.

They’re not going away and yes, YOU WILL BE ASSIMILATED! 😉

AI Tools Part 1: Why We Need Them 261

So, What is AI?

Well, let’s ask it and see if it can tell us!

I used two different textual content generators to provide answers to these questions; ChatGPT and Jasper. I’ll discuss more on these AI tools later in this overview article.

I first asked ChatGPT “How does AI work?” The response was:

AI Tools Part 1: Why We Need Them 262

That’s a great formal response for AI technology in general, but what about Image generation models specifically?

I queried ChatGPT again and asked, “How do image generating AI tools like midjourney and dall-e work?”

“Both Midjourney and DALL-E use Generative Adversarial Networks (GANs) to generate images. GANs consist of two neural networks: a generator and a discriminator. The generator produces new images, while the discriminator attempts to distinguish the generated images from real images. The generator and discriminator are trained together, with the generator trying to create images that can fool the discriminator, and the discriminator trying to correctly identify real and generated images.

Midjourney uses GANs to generate images of specific objects or scenes, while DALL-E generates images based on textual descriptions. DALL-E uses a transformer-based neural network architecture, which allows it to understand the meaning of text and generate images that match the description.

In summary, both Midjourney and DALL-E use GANs to generate images, but Midjourney uses GANs to generate specific images, and DALL-E uses a transformer-based neural network to generate images based on textual descriptions.”
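For the code-curious, here’s a minimal toy sketch of that generator/discriminator loop in PyTorch. To be clear, this is only an illustration of the GAN concept ChatGPT describes, not how Midjourney or DALL-E are actually implemented; the tiny networks and made-up 2-D “images” are purely for demonstration.

    import torch
    import torch.nn as nn

    latent_dim, data_dim = 8, 2   # toy sizes; real image models are vastly larger

    generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
    discriminator = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(1000):
        real = torch.randn(64, data_dim) * 0.5 + 2.0     # stand-in for a batch of real training images
        fake = generator(torch.randn(64, latent_dim))    # the generator invents new samples from noise

        # Discriminator: learn to label real samples 1 and generated samples 0
        d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
                 bce(discriminator(fake.detach()), torch.zeros(64, 1))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator: try to fool the (just-updated) discriminator into saying 1
        g_loss = bce(discriminator(fake), torch.ones(64, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()

The tug-of-war between the two networks is the whole trick: as the discriminator gets harder to fool, the generator is forced to produce more convincing output.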

And again, for audio, VO from TTS: How do TTS AI tools generate different voices from text in different languages?

“Text-to-speech (TTS) AI tools use a combination of machine learning algorithms and recorded speech samples to generate speech from text. For different languages, the TTS model is trained on a dataset of speech samples in that language. This dataset is used to learn the nuances and characteristics of how words and sounds are pronounced in that language. Additionally, the TTS model also takes into account the grammatical and syntactical rules of the language to ensure that the generated speech is grammatically correct. Once the TTS model is trained, it can be used to generate speech in the target language by synthesizing the sounds and words based on the rules and patterns learned from the training dataset.”

That’s all good technically, but what if we ask AI to break it down into layman’s terms? In layman’s terminology, please describe how AI tools generate images and text.

ChatGPT replied:

AI Tools Part 1: Why We Need Them 263

Jasper’s response to the same prompt:

#1 response [SIC]:

“AI text and AI imaging technologies have enabled machines to create content without the need for human creativity. AI tools are able to generate images and text based on collected data and pre-programmed artistic information. The AI tool assesses the data and visuals in its own database, recognising certain colour palettes or visual elements, before crafting an image or piece of text that best satisfies the criteria identified as part of its programming. AI technology can also be used to make predictions about what type of material could be created by a person in a particular context, using algorithms that analyse huge amounts of data. This AI-generated material can then be customised by humans to suit their specific needs.”

#2 response:

“AI tools have revolutionized the ability to generate images and text with relative ease. AI text generation technology combines a cutting-edge neural network architecture with natural language processing algorithms to enable AI to generate text that mimics human writing. AI imaging capabilities enable AI tools to generate realistic and high-quality pictures from photos, drawings, sketches and other visual artistry. AI is also able to guess what it sees and understand context, allowing AI models to construct new images that are based on old ones. This AI-generated creative content has opened up whole new avenues in terms of possibilities in both design and communication.”

Note that these were just really simple one line questions with no keyword prompting or details – and no post-editing of the raw text output. I’ll share more in the AI Text Generation section below.

I’d like to dig a bit deeper in my upcoming research to better explain HOW these tools actually gather their source library resources and convert those into images, text and audio files. We’ll get into that further in subsequent articles in the coming months, but for now, let’s take a look at a few examples of some commercially-available tools and others that are still in beta today.

—–            —–            —–            —–            —–            —–            —–

Examples of various emerging AI Technologies for the creative industry

AI is already being used under the hood for almost every digital aspect of our daily lives, from voice recognition to facial recognition, push-marketing algorithms, location identification and much more. So what are some (but obviously, NOT ALL) examples of how it’s being developed for creative producers? How can we eventually leverage AI technology to our advantage and produce better-quality content in less time? We’ll start by taking a look at AI enhancement tools for images, audio and video. Then on to the AI image, music and text generation technologies.

UPDATED: Here’s a more complete list of AI Tools that’s updated regularly.

AI Image Enhancement and Scaling Tools

With Adobe embracing more AI in their apps and Neural Filters these days, I’m inclined to think of it as an efficiency tool for producing better results in less time. Other apps out there, such as Remini Web, which I covered in my AI Photo Enhancement article last fall, are continually improving with amazing results. I have personally benefited from using this software on a feature doc with over 1000 images, helping restore clarity to poor-quality prints and scans. Adobe is quickly adding retouching and facial enhancement tools to their Neural Filters that are actually impressive, and I can imagine they will only continue to get better.

Topaz AI

AI Tools Part 1: Why We Need Them 264

Topaz Labs AI has featured several photography enhancement products in their lineup, including DeNoise AI, Sharpen AI and Gigapixel AI. They’ve also just released an all-inclusive single product called Photo AI, which seems to have replaced Topaz Studio. I haven’t yet tried it out, but it looks pretty straightforward and includes all the features of the other standalone products, including Mask AI which is no longer listed on their website.

My go-to has been Gigapixel AI because most of the work I need it for is in upscaling and noise removal/sharpening and sometimes, facial enhancement if needed on older, low-res photos. I must use it at least once a week with some of the images that come to me for retouching or compositing.

Their face recovery is getting better as, I’m sure, their resource network is being refined for the AI modeling. I’ll have to try some older images in my next article to compare against last year’s results.

AI Tools Part 1: Why We Need Them 265

I’ve yet to find another piece of software that handles the complexity of image elements such as hair, clothing, feathers, and natural and artificial surfaces and textures while upscaling 400% or more, removing JPG compression artifacts, and sharpening at the same time. It’s really quite remarkable, and because of machine learning, it’s only getting better over time.

AI Tools Part 1: Why We Need Them 266

This example of singer/actress Joyce Bryant is a worst-case scenario just for testing purposes. She was a gorgeous woman and I wanted to see just how well the software would hold up, starting with an image this low-res at 293×420 px. This was featured in my article last year on photo enhancement software.

AI Tools Part 1: Why We Need Them 267

Really though, the best examples are on their website with interactive sliders on the before/after images.

As with many other tools in this article, I will dig deeper into Photo AI and provide more detailed analysis of the software in a future article.

Remini web

AI Tools Part 1: Why We Need Them 268

Remini Web was another tool that I featured in my article last year on photo enhancement software for documentaries. I go into a lot more detail there, but so far I’ve processed over 1100 old photos for a feature doc we’ve been working on the past couple years, and this tool has brought so many images back to life!

Here’s an example of an image that was originally only 332×272 px and was upscaled 400% in Gigapixel (without face recovery) to a more useable 1328×1088 px. Then I ran that image through Remini Web and the results were astonishing.

AI Tools Part 1: Why We Need Them 269

As was this scan of an old print of my Mom’s senior photo from the early 1950s:

AI Tools Part 1: Why We Need Them 270

And of course, an example from the public domain: a historical photo of Abraham Lincoln, retouched and run through Remini Web:

AI Tools Part 1: Why We Need Them 271

Remini Web is still my favorite AI facial enhancement tool to date, but the list of options is growing rapidly!

Adobe Neural Filters

AI Tools Part 1: Why We Need Them 272

Located under the Filters tab in Adobe Photoshop, Neural Filters are a fun and easy way to create compelling adjustments and speed up your image editing workflows. Powered by artificial intelligence and machine learning engine Adobe Sensei, Neural Filters use algorithms to generate new pixels in your photos. This allows you to add nondestructive edits and explore new creative ideas quickly, all while keeping the original image intact.

The different types of Neural Filters (Featured and Beta)

Once you open your image in Photoshop, there are several featured filters ready for you to use. Choose one to enhance your shot or try them all and see what works best for you.

Smart Portrait

The Smart Portrait filter simplifies complex portrait editing workflows into a few simple steps, adjusting portraits creatively by generating new elements for Happiness, Surprise, Anger, Facial Age, Gaze, Hair Thickness, Head Direction, and Light Direction.

AI Tools Part 1: Why We Need Them 273

Smooth it over with Skin Smoothing.

Brush and touch up your subjects’ skin effortlessly with the Skin Smoothing filter. Simple sliders for Smoothness and Blur allow you to remove tattoos, freckles, scars, and other elements on faces and skin in an instant.

Excavate unwanted items with JPEG Artifacts Removal.

The more times you save a JPEG file, the more likely your image will look fuzzy or pixelated. You may see artifacting (obvious visual anomalies) due to the compression algorithms used to reduce the file size. Reverse the process with this filter and fine-tune it by adjusting the edge of your image with either a high, medium, or low level of blur.

Switch it up with Style Transfer.

Just like it sounds, this filter allows you to take the look of one image — the color, hue, or saturation — and put it on another. Sliders for Style Strength, Brush Size, and Blur Background as well as checkboxes like Preserve Color and Focus Subject let you customize how much of the look your picture ends up with.

Super Zoom

Zoom in closely on a subject while keeping details sharp. Enhance facial features, reduce noise, and remove compressed artifacts to let your subject — whatever it is — shine through in extreme close-up.

Colorize

Quickly convert black-and-white photos to eye-popping color in a flash. Designate which colors you want to appear in your capture, and Adobe Sensei will automatically fill the image. Focus points let you add more color to specific areas to fine-tune the filter.

AI Tools Part 1: Why We Need Them 274

There are several other beta filters if you want to experiment with them as well. I haven’t had a chance to actually demo them as of this article, but I will in an upcoming article on this category.

Makeup Transfer

Apply the same makeup settings to various faces with this useful tool. Add new makeup to a photo or completely change your model’s current makeup in post-production to get exactly the right look for your shot.

AI Tools Part 1: Why We Need Them 275

Harmonization

Match the look of one Photoshop layer to another for natural-looking photo compositing. The Harmonization filter looks at your reference image and applies its color and tone to any layer. Adjust the sliders, add a mask, and enjoy the color harmony.

AI Tools Part 1: Why We Need Them 276

Photo Restoration

Need to restore old photos? Try the new Photo Restoration Neural filter, powered by Artificial Intelligence, to fix your old family photos or restore prints in Photoshop.

AI Tools Part 1: Why We Need Them 277

NOTE:

Even though I’ve already dedicated a previous article to this AI tech, I will be expanding my findings further in a new, updated article in the coming months. Stay tuned.

—–            —–            —–            —–            —–            —–            —–

AI Video & Audio Enhancement Tools

AI video and audio enhancement tools can help video producers and editors in many ways: video stabilization can smooth out shaky footage, AI-powered noise reduction can remove background noise from audio tracks, AI-powered color correction can automatically adjust the color balance and brightness of a video, and AI-powered object tracking can automatically follow a moving object in a video. Additionally, AI-powered video compression can reduce the file size of a video while maintaining its quality. These tools can save time and improve the overall quality of the final product.

Topaz Labs Video AI

AI Tools Part 1: Why We Need Them 278

Topaz Labs Video AI uses a combination of AI-based techniques and algorithms to improve the quality of videos. The software applies techniques such as deinterlacing, upscaling, and motion interpolation to the footage.

Deinterlacing is used to remove interlacing artifacts from videos that have been recorded in interlaced format. It does this by analyzing the video frame by frame and creating new frames by merging the information from the interlaced fields.

Upscaling is used to increase the resolution of a video. This is done by using AI algorithms to analyze the video and add more pixels to the image, while maintaining the integrity of the original image.

Motion interpolation is used to add more frames to a video. This is done by analyzing the motion in the video and creating new frames that are in between the existing frames. This results in a smoother and more fluid video.

AI Tools Part 1: Why We Need Them 279

The software also utilizes the latest hardware acceleration technologies to speed up the processing times, which allows you to enhance videos with minimal wait time.

Overall, Topaz Labs Video AI uses advanced AI algorithms to improve the quality of videos, by removing artifacts, increasing resolution, and adding more frames for a smoother and more natural video.
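Topaz’s models are proprietary, but if you want a rough feel for those same three operations, the open-source ffmpeg has conventional (non-AI) filters for each of them: yadif for deinterlacing, scale for upscaling, and minterpolate for motion-interpolated frame rates. Here’s a quick sketch driving it from Python; it assumes ffmpeg is installed, the file names are placeholders, and the quality won’t match a trained AI model – but the pipeline is the same idea.

    import subprocess

    # Deinterlace (yadif), upscale to 1080p (Lanczos), then motion-interpolate to 60 fps.
    filters = "yadif=mode=1,scale=1920:1080:flags=lanczos,minterpolate=fps=60:mi_mode=mci"

    subprocess.run([
        "ffmpeg", "-i", "old_camcorder_tape.mp4",
        "-vf", filters,
        "-c:v", "libx264", "-crf", "18",   # near-visually-lossless H.264
        "-c:a", "copy",                    # leave the audio untouched
        "enhanced_60p.mp4",
    ], check=True)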

This example is a before/after screenshot from the video posted below. The original was standard definition from a compact VHS camcorder with interlacing, that was upres’d to HD and deinterlaced, as well as AI enhanced. The results are pretty remarkable.

AI Tools Part 1: Why We Need Them 280

This is a great demo video from Denver Riddle at Color Grading Central on YouTube that shows some various applications with older video footage. If you’re working on documentaries and need to put in old video archival footage, this is an invaluable tool that you MUST have in your toolbox! We’ve been using it for a couple years now on a feature doc that’s 4K and have been very impressed with what it’s done for old video and scanned film footage.

 

Adobe Podcast (web beta)

AI Tools Part 1: Why We Need Them 281

Adobe currently has an online tool in beta to quickly re-master your voice recordings from any source to sound better. Previously called Project Shasta, the new Adobe Podcast speech enhancement makes voice recordings sound as if they were recorded in a professional studio.

I’ve actually tried it on a VO recording we received that was professionally done, but still had some boomy artifacts from the small booth they recorded in at their home studio. I ran it through Adobe Podcast (drag and drop) and it spit out a clean version I could put into my animation project in Premiere, then apply some mastering filters to really bring it to life without any artifacts.

To give a better idea of the effectiveness of this tool, I’m going to share with you a pretty thorough comparison video from PiXimperfect on YouTube (Subscribe to his channel for tons of great tips in all things Adobe too).

NOTE:

As with all of these tools, I’ll be exploring more and digging deeper in subsequent articles, but I really think that Adobe (and others) will be folding these technologies into their desktop and mobile apps before long!

—–            —–            —–            —–            —–            —–            —–

AI Image Generation Tools

OK, so currently the big controversial hubbub is this category right here: AI image generation, creating “art” from text prompts. As I stated at the top of the article, I’m not going to debate in this segment whether or not it’s art, whether it’s ethical or stealing other people’s IP, or whether I think it’s going to put people out of work. The jury is out on all that anyway, and as far as I can see, the results are still quite mixed in their ultimate usefulness, so for me it’s just a visual playground to date. I can run most of mine from my iPhone while relaxing or waiting for video renders and uploads, and post my unedited crazy results on my social media to stir up shit… and I do that a lot. 😉

I’m only featuring a couple of the most popular AI generators for this article, as I will most likely be expanding on this segment in greater detail in next month’s article, where we’ll dig deep into how each one works and the many ancillary apps that access this technology.

Midjourney

AI Tools Part 1: Why We Need Them 282

Midjourney is an independent research lab that produces a proprietary artificial intelligence program that creates images from textual descriptions. The company was founded by David Holz, who is also a co-founder of Leap Motion. The AI program, also called Midjourney, is designed to create images based on textual input, similar to OpenAI’s DALL-E and Stable Diffusion. The tool is currently in open beta, which it entered on July 12, 2022. The company is already profitable, according to Holz, who announced this in an interview with The Register in August 2022. Users can create artwork with Midjourney by using Discord bot commands.

Here’s a link to the Wikipedia page with more info about the development of this AI tool that’s gone from beta to an incredible AI image generation tool in 6 months!

You’ll notice that there isn’t really a lot of commercial visibility like a fancy home page or even an app. The only way to use Midjourney technology is through Discord (online or mobile app) and you’ll need some kind of paid account after your trial period is over. But you can sign up for the beta program through their main web portal.

The Midjourney beta produced fairly good images of people overall, but tended to render the eyes weirdly, requiring further editing to resolve. I’ve found that by taking the image into Photoshop, blurring the facial details a bit, and running it back through Remini Web, I can regenerate a usable portrait image if needed.

AI Tools Part 1: Why We Need Them 283

Here’s an update of what this same prompt generated now in Midjourney v4. It’s amazing what only 6 months has done in the development of this technology so far!

AI Tools Part 1: Why We Need Them 284

This prompt was for “beautiful happy women of various races ethnicity in a group” – from the MJ beta in August 2022 to the MJ v4 image done today:

AI Tools Part 1: Why We Need Them 285

AI Tools Part 1: Why We Need Them 286

Also recently, I prompted for several expressions of Abraham Lincoln, based on his original portrait image that I retouched above with Remini Web. These were straight out of Midjourney v4. A few of them look like a crazy hobo. 😀

AI Tools Part 1: Why We Need Them 287

But I’ve found that with Midjourney v4, sometimes the simplest prompts deliver the most rewarding and spectacular results!

This is what “Star Wars as directed by Wes Anderson cinematic film --v 4” produced:

AI Tools Part 1: Why We Need Them 288

And this was based on the prompt “portrait of the most interesting man in the world --v 4”, replacing the word “man” with various animals. Nothing further was done to any of these images individually:

AI Tools Part 1: Why We Need Them 289

And we couldn’t do without “Walter White cooking in the kitchen on set of Baking Bad tv show --v 4” LET’S COOK!

AI Tools Part 1: Why We Need Them 290

The one thing even Midjourney v4 still can’t replicate well is HANDS! The prompt for this was “two human hands giving a high five photo --v 4”

AI Tools Part 1: Why We Need Them 291

There’s really so much more to go over in depth with Midjourney, and in the coming months, it may be many more light years ahead.

(We’re still waiting on normal hands!!) 😀

 

DALL-E 2

AI Tools Part 1: Why We Need Them 292

DALL-E 2 is a variant of the DALL-E model (from OpenAI – who also created ChatGPT), which is a deep learning model that uses a transformer architecture to generate images from text descriptions. DALL-E 2 is trained on a dataset of images and their associated captions, and is able to generate new images by combining the features of multiple images based on a given text prompt. The model is able to generate a wide variety of images, from photorealistic to highly stylized, depending on the text prompt provided. It also can perform image-to-text and text-to-image tasks. Even very simple text prompts can deliver good results.
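DALL-E 2 can also be reached programmatically. Here’s a minimal sketch assuming the openai Python package (the pre-1.0 API as it exists at the time of writing) and an API key stored in the OPENAI_API_KEY environment variable; the prompt itself is just an example.

    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    # Ask DALL-E 2 for a single 1024x1024 image from a plain-language prompt.
    response = openai.Image.create(
        prompt="a photorealistic portrait of a 1950s jazz singer, studio lighting",
        n=1,
        size="1024x1024",
    )

    print(response["data"][0]["url"])   # temporary URL for downloading the generated image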

AI Tools Part 1: Why We Need Them 293

Most portrait shots of people come out fairly well at first glance in DALL-E 2. As with all AI-generated imagery, further retouching is almost always necessary to produce anything usable.

AI Tools Part 1: Why We Need Them 294

AI Tools Part 1: Why We Need Them 295

AI Tools Part 1: Why We Need Them 296

I have found that DALL-E 2 may produce some decent portrait shots and other kinds of basic artwork designs, but it doesn’t have the richness and full environments that Midjourney outputs. For portraits, it does a fairly good job, but it seems to fail primarily in facial details like the eyes. Just like with my earlier Midjourney examples, though, I’ve found that taking the image into Photoshop, blurring the facial details a bit, and running it back through Remini Web can regenerate a usable portrait image if needed.

It also doesn’t seem to be able to replicate famous celebrities well, as I can never get it to produce a recognizable image. It looks like it could be SOMEBODY, but not who you’ve prompted for!

AI Tools Part 1: Why We Need Them 297

It’s still fun to see what DALL-E 2 can create, and of course when you discover advanced prompting, you can fine-tune your results even further.

Stable Diffusion

AI Tools Part 1: Why We Need Them 298

Stable Diffusion is a deep learning, text-to-image model released in 2022. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.

Under the hood, Stable Diffusion is a latent diffusion model. Rather than working directly on full-resolution pixels, it uses a variational autoencoder to compress images into a smaller latent space, trains a U-Net to gradually remove noise from those latents, and conditions that denoising on text embeddings from a CLIP-style text encoder. Working in that compressed latent space is what keeps memory requirements low enough for the model to run on consumer GPUs.

Unlike DALL-E 2 and Midjourney, Stable Diffusion’s model weights are openly released, so it can be run locally, hosted on your own server, or fine-tuned on a smaller dataset for a specific style or subject. You can also use Hugging Face’s libraries and hosted API to access the model for your own projects.
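Here’s what that looks like in practice with Hugging Face’s diffusers package; a minimal sketch assuming a CUDA-capable GPU and the runwayml/stable-diffusion-v1-5 checkpoint (model names and defaults change quickly, so treat it as a starting point rather than gospel):

    import torch
    from diffusers import StableDiffusionPipeline

    # Download the public checkpoint and move it to the GPU (half precision saves VRAM).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe(
        "Georges Melies' Trip to the Moon reimagined with the Apollo 13 capsule, vintage film still",
        num_inference_steps=30,   # more denoising steps: slower, usually cleaner
        guidance_scale=7.5,       # how strongly the image should follow the prompt
    ).images[0]

    image.save("trip_to_the_moon.png")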

I only spent a little time with Stable Diffusion when it was first in beta on Discord, but there are several new mobile apps and others running it on their own servers now as well. The prompt for this image was Georges Méliès Trip to the Moon with Apollo 13:

AI Tools Part 1: Why We Need Them 299

More recent tests in the Stable Diffusion Playground, using prompts I’ve tried in other AI generators like DALL-E 2 and Midjourney, produced better results than previous versions:

AI Tools Part 1: Why We Need Them 300

AI Tools Part 1: Why We Need Them 301

AI Tools Part 1: Why We Need Them 302

It certainly does a better job with celebrity faces than DALL-E 2, and with some proper negative prompting, they could be refined much further!

But like all the AI generators out there, hands still have major problems! Even with some specific negative prompts, it’s a big fail. 😛

AI Tools Part 1: Why We Need Them 303

What is Negative Prompt in Stable Diffusion?

A negative prompt is an argument that instructs the Stable Diffusion model not to include certain things in the generated image. This powerful feature allows users to remove objects, styles, or abnormalities from the generated image. Though Stable Diffusion takes input known as prompts in the form of human language, it has difficulty understanding negative words such as “no”, “not”, “except” and “without”. Hence, you need to use negative prompting to gain full control over your prompts.
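If you’re generating images with the diffusers pipeline shown earlier in the Stable Diffusion section, a negative prompt is literally just one more argument; the negative terms below are only examples.

    # Reusing the "pipe" object from the earlier sketch: list what you do NOT want to see.
    image = pipe(
        "portrait photo of a woman smiling, natural window light",
        negative_prompt="blurry, extra fingers, deformed hands, watermark, text",
        num_inference_steps=30,
        guidance_scale=7.5,
    ).images[0]

    image.save("portrait_with_negative_prompt.png")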

This article and video from Samson Vowles of Delightful Design explains how negative prompts work to eliminate unwanted results in Stable Diffusion images. Sometimes they even work, but not on hands, obviously. 😉

Another good resource for Stable Diffusion negative prompting is this post on Reddit. The pinned guide walks you through how Stable Diffusion works, how to install it on your own server and of course Negative Prompting.

You can test out Stable Diffusion in their Online Playground and discover what it might generate for you!

NOTE:

I will be digging much deeper and exploring more with all of these AI image generation tools in my next article – AI Tools Part 2.

—–            —–            —–            —–            —–            —–            —–

AI TTS Voice Over Generation Tools

Ever since I got a new Macintosh in the mid-’90s that had Text to Speech (TTS) and Voice Command capabilities, I’ve been enamored with synthesized speech and its development. We used to grok in the AOL chatrooms about how to create phonetic inflections to give the voice better character – and even make it sing!

But really, very little development happened for at least a decade, and much of the technology’s functionality was removed in subsequent versions of Mac OS. Yes, other synthesized voice apps and tech have been developed over the years for commercial applications (Siri, Alexa, automotive navigation, etc.), but in general, synthesized voices generated from user-input text have always sounded robotic and unhuman. Pretty much the way image-generating AI can’t quite make “real” humans yet. (The hands, Chico, THE HANDS!!!)

There are still a lot of totally robotic-sounding AI TTS tools out there, and don’t get me started on those shitty TikTok videos with lame TTS! ACK!!

But there’s much more excitement in this industry now, and we’re truly getting closer to some remarkable (and yet scary) realism in the results, much of which you’ve most likely seen and heard in various deepfake videos.

Eleven Labs

AI Tools Part 1: Why We Need Them 304

Probably the most impressive examples I’ve seen/heard yet are coming from a startup called Eleven Labs https://blog.elevenlabs.io

Not only do they generate an amazingly believable TTS voiced reading of your input text in various styles and accents, but their technology is expanding to provide Dubbing and Voice Conversions.

Imagine being able to let software do the dubbing of your video or film to another language, retaining the emotions and inflections of the original voice actor. Or totally change the voice of someone onscreen to sound like someone else. Or even utilize your own voice to read copy without having to sit in the booth to record over and over and over.

I’ve requested beta access, so stay tuned for a subsequent article on this topic once I’ve had a chance to get under the hood.

Synthesys.io

AI Tools Part 1: Why We Need Them 305

Synthesys is a pretty impressive TTS tool with realistic and convincing voices generated from text input, based on the examples from their website. They’re not cheap, though, and there don’t appear to be any trial versions to test, so I’m not going to be able to share any detailed feedback. Check out this link and review the examples to see if it’s the right tool for you.

AI Tools Part 1: Why We Need Them 306

 

Murph.AI

AI Tools Part 1: Why We Need Them 307

Murph.ai is an online AI TTS tool that can take written text and turn it into human-sounding recordings. You can try it out for free.

AI Tools Part 1: Why We Need Them 308

There are a lot of various AI voice models to choose from, but like most TTS tools I’ve seen to date, some of the pronunciation and phrasing still sounds artificial. But it’s getting closer and I’m sure will only improve over time.

Tortoise-TTS

AI Tools Part 1: Why We Need Them 309

There are other tools in development, and if you’re a heavy-duty programmer/developer who wants to play with code and hardware, then check out Tortoise-TTS. You can review some pretty impressive examples of the results of cloning one’s own voice (or the voices of others, celebrities, etc.) on its examples page. Keep in mind, too, that this is still open-source technology.

Here’s a video tutorial that provides some insight as to what it takes, plus some examples. It’s chapter-driven, so open it up in YouTube to see the chapter markers to jump ahead.
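For reference, the usage pattern looks roughly like the snippet below. I’m paraphrasing from the project’s README, so treat the exact module paths, voice-clip filenames and sample rates as assumptions to verify against the current repo.

    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    tts = TextToSpeech()

    # A few short, clean WAV clips of the voice you want to clone (placeholder paths).
    reference_clips = [load_audio(path, 22050) for path in ["voice/clip1.wav", "voice/clip2.wav"]]

    speech = tts.tts_with_preset(
        "Welcome back to the channel, and thanks for watching.",
        voice_samples=reference_clips,
        preset="fast",   # trades some quality for speed
    )

    torchaudio.save("cloned_line.wav", speech.squeeze(0).cpu(), 24000)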

NOTE:

As we explore this amazing leap in technology, what are the ethical repercussions of using an A-list actor’s voice to speak your words? What will the IP laws look like once this rolls out commercially? We’ll explore this deeper in a subsequent article in the near future.

——            —–            —–            —–            —–            —–            —–

AI Music Generation Tools

There appear to be myriad AI music apps and online tools available, so finding one that’s right for you may take a bit of exploring. I’m only sharing a couple that I’ve found most worth considering to date. My list may change in time as I dig deeper into the technology and its evolution.

SmartSound

AI Tools Part 1: Why We Need Them 310

One of the oldest and most reliable custom music production tools, which I’ve used for a couple of decades, is SmartSound. They first developed their desktop app, Sonicfire Pro, in the late ’90s; it utilizes prerecorded sub-tracks of real instruments to develop amazing soundtrack compositions for your productions. While it’s technically not AI per se, the customizability and depth of what you can produce within their recorded libraries is ever-growing, and it’s a solid addition to your video production workflow. Many online AI music generation tools also seem to copy several of its editing and customization features. If you’re a serious producer, this is probably the next best thing to hiring a music director and recording your own scores.

AI Tools Part 1: Why We Need Them 311

The online version of SmartSound is full featured and allows you to work anywhere with an Internet connection and your login to access the libraries and content that you’ve purchased – which makes it great for editing on the road or from your office/home.

Their desktop software, Sonicfire Pro allows some deep editing and exploration beyond the initially-generated tracks, and it also lets you create and edit your downloaded library resources while offline.

SmartSound isn’t cheap, but it’s still the king of auto-generated, customizable music tracks that I’ve found to date. And the fact that the instruments and resource tracks are actual recorded studio sessions, make the quality truly broadcast worthy.

Soundraw

AI Tools Part 1: Why We Need Them 312

Soundraw is an online platform that allows users to create and edit music using AI technology. The platform offers a variety of tools and features such as a music generator, drum machine, and effects processor that can be used to compose and produce original tracks. Users can also upload their own samples and tracks to the platform for editing and manipulation. Additionally, the platform allows users to export their creations in various audio formats for use in other projects or for sharing online.

Creating a track can be as easy as defining the Mood, Genre, Theme, Length, Tempo and Instruments featured. Soundraw then provides you with dozens of options that you can download and use directly, or take into the online editor to further refine your desired tracks.

AI Tools Part 1: Why We Need Them 313

The online track editor allows you to make several changes to each section in the created track, adjusting various instruments, volumes, energy levels, track length, etc.


AI Tools Part 1: Why We Need Them 314

Pricing for Soundraw is a nominal commitment, but fortunately it appears that most of the tool’s features are unlocked to explore in the free version until you want to download a track.

AI Tools Part 1: Why We Need Them 315

AIVA

AI Tools Part 1: Why We Need Them 316

AIVA (Artificial Intelligence Virtual Artist) is a music generator that uses artificial intelligence algorithms to compose original music. It can create various types of music, such as classical, electronic, and rock, and can also mimic the style of specific composers or genres. It can be used by musicians, film makers, game developers and other creators to generate music for their projects.

AIVA (Artificial Intelligence Virtual Artist) was created by a Luxembourg-based startup company of the same name, founded in 2016. The company’s goal was to develop an AI system that could compose music in a variety of styles and emulate the work of human composers. The company’s co-founder, Pierre Barreau, is a classically trained pianist and composer who wanted to use his background in music and AI to create an AI system that could generate high-quality, original music. The company launched its first product in 2016, which was a website that allowed users to generate short pieces of music based on their preferences. Since then, the company has continued to develop and improve its AI algorithms, and has released more advanced versions of its music generation software.

While the instrumentation is primarily MIDI-based, many of the arrangements have more complexity and variability than some of the simpler music generators I’ve found online.

There’s an editor that allows you to see how the different instruments are programmed in the resulting track, but it does require a free desktop editor app you can download directly. I’ve yet to install it, but I will in my deep-dive article on audio AI tools.

AI Tools Part 1: Why We Need Them 317

The cost isn’t horrible, and there are a lot of features unlocked in the free version, so you can see how it works and learn from it before committing to a paid subscription, if you find it helps you generate soundtracks for your video productions.

AI Tools Part 1: Why We Need Them 318

NOTE:

I still have a lot to explore with these tools and will follow up in a subsequent article where I can dig deeper, see how far I can push them, and add customization and manual over-layering by recording my own instruments.

—–            —–            —–            —–            —–            —–            —–

AI Text Generation Tools

*NOTE: This segment written almost completely by AI

AI generative text modules are computer programs that use artificial intelligence techniques to generate text. They are trained on large datasets of human-written text and use this training data to learn patterns and relationships between words and phrases. Once trained, the model can generate new text by predicting the next word in a sequence based on the patterns it has learned.

There are several different types of AI generative text modules, such as:

  • Language models: These models are trained to predict the likelihood of a given sequence of words, and are often used to generate text by sampling from the model’s predicted distribution of words. GPT-2 and GPT-3 are examples of this type of model.
  • Encoder-decoder models: These models consist of two parts, an encoder that takes in a source sequence and compresses it into a fixed-length vector, and a decoder that takes that vector and generates a target sequence. These models are often used for tasks such as machine translation.
  • Variational Autoencoders (VAEs): These models are similar to encoder-decoder models, but they also learn to generate new samples by sampling from a latent space. They are often used for text generation tasks such as poetry and fiction writing.

All of these models are designed to generate text that is coherent and fluent, and can be fine-tuned for specific tasks such as chatbots, summarization, and text completion.
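To make that “predict the next word, then sample” idea a bit more tangible, here’s a minimal sketch using the openly available GPT-2 model through Hugging Face’s transformers library (assuming transformers and torch are installed). The prompt is just an example, and since sampling is random, the output will differ every time you run it.

    # Minimal next-word-sampling demo with GPT-2 via Hugging Face transformers.
    # Requires: pip install transformers torch
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    prompt = "A good opening shot for a documentary about lighthouses is"
    results = generator(
        prompt,
        max_new_tokens=40,       # how much text to add after the prompt
        do_sample=True,          # sample from the predicted word distribution
        temperature=0.9,         # higher = more adventurous word choices
        num_return_sequences=2,  # generate two different continuations
    )

    for r in results:
        print(r["generated_text"])
        print("-----")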

ChatGPT 


ChatGPT is a GPT-3.5 language model developed by OpenAI that uses deep learning techniques to generate human-like text. It is based on the GPT (Generative Pre-trained Transformer) architecture, which is a type of transformer neural network. The model is trained on a massive dataset of text, such as books and articles, to learn patterns and relationships between words and phrases. When given a prompt or a starting text, ChatGPT generates text by predicting the next word in the sequence, based on the patterns it has learned from the training data. The model is able to generate coherent and fluent text that can be used for a variety of natural language processing tasks such as text completion, summarization, and conversation.

ChatGPT is a large language model that can generate a wide range of human-like text. Some examples of what ChatGPT can produce include:

  • Text completion: Given an incomplete sentence or a prompt, ChatGPT can generate the missing words or phrases to complete the sentence.
  • Summarization: Given a long piece of text, ChatGPT can generate a shorter summary that captures the main points.
  • Conversation: ChatGPT can be fine-tuned to participate in a conversation, by answering questions or responding to prompts in a natural way.
  • Language Translation: Given a text in one language, ChatGPT can generate a translation in another language.
  • Text generation: ChatGPT can generate new text on a given topic, by sampling from the patterns it learned during training. This can be used for text generation tasks such as poetry and fiction writing.
  • Sentiment Analysis: ChatGPT can be fine-tuned to predict the sentiment or emotion of a given text.

These are just a few examples of what ChatGPT can produce; the model can be fine-tuned for a wide range of natural language processing tasks and produce more specific outputs.
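If you want ChatGPT-style output inside your own scripts rather than the chat window, OpenAI exposes the same family of models through an API. Here’s a minimal, hedged sketch of the summarization use case using the official openai Python package (v1 or later); it assumes you’ve set an OPENAI_API_KEY environment variable, and the model name is just one common choice, not a recommendation.

    # A minimal sketch of API-driven summarization with the official openai package.
    # Assumes: pip install openai, and OPENAI_API_KEY set in your environment.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY automatically

    long_text = "...paste the transcript or article you want condensed here..."

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # one common choice; swap in whatever model you have access to
        messages=[
            {"role": "system", "content": "You summarize text for busy video producers."},
            {"role": "user", "content": f"Summarize this in three bullet points:\n\n{long_text}"},
        ],
    )

    print(response.choices[0].message.content)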


You’ll notice that these screenshots have my initial prompt at the top and the ChatGPT reply below. You can also see just how much of this section’s text was generated here!


I’ve been using this on my iPhone as well – sometimes just to get a consolidated answer to a current events question. It’s like Google for text.

I’ve tried a couple of different ChatGPT-enabled iPhone apps, but they seem limited compared to just signing into your account through your web browser, where you still have all the functionality and accessibility you’d have on your desktop.

In this example, I was able to give a simple input question, and ChatGPT provided the query title and the complete response, including formatting…


…as opposed to the ChatMe app, which took a lot more prompting to get an answer instead of useless replies like “yes, I can give you that”! It was so frustrating that I removed it from my phone!


 

Jasper 


Jasper is copywriting software that automatically creates written content using artificial intelligence and natural language processing. It shortens a writer’s research and drafting time by providing original content in just a few clicks. Jasper was formerly known as Jarvis, and before that it went by the name Conversion AI. And just as the company frequently changes its name, the people behind it also update the software almost every week.

Jasper AI employs GPT-3, the third generation of the Generative Pre-trained Transformer, for its artificial intelligence. GPT-3 can generate large volumes of text thanks to its 175 billion machine learning parameters. Because Jasper runs on GPT-3, it excels at producing text-based content. To the untrained eye, the text created by Jasper would pass as something written by a human.


Jasper is a deep creative-writing productivity tool with lots of templates and videos to get you started right away. There’s a 5-day free trial to explore and learn how useful it is, and even the basic subscription allows up to 10 users on your account, which is perfect for corporate marketing groups, small businesses, and individuals alike.


In addition to over 50 individual templates for anything from an email to a blog post to an Amazon product ad or YouTube description, Jasper provides several “recipes” for complete ideas to produce from scratch.


Here’s a marketing video they published earlier this month that gives you a quick look at how it works:

I’m looking forward to introducing this to the marketing team at my day gig (biotech marketing) to see how useful it may be to our writers, editors and marketers. I can see its usefulness instantly, but don’t confuse its capabilities with those of ChatGPT. Jasper has limited ability to create content entirely without your guided input, so don’t expect it to give you EVERYTHING without feeding it the key ingredients first. But it’s highly trainable and can produce any style of messaging from very little data. And it can provide translations as well!

NOTE:

I’m pretty sure that many apps will start implementing ChatGPT (GPT-3) technology under the hood in the coming year, so watch for this evolving technology to pop up in your web editors and word processors shortly. There are already a couple of Chrome extensions available, so I’ll explore how those work as well in my upcoming article on this segment.

—–            —–            —–            —–            —–            —–            —–

If AI technologies are combined, what will they produce?

So I got thinking one night while I was experimenting with ChatGPT and Midjourney AI on my iPhone: “What if I totally left it up to AI to create something original with no input from me? Not even a hint or suggestion!”

So I asked ChatGPT to “Give me an awesome prompt for Midjourney” and this was the response (even the “Theme” was generated by AI):


I copy/pasted the response directly into Midjourney on the Discord app and didn’t provide any further prompt directions, and it spit out these initial basic images:


Through some variations and regenerating the same prompt with upscaling, I got a bunch of actually usable images rendered, without any Photoshop!


You can bet that I’ll be doing more of this in the future – and then combine some of the other AI tools to help me tell a story. This is really a creativity jumpstarter!
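For anyone who wants to script the first half of that hand-off, here’s a small sketch under my own assumptions: it uses the official openai Python package (with OPENAI_API_KEY set) to generate the Midjourney prompt, then simply prints it for you to paste into Discord, since Midjourney has no official public API.

    # Sketch: have the OpenAI API write a Midjourney prompt, then hand it off manually.
    # Assumes: pip install openai, and OPENAI_API_KEY in your environment.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": "Give me an awesome prompt for Midjourney, including a theme."},
        ],
    )

    midjourney_prompt = response.choices[0].message.content
    print("Paste this after /imagine in Midjourney's Discord bot:")
    print(midjourney_prompt)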

What to expect in the coming months?

AI technology has certainly made its mark and will continue to be a presence in the digital industry. It is no wonder that claims of AI being a fad were quickly squashed, as it has provided endless tools for production assistance and marketing analytics that have become vital to our creative processes. With the ability to customize AI to our needs and workflows, this technology looks like it’s here to stay. In my upcoming posts, I will dispel any apprehensions you may have about using AI and provide insight into some of the specific types of Artificial Intelligence that can benefit you and your projects and marketing endeavors. So join me as we explore this fascinating world of intelligent machines! We’ll chart a course for productivity by diving deeper into ideas such as Machine Learning, Neural Networks and more, so stay tuned for more info, updates and articles. Let’s see what the future holds!
UPDATED: Here’s a more complete list of AI Tools that’s updated regularly.

UPDATED: AI Tools Part 2: A Deeper Dive

