In this Sunday’s ‘AI: The Bigger Picture’, I’d like to build on a timely question asked by the editors at the Verge, “What is a Photo?”, following some truly dramatic new AI tech from Google in its newest Pixel 8 smartphones. Specifically, the AI ‘computational photography’ technology that brings photo-altering capabilities previously available only to photo editing professionals with advanced, expensive tools and years of training. We are about to ask this question far beyond photography, and despite the earnest concerns, I think it’s all for the net good. Let me explain.
First, let’s take a look at what catalyzed the Verge folks to call it the “Photography Apocalypse”:
“With the Pixel 8, Google has turned the question of ‘what is a photo’ right on its head.”
“Google pioneered the concept of computational photography, where smartphone cameras do a huge amount of behind-the-scenes processing to spit out a photo that contains more detail than the camera sensor can detect in a single snap. Most modern smartphones use a system like Google’s HDR Plus technology to take a burst of images and combine them into one computationally-created picture, merging highlights, shadows, details, and other data to deliver a more pristine photo. It’s accepted practice at this point, but it also means that a baseline smartphone photo is already more than just “a photo” — it’s many of them, with their best parts combined.”
“The Pixel 8 lineup complicates things further by starting to transform how much a photo can be easily changed after the picture is snapped. It presents easy-to-use editing tools powerful enough to create a completely different image from the original photo you recorded when you hit the shutter button, and those tools are marketed as integral parts of the phone and camera. Photo editing tools have existed since the beginning of photography, but the Pixel 8 blurs the line between capture and editing in new and important ways.”
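A quick aside for the technically curious: the burst merging the Verge describes above can be illustrated with a toy sketch. To be clear, this is not Google’s HDR Plus pipeline, which does far more than this. It is a simplified stand-in that assumes the frames are already perfectly aligned and that the only problem is random sensor noise, and the names (`capture_frame`, `merge_burst`, `rms_error`) are made up for illustration.

```python
import random
import statistics

def capture_frame(scene, noise_sigma, rng):
    """Simulate one noisy sensor readout of the true scene brightness values."""
    return [pixel + rng.gauss(0.0, noise_sigma) for pixel in scene]

def merge_burst(frames):
    """Merge aligned frames by averaging each pixel position across the burst."""
    return [statistics.fmean(values) for values in zip(*frames)]

def rms_error(image, scene):
    """Root-mean-square difference between an image and the true scene."""
    return (sum((a - b) ** 2 for a, b in zip(image, scene)) / len(scene)) ** 0.5

# Assumption: a flat list of brightness values stands in for a real image.
rng = random.Random(8)
scene = [rng.uniform(0.0, 1.0) for _ in range(2000)]

# Take a burst of 8 noisy frames of the same scene, then merge them.
burst = [capture_frame(scene, 0.1, rng) for _ in range(8)]
merged = merge_burst(burst)

single_error = rms_error(burst[0], scene)
merged_error = rms_error(merged, scene)
print(f"single frame RMS error: {single_error:.3f}")
print(f"merged burst RMS error: {merged_error:.3f}")
```

The merged result should land closer to the true scene than any single frame, since independent noise tends to cancel out when averaged. That is the sense in which a smartphone photo is already “many of them, with their best parts combined”.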
Specifically, there are four big AI features this time that are causing this re-examination of ‘What is a Photo’. Here is Google describing ‘Best Take’, ‘Magic Editor’, ‘Magic Eraser’, and ‘Zoom Enhance’:
“Perfect your group photos with Best Take”
“If you’re trying to take a group photo, even if you take multiple shots, chances are someone is always looking away or blinking — we’ve all been there, especially if you’ve ever tried to take a photo with kids. To take the stress out of getting that perfect group shot, the new Best Take feature in Google Photos uses a series of similar photos taken close together to help you automatically create a blended image with everyone’s best expression. If you prefer another expression, you can manually select another look from the other photos you took to get the group photo you want.”
Next up, ‘Magic Editor’, which really raised the key question of ‘What is a Photo’:
“Reimagine your photos with Magic Editor.”
“Earlier this year, we announced Magic Editor, a new experimental editing experience coming to Google Photos that uses generative AI to help you easily make complex edits and bring your photos in line with how you remember a moment. Want to resize or reposition your subject? Just tap or circle the object you want to edit, then drag to reposition it or pinch to resize it. You can also use contextual suggestions to improve the lighting and background, like changing a gray sky to a golden-hour sunset. Plus, after you select an edit, Magic Editor will give you multiple result options to choose from so you can get the look you want.”
“Magic Editor is a new experience from Labs in its early stages, and we know there are going to be times when the result isn’t exactly what you imagined. Your feedback is going to be critical in helping us improve the product over time so you can get the best edits possible. This is just the beginning, and we plan to add more intuitive generative AI features to Magic Editor in the future to help you bring your photos to life in new ways.”
And, ‘Magic Eraser’:
“Reduce distracting images and sounds in your videos with Image and Audio Magic Eraser.”
“Much like unwanted photobombers in your photos, distracting background noises in your videos pull focus from what you’re trying to capture. Using advanced machine learning models, Audio Magic Eraser can identify sounds — like people talking in the background, music or wind — and sort them into distinct layers that you can control. Then, in just a few taps, you can reduce distracting noises so your video sounds the way you want.”
Last but not least, ‘Zoom Enhance’:
“Zoom in on what matters after the fact with Zoom Enhance.”
“With Zoom Enhance coming later to Pixel 8 Pro, you can zoom in on any photo after the fact and crop to what you want the focus of your photo to be. Using generative AI, Zoom Enhance intelligently fills in the gaps between pixels and predicts fine details, opening up more possibilities when it comes to framing and flexibility to focus on the most important part of your photo.”
Computational photography is also a vigorous pursuit of Google competitors like Apple, Samsung and others, especially with their flagship smartphone offerings every year. This piece by CNET highlights their competing features and capabilities.
The main issue raised by the Verge and other commentators of late, especially after Google’s unveiling of the above features in particular, is the following:
“The ease of use of generative AI can be bad, Peters argued last month in a conversation with The Verge’s editor-in-chief, Nilay Patel. “In a world where generative AI can produce content at scale and you can disseminate that content on a breadth and reach and on a timescale that is immense, ultimately, authenticity gets crowded out,” Peters said. And Peters believes companies need to look beyond metadata as the answer. “The generative tools should be investing in order to create the right solutions around that,” he said. “In the current view, it’s largely in the metadata, which is easily stripped.”
“Currently, we’re at the beginning of the AI photography age, and we’re starting off with tools that are simple to use and simple to hide. But Google’s latest updates make photo manipulation easier than ever, and I’d guess that companies like Apple and Samsung will follow suit with similar tools that could fundamentally change the question of “what is a photo?” Now, the question will increasingly become: is anything a photo?”
I understand the angst underlying this question. But from another perspective, AI technology is just doing what technology has done for centuries and more: taking conveniences once available and affordable only to the few, and making them available to the many at ever cheaper prices and with ever greater ease. The Gutenberg press is a key exhibit, breaking the lock on information held as a mechanism of control by the elite in so many societies.
One could argue that what each of the four features above does, now available in Google phones and soon in most smartphones, has been done, and more, for decades in almost every professional photograph seen by billions on magazine covers. It’s nothing that couldn’t be done in Photoshop. We all know that celebrities and supermodels don’t really look like their photos in real life.
The same question being asked today of ‘What is a Photo’ is already being asked, or will be asked even louder, of:
‘What is a painting’?
‘What is a song’?
‘What is a video’?
‘What is an article or book’?
‘What is a story’?
And indeed so many other forms of human creativity. Soon to be the province of humans aided by AI, and/or just AI. Taking them all from relative Scarcity to ample Abundance.
Yet we’ve been smart enough to take earlier technologies and their resulting changes in stride. And we will more likely than not do the same with these technologies now available to the masses. Even at the exponential pace of AI.
Technology just democratizes. And it makes things cheaper at a Moore’s Law pace, or faster now in the case of AI.
I think it’s too early to start hand-wringing about AI reaching some sort of societal threshold. Let’s give ourselves more credit that perhaps society can absorb this technology as well, even at an accelerated pace. People are smart, and technologies make us smarter. I don’t think altered photos, videos and audio are going to fool us for long.
Certainly, there will be bad actors, and bad things done with this technology, deep fakes and all. But overall, this AI technology is likely to do more net good in the long run. Let’s be open to that possibility before asking ‘What is XYZ’? Especially when it comes to AI regulation, open or closed, big AI or small.
It’s likely just more of what humans have always created. Stories in all their forms. And we’re likely to like it more than we think.
Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)