Intelligent manufacturing robots, self-driving cars, virtual assistants, augmented reality, and next-gen videogames are only a few of the use cases that take artificial intelligence out of the pen-and-paper realm of academia and bring it into real life. In fact, you would be hard-pressed to find any major technological breakthrough that doesn't connect in one way or another with artificial intelligence.
But this is only the tip of the iceberg. As the world becomes increasingly digital, businesses, brands, and large corporations are searching for innovative ways and mediums to propagate their message and reach the public. On a similar note, our exposure to an increasingly digital world seems to have fueled a craving for more diverse content, as well as an appetite for creating our own digital media. Yet again, artificial intelligence provides an answer, and the right tools for the job, with synthetic media, a multidisciplinary field that combines machine learning and AI techniques to produce machine-generated content.
Humans.ai is combining the concept of synthetic media with blockchain technology to democratize content creation by giving every human being the opportunity to own and control their own piece of AI. With our technology, humans can leverage their unique biometric data to create digital avatars designed to look like an indistinguishable rendition of their owner, or like an entirely new person generated by AI. Humans.ai delivers these features while answering some of the most pressing questions surrounding the ethical use of artificial intelligence through its blockchain-based technology.
Synthetic media is an umbrella term that describes the artificial production, manipulation, and modification of data and media content, a process in which human involvement is relatively low because the heavy lifting is done by AI algorithms. Also known as generative media or AI-generated media, this area of technology is already beginning to revolutionize the media creation landscape, significantly altering the way we interact with and consume new content. Why so? Synthetic media will spark a revolution in the media content sector that will slowly and inevitably ripple into other sectors. This is because synthetic media acts as a new medium that supercharges creativity, drastically shrinking the gap between an idea and the finished content, while offering new freedom and flexibility to experiment with new approaches to storytelling and communication.
Probably the most interesting aspect of synthetic media is the fact that it will challenge the current media industry, as content creation will no longer be concentrated in the hands of a few big media companies.
The history of synthetic media is deeply linked with the development of artificial neural networks, a subset of machine learning that sits at the core of deep learning algorithms. As their name implies, artificial neural networks are inspired by the structure and processes of the human brain, attempting to emulate the learning and pattern recognition that comes naturally to us humans.
Tasks that seem trivial to us from an intelligence standpoint, such as recognizing a face or understanding a spoken word and its meaning in context, are almost impossible to program with classical algorithms, which are better suited to simulating skills like playing chess. This is why neural networks, studied since the 1960s, have established themselves as the best approach to making computers "understand" images and sounds: instead of hardcoding knowledge into an algorithm, the computer mimics the learning process as it is trained to recognize patterns across vast quantities of data.
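To make that contrast concrete, here is a minimal sketch of the learn-from-examples approach, using scikit-learn's bundled handwritten-digit dataset. Nothing about what an "8" looks like is hardcoded; the network infers the patterns from labeled data.

```python
# Minimal sketch: instead of hand-coding rules for what a digit looks like,
# a small neural network learns the patterns from labeled examples.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()  # 8x8 grayscale images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

# One hidden layer of 64 neurons; the network discovers its own features.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```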
2014 marked a breakthrough year for synthetic media and the field of artificial intelligence as a whole. Machine learning researcher Ian Goodfellow and his colleagues developed a new class of machine learning systems called generative adversarial networks (GANs), widely considered by AI researchers to be a big step forward for the domain. Yann LeCun, Facebook's chief AI scientist, described GANs as "the coolest idea in deep learning in the last 20 years." Another AI luminary, Andrew Ng, the former chief scientist of China's Baidu, said GANs represent "a significant and fundamental advance" that inspired a growing global community of researchers.
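In a GAN, two networks are locked in a contest: a generator fabricates samples while a discriminator tries to tell them apart from real data, and each improves by trying to beat the other. The toy PyTorch sketch below illustrates the adversarial loop on a simple 1-D distribution rather than images; real GANs use the same structure with far larger networks.

```python
# Toy GAN sketch (PyTorch): the generator learns to mimic samples from a
# 1-D Gaussian while the discriminator learns to tell real from fake.
import torch
import torch.nn as nn

real_data = lambda n: torch.randn(n, 1) * 1.5 + 4.0  # the "real" distribution
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # 1) Train the discriminator: score real samples high, fakes low.
    real, fake = real_data(64), G(torch.randn(64, 8)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) Train the generator: try to fool the discriminator.
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Generated samples should drift toward the real mean of 4.0.
print(G(torch.randn(1000, 8)).mean().item())
```

At scale, this same push-and-pull dynamic is what yields photorealistic faces and landscapes.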
Google also covered significant ground in this field, unveiling the transformer in 2017, a new type of neural network architecture that excels at natural language processing, the subfield of AI focused on analyzing large quantities of natural language data so that computers can understand written text on a complex level, subtle nuances included. The architecture has delivered striking results, synthesizing text and music at a level that approaches humanlike ability; OpenAI's GPT-2 and GPT-3 are prominent examples.
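GPT-2's weights are openly available, so transformer-based text generation can be tried in a few lines. A minimal sketch using the Hugging Face transformers library (assuming `pip install transformers torch`):

```python
# Sketch: generating text with the open GPT-2 model via Hugging Face.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("Synthetic media will change storytelling because",
                max_length=40, num_return_sequences=1)
print(out[0]["generated_text"])  # prompt continued by the model
```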
Synthetic media is composed of multiple branches, each targeting a different mode of expression, such as text, speech, audio, images, and video, which are sometimes combined to produce something much more complex than the sum of their parts. As a matter of fact, the field has developed so much that some projects seem to have crossed the "uncanny valley" and left it in their rearview mirror. The "uncanny valley," described by Japanese roboticist Masahiro Mori in 1970, is the phenomenon whereby humans can sense that something isn't quite right about an artificially created, almost-human object.
An early example of this type of media has been present in pop culture for some time with British bands like Gorillaz, whose members are portrayed as an ensemble of fictional cartoon characters, often made to look as if they are interacting with real people during interviews or performing on stage alongside human performers. Although the concept is similar, Gorillaz and other virtual artists like Japan's Hatsune Miku are missing the AI element.
In 2016, the world came to know Lil Miquela, the world's most popular virtual influencer, with 3.1 million followers on Instagram. Besides her Instagram fame, she has appeared in ads for big-name brands like Calvin Klein and L'Oréal, in videos alongside real celebrities like Bella Hadid and J Balvin, and has even starred in her own music video. But the fact of the matter is that she is not a real person; she is a 3D model created by a team of visual effects artists.
Artificial images that look like real-life photography can be produced using a deep learning model that takes silhouettes and crude sketches, resembling something drawn in Microsoft Paint, and translates them into photorealistic images based on patterns learned from a large dataset of real photographs. This process is known as image translation: the model takes an image as input and transforms it into a realistic landscape. One example of this type of technology is NVIDIA's GauGAN.
Another interesting implementation of synthetic image generation is the website This Person Does Not Exist, created in 2019 by Phillip Wang. The underlying model learns the relevant features of human faces and recomposes them in a coherent manner to generate an image that looks deceptively close to a real human being. Each time you hit the refresh button on the website, it generates an entirely new artificial face. In the hands of journalists, this could become a powerful tool for concealing identity during investigations.
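Conceptually, every refresh samples a random point in the generator's learned "face space" and decodes it into an image. The sketch below illustrates the idea; `load_pretrained_generator` is a hypothetical placeholder, since loading NVIDIA's actual StyleGAN weights involves considerably more machinery.

```python
# Conceptual sketch of how a site like This Person Does Not Exist works:
# each page refresh samples a fresh latent vector and decodes it.
# `load_pretrained_generator` is a hypothetical helper, not a real API.
import torch

generator = load_pretrained_generator("stylegan-faces")  # hypothetical
z = torch.randn(1, 512)   # random point in a 512-D latent space
face = generator(z)       # decoded into a photorealistic face image
# Nearby latent vectors yield similar faces, which is how individual
# features (age, hair, expression) get blended and recomposed.
```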
When talking about synthetic audio and its real-life implementations, most of us only need to reach into our pockets and activate the all-knowing voice that resides inside our phones. Virtual assistants like Siri, Alexa, Cortana, and Google Assistant are the first examples that come to mind when we touch upon the synergy of AI and synthetic audio. But these examples barely scratch the surface, as synthetic audio can mean much more than first meets the eye, or ear in this case.
In 2020, the Washington Post announced that all of its written articles would come with an audio option that leverages Android's and iOS' text-to-speech functions. The same year also marked the beginning of a partnership between the BBC and Microsoft that aims to create a synthetic voice to read articles aloud from the media giant's website.
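The core article-to-audio step is easy to prototype. Here is a minimal sketch using the open-source gTTS package (assuming `pip install gTTS`); production systems rely on far more natural neural voices, but the interface is essentially the same: text in, audio out.

```python
# Minimal text-to-speech sketch with the open-source gTTS package.
from gtts import gTTS

article_text = "The Washington Post now offers audio versions of articles."
tts = gTTS(article_text, lang="en")
tts.save("article.mp3")  # playable audio generated from plain text
```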
Beyond media outlets, synthetic audio can serve a wide range of people who have lost their voice due to health-related issues. Those of you who remember the ALS Ice Bucket Challenge may know that ALS, a form of motor neurone disease, is a progressive neurodegenerative condition that often takes away a person's ability to speak. Project Revoice is a non-profit initiative that uses synthetic audio to clone the voices of people who suffer from this disease, essentially ensuring that people living with ALS aren't robbed of their voice.
Another example of voice cloning comes from Sonantic, a UK-based company that reproduces the voices of actors, which recently released a recording of an AI-generated voice modeled on Val Kilmer, who lost his natural voice as a result of throat cancer surgery in 2015.
Synthetic audio doesn’t need to limit its focus only on human voices. AI music synthesizers like Splash and Jukedeck are already streamlining the creation process of signature tunes. In the near future, we may even witness entire movie scores generated with the help of AI.
Often considered the culmination of synthetic media, synthetic video combines multiple techniques used in the AI generation of images and audio. The branch that has received the most attention from mainstream media is the deepfake, in which the face of one person is superimposed over the face of another person in a video. Simply put, the face of one person, say actor Steve Buscemi, is animated on top of a person in a video, Jennifer Lawrence for example, in a way that replicates the facial expressions, including the lip-sync, of the original.
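The classic deepfake recipe is an autoencoder with a twist: a single shared encoder learns a general representation of faces, while a separate decoder is trained for each identity, so swapping amounts to encoding person A and decoding with person B's decoder. A stripped-down PyTorch sketch of that structure follows; real tools add face alignment, masking, and much deeper networks.

```python
# Conceptual sketch of the classic deepfake architecture: one shared
# encoder, one decoder per identity. Swapping = encode A, decode as B.
import torch.nn as nn

class FaceSwapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # shared across identities
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU())
        decoder = lambda: nn.Sequential(       # one per identity
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid())
        self.decoder_a, self.decoder_b = decoder(), decoder()

    def swap_a_to_b(self, face_a):
        # Expression and pose come from face_a; identity from decoder B.
        return self.decoder_b(self.encoder(face_a))
```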
Besides presenting themselves as a never-ending source of memes and shenanigans, deepfakes can do a lot of damage to an individual's reputation, going so far as potentially influencing the results of elections or a company's stock value if a deepfake video goes viral at the right time. For example, a deepfake video of Elon Musk announcing that Tesla is down by 10% could have deep financial ramifications on the stock market if it isn't immediately debunked as fake. A glimpse of similar digital face-recreation technology can be seen in Rogue One: A Star Wars Story, in which Grand Moff Tarkin and a young Princess Leia were recreated digitally because their original performances could not be filmed: actor Peter Cushing had died decades before the shooting of the movie, and Carrie Fisher could no longer play her 1977 self.
Other types of synthetic video focus on facial reenactment, where a target actor's face is controlled using a source actor's face. Facial reenactment technology transfers the pose, rotation, and expressions from one face to another. This opens up a wide range of applications, from creating photorealistic virtual avatars to allowing celebrities to talk in multiple languages seamlessly. Notable examples of this technology in action are David Beckham speaking in nine different languages to raise awareness about malaria, and a video in which world leaders appear to be singing John Lennon's "Imagine."
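Reenactment differs from a face swap in what is transferred: identity stays with the target while pose and expression come from the source. A schematic sketch, where both helper functions are hypothetical stand-ins for a face-tracking model and a neural renderer:

```python
# Schematic facial reenactment: animate the target's identity with the
# source's motion. Both helpers are hypothetical placeholders.
def reenact(source_frame, target_face):
    params = extract_pose_and_expression(source_frame)  # e.g. 3D face params
    return render_face(identity=target_face, animation=params)
```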
Another example of synthetic video technology is the manipulation of old video footage or pictures to bring famous people back to life, or to show younger versions of living ones. The Dalí Museum in St. Petersburg, Florida, used deepfake technology to create an avatar of the famous Spanish surrealist Salvador Dalí that greets visitors and talks about his works. In this case, the deepfake presents itself as a much cheaper alternative to a traditional CGI studio.
Humans.ai is combining the field of artificial intelligence with blockchain technology to provide its own twist on the synthetic media phenomenon. We believe that each of the several billion people on our planet can make a unique contribution to the development of a new generation of AI-based media, one kept honest and ethical through our validation, governance, and consensus mechanism called Proof-of-Human.
Our in-house technology guarantees that each piece of AI is kept under close human supervision, ensuring that it is never used outside its intended purpose.
Our approach to synthetic media focuses on AI NFTs, a new asset class that encapsulates its owner's biometric data, securing it inside a tamper-resistant blockchain network to guarantee full ownership. Users can store biometrics such as their voice, posture, facial expressions, and heartbeat, which can later be used only with the consent of validators from the Humans.ai ecosystem. These validators act as gatekeepers, enforcing a set of rules outlined by the owner of the AI NFT concerning how their biometric data can be used in the AI creation process.
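To illustrate what such owner-defined rules might look like, here is a hypothetical sketch; the field names and approval logic are illustrative only and do not reflect Humans.ai's actual on-chain schema.

```python
# Hypothetical sketch of the consent rules an AI NFT might carry.
# All names and fields here are illustrative, not a real schema.
from dataclasses import dataclass, field

@dataclass
class AINFT:
    owner: str
    biometric_refs: list                  # e.g. encrypted voice/face data
    allowed_uses: set = field(default_factory=set)  # e.g. {"dubbing"}
    required_approvals: int = 3           # validator signatures per request

def approve_use(nft: AINFT, use_case: str, validator_sigs: list) -> bool:
    """Validators may only sign off on uses the owner has whitelisted."""
    return (use_case in nft.allowed_uses
            and len(validator_sigs) >= nft.required_approvals)
```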
Ours is a history of creators, and we intend to enable every human to own their own piece of AI and contribute directly to the upcoming synthetic media revolution.
With our technology, actors can encapsulate their voice inside an AI NFT, which can then be used to generate audio recordings without the actor having to set foot in a recording studio. Imagine DreamWorks has an animated movie under development and decides to hire Morgan Freeman to voice one of the characters. With our technology, there would be no need for Morgan Freeman to go to the studio and record multiple takes. In fact, he wouldn't even need to leave the comfort of his home: the studio could take his voice, encapsulated inside one of our AI NFTs, feed in the character's lines of dialogue as text input, and receive a perfect audio rendition of Morgan Freeman saying those lines.
It's no secret that famous people are routinely approached by brands to endorse their products, be they luxury watches, cars, or plain old deodorant and toothpaste. Regardless of the product advertised, filming an ad takes time and money.
Say, for example, that Gillette contacts Lionel Messi to make an ad for its new and improved electric razor. Messi would need to find the time to travel and record the ad, a process that takes a couple of days and cuts into his training schedule.
The situation simplifies dramatically if Gillette uses the technology developed by Humans.ai to create a digital avatar of the football player and feature it in the advertisement. This automatically translates into reduced costs for the company, which no longer needs to rent a studio and hire a film crew, since everything is generated by computers. For Messi, it would mean a paycheck without requiring him to do anything.
The technology developed by Humans.ai can take this model even further by enabling full translation that keeps the original voice of the speaker. This way, everyone can generate their own avatar and make it speak multiple languages. News, media, and retail companies can greatly benefit from this type of implementation. For example, news channels can use this technology to make their presenters speak different languages depending on the country where the news is broadcast. The same goes for advertisements and movie dubs.
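Under the hood, voice-preserving dubbing can be pictured as a three-stage pipeline: speech recognition, machine translation, and speech synthesis in the original speaker's cloned voice. A schematic sketch, with all three helpers as hypothetical placeholders for the real components:

```python
# Pipeline sketch for voice-preserving dubbing. The three helper
# functions are hypothetical placeholders, not a real API.
def dub(audio_in: bytes, target_lang: str, voice_nft) -> bytes:
    text = transcribe(audio_in)                # 1) speech -> text
    translated = translate(text, target_lang)  # 2) text -> target language
    # 3) text -> speech, synthesized in the speaker's cloned voice and
    #    gated by the consent rules attached to their AI NFT
    return synthesize(translated, voice=voice_nft)
```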
Harry Potter fans, hear me out. Do you remember the lifelike portraits hung near the moving staircases? What if I told you that with the technology developed by Humans.ai you could make your own moving avatars and render them on an intelligent picture frame? This way, you could hang on your walls the digital avatars of your family, friends, and even long-gone relatives.
Tovid.ai is one of the products leveraging the technology developed by Humans.ai. It works by generating a digital avatar that can be instructed to say almost anything. This could usher in a new wave of interactive content, like a digital avatar designed to deliver corporate training sessions, or online courses that are no longer a disembodied voice but a visual representation of the person teaching the subject. Tovid.ai could also spark a new wave of interactive education, in which historical figures like Thomas Jefferson come back to life to speak about the signing of the Declaration of Independence and other historical events.
The technology developed by Humans.ai empowers people, giving them control over one of the most advanced instruments of our time, artificial intelligence, without the need to worry about its complex inner workings. Synthetic media is dawning a new age in the realm of content creation, one in which creating is available to everyone who wants to take an idea and bring it to life. Humans.ai is working to democratize content creation with the help of AI and blockchain technology, all while guaranteeing an ethical implementation of synthetic media.