AI, philosophy, economics, authorship and salmon

For this entry I set out to research current conversations about developments and use cases of AI among Austrian authors, artists and journalists. Jörg Piringer uses AI in a way I find fascinating and want to use as inspiration for exploring possible use cases of my own. With his fondness for poetry, Piringer experiments with AI in this art form, probing its strengths and weaknesses and turning both to his advantage. More on his experiments and views on the subject below.

Walter Gröbchen takes a surprisingly economic stance on the current state of AI, voicing concerns about the modern technology market, its rapid growth and the rising tensions between its biggest players.

Finally, I take a look at Moritz Rudolph’s ‘Der Weltgeist als Lachs’, in which he discusses a so-called spirit of the world: what defines it, where it spawned, where it has been going and where it could end up, comparing it to the life cycle of the salmon, which returns to its place of birth at the end of its life. Rudolph puts China and its rapid technological advances at the center of his attention, which is where the inevitability of AI in the context of Industry 4.0 ties into my research topic. I go into more detail below.

Jörg Piringer – Datenpoesie, Günstige Intelligenz

Born in 1974, the self-described musician, IT technician and author Jörg Piringer has been exploring the connections between technology and the arts, more specifically poetry, since the 1990s. In his 2018 book ‘Datenpoesie’, he experiments with a variety of algorithms, such as auto-completion, compression and translation, as well as the number-based writing system of Hebrew, to explore how machines understand, evaluate and interpret language and how we can use these technologies to transform language itself.

His 2022 book ‘Günstige Intelligenz’ makes use of recent developments in the field of AI, more specifically OpenAI’s GPT models and ChatGPT. He feeds the model poetic inputs and prompts it to come up with original poetic ideas and even words. In an interview with Günter Vallaster, questions about authorship, AI’s originality, humanity and emotionality are raised. Piringer sees AI as a tool and sees no connection to authorship, given the AI’s lack of emotions and of an understanding of our world, though he concedes that its linguistic and especially stylistic grasp is admirable. AI is very capable of recreating human language, but it often succumbs to clichés and stereotypes, rarely surprises with its creations, and its output can rarely be called literature at all; Piringer likes to see this as a feature rather than an unintended lack of performance. All in all, he is unsure yet unfazed about the future of AI, claiming that with any technology comes an inevitable normalisation. The dystopian and utopian fantasies about AI’s role in our possible future societies that he exemplifies in the book are all exaggerated and not to be taken at face value, but serve to start conversations, which is ultimately his goal with the book.

Walter Gröbchen – Maschinenraum

Journalist, author and record label owner Walter Gröbchen regularly publishes his thoughts on technology and other topics in his column ‘Maschinenraum’ in the Wiener Zeitung. Writing about technological advances naturally brings up AI, which he discusses in his article ‘Die Simulation von Intelligenz’. In it, he describes the ever-growing interest in the pre-trained model GPT, with Microsoft, Elon Musk and Peter Thiel planning to invest billions of dollars into OpenAI, each with their own interests in mind.

He theorises that recent developments in AI, especially ChatGPT with its astounding resemblance to human communication, re-ignite a forgotten fantasy of many of the big players of the tech industry: the idea of human-like interaction with machines. The implications and implementations range from simple search engine optimisation to voice-activated assistants and smart home solutions. Every big player has its own versions of these services, so a competitive edge in the form of AI would be extremely valuable. This enormous economic value of AI is where Gröbchen’s criticism starts. In his article ‘Der Krieg hinter den Kulissen’ he discusses the silicon and chip shortage that hit the tech industry hard over the last years. Even though the situation has been improving as of late, the industry shows no sign of slowing down, so the emergence of AI and its rapid development could be a cause for concern in this regard as well.

Moritz Rudolph – Der Weltgeist als Lachs

Born in 1989, Moritz Rudolph studied politics, history and philosophy, is currently writing his dissertation on international politics in relation to early critical theory, and is the author of ‘Der Weltgeist als Lachs’. In the book he discusses the so-called spirit of the world and how it relates to the salmon: the spirit of the world seems to like returning to its origins, not unlike the salmon returning to its place of birth to spawn or die. One has to be careful with this analogy, however, as the spirit of the world returns to its point of inception in a revolving motion rather than along the salmon’s back-and-forth path.

The spirit of the world manifests itself in what makes us human: our creativity, poetic nature, genuinely new ideas and imperfections. Rudolph invokes Max Horkheimer’s pessimistic view of the trajectory of society and human history, which worries about a strictly systematically organised, highly efficient world. One that works so exceptionally well that it would generate a global equilibrium, eliminating the discrepancies between the first, second and third world and any other negative differences. As Horkheimer described in 1966, this advanced society would also be devoid of the human spirit in a sense, as such an optimised system would not allow for creative liberty, emotionality or imperfection. Rudolph recommends taking Horkheimer’s view of the world with a grain of salt, as it is a product of the Cold War and very much influenced by the geopolitical situation of the Soviet Union at the time. Horkheimer does, however, refer to another now-global superpower that was not at the center of attention during the Cold War: China.

This is where Rudolph’s analyses and thought experiments begin. Rudolph describes the rapid evolution of China’s Industry 4.0 and equates it with our modern definition of development and human advancement. China has been copying industrial practices from the West for the last 50 or so years and has become frighteningly fast at it. China’s leader Xi Jinping aims to develop China into a global superpower, a system transcending national government. This is where Rudolph sees the spawning ground of the spirit of the world. Historically, and eurocentrically, it is often said that human history started in the East and moved to the West, where it manifested itself in the land of opportunity, the USA. While the western world was busy fighting the Cold War, China quietly developed its industrial sector, infrastructure and governmental tactics to become the explosively expanding nation it is today. The world’s spirit seems to have managed the leap across the Pacific Ocean and looks as though it wants to return to its origin, the eastern world.

This substantial developmental speed of China’s Industry 4.0 naturally includes AI. China’s unique governmental structure has made it the current leading provider of what Rudolph calls the core resource of AI: human data. Furthermore, China has been filing two to three times as many AI patents as the second-largest player in AI, the USA. Rudolph calls this the first time in post-WWII history that a key technology is in hands other than the USA’s, and the first time in the history of capitalism that it lies outside the western world.

Rudolph goes on to theorise about the possibility of AI overtaking China in its race to become a potential world leader. One of the main criticisms of AI’s abilities, which Piringer also encountered in his poetic experiments with GPT, is its lack of emotions, human imperfections and genuinely novel creative concepts, qualities that resemble Rudolph’s understanding of the spirit of the world. He dismisses this weakness entirely with regard to AI’s ability to lead, arguing that the world has seen countless leaders who showed little sign of morals, emotions or creative capacity. What AI excels at is systematic and schematic understanding, which he considers a key leadership quality. This hypothetical world ruled by AI could be seen as Horkheimer’s feared structurally perfect world: a system devoid of imperfection, poetically originating from the birthplace of the world’s spirit, where it shall find its death, too.

Source:

  • Moritz Rudolph, Der Weltgeist als Lachs, Berlin: Matthes & Seitz Berlin Verlag 2021.

Further possible literature:

  • Nick Bostrom, Superintelligenz. Szenarien einer kommenden Revolution. Aus dem Englischen von Jan-Erik Strasser. Berlin: Suhrkamp 2014; Ray Kurzweil, Menschheit 2.0. Die Singularität naht [2005]. Aus dem Englischen von Martin Rötzschke. Berlin: LoLa Books 2013.
  • Marshall McLuhan, Die Gutenberg-Galaxis. Das Ende des Buchzeitalters [1962], Bonn: Addison-Wesley 1995.
  • Georg Wilhelm Friedrich Hegel, Vorlesungen über die Philosophie der Geschichte [1830/31]. Frankfurt: Suhrkamp 1970. 

State of the Art(ificial intelligence) 2: 3D applications

Recently, a remarkable barrier has been broken in the world of AI-generated artwork: that of the third dimension. Many big players are researching and contributing to this field, such as OpenAI, NVIDIA and Google. The models currently at the forefront of technical development are already producing promising results but overall still seem to fall ever so slightly short of production quality. However, since AI and ML have been among the fastest-developing technical branches of recent years, it is only a matter of time until the models are viable in productions of ever-increasing levels of quality.

Imagine 3D

Luma AI has recently granted limited access to the alpha version of its own text-to-3D AI, which both creates 3D meshes and textures them.

The results look somewhat promising, though upon closer inspection one may notice that the polygon count of the generated models, which can be downloaded from their website, is quite high while the models themselves do not appear highly detailed, resembling the results of 3D scans or photogrammetry.

This is definitely limiting. However, since photogrammetry and 3D scanning have become common 3D modelling techniques in the last few years, there are established ways of cleaning up such results. Bearing in mind that this is the first alpha version of the AI, one can see great potential in Imagine 3D.

Point-E

As briefly mentioned in my last post, Point-E comes from the same company that created the GPT models as well as the DALL-E versions, OpenAI. Point-E works similarly to DALL-E 2 in that it understands text with the help of GPT-3 and then generates art, the main difference of course being that Point-E works in 3D space. Currently, Point-E creates what are called point clouds rather than meshes, meaning it generates a 3D array of points in space that most closely resembles the text prompt.

Point-E does not generate textures, since point clouds do not support those; instead it assigns a color value to each point. OpenAI also offers a point-cloud-to-mesh model that lets users convert their point clouds to meshes.
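To make the idea of a colored point cloud concrete, here is a small illustrative sketch (not OpenAI’s actual Point-E code): a point cloud is just a list of XYZ positions with an RGB value per point, and writing it out in the standard ASCII PLY format makes it viewable in common 3D tools such as Blender or MeshLab. The function name is my own invention.

```python
def point_cloud_to_ply(points, colors):
    """Serialize a colored point cloud to an ASCII PLY string.

    points: list of (x, y, z) floats
    colors: list of (r, g, b) ints in 0..255, one per point
    """
    assert len(points) == len(colors), "one color per point expected"
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    body = [
        f"{x} {y} {z} {r} {g} {b}"
        for (x, y, z), (r, g, b) in zip(points, colors)
    ]
    return "\n".join(header + body) + "\n"

# A toy 'point cloud' of two points: one red, one green.
ply_text = point_cloud_to_ply(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(255, 0, 0), (0, 255, 0)],
)
```

A real Point-E output has thousands of such points; the mesh-conversion model then fits surfaces through them, much like photogrammetry software does.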

The left set of images shows the colored point clouds, with their respective converted 3D mesh counterparts on the right.

Access to Point-E is limited at the moment, with OpenAI releasing the code on GitHub but not providing users with an interface to use the AI. However, a member of the community has created an environment for trying out Point-E on huggingface.co.

DreamFusion

Google’s research branch has recently released a paper on its own text-to-3D model, DreamFusion. It is based on a diffusion model similar to Stable Diffusion and is capable of generating high-quality 3D meshes that are reportedly optimised enough for production use, though I would need to do more extensive research on that end.

At the moment, DreamFusion seems to be among the best at understanding qualities and concepts and applying them in 3D space, courtesy of an extensively trained neural network, much like 2D models such as DALL-E 2 and Stable Diffusion.

Source: Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall: DreamFusion: Text-to-3D using 2D Diffusion, 2022.

Get3D

Developed by NVIDIA, Get3D represents a major leap forward for text-to-3D AI. It works directly in 3D space, creating high-quality meshes as well as textures, and offers a set of unique features that, while in early stages of development, show the incredible potential of the AI’s use cases and its level of sophistication.

Disentanglement between geometry and texture

As mentioned previously, Get3D generates meshes as well as textures. What makes Get3D unique is its ability to disconnect one from the other. As one can see in the video below, Get3D is able to apply a multitude of textures to the same model and vice versa. 

Latent Code Interpolation

One of the most limiting factors and time-consuming procedures in 3D asset creation is variety. Modelling a single object can be optimised to a certain extent; however, in productions such as video games and films, producing x versions of the same object will, by definition, take x times longer and therefore become expensive very fast.

Get3D features latent code interpolation, meaning that the AI represents its generated 3D meshes in a latent space well enough to smoothly interpolate between multiple versions of the same model. This kind of interpolation is anything but revolutionary, but an AI exploiting it to its fullest potential is nothing short of impressive and shows a tremendous level of understanding of what it is creating.
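The mechanism behind this can be sketched in a few lines (a conceptual illustration, not NVIDIA’s code): each generated shape corresponds to a latent vector, and blending two vectors yields in-between shapes when fed back into the generator. The latent codes here are hypothetical placeholders.

```python
def lerp(z0, z1, t):
    """Linearly interpolate between two latent vectors at t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

z_car_a = [1.0, 0.0, 2.0]   # hypothetical latent code for one car
z_car_b = [3.0, 4.0, 0.0]   # hypothetical latent code for another

# Five evenly spaced codes from one shape to the other; decoding each
# blended code with the generator would produce a smooth morph.
steps = [lerp(z_car_a, z_car_b, t / 4) for t in range(5)]
```

This is exactly why interpolation is so cheap compared to modelling variants by hand: producing x versions costs x generator evaluations, not x times the modelling work.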

Text-guided Shape Generation

As text-to-3D models are in their infancy at the moment, one of the most challenging tasks is creating stylised versions of generic objects, something the 2D text-to-image models already excel at. Generating a car is a relatively easy task for an AI, but adding properties such as art styles, looks and qualities gets complex quickly. At the moment, Get3D is one of the most promising models when it comes to this particular challenge, outclassed only by Google’s DreamFusion, though that is a matter of personal opinion.

Source: Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, Sanja Fidler: GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images, 2022.

State of the art(ificial intelligence)

Looking back, AI has definitely been one of the hottest topics of 2022, with text-to-image models and neural networks seemingly at the center of it. A polarising topic to be sure, with some criticising the possible use cases, especially in the arts, and some praising the technology, which has found a special place within internet culture.

All this buzz can make it hard to understand just what these AI models everyone seems to be talking about are and how they work. This is what I want to clear up in this blog entry.

OpenAI

I want to start by briefly examining OpenAI, a research laboratory founded in 2015 with the aim of creating AI that benefits humanity as a whole. It has since given birth to many generative models that now serve as the basis for text-to-image AI.

OpenAI consists of the for-profit OpenAI LP as well as the non-profit OpenAI Inc. It started with a pledge of 1 billion US dollars from its founders, including Sam Altman and Elon Musk. In 2019, Microsoft invested another 1 billion US dollars into the company.

GPT

OpenAI’s generative pre-trained transformers GPT, GPT-2 and GPT-3 are generative language models built to understand and produce natural language. They started out as relatively simple models used to autocomplete sentences in computer programs. The newest model, GPT-3, is a highly complex network with over 175 billion parameters that is capable of understanding long, complex sentences and producing well-formed text, and it is being used for ChatGPT, a service that lets users converse with the AI. As of June 2020, GPT-3 is licensed exclusively to Microsoft.
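The autocomplete origin of these models can be illustrated with a deliberately tiny stand-in (this is a toy, not GPT): count which word follows which in a corpus, then suggest the most frequent follower. GPT does the same "predict the next token" task, only with billions of learned parameters instead of raw counts.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def autocomplete(follows, word):
    """Return the most likely next word, or None if the word is unseen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept on the mat"
model = train_bigrams(corpus)
```

Given this corpus, `autocomplete(model, "on")` suggests "the". The gulf between this counter and GPT-3 is exactly what the 175 billion parameters buy: context far beyond the single preceding word.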

DALL-E

OpenAI has also developed DALL-E and its successor DALL-E 2, transformer models that generate images from text prompts. Both use GPT-3 as a base to understand the prompt before generating the image. No public code has been released so far.

CLIP

CLIP can be seen as a counterpart to DALL-E: rather than generating images from text, it produces detailed text descriptions of images. CLIP and similar models are already being used by many websites, such as unsplash.com, to create descriptions or alt texts for the images hosted on the site.

Point-E

A more recent development is Point-E, a transformer model similar to DALL-E that creates 3D models, instead of images, from text prompts interpreted by GPT-3. I believe this to be of incredible value, as creating 3D assets is a challenging, highly technical, time-consuming and hence expensive feat.

Midjourney

Another noteworthy text-to-image AI is Midjourney, which entered open beta in July 2022 and as of November 2022 has reached its fourth version. The AI can be used through bot commands on the chat platform Discord.

Stable Diffusion

Perhaps the most notable client-side text-to-image model is Stable Diffusion, a collaborative effort by Stability AI, the CompVis group at LMU Munich and Runway, along with other contributors. Stable Diffusion is based on a diffusion model, whose main purpose is de-noising images: it starts from random noise, which it then iteratively de-noises based on the user’s text prompt, eventually producing highly detailed images.
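The iterative de-noising idea can be sketched in a few lines. This is a conceptual illustration only, not Stable Diffusion’s real sampler: the fixed `target` array stands in for the neural network’s prompt-guided prediction of the clean image, which in reality is recomputed at every step.

```python
import random

random.seed(0)

target = [0.2, 0.8, 0.5, 0.1]                 # stand-in for the prompt-guided estimate
image = [random.gauss(0, 1) for _ in target]  # start from pure noise

for _ in range(50):
    # Each step nudges the image a little toward the de-noised estimate,
    # mirroring the iterative refinement of a real diffusion sampler.
    image = [x + 0.2 * (t - x) for x, t in zip(image, target)]
```

After the loop, `image` has converged close to `target`; the key property diffusion models exploit is that each individual step is an easy, small correction, while the accumulated result is a complete image.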

As mentioned before, Stable Diffusion is remarkable in that, in contrast to many other text-to-image models, users are able to run it on their private machines rather than only through server or cloud services.

The screenshot below shows Stable Diffusion working as a Blender add-on on my personal computer. On the left you can see the actual model and viewport; the right image shows what Stable Diffusion creates after being given the viewport render along with the text prompt ‘Photorealistic face of a monkey’.

Defining AI & ML

Artificial intelligence (AI)

There are a multitude of ways to look at artificial intelligence, some technical, some philosophical and some anthropological. For example, as per Alan Turing,

“If there is a machine behind a curtain and a human is interacting with it (by whatever means, e.g. audio or via typing etc.) and if the human feels like he/she is interacting with another human, then the machine is artificially intelligent.”

– Alan Turing (source unknown, ca. 1950)

This definition is based more on the human-like behaviour of an artificial machine or algorithm and less on the processing power or problem-solving skills of said machine. It does underline the importance of human interaction with an artificial intelligence, though. Given that Turing is considered one of the founding figures of modern computing and that the quote dates to 1950, it makes sense that processing power wasn’t his biggest concern.

However, since computer science and, with it, artificial intelligence are developing at explosive speeds, modern definitions are much more concerned with the processing capabilities of such artificial algorithms:

“As per modern perspective, whenever we speak of AI, we mean machines that are capable of performing one or more of these tasks: understanding human language, performing mechanical tasks involving complex maneuvering, solving computer-based complex problems possibly involving large data in very short time and revert back with answers in human-like manner.”

– Joshi, Ameet V. Machine Learning and Artificial Intelligence. 2020. p.4

Joshi consolidates both the human and the procedural aspect of AI in his definition. What I am most interested in is the aspect of an AI understanding human language, imagination and perception. Models such as DALL-E 2 have come a long way in understanding just that. How an AI becomes more intelligent over time is, apart from human input, thanks to machine learning.

Machine learning

“The term refers to a computer program that can learn to produce a behavior that is not explicitly programmed by the author of the program.”

– Joshi, Ameet V. Machine Learning and Artificial Intelligence. 2020. p.5

Joshi thus defines machine learning, or ML, as a program that can learn behaviour not explicitly programmed by its author. There are a multitude of machine learning algorithms, as there are countless fields in which they are used. Generally though, machine learning is based on a set of data that a program has to understand and evaluate based on certain metrics a human may provide, as well as on experience from prior iterations. The program then has to produce results in the desired way, improving on each attempt.
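Joshi’s definition can be made concrete with a minimal example: nowhere below is the rule "y = 2x" written into the program, yet the program learns it from example data by iteratively reducing its prediction error, exactly the iterate-and-improve loop described above.

```python
# Training examples generated by an 'unknown' rule the program must learn.
data = [(x, 2.0 * x) for x in range(10)]

w = 0.0       # the program's current guess for the slope
lr = 0.01     # learning rate: how far to adjust per example
for _ in range(200):
    for x, y in data:
        error = w * x - y    # how wrong the current guess is
        w -= lr * error * x  # nudge the guess to reduce the error
```

After training, `w` sits at 2.0 even though that value was never programmed in; scale this idea up by billions of parameters and you arrive at the networks behind GPT and Stable Diffusion.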

The level of human interaction can be used to categorise the many different kinds of ML training methods. According to Villar, unsupervised learning may best describe the kind of ML useful for image processing and media design in general. With unsupervised learning,

“no label for any input vector is provided. The objective in this case is to find the structure behind the patterns, with no supervisory or reward signal. These models analyze and deduce peculiarities or common traits in the instances so as to discover similarities and associations among the samples. Example problems are clustering and latent variable models.”

– Osaba, Eneko. Artificial Intelligence: Latest Advances, New Paradigms and Novel Applications. 2021.

This could mean understanding the associations humans form with art styles, aesthetics and visual representations of prompts in general. I do want to stress that this is speculation on my part, as I haven’t yet looked into machine learning for image processing specifically; this blog entry serves more as a definitional basis.
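A tiny example of the clustering the quote mentions: k-means groups unlabeled points purely by similarity, with no labels or reward signal provided, which is the defining trait of unsupervised learning. This is a generic textbook sketch, not code from any of the models discussed here.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on 2D points; centers are the initial guesses."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            groups[dists.index(min(dists))].append(p)
        # Update step: each center moves to the mean of its group.
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

# Two obvious blobs of unlabeled points; the algorithm discovers them itself.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(points, centers=[(0, 0), (5, 5)])
```

The two returned centers settle inside the two blobs without anyone ever telling the program which point belongs where, the "discovering similarities and associations among the samples" of the quote in miniature.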

Onboarding & possible educational potential

During a chat with Alex Popkin, director of motion graphics and animation at Ingenuity Studios, an established VFX, compositing and motion graphics studio, he mentioned that he has started using AI to generate explanations of media production terms he would like his team to be familiar with. As the screenshot below shows, this works exceptionally well and has the potential to greatly speed up the onboarding of employees.

Stable diffusion for Adobe Photoshop

During our conversation, Mr. Popkin also mentioned that the Stable Diffusion AI I covered in my previous entry is also available for Adobe Photoshop directly. The algorithm essentially works the same as in 3D software, but the use cases inside Adobe Photoshop may be more general, such as illustration prompts, concept art prompts etc. You can find out more about it here.

Applications of AI & machine learning for media design

The Concept

With my master’s thesis I want to explore existing use cases for AI and machine learning in media production. During my internship at Ingenuity Studios in Hollywood, CA, I noticed that efficiency is one of the highest priorities of any media production process. One of the greatest learning experiences was identifying which steps in the creative process can be sped up without affecting the quality or creative vision of the final product. Ideally, I would like to find a niche that may benefit from the use of AI and explore possible use cases in a media production pipeline.

As far as I can tell, AI is already being used in concept phases, while production stages still rely heavily on human input. There are already promising technologies on the rise, a few of which I show below. Many of these are still in early stages of development, but given the rapid pace of AI and machine learning, I believe they will be adopted in professional fields very soon.

Stable Diffusion for 3D software

With Stable Diffusion for Blender, a user can enter a text prompt, and Stable Diffusion will use the user’s 3D scene to approximate a render result according to that prompt. While this technology is still in development, it looks promising for the look development, storyboarding and concept art stages of media production.

AI generated textures

In a use case similar to the aforementioned viewport Stable Diffusion AI, this workflow may speed up the actual production stages in various fields of media design such as motion graphics, VFX or game design. Making use of Stable Diffusion once again, this add-on for Blender lets the user enter a text prompt upon which the AI generates seamless PBR textures. The neural network understands a multitude of surfaces and art styles. In a world where high-quality textures are expensive to purchase and technically complicated to produce, this could offer a high-quality alternative for lower-budget productions.

Literary research

In a very brief literature search, my main goal was to establish whether there were already promising sources I could refer to in my potential master’s thesis. Since AI and machine learning are developing very rapidly, I was not too sure about this step of the research process.

As it turns out, there are more than enough scientific papers that discuss AI technically, creatively and ethically. I’m not sure how much research I want to dedicate to the ethics of AI yet, but that is for another research session to decide. So far, I have collected some sources that seem promising. For my next blog entry, I plan on doing a deep dive into the definitions behind the terms ‘AI’ and ‘machine learning’.