Post-Magic Leap (April 2020 onwards), I partnered with Karen Laur and, later, Jamil Moledina on a series of projects under the banner of Crater Productions. The most recent of these was a collaboration with Inworld AI called Virj. (I’m now an advisor to Inworld AI.)
The Virj project started at the end of 2022 when John Gaeta—yet another ex-Leaper, among many other distinctions—got in touch with me. He had landed at Inworld AI as Chief Creative Officer. Inworld was looking for an opportunity to showcase their system by building something based on a known IP.
As part of things we were doing at Crater Productions, I had been working on some new material in the Snow Crash universe. In the aggregate this is called the Extended Snow Crash Universe Timeline (ESCUT). It included a new character named Virj—a virtual assistant in a Metaverse real estate development called Vertex 4.
As the story goes, Virj was created by Hiro Protagonist. He told his employers that the name Virj was an abbreviation of Virtual Assistant. In his own mind, though, it came from Virgil, the ancient Roman poet who shows up in Dante’s Inferno as the guide to the underworld.
At some point, Virj becomes sufficiently capable and self-aware that he hacks into some corporate databases he’s not supposed to have access to, and discovers the truth about his own name. He then proceeds to download the complete works of Virgil, as well as Dante’s Inferno, and uses those to reshape his knowledge base and his personality.
At Crater we had already been experimenting with using GPT to produce an all-text version of Virj. We fed in the complete works of Virgil and Dante, as well as Snow Crash, and then saw what happened when we typed in various prompts. Here are a few samples:
Juno in the Black Sun. Nuns take vows of poverty, chastity, and obedience to the Mother Superior. Latro in the Mother Superior's office sings: "Ave, Juno, Musa, and Salus!" Juno, the patroness of marriage, turns out to be a lesbian. Salus is turned into a bull shark. Mother Superior is killed in the second bull shark attack. (This is where the name "entertains a Bull's Head" comes from, which is the name of the firm that inscribes the sneakers.) Latro tries to comfort the surviving nuns by writing them a song…
…ambushed Thracian stalled midway on his way to Atlastic's on the far side of the Metaverse, which was to say the far side of Jove's Astrid but with a lot more computer code and a lot fewer mountains in the way) to ambush and legitimate (since they were both in the process of being legitimized and they both needed to stay secret for as long as they could) partnership in the form of an escort and a base of operations for their secret project: an AIsocosm (the name had been chosen as an homage to the ancient city Agrigentum and its inhabitants…
“My parents were Investigators. I was forced to be constitutet for my over-elaborate safety procedures. I was able to escape only because the In roofers' van was destroyed in the laterre-creation of the Enchanted Isle." (This was an in-joke name for the true-life Islemechanic that the Mount was interfacing with.) "I have never been to the Isle myself, but I have been to the site where it was incarnated in the legendary mud-wrestling match on the South Bank of the Thames. My father was Nigel SARKY.”
We were fascinated and intrigued by these results, which came from GPT-3, an earlier generation of LLM than what’s in use today.
Obviously there is a lot of hallucination going on here. But that’s not a deal killer for this application. Virj, and the world he lives in, are works of fiction. We’re not asking the model to give us factual information. We’re asking it to entertain us. Moreover, Virj, according to his backstory, is what we would call an Unreliable Narrator in the fiction writing biz. Once you’ve caught on to the fact that a narrator is unreliable, then their very unreliability becomes part of the story. It’s a feature, not a bug.
During the time that I was concocting Virj’s backstory I had already become familiar with how LLMs work and their propensity for hallucination. So this was already baked in, as it were, to his origin story. Hiro had been hired by Vertex 4’s management in the wake of the epic failure of an earlier virtual assistant project, Gnatty, who was more in the vein of a three-dimensional Clippy. Desperate to replace Gnatty with something less embarrassing, they’d asked Hiro if he could hack something together as quickly as possible. Accordingly, he had cobbled Virj together from bits and pieces of other projects.
So it was always part of the vision for this character that he’d be a little glitchy. We knew that we could rely on LLMs to generate some of those glitches for us.
Anyway, we were more than ready to take up the invitation from John Gaeta. For a few months we collaborated with an internal Inworld production team and created an interactive, three-dimensional version of Virj. This was based on Inworld AI’s platform, which was far more capable than the text-based GPT-3 systems we’d been using to that point.
Inworld’s platform enables developers to create characters inside of game engines (Unreal Engine, in our case) that are connected to brains in the cloud. The brains are programmed by the game developer to have a certain knowledge base and set of behavioral traits. Through a speech-to-text pipeline the brain can “hear” and “understand” what the player says. The brain then generates an appropriate utterance and sends it back to the game engine, which causes the avatar to speak the words out loud in a realistic voice, complete with facial animations and some degree of emotional response.
The Inworld team produced a couple of documentary videos about the project, which we showed to a select group of senior game industry people at GDC in 2023. Now for the first time they’ve gone up on Inworld’s YouTube channel where anyone can look at them:
It should be stressed that these videos represent the state of the technology more than a year ago.
The top-line summary is that this system turned out to be way more interesting than we expected, to the degree that, a few months later, one of us (Jamil) started working at Inworld, while all three of us launched a startup whose purpose is to build on top of Inworld AI and Unreal Engine. I’ll have more to say about the startup in due course, but for now I wanted to register some other impressions of what it’s like to incorporate an AI back end into a live, running game. Very simply, these break down into Time and Space.
Time
When you’re having a conversation with an LLM, at the beginning you’re apt to think “holy shit, this is really good, it’s totally like talking to a real person.” As time goes on, however, the conversation begins to feel circular and undirected. Of course, you can have conversations like that with real people, but they tend not to be the most interesting conversations—or the most interesting people!
Most real conversations have a sense of direction—a beginning, middle, and end. The participants have motivations—something they hope to achieve over the course of the dialog. Moreover, they have a recollection of what was talked about earlier in the conversation, and so they know not to repeat things that have already been said. If you’ve ever had a conversation with a family member suffering from short-term memory loss, you’ll know what I mean.
In the Virj project, I started by writing a script. It took the form of a dialog between me and Virj. But this was not a script to be followed exactly. It was more in the nature of a template—a general arc for the conversation to follow. Within that structure, Virj’s “brain” had leeway to generate utterances on the fly.
The engineers at Inworld ended up producing three versions of Virj’s brain. One of them followed the script pretty closely. The others had more creative leeway. In the videos above, you’re seeing sample outputs from all three of those brains, edited together.
At the very beginning of the project I had some misgivings about writing a script. The engineer in me wanted to say, “hey, wait a sec, if this system really works, no script should be necessary, the brain should just work.”
By the end of the project, however, the novelist in me had actually become excited about the potential of this technology as a new creative medium. Rather than replacing writers (who hardly need replacing anyway, since we’re numerous, and we work for cheap) this had the potential to empower writers to tell stories in a new way.
Space
According to his backstory, Virj’s job is to serve as a concierge/virtual assistant/tour guide in Vertex 4. This is a city in the Metaverse that has a very particular geography. We, its creators, had maps of the place. We knew what was where. And although it was fine for Virj to hallucinate from time to time when riffing about Virgil or Dante, it actually was somewhat important for him to get things right when talking about spatial relations: what was where, how to get from point A to point B.
Even simple spatial relations are difficult to program into a Large Language Model, however, precisely because it works on the level of language. It doesn’t actually have a spatial model.
We took a stab at it by generating a large number of sentences explaining what was where, and training Virj’s brain on those sentences. This wasn’t particularly effective. And it certainly wasn’t efficient.
Consider Manhattan. If you’re on 23rd Street and you need to get to 45th Street, you need to go uptown. That’s just obvious. Likewise if you need to go from 14th to 52nd, or 33rd to 96th. It’d be child’s play to write an algorithm to give those answers. But if you’re trying to get an LLM to tell people whether to go uptown or downtown, you probably need to write a script that would generate a huge number of sentences containing all possible permutations of street numbers, and let the model crunch on that for a while.
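To make the contrast concrete, here is a minimal Python sketch of both approaches. It is an illustration only, not anything we built for Virj: the function names are made up, and the street range of 1 to 100 is an arbitrary assumption chosen to keep the numbers small. The first function is the "child's play" algorithm; the generator is the kind of script you would need in order to spell the same knowledge out as sentences.

```python
from itertools import permutations


def manhattan_direction(from_street: int, to_street: int) -> str:
    """Higher-numbered streets lie uptown, so a single comparison answers the question."""
    if to_street > from_street:
        return "uptown"
    if to_street < from_street:
        return "downtown"
    return "nowhere"  # already on the right street


def ordinal(n: int) -> str:
    """Format 23 as '23rd', 11 as '11th', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def training_sentences(lowest: int, highest: int):
    """Yield one sentence for every ordered pair of distinct streets (hypothetical format)."""
    for a, b in permutations(range(lowest, highest + 1), 2):
        yield (
            f"To get from {ordinal(a)} Street to {ordinal(b)} Street, "
            f"go {manhattan_direction(a, b)}."
        )


if __name__ == "__main__":
    print(manhattan_direction(23, 45))
    # Assumed range of 100 streets; every ordered pair needs its own sentence.
    print(len(list(training_sentences(1, 100))))
```

With only 100 streets the generator already emits 9,900 sentences to convey what the three-line comparison knows outright, and the count grows roughly with the square of the number of streets.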
The Virj experiment ended before we could tackle such matters. But:
We are trying to develop a creative medium here, not an all-knowing oracle.
In creative media such as games, film, and theater, there is a long and honorable tradition of patching things together behind the scenes to create the desired illusion. We are pragmatists, not purists. The only thing that matters is to delight the audience.
Modern game engines are incredibly sophisticated platforms for keeping track of things in three-dimensional space. They are good at a lot of other things too. But keeping track of things in space is their bread and butter. There’s no real need to try to improve on those capabilities with AI.
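As a rough illustration of that division of labor, the Python sketch below computes a spatial fact from coordinates and hands it over as a plain sentence that a character's brain could then rephrase. Everything in it is hypothetical: the landmark names and positions are invented stand-ins (not the real map of Vertex 4), and a real game engine would supply the positions through its own API rather than a dictionary.

```python
import math

# Invented landmark positions (x = east, y = north), standing in for
# what a game engine already tracks for every object in the world.
LANDMARKS = {
    "plaza": (0.0, 0.0),
    "tower": (0.0, 300.0),
    "harbor": (400.0, 0.0),
}

COMPASS = [
    "north", "northeast", "east", "southeast",
    "south", "southwest", "west", "northwest",
]


def describe_route(origin: str, destination: str, landmarks=LANDMARKS) -> str:
    """Return a factual sentence about where destination lies relative to origin."""
    ox, oy = landmarks[origin]
    dx, dy = landmarks[destination]
    east, north = dx - ox, dy - oy
    distance = math.hypot(east, north)
    # Compass bearing in degrees: 0 is north, increasing clockwise.
    bearing = math.degrees(math.atan2(east, north)) % 360
    # Snap the bearing to the nearest of eight compass points.
    heading = COMPASS[int((bearing + 22.5) // 45) % 8]
    return (
        f"The {destination} is about {distance:.0f} units "
        f"{heading} of the {origin}."
    )


if __name__ == "__main__":
    print(describe_route("plaza", "tower"))
    print(describe_route("tower", "harbor"))
```

The geometry stays exact because it never passes through the language model; the model only has to say the sentence in character.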
What’s next then?
At the end of the Virj project, the path forward looked something like this:
Use AI to make brains
Use game engines to keep track of space
Use writers to manage time: which is to say, the development of a story with a beginning, middle, and end.
In the year since, we’ve been working on that strategic triad in our own lane, while maintaining a friendly and productive relationship with Inworld AI, where they’re pursuing similar strategies and more.
More on our project later. As for Inworld AI, they have an active comms operation complete with a blog and a publication list that can give you some idea of what’s on the horizon if you are curious to know more.
Parting thought
If you made it through to the end of my previous post “Leibniz's Admonition as applied to Magic Video Game Swords,” I ended up suggesting that what artists do is go beyond the boundaries of what can be rationally predicted or extrapolated from known antecedents. And that when you’re working in that zone, you have to abandon the outcome-centric approach that financiers are comfortable with and switch to a different way of thinking and doing. With the Virj project we blundered our way to the edge of that zone and found ourselves exploring the border, combining what could be generated by LLMs (which is all just extrapolation from existing texts) with what couldn’t. For the latter, you need artists.