This is our submission to the Encode AI London Hackathon Virtual Protocol: Best AI Use Case for Gaming, Social or Digital Entertainment bounty. In this project, we created an infinite platforming game with a twist: the levels are generated by a language model!
Instead of a standard, more traditional level-generation routine, we exploited the sequence-based, context-aware nature of generative language models to create playable levels that follow a defined schematic. This demonstrates the potential of these models: they are not only powerful for language applications but, with some out-of-the-box thinking, can be used to generate visual worlds using only text.
This project demonstrates a proof of concept of the content-generation ability of current open-source language models in the visual domain. We show that with the right fine-tuning, the distilgpt2 model can generate unique, playable levels for a platformer-style video game. An initial use case is enhancing video game development, as it allows the easy creation of unique and varied levels. We used a platformer as the example, but it would be possible to extend this technology to other game designs, such as biome generation for a sandbox game like Minecraft. Our approach is well suited to this task because language models are inherently context-aware, meaning they can react to the existing environment and update or expand upon it in a natural fashion. This differs from traditional level generation, where rules have to be manually coded into the game to ensure levels feel smooth and varied without compromising playability. For example, you might traditionally have to force generation to place the start and end goal in specific locations, whereas we found the model learnt the desired start and end locations during training without any specific instruction.
This project also points to a further use case for language models: completing any generation-based task with a well-defined output scheme. Through fine-tuning we were able to generate consistent level schematics from the language model, but in theory you could generate a schematic for anything using this approach.
The audience for this project is, firstly, video game players: using generative AI to create levels lets players enjoy more diverse gameplay, particularly in infinite/sandbox games where generative AI could continually produce new, exciting terrain to explore. The experience is enhanced by the fact that language models can build on the content they have previously generated, enabling smooth transitions between areas and levels. This could be combined with using language models for more obvious tasks, such as NPC dialogue and location descriptions, to give players a rich environment that continually grows to challenge them.
Secondly, video game developers, particularly indie developers, could make extensive use of this technology to simplify traditionally challenging parts of the development process. This project shows that instead of writing hours of code to produce infinite level generation, developers could create a small number of example levels and fine-tune a model to produce more of them. Not only does this save the developer time and effort, allowing them to invest more in areas such as aesthetics or story, but it could also generate richer, more varied worlds and levels than traditional algorithmic techniques can manage. Overall, this technology impacts both the consumer and the supplier, benefiting both in different ways.
Utilising language models in this manner will enable more games to be produced with far less effort. Level creation is one of the most time-consuming aspects of game development, and players constantly demand more content. Typical procedural level generation lacks character and creativity, which hurts the player experience. Language models have shown themselves capable of creativity and planning in writing tasks, and this project demonstrates that they are capable of processing level schematics in the same way. With further refinement, this technology will produce interesting and intellectually stimulating levels for creators to use in their games. Adopting it will allow smaller teams to reduce their workload and build consumer-ready games much faster. With this technology, only the game engine, physics and visual style need to be designed and built by the developers; the levels can be produced by the AI under the developer's guidance.
Indie games are a big part of the video game industry, so allowing more creative people to bring their projects to life without dedicating a large team to the creation and upkeep of levels will increase the number of quality games on the market. Furthermore, reducing the workload of existing developers lowers the maintenance costs associated with games. Many mobile games maintain their income and player base by frequently releasing new content in the form of new levels. Games such as Candy Crush have been incredibly successful at retaining their player base by releasing weekly levels since 2012, accumulating 16,000+ levels to date. This intensity of content generation is necessary to hold players' attention over time and ensure regular profit, but it is very time-consuming and requires teams of people working constantly. This technology aims to let indie developers compete with large companies on frequent content generation by drastically reducing the workload.
To keep the entire project lightweight, the game was built from scratch with pygame instead of a large game engine. The game is a 2D platformer in which the player moves left, moves right and jumps to avoid obstacles and reach the end goal. The game features simple collision detection and physics, using the pygame Python library to facilitate this style of game. The player spawns in the bottom left of the level and must reach the goal in the top right. Along the way there are obstacles in the form of spikes, which respawn the player at the start on contact. Upon reaching the goal, a new level is generated by the AI model. A new level arrives as a string of characters, which is converted into a map of the level, from which assets are drawn to the screen. Collision detection and physics are handled by the game code.
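The string-to-map conversion step can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the real game draws pygame sprites and runs collision detection, while here `parse_level` and `TILE_SIZE` are assumed names that just group grid positions by tile type using the '#', 'S', 'E', 'H' encoding described later.

```python
# Hypothetical sketch: convert a level schematic string into tile positions.
# The real game would wrap these positions in pygame rects/sprites for
# drawing and collision detection.

TILE_SIZE = 32  # pixels per grid cell (illustrative value)

def parse_level(schematic: str):
    """Map each character of the schematic to a pixel position by type."""
    tiles = {"platform": [], "hazard": [], "start": None, "goal": None}
    for row, line in enumerate(schematic.splitlines()):
        for col, ch in enumerate(line):
            pos = (col * TILE_SIZE, row * TILE_SIZE)
            if ch == "#":
                tiles["platform"].append(pos)
            elif ch == "H":
                tiles["hazard"].append(pos)
            elif ch == "S":
                tiles["start"] = pos
            elif ch == "E":
                tiles["goal"] = pos
    return tiles

level = "#####\n#S E#\n#####"
tiles = parse_level(level)
```

From a dictionary like this, the game loop only needs to draw a sprite at each position and test the player's rectangle against the platform and hazard lists.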
To fine-tune the AI model, levels were randomly generated with platforms of varying length and hazards. These random levels were often unplayable, since purely random generation introduces too much variation, yet restricting the randomness leads to boring levels for the player. This is where the AI comes in. The random levels were sufficient for a language model to learn the idea of a platformer level: a boundary, platforms to reach the goal, and some hazards. The levels produced by the model are much better structured than those from the random algorithm, giving a more enjoyable playing experience with infinite procedurally generated levels.
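A random training-level generator of this kind might look like the sketch below. This is an assumed reconstruction of the data-generation step, not the exact script: the grid size, platform lengths and hazard probability are illustrative, but it embeds the same three ideas (a boundary, platforms, hazards) plus the fixed start and goal corners.

```python
import random

WIDTH, HEIGHT = 16, 16  # grid size matching the example schematics

def random_level(seed=None):
    """Generate one random training schematic: a solid border, platforms of
    varying length with occasional hazards, 'S' bottom-left, 'E' top-right.
    (Illustrative reconstruction, not the project's actual generator.)"""
    rng = random.Random(seed)
    grid = [[" "] * WIDTH for _ in range(HEIGHT)]
    # Solid border around the level
    for x in range(WIDTH):
        grid[0][x] = grid[HEIGHT - 1][x] = "#"
    for y in range(HEIGHT):
        grid[y][0] = grid[y][WIDTH - 1] = "#"
    # Platforms of random length on every other row, with occasional hazards
    for y in range(2, HEIGHT - 2, 2):
        x = rng.randint(1, 4)
        while x < WIDTH - 2:
            length = rng.randint(2, 5)
            for i in range(x, min(x + length, WIDTH - 1)):
                grid[y][i] = "#"
                if rng.random() < 0.1:
                    grid[y - 1][i] = "H"  # hazard resting on the platform
            x += length + rng.randint(2, 5)
    grid[1][WIDTH - 2] = "E"       # goal in the top right
    grid[HEIGHT - 2][1] = "S"      # start in the bottom left
    return "\n".join("".join(row) for row in grid)

print(random_level(seed=0))
```

As the text notes, nothing here guarantees the goal is reachable; the fine-tuned model, not the random generator, is what produces the better-structured levels.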
With more time, training and user feedback, the AI could learn the concept of an easier or harder level, adjusting to suit the ability of the player. A major benefit of using a language model is that the player could also tell the AI what they want the level to look like, and the AI could then produce a suitable platformer level with this in mind.
The model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilbert/distilgpt2), which we have uploaded [here](https://huggingface.co/Alistair-R/EncodeHackathonLevelGen). To train the model to generate level schematics, each platformer level had to be encoded as a string of characters: '#' for a platform (or border), 'S' for the start, 'E' for the end and 'H' for a hazard. Fine-tuning used 200 randomly generated level schematics; the schematics were tokenized and the model was trained on the resulting tokens. Training was quick (~20 minutes) thanks to the small dataset, and the final loss of around 0.28 was sufficient to generate consistently playable levels. After fine-tuning on strings of this form, the model quickly grasped the expected output format and produced similar strings to represent levels. An example output of the model is given here:
```text
LevelSchematic:################
# E#
##### ###
# #
# #### #
# #
# ### #### ###
# #
# #### ##
# #
# ## ###
# H #
##### #### ###
# #
#S# ## ### #
################
```
which was then processed with a short function that ensures every line of the level is the correct length by adding or removing spaces, producing the following:
```text
################
# E#
##### ###
# #
# #### #
# #
# ### #### ###
# #
# #### ##
# #
# ## ###
# H #
##### #### ###
# #
#S# ## ### #
################
```
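The post-processing helper can be sketched as below. The exact function isn't shown in this write-up, so this is an assumed reconstruction (`normalise_level` and the `width` parameter are illustrative names): it strips the `LevelSchematic:` prompt prefix, then pads or trims each line to a fixed width.

```python
def normalise_level(raw: str, width: int = 16) -> str:
    """Pad or trim each line of a generated schematic to a fixed width,
    dropping the 'LevelSchematic:' prompt prefix if present.
    (Illustrative reconstruction of the post-processing step.)"""
    text = raw.split("LevelSchematic:", 1)[-1]
    lines = [line for line in text.splitlines() if line.strip()]
    return "\n".join(line[:width].ljust(width) for line in lines)

cleaned = normalise_level("LevelSchematic:####\n#SE#\n####", width=4)
```

Truncating over-long lines and space-padding short ones keeps the grid rectangular, which is all the game's map loader needs.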
These level schematics can then be fed straight into the game to produce a fully functional level for the player to enjoy.