AI Learns to Play Tag (and breaks the game)

@aiwarehouse 14 days ago

More information about how Albert and Kai were trained:

Time it took to train Albert and Kai:
Room 1: 12h 30m (though I stopped the recording after Albert broke the game)
Room 2: 13h 40m
Room 3: 1d 20h 2m
Final Battle: 6h 48m (this wasn’t shown but was needed since the agents weren’t used to seeing other teammates)

We continue training on top of the previous brains, meaning that by the end of the video Albert and Kai have both trained for 3 days and 5 hours.

Thank you so much for watching! These short videos take literally hundreds of hours to make. If you want to help us make them faster, please consider becoming a channel member! By becoming a member, your name can be in future videos, you can see behind-the-scenes things that don’t fit in the regular videos, and you can also use stickers of Albert, Kai and some other characters our team made in comments :_Albert::_Kai::_Tyler::_Jonas::_Catt: (more coming) :D

NOTES

When I mention it took x days to train, that’s in game time, and much larger than the displays indicate since there are 200 copies training simultaneously. This is a very long comment going over more of the details of how Albert and Kai work, issues they’ve had, unexpected results etc.

THE BASICS:

Albert and Kai were trained using reinforcement learning, meaning they were rewarded for doing things correctly and punished for doing them incorrectly (the reward is just increasing their score, and the punishment is decreasing it). After they finish each attempt, the actions they took are analyzed and the weights in their neural networks (brains) are adjusted using an algorithm called MA-POCA to try to prioritize the actions that led to the most reward.
The agents start off making essentially random decisions until Kai accidentally tags Albert in the first room and is rewarded. Then, as mentioned above, the weights in his neural network brain are adjusted to try to replicate that reward (it wasn’t this simple for this video since we use self-play to train both agents at the same time, more on that later). This leads to Kai learning that tagging Albert is good, and since Albert is punished when he’s tagged, it also leads to Albert learning that getting tagged by Kai isn’t good. This process continues through tens of millions of steps until one of the agents consistently loses, or the agents are able to counter each other well enough that it’s a draw.

REWARD FUNCTION:

Albert and Kai are given two types of rewards: group rewards and individual rewards. When Albert gets tagged he’s punished with a -1 group reward and Kai is rewarded with a +1 group reward, and vice versa, encouraging Kai to tag Albert and Albert to avoid being tagged by Kai. Additionally, Albert is given an individual reward of 0.001 for each frame he’s alive (0.6 total in a room lasting 10s), and Kai -0.001, to encourage Kai to try to tag Albert as quickly as possible. When we introduce the grabbable cubes we also give Albert an individual reward of +1 the first time he picks up the cube, to make sure Albert actually starts using it (without this, the rewards were too infrequent for Albert to learn to use it effectively).

BRAIN:

Albert and Kai’s brains are neural networks with 4 layers each (one input layer, 2 hidden layers and one output layer). The agents collect information about the scene through direct values and raycasts. Every 5 frames they’re fed data about their position in the room, the opponent’s position, velocity, direction etc., and they also collect information through raycasts (a simplified version of eyes).
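For anyone who wants to see the reward scheme from the REWARD FUNCTION section written out, here is a minimal Python sketch. It is not the project's actual code: the class and method names are made up for illustration, the 60 frames per second figure is only inferred from "0.6 total in a room lasting 10s", and treating "vice versa" as "Albert survives the round" is an assumption.

```python
from dataclasses import dataclass

TAG_GROUP_REWARD = 1.0     # from the comment: +1 / -1 group reward when a tag happens
SURVIVAL_REWARD = 0.001    # from the comment: per-frame individual reward
CUBE_PICKUP_REWARD = 1.0   # from the comment: only for Albert's first cube pickup
ASSUMED_FPS = 60           # assumption: inferred from 0.6 reward over 10 seconds


@dataclass
class RoundRewards:
    """Running reward totals for one round (hypothetical bookkeeping class)."""
    albert_group: float = 0.0
    kai_group: float = 0.0
    albert_individual: float = 0.0
    kai_individual: float = 0.0
    cube_bonus_given: bool = False

    def on_frame(self) -> None:
        # Albert is paid for staying alive; Kai pays the same amount for being slow.
        self.albert_individual += SURVIVAL_REWARD
        self.kai_individual -= SURVIVAL_REWARD

    def on_cube_pickup(self) -> None:
        # One-off bonus so that cube use gets discovered at all.
        if not self.cube_bonus_given:
            self.albert_individual += CUBE_PICKUP_REWARD
            self.cube_bonus_given = True

    def on_tag(self) -> None:
        # Kai tagged Albert.
        self.albert_group -= TAG_GROUP_REWARD
        self.kai_group += TAG_GROUP_REWARD

    def on_albert_survives(self) -> None:
        # Assumed meaning of "vice versa": Albert was not tagged this round.
        self.albert_group += TAG_GROUP_REWARD
        self.kai_group -= TAG_GROUP_REWARD


# Example round: Albert lasts 10 seconds, grabs the cube twice, then gets tagged.
rewards = RoundRewards()
for _ in range(10 * ASSUMED_FPS):
    rewards.on_frame()
rewards.on_cube_pickup()
rewards.on_cube_pickup()  # no second bonus
rewards.on_tag()
print(round(rewards.albert_individual, 3), rewards.albert_group, rewards.kai_group)
```

The per-frame reward is tiny compared to the tag reward, so surviving longer only nudges the agents; winning or losing the tag still dominates the score.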
The agents’ eyes (raycasts) can differentiate between walls, ground, moveableObjects and Kai/Albert. The agents’ brains (neural networks) are given the data the agents collect from direct values and raycasts and use it to predict 4 numbers for the respective agent, which control how that agent moves. An example of an output of one of the neural networks is [1, 2, 0, 1]. This would be interpreted as [1=move forward, 2=turn right, 0=don’t jump, 1=try to grab], so the agent being controlled by this neural network would try to move forward while turning right and grabbing.

The fact that we have two agents training simultaneously complicates things a bit. Normally we’re able to just update the agents’ brains every x steps, but if we did that for both brains at the same time they would struggle to develop multiple strategies. Reinforcement learning tends to be best at finding a single solution, so that would lead to the winner dominating and the loser stuck doing the same strategy over and over. The way we tackle this issue is by using something called self-play. Since we use self-play, we technically only train one agent at a time, and swap which one is being trained every 100k steps. When we’re training Albert, we use a recent model of Kai’s brain as his opponent, and to avoid there only being one strategy, we store 10 recent brains to use as opponents, swapping them out every couple thousand steps so that Albert learns to beat all of them and not just one. This results in a much more general AI that’s hard to exploit.

UNEXPECTED BEHAVIORS:

In room 1 Albert manages to break out of the room by exploiting a small hole in the hitbox near the top of the room, which was there because I didn’t make the hitboxes on the walls tall enough. Though Albert used it to escape, I’m not convinced he would actually learn to do it consistently.
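As a side note on the self-play setup described earlier in this comment, here is a rough Python sketch of the schedule: one learner at a time, the learner swapped every 100k steps, and the frozen opponent drawn from a pool of 10 recent brains. It only simulates the bookkeeping, not any training. The 2,000-step opponent swap stands in for "every couple thousand steps", the snapshot interval is not given in the comment and is a pure assumption, and all names are hypothetical.

```python
import random
from collections import deque

SWAP_TEAM_EVERY = 100_000     # from the comment: learner swaps every 100k steps
SNAPSHOT_POOL_SIZE = 10       # from the comment: 10 recent brains are kept
SWAP_OPPONENT_EVERY = 2_000   # assumption: "every couple thousand steps"
SAVE_SNAPSHOT_EVERY = 20_000  # assumption: not stated in the comment


def self_play_schedule(total_steps, rng):
    """Return (step, learner, frozen_agent, snapshot_step) for each opponent swap.

    A "brain snapshot" is represented only by the step at which it was saved.
    """
    agents = ("Albert", "Kai")
    # Each agent starts with one untrained snapshot (step 0); old ones fall out
    # of the deque automatically once the pool holds SNAPSHOT_POOL_SIZE entries.
    pools = {name: deque([0], maxlen=SNAPSHOT_POOL_SIZE) for name in agents}
    learner, frozen = agents
    events = []
    for step in range(total_steps):
        if step > 0 and step % SAVE_SNAPSHOT_EVERY == 0:
            # Freeze a copy of the learner's current brain for later use.
            pools[learner].append(step)
        if step > 0 and step % SWAP_TEAM_EVERY == 0:
            # The other agent becomes the one being trained.
            learner, frozen = frozen, learner
        if step % SWAP_OPPONENT_EVERY == 0:
            # Pick which stored brain the frozen agent plays with for a while.
            snapshot = rng.choice(list(pools[frozen]))
            events.append((step, learner, frozen, snapshot))
    return events


events = self_play_schedule(200_001, random.Random(0))
for event in events[48:53]:  # a few swaps around the 100k-step team change
    print(event)
```

Because the opponent keeps changing between older and newer snapshots, the learner cannot win by overfitting to a single version of the other agent, which is the point made above about ending up with a more general AI.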
The challenge with this video is that it can be difficult to interpret the agents’ behaviors. Albert could be making certain unexpected moves as a way to exploit Kai’s poorly trained brain and get him to make bad moves, or Albert could just be making these unexpected moves because he hasn’t trained enough. Albert was able to find the hole a few times, but he wasn’t able to do it consistently. This could be from him not training long enough, his observations not making it easy to detect when he can jump out, or Kai quickly learning to counter him getting to the display in time.

In room 2 Albert also manages to glitch out of the room, and he was able to do this consistently. We made sure the cube-grabbing functionality was coded as rigorously as possible, even having it automatically detach the grab if the force exerted is too high. I couldn’t find a single way of exploiting it in testing, but Albert certainly didn’t have issues finding one. Albert also had a couple of moments of throwing the cubes at Kai and spinning with the cube to throw Kai out of the room. We didn’t even consider this a possibility before training; AI is able to come up with some really clever solutions to problems.

OTHER

Thank you so much to our amazing team that helped make this video! Jonas helped with setting up the character controls, Tyler helped create the clean grabbing functionality, Catt helped edit, and Andrew and Steve helped solve any issues we ran into while making the video. If you want to meet our team and talk to all of us, join our Discord server! :) discord.gg/qDRtuFe5gp

@MultiFlash009 14 days ago

Kai occasionally obliterating Albert's dead body shows that AI is capable of learning gamer rage

@Mar_Marine 14 days ago

"Albert, you can't escape" Albert: "Okay, I'll force Kai to escape."

@Chefboyardee119 12 days ago

I love how Albert’s biggest breakthrough was just escaping the tag arena altogether

@fufo654 10 days ago

i like how Kai kept emoting on Albert whenever he tagged him, very human

@BaconBalling 14 days ago

The 1v5 went crazy you can’t even lie

@what_d1245 14 days ago

Albert: "while you struggled on foolish pursuits, i studied the cube"

@milessoup9246 9 days ago

10:15 I absolutely love that four of them just instantly died cause they could not comprehend what was happening fast enough, and one just instantly kicked into life or death mode and started doing insane strategies on the fly to avoid the army of death following him