Monday, May 4, 2015

Features and Rewards

The features and the rewards are the soul of the Q-learning algorithm and of the Pacman agent. Nonetheless, they are the hardest part to get right and, even with high scores, I am still thinking about the best way to do it. I implemented up to 10 different features to represent a state and tried many combinations among them, and I could keep doing that for one more month before discovering the best way, because it is really fun to see the variations in Pacman's behavior according to changes in the feature or reward functions.
For now, I am using only 4 features to represent the Pacman World:

  • One divided by the distance to the closest dot or power dot.
  • One divided by the distance to the closest blind ghost.
  • The number of ghosts one step away, divided by four.
  • Whether Pacman is about to be eaten or not.
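The four features above can be sketched roughly as follows. This is a minimal illustration, not the post's actual code: the state arguments, the feature names, and the `maze_distance` helper (here a simple Manhattan distance standing in for a real maze search) are all assumptions.

```python
def maze_distance(a, b):
    # Placeholder: Manhattan distance stands in for a real maze search.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def extract_features(pacman_pos, dots, blind_ghosts, active_ghosts, will_be_eaten):
    """Return the four features described above as a name -> value dict."""
    features = {}

    # 1 / distance to the closest dot or power dot.
    if dots:
        dist = min(maze_distance(pacman_pos, d) for d in dots)
        features["closest-dot"] = 1.0 / max(dist, 1)

    # 1 / distance to the closest blind (frightened) ghost.
    if blind_ghosts:
        dist = min(maze_distance(pacman_pos, g) for g in blind_ghosts)
        features["closest-blind-ghost"] = 1.0 / max(dist, 1)

    # Number of ghosts one step away, divided by four.
    features["ghosts-1-step-away"] = sum(
        1 for g in active_ghosts if maze_distance(pacman_pos, g) == 1) / 4.0

    # Binary flag: is Pacman about to be eaten?
    features["will-be-eaten"] = 1.0 if will_be_eaten else 0.0

    return features
```

Keeping every feature in roughly the same [0, 1] range, as these four are, also helps the learned weights stay comparable to each other.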
Each feature receives a weight from the Q-learning algorithm; usually good things get positive weights and bad things negative ones, but I am going to explain this in more detail in a future post. The reward function that I am using is based on 5 rewards:

  • DOT: value = 5
  • POWER_DOT: value = 10
  • EAT_GHOST: value = 800
  • DIE: value = -1200
  • WALK: value = -2
It is clear that the worst thing that could happen is to die, and the reward function reflects that.
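These reward values and the per-feature weights combine in the standard approximate Q-learning update rule, which is where the "good things positive, bad things negative" pattern comes from. A minimal sketch, assuming the linear feature representation described above (the parameter values and feature dictionaries here are illustrative, not the post's actual code):

```python
# Reward values from the post.
REWARDS = {"DOT": 5, "POWER_DOT": 10, "EAT_GHOST": 800, "DIE": -1200, "WALK": -2}

def q_value(weights, features):
    # Linear approximation: Q(s, a) = sum_i w_i * f_i(s, a).
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def update_weights(weights, features, reward, next_q_max, alpha=0.2, gamma=0.8):
    # Standard approximate Q-learning update:
    #   w_i <- w_i + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * f_i(s, a)
    correction = reward + gamma * next_q_max - q_value(weights, features)
    for f, v in features.items():
        weights[f] = weights.get(f, 0.0) + alpha * correction * v
    return weights
```

After repeated updates, a feature that fires right before a large negative reward (like dying) accumulates a negative weight, so the agent learns to avoid actions that activate it.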
This is a list of some other features that I used in other versions of the game: eat dot, eat power dot, eat a ghost, distance to a power dot, distance to the closest active ghost, and the sum of the distances to the active ghosts divided by the number of active ghosts (I believed in that one so much, but it created a Pacman afraid of moving), among others.

All the code related to the features and rewards can be found in classes whose names explain themselves: FeaturesExtration, PacmanFeatures and Reward.
