Monday, February 23, 2015

Scholarship Requires Planning

    1. Code Refractory and Scratch of Classes Design
      1. Point of completion: Code refactored and classes design sketch
      2. Deadline: 4d - 3/1/2015
      3. Description: Refactor the code of the open source pacman in order to facilitate the development and the understanding of anyone who look at this code, also, a simple UML classes design can guide the next steps of the project.

    1. Connectors
      1. Point of completion: java-connection and LispConnection classes working good
      2. Deadline: 8d - 3/9/2015
      3. Description: This classes will encapsulate all the communication and changes in data format between the java and lisp programs.

    1. Feature and Reward Extractor
      1. Point of completion: FeatureExtractor class and getLastReward() method working good.
      2. Deadline: 15d - 3/24/2015
      3. Description: The FeatureExtractor.java class will calculate and provide a list of features in a specific state with a specific action to the QAgent in Lisp. For example, the number of ghosts in 1 step of distance. The method getLastReward() provide the difference in the pacman score after last action.

    1. Q-Learning Agent
      1. Point of completion: Q-Learning Algorithm controlling the Pacman.
      2. Deadline: 15d - 4/8/2015
      3. Description: This phase is the core of the project, here I am going to implement the Q-Learning Algorithm.

    1. Game Design Improvement and/or Search Algorithms to Improve Distance Features 
      1. Point of completion: A version with the game design improved
      2. Deadline: 10d - 4/18/2015
      3. Description: Search Algorithms to Improve Distance Features in the pacman agent and/or Improve Ghosts AI and/or add new levels in the game and/or new audio or graphic features.

    1. Statistics
      1. Point of completion: Complete statistic evaluation comparing the random machine, the human player and the q learning algorithm evolution.
      2. Deadline: 5d - 4/23/2015
      3. Description: If the LRT Algorithm is good I will implement some chase and scattering techniques to the ghosts and make the game more interesting, if it is not good, I will see what is wrong and make changes.

    1. Paper
      1. Point of completion: Project Paper Complete
      2. Deadline: 11d - 5/5/2015
      3. Description: This paper is going to describe all the algorithms used in the project.

Scholarship Requires Context

Jarad Cannon, Kevin Rose, Wheeler Ruml. “Real-Time Motion Planning with Dynamic Obstacles” Proceedings of the Fifth Annual Symposium on Combinatorial Search (2012).
Description: Probably it is going to be the main guide for the project, because this article describe the exact kind of algorithm that I need to use and compare some of them, showing pros and cons of each.  


Bulitko, Vadim, Yngvi Björnsson, and Ramon Lawrence. "Case-Based Subgoaling In Real-Time Heuristic Search For Video Game Pathfinding." (2014): arXiv. Web. 22 Feb. 2015.
Description: This article is going to be useful because it talk about subgoaling in the context that I am working with. I could assume each dot in the pacman game as a subgoal.


Maxim Likhachev, David Ferguson , Geoffrey Gordon, Anthony (Tony) Stentz, and Sebastian Thrun, "Anytime Dynamic A*: An Anytime, Replanning Algorithm," Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), June, 2005.
Description: The LRTA* algorithm is based in static obstacles, differently of the pacman game. So, the article describe one variation of this algorithm to deal with this situations.


I. Szita and A. Lorincz (2007) "Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man", Volume 30, pages 659-684
Description: This work describe some heuristics and the use of some algorithms in the pacman game.


Korf, R. E. 1990. Real-time heuristic search. Artificial Intelligence 42(2-3):189–211.
Description: This article is a great reference in the real time heuristic search field.


David M. Bond, Niels A. Widger, Wheeler Ruml, Xiaoxun Sun.”Real-Time Search in Dynamic Worlds” Proceedings of the Symposium on Combinatorial Search (SoCS-10), 2010.
Description: This article describe other kind of Real-Time Search in Dynamic Worlds algorithm, the


Garcıa, Adrián Ortega, and Juan Carlos Orendain Canales. "Agent Pac-Man: A Study in A* Search Method." Sistemas Inteligentes: Reportes Finales Ene-May 2014} (2014): 1. http://www.researchgate.net/profile/Gildardo_Sanchez-Ante/publication/262600223_Sistemas_Inteligentes_Reportes_Finales_Ene-May_2014/links/0f317538331511fb40000000.pdf#page=6
Description: This work describe the use of A* search method in the Pacman game and also a different behaivor to the ghosts.


Q-Learning Algorithm

Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs


Deep Learning for Reinforcement Learning in Pacman


Technical Note Q-Learning


Reinforcement Learning and Function Approximation∗


Q-Learning with Linear Function Approximation


Q-learning with linear function approximation


An Analysis of Reinforcement Learning with Function Approximation


Double Q-learning


Approximate dynamic programming and reinforcement learning∗


Combining Q-Learning with Artificial Neural Networks in an Adaptive Light Seeking Robot


http://busoniu.net/files/papers/adprl11_survey.pdf


Extra


Course berkley

Lecture

Slides

Books

Wiki

Tutorial


https://docs.google.com/document/d/1j6A34-NBcdTyjQJGvlcyn0tOgucY4wdiQMXj-rlQtzw/edit?usp=sharing

Sunday, February 15, 2015

Concept Proof - Pacman LISP + JAVA

I have been doing some tests to see if the Java and Lisp would really work good together to the project using the library abcl. I had pretty good results, but probably I will have some trouble to convert java arrays in lisp lists and with the old design patterns adopted in the Open Source PACMAN. Although, it is part of the process of learning. This project already has "smart" ghosts that are unable to turn back in the path. I made some changes in the code/design and add one lisp function to control the Pacman. With the human behavior of "lose in a hurry" It is going randomly and also it is not allowed to turn back in the path.

You can find the source code on GitHub and a "Concept proof" video bellow:

TTT LM: Modeling Players - Random Machine And Human

Assignment  and Code

Monday, February 9, 2015

Candidate topics for a Research/Programming Project

Content Based Recommendation System to Events

Choose which bar, show, theater, play or any sort of event, go or do not go can be a big issue for people who live in a big city with a lot of options, or even to people who are visiting a new city. There are some apps, as Foursquare, that could help in this situations, but none of them gives personal recommendation and none of them are based in events. This project would be the creation of a recommendation system to events, where the user could see their prefer kind of events, and a guess about how they would rate those events. In this way, the user would save time, instead of look at all events in the city to choose which one is the best, the user could simply see the top ten events for him. The system would provides personalized recommendation by matching user’s interests with description and attributes of this events. For example, the place, actors, artists, theme, day of week, time kind of music and etc. Moreover, the system would obtain the user’s interests using their likes or dislikes in previous events, their friends interests or even a initial interview. The machine learning would be an essential part of the system, for example, a decision tree could be used to learn a model that smartly chooses minimum set of question while learning user's preference in the initial interview. Moreover, for the recommendation of the event, some standard techniques of machine learning could be used, such as, logistic regression, support vector machines, decision tree or other. Considering the natural interaction required to this system, it should be web based. 

Real-Time Learning with Dynamic Obstacles Applied in A PacMan

Real time learning algorithms are commonly used in situations that the agent has to interleaves planning and acting to have a quick answer on a path finding problem. Also, some of those algorithms include obstacles in their analysis, obstacles that can be dynamic or not. It can be used in many different situations, for example a flying drone or a robust robot motion planning in dynamic environments. The project to the course could be the application of some of these algorithms, such as, LSS-LRTA* or neural networks, to a PacMan game.


Articles that can help

http://rvsn.csail.mit.edu/location/Minkov_collaborative_recommendation.pdf http://www.aaai.org/ocs/index.php/SOCS/SOCS12/paper/viewFile/5412/5174 http://www.gamesitb.com/nnpathgraham.pdf
http://www.gamesitb.com/pathgraham.pdf

Other Ideas

Smart Othello - Reversi Game
Traffic Tweet Classifier

Sunday, February 8, 2015

Tic Tac Toe - Visualization, Analysis, and Statistics

In this picture is possible see the "Visualize" method working:





In this picture is possible see the "Analyze" method working:


Combining them we can see the method "demo-va" working:


Bellow, a image of the use of the method "stats" to display the statistics of the game played randomly showing the games.


Running the algorithm with two random players and 30000 games, we could see the statistic in the view of the first player. The result was: 58.43% of win, 28.74% Lost and 12.82% Draw. As expected for most of class it was about 60% of chance of winning. Bellow you can see a screen shot of the test.


 and here the code.

Also, the most interesting part of the code was the method of analysis.


(defmethod analyze((l list))
 (defparameter wonlist '( (nw n ne) (w c e) (sw s se) (nw w sw) (n c s) (ne e se) (nw c se) (ne c sw)))
 (defparameter result 'd)
 (defparameter xlist ())
 (defparameter olist ())
 (loop 
  for n from 0
        for move in l
        if (and (evenp n) (equal result 'd)) do (setf xlist (cons move xlist))
  do (loop for lane in wonlist 
    if (equal (length(intersection xlist lane)) 3) 
    do (setf result 'w)
   ) 
  if (and (oddp n) (equal result 'd)) do (setf olist (cons move olist))
  (loop for lane in wonlist  
   if (equal (length(intersection olist lane)) 3) 
   do (setf result 'l)
  )
 )
 result
)






Sunday, February 1, 2015

Random Tic Tac Toe

It is a simulation of 5 games in a random tic tac toe:


Impressively, the first player won every game. This is the file of the code.