CSC466: Topics in Artificial Intelligence, Spring 2015: February 2015

Monday, February 23, 2015

Scholarship Requires Planning

1. Code Refactoring and Class Design from Scratch
2. Connectors
3. Feature and Reward Extractor
4. Q-Learning Agent
5. Game Design Improvement and/or Search Algorithms to Improve Distance Features
6. Statistics
7. Paper

Scholarship Requires Context

Jarad Cannon, Kevin Rose, Wheeler Ruml. “Real-Time Motion Planning with Dynamic Obstacles” Proceedings of the Fifth Annual Symposium on Combinatorial Search (2012).

http://www.aaai.org/ocs/index.php/SOCS/SOCS12/paper/viewFile/5412/5174

Description: This will probably be the main guide for the project, because the article describes exactly the kind of algorithm that I need to use and compares several of them, showing the pros and cons of each.

Bulitko, Vadim, Yngvi Björnsson, and Ramon Lawrence. "Case-Based Subgoaling In Real-Time Heuristic Search For Video Game Pathfinding." (2014): arXiv. Web. 22 Feb. 2015.

http://arxiv.org/ftp/arxiv/papers/1401/1401.3857.pdf

Description: This article will be useful because it discusses subgoaling in the context that I am working with. I could treat each dot in the Pac-Man game as a subgoal.

Maxim Likhachev, David Ferguson, Geoffrey Gordon, Anthony (Tony) Stentz, and Sebastian Thrun, "Anytime Dynamic A*: An Anytime, Replanning Algorithm," Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), June, 2005.

http://www.cs.cmu.edu/~ggordon/likhachev-etal.anytime-dstar.pdf

Description: The LRTA* algorithm assumes static obstacles, unlike the Pac-Man game. This article describes Anytime Dynamic A*, a replanning algorithm designed to deal with such situations.

I. Szita and A. Lorincz (2007) "Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man", Journal of Artificial Intelligence Research, Volume 30, pages 659-684

http://www.jair.org/media/2368/live-2368-3623-jair.pdf

Description: This work describes some heuristics and the use of some algorithms in the Pac-Man game.

Korf, R. E. 1990. Real-time heuristic search. Artificial Intelligence 42(2-3):189–211.

http://www.aaai.org/Papers/AAAI/1988/AAAI88-025.pdf

Description: This article is a foundational reference in the field of real-time heuristic search.

David M. Bond, Niels A. Widger, Wheeler Ruml, Xiaoxun Sun. “Real-Time Search in Dynamic Worlds” Proceedings of the Symposium on Combinatorial Search (SoCS-10), 2010.

http://www.cs.unh.edu/~ruml/papers/rtds-socs10.pdf

http://www.cs.unh.edu/~ruml/papers/rtds-socs10-talk.pdf

Description: This article describes another kind of real-time search algorithm for dynamic worlds, the

García, Adrián Ortega, and Juan Carlos Orendain Canales. "Agent Pac-Man: A Study in A* Search Method." Sistemas Inteligentes: Reportes Finales Ene-May 2014 (2014): 1.

http://www.researchgate.net/profile/Gildardo_Sanchez-Ante/publication/262600223_Sistemas_Inteligentes_Reportes_Finales_Ene-May_2014/links/0f317538331511fb40000000.pdf#page=6

Description: This work describes the use of the A* search method in the Pac-Man game and also a different behavior for the ghosts.

Q-Learning Algorithm

Reinforcement Learning to Train Ms. Pac-Man Using Higher-order Action-relative Inputs

http://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/MS_PACMAN_RL.pdf

Deep Learning for Reinforcement Learning in Pacman

http://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Hochlaender_BScThesis_2014.pdf

Technical Note Q-Learning

http://download.springer.com/static/pdf/35/art%253A10.1007%252FBF00992698.pdf?auth66=1425451074_112ca436260bbadbd8ea71f50d035baa&ext=.pdf

Reinforcement Learning and Function Approximation∗

http://www.cs.uic.edu/~sloan/my-papers/FLAIRS05-to-appear.pdf

Q-Learning with Linear Function Approximation

http://welcome.isr.ist.utl.pt/img/pdfs/1702_COLT07QLPO-Proceedings.pdf

Q-learning with linear function approximation

http://welcome.isr.ist.utl.pt/img/pdfs/1707_RT-602-07.pdf

An Analysis of Reinforcement Learning with Function Approximation

http://icml2008.cs.helsinki.fi/papers/652.pdf

Double Q-learning

http://papers.nips.cc/paper/3964-double-q-learning.pdf

Approximate dynamic programming and reinforcement learning∗

http://www.dcsc.tudelft.nl/~bdeschutter/pub/rep/10_028.pdf

Combining Q-Learning with Artificial Neural Networks in an Adaptive Light Seeking Robot

http://web.cs.swarthmore.edu/~meeden/cs81/s12/papers/MarkStevePaper.pdf

http://busoniu.net/files/papers/adprl11_survey.pdf
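Several of the readings above deal with Q-learning under linear function approximation. As a rough orientation, the weight update they build on can be sketched as follows. This is a minimal sketch, not the project's agent: the function names, the list-based feature representation, and the default values of `alpha` and `gamma` are all assumptions made for illustration.

```lisp
;; Minimal sketch of one Q-learning step with a linear approximator.
;; All names and default parameters here are hypothetical.

(defun q-value (weights features)
  "Linear Q estimate: dot product of WEIGHTS and FEATURES (lists of numbers)."
  (reduce #'+ (mapcar #'* weights features)))

(defun q-update (weights features reward next-feature-lists
                 &key (alpha 0.1) (gamma 0.9))
  "Return new weights after one Q-learning step.
FEATURES describes the (state, action) pair just taken. NEXT-FEATURE-LISTS
holds one feature list per action available in the next state (NIL if terminal)."
  (let* ((next-best (if next-feature-lists
                        (reduce #'max
                                (mapcar (lambda (f) (q-value weights f))
                                        next-feature-lists))
                        0))
         ;; Temporal-difference error: target minus current estimate.
         (td-error (- (+ reward (* gamma next-best))
                      (q-value weights features))))
    ;; Each weight moves in proportion to its own feature value.
    (mapcar (lambda (w f) (+ w (* alpha td-error f)))
            weights features)))
```

For example, `(q-update '(0 0) '(1 2) 10 nil :alpha 1/2)` starts from zero weights, sees a terminal reward of 10, and shifts each weight by half the error times its feature.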

Extra

Berkeley course

http://inst.eecs.berkeley.edu/~cs188/pacman/home.html

Lecture

http://videolectures.net/icml08_melo_arl/

Slides

https://blog.itu.dk/MAIG-E2013/files/2013/09/lab-8-qlearning-pacman.pdf

http://cs188websitecontent.s3.amazonaws.com/lectures/fa13-cs188-lecture-11-6PP.pdf

https://www-s.acm.illinois.edu/sigart/docs/QLearning.pdf

https://courses.cs.washington.edu/courses/cse573/12au/slides/09-rl2.pdf

Books

http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node65.html#fig:Qbackup

http://www.cse.wustl.edu/~kilian/introductions/reinforcement_learning.pdf

http://www.sztaki.hu/~szcsaba/papers/RLAlgsInMDPs-lecture.pdf

Wiki

http://en.wikipedia.org/wiki/Q-learning#Algorithm

Tutorial

http://mnemstudio.org/path-finding-q-learning-tutorial.htm

http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning-Algorithm.htm

http://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html

http://burlap.cs.brown.edu/tutorials/cpl/p3.html

http://www2.hawaii.edu/~chenx/ics699rl/grid/rl.html#Q-L - Good references

PacMan-AI/Reinforcement Learning

https://github.com/jcarrillo7/PacMan-AI/blob/50bcc68372ccc377be489770a1b13878e2e70631/Reinforcement%20Learning/featureExtractors.py

https://github.com/jcarrillo7/PacMan-AI/blob/master/Reinforcement%20Learning/qlearningAgents.py

https://docs.google.com/document/d/1j6A34-NBcdTyjQJGvlcyn0tOgucY4wdiQMXj-rlQtzw/edit?usp=sharing

Sunday, February 22, 2015

TTT LM: Heuristic Machine Development

Assignment and Code

Wednesday, February 18, 2015

Informal Video Intro to Machine Learning

Sunday, February 15, 2015

Concept Proof - Pacman LISP + JAVA

I have been running some tests to see whether Java and Lisp would really work well together for the project using the ABCL library. The results were pretty good, but I will probably have some trouble converting Java arrays to Lisp lists and dealing with the old design patterns adopted in the open-source Pacman. Still, that is part of the learning process. This project already has "smart" ghosts that are unable to turn back along their path. I made some changes to the code/design and added one Lisp function to control the Pacman. ~~With the human behavior of "lose in a hurry"~~ It moves randomly and is also not allowed to turn back along its path.
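The "random, but never turn back" rule can be sketched in a few lines of Lisp. This is only an illustration under assumptions: the direction symbols, the function names, and the choice to allow a reversal in a dead end are hypothetical and do not come from the project's code.

```lisp
;; Hypothetical direction names; the actual project may encode moves differently.
(defparameter *opposite*
  '((up . down) (down . up) (left . right) (right . left)))

(defun opposite (direction)
  "Return the direction that reverses DIRECTION, or NIL if it is unknown."
  (cdr (assoc direction *opposite*)))

(defun allowed-moves (legal-moves last-move)
  "Drop the move that would reverse LAST-MOVE.
Assumption: in a dead end the reversal is kept, so the agent is never stuck."
  (let ((forward (remove (opposite last-move) legal-moves)))
    (or forward legal-moves)))

(defun random-move (legal-moves last-move)
  "Pick uniformly at random among the allowed moves, or NIL if there are none."
  (let ((candidates (allowed-moves legal-moves last-move)))
    (when candidates
      (nth (random (length candidates)) candidates))))
```

With this sketch, `(random-move '(up down left) 'up)` returns either `UP` or `LEFT`, never `DOWN`.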

You can find the source code on GitHub and a proof-of-concept video below:

TTT LM: Modeling Players - Random Machine And Human

Assignment and Code

Wednesday, February 11, 2015

Candidate topics for a Research/Programming Project - Part II

Monday, February 9, 2015

Candidate topics for a Research/Programming Project

Content Based Recommendation System to Events

Choosing which bar, show, play, or other event to attend can be a real problem for people who live in a big city with many options, and even more so for people visiting a new city. Some apps, such as Foursquare, can help in these situations, but none of them gives personal recommendations and none of them is based on events. This project would create a recommendation system for events, in which users could see their preferred kinds of events along with a prediction of how they would rate them. Instead of looking through every event in the city to pick the best one, the user could simply see his or her top ten events, which saves time. The system would provide personalized recommendations by matching the user’s interests with the description and attributes of the events, for example the place, actors, artists, theme, day of the week, time, and kind of music. It would obtain the user’s interests from their likes or dislikes of previous events, their friends' interests, or an initial interview. Machine learning would be an essential part of the system. For example, a decision tree could be used to learn a model that smartly chooses a minimal set of questions while learning the user's preferences in the initial interview. For recommending events, standard machine learning techniques such as logistic regression, support vector machines, or decision trees could be used. Considering the natural interaction this system requires, it should be web based.

Real-Time Learning with Dynamic Obstacles Applied in A PacMan

Real-time learning algorithms are commonly used in situations where the agent has to interleave planning and acting to answer a path-finding problem quickly. Some of these algorithms also account for obstacles, which may be static or dynamic. They can be used in many different situations, for example a flying drone or robust robot motion planning in dynamic environments. The course project could apply some of these algorithms, such as LSS-LRTA* or neural networks, to a Pac-Man game.
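To make "interleaving planning and acting" concrete, here is a minimal sketch of basic LRTA* with one-step lookahead (not LSS-LRTA*) on an unbounded four-connected grid. The cell representation, the function names, unit move costs, and the Manhattan starting heuristic are assumptions for illustration. Because the blocked cells are passed in at every step, a caller could change them between steps to mimic moving obstacles.

```lisp
;; Cells are (x . y) conses; TABLE must be (make-hash-table :test #'equal).
;; All names here are hypothetical.

(defun manhattan (a b)
  "Manhattan distance between two (x . y) cells."
  (+ (abs (- (car a) (car b))) (abs (- (cdr a) (cdr b)))))

(defun neighbors (cell blocked)
  "Four-connected neighbours of CELL that are not in the BLOCKED list."
  (loop for (dx . dy) in '((1 . 0) (-1 . 0) (0 . 1) (0 . -1))
        for next = (cons (+ (car cell) dx) (+ (cdr cell) dy))
        unless (member next blocked :test #'equal)
          collect next))

(defun h-value (cell goal table)
  "Learned heuristic for CELL, defaulting to the Manhattan distance."
  (multiple-value-bind (value found) (gethash cell table)
    (if found value (manhattan cell goal))))

(defun lrta-step (cell goal blocked table)
  "One LRTA* step with unit move costs: update h(CELL), return the next cell."
  (let ((best nil) (best-f nil))
    (dolist (next (neighbors cell blocked))
      (let ((f (+ 1 (h-value next goal table))))
        (when (or (null best-f) (< f best-f))
          (setf best next best-f f))))
    ;; Learning: raise h(CELL) to the best one-step estimate if that is larger.
    (when best
      (setf (gethash cell table)
            (max (h-value cell goal table) best-f)))
    best))

(defun lrta-run (start goal blocked table &optional (max-steps 100))
  "Repeat LRTA-STEP until GOAL is reached or MAX-STEPS runs out; return the path."
  (loop with cell = start
        repeat max-steps
        collect cell into path
        until (equal cell goal)
        do (setf cell (or (lrta-step cell goal blocked table) cell))
        finally (return path)))
```

Calling `(lrta-run '(0 . 0) '(2 . 0) '((1 . 0)) (make-hash-table :test #'equal))` returns the list of visited cells. The agent may backtrack at first, and the raised heuristic values then steer it around the blocked cell.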

Articles that can help

http://rvsn.csail.mit.edu/location/Minkov_collaborative_recommendation.pdf
http://www.aaai.org/ocs/index.php/SOCS/SOCS12/paper/viewFile/5412/5174
http://www.gamesitb.com/nnpathgraham.pdf
http://www.gamesitb.com/pathgraham.pdf

Other Ideas

Smart Othello - Reversi Game
Traffic Tweet Classifier

Sunday, February 8, 2015

Tic Tac Toe - Visualization, Analysis, and Statistics

This picture shows the "Visualize" method working:

This picture shows the "Analyze" method working:

Combining them, we can see the "demo-va" method working:

Below is an image of the "stats" method displaying the statistics of randomly played games while showing the games.

Running the algorithm with two random players for 30000 games gave the statistics from the first player's point of view: 58.43% wins, 28.74% losses, and 12.82% draws. As most of the class expected, the chance of winning was about 60%. Below you can see a screenshot of the test.

And here is the code.

The most interesting part of the code was the analysis method.



(defmethod analyze ((l list))
  ;; L is the list of moves in play order: X moves first, then O, and so on.
  ;; Returns W if the first player wins, L if the first player loses, D for a draw.
  ;; Local bindings replace the DEFPARAMETER globals, so calls cannot interfere.
  (let ((wonlist '((nw n ne) (w c e) (sw s se) (nw w sw)
                   (n c s) (ne e se) (nw c se) (ne c sw)))
        (result 'd)
        (xlist ())
        (olist ()))
    (loop for n from 0
          for move in l
          while (eq result 'd)            ; stop as soon as the game is decided
          do (if (evenp n)
                 (push move xlist)        ; even positions are X moves
                 (push move olist))       ; odd positions are O moves
             (loop for lane in wonlist
                   do (cond ((subsetp lane xlist) (setf result 'w))
                            ((subsetp lane olist) (setf result 'l)))))
    result))
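The random-play statistics reported above can be approximated with a short, self-contained simulation. This is a sketch under assumptions rather than the code from the post: the names `shuffle`, `winner`, and `random-play-stats` are hypothetical, and a random game is modelled as a random ordering of the nine cells that is read until one side completes a line.

```lisp
;; Self-contained sketch; names are hypothetical and independent of the post's code.
(defparameter *lines*
  '((nw n ne) (w c e) (sw s se) (nw w sw) (n c s) (ne e se) (nw c se) (ne c sw)))

(defun shuffle (items)
  "Return a fresh, uniformly shuffled copy of ITEMS (Fisher-Yates)."
  (let ((v (coerce items 'vector)))
    (loop for i from (1- (length v)) downto 1
          do (rotatef (aref v i) (aref v (random (1+ i)))))
    (coerce v 'list)))

(defun winner (moves)
  "Return W, L, or D from the first player's view for a sequence of moves."
  (loop with xs = () and os = ()
        for move in moves
        for n from 0
        do (if (evenp n) (push move xs) (push move os))
           (cond ((some (lambda (line) (subsetp line xs)) *lines*) (return 'w))
                 ((some (lambda (line) (subsetp line os)) *lines*) (return 'l)))
        finally (return 'd)))

(defun random-play-stats (games)
  "Play GAMES random games; return the fractions of wins, losses, and draws."
  (let ((counts (list (cons 'w 0) (cons 'l 0) (cons 'd 0)))
        (cells '(nw n ne w c e sw s se)))
    (loop repeat games
          do (incf (cdr (assoc (winner (shuffle cells)) counts))))
    (mapcar (lambda (entry) (float (/ (cdr entry) games))) counts)))
```

Calling `(random-play-stats 30000)` returns three fractions in the order win, loss, draw. With that many games they should land close to the percentages reported above, although each run differs slightly.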

Sunday, February 1, 2015

Random Tic Tac Toe

This is a simulation of 5 games of random tic-tac-toe:

Impressively, the first player won every game. Here is the code file.