View this PageEdit this PageUploads to this PageVersions of this Page over TimePrintable Version of this PageHome PageRecent ChangesSearchSign In

Assignment 2

Case Based Reasoning Assignment 2

Instructor: Ashwin Ram (ashwin@cc.gatech.edu)

TA: Saurav Sahay (ssahay@cc.gatech.edu)

Description: The goal of this assignment is to explore several specific algorithms that can be used in each of the four stages of the CBR cycle. The assignment will be done in groups of 2, and each group will select two algorithms and implement them.

Due: Thursday, October 8th, 2009

Details:

  1. Select a dataset: visit http://theinfo.org/get/data or http://mlearn.ics.uci.edu/MLSummary.html and select a dataset appropriate for your algorithms (e.g. if you are implementing a feature selection algorithm, select a dataset with lots of features).

  1. Select two algorithms from the following list:

    Feel free to select two algorithms from the same group, in which case you should compare them and explain pros and cons (but do NOT select IBL2 and IBL3, since all IBLn are similar). The list of algorithms provided here is only a suggestion. You may read other papers, and select any other algorithm that you please. All the class papers can be found at the class website here. Please contact the TA prior to selecting an algorithm that is not in the list above.

  1. Create the basic CBR cycle (you can use jColibri if you want, or you may write your own framework in Java or another programming langage). Basically, define a program that:
    1. Loads the case base.
    2. Allows the user to enter a query or problem.
    3. Has a RETRIEVE module to retrieve cases.
    4. Has a REUSE module to adapt the retrieved case to the new problem.
    5. Relies on the user for REVISE by asking the user whether the solution is correct.
    6. Has a RETAIN module to add the new case {problem, solution} to the case base.

  1. Now, implement the two selected algorithms in their respective CBR stages.

  1. Evaluate the efficiency and accuracy of the resulting system.

Example:

Let’s assume that you select the “soybean” dataset, and that you decide to implement the algorithms “Cover Trees” (for retrieval) and “IBL2” (for retention). The logical steps to implement the system and evaluate it would be:

  • Implement a basic CBR system that uses a simple nearest neighbor algorithm to retrieve with the simplest similarity metric that you can think of. You may use your code from assignment 1 for this.
  • Evaluate the efficiency of the system (number of queries per second) and effectiveness (predictive accuracy).
  • Implement “Cover Trees” and “IBL2”.
  • Evaluate the efficiency of the system after implementing cover trees for retrieval (it should be faster), evaluate the difference on case base size, learning time, retrieval time and predictive accuracy after implementing IBL2 (it should be faster on retrieval, slower on learning, use less memory and a bit less accurate).

Hints:

  • To evaluate predictive accuracy, the simplest form is to use a “leave-one-out” method (where one case is removed from the case base and used as the query). If you do this for every case in the case base and compute the percentage of queries solved correctly, you will have a good estimate of the system’s predictive accuracy.
  • To evaluate memory usage, simply count the number of cases stored in the case base.
  • To evaluate learning or retrieval time, run the system in a loop and count how many times per second the system can retrieve or learn a new case (remove any interaction with the user when you do this!).

Deliverables:

  1. Email assignments to the TA by the deadline.
  2. Attach your packages, contained in a file gtXXXXX.Assignment2.zip. The files themselves must be in a sub-folder gtXXXXX\Assignment2 inside the zip.
  3. Your name and GT-ID, written in the body of the email. If you are working in a group, list everybody’s name and GT-IDs.
  4. A short report explaining the results of applying the implemented algorithms to the selected datasets, and their benefits and disadvantages.