Assignment 2
Case Based
Reasoning Assignment 2
Instructor:
Ashwin Ram (ashwin@cc.gatech.edu)
TA:
Saurav Sahay (ssahay@cc.gatech.edu)
Description:
The goal of this assignment is to explore several specific algorithms
that can be used in each of the four stages of the CBR cycle. The assignment
will be done in groups of 2, and each group will select two algorithms and
implement them.
Due:
Thursday, October 8th, 2009
Details:
- Select a dataset:
visit http://theinfo.org/get/data or http://mlearn.ics.uci.edu/MLSummary.html and select a dataset appropriate for
your algorithms (e.g. if you are implementing a feature selection algorithm,
select a dataset with lots of features).
- Select two algorithms
from the following list:
Feel free to select two algorithms
from the same group, in which case you should compare them and explain
pros and cons (but do NOT select IBL2 and IBL3, since all IBLn
are similar). The list of algorithms provided here is only a suggestion.
You may read other papers, and select any other algorithm that you please.
All the class papers can be found at the class website here. Please
contact the TA prior to selecting an algorithm that is not in the list
above.
- Create the basic
CBR cycle (you can use jColibri if you want, or you may write your own
framework in Java or another programming langage). Basically, define
a program that:
- Loads the case base.
- Allows the user
to enter a query or problem.
- Has a RETRIEVE module
to retrieve cases.
- Has a REUSE module
to adapt the retrieved case to the new problem.
- Relies on the user
for REVISE by asking the user whether the solution is correct.
- Has a RETAIN module
to add the new case {problem, solution} to the case base.
- Now, implement the
two selected algorithms in their respective CBR stages.
- Evaluate the efficiency
and accuracy of the resulting system.
Example:
Let’s assume
that you select the “soybean” dataset, and that you decide to implement
the algorithms “Cover Trees” (for retrieval) and “IBL2” (for
retention). The logical steps to implement the system and evaluate it
would be:
- Implement a basic
CBR system that uses a simple nearest neighbor algorithm to retrieve
with the simplest similarity metric that you can think of. You may use
your code from assignment 1 for this.
- Evaluate the efficiency
of the system (number of queries per second) and effectiveness (predictive
accuracy).
- Implement “Cover
Trees” and “IBL2”.
- Evaluate the efficiency
of the system after implementing cover trees for retrieval (it should
be faster), evaluate the difference on case base size, learning time,
retrieval time and predictive accuracy after implementing IBL2 (it should
be faster on retrieval, slower on learning, use less memory and a bit
less accurate).
Hints:
- To evaluate predictive
accuracy, the simplest form is to use a “leave-one-out” method (where
one case is removed from the case base and used as the query). If you
do this for every case in the case base and compute the percentage of
queries solved correctly, you will have a good estimate of the system’s
predictive accuracy.
- To evaluate memory
usage, simply count the number of cases stored in the case base.
- To evaluate learning
or retrieval time, run the system in a loop and count how many times
per second the system can retrieve or learn a new case (remove any interaction
with the user when you do this!).
Deliverables:
- Email assignments
to the TA by the deadline.
- Attach your packages,
contained in a file gtXXXXX.Assignment2.zip. The files themselves must
be in a sub-folder gtXXXXX\Assignment2 inside the zip.
- Your name and GT-ID,
written in the body of the email. If you are working in a group, list
everybody’s name and GT-IDs.
- A short report explaining
the results of applying the implemented algorithms to the selected datasets,
and their benefits and disadvantages.
|