During this activity, you will begin to add an evaluation component to your kNN implementation. If you did not get your kNN implementation working, see the gi14-solutions in the class Dropbox folder.
The Supervised learning video from Guided Inquiry 13: Machine Learning discussed a number of binary classification evaluation measures.
Write a program that will take the output from the kNN program (features,true label,predicted label) as an input file and then compute and display the following stats:
- what the positive class is
- confusion matrix values:
  - true positives
  - false positives
  - true negatives
  - false negatives
- F1 score
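As a reminder of how the last item relates to the confusion matrix values, here is a minimal sketch of computing F1 directly from the counts, using the identity F1 = 2TP / (2TP + FP + FN). The function name f1_score and the choice to return 0.0 when the denominator is zero are assumptions made for illustration, not something this activity specifies.

```python
def f1_score(tp, fp, fn):
    """Return the F1 score from confusion matrix counts.

    F1 is the harmonic mean of precision and recall, which simplifies to
    2*TP / (2*TP + FP + FN). True negatives do not appear in the formula.
    """
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # Assumed convention: no positives were present or predicted,
        # so report 0.0 rather than dividing by zero.
        return 0.0
    return 2 * tp / denominator
```

Note that true negatives are not an argument here, so you can have many of them without raising the F1 score.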
Getting the output of the kNN program
If you are using a bash-like environment in your terminal (you are on macOS using Terminal, using msys or git bash on Windows, or any *nix OS), then it’s easy to redirect the output of your kNN program to a new file using the file redirect operator > file. For example, if I want to store the output in a file named iris-dev-knn4-predictions.csv, I could do that like this:
python knn.py ../../datasets/iris-train.csv ../../datasets/iris-dev.csv 4 euclidean > iris-dev-knn4-predictions.csv
If you aren’t using a terminal with a bash-like environment, you can look up how to redirect output (e.g., “redirect output powershell”) or modify your kNN program to take an additional command line argument that is the name of the output file, and write the output to that file.
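If you go the second route, one possible sketch looks like the following. It assumes your kNN program takes the four arguments shown in the command above (training file, testing file, k, distance measure) and treats a fifth argument as an optional output filename. The helper names output_path_from_args and write_predictions are hypothetical, and your own program may already collect its output rows differently.

```python
import csv
import sys


def output_path_from_args(argv):
    """Return the optional output filename, or None if it was not given.

    Assumed layout: argv[1..4] are train file, test file, k, and distance,
    so an output filename would be argv[5].
    """
    return argv[5] if len(argv) > 5 else None


def write_predictions(rows, out_path=None):
    """Write rows as comma-separated lines to out_path, or to stdout if None."""
    out = open(out_path, "w", newline="") if out_path else sys.stdout
    try:
        writer = csv.writer(out, lineterminator="\n")
        writer.writerows(rows)
    finally:
        # Only close files we opened ourselves; leave stdout alone.
        if out is not sys.stdout:
            out.close()
```

With this in place, the program would call write_predictions(rows, output_path_from_args(sys.argv)) where it currently prints, so it still prints to the terminal when no filename is given.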
Computing evaluation measures
Aside from accuracy, the evaluation measures we covered are for binary classification, where you have two labels. For right now, assume that the input to the evaluation program is binary (even though, for example, the iris dataset has three labels). Use the true label of the first observation as the positive class and consider any label that is not the positive class to be the negative class.
For example, with the iris data, the true label of the first observation is virginica, so that is the positive class, and setosa and versicolor will jointly be considered the negative class.
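The rule above can be sketched as follows. This is one possible approach, not the required one: it assumes each row ends with the true label followed by the predicted label, that the file has no header row, and that the function names read_rows and confusion_counts are free for you to rename.

```python
import csv


def read_rows(path):
    """Read the kNN output file into a list of rows, skipping blank lines."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f) if row]


def confusion_counts(rows):
    """Return (positive class, TP, FP, TN, FN) for the given rows.

    The positive class is the true label of the first row. Every other
    label is collapsed into a single negative class.
    """
    positive = rows[0][-2]
    tp = fp = tn = fn = 0
    for row in rows:
        actual_pos = row[-2] == positive
        predicted_pos = row[-1] == positive
        if actual_pos and predicted_pos:
            tp += 1
        elif predicted_pos:
            fp += 1  # predicted positive, but the true label is negative
        elif actual_pos:
            fn += 1  # true label is positive, but predicted negative
        else:
            tn += 1  # both labels fall in the negative class
    return positive, tp, fp, tn, fn
```

One consequence of collapsing the labels is worth noticing: if the positive class is virginica, a setosa observation predicted as versicolor counts as a true negative, because both labels belong to the negative class.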
Test your implementation using the labels predicted by kNN with k=4 using iris-train.csv as the training set and iris-dev.csv as the testing set.
Work on this for 65 minutes with your group and submit what you have at the end.