Class Activity 14: kNN Evaluation
- Due Apr 9, 2020 by 2pm
- Points 1
- Submitting a file upload
- Available after Apr 8, 2020 at 2pm
Overview
During this activity, you will begin to add an evaluation component to your kNN implementation. If you did not get your kNN implementation working, see the gi14-solutions in the class Dropbox folder.
The Supervised learning video from Guided Inquiry 13: Machine Learning discussed a number of binary classification evaluation measures.
Write a program that will take the output from the kNN program (features,true label,predicted label) as an input file and then compute and display the following stats:
- what the positive class is
- confusion matrix values:
  - true positives
  - false positives
  - true negatives
  - false negatives
- accuracy
- precision
- recall
- F1 score
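As a rough sketch of how the last four measures follow from the confusion matrix values, the function below uses the standard definitions (accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of precision and recall). The function name `compute_measures`, the choice to report 0.0 when a denominator is zero, and the counts in the example are all assumptions made for illustration, not part of the assignment.

```python
def compute_measures(tp, fp, tn, fn):
    """Return accuracy, precision, recall, and F1 from confusion matrix counts."""
    total = tp + fp + tn + fn
    # Reporting 0.0 for an empty denominator is a convention assumed here.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts, purely for illustration (not results from the iris data).
measures = compute_measures(tp=8, fp=2, tn=18, fn=2)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Returning a dictionary keeps the display code separate from the arithmetic, which may make it easier to check each value by hand.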
Getting the output of the kNN program
If you are using a bash-like environment in your terminal (Terminal on macOS, msys or Git Bash on Windows, or any *nix OS), then it’s easy to redirect the output of your kNN program to a new file using the file redirect operator > file. For example, if I want to store the output in a file named iris-dev-knn4-predictions.csv, I could do that like this:
python knn.py ../../datasets/iris-train.csv ../../datasets/iris-dev.csv 4 euclidean > iris-dev-knn4-predictions.csv
If you aren’t using a terminal with a bash-like environment, you can look up how to redirect output (e.g., “redirect output powershell”) or modify your kNN program to take an additional command line argument that is the name of the output file, and write the output to that file.
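For the second option, one possible shape is sketched below: a helper that writes each prediction row either to a named file or, when no name is given, to the terminal as before. The helper name `write_predictions`, the idea of reading the file name from a fifth command line argument, and the example row are assumptions for illustration; your kNN program may organize its output differently.

```python
import sys


def write_predictions(rows, out_path=None):
    """Write rows as comma-separated lines to out_path, or to stdout if None."""
    out = open(out_path, "w") if out_path else sys.stdout
    try:
        for row in rows:
            print(",".join(str(value) for value in row), file=out)
    finally:
        # Only close files we opened ourselves; leave stdout alone.
        if out is not sys.stdout:
            out.close()


# Hypothetical row: features, then true label, then predicted label.
example_rows = [(6.3, 2.5, 5.0, 1.9, "virginica", "virginica")]
write_predictions(example_rows)  # no path given, so this prints to the terminal

# In knn.py, the optional file name could be read like this (assumed layout):
# out_path = sys.argv[5] if len(sys.argv) > 5 else None
```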
Computing evaluation measures
Aside from accuracy, the evaluation measures we covered are for binary classification, where you have two labels. For right now, assume that the input to the evaluation program is binary (even though, for example, the iris dataset has three labels). Use the true label of the first observation as the positive class and consider any label that is not the positive class to be the negative class.
For example, the true label of the first observation in iris-dev-knn4-predictions.csv is virginica, so that is the positive class and setosa and versicolor will jointly be considered the negative class.
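The positive-class rule above could be sketched roughly as follows. This assumes each row ends with the true label followed by the predicted label (matching the features,true label,predicted label layout) and that the file has no header row; the function name `confusion_counts` and the sample rows are made up for illustration and are not real kNN output.

```python
import csv
import io


def confusion_counts(rows):
    """Count TP, FP, TN, FN, using the first row's true label as positive."""
    rows = [row for row in rows if row]  # skip blank lines
    positive = rows[0][-2]
    tp = fp = tn = fn = 0
    for row in rows:
        true_label, predicted_label = row[-2], row[-1]
        if predicted_label == positive:
            if true_label == positive:
                tp += 1
            else:
                fp += 1
        else:
            # Any label other than the positive class counts as negative.
            if true_label == positive:
                fn += 1
            else:
                tn += 1
    return positive, tp, fp, tn, fn


# Made-up rows standing in for the prediction file.
sample = io.StringIO(
    "6.3,2.5,5.0,1.9,virginica,virginica\n"
    "5.1,3.5,1.4,0.2,setosa,setosa\n"
    "6.0,2.7,5.1,1.6,versicolor,virginica\n"
    "6.1,3.0,4.9,1.8,virginica,versicolor\n"
)
positive, tp, fp, tn, fn = confusion_counts(csv.reader(sample))
print(positive, tp, fp, tn, fn)  # prints: virginica 1 1 1 1
```

To read a real file instead of the in-memory sample, the same function can be given `csv.reader(f)` inside a `with open(path, newline="") as f:` block.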
Testing
Test your implementation using the labels predicted by kNN with k=4 using iris-train.csv as the training set and iris-dev.csv as the testing set.
Submissions
Work on this for 65 minutes with your group and submit what you have at the end.
Rubric
Criteria | Ratings
---|---
The POGIL activity was reasonably attempted (you were on task and followed directions) |
You submitted a copy of your group's worksheet |
You completed the reflection portion of the worksheet |