Class Activity 14: kNN Evaluation

Due Apr 9, 2020 by 2pm
Points 1
Submitting a file upload
Available after Apr 8, 2020 at 2pm

Overview

During this activity, you will begin to add an evaluation component to your kNN implementation. If you did not get your kNN implementation working, see the gi14-solutions in the class Dropbox folder.
The Supervised learning video from Guided Inquiry 13: Machine Learning discussed a number of binary classification evaluation measures.
Write a program that will take the output from the kNN program (features,true label,predicted label) as an input file and then compute and display the following stats:

what the positive class is
confusion matrix values:
- true positives
- false positives
- true negatives
- false negatives
accuracy
precision
recall
F1 score
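
As a reminder of how the last four measures follow from the confusion matrix values, here is a minimal sketch in Python. The function name compute_metrics and the choice to report 0.0 when a denominator is zero are assumptions made for illustration, not requirements of the activity.

```python
def compute_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall, and F1 from confusion matrix counts.

    Assumption: when a denominator is zero, the measure is reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in compute_metrics(tp=8, fp=2, tn=9, fn=1).items():
        print(f"{name}: {value:.3f}")
```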

Getting the output of the kNN program

If you are using a bash-like environment in your terminal (Terminal on macOS, msys or git bash on Windows, or any *nix OS), then it’s easy to redirect the output of your kNN program to a new file using the redirect operator >. For example, if I want to store the output in a file named iris-dev-knn4-predictions.csv, I could do that like this:

python knn.py ../../datasets/iris-train.csv ../../datasets/iris-dev.csv 4 euclidean > iris-dev-knn4-predictions.csv

If you aren’t using a terminal with a bash-like environment, you can look up how to redirect output (e.g., “redirect output powershell”) or modify your kNN program to take an additional command line argument that is the name of the output file, and write the output to that file.
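
If you go the extra-argument route, one possible shape is sketched below. The helper name write_predictions is hypothetical, and the sketch assumes your kNN program already collects its output as a list of rows (features, true label, predicted label); adapt it to however your own program builds its output.

```python
import csv
import sys


def write_predictions(rows, out_path=None):
    """Write rows as CSV to out_path, or to standard output if out_path is None.

    Assumption: each row is a list such as [feature1, ..., true, predicted].
    """
    if out_path is None:
        csv.writer(sys.stdout).writerows(rows)
    else:
        # newline="" lets the csv module control line endings itself.
        with open(out_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)


# Hypothetical wiring inside knn.py, where the output file name would be
# a fifth command line argument after train file, test file, k, and distance:
#     out_path = sys.argv[5] if len(sys.argv) > 5 else None
#     write_predictions(rows, out_path)
```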

Computing evaluation measures

Aside from accuracy, the evaluation measures we covered are for binary classification, where you have two labels. For right now, assume that the input to the evaluation program is binary (even though, for example, the iris dataset has three labels). Use the true label of the first observation as the positive class and consider any label that is not the positive class to be the negative class.
For example, the true label of the first observation in iris-dev-knn4-predictions.csv is virginica, so that is the positive class and setosa and versicolor will jointly be considered the negative class.
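
The positive-class rule above can be sketched as follows. This is only one way to do it: the function name confusion_counts is hypothetical, the sketch assumes the file has no header row and that the last two columns of each row are the true label and the predicted label, and the sample rows are made up for illustration rather than taken from iris-dev-knn4-predictions.csv.

```python
import csv
import io


def confusion_counts(rows):
    """Return (positive, tp, fp, tn, fn) for rows of features,true,predicted.

    The positive class is the true label of the first observation; every
    other label is treated as the negative class.
    """
    rows = [row for row in rows if row]  # skip blank lines
    positive = rows[0][-2]
    tp = fp = tn = fn = 0
    for row in rows:
        true_is_pos = row[-2] == positive
        pred_is_pos = row[-1] == positive
        if true_is_pos and pred_is_pos:
            tp += 1
        elif pred_is_pos:
            fp += 1  # predicted positive, actually negative
        elif true_is_pos:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return positive, tp, fp, tn, fn


# Made-up sample in the assumed format (features,true label,predicted label).
sample = """6.3,3.3,6.0,2.5,virginica,virginica
5.1,3.5,1.4,0.2,setosa,setosa
7.0,3.2,4.7,1.4,versicolor,virginica
5.8,2.7,5.1,1.9,virginica,versicolor
"""

if __name__ == "__main__":
    print(confusion_counts(csv.reader(io.StringIO(sample))))
```

To read from a real file instead, pass csv.reader(open(path, newline="")) in place of the io.StringIO sample.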

Testing

Test your implementation using the labels predicted by kNN with k=4 using iris-train.csv as the training set and iris-dev.csv as the testing set.

Submissions

Work on this for 65 minutes with your group and submit what you have at the end.

Rubric

POGIL Activity (2)

Each criterion is rated Pass or Fail:

- The POGIL activity was reasonably attempted (you were on task and followed directions)
- You submitted a copy of your group's worksheet
- You completed the reflection portion of the worksheet
