Shape of ROC curve
I did a prediction analysis on a dataset and drew the ROC curve.
The ROC curve looks like the one below.
I'm not sure about the shape of the curve. Shouldn't it be a wavy curve? Looking at the curve, can we tell whether there is an issue with it? I got around 71% accuracy, which is OK for me, but I'm worried about the shape of the curve, which is not wavy. For example, it doesn't look like the one below (taken from the internet).
Solution 1:[1]
It looks like you only plotted three points. The idea of a ROC curve is to show how the trade-off between the true positive (TP) rate and the false positive (FP) rate varies as you tweak the decision threshold, so that you can see the performance at every setting. Without information about how you plotted this or what parameters you have, it's hard to say anything more.
A typical example would be to tweak the aggressiveness level: if you have a spam scanner which classifies a message as spam at a particular score, how does changing the score threshold change the TP and FP rates? So effectively the X axis will also reveal the threshold setting (though possibly stretched), and the curve at every point will show how many of the samples in your clean collection will be FPs at that threshold, and how many in your spam collection will be correctly blocked.
("Stretching" means that the threshold setting might not map linearly onto the FP rate. If nothing happens between thresholds 0.950 and 0.975, you don't plot that interval on the x axis at all. The points on the x axis are the threshold values where the TP/FP rate changes; some could be very close to each other in terms of threshold value, and other adjacent points could correspond to a large jump in the threshold value.)
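To make the threshold sweep concrete, here is a minimal sketch in plain Python. The function name `roc_points` and the toy scores and labels are made up for illustration; they are not from the question. It treats every distinct score as a candidate threshold and records the FP rate and TP rate at each one, which is why the points are not evenly spaced in threshold terms.

```python
def roc_points(scores, labels):
    """Return (FP rate, TP rate) pairs, one per distinct score used as a threshold.

    scores: classifier scores, higher means "more likely positive".
    labels: true classes, 1 for positive and 0 for negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # A threshold above every score flags nothing: no TPs and no FPs.
    points = [(0.0, 0.0)]
    # Sweep from the strictest threshold to the most lenient one.
    # Only distinct score values can change the TP or FP counts.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical toy data: 4 positives and 4 negatives.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

With eight distinct scores this gives nine points, running from (0, 0) to (1, 1). A curve with only three points suggests the scores took only two distinct values.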
A good ROC curve has a large area underneath it. An ideal ROC curve jumps straight up to a TP rate of 1.00 at an FP rate of 0 and stays there, but then you don't need the plot to help you decide how to deploy your solution anyway. In reality, they come in all kinds of shapes, from vaguely asymptotic towards the upper left (very good) to a straight diagonal (pretty lousy, no better than random verdicts) and even asymptotic towards the lower right (extremely poor; random verdicts would be better). The interesting points are the "knee", where the TP rate's growth slows down and the FP rate starts growing quicker (that's where you should stop increasing the threshold), and any irregularities, especially any which break monotonicity.
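As a rough illustration of "area underneath", the sketch below computes the area with the trapezoid rule for three hand-written curves. The helper name `area_under_curve` and the three point lists are assumptions chosen for illustration, not data from the question.

```python
def area_under_curve(points):
    """Trapezoid-rule area under a curve given as (FP rate, TP rate) pairs
    sorted by increasing FP rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Three idealized shapes (hypothetical, for comparison only).
ideal = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]       # hugs the upper left corner
diagonal = [(0.0, 0.0), (1.0, 1.0)]                # no better than random
lower_right = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # worse than random

print(area_under_curve(ideal))
print(area_under_curve(diagonal))
print(area_under_curve(lower_right))
```

The ideal shape has area 1, the diagonal has area 0.5, and the curve that hugs the lower right corner has area 0, which matches the ranking from very good to extremely poor described above.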
(In your example from the net, there is a spot around TP 0.6 where increasing the threshold will only increase FPs. Why is that? Is there a skew in the samples, or a problem in the implementation? Could it be fixed?)
Solution 2:[2]
It looks like you have plotted points using the predicted class of a classifier (the `.predict` function in Python's sklearn package) rather than the predicted class probability (the `.predict_proba` function in Python's sklearn package). This means there is only one threshold change, when the class switches from 0 to 1, rather than a range of values that would give you the smooth curve.
Replace the predicted class with the predicted probability of the positive class when you compute the curve, and this should fix your problem.
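A minimal sketch of the difference, assuming a binary problem and scikit-learn. The synthetic dataset from `make_classification` and the choice of `LogisticRegression` are stand-ins for illustration only; the question does not say which data or model was used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: the real dataset is not shown in the question).
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard 0/1 labels: the scores take only two values, so the curve
# collapses to the two corners plus a single point in between.
fpr_hard, tpr_hard, _ = roc_curve(y_test, clf.predict(X_test))

# Probability of the positive class (column 1): many distinct scores,
# hence many thresholds and a much finer curve.
fpr_soft, tpr_soft, _ = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])

print("points from predict:      ", len(fpr_hard))
print("points from predict_proba:", len(fpr_soft))
```

Plotting `fpr_soft` against `tpr_soft` should give the stepped, curved shape from the example image, while the `predict` version gives the two straight segments seen in the question. For models without `predict_proba`, a continuous score such as `decision_function` can be passed to `roc_curve` in the same way.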
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | chsws |