I changed this request a little -- it makes more sense to start from pearson, which has the same test, and go from there.
pearson :: (G.Vector v (Double, Double), G.Vector v Double)
        => v (Double, Double) -> Double
pearson = correlation
        => v (Double, Double) -> (Double, PValue Double)
You should probably document the return value. Its meaning isn't clear, at least to an amateur like me.
Yes, it's absolutely necessary to document the meaning of the p-value and which hypothesis is being tested.
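As a rough illustration of the kind of documentation being asked for, here is a minimal sketch of a Haddock comment on a p-value-returning function. The name `pearsonWithP` is hypothetical, and the choice of a two-sided test against a null hypothesis of zero correlation is an assumption, not necessarily what this PR implements. It relies on `pearson`, `studentT`, `complCumulative`, and `mkPValue` from the statistics package.

```haskell
import qualified Data.Vector.Unboxed as U
import Statistics.Correlation (pearson)
import Statistics.Distribution (complCumulative)
import Statistics.Distribution.StudentT (studentT)
import Statistics.Types (PValue, mkPValue)

-- | Pearson correlation coefficient with a p-value (hypothetical name).
--
-- Returns @(r, p)@ where:
--
-- * @r@ is the sample Pearson correlation coefficient, in @[-1, 1]@.
--
-- * @p@ is the two-sided p-value for the null hypothesis that the
--   true correlation is zero (assumption: two-sided test). It is
--   computed from @t = r * sqrt ((n - 2) / (1 - r^2))@, compared
--   against Student's t distribution with @n - 2@ degrees of freedom.
--
-- The input must contain at least three pairs, otherwise the number
-- of degrees of freedom is not positive.
pearsonWithP :: U.Vector (Double, Double) -> (Double, PValue Double)
pearsonWithP xs = (r, mkPValue p)
  where
    r  = pearson xs
    df = fromIntegral (U.length xs) - 2
    t  = r * sqrt (df / (1 - r * r))
    -- Two-sided tail probability, clamped to guard against rounding.
    p  = min 1 (2 * complCumulative (studentT df) (abs t))
```

The point of the sketch is the comment rather than the body: it states what each component of the tuple means, which hypothesis is tested, and when the result is undefined.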
Shimuuar left a comment
The most important thing is comments. For statistical applications it's crucial to describe exactly what a function does.
  )
  => v (a, b)
  -> Double
  -> (Double, PValue Double)
It's especially important for Spearman correlation. What is the meaning of the p-value here? Is it described anywhere? I'm not sure that Student's t distribution will arise for ranks.
Wikipedia cites the following source for this test:
Press; Teukolsky; Vetterling; Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). p. 640.
The equation is 14.6.2. I do not know whether this approximation is optimal, and I'm sure there are better methods for computing p-values for Spearman's correlation coefficient, but I used Student's t distribution as a simple solution.
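For reference, the approximation is usually written as below. This is the form given on Wikipedia for the cited equation, reproduced here as a sketch; the symbols $r_s$ (the Spearman rank correlation coefficient) and $n$ (the number of paired observations) are my notation.

```latex
t = r_s \sqrt{\frac{n - 2}{1 - r_s^{2}}}
```

Under the null hypothesis of no association between the two variables, $t$ is approximately distributed as Student's t with $n - 2$ degrees of freedom. Because it is an approximation for ranks, it is expected to be less reliable for small $n$.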
Unfortunately this changes the API, but I believe that a p-value should be reported with every correlation. This technique uses Student's t distribution, which is fine, but it would be neat to have exact p-values for small sample sizes as well (for the future).