fix bug for Kruskal Wallis Test #227
P.P.S. Build problems with GHC 9.0 and 9.2 are due to some caching problem. Ignore them.
Shimuuar
left a comment
What is the behavior of existing implementations? scipy throws an exception, R returns NaN for the p-value. So we need to decide what the desired behavior is. The standard formula is not defined in that case, but maybe there's a sensible limit.
Numerical experimentation suggests that there's no sensible limit. If we run the test on two large samples where all elements are identical except for a few, the p-value depends only on the number of elements that differ from the mode and is nearly independent of sample size. I think returning Nothing is the correct choice.
I'd like to avoid breaking changes if possible:
- The kruskalWallisRank function is perfectly well defined for samples where all elements are the same. No need to change anything.
- Same for kruskalWallis. It returns NaN, but that's quite a reasonable way to communicate failure in numeric code.
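To illustrate why NaN shows up here, below is a minimal list-based sketch of the H statistic in its ratio form. It is not the library's code: rankSamples and hStatistic are hypothetical names, and the real functions may differ in types and details. When every element is identical, all mid-ranks equal the average rank, so both sums are zero and the result is 0/0.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- | Hypothetical helper: pool all samples, assign mid-ranks
-- (tied values share the average of their ranks), then split
-- the ranks back into the original samples.
rankSamples :: Ord a => [[a]] -> [[Double]]
rankSamples samples =
    [ [ r | (i, r) <- ranked, i == k ] | k <- [0 .. length samples - 1] ]
  where
    tagged = [ (i, x) | (i, xs) <- zip [0 :: Int ..] samples, x <- xs ]
    groups = groupBy ((==) `on` snd) (sortBy (comparing snd) tagged)
    ranked = go 1 groups
    go _ [] = []
    go start (g : gs) =
      let n   = length g
          -- average of the ranks start .. start + n - 1
          avg = fromIntegral (2 * start + n - 1) / 2
      in  [ (i, avg) | (i, _) <- g ] ++ go (start + n) gs

-- | Hypothetical helper: H statistic computed from ranked samples.
-- Yields NaN (0/0) when no rank deviates from the average rank.
hStatistic :: [[Double]] -> Double
hStatistic rs = (nTot - 1) * numerator / denominator
  where
    nTot        = fromIntegral (sum (map length rs))
    avgRank     = (nTot + 1) / 2
    sq x        = x * x
    mean r      = sum r / fromIntegral (length r)
    numerator   = sum [ fromIntegral (length r) * sq (mean r - avgRank) | r <- rs ]
    denominator = sum [ sq (x - avgRank) | r <- rs, x <- r ]
```

For example, isNaN (hStatistic (rankSamples [[1, 1, 1], [1, 1 :: Int]])) evaluates to True, while samples with some variation give a finite value.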
P.S. Please rebase on top of master. I've fixed CI failures.
fixes #148.
The Kruskal Wallis test requires there to be some underlying variation within the dataset. That means that the test should fail when provided with samples that are all of the same value.
This fix inserts a check for the first rank and the last rank within kruskalWallisRank. If the first rank is the same as the last rank, then there is no underlying variation, and Nothing is returned and propagated forward.
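The idea behind the check can be sketched as follows. This is only an illustration over plain lists, not the actual patch: pooledRanks and ranksWithVariation are hypothetical names, and treating empty input as Nothing is an assumption made here. Because mid-ranks are produced in ascending order, the first and last rank are equal exactly when every pooled value is tied.

```haskell
import Data.List (group, sort)

-- | Hypothetical helper: mid-ranks of the pooled values, in ascending
-- order. Tied values share the average of the ranks they occupy.
pooledRanks :: Ord a => [a] -> [Double]
pooledRanks xs = go 1 (group (sort xs))
  where
    go _ [] = []
    go start (g : gs) =
      let n   = length g
          -- average of the ranks start .. start + n - 1
          avg = fromIntegral (2 * start + n - 1) / 2
      in  replicate n avg ++ go (start + n) gs

-- | Hypothetical helper: return the pooled ranks only if the data
-- contains some variation. Empty input is treated as a failure too
-- (an assumption of this sketch).
ranksWithVariation :: Ord a => [[a]] -> Maybe [Double]
ranksWithVariation samples =
  case pooledRanks (concat samples) of
    []                 -> Nothing
    rs@(firstRank : _)
      | firstRank == last rs -> Nothing  -- all values tied: no variation
      | otherwise            -> Just rs
```

With this sketch, ranksWithVariation [[2, 2], [2 :: Int]] gives Nothing, while ranksWithVariation [[1, 2], [2 :: Int]] gives Just [1.0, 2.5, 2.5]. Returning Maybe lets callers propagate the failure instead of computing a statistic from degenerate ranks.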