fix for NUTCH-2455 more efficient usage of hostdb in generate #254
okedoki wants to merge 9 commits into apache:master
|
Please review only the commit "fix for NUTCH-2455 more efficient usage of hostdb in generate" (c1ce018). The "added id to output files" commit is not correct; I have reverted it. |
lewismc
left a comment
Hi @okedoki, can you please format this entire patch according to the eclipse-codeformatter? Thank you.
|
I found a bug in the partitioner that prevented the hostdb data from reaching the correct reducer. It is fixed. For some reason, I have a conflict with Generator from master. I assume it happened because of autoformatting, so instead of a proper comparison the diff shows the whole code of Generator as replaced. What is the rule for fixing this case? |
|
Mmmm OK @okedoki, we need to resolve this conflict. The issue here is that you have indented everything by 4 spaces, by the looks of it. This is incorrect, as indenting according to the code formatting template is 2-space indents. Please update the pull request again if you could. Thanks |
|
@lewismc |
|
@okedoki thank you very much, this is a big patch and we need to test it out. |
… by reference, fixed with clone
|
There was a silly bug that didn't copy the hostdb datum correctly in the reducer because of copy-by-reference; it is fixed with a clone: hostDomainCounts.put(key.second.toString(), new MutablePair<HostDatum, int[]>((HostDatum) hostDatum.clone(), new int[] {1, 0})); at line 484 |
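For reviewers who want to see the copy-by-reference pitfall in isolation, here is a minimal sketch. It assumes the cause is the usual situation where one mutable instance is refilled for each record in the reducer loop. `CloneOnStoreDemo`, `Datum`, and `collect` are hypothetical names made up for this illustration; they are not Nutch or Hadoop classes, and this is not the code from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CloneOnStoreDemo {

  /** Hypothetical stand-in for a mutable, reusable record such as HostDatum. */
  static class Datum implements Cloneable {
    int score;

    @Override
    public Datum clone() {
      try {
        return (Datum) super.clone();
      } catch (CloneNotSupportedException e) {
        throw new AssertionError(e); // cannot happen: Datum is Cloneable
      }
    }
  }

  /**
   * Mimics a reducer loop in which one shared instance is refilled for every
   * record. With copy == false the map keeps the shared reference, so every
   * entry ends up pointing at the same object.
   */
  static Map<String, Datum> collect(String[] hosts, int[] scores, boolean copy) {
    Map<String, Datum> byHost = new LinkedHashMap<>();
    Datum reused = new Datum(); // single instance, overwritten each iteration
    for (int i = 0; i < hosts.length; i++) {
      reused.score = scores[i];
      byHost.put(hosts[i], copy ? reused.clone() : reused);
    }
    return byHost;
  }

  public static void main(String[] args) {
    String[] hosts = {"a.example", "b.example"};
    int[] scores = {1, 2};
    // Without a copy, the first entry sees the last value written: prints 2
    System.out.println(collect(hosts, scores, false).get("a.example").score);
    // With clone(), each entry keeps its own snapshot: prints 1
    System.out.println(collect(hosts, scores, true).get("a.example").score);
  }
}
```

Storing a clone costs one allocation per stored entry, which seems acceptable here if only one datum per host is kept in the map.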
Three questions/modifications left open: