Wednesday, October 5, 2011

25th Place in Hearst Challenge

In an earlier post, I shared the code for data preparation for the Hearst Challenge. The final results of the Hearst Challenge have now been announced, and I am glad to report that my simple model took 25th place.


A few lines on the model: I used upsampling to handle the class imbalance, since the number of negative samples was far higher than the number of positive ones. I created 10 subsets of the original data, each containing upsampled positive samples and randomly sampled negative ones. The final model was an ensemble with one SVM classifier trained on each subset. I used SVM perf for the SVM training. This shows that a simple model can do reasonably well, even though it is not the best.
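
The subset construction described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the challenge: the function names, the 1:1 positive-to-negative ratio, the default subset size, and the averaging of per-model scores are all assumptions made for the example. The classifier is left as a plain callable, since the actual training was done with the external SVM perf tool.

```python
import random


def make_balanced_subsets(positives, negatives, n_subsets=10,
                          per_class=None, seed=0):
    """Build n_subsets training sets from an imbalanced dataset.

    Each subset holds upsampled positives and randomly sampled negatives.
    Assumption: both classes get the same count (per_class) in each subset.
    """
    rng = random.Random(seed)
    if per_class is None:
        # Hypothetical default: twice the number of positives,
        # capped by the number of negatives available.
        per_class = min(len(negatives), 2 * len(positives))
    subsets = []
    for _ in range(n_subsets):
        # Upsample: draw positives with replacement, so examples may repeat.
        pos = [rng.choice(positives) for _ in range(per_class)]
        # Draw negatives without replacement within a single subset.
        neg = rng.sample(negatives, per_class)
        data = [(x, 1) for x in pos] + [(x, -1) for x in neg]
        rng.shuffle(data)
        subsets.append(data)
    return subsets


def ensemble_score(models, x):
    """Combine per-subset models by averaging their decision values.

    Assumption: each model is a callable returning a real-valued score.
    """
    return sum(model(x) for model in models) / len(models)
```

Each subset would then be written out and used to train one SVM, and the trained classifiers would be passed to `ensemble_score` as callables to score new examples.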