Challenge Summary *under construction*
Next step: JMLR Special issue on Large Scale Learning
- Submission: 15 October 2008
- Decision: 15 December 2008
- Final version due: 15 February 2009
Results of the Challenge were announced at the LSL Workshop, ICML 2008.
| Track | Method | Winner(s) |
|---|---|---|
| Wild track | SGD-QN | Antoine Bordes |
| Wild track | Newton SVM | Olivier Chapelle |
| Wild track | SDM SVM L1/L2 | Olivier Chapelle, Sathiya Keerthi |
| SVM track | Liblinear | Hsiang-Fu Yu |
| Best student awards | Interior Point SVM | Kristian Woodsend |
| Best student awards | Triple Jump SVM | Han-Shen Huang and Chun-Nan Hsu |
The score aggregates predictive accuracy and computational effort over various dataset sizes. See below for a discussion.
What, who, how...
Interestingly, most participants used linear algorithms, variants of L1/L2-SVM solvers:
* SGD-QN (Antoine Bordes)
* Newton SVM (Olivier Chapelle)
* Dual coordinate descent (Hsiang-Fu Yu)
* Triple Jump SVM (Han-Shen Huang and Chun-Nan Hsu)
* Interior Point SVM (Kristian Woodsend)
Among the non-linear methods were:
* Parallel decision tree (Yossi Richter)
* Averaging of selective naive Bayes (Marc Boulle)
* Averaging of RBF-kernel SVM (Jochen Garcke)
While non-linear methods usually achieve better predictive accuracy than linear ones, they are often significantly slower; all the more so as 100k examples were often enough for the linear algorithms to converge, given the low data dimensionality.
The programming languages used were C/C++, Java and Matlab. The Wild track and the Alpha synthetic dataset were the most attractive, with 263 entries and 72 submissions respectively.
The most active participants so far are Kristian Woodsend (Interior Point SVM: linear, parallel, wild track), Antoine Bordes (SGD-QN, LaRank), and Olivier Chapelle and Sathiya Keerthi (SDM-L1/L2, Newton).
The Parallel track unfortunately did not attract much interest (see below).
Workshop and Discussion
The PASCAL workshop on Large Scale Learning took place after ICML 2008, in Helsinki. All 9 talks and the discussion have been recorded and will be available as PASCAL lectures. The discussion focused on two main points: aggregating the scores, and what comes next.
Scores, criteria, cheating...
The overall results were summarized by three figures: time vs. aoPRC, dataset size vs. time, and dataset size vs. error. These reflect the main questions of the end-user: how long does it take to reach a given performance; how does the computational effort increase with the dataset size; and how much data is needed to reach a given accuracy. The second figure (dataset size vs. time) is used to compute the empirical computational complexity, which raises a question: smart stopping criteria could be used to reach a sublinear complexity... is this fair?
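For concreteness, here is a minimal sketch of how such an empirical complexity could be derived, assuming a power-law model time ≈ c · n^alpha fitted to the measured training times; the sizes and timings below are made up and the snippet is not part of the challenge toolkit.

```python
# Minimal sketch: estimate an empirical complexity exponent from the
# "dataset size vs. time" measurements by fitting a power law
# time ~ c * n**alpha, i.e. a straight line in log-log space.
import numpy as np

sizes = np.array([1e4, 1e5, 1e6, 1e7])      # training-set sizes (hypothetical)
times = np.array([0.8, 7.5, 70.0, 650.0])   # training times in seconds (hypothetical)

# The slope of the log-log fit is the empirical complexity exponent alpha.
alpha, log_c = np.polyfit(np.log(sizes), np.log(times), deg=1)
print(f"empirical complexity: time ~ n^{alpha:.2f}")  # alpha near 1 means roughly linear scaling
```

A smart stopping criterion that halts long before convergence on the largest sets would drive this exponent below 1, which is exactly the fairness question raised above.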
The current aggregation of the ranks is detrimental to methods which are accurate but slow; what is worse, the time needed to adjust the algorithm parameters is not accounted for, which is especially unfair to parameterless methods (e.g., Bayesian ones)... How should the cost of model selection be accounted for?
Worst of all, by submitting a fast but dumb method, you could scramble the ranks... and destroy the ordering. Clearly, algorithm assessment is a multi-objective problem, and there is no way to come up with a single fair total order. The best approach is to look at the plots and find your optimal trade-off, given the amount of data and the computational budget you have.
Next Challenge
Everybody seemed to be interested in having a second round, even if - as could have been expected - nobody was willing to take it on. Shall we? We don't know yet... On the one hand, the infrastructure is up and running; on the other hand, it is difficult to obtain LARGE public datasets and to adjust the criteria accordingly... Some points raised during the discussion:
- LSL *must* go parallel; reserve a few nodes of the PASCAL cluster for the challenge and consider fixed time budgets of e.g. 10, 100, 1000 and 10000 seconds;
- People might submit code to be run on the PASCAL cluster (no mistakes about the computational effort; no need to distribute huge datasets...);
- The cost of model selection must definitely be accounted for;
- The aggregation would proceed by normalizing each of the six current scores and averaging the normalized scores (preventing the scrambling-based cheating); a small sketch follows below.
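To make the proposed aggregation concrete, here is a minimal sketch contrasting the current rank averaging with normalized-score averaging; the three methods and their scores are hypothetical, lower is better everywhere, and min-max normalization is just one possible choice.

```python
# Minimal sketch: rank averaging vs. normalized-score averaging on
# hypothetical results (lower is better for every criterion).
import numpy as np

methods = ["accurate_slow", "balanced", "fast_rough"]
# One row per method, columns: [test error, training time in seconds].
scores = np.array([[0.02, 500.0],
                   [0.10,  60.0],
                   [0.30,   5.0]])

# Current scheme: rank the methods on each criterion, then average the ranks.
ranks = scores.argsort(axis=0).argsort(axis=0) + 1
avg_rank = ranks.mean(axis=1)

# Proposed scheme: min-max normalize each criterion to [0, 1], then average.
normed = (scores - scores.min(axis=0)) / (scores.max(axis=0) - scores.min(axis=0))
avg_score = normed.mean(axis=1)

for name, r, s in zip(methods, avg_rank, avg_score):
    print(f"{name:14s}  avg rank = {r:.2f}   avg normalized score = {s:.3f}")
# Rank averaging gives a three-way tie here; normalized scores keep the
# magnitude of the differences, so the well-balanced method comes out ahead.
```

The normalized scores retain how much better one method is than another on each criterion, which is what the proposal above relies on to blunt the scrambling-based cheating.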
Help welcome
Do you want to help organize a LS-very-L Challenge in 2009-2010? Please contact us.
LSL, the story...
After an initial proposal was presented at the NIPS*07 large scale learning workshop and received little encouragement, a more polished (and less ambitious...) version was presented at the PASCAL meeting in Bled, January 2008; the challenge started at the end of February. Then we had an endlessly long time without any submissions at all, and after that a number of submissions that performed worse than random... Ha! Then Olivier Chapelle managed to beat Vojtech's baseline SVM.
Over the following period, a race went on between Olivier and Kristian Woodsend with his incredibly fast Interior Point SVM. In the end, Olivier won the race with his Newton SVM, which remained at the top of the ranking until the end of the challenge.
A long time passed before other submissions appeared. Marc Boulle's Bayesian submission ranked high in accuracy, but received a low overall rank due to the computational effort...
At that point, a critical post from John Langford drew some more attention to the challenge (thanks John), and we accordingly introduced the parallel track, which in the end interested very few people (!?).
Finally, the number of submissions totalled 44, with 49 participants registered.
The table below displays the best performing methods on the challenge datasets.
| Dataset | Optimal classifier | aoPRC (optimal) | aoPRC (best submission) | Best submitted method |
|---|---|---|---|---|
| alpha | quadratic | 0.009774 | 0.0854 | AV SVM - garcke |
| beta | quadratic | 0.304563 | 0.4577 | AV SVM single - garcke |
| gamma | semi-quad | 0.009418 | 0.0116 | Avg NB Classif. - MB |
| delta | semi-quad | 0.071005 | 0.0801 | Avg NB Classif. - MB |
| epsilon | linear | 0.033645 | 0.0341 | IPM SVM 2 - kristian |
| zeta | linear | 0.011361 | 0.0116 | SDM SVM L1 - yahoo |
| dna | WD-SVM | 0.4558 | 0.8030 | SDM SVM L2 - yahoo |
| ocr | n.n. | n.n. | 0.1579 | SDM SVM L2 - yahoo |
| fd | n.n. | n.n. | 0.2314 | Avg. NB Classif. CR - MB |
| webspam | n.n. | n.n. | 0.0004 | SDM SVM L2 - yahoo |