Tao

Debriefing ICML 2015

Seminar, October 14th, 14h00-16h00


Talks


1. From Word Embeddings To Document Distances, Matt Kusner, Yu Sun, Nicholas Kolkin, Kilian Weinberger
presented by Gregory Grefenstette

abstract:
We present the Word Mover’s Distance (WMD), a novel distance function between text documents. Our work is based on recent results in word embeddings that learn semantically meaningful representations for words from local co-occurrences in sentences. The WMD distance measures the dissimilarity between two text documents as the minimum amount of distance that the embedded words of one document need to “travel” to reach the embedded words of another document. We show that this distance metric can be cast as an instance of the Earth Mover’s Distance, a well studied transportation problem for which several highly efficient solvers have been developed. Our metric has no hyperparameters and is straight-forward to implement. Further, we demonstrate on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the WMD metric leads to unprecedented low k-nearest neighbor document classification error rates.

slides: ReviewGref.pptx
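
The abstract above casts the WMD as an instance of the Earth Mover's Distance. Here is a minimal, unoptimized sketch of that transportation problem, assuming a dict `embeddings` mapping words to numpy vectors (e.g. word2vec); scipy's generic LP solver stands in for the specialized EMD solvers the paper refers to:

    # Word Mover's Distance as a transportation linear program.
    # Assumption: `embeddings` maps each word to a numpy vector (not paper code).
    import numpy as np
    from scipy.optimize import linprog

    def wmd(doc_a, doc_b, embeddings):
        """WMD between two tokenized documents (lists of word strings)."""
        words_a = sorted(set(doc_a) & embeddings.keys())
        words_b = sorted(set(doc_b) & embeddings.keys())
        # Normalized bag-of-words (nBOW) weights of each document.
        d_a = np.array([doc_a.count(w) for w in words_a], dtype=float)
        d_b = np.array([doc_b.count(w) for w in words_b], dtype=float)
        d_a, d_b = d_a / d_a.sum(), d_b / d_b.sum()
        # Ground cost: Euclidean distance between embedded words.
        C = np.array([[np.linalg.norm(embeddings[wa] - embeddings[wb])
                       for wb in words_b] for wa in words_a])
        n, m = C.shape
        # Flow matrix T flattened row-major; minimize <T, C> subject to
        # row sums = d_a (mass leaving A), column sums = d_b (mass reaching B).
        A_eq = np.zeros((n + m, n * m))
        for i in range(n):
            A_eq[i, i * m:(i + 1) * m] = 1.0
        for j in range(m):
            A_eq[n + j, j::m] = 1.0
        res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([d_a, d_b]),
                      bounds=(0, None))
        return res.fun

This generic LP is slow; its only purpose here is to make the structure of the problem explicit.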


2. Weight Uncertainty in Neural Network, Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra
presented by Gaetan Marceau-Caron

abstract:
We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.
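
A minimal sketch of the Bayes by Backprop idea for a single linear layer, assuming PyTorch (names, initializations and hyperparameters are illustrative, not from the paper): a diagonal Gaussian posterior over the weights is reparameterized so ordinary backpropagation trains its mean and scale, and the KL term plays the role of the compression cost in the variational free energy.

    import torch
    import torch.nn.functional as F

    class BayesianLinear(torch.nn.Module):
        def __init__(self, n_in, n_out):
            super().__init__()
            self.mu = torch.nn.Parameter(torch.randn(n_out, n_in) * 0.1)
            self.rho = torch.nn.Parameter(torch.full((n_out, n_in), -3.0))

        def forward(self, x):
            sigma = F.softplus(self.rho)                    # sigma > 0
            w = self.mu + sigma * torch.randn_like(sigma)   # reparameterized sample
            # Closed-form KL(q || p) for q = N(mu, sigma^2), p = N(0, 1).
            self.kl = (0.5 * (sigma**2 + self.mu**2 - 1) - torch.log(sigma)).sum()
            return x @ w.t()

    layer = BayesianLinear(10, 1)
    opt = torch.optim.Adam(layer.parameters(), lr=1e-2)
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    opt.zero_grad()
    # Free energy = expected negative log-likelihood + KL; MSE stands in for
    # the NLL (Gaussian likelihood up to a constant), KL scaled per example.
    loss = F.mse_loss(layer(x), y) + layer.kl / 32
    loss.backward()
    opt.step()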



3. The Kendall and Mallows Kernels for Permutations, Yunlong Jiao, Jean-Philippe Vert
presented by Guillaume Charpiat


abstract:
We show that the widely used Kendall tau correlation coefficient is a positive definite kernel for permutations. It offers a computationally attractive alternative to more complex kernels on the symmetric group to learn from rankings, or to learn to rank. We show how to extend it to partial rankings or rankings with uncertainty, and demonstrate promising results on high-dimensional classification problems in biomedical applications.
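
Since the kernel is the Kendall tau correlation itself, it is easy to compute directly. A brute-force O(n^2) sketch (scipy.stats.kendalltau computes the same statistic more efficiently; rankings here are assumed given as score arrays, position i holding item i's score):

    import numpy as np
    from itertools import combinations

    def kendall_kernel(x, y):
        """Kendall tau correlation: (concordant - discordant) / n_pairs."""
        concordant = discordant = 0
        for i, j in combinations(range(len(x)), 2):
            s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            concordant += s > 0
            discordant += s < 0
        n_pairs = len(x) * (len(x) - 1) / 2
        return (concordant - discordant) / n_pairs

    # Identical rankings give K = 1, fully reversed rankings give K = -1.
    print(kendall_kernel(np.array([1, 2, 3, 4]), np.array([4, 3, 2, 1])))  # -1.0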


Discussion and feedback


1. From Word Embeddings To Document Distances

The Earth Mover's Distance reminds me of two things for computing the similarity between two documents / bags of words:

  • Vitanyi's compression-based approach (a gzip sketch follows this list):
dissimilarity(document A, document B) = size(gzip(A ∪ B)) / (size(gzip(A)) + size(gzip(B)))

http://arxiv.org/pdf/0809.2553.pdf - this enables things such as detecting the phylogeny of languages, and other interesting results, but not everything; I do not know where the limit of what it can do lies, but I think it is typically reached when a transformation of the languages of documents A and B is needed to bring them into a common pivot representation.

  • domain adaptation as presented by Francois Laviolette (a gradient-reversal sketch also follows this list):
http://arxiv.org/pdf/1505.07818v2.pdf - where you explicitly try to find an encoding of A and of B (here, bags of vectors) that brings the distribution of A and the distribution of B as close together as possible.
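
A minimal sketch of the compression-based dissimilarity above, assuming the "union" of two documents is approximated by concatenating their raw bytes:

    # Compression-based dissimilarity (Vitanyi-style), gzip as the compressor.
    # Assumption: "A ∪ B" is approximated by byte concatenation.
    import gzip

    def gzip_size(data: bytes) -> int:
        return len(gzip.compress(data))

    def dissimilarity(doc_a: str, doc_b: str) -> float:
        a, b = doc_a.encode("utf-8"), doc_b.encode("utf-8")
        return gzip_size(a + b) / (gzip_size(a) + gzip_size(b))

    # Near-identical documents compress well together (ratio toward 0.5);
    # unrelated documents keep the ratio close to 1.
    print(dissimilarity("the cat sat on the mat", "the cat sat on the mat"))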
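And a hedged sketch of the gradient reversal trick at the heart of the domain-adversarial approach in the linked paper, assuming PyTorch (layer sizes are illustrative): the encoder is trained so that source (A) and target (B) features become indistinguishable to a domain classifier.

    import torch

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # Identity on the forward pass, sign-flipped gradient going back:
            # the domain classifier minimizes its loss, the encoder maximizes it.
            return -grad_output

    encoder = torch.nn.Linear(300, 64)      # maps bags of vectors into R**d
    domain_clf = torch.nn.Linear(64, 1)     # predicts: does x come from A or B?
    feats = encoder(torch.randn(8, 300))
    domain_logits = domain_clf(GradReverse.apply(feats))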

The common point is:
we have bags A and B, and we look for the transformation (either from A to B, for the Earth Mover's Distance; or from A and B to the real line, for Vitanyi; or from A and B to R**d, for Laviolette) that brings A and B closer together.

Now the question is: which constraint/property to impose on the transformation, and which algorithm to use to optimize it.
I think a paper studying this from the algorithmic point of view, and on applications, would be great.
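
In plain-text notation (my own hedged formalization, not taken from any of the papers), the shared template is:

    d(A, B) = min over T in some admissible class 𝒯 of Δ(T(A), T(B))

where for the EMD, T is a transport plan from A's words to B's words and Δ the transport cost; for Vitanyi, T is compression down to a code length; for Laviolette, T is a learned encoder into R**d and Δ a divergence between the encoded distributions. The open question is then exactly the one above: which class 𝒯, and which optimization algorithm.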



