
History: Marco

Preview of this version: 7

Version 17:04.

How can one identify topic-specific influencers in online media and social networks?
To tackle this important problem, a vast number of methods have been proposed, from measuring the size of cascades in online social platforms (e.g.\ counting Twitter retweets) to filtering the network depending on the considered topic (e.g.\ with Personalized PageRank).
Nonetheless, all of them make it hard, if not impossible, to exploit features from different domains simultaneously (e.g.\ for a journalist, both her articles and the number of her regular readers), to evaluate against an expert's ground truth (e.g.\ a leading authority's designation of a set of influential blog authors), or both.
This work explores the possibility of removing the two limitations at once: we combine features obtained from different domains (online mainstream media and social networks) with ground-truth sets of influencers obtained from real-world experts. This allows us to restate the problem as a supervised Machine Learning task that we deploy, after careful feature engineering, on a dataset of more than 30,000 online media authors. Experimental results show that a few features are enough to recognize ground-truth influencers, but also suggest that, crucially, such features should come from different domains.

Version 2h45.

Social network- and content-based search for influencer identification

This paper tackles the problem of topic-specific influencer identification in online media and social networks.
Several heuristics have been proposed, focussing on the impact of a person on the network flow (e.g. triggering cascades) or filtering the network depending on the considered topic (e.g. Personalized PageRank).

We combine supervised learning algorithms and feature engineering, exploiting content-related information (online media and blogs), social network features (e.g. extracted from the Twitter graph), and topic-specific ground truth from experts/decision makers.

Experiments on Twitter and blog data show that quite different content-related and social network features are used to characterize influencers depending on the target topic.


However, a hundred is likely too much; I'd say that the following twenty are the most deserving (I'll write the prefix of their filenames):



Identifying Influencers with Ground Truth
Trendsetters? (no = early adopters)
Multi-domain features and ground truth for finding opinion leaders

Social network feature and content-based search for influencer identification: how experts can help.



This paper tackles the problem of topic-specific influencer identification in online media and social networks.
Several heuristics have been proposed, focussing on the impact of a person on the network flow (e.g. triggering cascades) or filtering the network depending on the considered topic (e.g. Personalized PageRank).

We combine supervised learning algorithms and feature engineering, exploiting content-related information (online media and blogs), social network features (e.g. extracted from the Twitter graph), and topic-specific ground truth from experts/decision makers.

Experiments on Twitter and blog data show that quite different content-related and social network features are used to characterize influencers depending on the target topic.


1) Despite the name, and even though this was its original purpose on the Web, Personalized PageRank is not necessarily used to obtain a topic-biased score; it depends on what the underlying graph and its nodes' weights represent. I would add an "e.g. with" right before "Personalized PageRank" (with an uppercase "P"), and I would include that paragraph in the abstract.

2) In the third paragraph ("This paper investigates ...") I somewhat lost the idea that the goal is finding *topic-specific* influencers. It may be worth restating it before the end, which may also make clearer what the ground truth is (e.g. with "a few influencers identified beforehand by experts on given topics").

3) I fear it is not clear what is meant by "static" and "network", and by "content- and network-related". I prefer the latter pair, but I'd suggest explaining what they mean, e.g. in this way:
"... content-related features (e.g. pure textual relevance of online publications) and network-related features (e.g. number of Twitter followers)"
This also conveys the idea that we have *both* types of features simultaneously for our instances, which I think is worth stressing (see point 5 below).

4) The last paragraph is quite speculative; even if at this stage the purpose is to direct the paper to the right reviewers, I would keep things less specific until we get more results.

5) More generally, I know that Michele and I have different points of view on the strengths of the paper: while Michele especially underlines the fact that we treat the problem as a supervised ML problem, as only a few people have done before, I tend to focus on the availability of ground truth and of cross-referenced social/media data (which nobody has, given the enormous cost of e.g. mapping the New York Times' authors to their Twitter profiles). I think both of these are good points, so if we can bring out both of them, it's a plus. However, this is more of an observation for the final version.
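
To make point 1 concrete, here is a minimal sketch of Personalized PageRank by power iteration. It is only an illustration, not the computation used in the paper: the toy graph, the node names and the function name `personalized_pagerank` are all hypothetical. It shows that the same graph yields different scores depending on the restart weights, so whether the score is "topic-biased" depends entirely on what those weights encode.

```python
def personalized_pagerank(graph, restart, alpha=0.85, iters=100):
    """Power iteration for Personalized PageRank.

    graph:   dict mapping each node to a list of its successors
             (assumption: every successor is also a key of the dict).
    restart: dict mapping nodes to non-negative restart weights;
             it is normalized here and acts as the personalization vector.
    """
    nodes = list(graph)
    total = sum(restart.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("restart weights must have a positive sum")
    r = {n: restart.get(n, 0.0) / total for n in nodes}

    score = dict(r)
    for _ in range(iters):
        # With probability 1 - alpha the walker jumps back to the restart set.
        nxt = {n: (1 - alpha) * r[n] for n in nodes}
        for n in nodes:
            successors = graph[n]
            if successors:
                share = alpha * score[n] / len(successors)
                for m in successors:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the restart distribution.
                for m in nodes:
                    nxt[m] += alpha * score[n] * r[m]
        score = nxt
    return score


# Hypothetical toy graph: "d" links to "a" but nobody links to "d".
toy_graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}

# Uniform restart weights (no bias) versus weights concentrated on "d".
uniform = personalized_pagerank(toy_graph, {n: 1.0 for n in toy_graph})
topical = personalized_pagerank(toy_graph, {"d": 1.0})
```

In this toy case node "d" receives score only through the restart jump, so it ranks higher when the restart weights single it out; with a different graph or different weights the same procedure would measure something else.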


Beyond the mere identification of influencers lies the characterization of influencers related to a particular topic: a leader in the sciences might be unknown in humanities circles, and vice versa.

Several heuristics have been proposed, focussing on the impact of a person on the network flow (e.g. triggering cascades) or filtering the network depending on the considered topic (e.g. Personalized PageRank).

This paper investigates a supervised learning approach to influencer identification: starting from some partial ground truth (a few influencers identified beforehand by the expert user), the goal is to retrieve other influencers.
A principled methodology, based on content- and network-related feature engineering together with statistical learning, is used to build an influencer score, shaped from the seeding ground truth.

Experiments on Twitter + blog data show that quite different static and network features are required to characterize influencers depending on the target topic.
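
The supervised formulation in the draft above can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the paper's pipeline: the two features (a content-relevance value and a network-reach value, both scaled to [0, 1]), the toy labels and the choice of a plain logistic regression are all made up for the example. It shows how a few expert-labelled seeds can shape a score that is then applied to unlabelled authors.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression by full-batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def influencer_score(x, w, b):
    """Score in (0, 1); higher means closer to the expert-labelled seeds."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data (invented): [content_relevance, network_reach] per author.
X_seed = [[0.9, 0.8], [0.8, 0.9], [0.9, 0.1], [0.1, 0.9], [0.2, 0.2], [0.1, 0.3]]
# 1 = designated as an influencer by the (hypothetical) expert, 0 = not.
y_seed = [1, 1, 0, 0, 0, 0]

w, b = train_logistic(X_seed, y_seed)

both_high = influencer_score([0.85, 0.85], w, b)
content_only = influencer_score([0.9, 0.1], w, b)
network_only = influencer_score([0.1, 0.9], w, b)
```

In this toy setting the seeds are strong on both kinds of features, so an unlabelled author who is strong on both scores higher than one who is strong on only one of them. Any other classifier or feature set could be substituted without changing the overall scheme.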


Version information
Mon. 27 Oct, 2014 15h44 sebag from 15
Wed. 22 Oct, 2014 18h34 sebag from 14
Tue. 10 Jun, 2014 15h38 sebag from 13
Sun. 23 Feb, 2014 00h22 sebag from 12
Thu. 13 Feb, 2014 18h40 Marco.Bressan from 11
Thu. 13 Feb, 2014 18h07 Marco.Bressan from 10
Thu. 13 Feb, 2014 18h01 Marco.Bressan from 9
Thu. 13 Feb, 2014 17h56 Marco.Bressan from 8
Thu. 13 Feb, 2014 17h05 Marco.Bressan from 7 (I added a new version of the abstract, completely different from the others, to see what we like.)
Thu. 13 Feb, 2014 14h48 sebag from 6