Todo

CMA-ES
10-fold CV (cross-validation)
Another topic?
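
For the cross-validation item above, a minimal sketch of how 10-fold splits could be generated with the standard library only. The function names (`k_fold_indices`, `cross_validation_splits`) and the seed are assumptions made for illustration, not something taken from our code base.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each instance ends up in exactly one test fold. Since the ground-truth influencers are few, a stratified variant (keeping the share of positives similar across folds) may be preferable; this sketch does not do that.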

Version (A) 18:00

Social network-features and content-based search for influencer identification

This paper tackles the issue of topic-specific influencer identification in online media and social networks.
Several heuristics have been proposed, from focusing on the impact of a person on the network flow (e.g. triggering cascades) to filtering the network depending on the considered topic (e.g. Personalized PageRank).
We combine supervised learning algorithms and feature engineering, by exploiting content-related information (online media and blogs), social network features (e.g. extracted from the Twitter graph), and topic-specific ground truth from experts/decision makers.
Experiments on Twitter and mainstream media data show that quite different content-related and social network features are used to characterize influencers depending on the target topic.

Version (B) 17:04.

Social network features and content-based search for influencer identification

How can one identify topic-specific influencers in online media and social networks?
To tackle this problem, a vast number of different methods have been proposed — from measuring the size of cascades in online social platforms (e.g.\ counting Twitter retweets) to filtering the network depending on the considered topic (e.g.\ with Personalized PageRank).
Nonetheless, for all of these it is hard if not impossible to exploit features from different domains simultaneously (e.g.\ from textual analysis and graph analysis), to evaluate against some expert's ground truth (e.g.\ a leading authority's designation of a set of influencers), or both.
This work explores the possibility of removing the two limitations at once: we combine the use of features obtained from different domains (online mainstream media and social networks) with ground-truth sets of influencers obtained from real-world experts.
This allows us to restate the problem as a supervised machine learning task that we deploy, after careful feature engineering, on a dataset of more than 20,000 online media authors tracked for at least 6 months.
Experimental results show that a few features (especially textual relevance and social network popularity) are enough to recognize ground-truth influencers, but also suggest that, crucially, such features should come from different domains.

Version 2h45.

Social network- and content-based search for influencer identification

This paper tackles the issue of topic-specific influencer identification, aimed at online media and social networks.
Several heuristics have been proposed, focussing on the impact of a person on the network flow (e.g. triggering cascades) or filtering the network depending on the considered topic (e.g. Personalized PageRank).

We combine supervised learning algorithms and feature engineering, by exploiting content-related information (online media and blogs), social network features (e.g. extracted from the Twitter graph), and topic-specific ground truth from experts/decision makers.

Experiments on Twitter and blog data show that quite different content-related and social network features are used to characterize influencers depending on the target topic.

Resources

https://gforge.inria.fr/scm/viewvc.php/biblio/?root=modyrum

However, a hundred is likely too many; I'd say that the following twenty are the most deserving (I'll write the prefix of their filenames):

Baeza2012
Barbieri2013
Benevenuto10
Bodendorf09
Bonchi08
Bonchi11
Bonchi2011
Bonchi2012
Budalakoti12
Chen09
Chi2007
Getoor2011
Kleinberg05
Liu10
Pal11
Romero11
Son11
VerSteeg13
Watts11
Ye12

Title

Identifying Influencers with Ground Truth
Trendsetters? (no: that means early adopters)
Multi-domain features and ground truth for finding opinion leaders

Social network feature and content-based search for influencer identification: how experts can help.

Abstract

V1

This paper tackles the issue of topic-specific influencer identification, aimed at online media and social networks.
Several heuristics have been proposed, focussing on the impact of a person on the network flow (e.g. triggering cascades) or filtering the network depending on the considered topic (e.g. Personalized PageRank).

We combine supervised learning algorithms and feature engineering, by exploiting content-related information (online media and blogs), social network features (e.g. extracted from the Twitter graph), and topic-specific ground truth from experts/decision makers.

Experiments on Twitter and blog data show that quite different content-related and social network features are used to characterize influencers depending on the target topic.

Comments

-------------
1) Despite the name, and even if this was its original purpose on the Web, Personalized PageRank is not necessarily used to obtain a topic-biased score; it depends on what the underlying graph, and its nodes' weights, represent. I would add an "e.g. with" right before "Personalized PageRank" (with an uppercase "P"), and I would include that paragraph in the abstract.

2) In the third paragraph ("This paper investigates ...") I somewhat lost the idea that the goal is finding *topic-specific* influencers. It may be worth restating it before the end, which may also make clearer what the ground truth is (e.g. with "a few influencers identified beforehand by experts on given topics").

3) I fear it is not clear what is meant by "static" and "network", and by "content- and network-related". I like the latter pair, but I'd suggest explaining what they mean, e.g. in this way:
"... content-related features (e.g. pure textual relevance of online publications) and network-related features (e.g. number of Twitter followers)"
This also conveys the idea that we have *both* types of features simultaneously for our instances, which I think is worth highlighting (see point 5 below).

4) The last paragraph is quite speculative; even if at this stage the purpose is to address the paper to the right reviewers, I would keep things less specific until we get more results.

5) More generally, I know that Michele and I have different points of view on the strengths of the paper: while Michele especially underlines the fact that we treat the problem as a supervised ML problem, as only a few people did before, I tend to focus on the availability of ground truth and of crossed social/media data (which nobody has, given the enormous cost of e.g. mapping the New York Times' authors to their Twitter profiles). I think both of these are good points, so if we can bring out both, it's a plus. However, this is more of an observation for the final version.
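
On point 1, a small sketch may help to see why Personalized PageRank is only topic-biased if the weights say so. It is a plain power iteration in pure Python; the function name, the toy graph and the teleport weights are all made up for illustration. The teleport vector is just a set of node weights: whether they encode topic relevance or anything else is up to whoever builds them.

```python
def personalized_pagerank(graph, teleport, damping=0.85, iterations=100):
    """Power iteration for PageRank with a custom teleport distribution.

    graph: dict mapping each node to the list of nodes it links to
           (every target must also appear as a key).
    teleport: dict mapping nodes to non-negative weights; normalised here,
              assuming at least one weight is positive.
    """
    nodes = list(graph)
    total = sum(teleport.get(n, 0.0) for n in nodes)
    pref = {n: teleport.get(n, 0.0) / total for n in nodes}
    rank = dict(pref)
    for _ in range(iterations):
        # Teleportation step: restart according to the preference weights.
        new_rank = {n: (1.0 - damping) * pref[n] for n in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: return its mass to the teleport distribution.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * pref[n]
        rank = new_rank
    return rank


# Hypothetical toy graph, for illustration only.
toy_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
uniform = personalized_pagerank(toy_graph, {n: 1.0 for n in toy_graph})
biased = personalized_pagerank(toy_graph, {"d": 1.0})
```

With uniform weights this reduces to ordinary PageRank; putting all the teleport weight on `d` raises the score of `d`, whatever `d` happens to represent.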

V0

Beyond the mere identification of influencers is the characterization of influencers related to a particular topic: a leader in sciences might be unknown to humanities circles, and vice versa.

Several heuristics have been proposed, focussing on the impact of a person on the network flow (e.g. triggering cascades) or filtering the network depending on the considered topic (e.g. Personalized PageRank).

This paper investigates a supervised learning approach to influencer identification: starting from some partial ground truth (a few influencers identified beforehand by the expert user), the goal is to retrieve other influencers.
A principled methodology, based on content- and network-related feature engineering together with statistical learning, is used to build an influencer score, shaped from the seeding ground truth.

Experiments on Twitter + blog data show that quite different static and network features are required to characterize influencers depending on the target topic.
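
To make the supervised setting in V0 concrete, here is a minimal sketch, assuming a plain logistic regression fitted by gradient descent; it is not the learner we actually use. The two features (a textual relevance score and a popularity score, both scaled to [0, 1]), the six rows and the labels are invented toy values, and the function names are hypothetical. Labels mark the seed influencers given by the expert; the fitted model then yields an influencer score for any author.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def influencer_score(x, weights, bias):
    """Score in (0, 1); higher means closer to the seed influencers."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Invented toy data: [textual_relevance, network_popularity] per author.
toy_features = [
    [0.9, 0.8],  # seed influencer (expert ground truth)
    [0.8, 0.9],  # seed influencer (expert ground truth)
    [0.9, 0.1],  # relevant texts, little audience
    [0.1, 0.9],  # popular, but off-topic
    [0.2, 0.2],
    [0.1, 0.1],
]
toy_labels = [1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(toy_features, toy_labels)
ranked = sorted(
    range(len(toy_features)),
    key=lambda i: influencer_score(toy_features[i], weights, bias),
    reverse=True,
)
```

`ranked` lists author indices from highest to lowest score. In this toy setting the authors who score high on only one of the two features are labelled as non-influencers, so the model has to use both the content-related and the network-related feature.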

Marco

Meeting 27 Oct. 2014

SI 2014

Debriefing Marco Cyril Philippe ECML-PKDD 2014

