History: Data Sets

Preview of this version: 1

GitHub - curran/data: A collection of public data sets (as of Jan 15, 2016)

Branch: gh-pages. Latest commit 22c9c2b, Jan 12, 2016 (curran: Clean up README)

Rdatasets	Add R datasets	Oct 25, 2015
airbnb	Add Airbnb data set	Aug 4, 2015
all	Add data from data soup meetup	Aug 30, 2015
[name not captured]	Add pincodes dataset	Dec 13, 2015
appliedPredictiveModeling	Add applied predictive modeling data sets	Aug 5, 2015
bokeh	Add Bokeh examples	Oct 25, 2015
calc	Add calc s70 data	Oct 30, 2015
cdc	Added full table data module with unindented causes for the entire ca…	Feb 19, 2014
correlatesofwar	Add correlates of war data	Oct 30, 2015
d3Examples	Add updated unemployment timeseries data	Aug 7, 2015
data.gov.in	Added data sets from data.gov.in	Apr 21, 2014
[name not captured]	Add data.gov data	Aug 5, 2015
dataSoup	Add data from data soup meetup	Aug 30, 2015
datalibExamples	Add datalib examples	Aug 3, 2015
[name not captured]	Fixed bug where population was not showing up in CSV file for Geoname…	Apr 29, 2015
dcjs	Add data sets from DC.js	Aug 4, 2015
dspl	Added countries list from Google	Apr 1, 2015
[name not captured]	Added Africa undernourishment data set	Dec 1, 2014
fbi	Add stub for FBI crime dataset	Dec 7, 2015
gapminder	Update README	Aug 31, 2015
geonames	Fixed bug where population was not showing up in CSV file for Geoname…	Apr 30, 2015
integrated	Added population vs. gdp data set	Apr 28, 2015
ipo	Add note on IPO calue column	Aug 5, 2015
jsLibraries	Added js lib data set	Apr 1, 2015
mattermark	Removed funky characters in CSV	Aug 18, 2015
medicalStoreChallenge	Add data from medical store challenge	Aug 4, 2015
migrants	Add data from data soup meetup	Aug 30, 2015
motherjones	Add mother jones shooting data	Oct 30, 2015
[name not captured]	Added data about bachelors degrees fron NSF	Feb 12, 2014
[name not captured]	Update README.md	Dec 10, 2015
oecd	Add house price data	Aug 5, 2015
olpc	Fixed image	Apr 27, 2015
[name not captured]	Add house numbers in Montreal	Aug 3, 2015
[name not captured]	Add processed small data sets	Oct 19, 2015
[name not captured]	Add William Playfair trade data	Sep 6, 2015
plotlyExamples	Add fertility-rates-in-south data set	Aug 3, 2015
senseYourCity	Added unit test framework, added test for iris dataset parsing using …	Jul 31, 2015
slavevoyages	Add slave voyages data	Oct 30, 2015
statCounter	Remove .DS_Store Mac turd	Aug 27, 2015
superstoreSales	Added superstore sales example data	Sep 2, 2014
syntagmatic	Add data sets from syntagmatic	Aug 4, 2015
tuskegeeInstitute	Add links to references	Nov 24, 2015
tweets	Merge branch 'gh-pages' of github.com:curran/data into gh-pages	Nov 30, 2015
uci_ml	Add cleaned avian flu data	Oct 28, 2015
un	Add UN Data extract	Oct 19, 2015
undp	Add undp data sets	Aug 6, 2015
unhcr	Add note on refugees data	Nov 23, 2015
[name not captured]	Add cleaned avian flu data	Oct 28, 2015
[name not captured]	Added filtered versions for earthquake data	Apr 27, 2015
util	Added earthquake data	Apr 27, 2015
uwdata_voyager	Add Voyager data sets	Aug 5, 2015
vegaExamples	Add copy of vega example data sets	Aug 3, 2015
w3schools	Added stub for scraping w3schools browser market share data	Apr 3, 2014
[name not captured]	Add WSJ data set	Nov 14, 2015
wikibon	Add Big Data Vendor data from Wikibon	Aug 12, 2015
[name not captured]	Update sm.pop.refg_Indicator_en_csv_v2.csv	Nov 20, 2015
worldFactbook	Added world factbook data	Aug 14, 2013
.gitignore	Added unit test framework, added test for iris dataset parsing using …	Aug 1, 2015
Interest Group Spending 2000-2016.csv	Create Interest Group Spending 2000-2016.csv	Jan 11, 2016
LICENSE	Add MIT Licence for #2	Nov 13, 2015
README.md	Clean up README	Jan 12, 2016
package.json	Added unit test framework, added test for iris dataset parsing using …	Aug 1, 2015
test.js	Added unit test framework, added test for iris dataset parsing using …	Aug 1, 2015

README.md

data

A collection of public data sets for testing out visualization methods. These data sets are at various stages of preparation: some are just raw data, some are CSV files, and some are exposed as AMD modules. This collection is messy, but with some digging you may find hidden gems.

Targets for import:

Soul of the Community (American Statistical Association)
World Population Prospects (United Nations)
Employment (Bureau of Labor Statistics)
Healthy People (Centers for Disease Control)
GapMinder Data
NASA Satellite-Derived Environmental Indicators
IMF Public Finances in Modern History Database
Executions in the US by type over time
Datasets used in the book, An Introduction to Categorical Data Analysis
Energy Information Administration Open Data
Data sets from Five Thirty Eight
Data sets in the Infovis Wiki

Here's a listing of data sets with more detail. Columns will be marked in terms of their type for visualization, including:

Q = Quantitative, continuously varying numeric columns

T = Temporal, a timestamp

O = Ordered, distinct categories with a natural order (e.g. Low, Medium, High)

N = Nominal, distinct categories with no natural order (e.g. Ethnicity)

G = Geospatial identifiers (e.g. Country, City)
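
As a minimal sketch, the five type codes above could be represented in Python as an enumeration, with the `name: CODE` lines used in the listings parsed into a schema. The names `ColumnType` and `parse_schema` are hypothetical and are not part of the repository.

```python
from enum import Enum


class ColumnType(Enum):
    """Visualization-oriented column types, keyed by the letter codes above."""
    QUANTITATIVE = "Q"
    TEMPORAL = "T"
    ORDERED = "O"
    NOMINAL = "N"
    GEOSPATIAL = "G"


def parse_schema(lines):
    """Parse 'name: CODE' lines into a {column name: ColumnType} dict."""
    schema = {}
    for line in lines:
        # Split on the last colon so names such as "(latitude, longitude)" survive.
        name, code = line.rsplit(":", 1)
        schema[name.strip()] = ColumnType(code.strip())
    return schema


# Example input in the same 'name: CODE' form used by the listings.
example = ["age: Q", "education: O", "timestamp: T", "(latitude, longitude): G", "sex: N"]
schema = parse_schema(example)
```

A schema like this lets a visualization tool choose an encoding per column, for example a position scale for `Q` columns and a color palette for `N` columns.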

UCI Machine Learning Repository - Adult (3.8 MB)

This data set demonstrates a mix of quantitative, ordinal, and nominal columns. To analyze this data set using visualization, it would be useful to aggregate the data on the fly before visualization.

age: Q
workclass: N
education: O
education-num: Q
marital-status: N
occupation: N
relationship: N
race: N
sex: N
capital-gain: Q
capital-loss: Q
hours-per-week: Q
native-country: N
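
The on-the-fly aggregation mentioned for this data set can be sketched as grouping by a nominal or ordered column and summarizing a quantitative one. The rows below are hypothetical values made up for illustration, not records from the Adult data set, and the helper names `count_by` and `mean_by` are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical rows for illustration only; not taken from the Adult data set.
rows = [
    {"education": "Bachelors", "sex": "Female", "hours-per-week": 40},
    {"education": "Bachelors", "sex": "Male", "hours-per-week": 50},
    {"education": "HS-grad", "sex": "Male", "hours-per-week": 40},
    {"education": "HS-grad", "sex": "Female", "hours-per-week": 30},
    {"education": "HS-grad", "sex": "Male", "hours-per-week": 45},
]


def count_by(rows, column):
    """Count rows per distinct value of a nominal or ordered column."""
    return Counter(row[column] for row in rows)


def mean_by(rows, group_column, value_column):
    """Mean of a quantitative column within each group."""
    totals = defaultdict(float)
    counts = Counter()
    for row in rows:
        key = row[group_column]
        totals[key] += row[value_column]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


education_counts = count_by(rows, "education")
mean_hours = mean_by(rows, "education", "hours-per-week")
```

The aggregated result has one entry per category, so it can be drawn directly as a bar chart regardless of how many rows the raw file contains.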

Data Canvas Sense Your City (237MB or Real-time API)

This data set contains measurements collected by DIY sensor kits across several major cities: San Francisco, Bangalore, Boston, Geneva, Rio de Janeiro, Shanghai, and Singapore. There is a visualization competition for this data set, with submissions due March 20.

city: G
timestamp: T
temperature: Q
light: Q
airquality: Q
sound: Q
humidity: Q
dust: Q

Medical Store Geospatial Challenge (< 100KB)

This data set is small, but comes with a set of real-world questions about the data. This is also a competition, with submissions due April 25.

Referrers - Each row corresponds to information on a particular client referral source.

referrer_code: N
visit_count: Q
city (referrer city)
postal_code_referrer: G
(latitude, longitude): G

Clients - Each row corresponds to a client visit to the store

client_id: N
referrer_code: N
city (referrer city)
postal_code_referrer: G
(latitude, longitude): G
initial_visit_date: T
product_count: Q

UCI Machine Learning Repository - Individual household electric power consumption (20 MB)

This data set would be a great candidate to show multi-scale temporal aggregation.

timestamp: T
global_active_power: Q
global_reactive_power: Q
voltage: Q
global_intensity: Q
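
One way to picture multi-scale temporal aggregation is to floor each timestamp to the start of its hour, day, or month and average the readings in each interval. This is only a sketch: the readings are hypothetical values, not data from the UCI file, and `SCALES` and `aggregate` are illustrative names.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, global_active_power) readings, for illustration only.
readings = [
    (datetime(2007, 1, 1, 0, 0), 2.0),
    (datetime(2007, 1, 1, 0, 30), 4.0),
    (datetime(2007, 1, 1, 1, 0), 6.0),
    (datetime(2007, 1, 2, 0, 0), 8.0),
]

# Each scale maps a timestamp to the start of the interval that contains it.
SCALES = {
    "hour": lambda t: t.replace(minute=0, second=0, microsecond=0),
    "day": lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
    "month": lambda t: t.replace(day=1, hour=0, minute=0, second=0, microsecond=0),
}


def aggregate(readings, scale):
    """Return {interval start: mean value} at the requested scale."""
    floor = SCALES[scale]
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[floor(timestamp)].append(value)
    return {start: sum(values) / len(values) for start, values in buckets.items()}


hourly = aggregate(readings, "hour")
daily = aggregate(readings, "day")
monthly = aggregate(readings, "month")
```

A visualization could switch between these levels as the user zooms, showing monthly means for the full range and hourly means for a single day.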

BrightKite User Check-ins (57.2 MB)

This data set would be a useful example for multi-scale aggregation in both space and time. This has been used as the motivating example for several Big Data visualization systems based on data cubes (imMens: Real‐time Visual Querying of Big Data, Nanocubes for real-time exploration of spatiotemporal datasets).

user-id: N
timestamp: T
(latitude, longitude): G
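
Aggregation in both space and time can be sketched as counting check-ins per grid cell and per time bucket, with the cell size and time scale as parameters. This is plain binning under assumed parameters, not the imMens or Nanocubes data structures; the check-ins are hypothetical and `space_time_bins` is an illustrative name.

```python
import math
from collections import Counter
from datetime import datetime

# Hypothetical check-ins (user-id, timestamp, latitude, longitude), for illustration only.
checkins = [
    (1, datetime(2009, 5, 1, 8, 15), 37.77, -122.42),
    (2, datetime(2009, 5, 1, 9, 45), 37.78, -122.41),
    (1, datetime(2009, 5, 1, 9, 5), 40.71, -74.05),
    (3, datetime(2009, 5, 2, 8, 30), 37.76, -122.43),
]


def space_time_bins(checkins, cell_degrees, time_scale):
    """Count check-ins per (grid cell, time bucket) at the given resolution."""
    counts = Counter()
    for _user, timestamp, lat, lon in checkins:
        # Grid cell index: which cell_degrees-sized square the point falls in.
        cell = (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))
        if time_scale == "hour":
            bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        elif time_scale == "day":
            bucket = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError(f"unsupported time scale: {time_scale}")
        counts[(cell, bucket)] += 1
    return counts


coarse = space_time_bins(checkins, cell_degrees=1.0, time_scale="day")
fine = space_time_bins(checkins, cell_degrees=0.1, time_scale="hour")
```

Coarser cells and longer buckets merge more check-ins into each bin, which is what keeps the number of marks bounded when zoomed out over a large data set.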

ACLED (Armed Conflict Location and Event Data Project) (35MB)

This data set contains entries for each violent event in Africa from 1997 to 2014. This data set would be a good candidate for visualization with a linked timeline and choropleth map, where selections in the timeline can drive the filtering of data shown on the map.

timestamp: T
(latitude, longitude): G
country: G
number of fatalities: Q
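
The linked-view idea described for this data set can be sketched as a function that takes the time range selected in the timeline and returns the per-country totals a choropleth map would color by. The events and country names below are placeholders, not ACLED records, and `fatalities_by_country` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

# Placeholder events for illustration only; not taken from ACLED.
events = [
    {"timestamp": date(1999, 3, 1), "country": "Country A", "fatalities": 5},
    {"timestamp": date(2003, 7, 15), "country": "Country A", "fatalities": 2},
    {"timestamp": date(2003, 9, 2), "country": "Country B", "fatalities": 7},
    {"timestamp": date(2012, 1, 20), "country": "Country B", "fatalities": 1},
]


def fatalities_by_country(events, start, end):
    """Sum fatalities per country for events inside the selected time range."""
    totals = defaultdict(int)
    for event in events:
        if start <= event["timestamp"] <= end:
            totals[event["country"]] += event["fatalities"]
    return dict(totals)


# A timeline selection covering 2003 would drive the map with these totals.
selection = fatalities_by_country(events, date(2003, 1, 1), date(2003, 12, 31))
```

Each time the timeline selection changes, the function is called again and the map is recolored from the new totals.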

Safecast (3.2GB)

Grassroots sensor data about nuclear radiation in Japan.

Statistical Computing Statistical Graphics Data expo Airline on-time performance (12GB)

A great data set for scalability testing. This is the data set used in the Crossfilter Demo.

The GDELT Data Set (~100GB)

This would be a great data set for more extreme scalability testing. There is an Open Source project for loading this data set into Spark on AWS.

The Indian Census has lots of public data.

Best Buy has a developer portal for querying their data via a Web API.

https://github.com

History

Information	Version
Fri. Jan 15, 2016 15:34 ggrefens from 129.175.15.11	4
Fri. Jan 15, 2016 15:33 ggrefens from 129.175.15.11	3
Fri. Jan 15, 2016 15:31 ggrefens from 129.175.15.11	2
Fri. Jan 15, 2016 15:28 ggrefens from 129.175.15.11	1