GitHub - curran/data: A collection of public data sets (as of Jan 15, 2016)
Latest commit 22c9c2b by curran on Jan 12, 2016: Clean up README
Rdatasets Add R datasets Oct 25, 2015
airbnb Add Airbnb data set Aug 4, 2015
all Add data from data soup meetup Aug 30, 2015
Add pincodes dataset Dec 13, 2015
appliedPredictiveModeling Add applied predictive modeling data sets Aug 5, 2015
bokeh Add Bokeh examples Oct 25, 2015
calc Add calc s70 data Oct 30, 2015
cdc Added full table data module with unindented causes for the entire ca… Feb 19, 2014
correlatesofwar Add correlates of war data Oct 30, 2015
d3Examples Add updated unemployment timeseries data Aug 7, 2015
data.gov.in Added data sets from data.gov.in Apr 21, 2014
Add data.gov data Aug 5, 2015
dataSoup Add data from data soup meetup Aug 30, 2015
datalibExamples Add datalib examples Aug 3, 2015
Fixed bug where population was not showing up in CSV file for Geoname… Apr 29, 2015
dcjs Add data sets from DC.js Aug 4, 2015
dspl Added countries list from Google Apr 1, 2015
Added Africa undernourishment data set Dec 1, 2014
fbi Add stub for FBI crime dataset Dec 7, 2015
gapminder Update README Aug 31, 2015
geonames Fixed bug where population was not showing up in CSV file for Geoname… Apr 30, 2015
integrated Added population vs. gdp data set Apr 28, 2015
ipo Add note on IPO value column Aug 5, 2015
jsLibraries Added js lib data set Apr 1, 2015
mattermark Removed funky characters in CSV Aug 18, 2015
medicalStoreChallenge Add data from medical store challenge Aug 4, 2015
migrants Add data from data soup meetup Aug 30, 2015
motherjones Add mother jones shooting data Oct 30, 2015
Added data about bachelor's degrees from NSF Feb 12, 2014
Update README.md Dec 10, 2015
oecd Add house price data Aug 5, 2015
olpc Fixed image Apr 27, 2015
Add house numbers in Montreal Aug 3, 2015
Add processed small data sets Oct 19, 2015
Add William Playfair trade data Sep 6, 2015
plotlyExamples Add fertility-rates-in-south data set Aug 3, 2015
senseYourCity Added unit test framework, added test for iris dataset parsing using … Jul 31, 2015
slavevoyages Add slave voyages data Oct 30, 2015
statCounter Remove .DS_Store Mac turd Aug 27, 2015
superstoreSales Added superstore sales example data Sep 2, 2014
syntagmatic Add data sets from syntagmatic Aug 4, 2015
tuskegeeInstitute Add links to references Nov 24, 2015
tweets Merge branch 'gh-pages' of github.com:curran/data into gh-pages Nov 30, 2015
uci_ml Add cleaned avian flu data Oct 28, 2015
un Add UN Data extract Oct 19, 2015
undp Add undp data sets Aug 6, 2015
unhcr Add note on refugees data Nov 23, 2015
Add cleaned avian flu data Oct 28, 2015
Added filtered versions for earthquake data Apr 27, 2015
util Added earthquake data Apr 27, 2015
uwdata_voyager Add Voyager data sets Aug 5, 2015
vegaExamples Add copy of vega example data sets Aug 3, 2015
w3schools Added stub for scraping w3schools browser market share data Apr 3, 2014
Add WSJ data set Nov 14, 2015
wikibon Add Big Data Vendor data from Wikibon Aug 12, 2015
Update sm.pop.refg_Indicator_en_csv_v2.csv Nov 20, 2015
worldFactbook Added world factbook data Aug 14, 2013
.gitignore Added unit test framework, added test for iris dataset parsing using … Aug 1, 2015
Interest Group Spending 2000-2016.csv Create Interest Group Spending 2000-2016.csv Jan 11, 2016
LICENSE Add MIT Licence for #2 Nov 13, 2015
README.md Clean up README Jan 12, 2016
package.json Added unit test framework, added test for iris dataset parsing using … Aug 1, 2015
test.js Added unit test framework, added test for iris dataset parsing using … Aug 1, 2015
README.md
data
A collection of public data sets for testing out visualization methods. These data sets are at various stages of preparation: some are just raw data, some are CSV files, and some are exposed as AMD modules. The collection is messy, but with some digging you may find hidden gems.
Targets for import:
Here's a listing of data sets with more detail. Columns are marked by their type for visualization:
Q = Quantitative, continuously varying numeric columns
T = Temporal, a timestamp
O = Ordered, distinct categories with a natural order (e.g. Low, Medium, High)
N = Nominal, distinct categories with no natural order (e.g. Ethnicity)
G = Geospatial identifiers (e.g. Country, City)
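These type codes can also drive code. As a minimal Python sketch (the names `ColumnType`, `parse_schema`, and `columns_of_type` are hypothetical and not part of this repository), the `name: X` lines used in this listing can be parsed into a schema and queried by type:

```python
from enum import Enum


class ColumnType(Enum):
    """Visualization column types, keyed by the letter codes used in this listing."""
    QUANTITATIVE = "Q"
    TEMPORAL = "T"
    ORDERED = "O"
    NOMINAL = "N"
    GEOSPATIAL = "G"


def parse_schema(lines):
    """Parse 'name: X' lines into a {column name: ColumnType} dict."""
    schema = {}
    for line in lines:
        # rpartition splits on the last colon, so names like
        # "(latitude, longitude)" stay intact.
        name, _, code = line.rpartition(":")
        schema[name.strip()] = ColumnType(code.strip())
    return schema


def columns_of_type(schema, column_type):
    """Return the column names with the given type, in schema order."""
    return [name for name, t in schema.items() if t is column_type]


adult_excerpt = ["age: Q", "workclass: N", "education: O", "education-num: Q"]
schema = parse_schema(adult_excerpt)
print(columns_of_type(schema, ColumnType.QUANTITATIVE))  # ['age', 'education-num']
```

A visualization tool could use such a schema to pick sensible defaults, for example putting Q columns on axes and using N columns for color or grouping.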
UCI Machine Learning Repository - Adult (3.8 MB)
This data set demonstrates a mix of quantitative, ordinal, and nominal columns. To explore it visually, it helps to aggregate the data on the fly before rendering it.
age: Q
workclass: N
education: O
education-num: Q
marital-status: N
occupation: N
relationship: N
race: N
sex: N
capital-gain: Q
capital-loss: Q
hours-per-week: Q
native-country: N
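Aggregating on the fly here means grouping rows by a nominal or ordinal column and summarizing a quantitative one. A minimal Python sketch, assuming rows arrive as dicts of strings (as `csv.DictReader` yields them); `aggregate` is a hypothetical helper and the rows are made up for illustration:

```python
from collections import defaultdict


def aggregate(rows, group_by, measure):
    """Group rows by one column; return {group: (count, mean of measure)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[group_by]
        sums[key] += float(row[measure])  # CSV values arrive as strings
        counts[key] += 1
    return {key: (counts[key], sums[key] / counts[key]) for key in counts}


# Illustrative rows only; these are NOT records from the Adult data set.
rows = [
    {"education": "Bachelors", "hours-per-week": "40"},
    {"education": "Bachelors", "hours-per-week": "50"},
    {"education": "HS-grad", "hours-per-week": "38"},
]
print(aggregate(rows, "education", "hours-per-week"))
# {'Bachelors': (2, 45.0), 'HS-grad': (1, 38.0)}
```

Re-running `aggregate` with a different `group_by` (for example `sex` or `race`) is the "on the fly" step: each new view is a fresh pass over the raw rows.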
Data Canvas Sense Your City (237 MB or real-time API)
This data set contains measurements collected by DIY sensor kits in several major cities: San Francisco, Bangalore, Boston, Geneva, Rio de Janeiro, Shanghai, and Singapore. There is a visualization competition for this data set, with submissions due March 20.
city: G
timestamp: T
temperature: Q
light: Q
airquality: Q
sound: Q
humidity: Q
dust: Q
Medical Store Geospatial Challenge (< 100 KB)
This data set is small, but it comes with a set of real-world questions about the data. It is also a competition, with submissions due April 25.
Referrers - Each row corresponds to information on a particular client referral source.
referrer_code: N
visit_count: Q
city: G (referrer city)
postal_code_referrer: G
(latitude, longitude): G
Clients - Each row corresponds to a client visit to the store
client_id: N
referrer_code: N
city: G (referrer city)
postal_code_referrer: G
(latitude, longitude): G
initial_visit_date: T
product_count: Q
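The Referrers and Clients tables share `referrer_code`, so questions that combine referral sources with client visits need a join on that key. A minimal Python sketch, assuming both tables are loaded as lists of dicts with string values; `products_per_referrer` and the codes `R1`, `R2`, `R9` are hypothetical:

```python
from collections import Counter


def products_per_referrer(referrers, clients):
    """Join clients to referrers on referrer_code.

    Returns {referrer_code: (visit_count, total product_count)}.
    Clients whose referrer_code matches no referrer row are skipped.
    """
    visit_counts = {r["referrer_code"]: int(r["visit_count"]) for r in referrers}
    products = Counter()
    for client in clients:
        code = client["referrer_code"]
        if code in visit_counts:
            products[code] += int(client["product_count"])
    # Counter returns 0 for referrers with no matching clients.
    return {code: (visits, products[code]) for code, visits in visit_counts.items()}


# Hypothetical rows for illustration only.
referrers = [
    {"referrer_code": "R1", "visit_count": "3"},
    {"referrer_code": "R2", "visit_count": "1"},
]
clients = [
    {"client_id": "c1", "referrer_code": "R1", "product_count": "2"},
    {"client_id": "c2", "referrer_code": "R1", "product_count": "5"},
    {"client_id": "c3", "referrer_code": "R9", "product_count": "1"},
]
print(products_per_referrer(referrers, clients))
# {'R1': (3, 7), 'R2': (1, 0)}
```

Indexing the smaller table (Referrers) in a dict keeps the join to a single pass over Clients.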
UCI Machine Learning Repository - Individual household electric power consumption (20 MB)
This data set is a good candidate for demonstrating multi-scale temporal aggregation, for example rolling minute-level readings up to hours, days, and months.
timestamp: T
global_active_power: Q
global_reactive_power: Q
voltage: Q
global_intensity: Q
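Multi-scale temporal aggregation amounts to truncating each timestamp to the bin size of the current scale and averaging within each bin. A minimal Python sketch for one measure such as `global_active_power`; the scales, the `resample_mean` helper, and the readings are hypothetical:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical set of scales: each function truncates a timestamp to its bin start.
SCALES = {
    "hour": lambda t: t.replace(minute=0, second=0, microsecond=0),
    "day": lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
    "month": lambda t: t.replace(day=1, hour=0, minute=0, second=0, microsecond=0),
}


def resample_mean(readings, scale):
    """Average (timestamp, value) readings within each bin of the given scale."""
    truncate = SCALES[scale]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        key = truncate(ts)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Illustrative minute-level readings; not taken from the UCI file.
readings = [
    (datetime(2007, 1, 1, 10, 0), 4.0),
    (datetime(2007, 1, 1, 10, 1), 2.0),
    (datetime(2007, 1, 1, 11, 0), 3.0),
]
hourly = resample_mean(readings, "hour")  # bins at 10:00 and 11:00, both mean 3.0
daily = resample_mean(readings, "day")    # one bin for 2007-01-01, mean 3.0
```

Each scale is computed from the raw readings rather than from the next-finer scale, because in general a mean of hourly means is not the daily mean when hours contain different numbers of readings.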
BrightKite User Check-ins (57.2 MB)
This data set is a useful example for multi-scale aggregation in both space and time. It has been used as the motivating example for several Big Data visualization systems based on data cubes (imMens: Real-time Visual Querying of Big Data; Nanocubes for real-time exploration of spatiotemporal datasets).
user-id: N
timestamp: T
(latitude, longitude): G
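The data-cube idea behind these systems can be illustrated with a much simpler structure: precompute counts per (spatial cell, day) so that a time-range query only touches aggregated cells, not raw check-ins. This is a flat Python sketch, not how imMens or Nanocubes are actually implemented; the cell size, the helper names, and the check-ins are hypothetical:

```python
import math
from collections import Counter
from datetime import date, datetime


def cube_key(lat, lon, ts, cell_deg):
    """Map a check-in to a (lat cell, lon cell, day) key."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg), ts.date())


def build_cube(checkins, cell_deg=1.0):
    """Count check-ins per spatial cell per day; input is (user_id, timestamp, lat, lon)."""
    return Counter(cube_key(lat, lon, ts, cell_deg) for _user, ts, lat, lon in checkins)


def spatial_counts(cube, start, end):
    """Sum cube counts per spatial cell over days in [start, end], inclusive."""
    totals = Counter()
    for (lat_cell, lon_cell, day), n in cube.items():
        if start <= day <= end:
            totals[(lat_cell, lon_cell)] += n
    return totals


# Illustrative check-ins only; not taken from the BrightKite data.
checkins = [
    ("u1", datetime(2009, 5, 1, 8, 0), 37.77, -122.42),
    ("u2", datetime(2009, 5, 1, 9, 0), 37.50, -122.10),
    ("u1", datetime(2009, 5, 3, 9, 0), 40.70, -74.00),
]
cube = build_cube(checkins)
print(spatial_counts(cube, date(2009, 5, 1), date(2009, 5, 2)))
# Counter({(37, -123): 2})
```

Building several cubes with different `cell_deg` values (or coarser time keys such as month) gives the multiple spatial and temporal scales.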
ACLED (Armed Conflict Location and Event Data Project) (35 MB)
This data set contains an entry for each violent event in Africa from 1997 to 2014. It would be a good candidate for a linked timeline and choropleth map, where a selection in the timeline filters the data shown on the map.
timestamp: T
(latitude, longitude): G
country: G
number of fatalities: Q
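The linked-view interaction reduces to filtering events by the brushed time range and re-aggregating per country to color the map. A minimal Python sketch; the field names `timestamp`, `country`, and `fatalities`, the country names, and the values are hypothetical and do not reflect ACLED's actual column names or data:

```python
from collections import defaultdict
from datetime import date


def fatalities_by_country(events, start, end):
    """Total fatalities per country for events dated within [start, end], inclusive."""
    totals = defaultdict(int)
    for event in events:
        if start <= event["timestamp"] <= end:
            totals[event["country"]] += event["fatalities"]
    return dict(totals)


# Hypothetical events; field names and values are placeholders, not ACLED data.
events = [
    {"timestamp": date(2010, 3, 1), "country": "Country A", "fatalities": 5},
    {"timestamp": date(2012, 7, 9), "country": "Country A", "fatalities": 2},
    {"timestamp": date(2010, 6, 15), "country": "Country B", "fatalities": 1},
]
# A brush over calendar year 2010 in the timeline:
print(fatalities_by_country(events, date(2010, 1, 1), date(2010, 12, 31)))
# {'Country A': 5, 'Country B': 1}
```

Each time the brush moves, the map would call this again with the new range and recolor each country from the returned totals.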
Safecast (3.2 GB)
Grassroots sensor data about nuclear radiation in Japan.
Statistical Computing / Statistical Graphics Data Expo: Airline on-time performance (12 GB)
A great data set for scalability testing. This is the data set used in the Crossfilter demo.
The GDELT Data Set (~100 GB)
This would be a great data set for more extreme scalability testing. There is an open source project for loading this data set into Spark on AWS.
The Indian Census has lots of public data.
Best Buy has a developer portal for querying their data via a Web API.