Public data resources: research-quality, free data mining data sets
All datasets with keywords |
Text mining` article` api` text` corpus` newspaper |
|
Information Extraction: The RISE Repository of Information Sources |
Text mining` information` text mining` extraction` reviews` jobs |
Text mining` links` text mining` books` rdf` ocr` documents |
|
Text mining` api` blog` comments` text mining` stream` trends` backtype` queryminer |
|
Free book usage data from the University of Huddersfield » “Self-plagiarism is style” |
Text mining` books` library` borrowing` recommender` isbn` recommendation` collaborative` filtering |
ICWSM 2009 – International AAAI Conference on Weblogs and Social Media |
Text mining` blog` crawl` corpus` network` web` link |
Change.gov: The Obama-Biden Transition Team | Join the Discussion: Healthcare |
Text mining` textmining` opinion` comment` topic` government` queryminer |
Opinion Extraction, Opinion Mining, Sentiment Analysis, Summarization of Customer Reviews |
Text mining` sentiment` mining` classification` machine learning` reviews` recommender` text mining` links |
Text mining` wikipedia` named entity` tagged` text mining |
|
Text mining` django` wikipedia` compressed` text mining` howto |
|
Text mining` reddit` api` json` |
|
Text mining` phishing` corpus` text` email` text mining` nlp` mail` security |
|
Text mining` wikipedia` hadoop` textmining` links |
|
Text mining` question` answering` trec` nlp` machinelearning |
|
The New York Times Annotated Corpus « YooName – named entity recognition |
Text mining` named entity` nytimes` corpus` people` organizations` locations |
Text mining` named entity` location` place names` geo` nlp` natural language processing |
|
Text mining` book` data` wiki` via:jhammerb |
|
Text mining` faq` question_answering` questions` web` crawl` corpus` xml` textmining |
|
Wikipedia:Lists of common misspellings/For machines – Wikipedia, the free encyclopedia |
Text mining` spelling` misspelling` wikipedia |
Business and Finance` finance` api` social` kiva` microlending` lending |
|
Business and Finance` visualization` retail` finance` gis` map` location` store` via:magnetbox |
|
Business and Finance` finance` commercial` consumer` mint` spending |
|
Best Buy Remix – Welcome to the Best Buy Remix Developer Network |
Business and Finance` retail` data` api` product` bestbuy |
Behavioral Targeting, Analytics and Advertising Service for Publishers, Ad Networks |
Business and Finance` analytics` audience` segmentation` toolbar` commercial` sem` search` advertising |
Business and Finance` ceo` compensation` pay` economics` business` labor |
|
Business and Finance` trading` finance` s` api` list |
|
Business and Finance` netflix` api` movie` mashup` netflixprize` ratings |
|
Open beats Closed: Best Buy’s new APIs – O’Reilly Radar |
Business and Finance` retail` bestbuy` api |
Business and Finance` custom` research` retail` finance` market` service` analyst` |
|
Business and Finance` retail` dillards` uark |
|
developerWorks Interviews: Massive data mining and the resurgent mainframe |
Business and Finance` price` retail` transaction` sams_club` dillards |
Business and Finance` opentick` nasdaq` finance` stock |
|
Business and Finance` finance` links` sec |
|
Business and Finance` edgar` finance` sec` filing` ftp` instructions |
|
Business and Finance` investing` finance` datamining` announcement` sec` filing` links |
|
Government` un` voting` statistics` government |
|
Research Datasets :: CID Data :: Center for International Development at Harvard University (CID) |
Government` economics` international` development |
Government` government` banking` csv` tarp` bailout |
|
Government` dc` government` feeds` transparency` opendata |
|
Announcing the New York Times Campaign Finance API – Open – Code – New York Times Blog |
Government` nyt` api` campaign` donations` fec` |
Voter registration data; or, HERE IS YOUR HOPE, YOU FOOLS! « The Edge of the American West |
Government` voter` registration` politics`2008 |
import/parse/fec.py at master from aaronsw’s watchdog — GitHub |
Government` fec` python` parser` government` campaign |
Government` government` transparency` parsing` election` python |
|
Dataset of the day: Where are the Obamacans? | Off the Map – Official Blog of FortiusOne |
Government` obama` government` mashup` gis` geo` map` campaign` donations |
Government` cmu` politics` campaign` donations` fec` via:jhammerb` government |
|
Government` timeseries` crime` statistics` publicdata |
|
Government` voter` voting` politics` government` name` address` registration |
|
Voter List Data Files – Election Department, Clark County, Nevada |
Government` voting` voter` registration` name` address` data` election` politics |
Government` UN` publicdata` government` statistics |
|
RealClearPolitics – Election 2008 – Democratic Presidential Nomination |
Government` polls` politics |
Government` crime` fbi |
|
Daily Kos: Obama helps us track $1,000,000,000,000 of federal spending |
Government` corruption` government` politics` finance` |
Government` government` money` politics` |
|
Government` campaign` politics` elections |
|
Government` usda` economics` population` cpi` gdp` income |
|
Government` government` directory` links` wiki` states |
|
Government` economics` links |
|
Government` economics` lumber` building` materials` homedepot |
|
Government` government` bridges` safety |
|
Twitter API Wiki / REST API Documentation: Social Graph Methods |
Network Analysis` graph` network` api` social` twitter |
Network Analysis` graph` network` link` wikipedia` pagerank |
|
Network Analysis` directory` businesses` twitter` companies |
|
Massive Scrape of Twitter’s Friend Graph « blog.infochimps.org – Organizing Huge Information Sources |
Network Analysis` textmining` twitter` network` socialnetwork` pagerank` graph` queryminer |
Network Analysis` twitter` socialnetwork` graph |
|
Network Analysis` wikipedia` named_entity` rdf` ontology |
|
ICWSM 2009 – International AAAI Conference on Weblogs and Social Media |
Network Analysis` blog` crawl` corpus` network` web` link |
Network Analysis` rdf` movies` movie` api |
|
Network Analysis` youtube` research` crawl` socialnetwork` network` graph` web |
|
API Documentation – Twitter Development Talk | Google Groups |
Network Analysis` twitter` text` api |
Network Analysis` wireless` RF` radio` signal` dartmouth` network |
|
Network Analysis` api` yahoo` music` artists |
|
Web Analytics` web` analytics` api` traffic` advertising` demographics` lookery |
|
Spatial Analysis` gis` geo` map` mapping` images` satellite |
|
Spatial Analysis` neighborhoods` geo` gis` maps |
|
Image Analysis and Video Analysis` fmri` neuroscience` python` neuralnetwork |
|
Image Analysis and Video Analysis` face` detection` image |
|
Image Analysis and Video Analysis` facerecognition` opencv` face` links |
|
NORB Object Recognition Dataset, Fu Jie Huang, Yann LeCun, New York University |
Image Analysis and Video Analysis` image` 3d |
Image Analysis and Video Analysis` images` photo` pictures` search |
|
Image Analysis and Video Analysis` activity` recognition` intent |
|
Image Analysis and Video Analysis` facerecognition` face` image` recognition |
|
Image Analysis and Video Analysis` images` audio` publicdata` maps` video` free |
|
Image Analysis and Video Analysis` image` vision` recognition` |
|
Image Analysis and Video Analysis` tracking` video` detection` image` recognition` vehicle` pedestrian` |
|
Image Analysis and Video Analysis` image` recognition` detection` pedestrian` thermal` tracking` facerecognition` illumination |
|
Carnegie Mellon University – CMU Graphics Lab – motion capture library |
Image Analysis and Video Analysis` gait` pedestrian` walk` motion |
Audio Analysis` sound` publicdomain` audio |
|
Bioinformatics` fmri` neuroscience` python` neuralnetwork |
|
Medical Informatics` timeseries` machinelearning` ecg` health` medical` sleep` apnea |
|
UC Berkeley. Sheldon Margen Public Health Library. Statistical/Data Resources |
Healthcare Analytics` health` links` resources` publichealth` berkeley |
Healthcare Analytics` google` health` trends` search` prediction` epidemiology` biodefence` queries |
|
Eigenvector Research, Inc. : Data Sets Available to Download |
Chemoinformatics` NIR` spectra` chemistry` semiconductor` pharmaceutical` matlab` |
Healthcare Analytics` duplicate |
|
Healthcare Analytics` health` information` public` publicdata |
|
Healthcare Analytics` mri` cardiac |
|
Demography` aging` statistics` studies |
|
Demography` poverty` statistics |
|
Demography` internet` demographics` online` web |
|
Demography` gis` census` rdf` semantic` sparql |
|
Sports Analysis` baseball` database` publicdata` statistics` sports |
|
It’s a Pitch-by-Pitch Scouting Report, Minus the Scout – New York Times |
Sports Analysis` baseball` gameday |
Network Analysis` urban` transportation` feeds` public` sanfrancisco` bart` api` |
|
Tim Davis: UF Sparse Matrix Collection : sparse matrices from a wide range of applications |
Matrices` sparse` matrix |
Pre-processing` resources` links`mapping |
|
Amazon Web Services` amazon` ebs` ec2` s3` publicdata` hadoop |
|
Hosted Datasets` amazon` ebs` publicdata |
|
Web Analytics` workshop` search` web` microsoft` log` |
|
downloading – flossmole – Google Code – How to get FLOSSmole data for your own use |
Google` opensource` project` activity` mysql` dump |
Supervised` sentiment` review` product` amazon |
|
bizarre` scifi` phrase` name` word` generators` random` perl |
|
Phrases` webservice` api` thesaurus` textmining` nlp` rest` |
|
Search Query Performance report – Google AdWords Help Center |
Performance` adwords` ppc` search` metrics` webanalytics` sem` query` queryminer |
Web Analytics` queryminer` keyword` tool` research` commercial` search` adwords |
|
Network Analysis` links` catalogs` social |
|
Audio Analysis` lidar` visualization` radiohead` google` video |
|
Image Analysis and Video Analysis` images` words` english` search` visualization` imagemap |
|
Temporal Analysis` timeseries` anomaly` detection` astronomical` physics |
|
Image Analysis and Video Analysis` visualization` community` design` processing |
|
BGN: Domestic Names – State and Topical Gazetteer Download Files |
Demography` gis` usgs |
Random` benchmark` clustering` regression` machinelearning` list` statistics` mathematics |
|
Image Analysis and Video Analysis` nonlinear` dimensionality` reduction` faces` digits` images` manifold |
|
Yahoo! Search Blog: BOSS — The Next Step in our Open Search Ecosystem |
oss` api` open` search` yahoo` BOSS` queryminer |
Download the Database – IP Address Lookup – Community Geotarget IP Project |
Network Analysis` geocoding` geoip` internet` ip` ipaddress` mysql |
Government` airline` statistics` finance` revenue` location` travel |
|
Show Us a Better Way: What public data is already available? |
Government` statistics` census` uk` school` news` publicdata |
Government` country` cities` geo |
|
Government` government` traffic` statistics` trends` transportation |
|
Government` via:inkdroid` libraries` mashup` rdf` semantic` search` semanticweb` books |
|
reddit.com: Ask Reddit: Where to download a DB dump of Reddit? |
Text mining` reddit` socialnetwork` news` web |
Text mining` collaborative` filtering` dating` rating` profiles` czech |
|
Business and Finance` predictionmarket` tool` finance` buzz` advertising` marketing` startup` mmds |
|
VGChartz.com | Video Games, Charts, News, Forums, Reviews, Wii, PS3, Xbox360, DS, PSP |
Business and Finance` sales` ranking` videogames` retail |
Business and Finance` retail` finance` sales` store` |
|
Image Analysis and Video Analysis` image` python` code` flickr` matlab` recognition |
|
Image Analysis and Video Analysis` image` recognition |
|
Network Analysis` tag` tagging` s |
|
Network Analysis` netflixprize` imdb` sparql |
|
Image Analysis and Video Analysis` machinelearning` motion` capture` sensor |
|
Text mining` api` buzz` opinion` trends` text` twitter` summize` search |
|
Image Analysis and Video Analysis` visualization` contest` scalability` motion` tracking` pedestrian` sensor |
|
Business and Finance` movie` revenue` sales` box_office` imdb` commercial` movie_study |
|
Business and Finance` movie` revenue` box_office` |
|
Live Search : xRank™ Celebrity — check out who’s hot and who’s not! |
Network Analysis` search` query` volume` trends` celebrity` prediction` buzz` named_entity |
Business and Finance` movie` revenue` timeseries` imdb` commercial` subscription |
|
Business and Finance` economics` links |
|
google` trends` search` web` analytics` api` code` python` hack |
|
google` trends` search` query` api` csv` keyword` timeseries |
|
Open Research – the Data: Lastfm-ArtistTags2007 – Duke Listens! |
last.fm` music` tagging` artists` tags` collaborative` filtering |
medical` obesity` |
|
tiger` gis` lectures |
|
geo` google` gps` location` geolocation` cell` wifi` api` gis |
|
celebrity` misspelling` spelling` names |
|
ImportGenius.com : U.S. Customs Database and Competitive Intelligence Tools |
commercial` shipping` imports` exports` finance` datamining |
betting` prediction` betfair` price` csv` predictionmarket |
|
news` text` articles` api` content` media` xml` images` publicdata |
|
scipy` python` machinelearning` statistics` resource |
|
wikipedia` pageviews` trends` textmining` seo` topic |
|
via:chl` wikipedia` web` analytics` seo` topic` textmining` traffic |
|
yahoo` geo` geocoding` location` landmarks` gis |
|
images` links` lists` archive` |
|
Yahoo offers geographic data to Web sites | Tech news blog – CNET News.com |
gis` webservice` yahoo` api` location` landmark |
query` search` log` excite` altavista` alltheweb` transaction |
|
TechTC – Technion Repository of Text Categorization Datasets |
datamining` textmining` categorization` classification` odp` directory` text |
textmining` classification` category` odp` directory |
|
FEC Election Contributions: Download Detailed Files by Election Cycle |
individual` donations` government` election` publicdata` fec |
search` statistics` keywords` analytics` api` python` web` seo` google |
|
mysql` states` countries` isocode |
|
hotels` geonames` |
|
locations` cities` countries` gis |
|
cities` gis |
|
corpus` text` similarity` terms |
|
web` crawler` bot` |
|
Data sets and corpus / corpora for biological literature and text mining |
bioinformatics` text` corpora` domainspecific` genomics` corpus` |
defect` recall` automobile` fightclub` nhtsa` safety |
|
p2psim – kingdata : DNS server latency network distance matrices |
distance` matrix` network` p2p` dns` latency` nmf` queryminer |
pagerank` web` matrix` matlab |
|
opentick` trading` beta` feeds` finance |
|
wikipedia` xml` ec2 |
|
walmart` visualization` video` freebase` store` retail` locations` opening |
|
gis` mobile` geolocation |
|
cornell` web` archive` hadoop` crawl |
|
im2gps: estimating geographic information from a single image |
imagerecognition` via:csantos` gis` cmu` gps` imageprocessing` paper` hack` freaking_awesome |
image` video` audio` currency` sports` imagerecognition |
|
economics` list |
|
free` movie` database` netflixprize |
|
api` cogmap` person` name` organization` record_linkage |
|
retail` locations` stores |
|
record_linkage` identity` name` organization` orgchart` marketing |
|
German English Parallel Corpus “de-news”, Daily News 1996-2000 |
german` translation` corpus` english` text` via:maxme |
neuroscience` patch` clamp` recordings` neuron` timeseries` patchclamp` data` neural |
|
aggregator` links |
|
retail` clickstream` traffic` web` links` sales |
|
Dolores Labs Blog » Blog Archive » Our color names data set is online |
colormap` color` mechanicalturk |
teradata` retail` transactional` database |
|
large` competition` challenge` svm` machinelearning` scalability |
|
ECIS 2007 – The 15th European Conference on Information Systems |
retail` dillards` sams_club |
alexa` aws` web` search` api` |
|
creativecommons` court` legal` law` via:inkdroid |
|
blog` web` text |
|
Lyricsfly Lyrics API, database access to search for music artist and song title |
song` lyrics` database` api` |
99 Wikipedia Sources Aiding the Semantic Web » AI3:::Adaptive Information |
links` directory` record_linkage` extraction` wikipedia` named_entity` recognition` textmining` semanticweb |
audioscrobbler` recommendation` collaborative` filtering` music |
|
directory` rdf` semantic` data` soup` graph |
|
Free Economic Data | Economic, Financial, and Demographic Data |
finance` economics` portal` links |
machinelearning` trading` competition` backtest` matlab` code` finance` via:DeliciousRob |
|
computer` vision` image` ray` trace` fingerprint` stereo` detection` via:chl |
|
The Dataverse Network Project |
statistics` repository` harvard |
harvard` repository` social` science` research` portal` links |
|
climate` temperature` netcdf |
|
MNIST handwritten digit database, Yann LeCun and Corinna Cortes |
handwriting` mnist` image` recognition |
facerecognition` face` recognition` umass` image |
|
generator` names |
|
generator` tools` list` via:jd |
|
compete` api` web` statistics` traffic` analytics` mashup |
|
peekaboom` vision` image` large` human` computation` machinelearning` recognition |
|
links` oceanography` satellite |
|
blog` ucla |
|
nlp` corpus` tagged` named_entity` recognition` list |
|
del.icio.us` |
|
finance` links |
|
wikipedia` xml` structured` corpus |
|
arxiv` api` open` paper` academic` |
|
England Football Results Betting Odds | Premiership Results & Betting Odds |
gambling` soccer` football` excel` statistics |
rna` bioinformatics` microarray` expression` gene` machinelearning |
|
bioinformatics` microarray` expression` gene` machinelearning` stanford |
|
bioinformatics` microarray` expression` gene` machinelearning |
|
bioinformatics` microarray` expression` gene` machinelearning |
|
corpus` text` legal` law` court` ruling` opensource` publicdata |
|
python` finance` edgar` pylons` matplotlib` sec` webservice` via:jolby |
|
links` statistics |
|
Text Mining, Visualization and Social Media |
crawler` blog` corpus |
facerecognition` machinelearning` face` image |
|
umd` links` statistics` government` sports` via:rickladd |
|
biology` medicine` articles` text` journal` authors |
|
music` similarity` machinelearning |
|
Internet Archive: Details: Amazon ASIN listing and similarity graph |
ASIN` amazon` recommendation` collaborative` filtering` via:keyvowel |
weather` europe` ascii` netcdf |
|
machinelearning` datamining` cmu` link` collection |
|
driving` transportation` publicdata |
|
books` sales` commercial |
|
finance` data` |
|
searchengine` search` tagging` aggregator` numeric` extraction` tables` collaboration` web2.0 |
|
textmining` open` nature` standards` search |
|
metafilter` comments` network` via:chl |
|
web` search` spam` crawler` yahoo |
|
socialnetwork` trustnetwork` trust |
|
TaskForces/CommunityProjects/LinkingOpenData/DataSets – ESW Wiki |
opendata` semantic` rdf` collaboration |
publicdata` links |
|
semanticweb` rdf` congress` politics` government |
|
networks` research` graph` tags` paper` record_linkage |
|
archive` internet` web` index` |
|
competition` machinelearning` forecasting` contest |
|
microsoft` text` paraphrase` corpus |
|
nlp` text` corpus` ngram` google` commercial` license |
|
census` names` identity` frequency` record_linkage |
|
Given Name Frequency Project: Analysis of Given Name Popularity |
name` record_linkage` text` identity` code |
enron` names` identity` text` record_linkage |
|
api` identity` people` webservice` record_linkage |
|
Name Discrimination Data Named Entity Resolution / Entity Disambiguation |
record_linkage` corpus` nlp` names |
Developers Area – eBay Market Data Documentation – eBay Market Data Documentation |
ebay` api` retail` price` code |
name` authorship` rdf` record_linkage |
|
bibliography` rdf` ontology` duplicate` name` record_linkage |
|
StrikeIron Super Data Pack Web Service 1.0 – StrikeIron Marketplace |
webservice` publicdata` datacleaning |
Duplicate Detection, Record Linkage, and Identity Uncertainty: Datasets |
duplicate` detection` record_linkage` datacleaning` text |
datacleaning` record_linkage` video` lectures` course` cornell` economics` finance` publicdata |
|
retail` overstock` sales` api` product` price` forecasting |
|
Amazon Web Services Developer Connection : Can Alexa WS provide detailed … |
finance` alexa` amazon` tech |
ebay` retail` pricing` sales` api` product |
|
face` image |
|
epidemiology` gis` health |
|
Google Trends API coming soon | Tech news blog – CNET News.com |
google` trends` api` |
social` activity` location` cell` gis |
|
machinelearning` reinforcement` agent` competition` |
|
optimization` vehicle` routing |
|
oil` energy` statistics` economics` petroleum |
|
search` pagerank` text` tags` content |
|
machinelearning` CMU` course` projects` graphicalmodel` code` paper |
|
Financial Forecast Center’s Historical Economic and Market Data |
exchangerate` dollar` economics` |
economics` indicators` time` series |
|
finance` numberpedia` mechanicalturk` textmining` statistics |
|
socialnetwork` graphs` comicbooks |
|
dictionary` words |
|
wikipedia` authorship` |
|
tools` generator |
|
recommender` collaborative` restaurant |
|
community resource guide: i’ve been here before – show me the links |
demographics` maps` gis` statistics` links |
economics` social` government` health` labor` links |
|
netflix` netflixprize` movie` index` wikipedia` |
|
paper` corpus` arXiv |
|
links` transparency` government` politics` congress` reference |
|
Technophilia: Where to find public records online – Lifehacker |
public` records` links |
corpus` email` spam` textmining |
|
enron` corpus` email` text` social` network |
|
finance` cpi` inflation` data |
|
health` gis` epidemiology` links |
|
cia` population` python` code` grep |
|
Miller Center of Public Affairs – Richard Nixon – Oval Office Recordings |
nixon` speech` tapes` audio` mp3` wav` flac |
phone` politics |
|
housing` refinance` mortgage` |
|
retail` finance` sales` sqft` |
|
retail` finance` sales` sqft |
|
retail` location` poi |
|
retail` poi` location` gis` gps |
|
retail` location` gis |
|
smallworld` networking` socialnetwork` graph |
|
collaborative` filtering` jokes |
|
video` |
|
links` finance` commercial |
|
finance` xml` edgar` sec` code` perl |
|
EDGAR` sec` mail` text |
|
finance` SEC` scrape` parse` commercial |
|
Retail and Food Services – Time Series Data/Seasonal Factors |
retail` sales` census |
categorization` textmining` detection` tools |
|
retail` sales` uk |
|
tools` generator` random |
|
consumer` data` database` api |
|
factset` finance` |
|
finance` ibes` analyst` forecast` wharton |
|
finance` |
|
yahoo` finance` stock` price` |
|
network` links |
|
statistics` labor` government` consumer |
|
housing` sales` finance |
|
ethanol` |
|
retail` finance` store` locations` gis |
|
retail` gis` store` locations |
|
Energy Information Administration – EIA – Official Energy Statistics from the U.S. Government |
finance` government` energy` historical` forecasts` fuel` oil |
links |
|
product` upc` database` |
|
crawler` benchmark` search` web` links |
|
TechTC – Technion Repository of Text Categorization Datasets |
corpus` text |
traffic` data` |
|
volume rendering |
|
vision` caltech` image recognition |
|
pedestrian` image` classification` detection |
|
finance` economics` feed` free` stock` trading` opentick` opensource |
|
textmining` corpus` concordance` wordlist` n-gram |
|
dictionary` hack` security` wordlist` password |
|
data` mysql` email` energy` text` social network |
|
blog` corpus` spam |
|
corpus` text` newsgroup |
|
crowd sourcing` image` processing` algorithm` collaborative` distributed` web2.0` code` opensource |
|
paleo climatology` climate` oceanography` coral` sponge` biology |
|
finance` economics` naics` industry` classifications |
|
democracy` web2.0` mashup` government` funding` article |
|
collaborative` wiki` government` congress` politics` elections` web2.0` directory |
|
census` data` population` statistics |
|
statistical learning` machine learning` code` R` libraries` cran` |
|
linkd` datamining` timeseries` text` extraction` socialnetwork |
|
python` visualization` library |
|
machine learning` network` graph` |
|
aol` search` |
|
python` text` |
|
corpus` nlp` machine learning` textmining |
|
video` machine learning` statistics` matrix` sampling` large` sparse` algorithm` experiment_design |
|
wikipedia` laptop` install` dump |
|
ranking` search |
|
CN710: Comparative Analysis of Learning Systems (Spring 2006) – Class Project |
machinelearning` algorithm` ogi` bu` greyhound` finance |
python` urban` software` simulation` opensource` GIS` census` |
|
wikipedia` rdf` |
|
wikipedia` rdf` tools |
|
face` algorithm` facerecognition` data` image |
|
face` seung` algorithm` recognition` image |
|
extraction` finance` semantic` semanticweb` text |
|
aol` search` video` talk` algorithm` information retrieval` datamining` machinelearning |
|
aol` search` query` analysis |
|
aol` search` query` analysis |
|
aol` search` oracle` database` code |
|
query` categorization` algorithm` google |
|
Statistical NLP / corpus-based computational linguistics resources |
corpus` machine learning` text |
text` machine learning` context` matlab |
|
machine learning` code` links |
|
pagerank` code` algorithm |
|
Official Google Research Blog: All Our N-gram are Belong to You |
linguistics` google` ngram` nlp` record_linkage |
clustering` algorithm` java` parallel |
|
blog` econometrics` finance` machine learning` math` statistics |
|
Structural Analysis of Discrete Data and Econometric Applications, |
books` econometrics` economics` finance` ebook |
Kris Brower » Archives » Google Onpage Search Results Analysis |
google` ranking` aol` search` analytics |
netflixprize` machine learning` course` |
|
matrixmarket` matrix` |
|
Estimation of mean values, covariance matrices and imputation of missing values |
imputation` matlab` missing` EM` machinelearning |
face` image |
|
subset` netflix prize` dimensionality` reduction |
|
extract` from` graphs` hack` google` trends |
|
python` processor` semantic` web` rdf |
|
link` analysis` structure` web` crawler` stanford |
|
machine` learning` matlab` python` hackers` image |
|
flight data` airplane data` weather data` airline route data` aircraft flight data` in-flight analysis` airline on-time data |
|
healthcare analytics` subjective outcomes` healthcare customer service` quality of care` patient care` patient surveys |
|
Largest collection of longitudinal hospital care data in the US |
healthcare analytics` healthcare big data` research datasets` national in-patient statistics` local healthcare statistics` in-patient statistics` hospital cost data` hospital use data |
physician visits data` doctor visit data` outpatient care data` private practice physician data` non-federal healthcare data` physician office data |
|
mnist` xml` format |
|
mnist` |
|
data` set` collaborative` filtering` datamining` books` movie |
|
movie` netflix prize` source`netflix |
|
Submissions Guidelines for the Collectorz.com Online Movie Database |
movie` source |
plot` synopsis` movie` netflix prize` prize |
|
netflixprize` prize` european` movie` revenue` |
|
mediawiki` wikipedia` import` mysql` sql |
|
“phone ***” ” address *” “e-mail” intitle:”curriculum vitae” – Google Search |
resume` google |
random` generator` database` sql |
|
Finance`Loans`business`investing |
|
spam`email`text analysis |
|
Data Sets | Pew Research Center’s Internet & American Life Project |
demography |
flickr`taxonomy`images |
|
yahoo |
|
bibliographies`text mining |
|
weblog`blog`social media`network analysis |
|
facebook`network analysis`social |
|
Amazon Web Services` amazon` ebs` ec2` s3` publicdata` hadoop |
|
human language`text mining |
|
government`finance`economy |
|
images |
|
twitter`text mining`social |
|
spider`web analytics |
|
Amazon Web Services` amazon` ebs` ec2` s3` publicdata` hadoop |
|
youtube`image analysis`video analysis |
|
face recognition`facial recognition`image analysis |
|
data repositories |
|
learning |
|
movies`video analysis`business |
|
Translation Task – EMNLP 2011 Sixth Workshop on Statistical Machine Translation |
translation`human language |
books`text mining |
|
wordnet`corpus |
|
canada`parliament`government`text mining |
|
CRCNS – Collaborative Research in Computational Neuroscience – Data sharing |
Bioinformatics` fmri` neuroscience` python` neuralnetwork |
usenet`text mining |
|
bioinformatics |
|
chemoinformatics |
|
algorithms |
|
genetics`bioinformatics |
|
social science |
|
business |
|
network analysis |
|
books`text mining |
|
audio analysis |
|
health informatics`bioinformatics |
|
auctions |
|
image analysis`pets`cats |
|
Click Dataset | Center for Complex Networks and Systems Research |
web analytics |
The Electric Rice Cooker — One year of deleted weibos archive |
text mining |
Registered meteorites that has impacted on Earth visualized – AnalyticBridge |
meteorites`atmosphere |
road`traffic`accidents`transportation |
|
road`traffic`accidents`transportation |
|
student performance`school demographics`standardized test performance`school quality`education |
I really need a dataset about mechanical product configuration; the dataset should contain performance and structure. I want to use the dataset in data mining to find rules of product configuration. Thanks, I would be very grateful if you answered me by email.
yours junyan
Dear Sir/Madam,
I am an M.A. student of medical informatics at Shahid Beheshti University. My
thesis is about cancer data mining: Chronic Lymphocytic Leukemia (CLL)
diagnosis based on a clinical registry, not genomes, but I could not find
such a dataset. I need information on sex, age, race, family history, CBC, peripheral blood smear, and flow cytometry for 100 patients with Chronic Lymphocytic Leukemia.
I am looking forward to your reply.
Sincerely Yours
M.Sh
I'm looking for a road accident data set; could anyone please send one?