Data Sets for Analytics

Posted on Updated on

When working with analytics, in whatever flavor, one of the key things you need is some data. But data comes in many different shapes and sizes, but where can you get some useful data, be it transactional, time-series, meta-data, analytical, master, categorical, numeric, regression, clustering, etc.

Many of the popular analytics languages have some data sets built into them. For example the R language comes pre-loaded with data sets and these can be accessed using

data()

but many of the R packages also come with data sets.

Similarly if you are using Python, it comes with some pre-loaded data sets and similarly many of the Python libraries have data sets build into them. For example scikit learn.

from sklearn import datasets

But where else can you get data sets. There are lots and lots of website available with data sets and the list could be very long. The following is a list of, what I consider, the websites with the best data sets.

Kaggle

Crowd Analytix

Zindi

Alibaba Cloud

Amazon Open Data

UCI Machine Learning Repository

Google Search Engine

Google Open Images Data

Google Fiance

Microsoft Open Data

Oracle Open Data Repository

Awesome Public Datasets Collection

EU Open Data

US Government Data

US Census Bureau

Ireland Open Data

Northern Ireland Public Open Data

UK Open Data

UK Data Services data sets

Image Processing Data

Carnegie Mellon University Data Sets

World Bank Open Data

IMF Open Data

Movie Reviews Data Set

Amazon Reviews

Amazon public data sets

IMDb Datasets

Github List of Public Data Sets

Computer Vision data sets

Boston Housing Data Set and from here

ODSC – 25 picks of open data sets

NHS Open Data Sets – including prescriptions issued in England

Open Air Quality Datasets

Advertisement