Data Sets for Analytics
Whatever flavor of analytics you work in, one of the first things you need is some data. Data comes in many different shapes and sizes: transactional, time-series, metadata, analytical, master, categorical, numeric, data suited to regression or clustering, and so on. So where can you get some useful data?
Many of the popular analytics languages have some data sets built into them. For example, the R language comes pre-loaded with data sets, which you can list using
data()
Many of the R packages also come with their own data sets.
Similarly, if you are using Python, many of the Python libraries have data sets built into them. For example, scikit-learn:
from sklearn import datasets
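As a minimal sketch of what that import gives you, the following loads the Iris data set that ships with scikit-learn and looks at its shape. The choice of Iris is just an illustration; other loaders in the same module follow the same pattern.

```python
from sklearn import datasets

# load_iris() returns a Bunch: a dict-like object holding the feature
# matrix, the target labels, and some descriptive metadata.
iris = datasets.load_iris()

X, y = iris.data, iris.target
print(X.shape)                  # (150, 4): 150 samples, 4 features
print(iris.feature_names)       # names of the four measurements
print(list(iris.target_names))  # names of the three species
```

The small built-in data sets like this one are bundled with the library, so no download is needed.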
But where else can you get data sets? There are lots and lots of websites with data sets available, and the list could be very long. The following is a list of what I consider to be the websites with the best data sets.
UCI Machine Learning Repository
Awesome Public Datasets Collection
Northern Ireland Public Open Data
Carnegie Mellon University Data Sets
Github List of Public Data Sets
Boston Housing Data Set and from here
ODSC – 25 picks of open data sets
NHS Open Data Sets – including prescriptions issued in England
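Many of these sites offer their data as CSV downloads. The following is a rough sketch of one way to read such a file using only the Python standard library. The function names `parse_csv_text` and `fetch_csv` are made up for this example, the sample values are invented, and the code assumes the file has a header row and that you pass in a direct download link taken from the data set's own page.

```python
import csv
import io
from urllib.request import urlopen


def parse_csv_text(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv(url, encoding="utf-8"):
    """Download a CSV file and parse it.

    `url` is assumed to be a direct link to a CSV file; the encoding
    is an assumption too and may differ from site to site.
    """
    with urlopen(url) as response:
        return parse_csv_text(response.read().decode(encoding))


# In-memory sample standing in for a downloaded file (invented values).
sample = "id,value\n1,3.5\n2,4.0\n"
rows = parse_csv_text(sample)
print(rows[0]["value"])  # every field comes back as a string
```

All values are read as strings, so numeric columns need converting before analysis.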