Dataset Guide

List of Datasets

| Dataset | Description | Associated Tasks | Target Column | Number of Columns | Number of Rows |
|---|---|---|---|---|---|
| gluon | AutoGluon Example Dataset. | Binary/multiclass classification | class (binary), occupation (multiclass) | 15 | 39,073 (train), 9,769 (test) |
| bank_marketing | Bank Marketing Dataset. The goal is to predict whether the client will subscribe to a term deposit. | Binary classification | y | 21 | 28,831 (train), 12,357 (test) |
| vehicle_coupon | Vehicle Coupon Recommendation Dataset. The goal is to recommend a coupon to a driver in different scenarios. | Multiclass classification | coupon | 26 | 8,878 (train), 3,806 (test) |
| online_retail | Online Retail Transactional Dataset. The goal is to predict the lifetime value (LTV) score of each customer. | Regression (CLTV prediction), RFM analysis | cltv | 11 | 2,230 (train), 956 (test) |
| telco_churn | Telco Churn Event Dataset. | Binary classification (churn prediction) | churn | 21 | 4,930 (train), 2,113 (test) |
| california_house | California House Price Dataset. The task is to predict house prices. | Regression | median_house_value | 10 | 14,448 (train), 6,192 (test) |
| transition_matrix | Sample Web Access Transition Dataset. The task is to analyze transitions between web pages. | Network analysis | - | 3 | 12 |
| ts_airline | Airline Passenger Time Series Dataset. The task is to forecast the number of passengers. | Time series forecasting (univariate) | number_of_airline_passengers | 2 | 100 (train), 44 (test) |
| m4 | Quarterly time series from the M4 dataset. | Time series forecasting (multivariate) | v7 (or any other v? column) | 867 | 33,600 (train), 14,400 (test) |
| nba | Next Best Action Dataset. | Next best action | - | 6 | 43,196 (train), 12,829 (test) |
| mta | DP6 Dataset for Marketing Attribution Models. | Multi-touch attribution | - | 4 | 500,000 |
| dermatology | Dermatology Diseases Dataset. The task is to classify 6 types of erythemato-squamous disease. | Multiclass classification, clustering | class | 35 | 366 |
| creditcard | Credit Card Fraud Dataset. Anonymized credit card transactions labeled as fraudulent or genuine. | Binary classification (fraud detection) | fraud | 29 | 199,364 (train), 85,443 (test) |
| cluto | Cluto Dataset for Clustering. | Clustering | class | 3 | 10,000 |
| covtype | Forest Cover Type Dataset. Classification of pixels into 7 forest cover types. | Multiclass classification | target | 55 | 406,708 (train), 174,304 (test) |
| 20newsgroups | 20 Newsgroups Documents Dataset. Documents collected from 20 different newsgroups. | Multiclass classification | target | 301 | 11,314 (train), 7,532 (test); 4,871 (imbalanced train) |
| cosmetics_store | Cosmetics Shop E-Commerce Events History Dataset. | RFM analysis, clustering | - | 5 | 1,287,007 |
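Two of the datasets above (online_retail and cosmetics_store) list RFM analysis as an associated task. The following is a minimal sketch of the core aggregation, assuming each purchase has already been reduced to a `(customer_id, purchase_date, amount)` tuple; the real column names and layout of these datasets are not given here, so the function `compute_rfm`, the `RFM` class, and the sample transactions are hypothetical illustrations. It computes only the raw Recency, Frequency, and Monetary values; binning them into scores (for example quintiles) is a common next step that is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RFM:
    recency_days: int  # days between the reference date and the last purchase
    frequency: int     # number of purchase records for the customer
    monetary: float    # total amount spent by the customer


def compute_rfm(transactions, reference_date):
    """Aggregate (customer_id, purchase_date, amount) tuples into RFM values.

    Assumption (hypothetical layout): one tuple per purchase. If a dataset
    stores one row per invoice line, frequency would instead count distinct
    invoices.
    """
    last_purchase = {}
    counts = {}
    totals = {}
    for customer_id, purchase_date, amount in transactions:
        # Keep the most recent purchase date seen for each customer.
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        counts[customer_id] = counts.get(customer_id, 0) + 1
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return {
        cid: RFM(
            recency_days=(reference_date - last_purchase[cid]).days,
            frequency=counts[cid],
            monetary=totals[cid],
        )
        for cid in last_purchase
    }


# Illustrative toy transactions (made up, not taken from any dataset above).
transactions = [
    ("c1", date(2011, 12, 1), 20.0),
    ("c1", date(2011, 12, 5), 35.5),
    ("c2", date(2011, 11, 10), 12.0),
]
rfm = compute_rfm(transactions, reference_date=date(2011, 12, 10))
print(rfm["c1"])  # RFM(recency_days=5, frequency=2, monetary=55.5)
```

Plain dictionaries keep the sketch dependency-free; with a DataFrame library the same result is usually a single group-by over the customer column with max, count, and sum aggregations.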