Multiclass Classification#
Dataset#
| destination | passanger | weather | temperature | time_hour | coupon | expiration | gender | age | maritalstatus | has_children | education | occupation | income | car | bar | coffeehouse | carryaway | restaurantlessthan20 | restaurant20to50 | tocoupon_geq5min | tocoupon_geq15min | tocoupon_geq25min | direction_same | direction_opp | y | time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No Urgent Place | Alone | Sunny | 80 | 14:00 | Restaurant(<20) | 2h | Male | 21 | Single | 0 | Bachelors degree | Student | $50000 - $62499 |  | less1 | never | 4~8 | less1 | never | 1 | 1 | 0 | 0 | 1 | 0 | 1677465744 |
| No Urgent Place | Kid(s) | Sunny | 55 | 18:00 | Coffee House | 2h | Male | 50plus | Married partner | 1 | Some college - no degree | Computer & Mathematical | $37500 - $49999 |  | 1~3 | less1 | 4~8 | 1~3 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 | 1677465744 |
| Work | Alone | Sunny | 30 | 7:00 | Carry out & Take away | 1d | Female | 46 | Single | 0 | Some college - no degree | Office & Administrative Support | $37500 - $49999 |  | never | 1~3 | gt8 | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 1 | 1677465744 |
| Home | Alone | Sunny | 80 | 18:00 | Bar | 2h | Female | 50plus | Single | 0 | Bachelors degree | Installation Maintenance & Repair | $25000 - $37499 |  | never | 4~8 | 4~8 | gt8 | less1 | 1 | 0 | 0 | 1 | 0 | 0 | 1677465744 |
| Home | Alone | Sunny | 55 | 18:00 | Restaurant(20-50) | 1d | Male | 50plus | Unmarried partner | 0 | Bachelors degree | Retired | $100000 or More |  | never | never | 1~3 | gt8 | never | 1 | 1 | 0 | 0 | 1 | 0 | 1677465744 |
Workflow Definition#
This example uses the ml_datasets.vehicle_coupon_train table. The coupon column, which takes one of several values, is set as target_column. Several options are also set in this example.
| coupon |
|---|
| Bar |
| Carry out & Take away |
| Coffee House |
| Restaurant(20-50) |
| Restaurant(<20) |
_export:
  ml:
    input_database: ml_datasets
    output_database: ml_results

+gluon_train:
  ml_train>:
    notebook: gluon_train
    input_table: ${ml.input_database}.vehicle_coupon_train
    target_column: coupon
    model_name: coupon_model
    # The settings below are optional
    share_model: true
    export_leaderboard: ${ml.output_database}.leaderboard_vehicle_coupon_train
    export_feature_importance: ${ml.output_database}.feature_importance_vehicle_coupon_train
    time_limit: 3*60

+check_model_uuid:
  echo>: "model_id: ${automl.shared_model}"

+gluon_predict:
  ml_predict>:
    notebook: gluon_predict
    model_name: coupon_model # model_name or share_model
    # shared_model: b74f7422-0fc5-4a77-96d5-042d79222a83
    input_table: ${ml.input_database}.vehicle_coupon_test
    output_table: ${ml.output_database}.vehicle_coupon_predicted
Output#
Prediction Results Table#
The vehicle_coupon_predicted table, specified as output_table in the +gluon_predict task, stores the prediction results.
| coupon | predicted_coupon | predicted_proba | predicted_probabilities |
|---|---|---|---|
| Carry out & Take away | Carry out & Take away | 0.9910975695 | {"Bar": 0.003227, "Carry out & Take away": 0.991098, "Coffee House": 0.001772, "Restaurant(20-50)": 0.001264, "Restaurant(<20)": 0.00264} |
| Coffee House | Coffee House | 0.960514605 | {"Bar": 0.011655, "Carry out & Take away": 0.012714, "Coffee House": 0.960515, "Restaurant(20-50)": 0.004904, "Restaurant(<20)": 0.010212} |
| Carry out & Take away | Carry out & Take away | 0.9766068459 | {"Bar": 0.00382, "Carry out & Take away": 0.976607, "Coffee House": 0.010996, "Restaurant(20-50)": 0.003992, "Restaurant(<20)": 0.004586} |
| Coffee House | Restaurant(20-50) | 0.4339100122 | {"Bar": 0.006506, "Carry out & Take away": 0.009184, "Coffee House": 0.408362, "Restaurant(20-50)": 0.43391, "Restaurant(<20)": 0.142038} |
| Coffee House | Coffee House | 0.9851641059 | {"Bar": 0.003208, "Carry out & Take away": 0.003758, "Coffee House": 0.985164, "Restaurant(20-50)": 0.003851, "Restaurant(<20)": 0.004019} |
coupon or predicted_coupon#
The vehicle_coupon_test table, specified as input_table in the +gluon_predict task, already has a coupon column, so the prediction is written to the predicted_coupon column.
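Because the test table keeps the actual coupon value next to predicted_coupon, the two columns can be compared directly. The following is a minimal sketch using only the five sample rows from the output table above; the `rows` list and the `accuracy` helper are illustrative names, not part of the product, and in practice the rows would come from a query against the output table.

```python
# Rows copied from the sample output table: (coupon, predicted_coupon).
rows = [
    ("Carry out & Take away", "Carry out & Take away"),
    ("Coffee House", "Coffee House"),
    ("Carry out & Take away", "Carry out & Take away"),
    ("Coffee House", "Restaurant(20-50)"),
    ("Coffee House", "Coffee House"),
]


def accuracy(pairs):
    """Return the fraction of rows whose predicted label equals the actual label."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no rows to evaluate")
    correct = sum(1 for actual, predicted in pairs if actual == predicted)
    return correct / len(pairs)


sample_accuracy = accuracy(rows)
print(f"accuracy on the 5 sample rows: {sample_accuracy:.2f}")
```

Five rows are far too few to judge a model; the sketch only shows how the two columns relate.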
predicted_proba#
Unlike in binary classification, this value is the confidence for the class given in predicted_coupon.
predicted_probabilities#
This column stores the confidence for every class as a MAP. Note, however, that the table treats the column as a string type, so handle it with care.
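Since the column arrives as a string, it has to be parsed before the per-class values can be used. The sketch below assumes the stored string is valid JSON with straight double quotes (an assumption; check the actual column contents first). `parse_probabilities` is a hypothetical helper name, and the input is the fourth row of the sample output.

```python
import json

# One predicted_probabilities value from the sample output, held as a string.
# Assumption: the stored text is valid JSON with straight double quotes.
raw = (
    '{"Bar": 0.006506, "Carry out & Take away": 0.009184, '
    '"Coffee House": 0.408362, "Restaurant(20-50)": 0.43391, '
    '"Restaurant(<20)": 0.142038}'
)


def parse_probabilities(text):
    """Parse the string column into a dict that maps class label to probability."""
    parsed = json.loads(text)
    return {label: float(p) for label, p in parsed.items()}


probs = parse_probabilities(raw)

# The most likely class and its probability correspond to
# predicted_coupon and predicted_proba for the same row.
top_label = max(probs, key=probs.get)
top_proba = probs[top_label]
print(top_label, top_proba)
```

For this row the top class is Restaurant(20-50) at about 0.434, with Coffee House close behind at about 0.408, which shows why the full distribution can be more informative than predicted_proba alone.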
Leaderboard Table#
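A summary of the generated models is output as the leaderboard table. It includes each model's test score and validation score, training time, inference time, stack level, and other information.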
| model | score_val | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order | num_features | num_models | num_models_w_ancestors | memory_size | memory_size_w_ancestors | memory_size_min | memory_size_min_w_ancestors | num_ancestors | num_descendants | model_type | child_model_type | hyperparameters | hyperparameters_fit | ag_args_fit | features | child_hyperparameters | child_hyperparameters_fit | child_ag_args_fit | ancestors | descendants |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LightGBMLarge_BAG_L1 | -0.3866731304 | 0.02031469345 | 3.35344696 | 0.02031469345 | 3.35344696 | 1 | TRUE | 11 | 24 | 1 | 1 | 8879884 | 8879884 | 8879884 | 8879884 | 0 | 0 | StackerEnsembleModel | LGBModel | {"use_orig_features": True, "max_base_models": 25, "max_base_models_per_type": 5, "save_bag_folds": True} | {} | {"max_memory_usage_ratio": 1.0, "max_time_limit_ratio": 1.0, "max_time_limit": None, "min_time_limit": 0, "ignored_type_group_special": None, "ignored_type_group_raw": None, "get_features_kwargs": None, "get_features_kwargs_extra": None, "drop_unique": False} | ["passanger", "tocoupon_geq15min", "coffeehouse", "car", "restaurantlessthan20", "expiration", "y", "tocoupon_geq25min", "direction_same", "age", "maritalstatus", "bar", "weather", "restaurant20to50", "has_children", "occupation", "destination", "carryaway", "income", "gender", "direction_opp", "temperature", "time_hour", "education"] | {"num_boost_round": 10000, "num_threads": -1, "learning_rate": 0.03, "objective": "multiclass", "verbose": -1, "boosting_type": "gbdt", "two_round": True, "num_leaves": 128, "feature_fraction": 0.9, "min_data_in_leaf": 3} | {"num_boost_round": 131} | {"max_memory_usage_ratio": 1.0, "max_time_limit_ratio": 1.0, "max_time_limit": None, "min_time_limit": 0, "ignored_type_group_special": None, "ignored_type_group_raw": ["object"], "get_features_kwargs": None, "get_features_kwargs_extra": None} | [] | [] |
| WeightedEnsemble_L2 | -0.4978609895 | 0.7392261028 | 193.6493859 | 0.002149343491 | 2.073774338 | 2 | TRUE | 12 | 10 | 1 | 17 | 5644 | 61412191 | 5644 | 6688247 | 2 | 0 | WeightedEnsembleModel | GreedyWeightedEnsembleModel | {"use_orig_features": False, "max_base_models": 25, "max_base_models_per_type": 5, "save_bag_folds": True} | {} | {"max_memory_usage_ratio": 1.0, "max_time_limit_ratio": 1.0, "max_time_limit": None, "min_time_limit": 0, "ignored_type_group_special": None, "ignored_type_group_raw": None, "get_features_kwargs": None, "get_features_kwargs_extra": None, "drop_unique": False} | ["XGBoost_BAG_L1/T0_4", "XGBoost_BAG_L1/T0_0", "XGBoost_BAG_L1/T0_1", "LightGBM_BAG_L1/T0_3", "XGBoost_BAG_L1/T0_3", "LightGBM_BAG_L1/T0_1", "LightGBM_BAG_L1/T0_0", "LightGBM_BAG_L1/T0_2", "XGBoost_BAG_L1/T0_2", "LightGBM_BAG_L1/T0_4"] | {"ensemble_size": 100} | {"ensemble_size": 57} | {"max_memory_usage_ratio": 1.0, "max_time_limit_ratio": 1.0, "max_time_limit": None, "min_time_limit": 0, "ignored_type_group_special": None, "ignored_type_group_raw": None, "get_features_kwargs": None, "get_features_kwargs_extra": None, "drop_unique": False} | ["LightGBM_BAG_L1/T0", "XGBoost_BAG_L1/T0"] | [] |
For a description of each column in the leaderboard table, see Leaderboard Table.
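A common use of the leaderboard is to pick the model with the best validation score. The sketch below assumes score_val follows the higher-is-better convention, which the negative values here suggest (they look like a negated loss); the source does not state this, so verify it for the metric in use. The `leaderboard` list holds a few columns copied from the two sample rows.

```python
# A few columns from the two sample leaderboard rows.
leaderboard = [
    {"model": "LightGBMLarge_BAG_L1", "score_val": -0.3866731304, "fit_time": 3.35344696},
    {"model": "WeightedEnsemble_L2", "score_val": -0.4978609895, "fit_time": 193.6493859},
]

# Assumption: a larger score_val means a better model.
best = max(leaderboard, key=lambda row: row["score_val"])
print(best["model"], best["score_val"])
```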
Feature Importance Table#
| index | importance | stddev | p_value | n | p99_high | p99_low |
|---|---|---|---|---|---|---|
| temperature | 1.092264446 | 0.01408478844 | 2.77E-05 | 3 | 1.172971846 | 1.011557046 |
| tocoupon_geq15min | 0.7782094564 | 0.01481422481 | 6.04E-05 | 3 | 0.8630966074 | 0.6933223055 |
| time_hour | 0.6932984171 | 0.01219396389 | 5.16E-05 | 3 | 0.7631711825 | 0.6234256518 |
| expiration | 0.6192626403 | 0.009638408032 | 4.04E-05 | 3 | 0.6744917871 | 0.5640334934 |
| destination | 0.597639609 | 0.01018219804 | 4.84E-05 | 3 | 0.655984733 | 0.5392944851 |
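The p99_high and p99_low values in the sample rows are consistent with a 99% confidence interval from Student's t distribution, importance ± t(0.995, n-1) × stddev / √n. This reading is inferred from the numbers above rather than stated in the source, so treat it as an assumption. The sketch below covers only n = 3, where the t quantile with 2 degrees of freedom has a closed form; the function names are illustrative.

```python
import math


def t_quantile_df2(p):
    """Quantile of Student's t distribution with 2 degrees of freedom (closed form)."""
    return (2 * p - 1) / math.sqrt(2 * p * (1 - p))


def p99_interval(importance, stddev, n):
    """Return (low, high) of a two-sided 99% t interval for the mean importance.

    Assumption: the interval is importance +/- t * stddev / sqrt(n).
    Only n = 3 is handled, because the closed-form quantile needs n - 1 = 2.
    """
    if n != 3:
        raise ValueError("this sketch only covers n = 3 (2 degrees of freedom)")
    margin = t_quantile_df2(0.995) * stddev / math.sqrt(n)
    return importance - margin, importance + margin


# Values from the temperature row of the sample table.
low, high = p99_interval(1.092264446, 0.01408478844, 3)
print(f"p99_low={low:.6f}, p99_high={high:.6f}")
```

With only n = 3 repetitions the t quantile is large (about 9.92), so the interval is wide relative to stddev; for other values of n a general t quantile function such as the one in SciPy would be needed.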