What to do when employees quit at the drop of a hat? I built an employee attrition prediction model in Python (Part 3)


Next, tune the hyperparameters with grid search.
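```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Search space: split strategy, impurity criterion and tree depth 1-19
parameters = {
    'splitter': ('best', 'random'),
    'criterion': ('gini', 'entropy'),
    'max_depth': [*range(1, 20)],
}
clf = DecisionTreeClassifier(random_state=25)
# Evaluate every parameter combination with 10-fold cross-validation
GS = GridSearchCV(clf, parameters, cv=10)
GS.fit(X_train, y_train)  # X_train / y_train were prepared in the earlier parts
print(GS.best_params_)
print(GS.best_score_)  # mean cross-validated accuracy of the best combination
```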
```
{'criterion': 'gini', 'max_depth': 15, 'splitter': 'best'}
0.9800813177648042
```
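Under the hood, `GridSearchCV` tries every combination in `parameters`, scores each one with k-fold cross-validation, and keeps the combination with the highest mean score. The sketch below spells that loop out explicitly. It assumes synthetic data from `make_classification` (the article's HR dataset is not reproduced here) and a shorter depth range to keep it fast; `manual_grid_search` is a hypothetical helper name, not part of scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the article's data (assumption): roughly 76% class 0, 24% class 1
X, y = make_classification(n_samples=300, n_features=8, weights=[0.76], random_state=0)

parameters = {
    'splitter': ('best', 'random'),
    'criterion': ('gini', 'entropy'),
    'max_depth': [*range(1, 6)],  # shorter range than the article, for speed
}

def manual_grid_search(X, y, grid, cv=10):
    """Hypothetical helper: exhaustive search scored by mean k-fold accuracy."""
    best_params, best_score = None, -1.0
    for params in ParameterGrid(grid):
        clf = DecisionTreeClassifier(random_state=25, **params)
        # For a classifier with integer cv, cross_val_score uses stratified folds
        score = cross_val_score(clf, X, y, cv=cv).mean()
        if score > best_score:  # strict '>' keeps the first of any tied candidates
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = manual_grid_search(X, y, parameters)
GS = GridSearchCV(DecisionTreeClassifier(random_state=25), parameters, cv=10).fit(X, y)
print(best_params, best_score)
print(GS.best_params_, GS.best_score_)
```

Both lines should report the same best score: with a classifier and an integer `cv`, `GridSearchCV` also uses stratified 10-fold splits and, by default, mean accuracy as the score.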
Re-evaluate the best model on the training and test sets:
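```python
from sklearn.metrics import classification_report

# Predict with the best estimator (GridSearchCV refits it on the full training set by default)
train_pred = GS.best_estimator_.predict(X_train)
test_pred = GS.best_estimator_.predict(X_test)
print('Training set:', classification_report(y_train, train_pred))
print('-' * 60)
print('Test set:', classification_report(y_test, test_pred))
```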
```
Training set:               precision    recall  f1-score   support

           0       1.00      1.00      1.00      9142
           1       1.00      0.99      0.99      2857

    accuracy                           1.00     11999
   macro avg       1.00      0.99      1.00     11999
weighted avg       1.00      1.00      1.00     11999

------------------------------------------------------------
Test set:               precision    recall  f1-score   support

           0       0.99      0.98      0.99      2286
           1       0.95      0.97      0.96       714

    accuracy                           0.98      3000
   macro avg       0.97      0.98      0.97      3000
weighted avg       0.98      0.98      0.98      3000
```
With the best parameters the model performs considerably better: the F1-score for class 1 is 0.99 on the training set and 0.96 on the test set.
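The F1-score in these reports is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick check, plugging in the rounded class-1 test-set values from the decision tree's report (P = 0.95, R = 0.97) reproduces the 0.96 shown. `f1_from_precision_recall` is a hypothetical helper; because scikit-learn computes F1 from unrounded counts, recomputing from rounded report values can occasionally differ in the last digit.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Class-1 test-set values from the tuned decision tree's report
f1 = f1_from_precision_recall(0.95, 0.97)
print(round(f1, 2))  # 0.96
```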
Random forest
Next, we build a model with random forest, an ensemble algorithm, and tune its parameters.
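```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# 1000 trees, out-of-bag scoring enabled, all CPU cores used
rf_model = RandomForestClassifier(n_estimators=1000, oob_score=True, n_jobs=-1, random_state=0)
# Tune only the tree depth; np.arange(3, 17, 1) yields depths 3 to 16
parameters = {'max_depth': np.arange(3, 17, 1)}
GS = GridSearchCV(rf_model, param_grid=parameters, cv=10)
GS.fit(X_train, y_train)
print(GS.best_params_)
print(GS.best_score_)
```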
```
{'max_depth': 16}
0.988582151793161
```
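The random-forest code enables `oob_score=True`, but the out-of-bag result is never read. Since `GridSearchCV` refits the best parameters on the full training set by default (`refit=True`), `GS.best_estimator_.oob_score_` should hold an out-of-bag accuracy estimate that requires no separate validation split. Note also that the selected `max_depth` of 16 is the largest value `np.arange(3, 17, 1)` produces, so extending the range upward may be worth trying. The sketch below reads the OOB estimate; it assumes synthetic `make_classification` data, a smaller forest (100 trees), a smaller grid and 5 folds, all chosen only to keep it fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the article's data (assumption)
X, y = make_classification(n_samples=400, n_features=8, weights=[0.76], random_state=0)

rf_model = RandomForestClassifier(n_estimators=100, oob_score=True, n_jobs=-1, random_state=0)
parameters = {'max_depth': np.arange(3, 9, 1)}  # depths 3 to 8, smaller than the article's grid
GS = GridSearchCV(rf_model, param_grid=parameters, cv=5)
GS.fit(X, y)  # refit=True (default) retrains the best model on all of X, y

best_rf = GS.best_estimator_
print('best max_depth:', GS.best_params_['max_depth'])
print('CV accuracy:', GS.best_score_)
# Accuracy on the samples each tree did not draw in its bootstrap sample
print('OOB accuracy:', best_rf.oob_score_)
```

If the OOB accuracy and the cross-validated accuracy are close, that is some reassurance that the CV estimate is not overly optimistic.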
```python
# Re-evaluate the tuned random forest on the training and test sets
train_pred = GS.best_estimator_.predict(X_train)
test_pred = GS.best_estimator_.predict(X_test)
print('Training set:', classification_report(y_train, train_pred))
print('-' * 60)
print('Test set:', classification_report(y_test, test_pred))
```
```
Training set:               precision    recall  f1-score   support

           0       1.00      1.00      1.00      9142
           1       1.00      0.99      0.99      2857

    accuracy                           1.00     11999
   macro avg       1.00      1.00      1.00     11999
weighted avg       1.00      1.00      1.00     11999

------------------------------------------------------------
Test set:               precision    recall  f1-score   support

           0       0.99      1.00      0.99      2286
           1       0.99      0.97      0.98       714

    accuracy                           0.99      3000
   macro avg       0.99      0.99      0.99      3000
weighted avg       0.99      0.99      0.99      3000
```
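After tuning, the random forest reaches a class-1 F1-score of 0.99 on the training set and 0.98 on the test set, slightly above the tuned decision tree's 0.96 on the test set.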
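Reading scores off printed reports gets tedious when comparing models. `classification_report(..., output_dict=True)` returns the same numbers as a nested dict, and `f1_score(..., pos_label=1)` returns the class-1 F1 directly. A minimal sketch, assuming synthetic `make_classification` data and a hypothetical 80/20 split, with the tuned depths fixed by hand instead of re-running the grid searches:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the article's data (assumption)
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.76], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    'decision tree': DecisionTreeClassifier(max_depth=15, random_state=25),
    'random forest': RandomForestClassifier(n_estimators=200, max_depth=16, random_state=0),
}
results = {}
for name, model in models.items():
    test_pred = model.fit(X_train, y_train).predict(X_test)
    report = classification_report(y_test, test_pred, output_dict=True)
    # With integer labels 0/1, the per-class keys are the strings '0' and '1'
    results[name] = report['1']['f1-score']
    print(f"{name}: class-1 test F1 = {results[name]:.3f} "
          f"(f1_score: {f1_score(y_test, test_pred, pos_label=1):.3f})")
```

Both numbers on each line come from the same precision and recall computation, so they should agree; the dict form is simply easier to tabulate or sort.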
Possible directions for further improving the model:
