partial_dependence

sklearn.inspection.partial_dependence(estimator, X, features, *, sample_weight=None, categorical_features=None, feature_names=None, response_method='auto', percentiles=(0.05, 0.95), grid_resolution=100, method='auto', kind='average')

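Partial dependence of features.

The partial dependence of a feature (or a set of features) corresponds to the average response of an estimator for each possible value of the feature.

Read more in the User Guide.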

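Warning

For GradientBoostingClassifier and GradientBoostingRegressor, the 'recursion' method (used by default) does not account for the init predictor of the boosting process. In practice, this produces the same values as 'brute' up to a constant offset in the target response, provided that init is a constant estimator (which is the default). However, if init is not a constant estimator, the partial dependence values are incorrect for 'recursion' because the offset will be sample-dependent. It is preferable to use the 'brute' method in that case. Note that this only applies to GradientBoostingClassifier and GradientBoostingRegressor, not to HistGradientBoostingClassifier and HistGradientBoostingRegressor.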

Parameters:

estimator : BaseEstimator

A fitted estimator object implementing predict, predict_proba, or decision_function. Multioutput-multiclass classifiers are not supported.

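X : {array-like, sparse matrix or dataframe} of shape (n_samples, n_features)

X is used to generate a grid of values for the target features (where the partial dependence will be evaluated), and also to generate values for the complement features when method is 'brute'.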

features : array-like of {int, str, bool} or int or str

The feature (e.g. [0]) or pair of interacting features (e.g. [(0, 1)]) for which the partial dependence should be computed.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights are used to calculate weighted means when averaging the model output. If None, then samples are equally weighted. If sample_weight is not None, then method will be set to 'brute'. Note that sample_weight is ignored for kind='individual'.

Added in version 1.3.

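categorical_features : array-like of shape (n_features,) or shape (n_categorical_features,), dtype={bool, int, str}, default=None

Indicates the categorical features.

None: no feature will be considered categorical;
boolean array-like: boolean mask of shape (n_features,) indicating which features are categorical. Thus, this array has the same length as X.shape[1];
integer or string array-like: integer indices or strings indicating categorical features.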

Added in version 1.2.

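feature_names : array-like of shape (n_features,), dtype=str, default=None

Name of each feature; feature_names[i] holds the name of the feature with index i. By default, the feature names correspond to the numerical indices for a NumPy array and to the column names for a pandas dataframe.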

Added in version 1.2.

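response_method : {'auto', 'predict_proba', 'decision_function'}, default='auto'

Specifies whether to use predict_proba or decision_function as the target response. For regressors this parameter is ignored and the response is always the output of predict. By default, predict_proba is tried first and we revert to decision_function if it doesn't exist. If method is 'recursion', the response is always the output of decision_function.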

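percentiles : tuple of float, default=(0.05, 0.95)

The lower and upper percentile used to create the extreme values for the grid. Must be in [0, 1].

grid_resolution : int, default=100

The number of equally spaced points on the grid, for each target feature.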

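method : {'auto', 'recursion', 'brute'}, default='auto'

The method used to calculate the averaged predictions:

'recursion' is only supported for some tree-based estimators (namely GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor, DecisionTreeRegressor, RandomForestRegressor) when kind='average'. This is more efficient in terms of speed. With this method, the target response of a classifier is always the decision function, not the predicted probabilities. Since the 'recursion' method implicitly computes the average of the Individual Conditional Expectation (ICE) by design, it is not compatible with ICE and thus kind must be 'average'.
'brute' is supported for any estimator, but is more computationally intensive.
'auto': 'recursion' is used for estimators that support it, and 'brute' is used otherwise. If sample_weight is not None, then 'brute' is used regardless of the estimator.

Please see this note for differences between the 'brute' and 'recursion' methods.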
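The restrictions on 'recursion' listed above can be probed with a small helper. In this sketch, `try_recursion` is a hypothetical function written for the example (it is not part of scikit-learn), and the data and models are synthetic. It requests `method='recursion'` and reports whether the call succeeds or raises a ValueError.

```python
import numpy as np
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 2))
y = X[:, 0] + X[:, 1] ** 2
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
linear = LinearRegression().fit(X, y)


def try_recursion(estimator, **kwargs):
    """Hypothetical helper: report whether method='recursion' is accepted."""
    try:
        partial_dependence(
            estimator, X, features=[0], method="recursion", **kwargs
        )
    except ValueError as exc:
        return f"ValueError: {exc}"
    return "ok"


print(try_recursion(tree))                     # supported tree-based estimator
print(try_recursion(tree, kind="individual"))  # ICE is not compatible
print(try_recursion(linear))                   # estimator without recursion support
```

Leaving `method='auto'` avoids these errors, since the function then picks 'brute' whenever 'recursion' does not apply.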

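kind : {'average', 'individual', 'both'}, default='average'

Whether to return the partial dependence averaged across all the samples in the dataset, one value per sample, or both. See Returns below.

Note that the fast method='recursion' option is only available for kind='average' and sample_weight=None. Computing individual dependencies and doing weighted averages requires using the slower method='brute'.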

Added in version 0.24.

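Returns:

predictions : Bunch

Dictionary-like object, with the following attributes.

individual : ndarray of shape (n_outputs, n_instances, len(grid_values[0]), len(grid_values[1]), ...): The predictions for all the points in the grid for all samples in X. This is also known as Individual Conditional Expectation (ICE). Only available when kind='individual' or kind='both'.
average : ndarray of shape (n_outputs, len(grid_values[0]), len(grid_values[1]), ...): The predictions for all the points in the grid, averaged over all samples in X (or over the training data if method is 'recursion'). Only available when kind='average' or kind='both'.
grid_values : sequence of 1D ndarrays: The values with which the grid has been created. The generated grid is a Cartesian product of the arrays in grid_values, where len(grid_values) == len(features). The size of each array grid_values[j] is either grid_resolution or the number of unique values in X[:, j], whichever is smaller.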

Added in version 1.3.

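n_outputs corresponds to the number of classes in a multi-class setting, or to the number of tasks for multi-output regression. For classical regression and binary classification, n_outputs==1. n_values_feature_j corresponds to the size of grid_values[j].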

See also

PartialDependenceDisplay.from_estimator: Plot partial dependence.
PartialDependenceDisplay: Partial dependence visualization.

Examples

>>> X = [[0, 0, 2], [1, 0, 0]]
>>> y = [0, 1]
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> from sklearn.inspection import partial_dependence
>>> gb = GradientBoostingClassifier(random_state=0).fit(X, y)
>>> partial_dependence(gb, features=[0], X=X, percentiles=(0, 1),
...                    grid_resolution=2)
(array([[-4.52...,  4.52...]]), [array([ 0.,  1.])])