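Version 0.22

For a short description of the main highlights of the release, please refer to sphx_glr_auto_examples_release_highlights_plot_release_highlights_0_22_0.py.

Legend for changelogs

Major Feature: something big that you couldn't do before.
Feature: something that you couldn't do before.
Efficiency: an existing feature now may not require as much computation or memory.
Enhancement: a miscellaneous minor improvement.
Fix: something that previously didn't work as documented, or according to reasonable expectations, should now work.
API Change: you will need to change your code to have the same effect in the future; or a feature will be removed in the future.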

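Version 0.22.2.post1

March 3 2020

The 0.22.2.post1 release includes a packaging fix for the source distribution, but the content of the packages is otherwise identical to the content of the wheels with the 0.22.2 version (without the .post1 suffix). Both contain the following changes.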

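Changelog

`sklearn.impute`

Efficiency Reduce impute.KNNImputer asymptotic memory usage by chunking pairwise distance computation. #16397 by Joel Nothman.

`sklearn.metrics`

Fix Fixed a bug in metrics.plot_roc_curve where the name of the estimator was passed to metrics.RocCurveDisplay instead of the parameter name. It resulted in a different plot when calling metrics.RocCurveDisplay.plot on subsequent occasions. #16500 by Guillaume Lemaitre.
Fix Fixed a bug in metrics.plot_precision_recall_curve where the name of the estimator was passed to metrics.PrecisionRecallDisplay instead of the parameter name. It resulted in a different plot when calling metrics.PrecisionRecallDisplay.plot on subsequent occasions. #16505 by Guillaume Lemaitre.

`sklearn.neighbors`

Fix Fixed a bug which converted a list of arrays into a 2-D object array instead of a 1-D array containing NumPy arrays. This bug was affecting neighbors.NearestNeighbors.radius_neighbors. #16076 by Guillaume Lemaitre and Alex Shacked.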

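Version 0.22.1

January 2 2020

This is a bug-fix release to primarily resolve some packaging issues in version 0.22.0. It also includes minor documentation improvements and some bug fixes.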

Changelog

`sklearn.cluster`

Fix cluster.KMeans with algorithm="elkan" now uses the same stopping criterion as with the default algorithm="full". #15930 by @inder128.

`sklearn.inspection`

Fix inspection.permutation_importance will return the same importances when a random_state is given, for both n_jobs=1 and n_jobs>1, both with shared-memory backends (thread-safety) and with isolated-memory, process-based backends. It also avoids casting the data as object dtype and avoids a read-only error on large dataframes with n_jobs>1, as reported in #15810. Follow-up of #15898 by Shivam Gargsya. #15933 by Guillaume Lemaitre and Olivier Grisel.
Fix inspection.plot_partial_dependence and inspection.PartialDependenceDisplay.plot now consistently check the number of axes passed in. #15760 by Thomas Fan.

`sklearn.metrics`

Fix metrics.plot_confusion_matrix now raises an error when normalize is invalid. Previously, it ran fine with no normalization. #15888 by Hanmin Qin.
Fix metrics.plot_confusion_matrix now colors the label text correctly to maximize contrast with its background. #15936 by Thomas Fan and @DizietAsahi.
Fix metrics.classification_report no longer ignores the value of the zero_division keyword argument. #15879 by Bibhash Chandra Mitra.
Fix Fixed a bug in metrics.plot_confusion_matrix so that it correctly passes the values_format parameter to the metrics.ConfusionMatrixDisplay plot() call. #15937 by Stephen Blystone.

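`sklearn.model_selection`

Fix model_selection.GridSearchCV and model_selection.RandomizedSearchCV accept scalar values provided in fit_params. A change in 0.22 had broken backward compatibility. #15863 by Adrin Jalali and Guillaume Lemaitre.

`sklearn.naive_bayes`

Fix Removed the abstractmethod decorator from the method _check_X in naive_bayes.BaseNB, which could break downstream projects inheriting from this deprecated public base class. #15996 by Brigitta Sipőcz.

`sklearn.preprocessing`

Fix preprocessing.QuantileTransformer now guarantees that the quantiles_ attribute is completely sorted in non-decreasing order. #15751 by Tirth Patel.

`sklearn.semi_supervised`

Fix semi_supervised.LabelPropagation and semi_supervised.LabelSpreading now allow callable kernel functions to return a sparse weight matrix. #15868 by Niklas Smedemark-Margulies.

`sklearn.utils`

Fix utils.check_array now correctly converts pandas DataFrames with boolean columns to floats. #15797 by Thomas Fan.
Fix utils.validation.check_is_fitted accepts an explicit attributes argument again, to check for specific attributes as explicit markers of a fitted estimator. When no explicit attributes are provided, only the attributes that end with an underscore and do not start with a double underscore are used as "fitted" markers. The all_or_any argument is also no longer deprecated. This change restores some backward compatibility with the behavior of this utility in version 0.21. #15947 by Thomas Fan.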

Version 0.22.0

December 3 2019

Website update

Our website was revamped and given a fresh new look. #14849 by Thomas Fan.

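Clear definition of the public API

Scikit-learn has a public API and a private API.

We do our best not to break the public API, and to only introduce backward-compatible changes that do not require any user action. However, in cases where that's not possible, any change to the public API is subject to a deprecation cycle of two minor versions. The private API isn't publicly documented and isn't subject to any deprecation cycle, so users should not rely on its stability.

A function or object is public if it is documented in the API Reference and if it can be imported with an import path without leading underscores. For example, sklearn.pipeline.make_pipeline is public, while sklearn.pipeline._name_estimators is private. sklearn.ensemble._gb.BaseEnsemble is private too, because the whole _gb module is private.

Up to 0.22, some tools were de facto public (no leading underscore), while they should have been private in the first place. In version 0.22, these tools have been made properly private, and the public API space has been cleaned. In addition, importing from most sub-modules is now deprecated: you should for example use from sklearn.cluster import Birch instead of from sklearn.cluster.birch import Birch (in practice, birch.py has been moved to _birch.py).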
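The naming half of this rule can be expressed as a check on the dotted import path. The helper below, `is_public_path`, is hypothetical and only inspects the path string; it does not consult the API Reference, which is the other half of the rule.

```python
def is_public_path(dotted_path: str) -> bool:
    """Return True if no component of a dotted import path starts with '_'.

    Hypothetical helper: it checks naming only. An object must also be
    documented in the API Reference to count as public.
    """
    return not any(part.startswith("_") for part in dotted_path.split("."))


for path in [
    "sklearn.pipeline.make_pipeline",     # public
    "sklearn.pipeline._name_estimators",  # private function
    "sklearn.ensemble._gb.BaseEnsemble",  # private because the module is private
    "sklearn.cluster.Birch",              # the recommended import path
]:
    print(path, is_public_path(path))
```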

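Note

All the tools in the public API should be documented in the API Reference. If you find a public tool (without leading underscore) that isn't in the API Reference, that means it should either be private or documented. Please let us know by opening an issue!

This work was tracked in issue 9250 and issue 12927.

Deprecations: using `FutureWarning` from now on

When deprecating a feature, previous versions of scikit-learn used to raise a DeprecationWarning. Since DeprecationWarnings aren't shown by default by Python, scikit-learn needed to resort to a custom warning filter to always show the warnings. That filter would sometimes interfere with users' custom warning filters.

Starting from version 0.22, scikit-learn will show FutureWarnings for deprecations, as recommended by the Python documentation. FutureWarnings are always shown by default by Python, so the custom filter has been removed and scikit-learn no longer interferes with user filters. #15080 by Nicolas Hug.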
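The practical effect is that ordinary `warnings` filters now control deprecation messages. A minimal sketch using only the standard library; `deprecated_helper` is a hypothetical stand-in for a deprecated scikit-learn function, not a real API.

```python
import warnings


def deprecated_helper():
    # Hypothetical deprecated function: since 0.22, scikit-learn emits
    # FutureWarning (not DeprecationWarning) for deprecations.
    warnings.warn(
        "deprecated_helper is deprecated and will be removed in a future version",
        FutureWarning,
    )
    return 1


def count_future_warnings(silence: bool) -> int:
    """Call the helper and count the FutureWarnings that reach the user."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from a known filter state
        if silence:
            # A user-defined filter, which is no longer overridden by the library.
            warnings.simplefilter("ignore", category=FutureWarning)
        deprecated_helper()
    return sum(issubclass(w.category, FutureWarning) for w in caught)


print(count_future_warnings(silence=False))  # 1
print(count_future_warnings(silence=True))   # 0
```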

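Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

cluster.KMeans when n_jobs=1. Fix
decomposition.SparseCoder, decomposition.DictionaryLearning, and decomposition.MiniBatchDictionaryLearning Fix
decomposition.SparseCoder with algorithm='lasso_lars' Fix
decomposition.SparsePCA where normalize_components has no effect due to deprecation.
ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor Fix, Feature, Enhancement.
impute.IterativeImputer when X has features with no missing values. Feature
linear_model.Ridge when X is sparse. Fix
model_selection.StratifiedKFold and any use of cv=int with a classifier. Fix
cross_decomposition.CCA when using scipy >= 1.3 Fix

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)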

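Changelog

`sklearn.base`

API Change From version 0.24, base.BaseEstimator.get_params will raise an AttributeError rather than return None for parameters that are in the estimator's constructor but not stored as attributes on the instance. #14464 by Joel Nothman.

`sklearn.calibration`

Fix Fixed a bug that made calibration.CalibratedClassifierCV fail when given a sample_weight parameter of type list (in the case where sample_weights are not supported by the wrapped estimator). #13575 by William de Vazelhes.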

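`sklearn.cluster`

Feature cluster.SpectralClustering now accepts a precomputed sparse neighbors graph as input. #10482 by Tom Dupre la Tour and Kumar Ashutosh.
Enhancement cluster.SpectralClustering now accepts an n_components parameter. This parameter extends the SpectralClustering class functionality to match cluster.spectral_clustering. #13726 by Shuzhe Xiao.
Fix Fixed a bug where cluster.KMeans produced inconsistent results between n_jobs=1 and n_jobs>1 due to the handling of the random state. #9288 by Bryan Yang.
Fix Fixed a bug where the elkan algorithm in cluster.KMeans was producing a segmentation fault on large arrays due to integer index overflow. #15057 by Vladimir Korolev.
Fix MeanShift now accepts a max_iter parameter with a default value of 300, instead of always using the default 300. It also now exposes an n_iter_ attribute indicating the maximum number of iterations performed on each seed. #15120 by Adrin Jalali.
Fix cluster.AgglomerativeClustering and cluster.FeatureAgglomeration now raise an error if affinity='cosine' and X has samples that are all zeros. #7943 by @mthorrell.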

`sklearn.compose`

Feature Adds compose.make_column_selector, which is used with compose.ColumnTransformer to select DataFrame columns on the basis of name and dtype. #12303 by Thomas Fan.
Fix Fixed a bug in compose.ColumnTransformer which failed to select the proper columns when using a boolean list, with NumPy older than 1.12. #14510 by Guillaume Lemaitre.
Fix Fixed a bug in compose.TransformedTargetRegressor which did not pass **fit_params to the underlying regressor. #14890 by Miguel Cabrera.
Fix compose.ColumnTransformer now requires the number of features to be consistent between fit and transform. A FutureWarning is raised now, and this will raise an error in 0.24. If the number of features isn't consistent and negative indexing is used, an error is raised. #14544 by Adrin Jalali.

`sklearn.cross_decomposition`

Feature cross_decomposition.PLSCanonical and cross_decomposition.PLSRegression have a new function inverse_transform to transform data to the original space. #15304 by Jaime Ferrando Huertas.
Enhancement decomposition.KernelPCA now properly checks the eigenvalues found by the solver for numerical or conditioning issues. This ensures consistency of results across solvers (different choices for eigen_solver), including approximate solvers such as 'randomized' and 'lobpcg' (see #12068). #12145 by Sylvain Marié.
Fix Fixed a bug where cross_decomposition.PLSCanonical and cross_decomposition.PLSRegression were raising an error when fitted with a target matrix Y in which the first column was constant. #13609 by Camila Williamson.
Fix cross_decomposition.CCA now produces the same results with scipy 1.3 and previous scipy versions. #15661 by Thomas Fan.

`sklearn.datasets`

Feature datasets.fetch_openml now supports heterogeneous data using pandas by setting as_frame=True. #13902 by Thomas Fan.
Feature datasets.fetch_openml now includes the target_names in the returned Bunch. #15160 by Thomas Fan.
Enhancement The parameter return_X_y was added to datasets.fetch_20newsgroups and datasets.fetch_olivetti_faces. #14259 by Sourav Singh.
Enhancement datasets.make_classification now accepts an array-like weights parameter, i.e. a list or numpy.array, instead of a list only. #14764 by Cat Chenal.
Enhancement The parameter normalize was added to datasets.fetch_20newsgroups_vectorized. #14740 by Stéphan Tulkens.
Fix Fixed a bug in datasets.fetch_openml, which failed to load an OpenML dataset that contains an ignored feature. #14623 by Sarra Habchi.

`sklearn.decomposition`

Efficiency decomposition.NMF with solver="mu" fitted on sparse input matrices now uses batching to avoid briefly allocating an array with size (#non-zero elements, n_components). #15257 by Mart Willocx.
Enhancement decomposition.dict_learning and decomposition.dict_learning_online now accept method_max_iter and pass it to decomposition.sparse_encode. #12650 by Adrin Jalali.
Enhancement decomposition.SparseCoder, decomposition.DictionaryLearning, and decomposition.MiniBatchDictionaryLearning now take a transform_max_iter parameter and pass it to either decomposition.dict_learning or decomposition.sparse_encode. #12650 by Adrin Jalali.
Enhancement decomposition.IncrementalPCA now accepts sparse matrices as input, converting them to dense in batches, thereby avoiding the need to store the entire dense matrix at once. #13960 by Scott Gigante.
Fix decomposition.sparse_encode now passes max_iter to the underlying linear_model.LassoLars when algorithm='lasso_lars'. #12650 by Adrin Jalali.

`sklearn.dummy`

Fix dummy.DummyClassifier now handles checking the existence of the provided constant in multioutput cases. #14908 by Martina G. Vilas.
API Change The default value of the strategy parameter in dummy.DummyClassifier will change from 'stratified' in version 0.22 to 'prior' in 0.24. A FutureWarning is raised when the default value is used. #15382 by Thomas Fan.
API Change The outputs_2d_ attribute is deprecated in dummy.DummyClassifier and dummy.DummyRegressor. It is equivalent to n_outputs > 1. #14933 by Nicolas Hug.

`sklearn.ensemble`

Major Feature Added ensemble.StackingClassifier and ensemble.StackingRegressor to stack predictors using a final classifier or regressor. #11047 by Guillaume Lemaitre and Caio Oliveira and #15138 by Jon Cusick.
Major Feature Many improvements were made to ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor:
- Feature Estimators now natively support dense data with missing values both for training and predicting. They also support infinite values. #13911 and #14406 by Nicolas Hug, Adrin Jalali and Olivier Grisel.
- Feature Estimators now have an additional warm_start parameter that enables warm starting. #14012 by Johann Faouzi.
- Feature inspection.partial_dependence and inspection.plot_partial_dependence now support the fast 'recursion' method for both estimators. #13769 by Nicolas Hug.
- Enhancement For ensemble.HistGradientBoostingClassifier, the training loss or score is now monitored on a class-wise stratified subsample to preserve the class balance of the original training set. #14194 by Johann Faouzi.
- Enhancement ensemble.HistGradientBoostingRegressor now supports the 'least_absolute_deviation' loss. #13896 by Nicolas Hug.
- Fix Estimators now bin the training and validation data separately to avoid any data leak. #13933 by Nicolas Hug.
- Fix Fixed a bug where early stopping would break with string targets. #14710 by Guillaume Lemaitre.
- Fix ensemble.HistGradientBoostingClassifier now raises an error if the categorical_crossentropy loss is given for a binary classification problem. #14869 by Adrin Jalali.
Note that pickles from 0.21 will not work in 0.22.
Enhancement Added a max_samples parameter that allows limiting the size of the bootstrap samples to be less than the size of the dataset. Added to ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor. #14682 by Matt Hancock and #5963 by Pablo Duboue.
Fix ensemble.VotingClassifier.predict_proba will no longer be present when voting='hard'. #14287 by Thomas Fan.
Fix The named_estimators_ attribute in ensemble.VotingClassifier and ensemble.VotingRegressor now correctly maps to dropped estimators. Previously, the named_estimators_ mapping was incorrect whenever one of the estimators was dropped. #15375 by Thomas Fan.
Fix utils.estimator_checks.check_estimator is now run by default on both ensemble.VotingClassifier and ensemble.VotingRegressor. This solves issues regarding shape consistency in predict, which was failing when the underlying estimators were not outputting consistent array dimensions. Note that it should be replaced by refactoring the common tests in the future. #14305 by Guillaume Lemaitre.
Fix ensemble.AdaBoostClassifier computes probabilities based on the decision function, as in the literature. Thus, predict and predict_proba give consistent results. #14114 by Guillaume Lemaitre.
Fix Stacking and Voting estimators now ensure that their underlying estimators are either all classifiers or all regressors. ensemble.StackingClassifier, ensemble.StackingRegressor, ensemble.VotingClassifier and ensemble.VotingRegressor now raise consistent error messages. #15084 by Guillaume Lemaitre.
Fix In ensemble.AdaBoostRegressor, the loss is now normalized by the maximum over the samples with non-zero weights only. #14294 by Guillaume Lemaitre.
API Change presort is now deprecated in ensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor, and the parameter has no effect. Users are recommended to use ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor instead. #14907 by Adrin Jalali.
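A minimal sketch of the new stacking API on the bundled iris dataset, assuming scikit-learn 0.22 or later. The choice of base estimators, `cv=5` and the train/test split are illustrative assumptions, not recommendations from this changelog.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Base estimators: their cross-validated predictions become the input
# features of the final estimator.
estimators = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
]
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

ensemble.StackingRegressor follows the same pattern with regressors as base and final estimators.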

`sklearn.feature_extraction`

Enhancement A warning will now be raised if a parameter choice means that another parameter will be unused on calling the fit() method of feature_extraction.text.HashingVectorizer, feature_extraction.text.CountVectorizer and feature_extraction.text.TfidfVectorizer. #14602 by Gaurav Chawla.
Fix Functions created by build_preprocessor and build_analyzer of feature_extraction.text.VectorizerMixin can now be pickled. #14430 by Dillon Niederhut.
Fix feature_extraction.text.strip_accents_unicode now correctly removes accents from strings that are in NFKD normalized form. #15100 by Daniel Grady.
Fix Fixed a bug that caused feature_extraction.DictVectorizer to raise an OverflowError during the transform operation when producing a scipy.sparse matrix on large input data. #15463 by Norvan Sahiner.
API Change Deprecated the unused copy parameter of feature_extraction.text.TfidfVectorizer.transform; it will be removed in v0.24. #14520 by Guillem G. Subies.

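`sklearn.feature_selection`

Enhancement Updated the following sklearn.feature_selection estimators to allow NaN/Inf values in transform and fit: feature_selection.RFE, feature_selection.RFECV, feature_selection.SelectFromModel, and feature_selection.VarianceThreshold. Note that if the underlying estimator of the feature selector does not allow NaN/Inf, it will still error, but the feature selectors themselves no longer enforce this restriction unnecessarily. #11635 by Alec Peters.
Fix Fixed a bug where feature_selection.VarianceThreshold with threshold=0 did not remove constant features due to numerical instability; it now uses the range rather than the variance in this case. #13704 by Roddy MacSween.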

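`sklearn.gaussian_process`

Feature Gaussian process models on structured data: gaussian_process.GaussianProcessRegressor and gaussian_process.GaussianProcessClassifier can now accept a list of generic objects (e.g. strings, trees, graphs, etc.) as the X argument to their training/prediction methods. A user-defined kernel should be provided for computing the kernel matrix among the generic objects, and should inherit from gaussian_process.kernels.GenericKernelMixin to notify the GPR/GPC model that it handles non-vectorial samples. #15557 by Yu-Hang Tang.
Efficiency gaussian_process.GaussianProcessClassifier.log_marginal_likelihood and gaussian_process.GaussianProcessRegressor.log_marginal_likelihood now accept a clone_kernel=True keyword argument. When set to False, the kernel attribute is modified, but this may result in a performance improvement. #14378 by Masashi Shibata.
API Change From version 0.24, gaussian_process.kernels.Kernel.get_params will raise an AttributeError rather than return None for parameters that are in the estimator's constructor but not stored as attributes on the instance. #14464 by Joel Nothman.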

`sklearn.impute`

Major Feature Added impute.KNNImputer, to impute missing values using k-Nearest Neighbors. #12852 by Ashim Bhattarai and Thomas Fan and #15010 by Guillaume Lemaitre.
Feature impute.IterativeImputer has a new skip_complete flag that is False by default, which, when True, will skip computation on features that have no missing values during the fit phase. #13773 by Sergey Feldman.
Efficiency impute.MissingIndicator.fit_transform avoids repeated computation of the masked matrix. #14356 by Harsh Soni.
Fix impute.IterativeImputer now works when there is only one feature. By Sergey Feldman.
Fix Fixed a bug in impute.IterativeImputer where features were imputed in the reverse of the desired order with imputation_order either "ascending" or "descending". #15393 by Venkatachalam N.
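A small sketch of impute.KNNImputer, assuming scikit-learn 0.22 or later and the default `weights="uniform"`. Distances between rows are nan-aware Euclidean distances, so a row with a missing entry can still be compared with the others on the coordinates both rows have.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1, 2, np.nan],
              [3, 4, 3],
              [np.nan, 6, 5],
              [8, 8, 7]])

# Each missing entry is replaced by the mean of that feature over the
# 2 nearest rows in which the feature is observed.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
# [[1.  2.  4. ]
#  [3.  4.  3. ]
#  [5.5 6.  5. ]
#  [8.  8.  7. ]]
```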

`sklearn.inspection`

Major Feature inspection.permutation_importance has been added to measure the importance of each feature in an arbitrary trained model with respect to a given scoring function. #13146 by Thomas Fan.
Feature inspection.partial_dependence 和 inspection.plot_partial_dependence now support the fast 'recursion' method for ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. #13769 by Nicolas Hug .
Enhancement inspection.plot_partial_dependence has been extended to now support the new visualization API described in the User Guide. #14646 by Thomas Fan .
Enhancement inspection.partial_dependence accepts pandas DataFrames and pipeline.Pipeline objects containing compose.ColumnTransformer. In addition, inspection.plot_partial_dependence will use the column names by default when a dataframe is passed. #14028 and #15429 by Guillaume Lemaitre.
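A sketch of inspection.permutation_importance on synthetic data where, by construction, only the first feature carries signal. The data and model are illustrative assumptions; in practice the importances are usually computed on held-out data rather than on the training set used here for brevity.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 3))
# Only feature 0 determines the label; features 1 and 2 are pure noise.
y = (X[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Shuffle each column n_repeats times and record the drop in the score
# (accuracy by default for a classifier).
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {i}: {mean:.3f} +/- {std:.3f}")
```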

`sklearn.kernel_approximation`

Fix Fixed a bug where kernel_approximation.Nystroem raised a KeyError when using kernel="precomputed". #14706 by Venkatachalam N.

`sklearn.linear_model`

Efficiency The 'liblinear' logistic regression solver is now faster and requires less memory. #14108, #14170, #14296 by Alex Henrie.
Enhancement linear_model.BayesianRidge now accepts the hyperparameters alpha_init and lambda_init, which can be used to set the initial values of the maximization procedure in fit. #13618 by Yoshihiro Uchida.
Fix linear_model.Ridge now correctly fits an intercept when X is sparse, solver="auto" and fit_intercept=True, because the default solver in this configuration has changed to sparse_cg, which can fit an intercept with sparse data. #13995 by Jérôme Dockès.
Fix linear_model.Ridge with solver='sag' now accepts F-ordered and non-contiguous arrays and makes a conversion instead of failing. #14458 by Guillaume Lemaitre.
Fix linear_model.LassoCV no longer forces precompute=False when fitting the final model. #14591 by Andreas Müller.
Fix linear_model.RidgeCV and linear_model.RidgeClassifierCV now correctly score when cv=None. #14864 by Venkatachalam N.
Fix Fixed a bug in linear_model.LogisticRegressionCV where the scores_, n_iter_ and coefs_paths_ attributes would have a wrong ordering with penalty='elasticnet'. #15044 by Nicolas Hug.
Fix linear_model.MultiTaskLassoCV and linear_model.MultiTaskElasticNetCV with X of dtype int and fit_intercept=True. #15086 by Alex Gramfort.
Fix The liblinear solver now supports sample_weight. #15038 by Guillaume Lemaitre.

`sklearn.manifold`

Feature manifold.Isomap, manifold.TSNE, and manifold.SpectralEmbedding now accept a precomputed sparse neighbors graph as input. #10482 by Tom Dupre la Tour and Kumar Ashutosh.
Feature Exposed the n_jobs parameter in manifold.TSNE for multi-core calculation of the neighbors graph. This parameter has no impact when metric="precomputed" or (metric="euclidean" and method="exact"). #15082 by Roman Yurchak.
Efficiency Improved efficiency of manifold.TSNE when method="barnes-hut" by computing the gradient in parallel. #13213 by Thomas Moreau.
Fix Fixed a bug where manifold.spectral_embedding (and therefore manifold.SpectralEmbedding and cluster.SpectralClustering) computed wrong eigenvalues with eigen_solver='amg' when n_samples < 5 * n_components. #14647 by Andreas Müller.
Fix Fixed a bug in manifold.spectral_embedding, used in manifold.SpectralEmbedding and cluster.SpectralClustering, where eigen_solver="amg" would sometimes result in a LinAlgError. #13393 by Andrew Knyazev and #13707 by Scott White.
API Change Deprecated the unused training_data_ attribute in manifold.Isomap. #10482 by Tom Dupre la Tour.

`sklearn.metrics`

Major Feature metrics.plot_roc_curve has been added to plot ROC curves. This function introduces the visualization API described in the User Guide. #14357 by Thomas Fan.
Feature Added a new parameter zero_division to multiple classification metrics: metrics.precision_score, metrics.recall_score, metrics.f1_score, metrics.fbeta_score, metrics.precision_recall_fscore_support, metrics.classification_report. This allows setting the returned value for ill-defined metrics. #14900 by Marc Torrellas Socastro.
Feature Added the metrics.pairwise.nan_euclidean_distances metric, which calculates Euclidean distances in the presence of missing values. #12852 by Ashim Bhattarai and Thomas Fan.
Feature New ranking metrics metrics.ndcg_score and metrics.dcg_score have been added to compute Discounted Cumulative Gain and Normalized Discounted Cumulative Gain. #9951 by Jérôme Dockès.
Feature metrics.plot_precision_recall_curve has been added to plot precision recall curves. #14936 by Thomas Fan .
Feature metrics.plot_confusion_matrix has been added to plot confusion matrices. #15083 by Thomas Fan .
Feature Added multiclass support to metrics.roc_auc_score with corresponding scorers 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_ovr_weighted', and 'roc_auc_ovo_weighted'. #12789 and #15274 by Kathy Chen, Mohamed Maskani, and Thomas Fan.
Feature Added metrics.mean_tweedie_deviance, measuring the Tweedie deviance for a given power parameter. Also added the mean Poisson deviance metrics.mean_poisson_deviance and the mean Gamma deviance metrics.mean_gamma_deviance, which are special cases of the Tweedie deviance for power=1 and power=2 respectively. #13938 by Christian Lorentzen and Roman Yurchak.
Efficiency Improved performance of metrics.pairwise.manhattan_distances in the case of sparse matrices. #15049 by Paolo Toccaceli.
Enhancement The parameter beta in metrics.fbeta_score is updated to accept the values zero and float('+inf'). #13231 by Dong-hee Na.
Enhancement Added the parameter squared in metrics.mean_squared_error to return the root mean squared error. #13467 by Urvang Patel.
Enhancement Allow computing averaged metrics in the case of no true positives. #14595 by Andreas Müller.
Enhancement Multilabel metrics now support a list of lists as input. #14865 by Srivatsan Ramesh, Herilalaina Rakotoarison, Léonard Binet.
Enhancement metrics.median_absolute_error now supports the multioutput parameter. #14732 by Agamemnon Krasoulis.
Enhancement 'roc_auc_ovr_weighted' and 'roc_auc_ovo_weighted' can now be used as the scoring parameter of model-selection tools. #14417 by Thomas Fan.
Enhancement metrics.confusion_matrix accepts a parameter normalize, allowing the confusion matrix to be normalized by column, by row, or overall. #15625 by Guillaume Lemaitre.
Fix Raise a ValueError in metrics.silhouette_score when a precomputed distance matrix contains non-zero diagonal entries. #12258 by Stephen Tierney.
API Change scoring="neg_brier_score" should be used instead of scoring="brier_score_loss", which is now deprecated. #14898 by Stefan Matcovici.
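Several of the metrics entries above can be exercised in one short script, assuming scikit-learn 0.22 or later. `manual_dcg` is a hypothetical helper that re-derives DCG from its definition, DCG = sum over ranks i of rel_i / log2(i + 1), with items ranked by score (the linear-gain form that `dcg_score` uses by default), so its result can be compared with `dcg_score` and `ndcg_score`. All input values are arbitrary.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    dcg_score,
    mean_gamma_deviance,
    mean_poisson_deviance,
    mean_tweedie_deviance,
    ndcg_score,
    precision_score,
)
from sklearn.metrics.pairwise import nan_euclidean_distances

# zero_division: precision is ill-defined when no sample is predicted positive;
# the returned value is chosen explicitly instead of warning and returning 0.
prec = precision_score([0, 1, 1], [0, 0, 0], zero_division=1)
print(prec)  # 1.0

# normalize="true": every row (one true class) of the matrix sums to 1.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], normalize="true")
print(cm)  # rows [0.5, 0.5] and [0.0, 1.0]


def manual_dcg(relevance, scores):
    """Hypothetical helper: linear-gain DCG with log2(rank + 1) discounts."""
    order = np.argsort(scores)[::-1]  # rank items by decreasing score
    gains = np.asarray(relevance, dtype=float)[order]
    discounts = np.log2(np.arange(2, gains.size + 2))
    return float(np.sum(gains / discounts))


relevance = np.asarray([[10, 0, 0, 1, 5]])
scores = np.asarray([[0.1, 0.2, 0.3, 4, 70]])
dcg = manual_dcg(relevance[0], scores[0])
ndcg = dcg / manual_dcg(relevance[0], relevance[0])  # divide by the ideal DCG
print(dcg, dcg_score(relevance, scores))    # both about 9.50
print(ndcg, ndcg_score(relevance, scores))  # both about 0.696

# Poisson and Gamma deviances are Tweedie deviances with power=1 and power=2.
y_obs, y_hat = [2.0, 0.5, 1.0, 4.0], [0.5, 0.5, 2.0, 2.0]
print(mean_tweedie_deviance(y_obs, y_hat, power=1), mean_poisson_deviance(y_obs, y_hat))
print(mean_tweedie_deviance(y_obs, y_hat, power=2), mean_gamma_deviance(y_obs, y_hat))

# nan_euclidean_distances skips missing coordinates and rescales the squared
# distance by n_features / n_present: sqrt(2 / 1 * (0 - 1) ** 2) = sqrt(2).
D = nan_euclidean_distances([[0, 1], [1, np.nan]])
print(D[0, 1])  # about 1.414
```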

`sklearn.model_selection`

Efficiency Improved performance of multimetric scoring in model_selection.cross_validate, model_selection.GridSearchCV, and model_selection.RandomizedSearchCV. #14593 by Thomas Fan.
Enhancement model_selection.learning_curve now accepts the parameter return_times, which can be used to retrieve computation times in order to plot model scalability (see the learning_curve example). #13938 by Hadrien Reboul.
Enhancement model_selection.RandomizedSearchCV now accepts lists of parameter distributions. #14549 by Andreas Müller.
Fix Reimplemented model_selection.StratifiedKFold to fix an issue where one test set could be n_classes larger than another. Test sets should now be near-equally sized. #14704 by Joel Nothman.
Fix The cv_results_ attribute of model_selection.GridSearchCV and model_selection.RandomizedSearchCV now only contains unfitted estimators. This potentially saves a lot of memory since the state of the estimators isn't stored. #15096 by Andreas Müller.
API Change model_selection.KFold and model_selection.StratifiedKFold now raise a warning if random_state is set but shuffle is False. This will raise an error in 0.24.

`sklearn.multioutput`

Fix multioutput.MultiOutputClassifier now has the attribute classes_. #14629 by Agamemnon Krasoulis.
Fix multioutput.MultiOutputClassifier now has predict_proba as a property, so its availability can be checked with hasattr. #15488 #15490 by Rebekah Kim.

`sklearn.naive_bayes`

Major Feature Added naive_bayes.CategoricalNB, which implements the Categorical Naive Bayes classifier. #12569 by Tim Bicker and Florian Wilhelm.
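A minimal sketch of naive_bayes.CategoricalNB, assuming scikit-learn 0.22 or later. The features are integer-encoded categories (0 to 4); the random data is only for illustration. Because each training sample is unique across 100 features, the smoothed per-class category frequencies make a training sample most likely under its own class.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.RandomState(1)
# 6 samples, 100 categorical features, each encoded as an integer in 0..4.
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

clf = CategoricalNB()  # Laplace smoothing with alpha=1.0 by default
clf.fit(X, y)
print(clf.predict(X[2:3]))  # [3]
```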

`sklearn.neighbors`

Major Feature Added neighbors.KNeighborsTransformer and neighbors.RadiusNeighborsTransformer, which transform an input dataset into a sparse neighbors graph. They give finer control over nearest neighbors computations and enable easy pipeline caching for multiple uses. #10482 by Tom Dupre la Tour.
Feature neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor, and neighbors.LocalOutlierFactor now accept a precomputed sparse neighbors graph as input. #10482 by Tom Dupre la Tour and Kumar Ashutosh.
Feature neighbors.RadiusNeighborsClassifier now supports predicting probabilities by using predict_proba, and supports more outlier_label options: 'most_frequent', or different outlier_labels for multi-outputs. #9597 by Wenbo Zhao.
Efficiency Efficiency improvements for neighbors.RadiusNeighborsClassifier.predict. #9597 by Wenbo Zhao.
Fix neighbors.KNeighborsRegressor now throws an error when metric='precomputed' and it is fit on non-square data. #14336 by Gregory Dexter.
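A sketch of chaining neighbors.KNeighborsTransformer with an estimator that accepts a precomputed sparse graph, assuming scikit-learn 0.22 or later. The transformer is asked for more neighbors (10) than the classifier uses (5), so the precomputed graph contains enough entries per row; the dataset and neighbor counts are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsTransformer
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# The transformer builds a sparse graph of nearest-neighbor distances;
# the classifier then reads distances from that graph instead of X.
pipe = make_pipeline(
    KNeighborsTransformer(n_neighbors=10, mode="distance"),
    KNeighborsClassifier(n_neighbors=5, metric="precomputed"),
)
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Passing `memory=` to `make_pipeline` would cache the graph-building step, so that it can be reused when only the downstream estimator's parameters change; that is the caching use case the entry refers to.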

`sklearn.neural_network`

Feature Added the max_fun parameter in neural_network.BaseMultilayerPerceptron, neural_network.MLPRegressor, and neural_network.MLPClassifier to control the maximum number of function evaluations allowed without meeting the tol improvement. #9274 by Daniel Perry.

`sklearn.pipeline`

Enhancement pipeline.Pipeline now supports score_samples if the final estimator does. #13806 by Anaël Beaugnon.
Fix The fit method of FeatureUnion now accepts fit_params to pass to the underlying transformers. #15119 by Adrin Jalali.
API Change None as a transformer is now deprecated in pipeline.FeatureUnion. Please use 'drop' instead. #15053 by Thomas Fan.
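A minimal sketch of disabling a pipeline.FeatureUnion step with 'drop' instead of the deprecated None, assuming scikit-learn 0.22 or later. The random data and the choice of PCA and TruncatedSVD are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.pipeline import FeatureUnion

X = np.random.RandomState(0).normal(size=(20, 5))

# 'drop' disables the "svd" step: only the 2 PCA components are output.
union = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("svd", "drop"),
])
X_out = union.fit_transform(X)
print(X_out.shape)  # (20, 2)

# The dropped step can be re-enabled later through set_params.
union.set_params(svd=TruncatedSVD(n_components=3))
X_both = union.fit_transform(X)
print(X_both.shape)  # (20, 5)
```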

`sklearn.preprocessing`

Efficiency preprocessing.PolynomialFeatures is now faster when the input data is dense. #13290 by Xavier Dupré.
Enhancement Avoid unnecessary data copies when fitting the preprocessors preprocessing.StandardScaler, preprocessing.MinMaxScaler, preprocessing.MaxAbsScaler, preprocessing.RobustScaler and preprocessing.QuantileTransformer, which results in a slight performance improvement. #13987 by Roman Yurchak.
Fix preprocessing.KernelCenterer now throws an error when fit on non-square data. #14336 by Gregory Dexter.

`sklearn.model_selection`

Fix model_selection.GridSearchCV and model_selection.RandomizedSearchCV now support the _pairwise property, which prevents an error during cross-validation for estimators with pairwise inputs (such as neighbors.KNeighborsClassifier when metric is set to 'precomputed'). #13925 by Isaac S. Robson and #15524 by Xun Tang.

`sklearn.svm`

Enhancement svm.SVC and svm.NuSVC now accept a break_ties parameter. This parameter results in predict breaking the ties according to the confidence values of decision_function, if decision_function_shape='ovr' and the number of target classes > 2. #12557 by Adrin Jalali.
Enhancement SVM estimators now throw a more specific error when kernel='precomputed' and they are fit on non-square data. #14336 by Gregory Dexter.
Fix svm.SVC, svm.SVR, svm.NuSVR and svm.OneClassSVM, when receiving negative or zero values for the parameter sample_weight in the method fit(), generated an invalid model. This behavior occurred only in some border scenarios. Now in these cases, fit() will fail with an exception. #14286 by Alex Shacked.
Fix The n_support_ attribute of svm.SVR and svm.OneClassSVM was previously non-initialized, and had size 2. It now has size 1 with the correct value. #15099 by Nicolas Hug.
Fix Fixed a bug in BaseLibSVM._sparse_fit where n_SV=0 raised a ZeroDivisionError. #14894 by Danna Naser.
Fix The liblinear solver now supports sample_weight. #15038 by Guillaume Lemaitre.
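A sketch of what break_ties=True means for svm.SVC, assuming scikit-learn 0.22 or later: with decision_function_shape='ovr' and more than two classes, predict returns the class with the largest one-vs-rest decision value, so the two computations below agree. The iris dataset is an illustrative three-class problem.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# break_ties only has an effect with decision_function_shape='ovr' and > 2 classes.
clf = SVC(decision_function_shape="ovr", break_ties=True).fit(X, y)

# Labels obtained from the argmax of the one-vs-rest decision function.
argmax_labels = clf.classes_[np.argmax(clf.decision_function(X), axis=1)]
print(np.array_equal(clf.predict(X), argmax_labels))  # True
```

Breaking ties this way costs an extra decision_function evaluation, which is why it is opt-in (break_ties defaults to False).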

`sklearn.tree`

Feature Added minimal cost-complexity pruning, controlled by ccp_alpha, to tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier, tree.ExtraTreeRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor, ensemble.GradientBoostingClassifier, and ensemble.GradientBoostingRegressor. #12887 by Thomas Fan.
API Change presort is now deprecated in tree.DecisionTreeClassifier and tree.DecisionTreeRegressor, and the parameter has no effect. #14907 by Adrin Jalali.
API Change The classes_ and n_classes_ attributes of tree.DecisionTreeRegressor are now deprecated. #15028 by Mei Guan, Nicolas Hug, and Adrin Jalali.
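A sketch of minimal cost-complexity pruning with a decision tree, assuming scikit-learn 0.22 or later. cost_complexity_pruning_path returns the effective alphas at which subtrees get pruned; choosing one near the end of the path (the last value prunes the tree down to its root) yields a much smaller tree. The dataset and the chosen alpha are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Effective alphas of the pruning sequence, in increasing order.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
ccp_alphas = path.ccp_alphas

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# A larger ccp_alpha prunes more: use the second-to-last alpha of the path.
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=ccp_alphas[-2]).fit(X, y)

print(full_tree.tree_.node_count, pruned_tree.tree_.node_count)
```

In practice ccp_alpha is usually selected by cross-validation over the values in ccp_alphas rather than picked by position.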

`sklearn.utils`

Feature check_estimator can now generate checks by setting generate_only=True. Previously, running check_estimator would stop when the first check failed. With generate_only=True, all checks can run independently and report the ones that are failing. Read more in Rolling your own estimator. #14381 by Thomas Fan.
Feature Added a pytest-specific decorator, parametrize_with_checks, to parametrize estimator checks for a list of estimators. #14381 by Thomas Fan.
Feature A new random variable, utils.fixes.loguniform, implements a log-uniform random variable (e.g., for use in RandomizedSearchCV). For example, the outcomes 1, 10 and 100 are all equally likely for loguniform(1, 100). See #11232 by Scott Sievert and Nathaniel Saul, and SciPy PR 10815.
Enhancement utils.safe_indexing (now deprecated) accepts an axis parameter to index array-like across rows and columns. The column indexing can be done on NumPy array, SciPy sparse matrix, and Pandas DataFrame. An additional refactoring was done. #14035 and #14475 by Guillaume Lemaitre.
Enhancement utils.extmath.safe_sparse_dot works between a 3D+ ndarray and a sparse matrix. #14538 by Jérémie du Boisberranger.
Fix utils.check_array now raises an error instead of casting NaN to integer. #14872 by Roman Yurchak.
Fix utils.check_array will now correctly detect numeric dtypes in pandas dataframes, fixing a bug where float32 was upcast to float64 unnecessarily. #15094 by Andreas Müller.
API Change The following utils have been deprecated and are now private:
- choose_check_classifiers_labels
- enforce_estimator_tags_y
- mocking.MockDataFrame
- mocking.CheckingClassifier
- optimize.newton_cg
- random.random_choice_csc
- utils.choose_check_classifiers_labels
- utils.enforce_estimator_tags_y
- utils.optimize.newton_cg
- utils.random.random_choice_csc
- utils.safe_indexing
- utils.mocking
- utils.fast_dict
- utils.seq_dataset
- utils.weight_vector
- utils.fixes.parallel_helper (removed)
- All of utils.testing except for all_estimators, which is now in utils.
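A sketch of the log-uniform distribution described in the loguniform entry. It uses `scipy.stats.loguniform`, which is the SciPy implementation referenced by SciPy PR 10815 and is assumed to be available (SciPy 1.4 or later); in scikit-learn 0.22 itself the backport lives at utils.fixes.loguniform. Since log10(X) is uniform on [0, 2] for loguniform(1, 100), the decades [1, 10) and [10, 100] receive the same probability mass.

```python
import numpy as np
from scipy.stats import loguniform

# log10(X) is uniform on [log10(1), log10(100)] = [0, 2].
dist = loguniform(1, 100)
samples = dist.rvs(size=20_000, random_state=0)

# Empirical and exact probability of falling in the first decade [1, 10).
frac_low = np.mean(samples < 10)
print(f"fraction below 10: {frac_low:.3f}")  # close to 0.5
print(f"exact P(X < 10): {dist.cdf(10):.3f}")  # 0.500
```

As a search distribution, it would typically be passed as, for example, `param_distributions={"C": loguniform(1e-3, 1e3)}` to RandomizedSearchCV, so that every order of magnitude of C is sampled equally often.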

`sklearn.isotonic`

Fix Fixed a bug where isotonic.IsotonicRegression.fit raised an error when X.dtype == 'float32' and X.dtype != y.dtype. #14902 by Lucas.

Miscellaneous

Fix Port lobpcg from SciPy, which implements some bug fixes but is only available in 1.3+. #13609 and #14971 by Guillaume Lemaitre.
API Change Scikit-learn now converts any input data structure implementing a duck array to a numpy array (using __array__) to ensure consistent behavior, instead of relying on __array_function__ (see NEP 18). #14702 by Andreas Müller.
API Change Replaced manual checks with check_is_fitted. Errors thrown when using non-fitted estimators are now more uniform. #13013 by Agamemnon Krasoulis.

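Changes to estimator checks

These changes mostly affect library developers.

Estimators are now expected to raise a NotFittedError if predict or transform is called before fit; previously an AttributeError or ValueError was acceptable. #13013 by Agamemnon Krasoulis.
Binary-only classifiers are now supported in estimator checks. Such classifiers need to have the binary_only=True estimator tag. #13875 by Trevor Stephens.
Estimators are expected to convert input data (X, y, sample_weights) to numpy.ndarray and never call __array_function__ on the original datatype that is passed (see NEP 18). #14702 by Andreas Müller.
The requires_positive_X estimator tag (for models that require X to be non-negative) is now used by utils.estimator_checks.check_estimator to make sure a proper error message is raised if X contains some negative entries. #14680 by Alex Gramfort.
Added a check that pairwise estimators raise an error on non-square data. #14336 by Gregory Dexter.
Added two common multioutput estimator tests, utils.estimator_checks.check_classifier_multioutput and utils.estimator_checks.check_regressor_multioutput. #13392 by Rok Mihevc.
Fix Added check_transformer_data_not_an_array to checks where missing.
Fix The estimator tags resolution now follows the regular MRO. They used to be overridable only once. #14884 by Andreas Müller.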
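A minimal sketch of the NotFittedError contract, assuming scikit-learn 0.22 or later. `raises_not_fitted` is a hypothetical helper; it calls the public check_is_fitted utility, which is what a custom estimator would typically call at the start of predict or transform.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted


def raises_not_fitted(estimator) -> bool:
    """Hypothetical helper: True if the estimator is reported as not fitted."""
    try:
        check_is_fitted(estimator)
    except NotFittedError:
        return True
    return False


clf = LogisticRegression()
print(raises_not_fitted(clf))  # True

# Calling predict before fit raises NotFittedError as well.
try:
    clf.predict([[0.0, 1.0]])
except NotFittedError as exc:
    print(type(exc).__name__)  # NotFittedError

clf.fit([[0.0, 0.0], [1.0, 1.0]], [0, 1])
print(raises_not_fitted(clf))  # False
```

NotFittedError subclasses both ValueError and AttributeError, so code that previously caught either exception keeps working.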

Code and Documentation Contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.21, including:

Aaron Alphonsus, Abbie Popa, Abdur-Rahmaan Janhangeer, abenbihi, Abhinav Sagar, Abhishek Jana, Abraham K. Lagat, Adam J. Stewart, Aditya Vyas, Adrin Jalali, Agamemnon Krasoulis, Alec Peters, Alessandro Surace, Alexandre de Siqueira, Alexandre Gramfort, alexgoryainov, Alex Henrie, Alex Itkes, alexshacked, Allen Akinkunle, Anaël Beaugnon, Anders Kaseorg, Andrea Maldonado, Andrea Navarrete, Andreas Mueller, Andreas Schuderer, Andrew Nystrom, Angela Ambroz, Anisha Keshavan, Ankit Jha, Antonio Gutierrez, Anuja Kelkar, Archana Alva, arnaudstiegler, arpanchowdhry, ashimb9, Ayomide Bamidele, Baran Buluttekin, barrycg, Bharat Raghunathan, Bill Mill, Biswadip Mandal, blackd0t, Brian G. Barkley, Brian Wignall, Bryan Yang, c56pony, camilaagw, cartman_nabana, catajara, Cat Chenal, Cathy, cgsavard, Charles Vesteghem, Chiara Marmo, Chris Gregory, Christian Lorentzen, Christos Aridas, Dakota Grusak, Daniel Grady, Daniel Perry, Danna Naser, DatenBergwerk, David Dormagen, deeplook, Dillon Niederhut, Dong-hee Na, Dougal J. Sutherland, DrGFreeman, Dylan Cashman, edvardlindelof, Eric Larson, Eric Ndirangu, Eunseop Jeong, Fanny, federicopisanu, Felix Divo, flaviomorelli, FranciDona, Franco M. 
Luque, Frank Hoang, Frederic Haase, g0g0gadget, Gabriel Altay, Gabriel do Vale Rios, Gael Varoquaux, ganevgv, gdex1, getgaurav2, Gideon Sonoiya, Gordon Chen, gpapadok, Greg Mogavero, Grzegorz Szpak, Guillaume Lemaitre, Guillem García Subies, H4dr1en, hadshirt, Hailey Nguyen, Hanmin Qin, Hannah Bruce Macdonald, Harsh Mahajan, Harsh Soni, Honglu Zhang, Hossein Pourbozorg, Ian Sanders, Ingrid Spielman, J-A16, jaehong park, Jaime Ferrando Huertas, James Hill, James Myatt, Jay, jeremiedbb, Jérémie du Boisberranger, jeromedockes, Jesper Dramsch, Joan Massich, Joanna Zhang, Joel Nothman, Johann Faouzi, Jonathan Rahn, Jon Cusick, Jose Ortiz, Kanika Sabharwal, Katarina Slama, kellycarmody, Kennedy Kang'ethe, Kensuke Arai, Kesshi Jordan, Kevad, Kevin Loftis, Kevin Winata, Kevin Yu-Sheng Li, Kirill Dolmatov, Kirthi Shankar Sivamani, krishna katyal, Lakshmi Krishnan, Lakshya KD, LalliAcqua, lbfin, Leland McInnes, Léonard Binet, Loic Esteve, loopyme, lostcoaster, Louis Huynh, lrjball, Luca Ionescu, Lutz Roeder, MaggieChege, Maithreyi Venkatesh, Maltimore, Maocx, Marc Torrellas, Marie Douriez, Markus, Markus Frey, Martina G. 
Vilas, Martin Oywa, Martin Thoma, Masashi SHIBATA, Maxwell Aladago, mbillingr, m-clare, Meghann Agarwal, m.fab, Micah Smith, miguelbarao, Miguel Cabrera, Mina Naghshhnejad, Ming Li, motmoti, mschaffenroth, mthorrell, Natasha Borders, nezar-a, Nicolas Hug, Nidhin Pattaniyil, Nikita Titov, Nishan Singh Mann, Nitya Mandyam, norvan, notmatthancock, novaya, nxorable, Oleg Stikhin, Oleksandr Pavlyk, Olivier Grisel, Omar Saleem, Owen Flanagan, panpiort8, Paolo, Paolo Toccaceli, Paresh Mathur, Paula, Peng Yu, Peter Marko, pierretallotte, poorna-kumar, pspachtholz, qdeffense, Rajat Garg, Raphaël Bournhonesque, Ray, Ray Bell, Rebekah Kim, Reza Gharibi, Richard Payne, Richard W, rlms, Robert Juergens, Rok Mihevc, Roman Feldbauer, Roman Yurchak, R Sanjabi, RuchitaGarde, Ruth Waithera, Sackey, Sam Dixon, Samesh Lakhotia, Samuel Taylor, Sarra Habchi, Scott Gigante, Scott Sievert, Scott White, Sebastian Pölsterl, Sergey Feldman, SeWook Oh, she-dares, Shreya V, Shubham Mehta, Shuzhe Xiao, SimonCW, smarie, smujjiga, Sönke Behrends, Soumirai, Sourav Singh, stefan-matcovici, steinfurt, Stéphane Couvreur, Stephan Tulkens, Stephen Cowley, Stephen Tierney, SylvainLan, th0rwas, theoptips, theotheo, Thierno Ibrahima DIOP, Thomas Edwards, Thomas J Fan, Thomas Moreau, Thomas Schmitt, Tilen Kusterle, Tim Bicker, Timsaur, Tim Staley, Tirth Patel, Tola A, Tom Augspurger, Tom Dupré la Tour, topisan, Trevor Stephens, ttang131, Urvang Patel, Vathsala Achar, veerlosar, Venkatachalam N, Victor Luzgin, Vincent Jeanselme, Vincent Lostanlen, Vladimir Korolev, vnherdeiro, Wenbo Zhao, Wendy Hu, willdarnell, William de Vazelhes, wolframalpha, xavier dupré, xcjason, x-martian, xsat, xun-tang, Yinglr, yokasre, Yu-Hang "Maxin" Tang, Yulia Zamriy, Zhao Feng
