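Version 0.21.0 (October 27, 2017)#

This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

Integration with Apache Parquet, including a new top-level read_parquet() function and DataFrame.to_parquet() method, see here.
New user-facing pandas.api.types.CategoricalDtype for specifying categoricals independent of the data, see here.
The behavior of sum and prod on all-NaN Series/DataFrames is now consistent and no longer depends on whether bottleneck is installed, and sum and prod on empty Series now return NaN instead of 0, see here.
Compatibility fixes for PyPy, see here.
Additions to the drop, reindex and rename API to make them more consistent, see here.
Addition of the new methods DataFrame.infer_objects (see here) and GroupBy.pipe (see here).
Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a KeyError in a future version, see here.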

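What's new in v0.21.0

New features
Backwards incompatible API changes
Deprecations
- Series.select and DataFrame.select
- Series.argmax and Series.argmin
Removal of prior version deprecations/changes
Performance improvements
Documentation changes
Bug fixes
- Conversion
- Indexing
- IO
- Plotting
- GroupBy/resample/rolling
- Sparse
- Reshaping
- Numeric
- Categorical
- PyPy
- Other
Contributors

New features#

Integration with Apache Parquet file format#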

Integration with Apache Parquet, including a new top-level read_parquet() and DataFrame.to_parquet() method, see here (GH15838, GH17438).

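Apache Parquet provides a cross-language, binary file format for reading and writing data frames efficiently. Parquet is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas dtypes, including extension dtypes such as datetime with timezones.

This functionality depends on either the pyarrow or fastparquet library. For more details, see the IO docs on Parquet.

Method `infer_objects` type conversion#

The DataFrame.infer_objects() and Series.infer_objects() methods have been added to perform dtype inference on object columns, replacing some of the functionality of the deprecated convert_objects method. See the documentation here for more details. (GH11221)

This method only performs soft conversions on object columns, converting Python objects to native types, but not any coercive conversions. For example: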

In [1]: df = pd.DataFrame({'A': [1, 2, 3],
   ...:                    'B': np.array([1, 2, 3], dtype='object'),
   ...:                    'C': ['1', '2', '3']})
   ...: 

In [2]: df.dtypes
Out[2]: 
A     int64
B    object
C    object
Length: 3, dtype: object

In [3]: df.infer_objects().dtypes
Out[3]: 
A     int64
B     int64
C    object
Length: 3, dtype: object

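Note that column 'C' was not converted; only scalar numeric types will be converted to a new type. Other types of conversion should be accomplished using the to_numeric() function (or to_datetime(), to_timedelta()).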

In [4]: df = df.infer_objects()

In [5]: df['C'] = pd.to_numeric(df['C'], errors='coerce')

In [6]: df.dtypes
Out[6]: 
A    int64
B    int64
C    int64
Length: 3, dtype: object

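Improved warnings when attempting to create columns#

New users are often puzzled by the relationship between column operations and attribute access on DataFrame instances (GH7175). One specific instance of this confusion is attempting to create a new column by setting an attribute on the DataFrame: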

In [1]: df = pd.DataFrame({'one': [1., 2., 3.]})
In [2]: df.two = [4, 5, 6]

This does not raise any obvious exceptions, but also does not create a new column:

In [3]: df
Out[3]:
    one
0  1.0
1  2.0
2  3.0

Setting a list-like data structure into a new attribute now raises a UserWarning about the potential for unexpected behavior. See Attribute Access.
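The difference between attribute assignment and item assignment can be checked directly. The following is a minimal sketch; the variable names (`attribute_created_column`, `item_created_column`) are illustrative only and not part of pandas.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0]})

# Attribute assignment only sets a Python attribute on the object;
# it does not add a column. Any warning emitted is recorded here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.two = [4, 5, 6]

attribute_created_column = "two" in df.columns

# Item assignment is the supported way to create a new column.
df["three"] = [7, 8, 9]
item_created_column = "three" in df.columns
```

Using `df[...] = ...` consistently for new columns avoids the ambiguity altogether.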

Method `drop` now also accepts index/columns keywords#

The drop() method has gained index/columns keywords as an alternative to specifying the axis. This is similar to the behavior of reindex (GH12392).

For example:

In [7]: df = pd.DataFrame(np.arange(8).reshape(2, 4),
   ...:                   columns=['A', 'B', 'C', 'D'])
   ...: 

In [8]: df
Out[8]: 
   A  B  C  D
0  0  1  2  3
1  4  5  6  7

[2 rows x 4 columns]

In [9]: df.drop(['B', 'C'], axis=1)
Out[9]: 
   A  D
0  0  3
1  4  7

[2 rows x 2 columns]

# the following is now equivalent
In [10]: df.drop(columns=['B', 'C'])
Out[10]: 
   A  D
0  0  3
1  4  7

[2 rows x 2 columns]

Methods `rename`, `reindex` now also accept axis keyword#

The DataFrame.rename() and DataFrame.reindex() methods have gained the axis keyword to specify the axis to target with the operation (GH12392).

Here's rename:

In [11]: df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

In [12]: df.rename(str.lower, axis='columns')
Out[12]: 
   a  b
0  1  4
1  2  5
2  3  6

[3 rows x 2 columns]

In [13]: df.rename(id, axis='index')
Out[13]: 
                 A  B
139699223642320  1  4
139699223642352  2  5
139699223642384  3  6

[3 rows x 2 columns]

And reindex:

In [14]: df.reindex(['A', 'B', 'C'], axis='columns')
Out[14]: 
   A  B   C
0  1  4 NaN
1  2  5 NaN
2  3  6 NaN

[3 rows x 3 columns]

In [15]: df.reindex([0, 1, 3], axis='index')
Out[15]: 
     A    B
0  1.0  4.0
1  2.0  5.0
3  NaN  NaN

[3 rows x 2 columns]

The "index, columns" style continues to work as before.

In [16]: df.rename(index=id, columns=str.lower)
Out[16]: 
                 a  b
139699223642320  1  4
139699223642352  2  5
139699223642384  3  6

[3 rows x 2 columns]

In [17]: df.reindex(index=[0, 1, 3], columns=['A', 'B', 'C'])
Out[17]: 
     A    B   C
0  1.0  4.0 NaN
1  2.0  5.0 NaN
3  NaN  NaN NaN

[3 rows x 3 columns]

We highly encourage using named arguments to avoid confusion when using either style.

`CategoricalDtype` for specifying categoricals#

pandas.api.types.CategoricalDtype has been added to the public API and expanded to include the categories and ordered attributes. A CategoricalDtype can be used to specify the set of categories and orderedness of an array, independent of the data. This can be useful for example, when converting string data to a Categorical (GH14711, GH15078, GH16015, GH17643):

In [18]: from pandas.api.types import CategoricalDtype

In [19]: s = pd.Series(['a', 'b', 'c', 'a'])  # strings

In [20]: dtype = CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)

In [21]: s.astype(dtype)
Out[21]: 
0    a
1    b
2    c
3    a
Length: 4, dtype: category
Categories (4, object): ['a' < 'b' < 'c' < 'd']

One place that deserves special mention is in read_csv(). Previously, with dtype={'col': 'category'}, the returned values and categories would always be strings.

In [22]: data = 'A,B\na,1\nb,2\nc,3'

In [23]: pd.read_csv(StringIO(data), dtype={'B': 'category'}).B.cat.categories
Out[23]: Index(['1', '2', '3'], dtype='object')

Notice the "object" dtype.

With a CategoricalDtype of all numerics, datetimes, or timedeltas, we can automatically convert to the correct type:

In [24]: dtype = {'B': CategoricalDtype([1, 2, 3])}

In [25]: pd.read_csv(StringIO(data), dtype=dtype).B.cat.categories
Out[25]: Int64Index([1, 2, 3], dtype='int64')

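The values have been correctly interpreted as integers.

The .dtype property of a Categorical, CategoricalIndex or a Series with categorical type will now return an instance of CategoricalDtype. While the repr has changed, str(CategoricalDtype()) is still the string 'category'. We'll take this moment to remind users that the preferred way to detect categorical data is to use pandas.api.types.is_categorical_dtype(), and not str(dtype) == 'category'.

See the CategoricalDtype docs for more.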
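As a small illustration of the dtype carrying its own metadata, the sketch below inspects the `categories` and `ordered` attributes after a conversion. It uses an `isinstance` check against `CategoricalDtype` for detection; that choice is an assumption made here for illustration, not the helper recommended above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

dtype = CategoricalDtype(categories=["a", "b", "c", "d"], ordered=True)
s = pd.Series(["a", "b", "c", "a"]).astype(dtype)

# The dtype describes categories and orderedness independently of the data:
# 'd' is a valid category even though it never occurs in the values.
categories = list(s.dtype.categories)
is_ordered = s.dtype.ordered

# The string form is still 'category'; a type check is more explicit.
as_string = str(s.dtype)
is_categorical = isinstance(s.dtype, CategoricalDtype)
```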

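`GroupBy` objects now have a `pipe` method#

GroupBy objects now have a pipe method, similar to the one on DataFrame and Series, that allows functions that take a GroupBy to be composed in a clean, readable syntax. (GH17871)

For a concrete example of combining .groupby and .pipe, imagine having a DataFrame with columns for stores, products, revenue and sold quantity. We'd like to do a groupwise calculation of prices (i.e. revenue/quantity) per store and per product. We could do this in a multi-step operation, but expressing it in terms of piping can make the code more readable.

First we set the data: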

In [26]: import numpy as np

In [27]: n = 1000

In [28]: df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
   ....:                    'Product': np.random.choice(['Product_1',
   ....:                                                 'Product_2',
   ....:                                                 'Product_3'
   ....:                                                 ], n),
   ....:                    'Revenue': (np.random.random(n) * 50 + 10).round(2),
   ....:                    'Quantity': np.random.randint(1, 10, size=n)})
   ....: 

In [29]: df.head(2)
Out[29]: 
     Store    Product  Revenue  Quantity
0  Store_2  Product_2    32.09         7
1  Store_1  Product_3    14.20         1

[2 rows x 4 columns]

Now, to find prices per store/product, we can simply do:

In [30]: (df.groupby(['Store', 'Product'])
   ....:    .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
   ....:    .unstack().round(2))
   ....: 
Out[30]: 
Product  Product_1  Product_2  Product_3
Store                                   
Store_1       6.73       6.72       7.14
Store_2       7.59       6.98       7.23

[2 rows x 3 columns]

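See the documentation for more.

`Categorical.rename_categories` accepts a dict-like#

rename_categories() now accepts a dict-like argument for new_categories. The previous categories are looked up in the dictionary's keys and replaced if found. The behavior of missing and extra keys is the same as in DataFrame.rename().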

In [31]: c = pd.Categorical(['a', 'a', 'b'])

In [32]: c.rename_categories({"a": "eh", "b": "bee"})
Out[32]: 
['eh', 'eh', 'bee']
Categories (2, object): ['eh', 'bee']

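Warning

To assist with upgrading pandas, rename_categories treats Series as list-like. Typically, Series are considered to be dict-like (e.g. in .rename, .map). In a future version of pandas rename_categories will change to treat them as dict-like. Follow the warning message's recommendations for writing future-proof code.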

In [33]: c.rename_categories(pd.Series([0, 1], index=['a', 'c']))
FutureWarning: Treating Series 'new_categories' as a list-like and using the values.
In a future version, 'rename_categories' will treat Series like a dictionary.
For dict-like, use 'new_categories.to_dict()'
For list-like, use 'new_categories.values'.
Out[33]:
[0, 0, 1]
Categories (2, int64): [0, 1]
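Following the warning message's two suggestions, the intent can be made explicit by converting the Series before passing it. This is a minimal sketch; `new_names`, `renamed_by_key` and `renamed_by_position` are illustrative names.

```python
import pandas as pd

c = pd.Categorical(["a", "a", "b"])
new_names = pd.Series([0, 1], index=["a", "c"])

# Dict-like: keys are looked up among the existing categories.
# 'c' is not a category, so it is ignored, and 'b' is left unchanged.
renamed_by_key = c.rename_categories(new_names.to_dict())

# List-like: the values replace the categories positionally.
renamed_by_position = c.rename_categories(new_names.values)
```

Passing `.to_dict()` or `.values` gives the same result regardless of how a bare Series is interpreted.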

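Other enhancements#

New functions or methods#

nearest() is added to support nearest-neighbor upsampling (GH17496).
Index has added support for a to_frame method (GH15230).

New keywords#

Added a skipna parameter to infer_dtype() to support type inference in the presence of missing values (GH17059).
Series.to_dict() and DataFrame.to_dict() now support an into keyword which allows you to specify the collections.Mapping subclass that you would like returned. The default is dict, which is backwards compatible. (GH16122)
Series.set_axis() and DataFrame.set_axis() now support the inplace parameter. (GH14636)
Series.to_pickle() and DataFrame.to_pickle() have gained a protocol parameter (GH16252). By default, this parameter is set to HIGHEST_PROTOCOL
read_feather() has gained the nthreads parameter for multi-threaded operations (GH16359)
DataFrame.clip() and Series.clip() have gained an inplace argument. (GH15388)
crosstab() has gained a margins_name parameter to define the name of the row / column that will contain the totals when margins=True. (GH15972)
read_json() now accepts a chunksize parameter that can be used when lines=True. If chunksize is passed, read_json now returns an iterator which reads in chunksize lines with each iteration. (GH17048)
read_json() and to_json() now accept a compression argument which allows them to transparently handle compressed files. (GH17798)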
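The into keyword of to_dict() from the list above can be sketched as follows, using collections.OrderedDict as the requested mapping type. The variable names are illustrative.

```python
from collections import OrderedDict

import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Default: a plain dict, as before.
as_dict = s.to_dict()

# With into=..., the result is an instance of the requested mapping class.
as_ordered = s.to_dict(into=OrderedDict)
```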

Various enhancements#

Improved the import time of pandas by about 2.25x. (GH16764)
Support for PEP 519 -- Adding a file system path protocol on most readers (e.g. read_csv()) and writers (e.g. DataFrame.to_csv()) (GH13823).
Added a __fspath__ method to pd.HDFStore, pd.ExcelFile, and pd.ExcelWriter to work properly with the file system path protocol (GH13823).
The validate argument for merge() now checks whether a merge is one-to-one, one-to-many, many-to-one, or many-to-many. If a merge is found to not be an example of specified merge type, an exception of type MergeError will be raised. For more, see here (GH16270)
Added support for PEP 518 (pyproject.toml) to the build system (GH16745)
RangeIndex.append() now returns a RangeIndex object when possible (GH16212)
Series.rename_axis() and DataFrame.rename_axis() with inplace=True now return None while renaming the axis inplace. (GH15704)
api.types.infer_dtype() now infers decimals. (GH15690)
DataFrame.select_dtypes() now accepts scalar values for include/exclude as well as list-like. (GH16855)
date_range() now accepts 'YS' in addition to 'AS' as an alias for start of year. (GH9313)
date_range() now accepts 'Y' in addition to 'A' as an alias for end of year. (GH9313)
DataFrame.add_prefix() and DataFrame.add_suffix() now accept strings containing the '%' character. (GH17151)
Read/write methods that infer compression (read_csv(), read_table(), read_pickle(), and to_pickle()) can now infer from path-like objects, such as pathlib.Path. (GH17206)
read_sas() now recognizes much more of the most frequently used date (datetime) formats in SAS7BDAT files. (GH15871)
DataFrame.items() and Series.items() are now present in both Python 2 and 3 and are lazy in all cases. (GH13918, GH17213)
pandas.io.formats.style.Styler.where() has been implemented as a convenience for pandas.io.formats.style.Styler.applymap(). (GH17474)
MultiIndex.is_monotonic_decreasing() has been implemented. Previously it returned False in all cases. (GH16554)
read_excel() raises ImportError with a better message if xlrd is not installed. (GH17613)
DataFrame.assign() will preserve the original order of **kwargs for Python 3.6+ users instead of sorting the column names. (GH14207)
Series.reindex(), DataFrame.reindex(), Index.get_indexer() now support a list-like argument for tolerance. (GH17367)
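The validate argument for merge() mentioned in the list above can be illustrated with a small sketch. The frames and column names here are made up for the example; MergeError is imported from pandas.errors.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"key": [1, 2, 2], "lval": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2], "rval": ["x", "y"]})

# 'key' is unique on the right but not on the left, so many-to-one passes.
merged = pd.merge(left, right, on="key", validate="many_to_one")

# Asking for one-to-one on the same inputs fails the check.
try:
    pd.merge(left, right, on="key", validate="one_to_one")
    one_to_one_ok = True
except MergeError:
    one_to_one_ok = False
```

Declaring the expected relationship up front turns silent row duplication into an explicit error.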

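Backwards incompatible API changes#

Dependencies have increased minimum versions#

We have updated our minimum supported versions of dependencies (GH15206, GH15543, GH15214). If installed, we now require: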

Package	Minimum Version	Required
Numpy	1.9.0	X
Matplotlib	1.4.3
Scipy	0.14.0
Bottleneck	1.0.0

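Additionally, support has been dropped for Python 3.4 (GH15251).

Sum/prod of all-NaN or empty Series/DataFrames is now consistently NaN#

Note

The changes described here have been partially reverted. See the v0.22.0 Whatsnew for more.

The behavior of sum and prod on all-NaN Series/DataFrames no longer depends on whether bottleneck is installed, and the return value of sum and prod on an empty Series has changed (GH9422, GH15507).

Calling sum or prod on an empty or all-NaN Series, or columns of a DataFrame, will result in NaN. See the docs.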

In [33]: s = pd.Series([np.nan])

Previously WITHOUT bottleneck installed:

In [2]: s.sum()
Out[2]: np.nan

Previously WITH bottleneck:

In [2]: s.sum()
Out[2]: 0.0

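New behavior, without regard to the bottleneck installation:

In [34]: s.sum()
Out[34]: nan

Note that this also changes the sum of an empty Series. Previously this always returned 0 regardless of a bottleneck installation:

In [1]: pd.Series([]).sum()
Out[1]: 0

but for consistency with the all-NaN case, this was changed to return NaN as well:

In [35]: pd.Series([]).sum()
Out[35]: nan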

Indexing with a list with missing labels is deprecated#

Previously, selecting with a list of labels, where one or more labels were missing would always succeed, returning NaN for missing labels. This will now show a FutureWarning. In the future this will raise a KeyError (GH15747). This warning will trigger on a DataFrame or a Series for using .loc[] or [[]] when passing a list-of-labels with at least 1 missing label. See the deprecation docs.

In [36]: s = pd.Series([1, 2, 3])

In [37]: s
Out[37]: 
0    1
1    2
2    3
Length: 3, dtype: int64

Previous behavior

In [4]: s.loc[[1, 2, 3]]
Out[4]:
1    2.0
2    3.0
3    NaN
dtype: float64

Current behavior

In [4]: s.loc[[1, 2, 3]]
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike

Out[4]:
1    2.0
2    3.0
3    NaN
dtype: float64

The idiomatic way to select elements that may not be found is via .reindex():

In [38]: s.reindex([1, 2, 3])
Out[38]: 
1    2.0
2    3.0
3    NaN
Length: 3, dtype: float64

Selection with all keys found is unchanged.

In [39]: s.loc[[1, 2]]
Out[39]: 
1    2
2    3
Length: 2, dtype: int64
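When the missing labels should simply be skipped rather than filled with NaN, one option is to restrict the request to labels that exist before calling .loc. The sketch below uses Index.intersection for that; the names `wanted`, `with_missing` and `only_present` are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
wanted = [1, 2, 3]  # label 3 is not in the index

# Keep the missing label as a NaN row.
with_missing = s.reindex(wanted)

# Or select only the labels that actually exist in the index.
only_present = s.loc[s.index.intersection(wanted)]
```

Neither form relies on the deprecated behavior, so both keep working once the KeyError is enforced.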

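NA naming changes#

In order to promote more consistency among the pandas API, we have added additional top-level functions isna() and notna() that are aliases for isnull() and notnull(). The naming scheme is now more consistent with methods like .dropna() and .fillna(). Furthermore, in all cases where .isnull() and .notnull() methods are defined, these have additional methods named .isna() and .notna(); these are included for the classes Categorical, Index, Series, and DataFrame. (GH15001).

The configuration option pd.options.mode.use_inf_as_null is deprecated, and pd.options.mode.use_inf_as_na is added as a replacement.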
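A short sketch of the new names next to the old ones, showing that they produce the same masks. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Method forms on a Series.
missing_mask = s.isna()
present_mask = s.notna()

# The older spelling returns an identical result.
same_as_isnull = missing_mask.equals(s.isnull())

# Top-level function form.
scalar_is_missing = pd.isna(np.nan)
```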

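Iteration of Series/Index will now return Python scalars#

Previously, when using certain iteration methods for a Series with dtype int or float, you would receive a numpy scalar, e.g. a np.int64, rather than a Python int. Issue (GH10904) corrected this for Series.tolist() and list(Series). This change makes all iteration methods consistent, in particular, for __iter__() and .map(); note that this only affects int/float dtypes. (GH13236, GH13258, GH14216).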

In [40]: s = pd.Series([1, 2, 3])

In [41]: s
Out[41]: 
0    1
1    2
2    3
Length: 3, dtype: int64

Previously:

In [2]: type(list(s)[0])
Out[2]: numpy.int64

New behavior:

In [42]: type(list(s)[0])
Out[42]: int

Furthermore, this will now correctly box the results of iteration for DataFrame.to_dict() as well.

In [43]: d = {'a': [1], 'b': ['b']}

In [44]: df = pd.DataFrame(d)

Previously:

In [8]: type(df.to_dict()['a'][0])
Out[8]: numpy.int64

New behavior:

In [45]: type(df.to_dict()['a'][0])
Out[45]: int

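Indexing with a Boolean Index#

Previously when passing a boolean Index to .loc, if the index of the Series/DataFrame had boolean labels, you would get a label-based selection, potentially duplicating result labels, rather than a boolean indexing selection (where True selects elements). This was inconsistent with how a boolean numpy array indexed. The new behavior is to act like a boolean numpy array indexer. (GH17738)

Previous behavior: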

In [46]: s = pd.Series([1, 2, 3], index=[False, True, False])

In [47]: s
Out[47]: 
False    1
True     2
False    3
Length: 3, dtype: int64

In [59]: s.loc[pd.Index([True, False, True])]
Out[59]:
True     2
False    1
False    3
True     2
dtype: int64

Current behavior

In [48]: s.loc[pd.Index([True, False, True])]
Out[48]: 
False    1
False    3
Length: 2, dtype: int64

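Furthermore, previously if you had an index that was non-numeric (e.g. strings), then a boolean Index would raise a KeyError. This will now be treated as a boolean indexer.

Previous behavior: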

In [49]: s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

In [50]: s
Out[50]: 
a    1
b    2
c    3
Length: 3, dtype: int64

In [39]: s.loc[pd.Index([True, False, True])]
KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]"

Current behavior

In [51]: s.loc[pd.Index([True, False, True])]
Out[51]: 
a    1
c    3
Length: 2, dtype: int64

`PeriodIndex` resampling#

In previous versions of pandas, resampling a Series/DataFrame indexed by a PeriodIndex returned a DatetimeIndex in some cases (GH12884). Resampling to a multiplied frequency now returns a PeriodIndex (GH15944). As a minor enhancement, resampling a PeriodIndex can now handle NaT values (GH13224)

Previous behavior:

In [1]: pi = pd.period_range('2017-01', periods=12, freq='M')

In [2]: s = pd.Series(np.arange(12), index=pi)

In [3]: resampled = s.resample('2Q').mean()

In [4]: resampled
Out[4]:
2017-03-31     1.0
2017-09-30     5.5
2018-03-31    10.0
Freq: 2Q-DEC, dtype: float64

In [5]: resampled.index
Out[5]: DatetimeIndex(['2017-03-31', '2017-09-30', '2018-03-31'], dtype='datetime64[ns]', freq='2Q-DEC')

New behavior:

In [52]: pi = pd.period_range('2017-01', periods=12, freq='M')

In [53]: s = pd.Series(np.arange(12), index=pi)

In [54]: resampled = s.resample('2Q').mean()

In [55]: resampled
Out[55]: 
2017Q1    2.5
2017Q3    8.5
Freq: 2Q-DEC, Length: 2, dtype: float64

In [56]: resampled.index
Out[56]: PeriodIndex(['2017Q1', '2017Q3'], dtype='period[2Q-DEC]')

Upsampling and calling .ohlc() previously returned a Series, basically identical to calling .asfreq(). OHLC upsampling now returns a DataFrame with columns open, high, low and close (GH13083). This is consistent with downsampling and DatetimeIndex behavior.

Previous behavior:

In [1]: pi = pd.period_range(start='2000-01-01', freq='D', periods=10)

In [2]: s = pd.Series(np.arange(10), index=pi)

In [3]: s.resample('H').ohlc()
Out[3]:
2000-01-01 00:00    0.0
                ...
2000-01-10 23:00    NaN
Freq: H, Length: 240, dtype: float64

In [4]: s.resample('M').ohlc()
Out[4]:
         open  high  low  close
2000-01     0     9    0      9

New behavior:

In [57]: pi = pd.period_range(start='2000-01-01', freq='D', periods=10)

In [58]: s = pd.Series(np.arange(10), index=pi)

In [59]: s.resample('H').ohlc()
Out[59]: 
                  open  high  low  close
2000-01-01 00:00   0.0   0.0  0.0    0.0
2000-01-01 01:00   NaN   NaN  NaN    NaN
2000-01-01 02:00   NaN   NaN  NaN    NaN
2000-01-01 03:00   NaN   NaN  NaN    NaN
2000-01-01 04:00   NaN   NaN  NaN    NaN
...                ...   ...  ...    ...
2000-01-10 19:00   NaN   NaN  NaN    NaN
2000-01-10 20:00   NaN   NaN  NaN    NaN
2000-01-10 21:00   NaN   NaN  NaN    NaN
2000-01-10 22:00   NaN   NaN  NaN    NaN
2000-01-10 23:00   NaN   NaN  NaN    NaN

[240 rows x 4 columns]

In [60]: s.resample('M').ohlc()
Out[60]: 
         open  high  low  close
2000-01     0     9    0      9

[1 rows x 4 columns]

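Improved error handling during item assignment in pd.eval#

eval() will now raise a ValueError when item assignment malfunctions, or inplace operations are specified, but there is no item assignment in the expression (GH16732)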

In [61]: arr = np.array([1, 2, 3])

Previously, if you attempted the following expression, you would get a not very helpful error message:

In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`)
and integer or boolean arrays are valid indices

This is a very long way of saying numpy arrays don't support string-item indexing. With this change, the error message is now this:

In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
ValueError: Cannot assign expression output to target

It also used to be possible to evaluate expressions inplace, even if there was no item assignment:

In [4]: pd.eval("1 + 2", target=arr, inplace=True)
Out[4]: 3

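However, this input does not make much sense because the output is not being assigned to the target. Now, a ValueError will be raised when such an input is passed in: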

In [4]: pd.eval("1 + 2", target=arr, inplace=True)
...
ValueError: Cannot operate inplace if there is no assignment
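For contrast, the sketch below pairs the failing call with an inplace evaluation that does contain an assignment, using DataFrame.eval() on a small frame. The frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# No assignment in the expression, so inplace=True is rejected.
try:
    pd.eval("1 + 2", target=arr, inplace=True)
    message = None
except ValueError as exc:
    message = str(exc)

# With an assignment, an inplace evaluation has somewhere to store its result.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.eval("c = a + b", inplace=True)
```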

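Dtype conversions#

Previously assignments, .where() and .fillna() with a bool assignment, would coerce to the same type (e.g. int / float), or raise for datetimelikes. These will now preserve the bools with object dtypes. (GH16821).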

In [62]: s = pd.Series([1, 2, 3])

In [5]: s[1] = True

In [6]: s
Out[6]:
0    1
1    1
2    3
dtype: int64

New behavior

In [63]: s[1] = True

In [64]: s
Out[64]: 
0       1
1    True
2       3
Length: 3, dtype: object

Previously, an assignment to a datetimelike with a non-datetimelike would coerce the non-datetimelike item being assigned (GH14145).

In [65]: s = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')])

In [1]: s[1] = 1

In [2]: s
Out[2]:
0   2011-01-01 00:00:00.000000000
1   1970-01-01 00:00:00.000000001
dtype: datetime64[ns]

These now coerce to object dtype.

In [66]: s[1] = 1

In [67]: s
Out[67]: 
0    2011-01-01 00:00:00
1                      1
Length: 2, dtype: object

Inconsistent behavior in .where() with datetimelikes which would raise rather than coerce to object (GH16402)
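Bug in assignment against int64 data with np.ndarray with float64 dtype may keep int64 dtype (GH14001)

MultiIndex constructor with a single level#

The MultiIndex constructors no longer squeeze a MultiIndex with all length-one levels down to a regular Index. This affects all the MultiIndex constructors. (GH17178)

Previous behavior: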

In [2]: pd.MultiIndex.from_tuples([('a',), ('b',)])
Out[2]: Index(['a', 'b'], dtype='object')

Length 1 levels are no longer special-cased. They behave exactly as if you had length 2+ levels, so a MultiIndex is always returned from all of the MultiIndex constructors:

In [68]: pd.MultiIndex.from_tuples([('a',), ('b',)])
Out[68]: 
MultiIndex([('a',),
            ('b',)],
           )
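Code that relied on the old squeezing can recover a flat Index explicitly. A minimal sketch using get_level_values; the variable names are illustrative.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([("a",), ("b",)])

# The constructor keeps the single level as a MultiIndex.
is_multi = isinstance(mi, pd.MultiIndex)
n_levels = mi.nlevels

# Extract the only level to get a regular Index when one is needed.
flat = mi.get_level_values(0)
```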

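UTC localization with Series#

Previously, to_datetime() did not localize datetime Series data when utc=True was passed. Now, to_datetime() will correctly localize Series with a datetime64[ns, UTC] dtype to be consistent with how list-like and Index data are handled. (GH6415).

Previous behavior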

In [69]: s = pd.Series(['20130101 00:00:00'] * 3)

In [12]: pd.to_datetime(s, utc=True)
Out[12]:
0   2013-01-01
1   2013-01-01
2   2013-01-01
dtype: datetime64[ns]

New behavior

In [70]: pd.to_datetime(s, utc=True)
Out[70]: 
0   2013-01-01 00:00:00+00:00
1   2013-01-01 00:00:00+00:00
2   2013-01-01 00:00:00+00:00
Length: 3, dtype: datetime64[ns, UTC]

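Additionally, DataFrames with datetime columns that were parsed by read_sql_table() and read_sql_query() will also be localized to UTC only if the original SQL columns were timezone-aware datetime columns.

Consistency of range functions#

In previous versions, there were some inconsistencies between the various range functions: date_range(), bdate_range(), period_range(), timedelta_range(), and interval_range(). (GH17471).

One of the inconsistent behaviors occurred when the start, end and periods parameters were all specified, potentially leading to ambiguous ranges. When all three parameters were passed, interval_range ignored the periods parameter, period_range ignored the end parameter, and the other range functions raised. To promote consistency among the range functions, and avoid potentially ambiguous ranges, interval_range and period_range will now raise when all three parameters are passed.

Previous behavior: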

 In [2]: pd.interval_range(start=0, end=4, periods=6)
 Out[2]:
 IntervalIndex([(0, 1], (1, 2], (2, 3]]
               closed='right',
               dtype='interval[int64]')

In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q')
Out[3]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1', '2018Q2'], dtype='period[Q-DEC]', freq='Q-DEC')

New behavior:

In [2]: pd.interval_range(start=0, end=4, periods=6)
---------------------------------------------------------------------------
ValueError: Of the three parameters: start, end, and periods, exactly two must be specified

In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q')
---------------------------------------------------------------------------
ValueError: Of the three parameters: start, end, and periods, exactly two must be specified

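Additionally, the endpoint parameter end was not included in the intervals produced by interval_range. However, all other range functions include end in their output. To promote consistency among the range functions, interval_range will now include end as the right endpoint of the final interval, except if freq is specified in a way which skips end.

Previous behavior: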

In [4]: pd.interval_range(start=0, end=4)
Out[4]:
IntervalIndex([(0, 1], (1, 2], (2, 3]]
              closed='right',
              dtype='interval[int64]')

New behavior:

In [71]: pd.interval_range(start=0, end=4)
Out[71]:
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4]]
              closed='right',
              dtype='interval[int64]')

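No automatic Matplotlib converters#

pandas no longer registers our date, time, datetime, datetime64, and Period converters with matplotlib when pandas is imported. Matplotlib plot methods (plt.plot, ax.plot, ...) will not nicely format the x-axis for DatetimeIndex or PeriodIndex values. You must explicitly register these converters.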

pandas built-in Series.plot and DataFrame.plot will register these converters on first-use (GH17710).

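Note

This change has been temporarily reverted in pandas 0.21.1, for more details see here.

Other API changes#

The Categorical constructor no longer accepts a scalar for the categories keyword. (GH16022)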
Accessing a non-existent attribute on a closed HDFStore will now raise an AttributeError rather than a ClosedFileError (GH16301)
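read_csv() now issues a UserWarning if the names parameter contains duplicates (GH17095)
read_csv() now treats 'null' and 'n/a' strings as missing values by default (GH16471, GH16078)
pandas.HDFStore's string representation is now faster and less detailed. For the previous behavior, use pandas.HDFStore.info(). (GH16503).
Compression defaults in HDF stores now follow pytables standards. The default is no compression, and if complib is missing and complevel > 0, zlib is used (GH15943)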
Index.get_indexer_non_unique() now returns a ndarray indexer rather than an Index; this is consistent with Index.get_indexer() (GH16819)
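Removed the @slow decorator from pandas._testing, which caused issues for some downstream packages' test suites. Use @pytest.mark.slow instead, which achieves the same thing (GH16850)
Moved the definition of MergeError to the pandas.errors module.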
The signature of Series.set_axis() and DataFrame.set_axis() has been changed from set_axis(axis, labels) to set_axis(labels, axis=0), for consistency with the rest of the API. The old signature is deprecated and will show a FutureWarning (GH14636)
Series.argmin() and Series.argmax() will now raise a TypeError when used with object dtypes, instead of a ValueError (GH13595)
Period is now immutable, and will now raise an AttributeError when a user tries to assign a new value to the ordinal or freq attributes (GH17116).
to_datetime() when passed a tz-aware origin= kwarg will now raise a more informative ValueError rather than a TypeError (GH16842)
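to_datetime() now raises a ValueError when format includes %W or %U without also including day of the week and calendar year (GH16774)
Renamed non-functional index to index_col in read_stata() to improve API consistency (GH16342)
Bug in DataFrame.drop() caused boolean labels False and True to be treated as labels 0 and 1 respectively when dropping indices from a numeric index. This will now raise a ValueError (GH16877)
Restricted DateOffset keyword arguments. Previously, DateOffset subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (GH17176).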
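The Period immutability change from the list above can be observed with a short sketch. Since a Period can no longer be modified in place, a different period is obtained by creating a new object; the names `p`, `mutated` and `next_quarter` are illustrative.

```python
import pandas as pd

p = pd.Period("2017Q1")

# Assigning to ordinal (or freq) is rejected.
try:
    p.ordinal = 0
    mutated = True
except AttributeError:
    mutated = False

# Arithmetic returns a new Period and leaves the original untouched.
next_quarter = p + 1
```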

Deprecations#

DataFrame.from_csv() and Series.from_csv() have been deprecated in favor of read_csv() (GH4191)
read_excel() has deprecated sheetname in favor of sheet_name for consistency with .to_excel() (GH10559).
read_excel() has deprecated parse_cols in favor of usecols for consistency with read_csv() (GH4988)
read_csv() has deprecated the tupleize_cols argument. Column tuples will always be converted to a MultiIndex (GH17060)
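DataFrame.to_csv() has deprecated the tupleize_cols argument. MultiIndex columns will always be written as rows in the CSV file (GH17060)
The convert parameter has been deprecated in the .take() method, as it was not being respected (GH16948)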
pd.options.html.border has been deprecated in favor of pd.options.display.html.border (GH15793).
SeriesGroupBy.nth() has deprecated True in favor of 'all' for its kwarg dropna (GH11038).
DataFrame.as_blocks() is deprecated, as this is exposing the internal implementation (GH17302)
pd.TimeGrouper is deprecated in favor of pandas.Grouper (GH16747)
cdate_range has been deprecated in favor of bdate_range(), which has gained weekmask and holidays parameters for building custom frequency date ranges. See the documentation for more details (GH17596)
passing categories or ordered kwargs to Series.astype() is deprecated, in favor of passing a CategoricalDtype (GH17636)
.get_value and .set_value on Series, DataFrame, Panel, SparseSeries, and SparseDataFrame are deprecated in favor of using the .iat[] or .at[] accessors (GH15269)
Passing a non-existent column in .to_excel(..., columns=) is deprecated and will raise a KeyError in the future (GH17295)
raise_on_error parameter to Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask() is deprecated, in favor of errors= (GH14968)
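Using DataFrame.rename_axis() and Series.rename_axis() to alter index or column labels is now deprecated in favor of using .rename. rename_axis may still be used to alter the name of the index or columns (GH17833).
reindex_axis() has been deprecated in favor of reindex(). See here for more (GH17833).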
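As a sketch of the replacement for the deprecated .get_value and .set_value from the list above, scalar access by label uses .at[] and scalar access by position uses .iat[]. The frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=["foo", "bar", "baz"])

# Read a single value by label, then by integer position.
by_label = df.at["bar", "A"]
by_position = df.iat[1, 0]

# Write a single value by label.
df.at["bar", "A"] = 20
```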

Series.select and DataFrame.select#

The Series.select() and DataFrame.select() methods are deprecated in favor of using df.loc[labels.map(crit)] (GH12401)

In [72]: df = pd.DataFrame({'A': [1, 2, 3]}, index=['foo', 'bar', 'baz'])

In [3]: df.select(lambda x: x in ['bar', 'baz'])
FutureWarning: select is deprecated and will be removed in a future release. You can use .loc[crit] as a replacement
Out[3]:
     A
bar  2
baz  3

In [73]: df.loc[df.index.map(lambda x: x in ['bar', 'baz'])]
Out[73]: 
     A
bar  2
baz  3

[2 rows x 1 columns]

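Series.argmax and Series.argmin#

The behavior of Series.argmax() and Series.argmin() has been deprecated in favor of Series.idxmax() and Series.idxmin(), respectively (GH16830).

For compatibility with NumPy arrays, pd.Series implements argmax and argmin. Since pandas 0.13.0, argmax has been an alias for pandas.Series.idxmax(), and argmin has been an alias for pandas.Series.idxmin(). They return the label of the maximum or minimum, rather than the position.

We've deprecated the current behavior of Series.argmax and Series.argmin. Using either of these will emit a FutureWarning. Use Series.idxmax() if you want the label of the maximum. Use Series.values.argmax() if you want the position of the maximum. Likewise for the minimum. In a future release Series.argmax and Series.argmin will return the position of the maximum or minimum.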
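The two recommended spellings give different kinds of answers, which is easiest to see on a Series with a non-integer index. A minimal sketch with illustrative names:

```python
import pandas as pd

s = pd.Series([1, 5, 3], index=["a", "b", "c"])

# Label of the maximum value.
label_of_max = s.idxmax()

# Integer position of the maximum value, computed on the underlying array.
position_of_max = s.values.argmax()
```

Both spellings are unambiguous regardless of which behavior Series.argmax has in the installed version.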

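Removal of prior version deprecations/changes#

read_excel() has dropped the has_index_names parameter (GH10967)
The pd.options.display.height configuration has been dropped (GH3663)
The pd.options.display.line_width configuration has been dropped (GH2881)
The pd.options.display.mpl_style configuration has been dropped (GH12190)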
Index has dropped the .sym_diff() method in favor of .symmetric_difference() (GH12591)
Categorical has dropped the .order() and .sort() methods in favor of .sort_values() (GH12882)
eval() and DataFrame.eval() have changed the default of inplace from None to False (GH11149)
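The function get_offset_name has been dropped in favor of the .freqstr attribute for an offset (GH11834)
pandas no longer tests for compatibility with hdf5 files created with pandas < 0.11 (GH17404).

Performance improvements#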

Improved performance of instantiating SparseDataFrame (GH16773)
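Series.dt no longer performs frequency inference, yielding a large speedup when accessing the attribute (GH17210)
Improved performance of set_categories() by not materializing the values (GH17508)
Timestamp.microsecond no longer re-computes on attribute access (GH17331)
Improved performance of the CategoricalIndex for data that is already categorical dtype (GH17513)
Improved performance of RangeIndex.min() and RangeIndex.max() by using RangeIndex properties to perform the computations (GH17607)

Documentation changes#

Several NaT method docstrings (e.g. NaT.ctime()) were incorrect (GH17327)
The documentation has had references to versions < v0.17 removed and cleaned up (GH17442, GH17404 & GH17504)

Bug fixes#

Conversion#

Bug in assignment against datetime-like data with int may incorrectly convert to datetime-like (GH14145)
Bug in assignment against int64 data with np.ndarray with float64 dtype may keep int64 dtype (GH14001)
Fixed the return type of IntervalIndex.is_non_overlapping_monotonic to be a Python bool for consistency with similar attributes/methods. Previously it returned a numpy.bool_. (GH17237)
Bug in IntervalIndex.is_non_overlapping_monotonic when intervals are closed on both sides and overlap at a point (GH16560)
Bug in Series.fillna() returning a frame when inplace=True and value is a dict (GH16156)
Bug in Timestamp.weekday_name returning a UTC-based weekday name when localized to a timezone (GH17354)
Bug in Timestamp.replace when replacing tzinfo around DST changes (GH15683)
Bug in Timedelta construction and arithmetic that would not propagate the Overflow exception (GH17367)
Bug in astype() converting to object dtype when passed extension type classes (DatetimeTZDtype, CategoricalDtype) rather than instances. Now a TypeError is raised when a class is passed (GH17780).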
Bug in to_numeric() in which elements were not always being coerced to numeric when errors='coerce' (GH17007, GH17125)
Bug in DataFrame and Series constructors where range objects are converted to int32 dtype on Windows instead of int64 (GH16804)

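Indexing#

When called with a null slice (e.g. df.iloc[:]), the .iloc and .loc indexers return a shallow copy of the original object. Previously they returned the original object. (GH13873).
When called on an unsorted MultiIndex, the loc indexer now will raise UnsortedIndexError only if proper slicing is used on non-sorted levels (GH16734).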
Fixes regression in 0.20.3 when indexing with a string on a TimedeltaIndex (GH16896).
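Fixed TimedeltaIndex.get_loc() handling of np.timedelta64 inputs (GH16909).
Fixed MultiIndex.sort_index() ordering when the ascending argument is a list, but not all levels are specified, or are in a different order (GH16934).
Fixed a bug where indexing with np.inf caused an OverflowError to be raised (GH16957)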
Bug in reindexing on an empty CategoricalIndex (GH16770)
Fixes DataFrame.loc for setting with alignment and tz-aware DatetimeIndex (GH16889)
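Avoids IndexError when passing an Index or Series to .iloc with older numpy (GH17193)
Allow unicode empty strings as placeholders in multilevel columns in Python 2 (GH17099)
Bug in .iloc when used with inplace addition or assignment and an int indexer on a MultiIndex causing the wrong indexes to be read from and written to (GH17148)
Bug in .isin() in which checking membership in empty Series objects raised an error (GH16991)
Bug in CategoricalIndex reindexing in which specified indices containing duplicates were not being respected (GH17323)
Bug in intersection of RangeIndex with negative step (GH17296)
Bug in IntervalIndex where performing a scalar lookup fails for included right endpoints of non-overlapping monotonic decreasing indexes (GH16417, GH17271)
Bug in DataFrame.first_valid_index() and DataFrame.last_valid_index() when there is no valid entry (GH17400)
Bug in Series.rename() when called with a callable, incorrectly altering the name of the Series, rather than the name of the Index. (GH17407)
Bug in String.str_get() raising IndexError instead of inserting NaNs when using a negative index. (GH17704)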

IO#

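Bug in read_hdf() when reading a timezone-aware index from a fixed format HDFStore (GH17618)
Bug in read_csv() in which columns were not being thoroughly de-duplicated (GH17060)
Bug in read_csv() in which specified column names were not being thoroughly de-duplicated (GH17095)
Bug in read_csv() in which non-integer values for the header argument generated an unhelpful / unrelated error message (GH16338)
Bug in read_csv() in which memory management issues in exception handling, under certain conditions, would cause the interpreter to segfault (GH14696, GH16798).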
Bug in read_csv() when called with low_memory=False in which a CSV with at least one column > 2GB in size would incorrectly raise a MemoryError (GH16798).
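Bug in read_csv() when called with a single-element list header would return a DataFrame of all NaN values (GH7757)
Bug in DataFrame.to_csv() defaulting to 'ascii' encoding instead of 'utf-8' (GH17097)
Bug in read_stata() where value labels could not be read when using an iterator (GH16923)
Bug in read_stata() where the index was not set (GH16342)
Bug in read_html() where the import check fails when run in multiple threads (GH16928)
Bug in read_csv() where automatic delimiter detection caused a TypeError to be thrown when a bad line was encountered rather than the correct error message (GH13374)
Bug in DataFrame.to_html() with notebook=True where DataFrames with named indices or non-MultiIndex indices had undesired horizontal or vertical alignment for column or row labels, respectively (GH16792)
Bug in DataFrame.to_html() in which there was no validation of the justify parameter (GH17527)
Bug in HDFStore.select() when reading a contiguous mixed-data table featuring VLArray (GH17021)
Bug in to_json() where several conditions (including objects with unprintable symbols, objects with deep recursion, overlong labels) caused segfaults instead of raising the appropriate exception (GH14256)

Plotting#

Bug in plotting methods using secondary_y and fontsize not setting the secondary axis font size (GH12565)
Bug when plotting timedelta and datetime dtypes on the y-axis (GH16953)
Line plots no longer assume monotonic x data when calculating xlims; they now show the entire line even for unsorted x data. (GH11310, GH11471)
With matplotlib 2.0.0 and above, calculation of x limits for line plots is left to matplotlib, so that its new default settings are applied. (GH15495)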
Bug in Series.plot.bar or DataFrame.plot.bar with y not respecting user-passed color (GH16822)
Bug causing plotting.parallel_coordinates to reset the random seed when using random colors (GH17525)

GroupBy/resample/rolling#

Bug in DataFrame.resample(...).size() where an empty DataFrame did not return a Series (GH14962)
Bug in infer_freq() causing indices with 2-day gaps during the working week to be wrongly inferred as business daily (GH16624)
Bug in .rolling(...).quantile() which incorrectly used different defaults than Series.quantile() and DataFrame.quantile() (GH9413, GH16211)
Bug in groupby.transform() that would coerce boolean dtypes back to float (GH16875)
Bug in Series.resample(...).apply() where an empty Series modified the source index and did not return the name of a Series (GH14313)
Bug in .rolling(...).apply(...) with a DataFrame with a DatetimeIndex, a window of a timedelta-convertible and min_periods >= 1 (GH15305)
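Bug in DataFrame.groupby where index and column keys were not recognized correctly when the number of keys equaled the number of elements on the groupby axis (GH16859)
Bug in groupby.nunique() with TimeGrouper which cannot handle NaT correctly (GH17575)
Bug in DataFrame.groupby where a single level selection from a MultiIndex unexpectedly sorts (GH17537)
Bug in DataFrame.groupby where a spurious warning is raised when a Grouper object is used to override an ambiguous column name (GH17383)
Bug in TimeGrouper where results differ when it is passed as a list and as a scalar (GH17530)

Sparse#

Bug in SparseSeries raising AttributeError when a dictionary is passed in as data (GH16905)
Bug in SparseDataFrame.fillna() not filling all NaNs when the frame was instantiated from a SciPy sparse matrix (GH16112)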
Bug in SparseSeries.unstack() and SparseDataFrame.stack() (GH16614, GH15045)
Bug in make_sparse() treating two numeric/boolean data, which have same bits, as same when array dtype is object (GH17574)
SparseArray.all() and SparseArray.any() are now implemented to handle SparseArray; these were used but not implemented (GH17570)

Reshaping#

Joining/Merging with a non unique PeriodIndex raised a TypeError (GH16871)
Bug in crosstab() where non-aligned series of integers were cast to float (GH17005)
Bug in merging with categorical dtypes with datetimelikes incorrectly raised a TypeError (GH16900)
Bug when using isin() on a large object series and a large comparison array (GH16012)
Fixes a regression from 0.20: Series.aggregate() and DataFrame.aggregate() allow dictionaries as return values again (GH16741)
Fixes dtype of result with integer dtype input, from pivot_table() when called with margins=True (GH17013)
Bug in crosstab() where passing two Series with the same name raised a KeyError (GH13279)
Series.argmin(), Series.argmax(), and their counterparts on DataFrame and groupby objects now work correctly with floating point data that contains infinite values (GH13595).
Bug in unique() where checking a tuple of strings raised a TypeError (GH17108)
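Bug in concat() where the order of the result index was unpredictable if it contained non-comparable elements (GH17344)
Fixes a regression when sorting by multiple columns on a datetime64 dtype Series with NaT values (GH16836)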
Bug in pivot_table() where the result's columns did not preserve the categorical dtype of columns when dropna was False (GH17842)
Bug in DataFrame.drop_duplicates where dropping with non-unique column names raised a ValueError (GH17836)
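Bug in unstack() which, when called on a list of levels, would discard the fillna argument (GH13971)
Bug in the alignment of range objects and other list-likes with DataFrame, leading to operations being performed row-wise instead of column-wise (GH17901)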
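
To make the item about infinite values above concrete, here is a small sketch on a float Series containing both infinities. It uses idxmax() and idxmin() rather than argmax() and argmin(), on the assumption that the label-based methods behave the same across pandas versions, whereas the meaning of argmax/argmin changed in later releases. The data is illustrative only.

```python
import numpy as np
import pandas as pd

# Float data containing both infinities (illustrative values).
s = pd.Series([1.0, np.inf, -np.inf, 2.0])

# Label of the maximum: the entry holding +inf.
label_of_max = s.idxmax()

# Label of the minimum: the entry holding -inf.
label_of_min = s.idxmin()

print(label_of_max, label_of_min)
```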

Numeric#

Bug in .clip() when axis=1 and a list-like threshold is passed; previously this raised ValueError (GH15390)
Series.clip() and DataFrame.clip() now treat NA values for upper and lower arguments as None instead of raising ValueError (GH17276).
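
A minimal sketch of the NA handling described in the last item: a scalar NaN passed as lower is expected to act like None, so only the upper bound applies. The values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 5, 10])

# lower=np.nan is treated as "no lower bound"; only upper=6 takes effect.
clipped = s.clip(lower=np.nan, upper=6)

# For comparison: the same call with lower=None.
expected = s.clip(lower=None, upper=6)

print(clipped.tolist())
```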

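Categorical#

Bug in Series.isin() when called with a categorical (GH16639)
Bug in the categorical constructor with empty values and categories causing .categories to be an empty Float64Index rather than an empty Index with object dtype (GH17248)
Bug in categorical operations with Series.cat not preserving the original Series' name (GH17509)
Bug in DataFrame.merge() failing for categorical columns with boolean/int data types (GH17187)
Bug in constructing a Categorical/CategoricalDtype when the specified categories are of categorical type (GH17884).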
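
As a sketch of the Series.isin() item above, the snippet below passes a Categorical as the set of values to test against. The sample values are illustrative and are not taken from GH16639.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])

# The values to match are supplied as a Categorical instead of a plain list.
mask = s.isin(pd.Categorical(["a", "c"]))

print(mask.tolist())
```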

PyPy#

Compatibility with PyPy in read_csv() with usecols=[<unsorted ints>] and read_json() (GH17351)
Split tests into cases for CPython and PyPy where needed, which highlights the fragility of index matching with float('nan'), np.nan and NAT (GH17351)
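Fix to support PyPy in DataFrame.memory_usage(). Objects on PyPy do not have a fixed size, so an approximation is used instead (GH17228)

Other#

Bug where some inplace operators were not being wrapped and produced a copy when invoked (GH12962)
Bug in eval() where the inplace parameter was being incorrectly handled (GH16732)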
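
The inplace parameter mentioned in the eval() item can be sketched with DataFrame.eval(), passing the flag explicitly in both directions. The column names and values here are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# inplace=False: a new DataFrame is returned and df is left untouched.
result = df.eval("c = a + b", inplace=False)
unchanged = "c" not in df.columns

# inplace=True: df itself gains the new column and the call returns None.
ret = df.eval("d = a + b", inplace=True)

print(result["c"].tolist(), df["d"].tolist(), ret)
```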

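Contributors#

A total of 206 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.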

3553x +
Aaron Barber
Adam Gleave +
Adam Smith +
AdamShamlian +
Adrian Liaw +
Alan Velasco +
Alan Yee +
Alex B +
Alex Lubbock +
Alex Marchenko +
Alex Rychyk +
Amol K +
Andreas Winkler
Andrew +
Andrew 亮
André Jonasson +
Becky Sweger
Berkay +
Bob Haffner +
Bran Yang
Brian Tu +
Brock Mendel +
Carol Willing +
Carter Green +
Chankey Pathak +
Chris
Chris Billington
Chris Filo Gorgolewski +
Chris Kerr
Chris M +
Chris Mazzullo +
Christian Prinoth
Christian Stade-Schuldt
Christoph Moehl +
DSM
Daniel Chen +
Daniel Grady
Daniel Himmelstein
Dave Willmer
David Cook
David Gwynne
David Read +
Dillon Niederhut +
Douglas Rudd
Eric Stein +
Eric Wieser +
Erik Fredriksen
Florian Wilhelm +
Floris Kint +
Forbidden Donut
Gabe F +
Giftlin +
Giftlin Rajaiah +
Giulio Pepe +
Guilherme Beltramini
Guillem Borrell +
Hanmin Qin +
Hendrik Makait +
Hugues Valois
Hussain Tamboli +
Iva Miholic +
Jan Novotný +
Jan Rudolph
Jean Helie +
Jean-Baptiste Schiratti +
Jean-Mathieu Deschenes
Jeff Knupp +
Jeff Reback
Jeff Tratner
JennaVergeynst
JimStearns206
Joel Nothman
John W. O'Brien
Jon Crall +
Jon Mease
Jonathan J. Helmus +
Joris Van den Bossche
JosephWagner
Juarez Bochi
Julian Kuhlmann +
Karel De Brabandere
Kassandra Keeton +
Keiron Pizzey +
Keith Webber
Kernc
Kevin Sheppard
Kirk Hansen +
Licht Takeuchi +
Lucas Kushner +
Mahdi Ben Jelloul +
Makarov Andrey +
Malgorzata Turzanska +
Marc Garcia +
Margaret Sy +
MarsGuy +
Matt Bark +
Matthew Roeschke
Matti Picus
Mehmet Ali "Mali" Akmanalp
Michael Gasvoda +
Michael Penkov +
Milo +
Morgan Stuart +
Morgan243 +
Nathan Ford +
Nick Eubank
Nick Garvey +
Oleg Shteynbuk +
P-Tillmann +
Pankaj Pandey
Patrick Luo
Patrick O'Melveny
Paul Reidy +
Paula +
Peter Quackenbush
Peter Yanovich +
Phillip Cloud
Pierre Haessig
Pietro Battiston
Pradyumna Reddy Chinthala
Prasanjit Prakash
RobinFiveWords
Ryan Hendrickson
Sam Foo
Sangwoong Yoon +
Simon Gibbons +
SimonBaron
Steven Cutting +
Sudeep +
Sylvia +
T N +
Telt
Thomas A Caswell
Tim Swast +
Tom Augspurger
Tong SHEN
Tuan +
Utkarsh Upadhyay +
Vincent La +
Vivek +
WANG Aiyong
WBare
Wes McKinney
XF +
Yi Liu +
Yosuke Nakabayashi +
aaron315 +
abarber4gh +
aernlund +
agustín méndez +
andymaheshw +
ante328 +
aviolov +
bpraggastis
cbertinato +
cclauss +
chernrick
chris-b1
dkamm +
dwkenefick
economy
faic +
fding253 +
gfyoung
guygoldberg +
hhuuggoo +
huashuai +
ian
iulia +
jaredsnyder
jbrockmendel +
jdeschenes
jebob +
jschendel +
keitakurita
kernc +
kiwirob +
kjford
linebp
lloydkirk
louispotok +
majiang +
manikbhandari +
matthiashuschle +
mattip
maxwasserman +
mjlove12 +
nmartensen +
pandas-docs-bot +
parchd-1 +
philipphanemann +
rdk1024 +
reidy-p +
ri938
ruiann +
rvernica +
s-weigand +
scotthavard92 +
skwbc +
step4me +
tobycheese +
topper-123 +
tsdlovell
ysau +
zzgao +
