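What's new in 0.23.0 (May 15, 2018)#

This is a major release from 0.22.0 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements, along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include: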

Round-trippable JSON format with 'table' orient.
Instantiation from dicts respects order for Python 3.6+.
Dependent column arguments for assign.
Merging / sorting on a combination of columns and index levels.
Extending pandas with custom types.
Excluding unobserved categories from groupby.
Changes to make output shape of DataFrame.apply consistent.

Check the API Changes and deprecations before updating.

Warning

Starting January 1, 2019, pandas feature releases will support Python 3 only. See Dropping Python 2.7 for more.

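What's new in v0.23.0

New features
Backwards incompatible API changes
Deprecations
Removal of prior version deprecations/changes
Performance improvements
Documentation changes
Bug fixes
- Categorical
- Datetimelike
- Timedelta
- Timezones
- Offsets
- Numeric
- Strings
- Indexing
- MultiIndex
- IO
- Plotting
- GroupBy/resample/rolling
- Sparse
- Reshaping
- Other
Contributors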

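New features#

JSON read/write round-trippable with `orient='table'`#

A DataFrame can now be written to and subsequently read back via JSON while preserving metadata through usage of the orient='table' argument (see GH18912 and GH9146). Previously, none of the available orient values guaranteed the preservation of dtypes and index names, amongst other metadata.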

In [1]: df = pd.DataFrame({'foo': [1, 2, 3, 4],
   ...:                    'bar': ['a', 'b', 'c', 'd'],
   ...:                    'baz': pd.date_range('2018-01-01', freq='d', periods=4),
   ...:                    'qux': pd.Categorical(['a', 'b', 'c', 'c'])},
   ...:                   index=pd.Index(range(4), name='idx'))
   ...: 

In [2]: df
Out[2]: 
     foo bar        baz qux
idx                        
0      1   a 2018-01-01   a
1      2   b 2018-01-02   b
2      3   c 2018-01-03   c
3      4   d 2018-01-04   c

[4 rows x 4 columns]

In [3]: df.dtypes
Out[3]: 
foo             int64
bar            object
baz    datetime64[ns]
qux          category
Length: 4, dtype: object

In [4]: df.to_json('test.json', orient='table')

In [5]: new_df = pd.read_json('test.json', orient='table')

In [6]: new_df
Out[6]: 
     foo bar        baz qux
idx                        
0      1   a 2018-01-01   a
1      2   b 2018-01-02   b
2      3   c 2018-01-03   c
3      4   d 2018-01-04   c

[4 rows x 4 columns]

In [7]: new_df.dtypes
Out[7]: 
foo             int64
bar            object
baz    datetime64[ns]
qux          category
Length: 4, dtype: object

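Please note that the string index is not supported with the round-trip format, as it is used by default in write_json to indicate a missing index name.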

In [8]: df.index.name = 'index'

In [9]: df.to_json('test.json', orient='table')

In [10]: new_df = pd.read_json('test.json', orient='table')

In [11]: new_df
Out[11]: 
   foo bar        baz qux
0    1   a 2018-01-01   a
1    2   b 2018-01-02   b
2    3   c 2018-01-03   c
3    4   d 2018-01-04   c

[4 rows x 4 columns]

In [12]: new_df.dtypes
Out[12]: 
foo             int64
bar            object
baz    datetime64[ns]
qux          category
Length: 4, dtype: object

Method `.assign()` accepts dependent arguments#

DataFrame.assign() now accepts dependent keyword arguments for Python 3.6 and later (see also PEP 468). Later keyword arguments may now refer to earlier ones if the argument is a callable. See the documentation here (GH14207)

In [13]: df = pd.DataFrame({'A': [1, 2, 3]})

In [14]: df
Out[14]: 
   A
0  1
1  2
2  3

[3 rows x 1 columns]

In [15]: df.assign(B=df.A, C=lambda x: x['A'] + x['B'])
Out[15]: 
   A  B  C
0  1  1  2
1  2  2  4
2  3  3  6

[3 rows x 3 columns]

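Warning

This may subtly change the behavior of your code when you are using .assign() to update an existing column. Previously, callables referring to other variables being updated would get the "old" values.

Previous behavior: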

In [2]: df = pd.DataFrame({"A": [1, 2, 3]})

In [3]: df.assign(A=lambda df: df.A + 1, C=lambda df: df.A * -1)
Out[3]:
   A  C
0  2 -1
1  3 -2
2  4 -3

New behavior:

In [16]: df.assign(A=df.A + 1, C=lambda df: df.A * -1)
Out[16]: 
   A  C
0  2 -2
1  3 -3
2  4 -4

[3 rows x 2 columns]

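Merging on a combination of columns and index levels#

Strings passed to DataFrame.merge() as the on, left_on, and right_on parameters may now refer to either column names or index level names. This enables merging DataFrame instances on a combination of index levels and columns without resetting indexes. See the Merge on columns and levels documentation section. (GH14355)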

In [17]: left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')

In [18]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
   ....:                      'B': ['B0', 'B1', 'B2', 'B3'],
   ....:                      'key2': ['K0', 'K1', 'K0', 'K1']},
   ....:                     index=left_index)
   ....: 

In [19]: right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')

In [20]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
   ....:                       'D': ['D0', 'D1', 'D2', 'D3'],
   ....:                       'key2': ['K0', 'K0', 'K0', 'K1']},
   ....:                      index=right_index)
   ....: 

In [21]: left.merge(right, on=['key1', 'key2'])
Out[21]: 
       A   B key2   C   D
key1                     
K0    A0  B0   K0  C0  D0
K1    A2  B2   K0  C1  D1
K2    A3  B3   K1  C3  D3

[3 rows x 5 columns]

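Sorting by a combination of columns and index levels#

Strings passed to DataFrame.sort_values() as the by parameter may now refer to either column names or index level names. This enables sorting DataFrame instances by a combination of index levels and columns without resetting indexes. See the Sorting by Indexes and Values documentation section. (GH14353)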

# Build MultiIndex
In [22]: idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 2),
   ....:                                  ('b', 2), ('b', 1), ('b', 1)])
   ....: 

In [23]: idx.names = ['first', 'second']

# Build DataFrame
In [24]: df_multi = pd.DataFrame({'A': np.arange(6, 0, -1)},
   ....:                         index=idx)
   ....: 

In [25]: df_multi
Out[25]: 
              A
first second   
a     1       6
      2       5
      2       4
b     2       3
      1       2
      1       1

[6 rows x 1 columns]

# Sort by 'second' (index) and 'A' (column)
In [26]: df_multi.sort_values(by=['second', 'A'])
Out[26]: 
              A
first second   
b     1       1
      1       2
a     1       6
b     2       3
a     2       4
      2       5

[6 rows x 1 columns]

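Extending pandas with custom types (experimental)#

pandas now supports storing array-like objects that aren't necessarily 1-D NumPy arrays as columns in a DataFrame or values in a Series. This allows third-party libraries to implement extensions to NumPy's types, similar to how pandas implemented categoricals, datetimes with timezones, periods, and intervals.

As a demonstration, we'll use cyberpandas, which provides an IPArray type for storing IP addresses.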

In [1]: from cyberpandas import IPArray

In [2]: values = IPArray([
   ...:     0,
   ...:     3232235777,
   ...:     42540766452641154071740215577757643572
   ...: ])
   ...:
   ...:

IPArray isn't a normal 1-D NumPy array, but because it's a pandas ExtensionArray, it can be stored properly inside pandas' containers.

In [3]: ser = pd.Series(values)

In [4]: ser
Out[4]:
0                         0.0.0.0
1                     192.168.1.1
2    2001:db8:85a3::8a2e:370:7334
dtype: ip

Notice that the dtype is ip. The missing value semantics of the underlying array are respected:

In [5]: ser.isna()
Out[5]:
0     True
1    False
2    False
dtype: bool

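For more, see the extension types documentation. If you build an extension array, publicize it on our ecosystem page.

New `observed` keyword for excluding unobserved categories in `GroupBy`#

Grouping by a categorical includes the unobserved categories in the output. When grouping by multiple categorical columns, this means you get the cartesian product of all the categories, including combinations where there are no observations, which can result in a large number of groups. We have added a keyword observed to control this behavior; it defaults to observed=False for backward compatibility. (GH14942, GH8138, GH15217, GH17594, GH8669, GH20583, GH20902)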

In [27]: cat1 = pd.Categorical(["a", "a", "b", "b"],
   ....:                       categories=["a", "b", "z"], ordered=True)
   ....: 

In [28]: cat2 = pd.Categorical(["c", "d", "c", "d"],
   ....:                       categories=["c", "d", "y"], ordered=True)
   ....: 

In [29]: df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})

In [30]: df['C'] = ['foo', 'bar'] * 2

In [31]: df
Out[31]: 
   A  B  values    C
0  a  c       1  foo
1  a  d       2  bar
2  b  c       3  foo
3  b  d       4  bar

[4 rows x 4 columns]

To show all values, the previous behavior:

In [32]: df.groupby(['A', 'B', 'C'], observed=False).count()
Out[32]: 
         values
A B C          
a c bar       0
    foo       1
  d bar       1
    foo       0
  y bar       0
...         ...
z c foo       0
  d bar       0
    foo       0
  y bar       0
    foo       0

[18 rows x 1 columns]

To show only observed values:

In [33]: df.groupby(['A', 'B', 'C'], observed=True).count()
Out[33]: 
         values
A B C          
a c foo       1
  d bar       1
b c foo       1
  d bar       1

[4 rows x 1 columns]

For pivoting operations, this behavior is already controlled by the dropna keyword:

In [34]: cat1 = pd.Categorical(["a", "a", "b", "b"],
   ....:                       categories=["a", "b", "z"], ordered=True)
   ....: 

In [35]: cat2 = pd.Categorical(["c", "d", "c", "d"],
   ....:                       categories=["c", "d", "y"], ordered=True)
   ....: 

In [36]: df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})

In [37]: df
Out[37]: 
   A  B  values
0  a  c       1
1  a  d       2
2  b  c       3
3  b  d       4

[4 rows x 3 columns]

In [38]: pd.pivot_table(df, values='values', index=['A', 'B'],
   ....:                dropna=True)
   ....: 
Out[38]: 
     values
A B        
a c       1
  d       2
b c       3
  d       4

[4 rows x 1 columns]

In [39]: pd.pivot_table(df, values='values', index=['A', 'B'],
   ....:                dropna=False)
   ....: 
Out[39]: 
     values
A B        
a c     1.0
  d     2.0
  y     NaN
b c     3.0
  d     4.0
  y     NaN
z c     NaN
  d     NaN
  y     NaN

[9 rows x 1 columns]

Rolling/Expanding.apply() accepts `raw=False` to pass a `Series` to the function#

Series.rolling().apply(), DataFrame.rolling().apply(), Series.expanding().apply(), and DataFrame.expanding().apply() have gained a raw=None parameter. This is similar to DataFrame.apply(). If True, this parameter sends a np.ndarray to the applied function. If False, a Series is passed. The default is None, which preserves backward compatibility, so this defaults to True, sending a np.ndarray. In a future version the default will be changed to False, sending a Series. (GH5071, GH20584)

In [40]: s = pd.Series(np.arange(5), np.arange(5) + 1)

In [41]: s
Out[41]: 
1    0
2    1
3    2
4    3
5    4
Length: 5, dtype: int64

Pass a Series:

In [42]: s.rolling(2, min_periods=1).apply(lambda x: x.iloc[-1], raw=False)
Out[42]: 
1    0.0
2    1.0
3    2.0
4    3.0
5    4.0
Length: 5, dtype: float64

Mimic the original behavior of passing an ndarray:

In [43]: s.rolling(2, min_periods=1).apply(lambda x: x[-1], raw=True)
Out[43]: 
1    0.0
2    1.0
3    2.0
4    3.0
5    4.0
Length: 5, dtype: float64

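`DataFrame.interpolate` has gained the `limit_area` kwarg#

DataFrame.interpolate() has gained a limit_area parameter to allow further control of which NaNs are replaced. Use limit_area='inside' to fill only NaNs surrounded by valid values, or use limit_area='outside' to fill only NaNs outside the existing valid values while preserving those inside. (GH16284) See the full documentation here.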

In [44]: ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan,
   ....:                  np.nan, 13, np.nan, np.nan])
   ....: 

In [45]: ser
Out[45]: 
0     NaN
1     NaN
2     5.0
3     NaN
4     NaN
5     NaN
6    13.0
7     NaN
8     NaN
Length: 9, dtype: float64

Fill one consecutive inside value in both directions

In [46]: ser.interpolate(limit_direction='both', limit_area='inside', limit=1)
Out[46]: 
0     NaN
1     NaN
2     5.0
3     7.0
4     NaN
5    11.0
6    13.0
7     NaN
8     NaN
Length: 9, dtype: float64

Fill all consecutive outside values backward

In [47]: ser.interpolate(limit_direction='backward', limit_area='outside')
Out[47]: 
0     5.0
1     5.0
2     5.0
3     NaN
4     NaN
5     NaN
6    13.0
7     NaN
8     NaN
Length: 9, dtype: float64

Fill all consecutive outside values in both directions

In [48]: ser.interpolate(limit_direction='both', limit_area='outside')
Out[48]: 
0     5.0
1     5.0
2     5.0
3     NaN
4     NaN
5     NaN
6    13.0
7    13.0
8    13.0
Length: 9, dtype: float64

Function `get_dummies` now supports the `dtype` argument#

get_dummies() now accepts a dtype argument, which specifies a dtype for the new columns. The default remains uint8. (GH18330)

In [49]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

In [50]: pd.get_dummies(df, columns=['c']).dtypes
Out[50]: 
a      int64
b      int64
c_5    uint8
c_6    uint8
Length: 4, dtype: object

In [51]: pd.get_dummies(df, columns=['c'], dtype=bool).dtypes
Out[51]: 
a      int64
b      int64
c_5     bool
c_6     bool
Length: 4, dtype: object

Timedelta mod method#

mod (%) and divmod operations are now defined on Timedelta objects when operating with either timedelta-like or numeric arguments. See the documentation here. (GH19365)

In [52]: td = pd.Timedelta(hours=37)

In [53]: td % pd.Timedelta(minutes=45)
Out[53]: Timedelta('0 days 00:15:00')

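The example above covers only the % operator. As a minimal sketch of the divmod side of the same change (the variable names are illustrative, not from the release notes), the quotient and remainder can be obtained together:

```python
import pandas as pd

td = pd.Timedelta(hours=37)

# divmod with a timedelta-like divisor returns
# (integer quotient, Timedelta remainder).
quotient, remainder = divmod(td, pd.Timedelta(minutes=45))

# The % operator agrees with the remainder returned by divmod.
same_remainder = td % pd.Timedelta(minutes=45)

print(quotient, remainder)
```

Here 37 hours contain 49 whole 45-minute blocks, leaving 15 minutes over.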
Method `.rank()` handles `inf` values when `NaN` are present#

In previous versions, .rank() would assign inf elements NaN as their ranks. Now ranks are calculated properly. (GH6945)

In [54]: s = pd.Series([-np.inf, 0, 1, np.nan, np.inf])

In [55]: s
Out[55]: 
0   -inf
1    0.0
2    1.0
3    NaN
4    inf
Length: 5, dtype: float64

Previous behavior:

In [11]: s.rank()
Out[11]:
0    1.0
1    2.0
2    3.0
3    NaN
4    NaN
dtype: float64

Current behavior:

In [56]: s.rank()
Out[56]: 
0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
Length: 5, dtype: float64

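Furthermore, previously, if you ranked inf or -inf values together with NaN values, the calculation would not distinguish NaN from infinity when using the 'top' or 'bottom' argument.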

In [57]: s = pd.Series([np.nan, np.nan, -np.inf, -np.inf])

In [58]: s
Out[58]: 
0    NaN
1    NaN
2   -inf
3   -inf
Length: 4, dtype: float64

Previous behavior:

In [15]: s.rank(na_option='top')
Out[15]:
0    2.5
1    2.5
2    2.5
3    2.5
dtype: float64

Current behavior:

In [59]: s.rank(na_option='top')
Out[59]: 
0    1.5
1    1.5
2    3.5
3    3.5
Length: 4, dtype: float64

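These bugs were squashed:

Bug in DataFrame.rank() and Series.rank() when method='dense' and pct=True in which percentile ranks were not being used with the number of distinct observations (GH15630)
Bug in Series.rank() and DataFrame.rank() when ascending='False' failed to return correct ranks for infinity if NaN were present (GH19538)
Bug in DataFrameGroupBy.rank() where ranks were incorrect when both infinity and NaN were present (GH20561)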
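As a small illustration of the corrected ranking (a sketch with made-up data, not taken from the linked issues), dense percentile ranks are scaled by the number of distinct values, and NaN sorts after positive infinity when na_option='bottom':

```python
import numpy as np
import pandas as pd

# Dense ranks are 1, 1, 2; with pct=True they are divided by the
# number of distinct values (2), not by the number of rows (3).
dense_pct = pd.Series([1, 1, 2]).rank(method='dense', pct=True)

# With na_option='bottom', NaN receives the highest rank,
# after positive infinity.
bottom = pd.Series([np.nan, np.inf, 0.0]).rank(na_option='bottom')

print(dense_pct.tolist())
print(bottom.tolist())
```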

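`Series.str.cat` has gained the `join` kwarg#

Previously, Series.str.cat() did not, in contrast to most of pandas, align Series on their index before concatenation (see GH18657). The method has now gained a keyword join to control the manner of alignment; see the examples below and here.

In v0.23, join will default to None (meaning no alignment), but this default will change to 'left' in a future version of pandas.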

In [60]: s = pd.Series(['a', 'b', 'c', 'd'])

In [61]: t = pd.Series(['b', 'd', 'e', 'c'], index=[1, 3, 4, 2])

In [62]: s.str.cat(t)
Out[62]: 
0    NaN
1     bb
2     cc
3     dd
Length: 4, dtype: object

In [63]: s.str.cat(t, join='left', na_rep='-')
Out[63]: 
0    a-
1    bb
2    cc
3    dd
Length: 4, dtype: object

Furthermore, Series.str.cat() now works for CategoricalIndex as well (previously raised a ValueError; see GH20842).

`DataFrame.astype` performs column-wise conversion to `Categorical`#

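DataFrame.astype() can now perform column-wise conversion to Categorical by supplying the string 'category' or a CategoricalDtype. Previously, attempting this would raise a NotImplementedError. See the Object creation section of the documentation for more details and examples. (GH12860, GH18099)

Supplying the string 'category' performs column-wise conversion, with only labels appearing in a given column set as categories: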

In [64]: df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})

In [65]: df = df.astype('category')

In [66]: df['A'].dtype
Out[66]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)

In [67]: df['B'].dtype
Out[67]: CategoricalDtype(categories=['b', 'c', 'd'], ordered=False)

Supplying a CategoricalDtype will make the categories in each column consistent with the supplied dtype:

In [68]: from pandas.api.types import CategoricalDtype

In [69]: df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})

In [70]: cdt = CategoricalDtype(categories=list('abcd'), ordered=True)

In [71]: df = df.astype(cdt)

In [72]: df['A'].dtype
Out[72]: CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)

In [73]: df['B'].dtype
Out[73]: CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)

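Other enhancements#

Unary + now permitted for Series and DataFrame as numeric operator (GH16073)
Better support for to_excel() output with the xlsxwriter engine. (GH16149)
pandas.tseries.frequencies.to_offset() now accepts leading '+' signs, e.g. '+1h'. (GH18171)
MultiIndex.unique() now supports the level= argument, to get unique values from a specific index level (GH17896)
pandas.io.formats.style.Styler now has method hide_index() to determine whether the index will be rendered in output (GH14194)
pandas.io.formats.style.Styler now has method hide_columns() to determine whether columns will be hidden in output (GH14194)
Improved wording of ValueError raised in to_datetime() when unit= is passed with a non-convertible value (GH14350)
Series.fillna() now accepts a Series or a dict as a value for a categorical dtype (GH17033)
pandas.read_clipboard() updated to use qtpy, falling back to PyQt5 and then PyQt4, adding compatibility with Python 3 and multiple python-qt bindings (GH17722)
Improved wording of ValueError raised in read_csv() when the usecols argument cannot match all columns. (GH17301)
DataFrame.corrwith() now silently drops non-numeric columns when passed a Series. Before, an exception was raised (GH18570).
IntervalIndex now supports time zone aware Interval objects (GH18537, GH18538)
Series() / DataFrame() tab completion also returns identifiers in the first level of a MultiIndex(). (GH16326)
read_excel() has gained the nrows parameter (GH16645)
DataFrame.append() can now in more cases preserve the type of the calling DataFrame's columns (e.g. if both are CategoricalIndex) (GH18359)
DataFrame.to_json() and Series.to_json() now accept an index argument which allows the user to exclude the index from the JSON output (GH17394)
IntervalIndex.to_tuples() has gained the na_tuple parameter to control whether NA is returned as a tuple of NA, or NA itself (GH18756)
Categorical.rename_categories, CategoricalIndex.rename_categories and Series.cat.rename_categories can now take a callable as their argument (GH18862)
Interval and IntervalIndex have gained a length attribute (GH18789)
Resampler objects now have a functioning pipe method. Previously, calls to pipe were diverted to the mean method (GH17905).
is_scalar() now returns True for DateOffset objects (GH18943).
DataFrame.pivot() now accepts a list for the values= kwarg (GH17160).
Added pandas.api.extensions.register_dataframe_accessor(), pandas.api.extensions.register_series_accessor(), and pandas.api.extensions.register_index_accessor(), accessors for libraries downstream of pandas to register custom accessors like .cat on pandas objects. See Registering Custom Accessors for more (GH14781).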
IntervalIndex.astype now supports conversions between subtypes when passed an IntervalDtype (GH19197)
IntervalIndex and its associated constructor methods (from_arrays, from_breaks, from_tuples) have gained a dtype parameter (GH19262)
Added pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing() and pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing() (GH17015)
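For subclassed DataFrames, DataFrame.apply() will now preserve the Series subclass (if defined) when passing the data to the applied function (GH19822)
DataFrame.from_dict() now accepts a columns argument that can be used to specify the column names when orient='index' is used (GH18529)
Added option display.html.use_mathjax so MathJax can be disabled when rendering tables in Jupyter notebooks (GH19856, GH19824)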
DataFrame.replace() now supports the method parameter, which can be used to specify the replacement method when to_replace is a scalar, list or tuple and value is None (GH19632)
Timestamp.month_name(), DatetimeIndex.month_name(), and Series.dt.month_name() are now available (GH12805)
Timestamp.day_name() and DatetimeIndex.day_name() are now available to return day names with a specified locale (GH12806)
DataFrame.to_sql() now performs a multi-value insert if the underlying connection supports it, rather than inserting row by row. SQLAlchemy dialects supporting multi-value inserts include: mysql, postgresql, sqlite and any dialect with supports_multivalues_insert. (GH14315, GH8953)
read_html() now accepts a displayed_only keyword argument to control whether or not hidden elements are parsed (True by default) (GH20027)
read_html() now reads all <tbody> elements in a <table>, not just the first. (GH20690)
Rolling.quantile() and Expanding.quantile() now accept the interpolation keyword, linear by default (GH20497)
zip compression is supported via compression=zip in DataFrame.to_pickle(), Series.to_pickle(), DataFrame.to_csv(), Series.to_csv(), DataFrame.to_json(), Series.to_json(). (GH17778)
WeekOfMonth constructor now supports n=0 (GH20517).
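DataFrame and Series now support the matrix multiplication (@) operator (GH10259) for Python >= 3.5
Updated DataFrame.to_gbq() and pandas.read_gbq() signature and documentation to reflect changes from the pandas-gbq library version 0.4.0. Adds intersphinx mapping to the pandas-gbq library. (GH20564)
Added a new writer for exporting Stata dta files in version 117, StataWriter117. This format supports exporting strings with lengths up to 2,000,000 characters (GH16450)
to_hdf() and read_hdf() now accept an errors keyword argument to control encoding error handling (GH20835)
cut() has gained the duplicates='raise'|'drop' option to control whether to raise on duplicated edges (GH20947)
date_range(), timedelta_range(), and interval_range() now return a linearly spaced index if start, stop, and periods are specified, but freq is not. (GH20808, GH20983, GH20976)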
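Several of the enhancements listed above are easiest to see side by side. The following is a minimal sketch with made-up data (the variable names are illustrative only) covering the @ operator, linearly spaced date_range(), month_name() / day_name(), and MultiIndex.unique(level=):

```python
import pandas as pd

# Matrix multiplication with the @ operator (Python >= 3.5).
a = pd.DataFrame([[1, 2], [3, 4]])
b = pd.DataFrame([[0, 1], [1, 0]])
product = a @ b

# Linearly spaced dates when start, end and periods are given
# but freq is not.
dates = pd.date_range('2018-01-01', '2018-01-05', periods=3)

# Month and day names from a Timestamp (default locale).
ts = pd.Timestamp('2018-05-15')
names = (ts.month_name(), ts.day_name())

# Unique values from a single level of a MultiIndex.
mi = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)])
first_level = mi.unique(level=0)

print(product)
print(dates)
print(names)
print(first_level)
```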

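Backwards incompatible API changes#

Dependencies have increased minimum versions#

We have updated our minimum supported versions of dependencies (GH15184). If installed, we now require:

Package	Minimum Version	Required	Issue
python-dateutil	2.5.0	X	GH15184
openpyxl	2.4.0		GH15184
beautifulsoup4	4.2.1		GH20082
setuptools	24.2.0		GH20698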

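Instantiation from dicts preserves dict insertion order for Python 3.6+#

Until Python 3.6, dicts in Python had no formally defined ordering. For Python 3.6 and later, dicts are ordered by insertion order; see PEP 468. pandas will use the dict's insertion order when creating a Series or DataFrame from a dict and you are using Python 3.6 or later. (GH19884)

Previous behavior (and current behavior if on Python < 3.6):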

In [16]: pd.Series({'Income': 2000,
   ....:            'Expenses': -1500,
   ....:            'Taxes': -200,
   ....:            'Net result': 300})
Out[16]:
Expenses     -1500
Income        2000
Net result     300
Taxes         -200
dtype: int64

Note the Series above is ordered alphabetically by the index values.

New behavior (for Python >= 3.6):

In [74]: pd.Series({'Income': 2000,
   ....:            'Expenses': -1500,
   ....:            'Taxes': -200,
   ....:            'Net result': 300})
   ....: 
Out[74]: 
Income        2000
Expenses     -1500
Taxes         -200
Net result     300
Length: 4, dtype: int64

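Notice that the Series is now ordered by insertion order. This new behavior is used for all relevant pandas types (Series, DataFrame, SparseSeries and SparseDataFrame).

If you wish to retain the old behavior while using Python >= 3.6, you can use .sort_index():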

In [75]: pd.Series({'Income': 2000,
   ....:            'Expenses': -1500,
   ....:            'Taxes': -200,
   ....:            'Net result': 300}).sort_index()
   ....: 
Out[75]: 
Expenses     -1500
Income        2000
Net result     300
Taxes         -200
Length: 4, dtype: int64

Deprecate Panel#

Panel was deprecated in the 0.20.x release, showing as a DeprecationWarning. Using Panel will now show a FutureWarning. The recommended way to represent 3-D data is with a MultiIndex on a DataFrame via the to_frame() method, or with the xarray package. pandas provides a to_xarray() method to automate this conversion (GH13563, GH18324).

In [75]: import pandas._testing as tm

In [76]: p = tm.makePanel()

In [77]: p
Out[77]:
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 3 (major_axis) x 4 (minor_axis)
Items axis: ItemA to ItemC
Major_axis axis: 2000-01-03 00:00:00 to 2000-01-05 00:00:00
Minor_axis axis: A to D

Convert to a MultiIndex DataFrame

In [78]: p.to_frame()
Out[78]:
                     ItemA     ItemB     ItemC
major      minor
2000-01-03 A      0.469112  0.721555  0.404705
           B     -1.135632  0.271860 -1.039268
           C      0.119209  0.276232 -1.344312
           D     -2.104569  0.113648 -0.109050
2000-01-04 A     -0.282863 -0.706771  0.577046
           B      1.212112 -0.424972 -0.370647
           C     -1.044236 -1.087401  0.844885
           D     -0.494929 -1.478427  1.643563
2000-01-05 A     -1.509059 -1.039575 -1.715002
           B     -0.173215  0.567020 -1.157892
           C     -0.861849 -0.673690  1.075770
           D      1.071804  0.524988 -1.469388

[12 rows x 3 columns]

Convert to an xarray DataArray

In [79]: p.to_xarray()
Out[79]:
<xarray.DataArray (items: 3, major_axis: 3, minor_axis: 4)>
array([[[ 0.469112, -1.135632,  0.119209, -2.104569],
        [-0.282863,  1.212112, -1.044236, -0.494929],
        [-1.509059, -0.173215, -0.861849,  1.071804]],

       [[ 0.721555,  0.27186 ,  0.276232,  0.113648],
        [-0.706771, -0.424972, -1.087401, -1.478427],
        [-1.039575,  0.56702 , -0.67369 ,  0.524988]],

       [[ 0.404705, -1.039268, -1.344312, -0.10905 ],
        [ 0.577046, -0.370647,  0.844885,  1.643563],
        [-1.715002, -1.157892,  1.07577 , -1.469388]]])
Coordinates:
  * items       (items) object 'ItemA' 'ItemB' 'ItemC'
  * major_axis  (major_axis) datetime64[ns] 2000-01-03 2000-01-04 2000-01-05
  * minor_axis  (minor_axis) object 'A' 'B' 'C' 'D'

pandas.core.common removals#

The following error & warning messages are removed from pandas.core.common (GH13634, GH19769):

PerformanceWarning
UnsupportedFunctionCall
UnsortedIndexError
AbstractMethodError

These are available to import from pandas.errors (since 0.19.0).
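Code that previously imported these names from pandas.core.common can switch to the public location. A minimal sketch of the updated import, with a quick look at the base classes of the four names:

```python
# Import from the public pandas.errors module instead of the
# private pandas.core.common module.
from pandas.errors import (
    AbstractMethodError,
    PerformanceWarning,
    UnsortedIndexError,
    UnsupportedFunctionCall,
)

# Each name is an ordinary warning or exception class, so it can be
# used in warnings filters and except clauses as usual.
for cls in (AbstractMethodError, PerformanceWarning,
            UnsortedIndexError, UnsupportedFunctionCall):
    print(cls.__name__, cls.__mro__[1].__name__)
```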

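Changes to make output of `DataFrame.apply` consistent#

DataFrame.apply() was inconsistent when applying an arbitrary user-defined function that returned a list-like with axis=1. Several bugs and inconsistencies are resolved. If the applied function returns a Series, then pandas will return a DataFrame; otherwise a Series will be returned. This includes the case where a list-like (e.g. a tuple or list) is returned (GH16353, GH17437, GH17970, GH17348, GH17892, GH18573, GH17602, GH18775, GH18901, GH18919).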

In [76]: df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1,
   ....:                   columns=['A', 'B', 'C'])
   ....: 

In [77]: df
Out[77]: 
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

[6 rows x 3 columns]

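Previous behavior: if the returned shape happened to match the length of the original columns, this would return a DataFrame. If the returned shape did not match, a Series with lists was returned.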

In [3]: df.apply(lambda x: [1, 2, 3], axis=1)
Out[3]:
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

In [4]: df.apply(lambda x: [1, 2], axis=1)
Out[4]:
0    [1, 2]
1    [1, 2]
2    [1, 2]
3    [1, 2]
4    [1, 2]
5    [1, 2]
dtype: object

New behavior: when the applied function returns a list-like, this will now always return a Series.

In [78]: df.apply(lambda x: [1, 2, 3], axis=1)
Out[78]: 
0    [1, 2, 3]
1    [1, 2, 3]
2    [1, 2, 3]
3    [1, 2, 3]
4    [1, 2, 3]
5    [1, 2, 3]
Length: 6, dtype: object

In [79]: df.apply(lambda x: [1, 2], axis=1)
Out[79]: 
0    [1, 2]
1    [1, 2]
2    [1, 2]
3    [1, 2]
4    [1, 2]
5    [1, 2]
Length: 6, dtype: object

To have expanded columns, you can use result_type='expand'

In [80]: df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')
Out[80]: 
   0  1  2
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

[6 rows x 3 columns]

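To broadcast the result across the original columns (the old behavior for list-likes of the correct length), you can use result_type='broadcast'. The shape must match the original columns.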

In [81]: df.apply(lambda x: [1, 2, 3], axis=1, result_type='broadcast')
Out[81]: 
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

[6 rows x 3 columns]

Returning a Series allows one to control the exact return structure and column names:

In [82]: df.apply(lambda x: pd.Series([1, 2, 3], index=['D', 'E', 'F']), axis=1)
Out[82]: 
   D  E  F
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

[6 rows x 3 columns]

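Concatenation will no longer sort#

In a future version of pandas, pandas.concat() will no longer sort the non-concatenation axis when it is not already aligned. The current behavior is the same as the previous one (sorting), but a warning is now issued when sort is not specified and the non-concatenation axis is not aligned (GH4588).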

In [83]: df1 = pd.DataFrame({"a": [1, 2], "b": [1, 2]}, columns=['b', 'a'])

In [84]: df2 = pd.DataFrame({"a": [4, 5]})

In [85]: pd.concat([df1, df2])
Out[85]: 
     b  a
0  1.0  1
1  2.0  2
0  NaN  4
1  NaN  5

[4 rows x 2 columns]

To keep the previous behavior (sorting) and silence the warning, pass sort=True

In [86]: pd.concat([df1, df2], sort=True)
Out[86]: 
   a    b
0  1  1.0
1  2  2.0
0  4  NaN
1  5  NaN

[4 rows x 2 columns]

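To accept the future behavior (no sorting), pass sort=False

Note that this change also applies to DataFrame.append(), which has also received a sort keyword for controlling this behavior.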
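The sort=False case is not shown above. As a short sketch reusing the same df1 and df2, passing sort explicitly makes the resulting column order unambiguous either way:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [1, 2]}, columns=['b', 'a'])
df2 = pd.DataFrame({'a': [4, 5]})

# sort=False keeps the columns in order of first appearance.
unsorted_result = pd.concat([df1, df2], sort=False)

# sort=True sorts the non-concatenation axis (the columns here).
sorted_result = pd.concat([df1, df2], sort=True)

print(list(unsorted_result.columns))
print(list(sorted_result.columns))
```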

Build changes#

Building pandas for development now requires cython >= 0.24 (GH18613)
Building from source now explicitly requires setuptools in setup.py (GH18113)
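Updated conda recipe to be in compliance with conda-build 3.0+ (GH18002)

Index division by zero fills correctly#

Division operations on Index and subclasses will now fill division of positive numbers by zero with np.inf, division of negative numbers by zero with -np.inf, and 0 / 0 with np.nan. This matches existing Series behavior. (GH19322, GH19347)

Previous behavior: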

In [6]: index = pd.Int64Index([-1, 0, 1])

In [7]: index / 0
Out[7]: Int64Index([0, 0, 0], dtype='int64')

# Previous behavior yielded different results depending on the type of zero in the divisor
In [8]: index / 0.0
Out[8]: Float64Index([-inf, nan, inf], dtype='float64')

In [9]: index = pd.UInt64Index([0, 1])

In [10]: index / np.array([0, 0], dtype=np.uint64)
Out[10]: UInt64Index([0, 0], dtype='uint64')

In [11]: pd.RangeIndex(1, 5) / 0
ZeroDivisionError: integer division or modulo by zero

Current behavior:

In [12]: index = pd.Int64Index([-1, 0, 1])
# division by zero gives -infinity where negative,
# +infinity where positive, and NaN for 0 / 0
In [13]: index / 0

# The result of division by zero should not depend on
# whether the zero is int or float
In [14]: index / 0.0

In [15]: index = pd.UInt64Index([0, 1])
In [16]: index / np.array([0, 0], dtype=np.uint64)

In [17]: pd.RangeIndex(1, 5) / 0
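The session above no longer shows its outputs. As a hedged sketch of the behavior described, the snippet below uses the generic pd.Index constructor rather than Int64Index (an assumption made so that it also runs on pandas versions where Int64Index is no longer available):

```python
import numpy as np
import pandas as pd

index = pd.Index([-1, 0, 1])

# Negative / 0 gives -inf, 0 / 0 gives NaN, positive / 0 gives +inf.
by_int_zero = index / 0

# The result does not depend on whether the zero is an int or a float.
by_float_zero = index / 0.0

# RangeIndex no longer raises ZeroDivisionError.
range_result = pd.RangeIndex(1, 5) / 0

print(by_int_zero)
print(by_float_zero)
print(range_result)
```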

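Extraction of matching patterns from strings#

By default, extracting matching patterns from strings with str.extract() used to return a Series if a single group was being extracted (a DataFrame if more than one group was extracted). As of pandas 0.23.0, str.extract() always returns a DataFrame, unless expand is set to False. Finally, None was an accepted value for the expand parameter (which was equivalent to False), but now raises a ValueError. (GH11386)

Previous behavior: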

In [1]: s = pd.Series(['number 10', '12 eggs'])

In [2]: extracted = s.str.extract(r'.*(\d\d).*')

In [3]: extracted
Out [3]:
0    10
1    12
dtype: object

In [4]: type(extracted)
Out [4]:
pandas.core.series.Series

New behavior:

In [87]: s = pd.Series(['number 10', '12 eggs'])

In [88]: extracted = s.str.extract(r'.*(\d\d).*')

In [89]: extracted
Out[89]: 
    0
0  10
1  12

[2 rows x 1 columns]

In [90]: type(extracted)
Out[90]: pandas.core.frame.DataFrame

To restore the previous behavior, simply set expand to False:

In [91]: s = pd.Series(['number 10', '12 eggs'])

In [92]: extracted = s.str.extract(r'.*(\d\d).*', expand=False)

In [93]: extracted
Out[93]: 
0    10
1    12
Length: 2, dtype: object

In [94]: type(extracted)
Out[94]: pandas.core.series.Series

Default value for the `ordered` parameter of `CategoricalDtype`#

The default value of the ordered parameter for CategoricalDtype has changed from False to None to allow updating of categories without impacting ordered. Behavior should remain consistent for downstream objects, such as Categorical (GH18790)

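In previous versions, the default value for the ordered parameter was False. This could lead to the ordered parameter unintentionally being changed from True to False when users attempted to update categories without explicitly specifying ordered, as it would silently default to False. The new behavior for ordered=None is to retain the existing value of ordered.

New behavior: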

In [2]: from pandas.api.types import CategoricalDtype

In [3]: cat = pd.Categorical(list('abcaba'), ordered=True, categories=list('cba'))

In [4]: cat
Out[4]:
[a, b, c, a, b, a]
Categories (3, object): [c < b < a]

In [5]: cdt = CategoricalDtype(categories=list('cbad'))

In [6]: cat.astype(cdt)
Out[6]:
[a, b, c, a, b, a]
Categories (4, object): [c < b < a < d]

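Notice in the example above that the converted Categorical has retained ordered=True. Had the default value for ordered remained False, the converted Categorical would have become unordered, despite ordered=False never being explicitly specified. To change the value of ordered, explicitly pass it to the new dtype, e.g. CategoricalDtype(categories=list('cbad'), ordered=False).

Note that the unintentional conversion of ordered discussed above did not arise in previous versions due to separate bugs that prevented astype from doing any type of category-to-category conversion (GH10696, GH18593). These bugs have been fixed in this release, and motivated changing the default value of ordered.

Better pretty-printing of DataFrames in a terminal#

Previously, the default value for the maximum number of columns was pd.options.display.max_columns=20. This meant that relatively wide data frames would not fit within the terminal width, and pandas would introduce line breaks to display these 20 columns. This resulted in output that was relatively difficult to read.

If Python runs in a terminal, the maximum number of columns is now determined automatically so that the printed data frame fits within the current terminal width (pd.options.display.max_columns=0) (GH17023). If Python runs as a Jupyter kernel (such as the Jupyter QtConsole or a Jupyter notebook, as well as in many IDEs), this value cannot be inferred automatically and is thus set to 20 as in previous versions. In a terminal, this results in much nicer output.

Note that if you don't like the new default, you can always set this option yourself. To revert to the old setting, you can run this line: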

pd.options.display.max_columns = 20
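If the old limit is only wanted for part of a session, the option can also be set temporarily. A minimal sketch using pd.option_context, which restores the previous value on exit:

```python
import pandas as pd

before = pd.get_option('display.max_columns')

# Inside the block the old default of 20 columns applies.
with pd.option_context('display.max_columns', 20):
    inside = pd.get_option('display.max_columns')

# On exit the previous setting is restored.
after = pd.get_option('display.max_columns')

print(before, inside, after)
```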

Datetimelike API changes#

The default Timedelta constructor now accepts an ISO 8601 Duration string as an argument (GH19040)
Subtracting NaT from a Series with dtype='datetime64[ns]' returns a Series with dtype='timedelta64[ns]' instead of dtype='datetime64[ns]' (GH18808)
Addition or subtraction of NaT from TimedeltaIndex will return TimedeltaIndex instead of DatetimeIndex (GH19124)
DatetimeIndex.shift() and TimedeltaIndex.shift() will now raise NullFrequencyError (which subclasses ValueError, which was raised in older versions) when the index object frequency is None (GH19147)
Addition and subtraction of NaN from a Series with dtype='timedelta64[ns]' will raise a TypeError instead of treating the NaN as NaT (GH19274)
Division of NaT by datetime.timedelta will now return NaN instead of raising (GH17876)
Operations between a Series with dtype='datetime64[ns]' and a PeriodIndex will correctly raise TypeError (GH18850)
Subtraction of Series with timezone-aware dtype='datetime64[ns]' with mismatched timezones will raise TypeError instead of ValueError (GH18817)
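Timestamp will no longer silently ignore unused or invalid tz or tzinfo keyword arguments (GH17690)
Timestamp will no longer silently ignore invalid freq arguments (GH5168)
CacheableOffset and WeekDay are no longer available in the pandas.tseries.offsets module (GH17830)
pandas.tseries.frequencies.get_freq_group() and pandas.tseries.frequencies.DAYS are removed from the public API (GH18034)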
Series.truncate() and DataFrame.truncate() will raise a ValueError if the index is not sorted instead of an unhelpful KeyError (GH17935)
Series.first and DataFrame.first will now raise a TypeError rather than NotImplementedError when index is not a DatetimeIndex (GH20725).
Series.last and DataFrame.last will now raise a TypeError rather than NotImplementedError when index is not a DatetimeIndex (GH20725).
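Restricted DateOffset keyword arguments. Previously, DateOffset subclasses allowed arbitrary keyword arguments, which could lead to unexpected behavior. Now, only valid arguments will be accepted. (GH17176, GH18226).
pandas.merge() provides a more informative error message when trying to merge on timezone-aware and timezone-naive columns (GH15800)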
For DatetimeIndex and TimedeltaIndex with freq=None, addition or subtraction of integer-dtyped array or Index will raise NullFrequencyError instead of TypeError (GH19895)
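The Timestamp constructor now accepts a nanosecond keyword or positional argument (GH18898)
DatetimeIndex will now raise an AttributeError when the tz attribute is set after instantiation (GH3746)
DatetimeIndex with a pytz timezone will now return a consistent pytz timezone (GH18595)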
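A few of the constructor and arithmetic changes in this list can be checked directly. The following is a minimal sketch with illustrative values (not taken from the linked issues) covering the ISO 8601 Timedelta string, the nanosecond keyword of Timestamp, and NaT division:

```python
import datetime

import pandas as pd

# ISO 8601 duration string accepted by the Timedelta constructor.
iso_td = pd.Timedelta('P0DT1H30M0S')

# The nanosecond component can be passed as a keyword argument.
ts = pd.Timestamp(year=2018, month=5, day=15, nanosecond=5)

# Dividing NaT by a datetime.timedelta returns NaN instead of raising.
nat_ratio = pd.NaT / datetime.timedelta(days=1)

print(iso_td, ts.nanosecond, nat_ratio)
```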

Other API changes#

Series.astype() and Index.astype() with an incompatible dtype will now raise a TypeError rather than a ValueError (GH18231)
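Series construction with an object dtyped tz-aware datetime and dtype=object specified will now return an object dtyped Series; previously this would infer the datetime dtype (GH18231)
A Series of dtype=category constructed from an empty dict will now have categories of dtype=object rather than dtype=float64, consistent with the case in which an empty list is passed (GH18515)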
All-NaN levels in a MultiIndex are now assigned float rather than object dtype, promoting consistency with Index (GH17929).
Level names of a MultiIndex (when not None) are now required to be unique: trying to create a MultiIndex with repeated names will raise a ValueError (GH18872)
Both construction and renaming of Index/MultiIndex with non-hashable name/names will now raise TypeError (GH20527)
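Index.map() can now accept Series and dictionary input objects (GH12756, GH18482, GH18509).
DataFrame.unstack() will now default to filling with np.nan for object columns. (GH12815)
The IntervalIndex constructor will raise if the closed parameter conflicts with how the input data is inferred to be closed (GH18421)
Inserting missing values into indexes will work for all types of indexes and automatically insert the correct type of missing value (NaN, NaT, etc.) regardless of the type passed in (GH18295)
When created with duplicate labels, MultiIndex now raises a ValueError. (GH17464)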
Series.fillna() now raises a TypeError instead of a ValueError when passed a list, tuple or DataFrame as a value (GH18293)
pandas.DataFrame.merge() no longer casts a float column to object when merging on int and float columns (GH16572)
pandas.merge() now raises a ValueError when trying to merge on incompatible data types (GH9780)
The default NA value for UInt64Index has changed from 0 to NaN, which impacts methods that mask with NA, such as UInt64Index.where() (GH18398)
Refactored setup.py to use find_packages rather than explicitly listing out all subpackages (GH18535)
Rearranged the order of keyword arguments in read_excel() to align with read_csv() (GH16672)
wide_to_long() previously kept numeric-like suffixes as object dtype. Now they are cast to numeric if possible (GH17627)
In read_excel(), the comment argument is now exposed as a named parameter (GH18735)
The options html.border and mode.use_inf_as_null were deprecated in prior versions, these will now show FutureWarning rather than a DeprecationWarning (GH19003)
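IntervalIndex and IntervalDtype no longer support categorical, object, and string subtypes (GH19016)
IntervalDtype now returns True when compared against 'interval' regardless of subtype, and IntervalDtype.name now returns 'interval' regardless of subtype (GH18980)
KeyError is now raised instead of ValueError by the drop() methods when dropping a non-existent element in an axis with duplicates (GH19186)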
Series.to_csv() now accepts a compression argument that works in the same way as the compression argument in DataFrame.to_csv() (GH18958)
Set operations (union, difference...) on IntervalIndex with incompatible index types will now raise a TypeError rather than a ValueError (GH19329)
DateOffset objects render more simply, e.g. <DateOffset: days=1> instead of <DateOffset: kwds={'days': 1}> (GH19403)
Categorical.fillna now validates its value and method keyword arguments. It now raises when both or none are specified, matching the behavior of Series.fillna() (GH19682)
pd.to_datetime('today') 现在返回一个日期时间，与 pd.Timestamp('today') ；以前 pd.to_datetime('today') 返回了一个 .normalized() 日期时间 (GH19935 )
Series.str.replace() 现在使用可选的 regex 关键字，当设置为 False ，使用文字字符串替换而不是正则表达式替换 (GH16808 )
DatetimeIndex.strftime() 和 PeriodIndex.strftime() 现在返回一个 Index 而不是用数字数组来与类似的访问器保持一致 (GH20127 )
如果指定了更长的索引，则从长度为1的列表构建序列不再广播此列表 (GH19714 ， GH20391 )。
DataFrame.to_dict() 使用 orient='index' 不再将仅具有整型和浮点型列的DataFrame的整型列强制转换为浮点型 (GH18580 )
传递给的用户定义的函数 Series.rolling().aggregate() ， DataFrame.rolling().aggregate() ，或其不断扩大的表亲，现在将始终被传给一位 Series ，而不是 np.array ； .apply() 只有 raw 关键字，请参见 here 。这与他的签名一致。 .aggregate() 横跨大Pandas (GH20584 )
滚压式和膨胀式加高 NotImplementedError 在迭代时 (GH11704 )。
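
As an illustration of the regex keyword of Series.str.replace() listed above, the following is a minimal sketch. The sample strings and variable names are made up for the example and do not come from the release notes.

```python
import pandas as pd

# Illustrative data: one string containing a literal dot, one without.
s = pd.Series(["a.c", "abc"])

# regex=False: "a.c" is matched as a literal string, so only the first
# element changes.
literal = s.str.replace("a.c", "X", regex=False)

# regex=True: "." matches any character, so both elements change.
pattern = s.str.replace("a.c", "X", regex=True)

print(literal.tolist())
print(pattern.tolist())
```

Passing regex explicitly keeps the intent clear regardless of the default in the installed pandas version.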

Deprecations#

Series.from_array and SparseSeries.from_array are deprecated. Use the normal constructors Series(..) and SparseSeries(..) instead (GH18213).
DataFrame.as_matrix is deprecated. Use DataFrame.values instead (GH18458).
Series.asobject, DatetimeIndex.asobject, PeriodIndex.asobject and TimeDeltaIndex.asobject have been deprecated. Use .astype(object) instead (GH18572)
Grouping by a tuple of keys now emits a FutureWarning and is deprecated. In the future, a tuple passed to 'by' will always refer to a single key that is the actual tuple, instead of treating the tuple as multiple keys. To retain the previous behavior, use a list instead of a tuple (GH18314)
Series.valid is deprecated. Use Series.dropna() instead (GH18800).
read_excel() has deprecated the skip_footer parameter. Use skipfooter instead (GH18836)
ExcelFile.parse() has deprecated sheetname in favor of sheet_name for consistency with read_excel() (GH20920).
The is_copy attribute is deprecated and will be removed in a future version (GH18801).
IntervalIndex.from_intervals is deprecated in favor of the IntervalIndex constructor (GH19263)
DataFrame.from_items is deprecated. Use DataFrame.from_dict() instead, or DataFrame.from_dict(OrderedDict()) if you wish to preserve the key order (GH17320, GH17312)
Indexing a MultiIndex or a FloatIndex with a list containing some missing keys will now show a FutureWarning, which is consistent with other types of indexes (GH17758).
The broadcast parameter of .apply() is deprecated in favor of result_type='broadcast' (GH18577)
The reduce parameter of .apply() is deprecated in favor of result_type='reduce' (GH18577)
The order parameter of factorize() is deprecated and will be removed in a future release (GH19727)
Timestamp.weekday_name, DatetimeIndex.weekday_name, and Series.dt.weekday_name are deprecated in favor of Timestamp.day_name(), DatetimeIndex.day_name(), and Series.dt.day_name() (GH12806)
pandas.tseries.plotting.tsplot is deprecated. Use Series.plot() instead (GH18627)
Index.summary() is deprecated and will be removed in a future version (GH18217)
NDFrame.get_ftype_counts() is deprecated and will be removed in a future version (GH18243)
The convert_datetime64 parameter in DataFrame.to_records() has been deprecated and will be removed in a future version. The NumPy bug motivating this parameter has been resolved. The default value for this parameter has also changed from True to None (GH18160).
Series.rolling().apply(), DataFrame.rolling().apply(), Series.expanding().apply(), and DataFrame.expanding().apply() have deprecated passing an np.array by default. One will need to pass the new raw parameter to be explicit about what is passed (GH20584)
The data, base, strides, flags and itemsize properties of the Series and Index classes have been deprecated and will be removed in a future version (GH20419).
DatetimeIndex.offset is deprecated. Use DatetimeIndex.freq instead (GH20716)
Floor division between an integer ndarray and a Timedelta is deprecated. Divide by Timedelta.value instead (GH19761)
Setting PeriodIndex.freq (which was not guaranteed to work correctly) is deprecated. Use PeriodIndex.asfreq() instead (GH20678)
Index.get_duplicates() is deprecated and will be removed in a future version (GH20239)
The previous default behavior of negative indices in Categorical.take is deprecated. In a future version it will change from meaning missing values to meaning positional indices from the right. The future behavior is consistent with Series.take() (GH20664).
Passing multiple axes to the axis parameter in DataFrame.dropna() has been deprecated and will be removed in a future version (GH20987)
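
A few of the replacements named in the deprecations above can be sketched as follows. This is only an illustration: the sample values and variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2018-05-15")
s = pd.Series([1.0, np.nan, 3.0])

# Timestamp.day_name() replaces the deprecated Timestamp.weekday_name.
day = ts.day_name()

# Series.dropna() replaces the deprecated Series.valid().
valid = s.dropna()

# Pass raw explicitly so rolling().apply() knows whether the function
# receives a NumPy array (raw=True) or a Series (raw=False).
rolled = pd.Series([1.0, 2.0, 3.0]).rolling(2).apply(lambda x: x.sum(), raw=True)

print(day)
print(valid.tolist())
print(rolled.tolist())
```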

Removal of prior version deprecations/changes#

Warnings against the obsolete usage Categorical(codes, categories), which were emitted for instance when the first two arguments to Categorical() had different dtypes, and recommended the use of Categorical.from_codes, have now been removed (GH8074)
The levels and labels attributes of a MultiIndex can no longer be set directly (GH4039).
pd.tseries.util.pivot_annual has been removed (deprecated since v0.19). Use pivot_table instead (GH18370)
pd.tseries.util.isleapyear has been removed (deprecated since v0.19). Use the .is_leap_year property of datetime-likes instead (GH18370)
pd.ordered_merge has been removed (deprecated since v0.19). Use pd.merge_ordered instead (GH18459)
The SparseList class has been removed (GH14007)
The pandas.io.wb and pandas.io.data stub modules have been removed (GH13735)
Categorical.from_array has been removed (GH13854)
The freq and how parameters have been removed from the rolling/expanding/ewm methods of DataFrame and Series (deprecated since v0.18). Instead, resample before calling the methods. (GH18601 & GH18668)
DatetimeIndex.to_datetime, Timestamp.to_datetime, PeriodIndex.to_datetime, and Index.to_datetime have been removed (GH8254, GH14096, GH14113)
read_csv() has dropped the skip_footer parameter (GH13386)
read_csv() has dropped the as_recarray parameter (GH13373)
read_csv() has dropped the buffer_lines parameter (GH13360)
read_csv() has dropped the compact_ints and use_unsigned parameters (GH13323)
The Timestamp class has dropped the offset attribute in favor of freq (GH13593)
The Series, Categorical, and Index classes have dropped the reshape method (GH13012)
pandas.tseries.frequencies.get_standard_freq has been removed in favor of pandas.tseries.frequencies.to_offset(freq).rule_code (GH13874)
The freqstr keyword has been removed from pandas.tseries.frequencies.to_offset in favor of freq (GH13874)
The Panel4D and PanelND classes have been removed (GH13776)
The Panel class has dropped the to_long and toLong methods (GH19077)
The options display.line_with and display.height are removed in favor of display.width and display.max_rows respectively (GH4391, GH19107)
The labels attribute of the Categorical class has been removed in favor of Categorical.codes (GH7768)
The flavor parameter has been removed from the to_sql() method (GH13611)
The modules pandas.tools.hashing and pandas.util.hashing have been removed (GH16223)
The top-level functions pd.rolling_*, pd.expanding_* and pd.ewm* have been removed (Deprecated since v0.18). Instead, use the DataFrame/Series methods rolling, expanding and ewm (GH18723)
Imports from pandas.core.common for functions such as is_datetime64_dtype are now removed. These are located in pandas.api.types. (GH13634, GH19769)
The infer_dst keyword in Series.tz_localize(), DatetimeIndex.tz_localize() and DatetimeIndex have been removed. infer_dst=True is equivalent to ambiguous='infer', and infer_dst=False to ambiguous='raise' (GH7963).
When .resample() was changed from an eager to a lazy operation, like .groupby() in v0.18.0, we put in place compatibility (with a FutureWarning), so operations would continue to work. This is now fully removed, so a Resampler will no longer forward compat operations (GH20554)
Remove long deprecated axis=None parameter from .replace() (GH20271)
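
For code that still relies on the removed top-level rolling functions or on type checks imported from pandas.core.common, a migration might look roughly like the sketch below. The data and variable names are illustrative assumptions.

```python
import pandas as pd
from pandas.api.types import is_datetime64_dtype

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Method form that takes the place of the removed pd.rolling_mean(s, 2).
rolling_mean = s.rolling(window=2).mean()

# Type-checking helpers are imported from pandas.api.types rather than
# from pandas.core.common.
dates = pd.Series(pd.to_datetime(["2018-01-01", "2018-01-02"]))
is_dt = is_datetime64_dtype(dates)

print(rolling_mean.tolist())
print(is_dt)
```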

Performance improvements#

Indexers on Series or DataFrame no longer create a reference cycle (GH17956)
Added a keyword argument, cache, to to_datetime() that improved the performance of converting duplicate datetime arguments (GH11665)
DateOffset arithmetic performance is improved (GH18218)
Converting a Series of Timedelta objects to days, seconds, etc. is sped up through vectorization of the underlying methods (GH18092)
Improved performance of .map() with a Series/dict input (GH15081)
The overridden Timedelta properties of days, seconds and microseconds have been removed, leveraging their built-in Python versions instead (GH18242)
Series construction will reduce the number of copies made of the input data in certain cases (GH17449)
Improved performance of Series.dt.date() and DatetimeIndex.date() (GH18058)
Improved performance of Series.dt.time() and DatetimeIndex.time() (GH18461)
Improved performance of IntervalIndex.symmetric_difference() (GH18475)
Improved performance of DatetimeIndex and Series arithmetic operations with Business-Month and Business-Quarter frequencies (GH18489)
Series() / DataFrame() tab completion limits to 100 values, for better performance. (GH18587)
Improved performance of DataFrame.median() with axis=1 when bottleneck is not installed (GH16468)
Improved performance of MultiIndex.get_loc() for large indexes, at the cost of a reduction in performance for small ones (GH18519)
Improved performance of MultiIndex.remove_unused_levels() when there are no unused levels, at the cost of a reduction in performance when there are (GH19289)
Improved performance of Index.get_loc() for non-unique indexes (GH19478)
Improved performance of pairwise .rolling() and .expanding() with .cov() and .corr() operations (GH17917)
Improved performance of pandas.core.groupby.GroupBy.rank() (GH15779)
Improved performance of variable .rolling() on .min() and .max() (GH19521)
Improved performance of pandas.core.groupby.GroupBy.ffill() and pandas.core.groupby.GroupBy.bfill() (GH11296)
Improved performance of pandas.core.groupby.GroupBy.any() and pandas.core.groupby.GroupBy.all() (GH15435)
Improved performance of pandas.core.groupby.GroupBy.pct_change() (GH19165)
Improved performance of Series.isin() in the case of categorical dtypes (GH20003)
Improved performance of getattr(Series, attr) when the Series has certain index types. This manifested in slow printing of large Series with a DatetimeIndex (GH19764)
Fixed a performance regression for GroupBy.nth() and GroupBy.last() with some object columns (GH19283)
Improved performance of pandas.core.arrays.Categorical.from_codes() (GH18501)
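
The cache keyword of to_datetime() mentioned above is aimed at inputs with many repeated values. A minimal sketch, assuming an illustrative list of duplicated date strings:

```python
import pandas as pd

# Many repeated date strings: with cache=True each unique string is
# converted once and the result is reused for its duplicates.
dates = ["2018-01-01", "2018-01-02"] * 1000

cached = pd.to_datetime(dates, cache=True)
uncached = pd.to_datetime(dates, cache=False)

# Both calls return the same values; only the conversion strategy differs.
print(cached.equals(uncached))
```

Any speedup depends on how many duplicates the input contains, so it is worth timing on real data before relying on it.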

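Documentation changes#

Thanks to all of the contributors who participated in the pandas Documentation Sprint, which took place on March 10th. We had about 500 participants from over 30 locations across the world. You should notice that many of the API docstrings have greatly improved.

There were too many simultaneous contributions to include a release note for each improvement, but this GitHub search should give you an idea of how many docstrings were improved.

Special thanks to Marc Garcia for organizing the sprint. For more information, read the NumFOCUS blogpost recapping the sprint.

Changed spelling of "numpy" to "NumPy", and "python" to "Python". (GH19017)
Consistency when introducing code samples, using either colon or period. Rewrote some sentences for greater clarity, added more dynamic references to functions, methods and classes. (GH18941, GH18948, GH18973, GH19017)
Added a reference to DataFrame.assign() in the concatenate section of the merging documentation (GH18665)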

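Bug fixes#

Categorical#

Warning

A class of bugs was introduced in pandas 0.21 with CategoricalDtype that affects the correctness of operations like merge, concat, and indexing when comparing multiple unordered Categorical arrays that have the same categories, but in a different order. We highly recommend upgrading or manually aligning your categories before doing these operations.

Bug in Categorical.equals returning the wrong result when comparing two unordered Categorical arrays with the same categories, but in a different order (GH16603)
Bug in pandas.api.types.union_categoricals() returning the wrong result for unordered categoricals with the categories in a different order. This affected pandas.concat() with Categorical data (GH19096).
Bug in pandas.merge() returning the wrong result when joining on an unordered Categorical that had the same categories but in a different order (GH19551)
Bug in CategoricalIndex.get_indexer() returning the wrong result when target was an unordered Categorical that had the same categories as self but in a different order (GH19551)
Bug in Index.astype() with a categorical dtype where the resultant index is not converted to a CategoricalIndex for all types of index (GH18630)
Bug in Series.astype() and Categorical.astype() where existing categorical data does not get updated (GH10696, GH18593)
Bug in Series.str.split() with expand=True incorrectly raising an IndexError on empty strings (GH20002).
Bug in Index constructor with dtype=CategoricalDtype(...) where the categories and ordered are not maintained (GH19032)
Bug in Series constructor with scalar and dtype=CategoricalDtype(...) where the categories and ordered are not maintained (GH19565)
Bug in Categorical.__iter__ not converting to Python types (GH19909)
Bug in pandas.factorize() returning the unique codes for the uniques. This now returns a Categorical with the same dtype as the input (GH19721)
Bug in pandas.factorize() including an item for missing values in the uniques return value (GH19721)
Bug in Series.take() with categorical data interpreting -1 in indices as missing value markers, rather than the last element of the Series (GH20664)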
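
For users who cannot upgrade right away, the warning above suggests manually aligning categories. One possible way to do that, sketched with made-up data, is Categorical.set_categories():

```python
import pandas as pd

# Two unordered Categoricals with the same categories in a different order.
a = pd.Categorical(["x", "y", "x"], categories=["x", "y"])
b = pd.Categorical(["x", "y", "x"], categories=["y", "x"])

# Align b to a's category order before merging, concatenating or comparing.
b_aligned = b.set_categories(a.categories)

same_order = list(b_aligned.categories) == list(a.categories)
same_values = a.equals(b_aligned)

print(same_order, same_values)
```

The values of b are unchanged by the alignment; only the order of its categories (and therefore its integer codes) is rewritten.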

Datetimelike#

Bug in Series.__sub__() where subtracting a non-nanosecond np.datetime64 object from a Series gave incorrect results (GH7996)
Bug in DatetimeIndex, TimedeltaIndex where addition and subtraction of zero-dimensional integer arrays gave incorrect results (GH19012)
Bug in DatetimeIndex and TimedeltaIndex where adding or subtracting an array-like of DateOffset objects either raised (np.array, pd.Index) or broadcast incorrectly (pd.Series) (GH18849)
Bug in Series.__add__() where adding a Series with dtype timedelta64[ns] to a timezone-aware DatetimeIndex incorrectly dropped timezone information (GH13905)
Adding a Period object to a datetime or Timestamp object will now correctly raise a TypeError (GH17983)
Bug in Timestamp where comparison with an array of Timestamp objects would result in a RecursionError (GH15183)
Bug in Series floor-division where operating on a scalar timedelta raises an exception (GH18846)
Bug in DatetimeIndex where the repr was not showing high-precision time values at the end of a day (e.g., 23:59:59.999999999) (GH19030)
Bug in .astype() to non-ns timedelta units would hold the incorrect dtype (GH19176, GH19223, GH12425)
Bug in subtracting Series from NaT incorrectly returning NaT (GH19158)
Bug in Series.truncate() which raises TypeError with a monotonic PeriodIndex (GH17717)
Bug in pct_change() where using periods and freq returned different length outputs (GH7292)
Bug in comparison of DatetimeIndex against None or datetime.date objects raising TypeError for == and != comparisons instead of all-False and all-True, respectively (GH19301)
Bug in Timestamp and to_datetime() where a string representing a barely out-of-bounds timestamp would be incorrectly rounded down instead of raising OutOfBoundsDatetime (GH19382)
Bug in Timestamp.floor() and DatetimeIndex.floor() where timestamps far in the future and past were not rounded correctly (GH19206)
Bug in to_datetime() where passing an out-of-bounds datetime with errors='coerce' and utc=True would raise OutOfBoundsDatetime instead of parsing to NaT (GH19612)
Bug in DatetimeIndex and TimedeltaIndex addition and subtraction where the name of the returned object was not always set consistently. (GH19744)
Bug in DatetimeIndex and TimedeltaIndex addition and subtraction where operations with numpy arrays raised TypeError (GH19847)
Bug in DatetimeIndex and TimedeltaIndex where setting the freq attribute was not fully supported (GH20678)

Timedelta#

Bug in Timedelta.__mul__() where multiplying by NaT returned NaT instead of raising a TypeError (GH19819)
Bug in Series with dtype='timedelta64[ns]' where addition or subtraction of TimedeltaIndex had results cast to dtype='int64' (GH17250)
Bug in Series with dtype='timedelta64[ns]' where addition or subtraction of TimedeltaIndex could return a Series with an incorrect name (GH19043)
Bug in Timedelta.__floordiv__() and Timedelta.__rfloordiv__() where dividing by many incompatible numpy objects was incorrectly allowed (GH18846)
Bug where dividing a scalar timedelta-like object by a TimedeltaIndex performed the reciprocal operation (GH19125)
Bug in TimedeltaIndex where division by a Series would return a TimedeltaIndex instead of a Series (GH19042)
Bug in Timedelta.__add__(), Timedelta.__sub__() where adding or subtracting a np.timedelta64 object would return another np.timedelta64 instead of a Timedelta (GH19738)
Bug in Timedelta.__floordiv__(), Timedelta.__rfloordiv__() where operating with a Tick object would raise a TypeError instead of returning a numeric value (GH19738)
Bug in Period.asfreq() where periods near datetime(1, 1, 1) could be converted incorrectly (GH19643, GH19834)
Bug in Timedelta.total_seconds() causing precision errors, for example Timedelta('30S').total_seconds()==30.000000000000004 (GH19458)
Bug in Timedelta.__rmod__() where operating with a numpy.timedelta64 returned a timedelta64 object instead of a Timedelta (GH19820)
Multiplication of TimedeltaIndex by TimedeltaIndex will now raise TypeError instead of raising ValueError in cases of length mismatch (GH19333)
Bug in indexing a TimedeltaIndex with a np.timedelta64 object which was raising a TypeError (GH20393)
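
Two of the Timedelta fixes above are easy to check by hand. The snippet below is a small sketch of the corrected behavior; the 30-second value mirrors the example in the note, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

td = pd.Timedelta("30s")

# total_seconds() returns exactly 30.0 rather than 30.000000000000004.
seconds = td.total_seconds()

# Adding a np.timedelta64 returns a pandas Timedelta, not a np.timedelta64.
total = td + np.timedelta64(1, "s")

print(seconds)
print(type(total).__name__, total)
```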

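Timezones#

Bug in creating a Series from an array that contains both tz-naive and tz-aware values, which resulted in a Series whose dtype was tz-aware instead of object (GH16406)
Bug in comparison of timezone-aware DatetimeIndex against NaT incorrectly raising TypeError (GH19276)
Bug in DatetimeIndex.astype() when converting between timezone-aware dtypes, and converting from timezone-aware to naive (GH18951)
Bug in comparing DatetimeIndex, which failed to raise TypeError when attempting to compare timezone-aware and timezone-naive datetimelike objects (GH18162)
Bug in localization of a naive datetime string in a Series constructor with a datetime64[ns, tz] dtype (GH174151)
Timestamp.replace() will now handle Daylight Savings transitions gracefully (GH18319)
Bug in tz-aware DatetimeIndex where addition/subtraction with a TimedeltaIndex or array with dtype='timedelta64[ns]' was incorrect (GH17558)
Bug in DatetimeIndex.insert() where inserting NaT into a timezone-aware index incorrectly raised (GH16357)
Bug in DataFrame constructor, where a tz-aware DatetimeIndex and a given column name would result in an empty DataFrame (GH19157)
Bug in Timestamp.tz_localize() where localizing a timestamp near the minimum or maximum valid values could overflow and return a timestamp with an incorrect nanosecond value (GH12677)
Bug when iterating over a DatetimeIndex that was localized with a fixed timezone offset, which rounded nanosecond precision to microseconds (GH19603)
Bug in DataFrame.diff() that raised an IndexError with tz-aware values (GH18578)
Bug in melt() that converted tz-aware dtypes to tz-naive (GH15785)
Bug in DataFrame.count() that raised a ValueError if DataFrame.dropna() was called for a single column with timezone-aware values. (GH13407)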

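Offsets#

Bug in WeekOfMonth and Week where addition and subtraction did not roll correctly (GH18510, GH18672, GH18864)
Bug in WeekOfMonth and LastWeekOfMonth where default keyword arguments for constructor raised ValueError (GH19142)
Bug in FY5253Quarter, LastWeekOfMonth where rollback and rollforward behavior was inconsistent with addition and subtraction behavior (GH18854)
Bug in FY5253 where datetime addition and subtraction incremented incorrectly for dates on the year-end but not normalized to midnight (GH18854)
Bug in FY5253 where date offsets could incorrectly raise an AssertionError in arithmetic operations (GH14774)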

Numeric#

Bug in Series constructor with an int or float list where specifying dtype=str, dtype='str' or dtype='U' failed to convert the data elements to strings (GH16605)
Bug in Index multiplication and division methods where operating with a Series would return an Index object instead of a Series object (GH19042)
Bug in the DataFrame constructor in which data containing very large positive or very large negative numbers was causing OverflowError (GH18584)
Bug in Index constructor with dtype='uint64' where int-like floats were not coerced to UInt64Index (GH18400)
Bug in DataFrame flex arithmetic (e.g. df.add(other, fill_value=foo)) where a fill_value other than None failed to raise NotImplementedError in corner cases where either the frame or other has length zero (GH19522)
Multiplication and division of numeric-dtyped Index objects with timedelta-like scalars returns TimedeltaIndex instead of raising TypeError (GH19333)
Bug where NaN was returned instead of 0 by Series.pct_change() and DataFrame.pct_change() when fill_method is not None (GH19873)

Strings#

Bug in Series.str.get() with a dictionary in the values and the index not in the keys, raising KeyError (GH20671)

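Indexing#

Bug in Index construction from a list of mixed-type tuples (GH18505)
Bug in Index.drop() when passing a list of both tuples and non-tuples (GH18304)
Bug in DataFrame.drop(), Panel.drop(), Series.drop(), Index.drop() where no KeyError is raised when dropping a non-existent element from an axis that contains duplicates (GH19186)
Bug in indexing a datetimelike Index that raised ValueError instead of IndexError (GH18386).
Index.to_series() now accepts index and name kwargs (GH18699)
DatetimeIndex.to_series() now accepts index and name kwargs (GH18699)
Bug in indexing a non-scalar value from a Series having a non-unique Index, which returned the value flattened (GH17610)
Bug in indexing with an iterator containing only missing keys, which raised no error (GH20748)
Fixed inconsistency in .ix between list and scalar keys when the index has integer dtype and does not include the desired keys (GH20753)
Bug in __setitem__ when indexing a DataFrame with a 2-d boolean ndarray (GH18582)
Bug in str.extractall where an empty Index was returned instead of the appropriate MultiIndex when there were no matches (GH19034)
Bug in IntervalIndex where empty and purely NA data was constructed inconsistently depending on the construction method (GH18421)
Bug in IntervalIndex.symmetric_difference() where the symmetric difference with a non-IntervalIndex did not raise (GH18475)
Bug in IntervalIndex where set operations that returned an empty IntervalIndex had the wrong dtype (GH19101)
Bug in DataFrame.drop_duplicates() where no KeyError is raised when passing in columns that don't exist on the DataFrame (GH19726)
Bug in Index subclass constructors that ignore unexpected keyword arguments (GH19348)
Bug in Index.difference() when taking the difference of an Index with itself (GH20040)
Bug in DataFrame.first_valid_index() and DataFrame.last_valid_index() in the presence of entire rows of NaNs in the middle of values (GH20499).
Bug in IntervalIndex where some indexing operations were not supported for overlapping or non-monotonic uint64 data (GH20636)
Bug in Series.is_unique where extraneous output in stderr is shown if the Series contains objects with __ne__ defined (GH20661)
Bug in .loc assignment with a single-element list-like incorrectly assigning as a list (GH19474)
Bug in partial string indexing on a Series/DataFrame with a monotonic decreasing DatetimeIndex (GH19362)
Bug in performing in-place operations on a DataFrame with a duplicate Index (GH17105)
Bug in IntervalIndex.get_loc() and IntervalIndex.get_indexer() when used with an IntervalIndex containing a single interval (GH17284, GH20921)
Bug in .loc with a uint64 indexer (GH20722)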
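
Two of the indexing changes above can be sketched briefly: the new index and name keyword arguments of Index.to_series(), and the KeyError now raised by DataFrame.drop_duplicates() for a missing column. The labels, values and the column name "missing" are illustrative assumptions.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c"], name="letters")

# to_series() accepts a replacement index and a name for the result.
s = idx.to_series(index=[10, 20, 30], name="renamed")

df = pd.DataFrame({"x": [1, 1, 2]})
try:
    # "missing" is not a column of df, so this is expected to raise.
    df.drop_duplicates(subset=["missing"])
    raised = False
except KeyError:
    raised = True

print(s)
print(raised)
```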

MultiIndex#

Bug in MultiIndex.__contains__() where non-tuple keys would return True even if they had been dropped (GH19027)
Bug in MultiIndex.set_labels() which would cause casting (and potentially clipping) of the new labels if the level argument is not 0 or a list like [0, 1, ... ] (GH19057)
Bug in MultiIndex.get_level_values() which would return an invalid index on a level of ints with missing values (GH17924)
Bug in MultiIndex.unique() when called on empty MultiIndex (GH20568)
Bug in MultiIndex.unique() which would not preserve level names (GH20570)
Bug in MultiIndex.remove_unused_levels() which would fill NaN values (GH18417)
Bug in MultiIndex.from_tuples() which would fail to take zipped tuples in Python 3 (GH18434)
Bug in MultiIndex.get_loc() which would fail to automatically cast values between float and int (GH18818, GH15994)
Bug in MultiIndex.get_loc() which would cast boolean to integer labels (GH19086)
Bug in MultiIndex.get_loc() which would fail to locate keys containing NaN (GH18485)
Bug in MultiIndex.get_loc() in large MultiIndex, which would fail when levels had different dtypes (GH18520)
Bug in indexing where nested indexers having only numpy arrays are handled incorrectly (GH19686)
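
The from_tuples() and unique() fixes above can be exercised together. This is a minimal sketch with made-up level values and names:

```python
import pandas as pd

# zip() returns an iterator in Python 3; from_tuples() accepts it directly.
mi = pd.MultiIndex.from_tuples(
    zip([1, 1, 2], ["a", "a", "b"]), names=["num", "letter"]
)

# unique() drops the duplicated (1, "a") entry and keeps the level names.
unique = mi.unique()

print(unique)
```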

IO#

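read_html() now rewinds seekable IO objects after a parse failure, before attempting to parse with a new parser. If a parser errors and the object is non-seekable, an informative error is raised suggesting the use of a different parser (GH17975)
DataFrame.to_html() now has an option to add an id to the leading <table> tag (GH8496)
Bug in read_msgpack() when a non-existent file is passed in Python 2 (GH15296)
Bug in read_csv() where a MultiIndex with duplicate columns was not being mangled appropriately (GH18062)
Bug in read_csv() where missing values were not being handled properly when keep_default_na=False with dictionary na_values (GH19227)
Bug in read_csv() causing heap corruption on 32-bit, big-endian architectures (GH20785)
Bug in read_sas() where a file with 0 variables gave an AttributeError incorrectly. Now it gives an EmptyDataError (GH18184)
Bug in DataFrame.to_latex() where pairs of braces meant to serve as invisible placeholders were escaped (GH18667)
Bug in DataFrame.to_latex() where a NaN in a MultiIndex would cause an IndexError or incorrect output (GH14249)
Bug in DataFrame.to_latex() where a non-string index-level name would result in an AttributeError (GH19981)
Bug in DataFrame.to_latex() where the combination of an index name and the index_names=False option would result in incorrect output (GH18326)
Bug in DataFrame.to_latex() where a MultiIndex with an empty string as its name would result in incorrect output (GH18669)
Bug in DataFrame.to_latex() where missing space characters caused wrong escaping and produced invalid LaTeX in some cases (GH20859)
Bug in read_json() where large numeric values were causing an OverflowError (GH18842)
Bug in DataFrame.to_parquet() where an exception was raised if the write destination is S3 (GH19134)
Interval is now supported in DataFrame.to_excel() for all Excel file types (GH19242)
Timedelta is now supported in DataFrame.to_excel() for all Excel file types (GH19242, GH9155, GH19900)
Bug in pandas.io.stata.StataReader.value_labels() raising an AttributeError when called on very old files. Now returns an empty dict (GH19417)
Bug in read_pickle() when unpickling objects with TimedeltaIndex or Float64Index created with pandas prior to version 0.20 (GH19939)
Bug in pandas.io.json.json_normalize() where sub-records are not properly normalized if any sub-record value is NoneType (GH20030)
Bug in the usecols parameter of read_csv() where an error is not raised correctly when passing a string. (GH20529)
Bug in HDFStore.keys() where reading a file with a soft link causes an exception (GH20523)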
Bug in HDFStore.select_column() where a key which is not a valid store raised an AttributeError instead of a KeyError (GH17912)
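
The note on DataFrame.to_html() above does not name the new option; in pandas it is exposed as the table_id keyword. A minimal sketch, with an illustrative frame and id value:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# table_id sets the id attribute on the opening <table> tag.
html = df.to_html(table_id="results")

print(html.splitlines()[0])
```

An id on the table makes it straightforward to target the rendered output from CSS or JavaScript.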

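Plotting#

Better error message when attempting to plot but matplotlib is not installed (GH19810).
DataFrame.plot() now raises a ValueError when the x or y argument is improperly formed (GH18671)
Bug in DataFrame.plot() where x and y arguments given as positions caused incorrect referenced columns for line, bar and area plots (GH20056)
Bug in formatting tick labels with datetime.time() and fractional seconds (GH18478).
Series.plot.kde() has exposed the args ind and bw_method in the docstring (GH18461). The argument ind may now also be an integer (number of sample points).
DataFrame.plot() now supports multiple columns to the y argument (GH19699)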

Groupby/resample/rolling#

Bug when grouping by a single column and aggregating with a class like list or tuple (GH18079)
Fixed regression in DataFrame.groupby() which would not emit an error when called with a tuple key not in the index (GH18798)
Bug in DataFrame.resample() which silently ignored unsupported (or mistyped) options for label, closed and convention (GH19303)
Bug in DataFrame.groupby() where tuples were interpreted as lists of keys rather than as keys (GH17979, GH18249)
Bug in DataFrame.groupby() where aggregation by first/last/min/max was causing timestamps to lose precision (GH19526)
Bug in DataFrame.transform() where particular aggregation functions were being incorrectly cast to match the dtype(s) of the grouped data (GH19200)
Bug in DataFrame.groupby() passing the on= kwarg, and subsequently using .apply() (GH17813)
Bug in DataFrame.resample().aggregate not raising a KeyError when aggregating a non-existent column (GH16766, GH19566)
Bug in DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() when skipna was passed (GH19806)
Bug in DataFrame.resample() that dropped timezone information (GH13238)
Bug in DataFrame.groupby() where transformations using np.all and np.any were raising a ValueError (GH20653)
Bug in DataFrame.resample() where ffill, bfill, pad, backfill, fillna, interpolate, and asfreq were ignoring loffset. (GH20744)
Bug in DataFrame.groupby() when applying a function that has mixed data types and the user-supplied function can fail on the grouping column (GH20949)
Bug in DataFrameGroupBy.rolling().apply() where operations performed against the associated DataFrameGroupBy object could impact the inclusion of the grouped item(s) in the result (GH14013)
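
The first fix in this list concerns aggregating with a class such as list. A minimal sketch of that usage, with illustrative column names and values:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# Group by a single column and collect each group's values into a list.
lists = df.groupby("key")["val"].agg(list)

print(lists)
```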

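Sparse#

Bug in which creating a SparseDataFrame from a dense Series or an unsupported type raised an uncontrolled exception (GH19374)
Bug in SparseDataFrame.to_csv causing an exception (GH19384)
Bug in SparseSeries.memory_usage which caused a segfault by accessing non-sparse elements (GH19368)
Bug in constructing a SparseArray: if data is a scalar and index is defined, it will coerce to float64 regardless of the scalar's dtype. (GH19163)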

Reshaping#

Bug in DataFrame.merge() where referencing a CategoricalIndex by name would raise a KeyError (GH20777)
Bug in DataFrame.stack() which fails trying to sort mixed-type levels under Python 3 (GH18310)
Bug in DataFrame.unstack() which casts int to float if columns is a MultiIndex with unused levels (GH17845)
Bug in DataFrame.unstack() which raises an error if index is a MultiIndex with unused labels on the unstacked level (GH18562)
Fixed construction of a Series from a dict containing NaN as key (GH18480)
Fixed construction of a DataFrame from a dict containing NaN as key (GH18455)
Disabled construction of a Series where len(index) > len(data) = 1, which previously would broadcast the data item, and now raises a ValueError (GH18819)
Bug in DataFrame constructor from a dict containing scalar values when the corresponding keys are not included in the passed index (GH18600)
Fixed (changed from object to float64) dtype of DataFrame initialized with axes, no data, and dtype=int (GH19646)
Bug in Series.rank() where a Series containing NaT modifies the Series inplace (GH18521)
Bug in cut() which fails when using read-only arrays (GH18773)
Bug in DataFrame.pivot_table() which fails when the aggfunc arg is of type string. The behavior is now consistent with other methods like agg and apply (GH18713)
Bug in DataFrame.merge() in which merging using Index objects as vectors raised an exception (GH19038)
Bug in DataFrame.stack(), DataFrame.unstack(), Series.unstack() which were not returning subclasses (GH15563)
Bug in timezone comparisons, manifesting as a conversion of the index to UTC in .concat() (GH18523)
Bug in concat() where concatenating sparse and dense series returned only a SparseDataFrame. It should be a DataFrame. (GH18914, GH18686, and GH16874)
Improved error message for DataFrame.merge() when there is no common merge key (GH19427)
Bug in DataFrame.join() which does an outer instead of a left join when being called with multiple DataFrames and some have non-unique indices (GH19624)
Series.rename() now accepts axis as a kwarg (GH18589)
Bug in rename() where an Index of same-length tuples was converted to a MultiIndex (GH19497)
Bug where comparisons between Series and Index would return a Series with an incorrect name, ignoring the Index's name attribute (GH19582)
Bug in qcut() where datetime and timedelta data with NaT present raised a ValueError (GH19768)
Bug in DataFrame.iterrows(), which would infer strings not compliant with ISO8601 as datetimes (GH19671)
Bug in Series constructor with Categorical where a ValueError is not raised when an index of different length is given (GH19342)
Bug in DataFrame.astype() where column metadata is lost when converting to categorical or a dictionary of dtypes (GH19920)
Bug in cut() and qcut() where timezone information was dropped (GH19872)
Bug in Series constructor with dtype=str, which previously raised in some cases (GH19853)
Bug in get_dummies() and select_dtypes(), where duplicate column names caused incorrect behavior (GH20848)
Bug in isna(), which could not handle ambiguous typed lists (GH20675)
Bug in concat() which raises an error when concatenating TZ-aware dataframes and all-NaT dataframes (GH12396)
Bug in concat() which raises an error when concatenating empty TZ-aware series (GH18447)
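
The pivot_table() fix above means a string aggfunc now behaves like it does in agg and apply. A minimal sketch, with illustrative column names and values:

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "a", "b"], "v": [1, 2, 3]})

# aggfunc given as a string, resolved the same way as in .agg("sum").
by_name = df.pivot_table(index="k", values="v", aggfunc="sum")

print(by_name)
```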

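Other#

Improved error message when attempting to use a Python keyword as an identifier in a numexpr-backed query (GH18221)
Bug in accessing pandas.get_option(), which raised KeyError rather than OptionError when looking up a non-existent option key in some cases (GH19789)
Bug in testing.assert_series_equal() and testing.assert_frame_equal() for Series or DataFrames with differing unicode data (GH20503)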

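Contributors#

A total of 328 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.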

Aaron Critchley
AbdealiJK +
Adam Hooper +
Albert Villanova del Moral
Alejandro Giacometti +
Alejandro Hohmann +
Alex Rychyk
Alexander Buchkovsky
Alexander Lenail +
Alexander Michael Schade
Aly Sivji +
Andreas Költringer +
Andrew
Andrew Bui +
András Novoszáth +
Andy Craze +
Andy R. Terrel
Anh Le +
Anil Kumar Pallekonda +
Antoine Pitrou +
Antonio Linde +
Antonio Molina +
Antonio Quinonez +
Armin Varshokar +
Artem Bogachev +
Avi Sen +
Azeez Oluwafemi +
Ben Auffarth +
Bernhard Thiel +
Bhavesh Poddar +
BielStela +
Blair +
Bob Haffner
Brett Naul +
Brock Mendel
Bryce Guinta +
Carlos Eduardo Moreira dos Santos +
Carlos García Márquez +
Carol Willing
Cheuk Ting Ho +
Chitrank Dixit +
Chris
Chris Burr +
Chris Catalfo +
Chris Mazzullo
Christian Chwala +
Cihan Ceyhan +
Clemens Brunner
Colin +
Cornelius Riemenschneider
Crystal Gong +
DaanVanHauwermeiren
Dan Dixey +
Daniel Frank +
Daniel Garrido +
Daniel Sakuma +
DataOmbudsman +
Dave Hirschfeld
Dave Lewis +
David Adrián Cañones Castellano +
David Arcos +
David C Hall +
David Fischer
David Hoese +
David Lutz +
David Polo +
David Stansby
Dennis Kamau +
Dillon Niederhut
Dimitri +
Dr. Irv
Dror Atariah
Eric Chea +
Eric Kisslinger
Eric O. LEBIGOT (EOL) +
FAN-GOD +
Fabian Retkowski +
Fer Sar +
Gabriel de Maeztu +
Gianpaolo Macario +
Giftlin Rajaiah
Gilberto Olimpio +
Gina +
Gjelt +
Graham Inggs +
Grant Roch
Grant Smith +
Grzegorz Konefał +
Guilherme Beltramini
HagaiHargil +
Hamish Pitkeathly +
Hammad Mashkoor +
Hannah Ferchland +
Hans
Haochen Wu +
Hissashi Rocha +
Iain Barr +
Ibrahim Sharaf ElDen +
Ignasi Fosch +
Igor Conrado Alves de Lima +
Igor Shelvinskyi +
Imanflow +
Ingolf Becker
Israel Saeta Pérez
Iva Koevska +
Jakub Nowacki +
Jan F-F +
Jan Koch +
Jan Werkmann
Janelle Zoutkamp +
Jason Bandlow +
Jaume Bonet +
Jay Alammar +
Jeff Reback
JennaVergeynst
Jimmy Woo +
Jing Qiang Goh +
Joachim Wagner +
Joan Martin Miralles +
Joel Nothman
Joeun Park +
John Cant +
Johnny Metz +
Jon Mease
Jonas Schulze +
Jongwony +
Jordi Contestí +
Joris Van den Bossche
José F. R. Fonseca +
Jovixe +
Julio Martinez +
Jörg Döpfert
KOBAYASHI Ittoku +
Kate Surta +
Kenneth +
Kevin Kuhl
Kevin Sheppard
Krzysztof Chomski
Ksenia +
Ksenia Bobrova +
Kunal Gosar +
Kurtis Kerstein +
Kyle Barron +
Laksh Arora +
Laurens Geffert +
Leif Walsh
Liam Marshall +
Liam3851 +
Licht Takeuchi
Liudmila +
Ludovico Russo +
Mabel Villalba +
Manan Pal Singh +
Manraj Singh
Marc +
Marc Garcia
Marco Hemken +
Maria del Mar Bibiloni +
Mario Corchero +
Mark Woodbridge +
Martin Journois +
Mason Gallo +
Matias Heikkilä +
Matt Braymer-Hayes
Matt Kirk +
Matt Maybeno +
Matthew Kirk +
Matthew Rocklin +
Matthew Roeschke
Matthias Bussonnier +
Max Mikhaylov +
Maxim Veksler +
Maximilian Roos
Maximiliano Greco +
Michael Penkov
Michael Röttger +
Michael Selik +
Michael Waskom
Mie~~~
Mike Kutzma +
Ming Li +
Mitar +
Mitch Negus +
Montana Low +
Moritz Münst +
Mortada Mehyar
Myles Braithwaite +
Nate Yoder
Nicholas Ursa +
Nick Chmura
Nikos Karagiannakis +
Nipun Sadvilkar +
Nis Martensen +
Noah +
Noémi Éltető +
Olivier Bilodeau +
Ondrej Kokes +
Onno Eberhard +
Paul Ganssle +
Paul Mannino +
Paul Reidy
Paulo Roberto de Oliveira Castro +
Pepe Flores +
Peter Hoffmann
Phil Ngo +
Pietro Battiston
Pranav Suri +
Priyanka Ojha +
Pulkit Maloo +
README Bot +
Ray Bell +
Riccardo Magliocchetti +
Ridhwan Luthra +
Robert Meyer
Robin
Robin Kiplang'at +
Rohan Pandit +
Rok Mihevc +
Rouz Azari
Ryszard T. Kaleta +
Sam Cohan
Sam Foo
Samir Musali +
Samuel Sinayoko +
Sangwoong Yoon
SarahJessica +
Sharad Vijalapuram +
Shubham Chaudhary +
SiYoungOh +
Sietse Brouwer
Simone Basso +
Stefania Delprete +
Stefano Cianciulli +
Stephen Childs +
StephenVoland +
Stijn Van Hoey +
Sven
Talitha Pumar +
Tarbo Fukazawa +
Ted Petrou +
Thomas A Caswell
Tim Hoffmann +
Tim Swast
Tom Augspurger
Tommy +
Tulio Casagrande +
Tushar Gupta +
Tushar Mittal +
Upkar Lidder +
Victor Villas +
Vince W +
Vinícius Figueiredo +
Vipin Kumar +
WBare
Wenhuan +
Wes Turner
William Ayd
Wilson Lin +
Xbar
Yaroslav Halchenko
Yee Mey
Yeongseon Choe +
Yian +
Yimeng Zhang
ZhuBaohe +
Zihao Zhao +
adatasetaday +
akielbowicz +
akosel +
alinde1 +
amuta +
bolkedebruin
cbertinato
cgohlke
charlie0389 +
chris-b1
csfarkas +
dajcs +
deflatSOCO +
derestle-htwg
discort
dmanikowski-reef +
donK23 +
elrubio +
fivemok +
fjdiod
fjetter +
froessler +
gabrielclow
gfyoung
ghasemnaddaf
h-vetinari +
himanshu awasthi +
ignamv +
jayfoad +
jazzmuesli +
jbrockmendel
jen w +
jjames34 +
joaoavf +
joders +
jschendel
juan huguet +
l736x +
luzpaz +
mdeboc +
miguelmorin +
miker985
miquelcamprodon +
orereta +
ottiP +
peterpanmj +
rafarui +
raph-m +
readyready15728 +
rmihael +
samghelms +
scriptomation +
sfoo +
stefansimik +
stonebig
tmnhat2001 +
tomneep +
topper-123
tv3141 +
verakai +
xpvpc +
zhanghui +
