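What's new in 1.1.0 (July 28, 2020)#

These are the changes in pandas 1.1.0. See Release notes for a full changelog including other versions of pandas.

Enhancements#

KeyErrors raised by loc specify missing labels#

Previously, if labels were missing for a .loc call, a KeyError was raised stating that this was no longer supported.

Now the error message also includes a list of the missing labels (max 10 items, display width 80 characters). See GH34272.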
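
As a minimal sketch of the behavior described above, the snippet below looks up a list containing labels that are not in the index and captures the resulting KeyError message. The Series contents and the labels "x" and "y" are made up for illustration, and the exact wording of the message varies between pandas versions.

```python
import pandas as pd

# Illustrative data: a small Series with string labels
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

message = None
try:
    # "x" and "y" are not in the index, so .loc raises KeyError
    ser.loc[["a", "x", "y"]]
except KeyError as err:
    # The message names the labels that could not be found
    message = str(err)

print(message)
```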

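All dtypes can now be converted to `StringDtype`#

Previously, declaring or converting to StringDtype was in general only possible if the data was already only str or nan-like (GH31204). StringDtype now works in all situations where astype(str) or dtype=str work:

For example, the below now works: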

In [1]: ser = pd.Series([1, "abc", np.nan], dtype="string")

In [2]: ser
Out[2]: 
0       1
1     abc
2    <NA>
Length: 3, dtype: string

In [3]: ser[0]
Out[3]: '1'

In [4]: pd.Series([1, 2, np.nan], dtype="Int64").astype("string")
Out[4]: 
0       1
1       2
2    <NA>
Length: 3, dtype: string

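Non-monotonic PeriodIndex partial string slicing#

PeriodIndex now supports partial string slicing for non-monotonic indexes, mirroring DatetimeIndex behavior (GH31096).

For example: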

In [5]: dti = pd.date_range("2014-01-01", periods=30, freq="30D")

In [6]: pi = dti.to_period("D")

In [7]: ser_monotonic = pd.Series(np.arange(30), index=pi)

In [8]: shuffler = list(range(0, 30, 2)) + list(range(1, 31, 2))

In [9]: ser = ser_monotonic[shuffler]

In [10]: ser
Out[10]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
              ..
2015-09-23    21
2015-11-22    23
2016-01-21    25
2016-03-21    27
2016-05-20    29
Freq: D, Length: 30, dtype: int64

In [11]: ser["2014"]
Out[11]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
2014-10-28    10
2014-12-27    12
2014-01-31     1
2014-04-01     3
2014-05-31     5
2014-07-30     7
2014-09-28     9
2014-11-27    11
Freq: D, Length: 13, dtype: int64

In [12]: ser.loc["May 2015"]
Out[12]: 
2015-05-26    17
Freq: D, Length: 1, dtype: int64

Comparing two `DataFrame` or two `Series` and summarizing the differences#

We've added DataFrame.compare() and Series.compare() for comparing two DataFrame or two Series (GH30429)

In [13]: df = pd.DataFrame(
   ....:     {
   ....:         "col1": ["a", "a", "b", "b", "a"],
   ....:         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
   ....:         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
   ....:     },
   ....:     columns=["col1", "col2", "col3"],
   ....: )
   ....: 

In [14]: df
Out[14]: 
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0

[5 rows x 3 columns]

In [15]: df2 = df.copy()

In [16]: df2.loc[0, 'col1'] = 'c'

In [17]: df2.loc[2, 'col3'] = 4.0

In [18]: df2
Out[18]: 
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

[5 rows x 3 columns]

In [19]: df.compare(df2)
Out[19]: 
  col1       col3      
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

[2 rows x 4 columns]

See User Guide for more details.
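
The example above covers DataFrame.compare(); the following sketch shows the Series counterpart. The two Series are invented for illustration. Only positions where the values differ are kept, with the left-hand values under "self" and the right-hand values under "other".

```python
import pandas as pd

# Two illustrative Series that differ at positions 1 and 3
s1 = pd.Series(["a", "b", "c", "d"])
s2 = pd.Series(["a", "x", "c", "y"])

# compare() returns a DataFrame holding only the differing rows
diff = s1.compare(s2)
print(diff)
```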

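Allow NA in groupby key#

With groupby, we've added a dropna keyword to DataFrame.groupby() and Series.groupby() in order to allow NA values in group keys. Users can set dropna to False if they want to include NA values in groupby keys. The default is set to True for dropna to keep backwards compatibility (GH3729).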

In [20]: df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]

In [21]: df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])

In [22]: df_dropna
Out[22]: 
   a    b  c
0  1  2.0  3
1  1  NaN  4
2  2  1.0  3
3  1  2.0  2

[4 rows x 3 columns]

# Default ``dropna`` is set to True, which will exclude NaNs in keys
In [23]: df_dropna.groupby(by=["b"], dropna=True).sum()
Out[23]: 
     a  c
b        
1.0  2  3
2.0  2  5

[2 rows x 2 columns]

# In order to allow NaN in keys, set ``dropna`` to False
In [24]: df_dropna.groupby(by=["b"], dropna=False).sum()
Out[24]: 
     a  c
b        
1.0  2  3
2.0  2  5
NaN  1  4

[3 rows x 2 columns]

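The default setting of the dropna argument is True, which means NA values are not included in group keys.

Sorting with keys#

We've added a key argument to the DataFrame and Series sorting methods, including DataFrame.sort_values(), DataFrame.sort_index(), Series.sort_values(), and Series.sort_index(). The key can be any callable function which is applied column-by-column to each column used for sorting, before sorting is performed (GH27237). See sort_values with keys and sort_index with keys for more information.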

In [25]: s = pd.Series(['C', 'a', 'B'])

In [26]: s
Out[26]: 
0    C
1    a
2    B
Length: 3, dtype: object

In [27]: s.sort_values()
Out[27]: 
2    B
0    C
1    a
Length: 3, dtype: object

Note how this is sorted with capital letters first. If we apply the Series.str.lower() method, we get

In [28]: s.sort_values(key=lambda x: x.str.lower())
Out[28]: 
1    a
2    B
0    C
Length: 3, dtype: object

When applied to a DataFrame, the key is applied per-column to all columns or a subset if by is specified, e.g.

In [29]: df = pd.DataFrame({'a': ['C', 'C', 'a', 'a', 'B', 'B'],
   ....:                    'b': [1, 2, 3, 4, 5, 6]})
   ....: 

In [30]: df
Out[30]: 
   a  b
0  C  1
1  C  2
2  a  3
3  a  4
4  B  5
5  B  6

[6 rows x 2 columns]

In [31]: df.sort_values(by=['a'], key=lambda col: col.str.lower())
Out[31]: 
   a  b
2  a  3
3  a  4
4  B  5
5  B  6
0  C  1
1  C  2

[6 rows x 2 columns]

For more details, see examples and documentation in DataFrame.sort_values(), Series.sort_values(), and sort_index().
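
The examples above use sort_values(); as a small sketch, the same key argument can be passed to sort_index(), where the callable receives the index rather than the values. The Series below is made up for illustration.

```python
import pandas as pd

# Illustrative Series whose index mixes upper and lower case labels
s = pd.Series([1, 2, 3], index=["C", "a", "B"])

# Default ordering places capital letters first
plain = s.sort_index()

# With a key, the index is lower-cased before comparison
folded = s.sort_index(key=lambda idx: idx.str.lower())

print(plain.index.tolist())
print(folded.index.tolist())
```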

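Support for fold argument in Timestamp constructor#

Timestamp now supports the keyword-only fold argument according to PEP 495, similar to the parent datetime.datetime class. It supports both accepting fold as an initialization argument and inferring fold from other constructor arguments (GH25057, GH31338). Support is limited to dateutil timezones as pytz doesn't support fold.

For example: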

In [32]: ts = pd.Timestamp("2019-10-27 01:30:00+00:00")

In [33]: ts.fold
Out[33]: 0

In [34]: ts = pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
   ....:                   tz="dateutil/Europe/London", fold=1)
   ....: 

In [35]: ts
Out[35]: Timestamp('2019-10-27 01:30:00+0000', tz='dateutil//usr/share/zoneinfo/Europe/London')

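For more on working with fold, see Fold subsection in the user guide.

Parsing timezone-aware format with different timezones in to_datetime#

to_datetime() now supports parsing formats containing timezone names (%Z) and UTC offsets (%z) from different timezones, then converting them to UTC by setting utc=True. This returns a DatetimeIndex with timezone at UTC, as opposed to an Index with object dtype if utc=True is not set (GH32792).

For example: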

In [36]: tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
   ....:            "2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
   ....: 

In [37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z', utc=True)
Out[37]: 
DatetimeIndex(['2010-01-01 11:00:00+00:00', '2010-01-01 13:00:00+00:00',
               '2010-01-01 09:00:00+00:00', '2010-01-01 08:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

In [38]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')
Out[38]: 
Index([2010-01-01 12:00:00+01:00, 2010-01-01 12:00:00-01:00,
       2010-01-01 12:00:00+03:00, 2010-01-01 12:00:00+04:00],
      dtype='object')

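Grouper and resample now support the arguments origin and offset#

Grouper and DataFrame.resample() now support the arguments origin and offset. These let the user control the timestamp on which to adjust the grouping (GH31809).

The bins of the grouping are adjusted based on the beginning of the day of the time series starting point. This works well with frequencies that are multiples of a day (like 30D) or that divide a day evenly (like 90s or 1min). But it can create inconsistencies with some frequencies that do not meet this criterion. To change this behavior you can now specify a fixed timestamp with the argument origin.

Two arguments are now deprecated (more information in the documentation of DataFrame.resample()):

base should be replaced by offset.
loffset should be replaced by directly adding an offset to the index of the DataFrame after it has been resampled.

Small example of the use of origin: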

In [39]: start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'

In [40]: middle = '2000-10-02 00:00:00'

In [41]: rng = pd.date_range(start, end, freq='7min')

In [42]: ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

In [43]: ts
Out[43]: 
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7T, Length: 9, dtype: int64

Resample with the default behavior 'start_day' (origin is 2000-10-01 00:00:00):

In [44]: ts.resample('17min').sum()
Out[44]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17T, Length: 5, dtype: int64

In [45]: ts.resample('17min', origin='start_day').sum()
Out[45]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17T, Length: 5, dtype: int64

Resample using a fixed origin:

In [46]: ts.resample('17min', origin='epoch').sum()
Out[46]: 
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17T, Length: 5, dtype: int64

In [47]: ts.resample('17min', origin='2000-01-01').sum()
Out[47]: 
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17T, Length: 4, dtype: int64

If needed you can adjust the bins with the argument offset (a Timedelta) that would be added to the default origin.

For a full example, see: Use origin or offset to adjust the start of the bins.
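
As a sketch of the offset argument, the snippet below rebuilds the same 7-minute series used in the origin examples and shifts the default 'start_day' origin by 23 hours 30 minutes, so that the first bin starts exactly at the first observation. The offset value is chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Same illustrative series as in the origin examples
rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

# The offset is added to the default origin (midnight of the first day),
# so the 17-minute bins are anchored at 23:30 instead of 00:00
shifted = ts.resample("17min", offset="23h30min").sum()
print(shifted)
```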

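fsspec now used for filesystem handling#

For reading and writing to filesystems other than local and reading from HTTP(S), the optional dependency fsspec will be used to dispatch operations (GH33452). This will give unchanged functionality for S3 and GCS storage, which were already supported, but also add support for several other storage implementations such as Azure Datalake and Blob, SSH, FTP, Dropbox and GitHub. For docs and capabilities, see the fsspec docs.

The existing capability to interface with S3 and GCS will be unaffected by this change, as fsspec will still bring in the same packages as before.

Other enhancements#

Compatibility with matplotlib 3.3.0 (GH34850)
IntegerArray.astype() now supports datetime64 dtype (GH32538)
IntegerArray now implements the sum operation (GH33172)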
Added pandas.errors.InvalidIndexError (GH34570).
Added DataFrame.value_counts() (GH5377)
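Added a pandas.api.indexers.FixedForwardWindowIndexer() class to support forward-looking windows during rolling operations.
Added a pandas.api.indexers.VariableOffsetWindowIndexer() class to support rolling operations with non-fixed offsets (GH34994)
describe() now includes a datetime_is_numeric keyword to control how datetime columns are summarized (GH30164, GH34798)
Styler may now render CSS more efficiently where multiple cells have the same styling (GH30876)
highlight_null() now accepts a subset argument (GH31345)
When writing directly to a sqlite connection DataFrame.to_sql() now supports the multi method (GH29921)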
pandas.errors.OptionError is now exposed in pandas.errors (GH27553)
Added api.extensions.ExtensionArray.argmax() and api.extensions.ExtensionArray.argmin() (GH24382)
timedelta_range() will now infer a frequency when passed start, stop, and periods (GH32377)
Positional slicing on a IntervalIndex now supports slices with step > 1 (GH31658)
Series.str now has a fullmatch method that matches a regular expression against the entire string in each row of the Series, similar to re.fullmatch (GH32806).
DataFrame.sample() will now also allow array-like and BitGenerator objects to be passed to random_state as seeds (GH32503)
Index.union() will now raise RuntimeWarning for MultiIndex objects if the objects inside are unsortable. Pass sort=False to suppress this warning (GH33015)
Added Series.dt.isocalendar() and DatetimeIndex.isocalendar() that return a DataFrame with year, week, and day calculated according to the ISO 8601 calendar (GH33206, GH34392).
The DataFrame.to_feather() method now supports additional keyword arguments (e.g. to set the compression) that are added in pyarrow 0.17 (GH33422).
cut() will now accept the parameter ordered with default ordered=True. If ordered=False and no labels are provided, an error will be raised (GH33141)
DataFrame.to_csv(), DataFrame.to_pickle(), and DataFrame.to_json() now support passing a dict of compression arguments when using the gzip and bz2 protocols. This can be used to set a custom compression level, e.g., df.to_csv(path, compression={'method': 'gzip', 'compresslevel': 1}) (GH33196)
melt() has gained an ignore_index (default True) argument that, if set to False, prevents the method from dropping the index (GH17440).
Series.update() now accepts objects that can be coerced to a Series, such as dict and list, mirroring the behavior of DataFrame.update() (GH33215)
transform() and aggregate() have gained engine and engine_kwargs arguments that support executing functions with Numba (GH32854, GH33388)
interpolate() now supports SciPy interpolation method scipy.interpolate.CubicSpline as method cubicspline (GH33670)
DataFrameGroupBy and SeriesGroupBy now implement the sample method for doing random sampling within groups (GH31775)
DataFrame.to_numpy() now supports the na_value keyword to control the NA sentinel in the output array (GH33820)
Added api.extensions.ExtensionArray.equals to the extension array interface, similar to Series.equals() (GH27081)
The minimum supported dta version has increased to 105 in read_stata() and StataReader (GH26667).
to_stata() supports compression using the compression keyword argument. Compression can either be inferred or explicitly set using a string or a dictionary containing both the method and any additional arguments that are passed to the compression library. Compression was also added to the low-level Stata-file writers StataWriter, StataWriter117, and StataWriterUTF8 (GH26599).
HDFStore.put() now accepts a track_times parameter. This parameter is passed to the create_table method of PyTables (GH32682).
Series.plot() and DataFrame.plot() now accept xlabel and ylabel parameters to present labels on the x and y axis (GH9093).
Made pandas.core.window.rolling.Rolling and pandas.core.window.expanding.Expanding iterable (GH11704)
Made option_context a contextlib.ContextDecorator, which allows it to be used as a decorator over an entire function (GH34253).
DataFrame.to_csv() and Series.to_csv() now accept an errors argument (GH22610)
transform() now allows func to be pad, backfill and cumcount (GH31269).
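read_json() now accepts an nrows parameter (GH33916).
DataFrame.hist(), Series.hist(), core.groupby.DataFrameGroupBy.hist(), and core.groupby.SeriesGroupBy.hist() have gained the legend argument. Set to True to show a legend in the histogram (GH6279)
concat() and append() now preserve extension dtypes, for example combining a nullable integer column with a numpy integer column will no longer result in object dtype but preserve the integer dtype (GH33607, GH34339, GH34095).
read_gbq() now allows disabling the progress bar (GH33360).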
read_gbq() now supports the max_results kwarg from pandas-gbq (GH34639).
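DataFrame.cov() and Series.cov() now support a new parameter ddof to support delta degrees of freedom as in the corresponding numpy methods (GH34611).
DataFrame.to_html() and DataFrame.to_string()'s col_space parameter now accepts a list or dict to change only some specific columns' width (GH28917).
DataFrame.to_excel() can now also write OpenOffice spreadsheet (.ods) files (GH27222)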
explode() now accepts ignore_index to reset the index, similar to pd.concat() or DataFrame.sort_values() (GH34932).
DataFrame.to_markdown() and Series.to_markdown() now accept index argument as an alias for tabulate's showindex (GH32667)
read_csv() now accepts string values like "0", "0.0", "1", "1.0" as convertible to the nullable Boolean dtype (GH34859)
pandas.core.window.ExponentialMovingWindow now supports a times argument that allows mean to be calculated with observations spaced by the timestamps in times (GH34839)
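DataFrame.agg() and Series.agg() now accept named aggregation for renaming the output columns/indexes (GH26513)
compute.use_numba now exists as a configuration option that utilizes the numba engine when available (GH33966, GH35374)
Series.plot() now supports asymmetric error bars. Previously, if Series.plot() received a "2xN" array with error values for yerr and/or xerr, the left/lower values (first row) were mirrored, while the right/upper values (second row) were ignored. Now, the first row represents the left/lower error values and the second row the right/upper error values (GH9536)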
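
A few of the additions listed above can be tried in a couple of lines each. The sketch below touches Series.str.fullmatch, Series.dt.isocalendar() and DataFrame.value_counts(); all input data is invented for illustration.

```python
import pandas as pd

# str.fullmatch: the pattern must match the entire string in each row
codes = pd.Series(["abc", "abcd", "xabc"])
full = codes.str.fullmatch("abc")

# dt.isocalendar: year, week and day according to ISO 8601
dates = pd.Series(pd.to_datetime(["2020-07-28"]))
iso = dates.dt.isocalendar()

# DataFrame.value_counts: counts of unique rows
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
counts = df.value_counts()

print(full.tolist())
print(iso)
print(counts)
```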

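Notable bug fixes#

These are bug fixes that might have notable behavior changes.

`MultiIndex.get_indexer` interprets `method` argument correctly#

This restores the behavior of MultiIndex.get_indexer() with method='backfill' or method='pad' to the behavior before pandas 0.23.0. In particular, MultiIndexes are treated as a list of tuples and padding or backfilling is done with respect to the ordering of these lists of tuples (GH29896).

As an example of this, given: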

In [48]: df = pd.DataFrame({
   ....:     'a': [0, 0, 0, 0],
   ....:     'b': [0, 2, 3, 4],
   ....:     'c': ['A', 'B', 'C', 'D'],
   ....: }).set_index(['a', 'b'])
   ....: 

In [49]: mi_2 = pd.MultiIndex.from_product([[0], [-1, 0, 1, 3, 4, 5]])

The differences in reindexing df with mi_2 and using method='backfill' can be seen here:

pandas >= 0.23, < 1.1.0:

In [1]: df.reindex(mi_2, method='backfill')
Out[1]:
      c
0 -1  A
   0  A
   1  D
   3  A
   4  A
   5  C

pandas <0.23, >= 1.1.0

In [50]: df.reindex(mi_2, method='backfill')
Out[50]: 
        c
0 -1    A
   0    A
   1    B
   3    C
   4    D
   5  NaN

[6 rows x 1 columns]

And the differences in reindexing df with mi_2 and using method='pad' can be seen here:

pandas >= 0.23, < 1.1.0

In [1]: df.reindex(mi_2, method='pad')
Out[1]:
        c
0 -1  NaN
   0  NaN
   1    D
   3  NaN
   4    A
   5    C

pandas < 0.23, >= 1.1.0

In [51]: df.reindex(mi_2, method='pad')
Out[51]: 
        c
0 -1  NaN
   0    A
   1    A
   3    C
   4    D
   5    D

[6 rows x 1 columns]

Failed label-based lookups always raise KeyError#

Label lookups series[key], series.loc[key] and frame.loc[key] used to raise either KeyError or TypeError depending on the type of key and type of Index. These now consistently raise KeyError (GH31867)

In [52]: ser1 = pd.Series(range(3), index=[0, 1, 2])

In [53]: ser2 = pd.Series(range(3), index=pd.date_range("2020-02-01", periods=3))

Previous behavior:

In [3]: ser1[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [4]: ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
TypeError: cannot do label indexing on DatetimeIndex with these indexers [1] of type int

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

New behavior:

In [3]: ser1[1.5]
...
KeyError: 1.5

In [4]: ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
KeyError: 1.5

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
KeyError: 1

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

Similarly, DataFrame.at() and Series.at() will raise a TypeError instead of a ValueError if an incompatible key is passed, and KeyError if a missing key is passed, matching the behavior of .loc[] (GH31722)
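
To check the new behavior in your own environment, a small helper can record which exception type a failed lookup raises. This is only a sketch: lookup_error is a hypothetical helper written for this example, not part of pandas.

```python
import pandas as pd

ser1 = pd.Series(range(3), index=[0, 1, 2])
ser2 = pd.Series(range(3), index=pd.date_range("2020-02-01", periods=3))

def lookup_error(obj, key):
    """Return the name of the exception raised by obj.loc[key], or None."""
    try:
        obj.loc[key]
    except Exception as err:
        return type(err).__name__
    return None

# Each of these keys is absent from the respective index
results = [
    lookup_error(ser1, 1.5),
    lookup_error(ser1, "foo"),
    lookup_error(ser2, pd.Timestamp(0)),
]
print(results)
```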

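Failed integer lookups on MultiIndex raise KeyError#

Indexing with integers with a MultiIndex that has an integer-dtype first level incorrectly failed to raise KeyError when one or more of those integer keys is not present in the first level of the index (GH33539)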

In [54]: idx = pd.Index(range(4))

In [55]: dti = pd.date_range("2000-01-03", periods=3)

In [56]: mi = pd.MultiIndex.from_product([idx, dti])

In [57]: ser = pd.Series(range(len(mi)), index=mi)

Previous behavior:

In [5]: ser[[5]]
Out[5]: Series([], dtype: int64)

New behavior:

In [5]: ser[[5]]
...
KeyError: '[5] not in index'

`DataFrame.merge()` preserves right frame's row order#

DataFrame.merge() now preserves the right frame's row order when executing a right merge (GH27453)

In [58]: left_df = pd.DataFrame({'animal': ['dog', 'pig'],
   ....:                        'max_speed': [40, 11]})
   ....: 

In [59]: right_df = pd.DataFrame({'animal': ['quetzal', 'pig'],
   ....:                         'max_speed': [80, 11]})
   ....: 

In [60]: left_df
Out[60]: 
  animal  max_speed
0    dog         40
1    pig         11

[2 rows x 2 columns]

In [61]: right_df
Out[61]: 
    animal  max_speed
0  quetzal         80
1      pig         11

[2 rows x 2 columns]

Previous behavior:

>>> left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
    animal  max_speed
0      pig         11
1  quetzal         80

New behavior:

In [62]: left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
Out[62]: 
    animal  max_speed
0  quetzal         80
1      pig         11

[2 rows x 2 columns]

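Assignment to multiple columns of a DataFrame when some columns do not exist#

Assignment to multiple columns of a DataFrame when some of the columns do not exist would previously assign the values to the last column. Now, new columns will be constructed with the right values (GH13658).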

In [63]: df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})

In [64]: df
Out[64]: 
   a  b
0  0  3
1  1  4
2  2  5

[3 rows x 2 columns]

Previous behavior:

In [3]: df[['a', 'c']] = 1
In [4]: df
Out[4]:
   a  b
0  1  1
1  1  1
2  1  1

New behavior:

In [65]: df[['a', 'c']] = 1

In [66]: df
Out[66]: 
   a  b  c
0  1  3  1
1  1  4  1
2  1  5  1

[3 rows x 3 columns]

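Consistency across groupby reductions#

Using DataFrame.groupby() with as_index=True and the aggregation nunique would include the grouping column(s) in the columns of the result. Now the grouping column(s) only appear in the index, consistent with other reductions (GH32579).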

In [67]: df = pd.DataFrame({"a": ["x", "x", "y", "y"], "b": [1, 1, 2, 3]})

In [68]: df
Out[68]: 
   a  b
0  x  1
1  x  1
2  y  2
3  y  3

[4 rows x 2 columns]

Previous behavior:

In [3]: df.groupby("a", as_index=True).nunique()
Out[4]:
   a  b
a
x  1  1
y  1  2

New behavior:

In [69]: df.groupby("a", as_index=True).nunique()
Out[69]: 
   b
a   
x  1
y  2

[2 rows x 1 columns]

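Using DataFrame.groupby() with as_index=False and the function idxmax, idxmin, mad, nunique, sem, skew, or std would modify the grouping column. Now the grouping column remains unchanged, consistent with other reductions (GH21090, GH10355).

Previous behavior: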

In [3]: df.groupby("a", as_index=False).nunique()
Out[4]:
   a  b
0  1  1
1  1  2

New behavior:

In [70]: df.groupby("a", as_index=False).nunique()
Out[70]: 
   a  b
0  x  1
1  y  2

[2 rows x 2 columns]

The method size() would previously ignore as_index=False. Now the grouping columns are returned as columns, making the result a DataFrame instead of a Series (GH32599).

Previous behavior:

In [3]: df.groupby("a", as_index=False).size()
Out[4]:
a
x    2
y    2
dtype: int64

New behavior:

In [71]: df.groupby("a", as_index=False).size()
Out[71]: 
   a  size
0  x     2
1  y     2

[2 rows x 2 columns]

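`agg()` lost results with `as_index=False` when relabeling columns#

Previously agg() lost the result columns when the as_index option was set to False and the result columns were relabeled. In this case the result values were replaced with the previous index (GH32240).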

In [72]: df = pd.DataFrame({"key": ["x", "y", "z", "x", "y", "z"],
   ....:                    "val": [1.0, 0.8, 2.0, 3.0, 3.6, 0.75]})
   ....: 

In [73]: df
Out[73]: 
  key   val
0   x  1.00
1   y  0.80
2   z  2.00
3   x  3.00
4   y  3.60
5   z  0.75

[6 rows x 2 columns]

Previous behavior:

In [2]: grouped = df.groupby("key", as_index=False)
In [3]: result = grouped.agg(min_val=pd.NamedAgg(column="val", aggfunc="min"))
In [4]: result
Out[4]:
     min_val
 0   x
 1   y
 2   z

New behavior:

In [74]: grouped = df.groupby("key", as_index=False)

In [75]: result = grouped.agg(min_val=pd.NamedAgg(column="val", aggfunc="min"))

In [76]: result
Out[76]: 
  key  min_val
0   x     1.00
1   y     0.80
2   z     0.75

[3 rows x 2 columns]

apply and applymap on `DataFrame` evaluates first row/column only once#

In [77]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 6]})

In [78]: def func(row):
   ....:     print(row)
   ....:     return row
   ....: 

Previous behavior:

In [4]: df.apply(func, axis=1)
a    1
b    3
Name: 0, dtype: int64
a    1
b    3
Name: 0, dtype: int64
a    2
b    6
Name: 1, dtype: int64
Out[4]:
   a  b
0  1  3
1  2  6

New behavior:

In [79]: df.apply(func, axis=1)
a    1
b    3
Name: 0, Length: 2, dtype: int64
a    2
b    6
Name: 1, Length: 2, dtype: int64
Out[79]: 
   a  b
0  1  3
1  2  6

[2 rows x 2 columns]
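
One way to observe the single evaluation without reading printed output is to record each call. The sketch below assumes a hypothetical recording function, record, that appends the row label it receives; with the new behavior each row label is seen exactly once.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 6]})

# Labels of the rows passed to the function, in call order
seen = []

def record(row):
    # row.name is the index label of the row being processed
    seen.append(row.name)
    return row

df.apply(record, axis=1)
print(seen)
```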

Backwards incompatible API changes#

Added `check_freq` argument to `testing.assert_frame_equal` and `testing.assert_series_equal`#

The check_freq argument was added to testing.assert_frame_equal() and testing.assert_series_equal() in pandas 1.1.0 and defaults to True. testing.assert_frame_equal() and testing.assert_series_equal() now raise AssertionError if the indexes do not have the same frequency. Before pandas 1.1.0, the index frequency was not checked.
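
A short sketch of what this means in practice: two Series with identical values and timestamps, where only one index carries a frequency. The variable names are illustrative. The default comparison fails, while passing check_freq=False restores the pre-1.1.0 behavior.

```python
import pandas as pd

# Index with an explicit daily frequency
daily = pd.date_range("2020-01-01", periods=3, freq="D")
with_freq = pd.Series([1, 2, 3], index=daily)

# Same timestamps rebuilt from raw values, so no frequency is attached
no_freq = pd.Series([1, 2, 3], index=pd.DatetimeIndex(daily.to_numpy()))

try:
    pd.testing.assert_series_equal(with_freq, no_freq)
    raised = False
except AssertionError:
    raised = True

# Opting out of the frequency check makes the comparison pass
pd.testing.assert_series_equal(with_freq, no_freq, check_freq=False)
print(raised)
```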

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated (GH33718, GH29766, GH29723, PyTables >= 3.4.3). If installed, we now require:

Package	Minimum Version	Required	Changed
numpy	1.15.4	X	X
pytz	2015.4	X
python-dateutil	2.7.3	X	X
bottleneck	1.2.1
numexpr	2.6.2
pytest (dev)	4.0.2

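For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.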

Package	Minimum Version	Changed
beautifulsoup4	4.6.0
fastparquet	0.3.2
fsspec	0.7.4
gcsfs	0.6.0	X
lxml	3.8.0
matplotlib	2.2.2
numba	0.46.0
openpyxl	2.5.7
pyarrow	0.13.0
pymysql	0.7.1
pytables	3.4.3	X
s3fs	0.4.0	X
scipy	1.2.0	X
sqlalchemy	1.1.4
xarray	0.8.2
xlrd	1.1.0
xlsxwriter	0.9.8
xlwt	1.2.0
pandas-gbq	1.2.0	X

See Dependencies and Optional dependencies for more.

Development changes#

The minimum version of Cython is now the most recent bug-fix version (0.29.16) (GH33334).

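Deprecations#

Lookups on a Series with a single-item list containing a slice (e.g. ser[[slice(0, 4)]]) are deprecated and will raise in a future version. Either convert the list to a tuple, or pass the slice directly instead (GH31333)
DataFrame.mean() and DataFrame.median() with numeric_only=None will include datetime64 and datetime64tz columns in a future version (GH29941)
Setting values with .loc using a positional slice is deprecated and will raise in a future version. Use .loc with labels or .iloc with positions instead (GH31840)
DataFrame.to_dict() has deprecated accepting short names for orient and will raise in a future version (GH32515)
Categorical.to_dense() is deprecated and will be removed in a future version, use np.asarray(cat) instead (GH32639)
The fastpath keyword in the SingleBlockManager constructor is deprecated and will be removed in a future version (GH33092)
Providing suffixes as a set in pandas.merge() is deprecated. Provide a tuple instead (GH33740, GH34741).
Indexing a Series with a multi-dimensional indexer like [:, None] to return an ndarray now raises a FutureWarning. Convert to a NumPy array before indexing instead (GH27837)
Index.is_mixed() is deprecated and will be removed in a future version, check index.inferred_type directly instead (GH32922)
Passing any arguments but the first one to read_html() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH27573).
Passing any arguments but path_or_buf (the first one) to read_json() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH27573).
Passing any arguments but the first two to read_excel() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH27573).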
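pandas.api.types.is_categorical() is deprecated and will be removed in a future version; use pandas.api.types.is_categorical_dtype() instead (GH33385)
Index.get_value() is deprecated and will be removed in a future version (GH19728)
Series.dt.week and Series.dt.weekofyear are deprecated and will be removed in a future version, use Series.dt.isocalendar().week instead (GH33595)
DatetimeIndex.week and DatetimeIndex.weekofyear are deprecated and will be removed in a future version, use DatetimeIndex.isocalendar().week instead (GH33595)
DatetimeArray.week and DatetimeArray.weekofyear are deprecated and will be removed in a future version, use DatetimeArray.isocalendar().week instead (GH33595)
DateOffset.__call__() is deprecated and will be removed in a future version, use offset + other instead (GH34171)
apply_index() is deprecated and will be removed in a future version. Use offset + other instead (GH34580)
DataFrame.tshift() and Series.tshift() are deprecated and will be removed in a future version, use DataFrame.shift() and Series.shift() instead (GH11631)
Indexing an Index object with a float key is deprecated, and will raise an IndexError in the future. You can manually convert to an integer key instead (GH34191).
The squeeze keyword in groupby() is deprecated and will be removed in a future version (GH32380)
The tz keyword in Period.to_timestamp() is deprecated and will be removed in a future version; use per.to_timestamp(...).tz_localize(tz) instead (GH34522)
DatetimeIndex.to_perioddelta() is deprecated and will be removed in a future version. Use index - index.to_period(freq).to_timestamp() instead (GH34853)
DataFrame.melt() accepting a value_name that already exists is deprecated, and will be removed in a future version (GH34731)
The center keyword in the DataFrame.expanding() function is deprecated and will be removed in a future version (GH20647)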
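
Several of the deprecations above name a direct replacement. The following sketch collects a few of them side by side; the data is invented for illustration, and the deprecated spellings appear only in comments.

```python
import numpy as np
import pandas as pd

# Series.dt.week / Series.dt.weekofyear -> Series.dt.isocalendar().week
ser = pd.Series(pd.to_datetime(["2020-07-28", "2020-12-31"]))
weeks = ser.dt.isocalendar().week

# Categorical.to_dense() -> np.asarray(cat)
cat = pd.Categorical(["a", "b", "a"])
dense = np.asarray(cat)

# Series.tshift() -> Series.shift(freq=...), which moves the index
ts = pd.Series([1, 2], index=pd.date_range("2020-07-28", periods=2, freq="D"))
shifted = ts.shift(freq="D")

# Period.to_timestamp(tz=...) -> to_timestamp().tz_localize(...)
per = pd.Period("2020-07", freq="M")
stamp = per.to_timestamp().tz_localize("UTC")

print(weeks.tolist(), dense.tolist(), shifted.index[0], stamp)
```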

Performance improvements#

Performance improvement in Timedelta constructor (GH30543)
Performance improvement in Timestamp constructor (GH30543)
Performance improvement in flex arithmetic ops between DataFrame and Series with axis=0 (GH31296)
Performance improvement in arithmetic ops between DataFrame and Series with axis=1 (GH33600)
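The internal index method _shallow_copy() now copies cached attributes over to the new index, avoiding creating these again on the new index. This can speed up many operations that depend on creating copies of existing indexes (GH28584, GH32640, GH32669)
Performance improvement when creating a DataFrame with sparse values from scipy.sparse matrices using the DataFrame.sparse.from_spmatrix() constructor (GH32821, GH32825, GH32826, GH32856, GH32858).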
Performance improvement for groupby methods first() and last() (GH34178)
Performance improvement in factorize() for nullable (integer and Boolean) dtypes (GH33064).
Performance improvement when constructing Categorical objects (GH33921)
Fixed performance regression in pandas.qcut() and pandas.cut() (GH33921)
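Performance improvement in reductions (sum, prod, min, max) for nullable (integer and Boolean) dtypes (GH30982, GH33261, GH33442).
Performance improvement in arithmetic operations between two DataFrame objects (GH32779)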
Performance improvement in pandas.core.groupby.RollingGroupby (GH34052)
Performance improvement in arithmetic operations (sub, add, mul, div) for MultiIndex (GH34297)
Performance improvement in DataFrame[bool_indexer] when bool_indexer is a list (GH33924)
Significant performance improvement of io.formats.style.Styler.render() with styles added with various ways such as io.formats.style.Styler.apply(), io.formats.style.Styler.applymap() or io.formats.style.Styler.bar() (GH19917)

Bug fixes#

Categorical#

Passing an invalid fill_value to Categorical.take() raises a ValueError instead of TypeError (GH33660)
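Combining a Categorical with integer categories and which contains missing values with a float dtype column in operations such as concat() or append() will now result in a float column instead of an object dtype column (GH33607)
Bug where merge() was unable to join on non-unique categorical indices (GH28189)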
Bug when passing categorical data to Index constructor along with dtype=object incorrectly returning a CategoricalIndex instead of object-dtype Index (GH32167)
Bug where the Categorical comparison operator __ne__ would incorrectly evaluate to False when either element was missing (GH32276)
Categorical.fillna() now accepts a Categorical other argument (GH32420)
Repr of Categorical was not distinguishing between int and str (GH33676)

Datetimelike#

Passing an integer dtype other than int64 to np.array(period_index, dtype=...) will now raise TypeError instead of incorrectly using int64 (GH32255)
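Series.to_timestamp() now raises a TypeError if the axis is not a PeriodIndex. Previously an AttributeError was raised (GH33327)
Series.to_period() now raises a TypeError if the axis is not a DatetimeIndex. Previously an AttributeError was raised (GH33327)
Period no longer accepts tuples for the freq argument (GH34658)
Bug in Timestamp where constructing a Timestamp from ambiguous epoch time and calling the constructor again changed the Timestamp.value() property (GH24329)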
DatetimeArray.searchsorted(), TimedeltaArray.searchsorted(), PeriodArray.searchsorted() not recognizing non-pandas scalars and incorrectly raising ValueError instead of TypeError (GH30950)
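Bug in Timestamp where constructing a Timestamp with a dateutil timezone less than 128 nanoseconds before the daylight saving time switch from winter to summer would result in a nonexistent time (GH31043)
Bug in Period.to_timestamp(), Period.start_time() with microsecond frequency returning a timestamp one nanosecond earlier than the correct time (GH31475)
Timestamp raised a confusing error message when year, month or day is missing (GH31200)
Bug in DatetimeIndex constructor incorrectly accepting bool-dtype inputs (GH32668)
Bug in DatetimeIndex.searchsorted() not accepting a list or Series as its argument (GH32762)
Bug where PeriodIndex() raised when passed a Series of strings (GH26109)
Bug in Timestamp arithmetic when adding or subtracting an np.ndarray with timedelta64 dtype (GH33296)
Bug in DatetimeIndex.to_period() not inferring the frequency when called with no arguments (GH33358)
Bug in DatetimeIndex.tz_localize() incorrectly retaining freq in some cases where the original freq is no longer valid (GH30511)
Bug in DatetimeIndex.intersection() losing freq and timezone in some cases (GH33604)
Bug in DatetimeIndex.get_indexer() where incorrect output would be returned for mixed datetime-like targets (GH33741)
Bug in DatetimeIndex addition and subtraction with some types of DateOffset objects incorrectly retaining an invalid freq attribute (GH33779)
Bug in DatetimeIndex where setting the freq attribute on an index could silently change the freq attribute on another index viewing the same data (GH33552)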
DataFrame.min() and DataFrame.max() were not returning consistent results with Series.min() and Series.max() when called on objects initialized with empty pd.to_datetime()
Bug in DatetimeIndex.intersection() and TimedeltaIndex.intersection() with results not having the correct name attribute (GH33904)
Bug in DatetimeArray.__setitem__(), TimedeltaArray.__setitem__(), PeriodArray.__setitem__() incorrectly allowing values with int64 dtype to be silently cast (GH33717)
Bug in subtracting TimedeltaIndex from Period incorrectly raising TypeError in some cases where it should succeed and IncompatibleFrequency in some cases where it should raise TypeError (GH33883)
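Bug in constructing a Series or Index from a read-only NumPy array with non-ns resolution which converted to object dtype instead of coercing to datetime64[ns] dtype when within the timestamp bounds (GH34843).
The freq keyword in Period, date_range(), period_range(), pd.tseries.frequencies.to_offset() no longer allows tuples, pass as string instead (GH34703)
Bug in DataFrame.append() when appending a Series containing a scalar tz-aware Timestamp to an empty DataFrame resulted in an object column instead of datetime64[ns, tz] dtype (GH35038)
OutOfBoundsDatetime issues an improved error message when timestamp is outside of the implementation bounds (GH32967)
Bug in AbstractHolidayCalendar.holidays() when no rules were defined (GH31415)
Bug in Tick comparisons raising TypeError when comparing against timedelta-like objects (GH34088)
Bug in Tick multiplication raising TypeError when multiplying by a float (GH34486)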

Timedelta#

Bug in constructing a Timedelta with a high precision integer that would round the Timedelta components (GH31354)
Bug in dividing np.nan or None by Timedelta incorrectly returning NaT (GH31869)
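Timedelta now understands µs as an identifier for microsecond (GH32899)
Timedelta string representation now includes nanoseconds, when nanoseconds are non-zero (GH9309)
Bug in comparing a Timedelta object against an np.ndarray with timedelta64 dtype incorrectly viewing all entries as unequal (GH33441)
Bug in timedelta_range() that produced an extra point on an edge case (GH30353, GH33498)
Bug in DataFrame.resample() that produced an extra point on an edge case (GH30353, GH13022, GH33498)
Bug in DataFrame.resample() that ignored the loffset argument when dealing with timedelta (GH7687, GH33498)
Bug in Timedelta and pandas.to_timedelta() that ignored the unit argument for string input (GH12136)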

Timezones#

Bug in to_datetime() with infer_datetime_format=True where timezone names (e.g. UTC) would not be parsed correctly (GH33133)

Numeric#

Bug in DataFrame.floordiv() with axis=0 not treating division-by-zero like Series.floordiv() (GH31271)
Bug in to_numeric() with string argument "uint64" and errors="coerce" silently failing (GH32394)
Bug in to_numeric() with downcast="unsigned" failing for empty data (GH32493)
Bug in DataFrame.mean() with numeric_only=False and either datetime64 dtype or PeriodDtype column incorrectly raising TypeError (GH32426)
Bug in DataFrame.count() with level="foo" and index level "foo" containing NaNs causing a segmentation fault (GH21824)
Bug in DataFrame.diff() with axis=1 returning incorrect results with mixed dtypes (GH32995)
Bug in DataFrame.corr() and DataFrame.cov() raising when handling nullable integer columns with pandas.NA (GH33803)
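Bug in arithmetic operations between DataFrame objects with non-overlapping columns with duplicate labels causing an infinite loop (GH35194)
Bug in DataFrame and Series addition and subtraction between object-dtype objects and datetime64 dtype objects (GH33824)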
Bug in Index.difference() giving incorrect results when comparing a Float64Index and object Index (GH35217)
Bug in DataFrame reductions (e.g. df.min(), df.max()) with ExtensionArray dtypes (GH34520, GH32651)
Series.interpolate() and DataFrame.interpolate() now raise a ValueError if limit_direction is 'forward' or 'both' and method is 'backfill' or 'bfill' or limit_direction is 'backward' or 'both' and method is 'pad' or 'ffill' (GH34746)

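Conversion#

Bug in Series construction from NumPy array with big-endian datetime64 dtype (GH29684)
Bug in Timedelta construction with large nanoseconds keyword value (GH32402)
Bug in DataFrame construction where sets would be duplicated rather than raising (GH32582)
The DataFrame constructor no longer accepts a list of DataFrame objects. Because of changes to NumPy, DataFrame objects are now consistently treated as 2D objects, so a list of DataFrame objects is considered 3D, and no longer acceptable for the DataFrame constructor (GH32289).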
Bug in DataFrame when initiating a frame with lists and assign columns with nested list for MultiIndex (GH32173)
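Improved error message for invalid construction of list when creating a new index (GH35190)

Strings#

Bug in the astype() method when converting "string" dtype data to nullable integer dtype (GH32450).
Fixed issue where taking min or max of a StringArray or Series with StringDtype type would raise (GH31746)
Bug in Series.str.cat() returning NaN output when other had Index type (GH33425)
pandas.api.types.is_string_dtype() no longer incorrectly identifies categorical series as string.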

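Interval#

Bug in IntervalArray incorrectly allowing the underlying data to be changed when setting values (GH32782)

Indexing#

DataFrame.xs() now raises a TypeError if a level keyword is supplied and the axis is not a MultiIndex. Previously an AttributeError was raised (GH33610)
Bug in slicing on a DatetimeIndex with a partial-timestamp dropping high-resolution indices near the end of a year, quarter, or month (GH31064)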
Bug in PeriodIndex.get_loc() treating higher-resolution strings differently from PeriodIndex.get_value() (GH31172)
Bug in Series.at() and DataFrame.at() not matching .loc behavior when looking up an integer in a Float64Index (GH31329)
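Bug in PeriodIndex.is_monotonic() incorrectly returning True when containing leading NaT entries (GH31437)
Bug in DatetimeIndex.get_loc() raising KeyError with converted-integer key instead of the user-passed key (GH31425)
Bug in Series.xs() incorrectly returning Timestamp instead of datetime64 in some object-dtype cases (GH31630)
Bug in DataFrame.iat() incorrectly returning Timestamp instead of datetime in some object-dtype cases (GH32809)
Bug in DataFrame.at() when either columns or index is non-unique (GH33041)
Bug in Series.loc() and DataFrame.loc() when indexing with an integer key on an object-dtype Index that is not all-integers (GH31905)
Bug in DataFrame.iloc.__setitem__() on a DataFrame with duplicate columns incorrectly setting values for all matching columns (GH15686, GH22036)
Bug in DataFrame.loc() and Series.loc() with a DatetimeIndex, TimedeltaIndex, or PeriodIndex incorrectly allowing lookups of non-matching datetime-like dtypes (GH32650)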
Bug in Series.__getitem__() indexing with non-standard scalars, e.g. np.dtype (GH32684)
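Bug in Index constructor where an unhelpful error message was raised for NumPy scalars (GH33017)
Bug in DataFrame.lookup() incorrectly raising an AttributeError when frame.index or frame.columns is not unique; this will now raise a ValueError with a helpful error message (GH33041)
Bug in Interval where a Timedelta could not be added or subtracted from a Timestamp interval (GH32023)
Bug in DataFrame.copy() not invalidating the item cache after copy, which caused post-copy value updates to not be reflected (GH31784)
Fixed regression in DataFrame.loc() and Series.loc() throwing an error when a datetime64[ns, tz] value is provided (GH32395)
Bug in Series.__getitem__() with an integer key and a MultiIndex with leading integer level failing to raise KeyError if the key is not present in the first level (GH33355)
Bug in DataFrame.iloc() when slicing a single column DataFrame with ExtensionDtype (e.g. df.iloc[:, :1]) returning an invalid result (GH32957)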
Bug in DatetimeIndex.insert() and TimedeltaIndex.insert() causing index freq to be lost when setting an element into an empty Series (GH33573)
Bug in Series.__setitem__() with an IntervalIndex and a list-like key of integers (GH33473)
Bug in Series.__getitem__() allowing missing labels with np.ndarray, Index, and Series indexers but not with list; these now all raise KeyError (GH33646)
Bug in DataFrame.truncate() and Series.truncate() where the index was assumed to be monotonically increasing (GH33756)
Indexing with a list of strings representing datetimes failed on DatetimeIndex or PeriodIndex (GH11278)
Bug in Series.at() when used with a MultiIndex would raise an exception on valid inputs (GH26989)
Bug in DataFrame.loc() with a dictionary of values changing columns with int dtype to float (GH34573)
Bug in Series.loc() when used with a MultiIndex would raise an IndexingError when accessing a None value (GH34318)
Bug in DataFrame.reset_index() and Series.reset_index() not preserving data types on an empty DataFrame or Series with a MultiIndex (GH19602)
Bug in Series and DataFrame indexing with a time key on a DatetimeIndex with NaT entries (GH35114)
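
To make the GH33646 entry in this list concrete, here is a minimal sketch that indexes a Series with a missing label using four kinds of indexer. The Series contents and the label "z" are invented for the example. Under the behaviour described in that entry, each lookup should raise KeyError.

```python
import numpy as np
import pandas as pd

ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

# "z" is not a label of ser, so every indexer below contains a missing label.
indexers = {
    "list": ["a", "z"],
    "ndarray": np.array(["a", "z"]),
    "Index": pd.Index(["a", "z"]),
    "Series": pd.Series(["a", "z"]),
}

# Record, per indexer type, whether the lookup raised KeyError.
raised = {}
for kind, indexer in indexers.items():
    try:
        ser[indexer]
        raised[kind] = False
    except KeyError:
        raised[kind] = True

print(raised)
```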

Missing#

Calling fillna() on an empty Series now correctly returns a shallow copied object. The behaviour is now consistent with Index, DataFrame and a non-empty Series (GH32543).
Bug in Series.replace() when the argument to_replace is of type dict/list and is used on a Series containing <NA> was raising a TypeError. The method now handles this by ignoring <NA> values when doing the comparison for the replacement (GH32621)
Bug in any() and all() incorrectly returning <NA> for all False or all True values using the nullable Boolean dtype and with skipna=False (GH33253)
Clarified documentation on interpolate with method=akima. The der parameter must be scalar or None (GH33426)
DataFrame.interpolate() uses the correct axis convention now. Previously, interpolating along columns led to interpolation along indices and vice versa. Furthermore, interpolating with the methods pad, ffill, bfill and backfill is identical to using these methods with DataFrame.fillna() (GH12918, GH29146)
Bug in DataFrame.interpolate() when called on a DataFrame with column names of string type was raising a ValueError. The method is now independent of the type of the column names (GH33956)
Passing NA into a format string using format specs will now work. For example "{:.1f}".format(pd.NA) would previously raise a ValueError, but will now return the string "<NA>" (GH34740)
Bug in Series.map() not raising on invalid na_action (GH32815)
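
The following sketch touches two of the entries in this list: formatting pd.NA with a format spec (GH34740) and reducing a nullable Boolean array with skipna=False (GH33253). The input arrays are chosen only for illustration.

```python
import pandas as pd

# Formatting pd.NA with a format spec no longer raises (GH34740).
formatted = "{:.1f}".format(pd.NA)

# With the nullable Boolean dtype and skipna=False, all-True and all-False
# inputs without missing values reduce to a plain boolean (GH33253).
all_true = pd.array([True, True], dtype="boolean").all(skipna=False)
any_false = pd.array([False, False], dtype="boolean").any(skipna=False)

print(formatted, all_true, any_false)
```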

MultiIndex#

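DataFrame.swaplevel() now raises a TypeError if the axis is not a MultiIndex. Previously an AttributeError was raised (GH31126)
Bug in DataFrame.loc() when used with a MultiIndex. The returned values were not in the same order as the given inputs (GH22797)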

In [80]: df = pd.DataFrame(np.arange(4),
   ....:                   index=[["a", "a", "b", "b"], [1, 2, 1, 2]])
   ....: 

# Rows are now ordered as the requested keys
In [81]: df.loc[(['b', 'a'], [2, 1]), :]
Out[81]: 
     0
b 2  3
  1  2
a 2  1
  1  0

[4 rows x 1 columns]

Bug in MultiIndex.intersection() was not guaranteed to preserve order when sort=False. (GH31325)
Bug in DataFrame.truncate() was dropping MultiIndex names. (GH34564)

In [82]: left = pd.MultiIndex.from_arrays([["b", "a"], [2, 1]])

In [83]: right = pd.MultiIndex.from_arrays([["a", "b", "c"], [1, 2, 3]])

# Common elements are now guaranteed to be ordered by the left side
In [84]: left.intersection(right, sort=False)
Out[84]: 
MultiIndex([('b', 2),
            ('a', 1)],
           )

Bug when joining two MultiIndex objects without specifying level and with different columns. The return-indexers parameter was ignored. (GH34074)
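
For the DataFrame.truncate() entry (GH34564), a small sketch of the expected behaviour is given here. The level names "L1" and "L2" and the column name "col" are arbitrary choices for the example.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([[1, 2, 3, 4], ["A", "B"]], names=["L1", "L2"])
df = pd.DataFrame({"col": range(len(mi))}, index=mi)

# Keep only the rows whose first level lies between 2 and 3 (inclusive).
truncated = df.truncate(before=2, after=3)

# The level names should be carried over to the truncated result.
print(truncated.index.names)
```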

IO#

Passing a set as the names argument to pandas.read_csv(), pandas.read_table(), or pandas.read_fwf() will raise ValueError: Names should be an ordered collection. (GH34946)
Bug in print-out when display.precision is zero. (GH20359)
Bug in read_json() where integer overflow was occurring when the JSON contains big number strings. (GH30320)
read_csv() will now raise a ValueError when the arguments header and prefix are both not None. (GH27394)
Bug in DataFrame.to_json() was raising NotFoundError when path_or_buf was an S3 URI (GH28375)
Bug in DataFrame.to_parquet() overwriting pyarrow's default for coerce_timestamps; following pyarrow's default allows writing nanosecond timestamps with version="2.0" (GH31652).
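Bug in read_csv() was raising TypeError when sep=None was used in combination with the comment keyword (GH31396)
Bug in HDFStore that caused it to set the dtype of a datetime64 column to int64 when reading, in Python 3, a DataFrame from a fixed format written in Python 2 (GH31750)
read_sas() now handles dates and datetimes larger than Timestamp.max, returning them as datetime.datetime objects (GH20927)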
Bug in DataFrame.to_json() where Timedelta objects would not be serialized correctly with date_format="iso" (GH28256)
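read_csv() will raise a ValueError when the column names passed in parse_dates are missing in the DataFrame (GH31251)
Bug in read_excel() where a UTF-8 string with a high surrogate would cause a segmentation violation (GH23809)
Bug in read_csv() was causing a file descriptor leak on an empty file (GH31488)
Bug in read_csv() was causing a segfault when there were blank lines between the header and data rows (GH28071)
Bug in read_csv() was raising a misleading exception on a permissions issue (GH23784)
Bug in read_csv() was raising an IndexError when header=None and there were two extra data columns
Bug in read_sas() was raising an AttributeError when reading files from Google Cloud Storage (GH33069)
Bug in DataFrame.to_sql() where an AttributeError was raised when saving an out-of-bounds date (GH26761)
Bug in read_excel() did not correctly handle multiple embedded spaces in OpenDocument text cells. (GH32207)
Bug in read_json() was raising TypeError when reading a list of Booleans into a Series. (GH31464)
Bug in pandas.io.json.json_normalize() where the location specified by record_path doesn't point to an array. (GH26284)
pandas.read_hdf() has a more explicit error message when loading an unsupported HDF file (GH9539)
Bug in read_feather() was raising an ArrowIOError when reading an S3 or http file path (GH29055)
Bug in to_excel() could not handle the column name render and was raising a KeyError (GH34331)
Bug in execute() was raising a ProgrammingError for some DB-API drivers when the SQL statement contained the % character and no parameters were present (GH34211)
Bug in StataReader() which resulted in categorical variables with different dtypes when reading data using an iterator. (GH31544)
HDFStore.keys() now has an optional include parameter that allows the retrieval of all native HDF5 table names (GH29916)
TypeError exceptions raised by read_csv() and read_table() were showing as parser_f when an unexpected keyword argument was passed (GH25648)
Bug in read_excel() for ODS files removing 0.0 values (GH27222)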
Bug in ujson.encode() was raising an OverflowError with numbers larger than sys.maxsize (GH34395)
Bug in HDFStore.append_to_multiple() was raising a ValueError when the min_itemsize parameter is set (GH11238)
Bug in create_table() now raises an error when the column argument was not specified in data_columns on input (GH28156)
read_json() can now read a line-delimited JSON file from a file URL while lines and chunksize are set.
Bug in DataFrame.to_sql() when reading DataFrames with -np.inf entries with MySQL now has a more explicit ValueError (GH34431)
Bug where capitalised file extensions were not decompressed by read_* functions (GH35164)
Bug in read_excel() that was raising a TypeError when header=None and index_col is given as a list (GH31783)
Bug in read_excel() where datetime values are used in the header in a MultiIndex (GH34748)
read_excel() no longer takes **kwds arguments. This means that passing in the keyword argument chunksize now raises a TypeError (previously raised a NotImplementedError), while passing in the keyword argument encoding now raises a TypeError (GH34464)
Bug in DataFrame.to_records() was incorrectly losing timezone information in timezone-aware datetime64 columns (GH32535)
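
The names entry at the top of this list (GH34946) can be sketched as follows. The CSV text and column names are invented for the example; an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

csv_text = "1,2\n3,4\n"  # illustrative data, not from the release notes

# A set has no defined order, so it is rejected as the names argument.
try:
    pd.read_csv(io.StringIO(csv_text), names={"a", "b"})
    set_error = None
except ValueError as err:
    set_error = str(err)

# An ordered collection such as a list is accepted.
frame = pd.read_csv(io.StringIO(csv_text), names=["a", "b"])

print(set_error)
print(frame)
```

Rejecting a set avoids silently assigning column names in an arbitrary order.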

Plotting#

DataFrame.plot() for line/bar now accepts color by dictionary (GH8193).
Bug in DataFrame.plot.hist() where weights are not working for multiple columns (GH33173)
Bug in DataFrame.boxplot() and DataFrame.plot.boxplot() lost color attributes of medianprops, whiskerprops, capprops and boxprops (GH30346)
Bug in DataFrame.hist() where the order of the column argument was ignored (GH29235)
Bug in DataFrame.plot.scatter() that when adding multiple plots with different cmap, colorbars always use the first cmap (GH33389)
Bug in DataFrame.plot.scatter() was adding a colorbar to the plot even if the argument c was assigned to a column containing color names (GH34316)
Bug in pandas.plotting.bootstrap_plot() was causing cluttered axes and overlapping labels (GH34905)
Bug in DataFrame.plot.scatter() caused an error when plotting variable marker sizes (GH32904)

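Groupby/resample/rolling#

Using a pandas.api.indexers.BaseIndexer with count, min, max, median, skew, cov, corr will now return correct results for any monotonic pandas.api.indexers.BaseIndexer descendant (GH32865)
DataFrameGroupby.mean() and SeriesGroupby.mean() (and similarly median(), std() and var()) now raise a TypeError if a non-accepted keyword argument is passed to them. Previously an UnsupportedFunctionCall was raised (AssertionError if min_count was passed to median()) (GH31485)
Bug in GroupBy.apply() raising ValueError when the by axis is not sorted, has duplicates, and the applied func does not mutate the passed-in objects (GH30667)
Bug in DataFrameGroupBy.transform() producing an incorrect result with transformation functions (GH30918)
Bug in Groupby.transform() was returning the wrong result when grouping by multiple keys of which some were categorical and others not (GH32494)
Bug in GroupBy.count() causing a segmentation fault when the grouped-by columns contain NaNs (GH32841)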
Bug in DataFrame.groupby() and Series.groupby() produces inconsistent type when aggregating Boolean Series (GH32894)
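Bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() when the number of non-null values is below min_count for nullable integer dtypes (GH32861)
Bug in SeriesGroupBy.quantile() was raising on nullable integers (GH33136)
Bug in DataFrame.resample() where an AmbiguousTimeError would be raised when the resulting timezone-aware DatetimeIndex had a DST transition at midnight (GH25758)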
Bug in DataFrame.groupby() where a ValueError would be raised when grouping by a categorical column with read-only categories and sort=False (GH33410)
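Bug in GroupBy.agg(), GroupBy.transform(), and GroupBy.resample() where subclasses are not preserved (GH28330)
Bug in SeriesGroupBy.agg() where any column name was previously accepted in the named aggregation of SeriesGroupBy. The behaviour now allows only str and callables; anything else raises TypeError. (GH34422)
Bug in DataFrame.groupby() lost the name of the Index when one of the agg keys referenced an empty list (GH32580)
Bug in Rolling.apply() where center=True was ignored when engine='numba' was specified (GH34784)
Bug in DataFrame.ewm.cov() was throwing AssertionError for MultiIndex inputs (GH34440)
Bug in core.groupby.DataFrameGroupBy.quantile() raised TypeError for non-numeric types rather than dropping the columns (GH27892)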
Bug in core.groupby.DataFrameGroupBy.transform() when func='nunique' and columns are of type datetime64, the result would also be of type datetime64 instead of int64 (GH35109)
Bug in DataFrame.groupby() raising an AttributeError when selecting a column and aggregating with as_index=False (GH35246).
Bug in DataFrameGroupBy.first() and DataFrameGroupBy.last() that would raise an unnecessary ValueError when grouping on multiple Categoricals (GH34951)
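
Two of the entries in this list lend themselves to a short sketch: transform() with func='nunique' on a datetime64 column (GH35109) and the TypeError raised for an unsupported keyword argument to mean() (GH31485). The frame contents are made up, and not_a_real_option is a deliberately invalid, hypothetical keyword used only to trigger the error.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01"]),
        "value": [1.0, 2.0, 3.0],
    }
)

# Counting distinct datetimes per group should give integer counts,
# not datetime64 values (GH35109).
counts = df.groupby("key")[["when"]].transform("nunique")

# Passing a keyword that mean() does not accept raises TypeError (GH31485).
try:
    df.groupby("key")["value"].mean(not_a_real_option=True)
    mean_error = None
except TypeError as err:
    mean_error = type(err).__name__

print(counts["when"].dtype, mean_error)
```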

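Reshaping#

Bug affecting all numeric and Boolean reduction methods not returning the subclassed data type. (GH25596)
Bug in DataFrame.pivot_table() when only MultiIndexed columns is set (GH17038)
Bug in DataFrame.unstack() and Series.unstack() can take tuple names in MultiIndexed data (GH19966)
Bug in DataFrame.pivot_table() when margin is True and only column is defined (GH31016)
Fixed incorrect error message in DataFrame.pivot() when columns is set to None. (GH30924)
Bug in crosstab() when the inputs are two Series and have tuple names; the output would keep a dummy MultiIndex as columns. (GH18321)
DataFrame.pivot() can now take lists for the index and columns arguments (GH21425)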
Bug in concat() where the resulting indices are not copied when copy=True (GH29879)
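Bug in SeriesGroupBy.aggregate() was resulting in aggregations being overwritten when they shared the same name (GH30880)
Bug where Index.astype() would lose the name attribute when converting from Float64Index to Int64Index, or when casting to an ExtensionArray dtype (GH32013)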
Series.append() will now raise a TypeError when passed a DataFrame or a sequence containing DataFrame (GH31413)
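DataFrame.replace() and Series.replace() will raise a TypeError if to_replace is not an expected type. Previously the replace would fail silently (GH18634)
Bug in an inplace operation on a Series that was adding a column back to the DataFrame from which it was originally dropped (using inplace=True) (GH30484)
Bug in DataFrame.apply() where the callback was called with a Series parameter even though raw=True was requested. (GH32423)
Bug in DataFrame.pivot_table() losing timezone information when creating a MultiIndex level from a column with timezone-aware dtype (GH32558)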
Bug in concat() where when passing a non-dict mapping as objs would raise a TypeError (GH32863)
DataFrame.agg() now provides a more descriptive SpecificationError message when attempting to aggregate a non-existent column (GH32755)
Bug in DataFrame.unstack() when MultiIndex columns and MultiIndex rows were used (GH32624, GH24729 and GH28306)
Appending a dictionary to a DataFrame without passing ignore_index=True will raise TypeError: Can only append a dict if ignore_index=True instead of TypeError: Can only append a Series if ignore_index=True or if the Series has a name (GH30871)
Bug in DataFrame.corrwith(), DataFrame.memory_usage(), DataFrame.dot(), DataFrame.idxmin(), DataFrame.idxmax(), DataFrame.duplicated(), DataFrame.isin(), DataFrame.count(), Series.explode(), Series.asof() and DataFrame.asof() not returning subclassed types. (GH31331)
Bug in concat() was not allowing for concatenation of DataFrame and Series with duplicate keys (GH33654)
Bug in cut() raised an error when the argument labels contains duplicates (GH33141)
Ensure only named functions can be used in eval() (GH32460)
Bug in DataFrame.aggregate() and Series.aggregate() was causing a recursive loop in some cases (GH34224)
Fixed bug in melt() where melting MultiIndex columns with col_level > 0 would raise a KeyError on id_vars (GH34129)
Bug in Series.where() with an empty Series and an empty cond having non-bool dtype (GH34592)
Fixed regression where DataFrame.apply() would raise ValueError for elements with S dtype (GH34529)
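
As a hedged sketch of the DataFrame.pivot() entry (GH21425), the example below passes lists for both index and columns. The column names and sales figures are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["N", "N", "S", "S"],
        "year": [2019, 2019, 2020, 2020],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "sales": [1, 2, 3, 4],
    }
)

# Lists are accepted for index and columns; the two index columns
# become the levels of a MultiIndex on the rows.
wide = df.pivot(index=["region", "year"], columns=["quarter"], values="sales")

print(wide)
```

Each (region, year) pair becomes one row, with one column per quarter.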

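Sparse#

Creating a SparseArray from timezone-aware dtype will issue a warning before dropping timezone information, instead of doing so silently (GH32501)
Bug in arrays.SparseArray.from_spmatrix() wrongly read a scipy sparse matrix (GH31991)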
Bug in Series.sum() with SparseArray raised a TypeError (GH25777)
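Bug where a DataFrame containing an all-sparse SparseArray was filled with NaN when indexed by a list-like (GH27781, GH29563)
The repr of SparseDtype now includes the repr of its fill_value attribute. Previously it used fill_value's string representation (GH34352)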
Bug where empty DataFrame could not be cast to SparseDtype (GH33113)
Bug in arrays.SparseArray() was returning the incorrect type when indexing a sparse DataFrame (GH34526, GH34540)
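
A minimal sketch of the Series.sum() entry (GH25777) follows; it also prints the repr of the resulting SparseDtype, which relates to GH34352. The array values are arbitrary, and the exact repr text may differ between platforms and pandas versions.

```python
import pandas as pd

# An integer SparseArray; zeros are stored implicitly as the fill value.
sparse_values = pd.arrays.SparseArray([1, 0, 0, 2])
ser = pd.Series(sparse_values)

# Summing a Series backed by a SparseArray should work without raising.
total = ser.sum()

# The dtype repr shows the subtype together with the fill value.
dtype_repr = repr(ser.dtype)

print(total, dtype_repr)
```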

ExtensionArray#

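Fixed bug where Series.value_counts() would raise on empty input of Int64 dtype (GH33317)
Fixed bug in concat() when concatenating DataFrame objects with non-overlapping columns resulting in object-dtype columns rather than preserving the extension dtype (GH27692, GH33027)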
Fixed bug where StringArray.isna() would return False for NA values when pandas.options.mode.use_inf_as_na was set to True (GH33655)
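Fixed bug where Series construction with an EA dtype and an index but no data or scalar data failed (GH26469)
Fixed bug that caused Series.__repr__() to crash for extension types whose elements are multidimensional arrays (GH33770).
Fixed bug where Series.update() would raise a ValueError for ExtensionArray dtypes with missing values (GH33980)
Fixed bug where StringArray.memory_usage() was not implemented (GH33963)
Fixed bug where DataFrameGroupBy() would ignore the min_count argument for aggregations on nullable Boolean dtypes (GH34051)
Fixed bug where the constructor of DataFrame with dtype='string' would fail (GH27953, GH33623)
Bug where a DataFrame column set to a scalar extension type was considered an object type rather than the extension type (GH34832)
Fixed bug in IntegerArray.astype() to correctly copy the mask as well (GH34931).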
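
The sketch below exercises three of the fixes in this list with extension dtypes: value_counts() on empty Int64 input (GH33317), Series construction from a scalar with an EA dtype and an index (GH26469), and the DataFrame constructor with dtype='string' (GH27953, GH33623). All data values are illustrative.

```python
import pandas as pd

# value_counts() on an empty Series of Int64 dtype should not raise.
counts = pd.Series([], dtype="Int64").value_counts()

# Scalar data broadcast over an index with an extension dtype.
filled = pd.Series(1, index=["a", "b"], dtype="Int64")

# Constructing a DataFrame directly with the string extension dtype.
frame = pd.DataFrame({"a": ["x", "y"]}, dtype="string")

print(len(counts), filled.tolist(), frame.dtypes["a"])
```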

Other#

Set operations on an object-dtype Index now always return object-dtype results (GH31401)
Fixed pandas.testing.assert_series_equal() to correctly raise if the left argument is a different subclass with check_series_type=True (GH32670).
Getting a missing attribute in a DataFrame.query() or DataFrame.eval() string raises the correct AttributeError (GH32408)
Fixed bug in pandas.testing.assert_series_equal() where dtypes were checked for Interval and ExtensionArray operands when check_dtype was False (GH32747)
Bug in DataFrame.__dir__() caused a segfault when using unicode surrogates in a column name (GH25509)
Bug in DataFrame.equals() and Series.equals() in allowing subclasses to be equal (GH34402).
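
To illustrate the first entry in this list (GH31401), here is a small sketch of set operations on object-dtype Index objects. The integer contents are arbitrary; the point is only that the result keeps the object dtype instead of being inferred as an integer dtype.

```python
import pandas as pd

left = pd.Index([1, 2, 3], dtype=object)
right = pd.Index([2, 3, 4], dtype=object)

# Both results should stay object-dtype even though every element is an int.
union = left.union(right)
common = left.intersection(right)

print(union.dtype, common.dtype)
```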

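Contributors#

A total of 368 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.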

3vts +
A Brooks +
Abbie Popa +
Achmad Syarif Hidayatullah +
Adam W Bagaskarta +
Adrian Mastronardi +
Aidan Montare +
Akbar Septriyan +
Akos Furton +
Alejandro Hall +
Alex Hall +
Alex Itkes +
Alex Kirko
Ali McMaster +
Alvaro Aleman +
Amy Graham +
Andrew Schonfeld +
Andrew Shumanskiy +
Andrew Wieteska +
Angela Ambroz
Anjali Singh +
Anna Daglis
Anthony Milbourne +
Antony Lee +
Ari Sosnovsky +
Arkadeep Adhikari +
Arunim Samudra +
Ashkan +
Ashwin Prakash Nalwade +
Ashwin Srinath +
Atsushi Nukariya +
Ayappan +
Ayla Khan +
Bart +
Bart Broere +
Benjamin Beier Liu +
Benjamin Fischer +
Bharat Raghunathan
Bradley Dice +
Brendan Sullivan +
Brian Strand +
Carsten van Weelden +
Chamoun Saoma +
ChrisRobo +
Christian Chwala
Christopher Whelan
Christos Petropoulos +
Chuanzhu Xu
CloseChoice +
Clément Robert +
CuylenE +
DanBasson +
Daniel Saxton
Danilo Horta +
DavaIlhamHaeruzaman +
Dave Hirschfeld
Dave Hughes
David Rouquet +
David S +
Deepyaman Datta
Dennis Bakhuis +
Derek McCammond +
Devjeet Roy +
Diane Trout
Dina +
Dom +
Drew Seibert +
EdAbati
Emiliano Jordan +
Erfan Nariman +
Eric Groszman +
Erik Hasse +
Erkam Uyanik +
Evan D +
Evan Kanter +
Fangchen Li +
Farhan Reynaldo +
Farhan Reynaldo Hutabarat +
Florian Jetter +
Fred Reiss +
GYHHAHA +
Gabriel Moreira +
Gabriel Tutui +
Galuh Sahid
Gaurav Chauhan +
George Hartzell +
Gim Seng +
Giovanni Lanzani +
Gordon Chen +
Graham Wetzler +
Guillaume Lemaitre
Guillem Sánchez +
HH-MWB +
Harshavardhan Bachina
How Si Wei
Ian Eaves
Iqrar Agalosi Nureyza +
Irv Lustig
Iva Laginja +
JDkuba
Jack Greisman +
Jacob Austin +
Jacob Deppen +
Jacob Peacock +
Jake Tae +
Jake Vanderplas +
James Cobon-Kerr
Jan Červenka +
Jan Škoda
Jane Chen +
Jean-Francois Zinque +
Jeanderson Barros Candido +
Jeff Reback
Jered Dominguez-Trujillo +
Jeremy Schendel
Jesse Farnham
Jiaxiang
Jihwan Song +
Joaquim L. Viegas +
Joel Nothman
John Bodley +
John Paton +
Jon Thielen +
Joris Van den Bossche
Jose Manuel Martí +
Joseph Gulian +
Josh Dimarsky
Joy Bhalla +
João Veiga +
Julian de Ruiter +
Justin Essert +
Justin Zheng
KD-dev-lab +
Kaiqi Dong
Karthik Mathur +
Kaushal Rohit +
Kee Chong Tan
Ken Mankoff +
Kendall Masse
Kenny Huynh +
Ketan +
Kevin Anderson +
Kevin Bowey +
Kevin Sheppard
Kilian Lieret +
Koki Nishihara +
Krishna Chivukula +
KrishnaSai2020 +
Lesley +
Lewis Cowles +
Linda Chen +
Linxiao Wu +
Lucca Delchiaro Costabile +
MBrouns +
Mabel Villalba
Mabroor Ahmed +
Madhuri Palanivelu +
Mak Sze Chun
Malcolm +
Marc Garcia
Marco Gorelli
Marian Denes +
Martin Bjeldbak Madsen +
Martin Durant +
Martin Fleischmann +
Martin Jones +
Martin Winkel
Martina Oefelein +
Marvzinc +
María Marino +
Matheus Cardoso +
Mathis Felardos +
Matt Roeschke
Matteo Felici +
Matteo Santamaria +
Matthew Roeschke
Matthias Bussonnier
Max Chen
Max Halford +
Mayank Bisht +
Megan Thong +
Michael Marino +
Miguel Marques +
Mike Kutzma
Mohammad Hasnain Mohsin Rajan +
Mohammad Jafar Mashhadi +
MomIsBestFriend
Monica +
Natalie Jann
Nate Armstrong +
Nathanael +
Nick Newman +
Nico Schlömer +
Niklas Weber +
ObliviousParadigm +
Olga Lyashevska +
OlivierLuG +
Pandas Development Team
Parallels +
Patrick +
Patrick Cando +
Paul Lilley +
Paul Sanders +
Pearcekieser +
Pedro Larroy +
Pedro Reys
Peter Bull +
Peter Steinbach +
Phan Duc Nhat Minh +
Phil Kirlin +
Pierre-Yves Bourguignon +
Piotr Kasprzyk +
Piotr Niełacny +
Prakhar Pandey
Prashant Anand +
Puneetha Pai +
Quang Nguyễn +
Rafael Jaimes III +
Rafif +
RaisaDZ +
Rakshit Naidu +
Ram Rachum +
Red +
Ricardo Alanis +
Richard Shadrach +
Rik-de-Kort
Robert de Vries
Robin to Roxel +
Roger Erens +
Rohith295 +
Roman Yurchak
Ror +
Rushabh Vasani
Ryan
Ryan Nazareth
SAI SRAVAN MEDICHERLA +
SHUBH CHATTERJEE +
Sam Cohan
Samira-g-js +
Sandu Ursu +
Sang Agung +
SanthoshBala18 +
Sasidhar Kasturi +
SatheeshKumar Mohan +
Saul Shanabrook
Scott Gigante +
Sebastian Berg +
Sebastián Vanrell
Sergei Chipiga +
Sergey +
ShilpaSugan +
Simon Gibbons
Simon Hawkins
Simon Legner +
Soham Tiwari +
Song Wenhao +
Souvik Mandal
Spencer Clark
Steffen Rehberg +
Steffen Schmitz +
Stijn Van Hoey
Stéphan Taljaard
SultanOrazbayev +
Sumanau Sareen
SurajH1 +
Suvayu Ali +
Terji Petersen
Thomas J Fan +
Thomas Li
Thomas Smith +
Tim Swast
Tobias Pitters +
Tom +
Tom Augspurger
Uwe L. Korn
Valentin Iovene +
Vandana Iyer +
Venkatesh Datta +
Vijay Sai Mutyala +
Vikas Pandey
Vipul Rai +
Vishwam Pandya +
Vladimir Berkutov +
Will Ayd
Will Holmgren
William +
William Ayd
Yago González +
Yosuke KOBAYASHI +
Zachary Lawrence +
Zaky Bilfagih +
Zeb Nicholls +
alimcmaster1
alm +
andhikayusup +
andresmcneill +
avinashpancham +
benabel +
bernie gray +
biddwan09 +
brock +
chris-b1
cleconte987 +
dan1261 +
david-cortes +
davidwales +
dequadras +
dhuettenmoser +
dilex42 +
elmonsomiat +
epizzigoni +
fjetter
gabrielvf1 +
gdex1 +
gfyoung
guru kiran +
h-vishal
iamshwin
jamin-aws-ospo +
jbrockmendel
jfcorbett +
jnecus +
kernc
kota matsuoka +
kylekeppler +
leandermaben +
link2xt +
manoj_koneni +
marydmit +
masterpiga +
maxime.song +
mglasder +
moaraccounts +
mproszewska
neilkg
nrebena
ossdev07 +
paihu
pan Jacek +
partev +
patrick +
pedrooa +
pizzathief +
proost
pvanhauw +
rbenes
rebecca-palmer
rhshadrach +
rjfs +
s-scherrer +
sage +
sagungrp +
salem3358 +
saloni30 +
smartswdeveloper +
smartvinnetou +
themien +
timhunderwood +
tolhassianipar +
tonywu1999
tsvikas
tv3141
venkateshdatta1993 +
vivikelapoutre +
willbowditch +
willpeppo +
za +
zaki-indra +
