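Merging Data
There are two ways to combine datasets in geopandas: attribute joins and spatial joins.
In an attribute join, a GeoSeries or GeoDataFrame is combined with a regular pandas.Series or pandas.DataFrame based on a common variable. This is analogous to normal merging or joining in pandas.
In a spatial join, observations from two GeoSeries or GeoDataFrame are combined based on their spatial relationship to one another.
In the following examples, we use these datasets: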
In [1]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
In [2]: cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
# For attribute join
In [3]: country_shapes = world[['geometry', 'iso_a3']]
In [4]: country_names = world[['name', 'iso_a3']]
# For spatial join
In [5]: countries = world[['geometry', 'name']]
In [6]: countries = countries.rename(columns={'name':'country'})
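Appending
Appending GeoDataFrame and GeoSeries uses the pandas append() methods. Keep in mind that the appended geometry columns need to have the same CRS. Note that DataFrame.append() and Series.append() were deprecated in pandas 1.4 and removed in pandas 2.0; with current pandas, use pandas.concat() instead.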
# Appending GeoSeries
In [7]: joined = world.geometry.append(cities.geometry)
# Appending GeoDataFrames
In [8]: europe = world[world.continent == 'Europe']
In [9]: asia = world[world.continent == 'Asia']
In [10]: eurasia = europe.append(asia)
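On pandas 2.0 and later, where append() is no longer available, the two appends above can be written with pandas.concat(). This is a minimal sketch and not the documented session. The two frames are hypothetical stand-ins for europe and asia, with made-up names and points. One of them is deliberately put in EPSG:3857 so that to_crs() can bring it back to the shared CRS before concatenation.

```python
import pandas as pd
import geopandas
from shapely.geometry import Point

# Hypothetical stand-ins for `europe` and `asia`; names and points are made up.
europe = geopandas.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(10.0, 50.0), Point(12.0, 45.0)],
    crs="EPSG:4326",
)
asia = geopandas.GeoDataFrame(
    {"name": ["c"]},
    geometry=[Point(100.0, 30.0)],
    crs="EPSG:4326",
).to_crs("EPSG:3857")  # deliberately placed in a different CRS

# The geometry columns must share a CRS, so reproject before combining.
asia = asia.to_crs(europe.crs)

# Counterpart of europe.append(asia) for pandas >= 2.0.
eurasia = pd.concat([europe, asia], ignore_index=True)

# Counterpart of appending two GeoSeries.
joined = pd.concat([europe.geometry, asia.geometry], ignore_index=True)

print(type(eurasia).__name__, len(eurasia), eurasia.crs)
print(type(joined).__name__, len(joined))
```

ignore_index=True renumbers the rows. Drop it if the original index labels should be kept.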
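Attribute Joins
Attribute joins are accomplished using the merge() method. In general, it is recommended to use the merge() method called from the spatial dataset. With that said, the stand-alone pandas.merge() function will work if the GeoDataFrame is in the left argument; if a DataFrame is in the left argument and a GeoDataFrame is in the right position, the result will no longer be a GeoDataFrame.
For example, consider the following merge, which adds full names to a GeoDataFrame that initially has only the ISO code for each country by merging it with a DataFrame.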
# `country_shapes` is GeoDataFrame with country shapes and iso codes
In [11]: country_shapes.head()
Out[11]:
geometry iso_a3
0 MULTIPOLYGON (((180.000000000 -16.067132664, 1... FJI
1 POLYGON ((33.903711197 -0.950000000, 34.072620... TZA
2 POLYGON ((-8.665589565 27.656425890, -8.665124... ESH
3 MULTIPOLYGON (((-122.840000000 49.000000000, -... CAN
4 MULTIPOLYGON (((-122.840000000 49.000000000, -... USA
# `country_names` is DataFrame with country names and iso codes
In [12]: country_names.head()
Out[12]:
name iso_a3
0 Fiji FJI
1 Tanzania TZA
2 W. Sahara ESH
3 Canada CAN
4 United States of America USA
# Merge with `merge` method on shared variable (iso codes):
In [13]: country_shapes = country_shapes.merge(country_names, on='iso_a3')
In [14]: country_shapes.head()
Out[14]:
geometry iso_a3 name
0 MULTIPOLYGON (((180.000000000 -16.067132664, 1... FJI Fiji
1 POLYGON ((33.903711197 -0.950000000, 34.072620... TZA Tanzania
2 POLYGON ((-8.665589565 27.656425890, -8.665124... ESH W. Sahara
3 MULTIPOLYGON (((-122.840000000 49.000000000, -... CAN Canada
4 MULTIPOLYGON (((-122.840000000 49.000000000, -... USA United States of America
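The sketch below shows the left/right rule for pandas.merge() in action. It uses hypothetical miniature versions of country_shapes and country_names, with made-up codes and square geometries rather than the Natural Earth data, and checks the type returned by each call. The last step wraps the result in geopandas.GeoDataFrame(..., geometry="geometry"). That is one way to recover a GeoDataFrame when the DataFrame ended up on the left.

```python
import pandas as pd
import geopandas
from shapely.geometry import box

# Hypothetical miniature versions of `country_shapes` and `country_names`.
country_shapes = geopandas.GeoDataFrame(
    {"iso_a3": ["AAA", "BBB"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)
country_names = pd.DataFrame(
    {"name": ["Country A", "Country B"], "iso_a3": ["AAA", "BBB"]}
)

# Recommended: call merge() on the spatial dataset -> GeoDataFrame.
from_method = country_shapes.merge(country_names, on="iso_a3")

# Stand-alone pandas.merge() with the GeoDataFrame on the left -> GeoDataFrame.
geo_left = pd.merge(country_shapes, country_names, on="iso_a3")

# DataFrame on the left -> plain pandas DataFrame, not a GeoDataFrame.
df_left = pd.merge(country_names, country_shapes, on="iso_a3")

# The geometry column is still there, so the result can be promoted explicitly.
restored = geopandas.GeoDataFrame(df_left, geometry="geometry")

for label, result in [("method", from_method), ("geo_left", geo_left),
                      ("df_left", df_left), ("restored", restored)]:
    print(label, type(result).__name__)
```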
Spatial Joins
In a spatial join, two geometry objects are merged based on their spatial relationship to one another.
# One GeoDataFrame of countries, one of Cities.
# Want to merge so we can get each city's country.
In [15]: countries.head()
Out[15]:
geometry country
0 MULTIPOLYGON (((180.000000000 -16.067132664, 1... Fiji
1 POLYGON ((33.903711197 -0.950000000, 34.072620... Tanzania
2 POLYGON ((-8.665589565 27.656425890, -8.665124... W. Sahara
3 MULTIPOLYGON (((-122.840000000 49.000000000, -... Canada
4 MULTIPOLYGON (((-122.840000000 49.000000000, -... United States of America
In [16]: cities.head()
Out[16]:
name geometry
0 Vatican City POINT (12.453386545 41.903282180)
1 San Marino POINT (12.441770158 43.936095835)
2 Vaduz POINT (9.516669473 47.133723774)
3 Luxembourg POINT (6.130002806 49.611660379)
4 Palikir POINT (158.149974324 6.916643696)
# Execute spatial join
In [17]: cities_with_country = cities.sjoin(countries, how="inner", predicate='intersects')
In [18]: cities_with_country.head()
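GeoPandas provides two spatial-join functions:
GeoDataFrame.sjoin(): joins based on binary predicates (intersects, contains, etc.)
GeoDataFrame.sjoin_nearest(): joins based on proximity, with the ability to set a maximum search radius.
Note
For historical reasons, both methods are also available as the top-level functions sjoin() and sjoin_nearest(). It is recommended to use the methods, as the functions may be deprecated in the future.
Binary Predicate Joins
Binary predicate joins are available via GeoDataFrame.sjoin().
GeoDataFrame.sjoin() has two core arguments: how and predicate.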
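The method and the top-level function accept the same arguments, so the two calls in this sketch should produce the same joined table. The data are hypothetical: a unit square named "Squareland" and two points. Running it assumes a spatial-index backend is installed. That means Shapely 2.x for recent geopandas releases, or rtree or pygeos for older ones.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical toy data: two city points and one country polygon.
cities = geopandas.GeoDataFrame(
    {"name": ["inside", "outside"]},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
)
countries = geopandas.GeoDataFrame(
    {"country": ["Squareland"]},
    geometry=[box(0, 0, 1, 1)],
)

# Preferred: the GeoDataFrame method.
via_method = cities.sjoin(countries, how="inner", predicate="intersects")

# The same operation through the top-level function.
via_function = geopandas.sjoin(cities, countries, how="inner", predicate="intersects")

print(via_method[["name", "country"]])
```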
predicate
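The predicate argument specifies how geopandas decides whether or not to join the attributes of one object to another, based on their geometric relationship.
The values for predicate correspond to the names of geometric binary predicates and depend on the spatial index implementation.
The default spatial index in geopandas currently supports the following values for predicate, which are defined in the Shapely documentation: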
intersects
contains
within
touches
crosses
overlaps
how
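The how argument specifies the type of join that will occur and which geometry is retained in the resultant GeoDataFrame. It accepts the following options:
left: use the index from the first (or left_df) GeoDataFrame that you provide to GeoDataFrame.sjoin(); retain only the left_df geometry column
right: use the index from the second (or right_df) GeoDataFrame; retain only the right_df geometry column
inner: use the intersection of index values from both GeoDataFrames; retain only the left_df geometry column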
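The sketch below illustrates how predicate and how interact. It uses hypothetical data: one unit-square zone and three points, one of which lies exactly on the square's edge. Under Shapely's definitions, a point on the boundary intersects the polygon but is not within it, so the two predicates return different matches. The how="left" and how="right" calls then show which rows and which geometry column each option keeps.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical data: one unit-square zone and three points.
zones = geopandas.GeoDataFrame({"zone": ["Z"]}, geometry=[box(0, 0, 1, 1)])
points = geopandas.GeoDataFrame(
    {"name": ["interior", "boundary", "outside"]},
    geometry=[Point(0.5, 0.5), Point(1.0, 0.5), Point(5.0, 5.0)],
)

# 'intersects' matches the interior point and the point on the edge.
inner_intersects = points.sjoin(zones, how="inner", predicate="intersects")

# 'within' excludes the edge point: a boundary point is not within the polygon.
inner_within = points.sjoin(zones, how="inner", predicate="within")

# how="left" keeps every point; unmatched rows get NaN in the right-hand columns.
left_within = points.sjoin(zones, how="left", predicate="within")

# how="right" keeps the zones' index and geometry column instead.
right_within = points.sjoin(zones, how="right", predicate="within")

print(sorted(inner_intersects["name"]))
print(sorted(inner_within["name"]))
print(left_within[["name", "zone"]])
print(right_within.geometry.geom_type.tolist())
```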
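Note
More complicated spatial relationships can be studied by combining geometric operations with spatial joins. To find all polygons within a given distance of a point, for example, one can first use the buffer() method to expand each point into a circle of appropriate radius, then intersect those buffered circles with the polygons in question.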
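Here is a minimal sketch of the buffer-then-join approach, on hypothetical data. The radius is expressed in the units of the coordinates. With real data in a geographic CRS (degrees), the layers would typically be reprojected with to_crs() to a projected CRS before buffering.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical data with no CRS, so distances are plain coordinate units.
points = geopandas.GeoDataFrame({"name": ["p"]}, geometry=[Point(0, 0)])
polygons = geopandas.GeoDataFrame(
    {"poly": ["near", "far"]},
    geometry=[box(1, -1, 2, 1), box(3, 3, 4, 4)],
)

radius = 1.5  # search distance, in the same units as the coordinates

# Step 1: expand each point into a circle of the chosen radius.
circles = points.copy()
circles["geometry"] = points.buffer(radius)

# Step 2: intersect the buffered circles with the polygons.
within_radius = circles.sjoin(polygons, how="inner", predicate="intersects")

print(within_radius[["name", "poly"]])
```

The "near" box is 1 unit from the point, so it is matched. The "far" box is about 4.24 units away, so it is not.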
Nearest Joins
Proximity-based joins can be done via GeoDataFrame.sjoin_nearest().
GeoDataFrame.sjoin_nearest() shares the how argument with GeoDataFrame.sjoin(), and includes two additional arguments: max_distance and distance_col.
max_distance
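The max_distance argument specifies a maximum search radius for matching geometries. This can have a considerable performance impact in some cases. If you can, it is highly recommended that you use this parameter.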
distance_col
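If set, the resultant GeoDataFrame will include a column with this name containing the computed distances between an input geometry and the nearest geometry.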
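The sketch below exercises both extra arguments on hypothetical data without a CRS, so distances are plain coordinate units. Point "a" lies 1 unit from the nearest zone and is matched. Point "b" has no zone within max_distance=2.0, so with how="left" it is kept, with missing values in the joined and distance columns. Like sjoin(), this needs a spatial-index backend. Distances are only meaningful in a projected CRS, which is why the data here have none.

```python
import pandas as pd
import geopandas
from shapely.geometry import Point, box

# Hypothetical data without a CRS, so distances are plain coordinate units.
points = geopandas.GeoDataFrame(
    {"name": ["a", "b"]}, geometry=[Point(0, 0), Point(10, 10)]
)
zones = geopandas.GeoDataFrame(
    {"zone": ["west", "middle"]},
    geometry=[box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

nearest = points.sjoin_nearest(
    zones,
    how="left",
    max_distance=2.0,      # ignore candidates farther than 2 units away
    distance_col="dist",   # store the computed distance in a column named "dist"
)

print(nearest[["name", "zone", "dist"]])
```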