RFC 76: OGR Python drivers
Author: Even Rouault
Contact: even.rouault@spatialys.com
Started: 2019-Nov-5
Last updated: 2019-Nov-15
Status: Adopted, implemented in GDAL 3.1
Summary
This RFC adds the capability to write OGR/vector drivers in Python.
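Motivation
For some use cases that do not require lightning speed, or that deal with very niche formats (possibly in-house formats), it might be faster and more efficient to write a vector driver in Python than to write a GDAL C++ driver, as currently required, or an ad-hoc converter.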
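Note
QGIS now has a way to create Python-based providers, such as in https://github.com/qgis/QGIS/blob/master/tests/src/python/provider_python.py. Having a way to do this in GDAL itself also allows the rest of the GDAL/OGR-based tools to use the OGR Python drivers.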
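How does that work?
Driver registration
The driver registration mechanism is extended to look for .py scripts in a dedicated directory:
the directory pointed to by the GDAL_PYTHON_DRIVER_PATH configuration option (there may be several paths, separated by : on Unix or ; on Windows);
if that is not defined, the directory pointed to by the GDAL_DRIVER_PATH configuration option;
if that is not defined either, the directory where native plugins are found (hardcoded at compilation time on Unix builds).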
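The fallback order above can be sketched in a few lines of Python. This is only an illustration of the lookup order described in the text, not GDAL's C++ implementation: the function name python_driver_dirs, the config dictionary and the native_plugin_dir parameter are hypothetical, and splitting GDAL_DRIVER_PATH on the path separator in the same way as GDAL_PYTHON_DRIVER_PATH is an assumption.

```python
import os


def python_driver_dirs(config, native_plugin_dir, pathsep=os.pathsep):
    """Return the directories to scan for .py drivers, in priority order.

    config is a plain dict standing in for GDAL configuration options
    (hypothetical; the real lookup happens in C++).
    """
    for key in ("GDAL_PYTHON_DRIVER_PATH", "GDAL_DRIVER_PATH"):
        value = config.get(key)
        if value:
            # Several paths may be given, separated by ':' (Unix) or ';' (Windows).
            return [path for path in value.split(pathsep) if path]
    # Neither option is defined: fall back to the native plugin directory.
    return [native_plugin_dir]


if __name__ == "__main__":
    print(python_driver_dirs({"GDAL_PYTHON_DRIVER_PATH": "/a:/b"}, "/native", ":"))
```

The first defined option wins outright; the sketch does not merge the directories of several options.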
These Python scripts must set at least 2 directives in their first lines:
# gdal: DRIVER_NAME = "short_name"
# gdal: DRIVER_SUPPORTED_API_VERSION = 1. Currently only 1 is supported. If the interface changes in a backward incompatible way, the supported API version number will be incremented internally. This item makes it possible to check whether a Python driver can be loaded "safely". If a Python driver supported several API versions (it is not clear whether that is really possible at this point), it might use an array syntax to indicate that, like [1,2]
# gdal: DRIVER_DCAP_VECTOR = "YES"
# gdal: DRIVER_DMD_LONGNAME = "my super plugin"
Optional metadata such as # gdal: DRIVER_DMD_EXTENSIONS or # gdal: DRIVER_DMD_HELPTOPIC can be defined (basically, any driver metadata key prefixed by # gdal: DRIVER_).
These directives are parsed in a purely textual way, without invoking the Python interpreter. This is both for efficiency and because we want to delay the search for, or launch of, the Python interpreter as much as possible. A typical use case is GDAL used by QGIS: we want to make sure that QGIS has itself started Python, so that this Python interpreter is reused.
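A minimal sketch of such a purely textual parse is given below. It is written in Python for readability, whereas GDAL does this in C++; the regular expression, the max_lines cut-off and the choice to store keys without the DRIVER_ prefix are assumptions made for this illustration.

```python
import re

# Matches lines such as:  # gdal: DRIVER_NAME = "DUMMY"
DIRECTIVE_RE = re.compile(r'^#\s*gdal:\s*DRIVER_(\w+)\s*=\s*(.+?)\s*$')


def parse_directives(text, max_lines=1000):
    """Collect '# gdal: DRIVER_<KEY> = <value>' lines without running any Python.

    max_lines is a hypothetical limit reflecting "in their first lines".
    """
    metadata = {}
    for line in text.splitlines()[:max_lines]:
        match = DIRECTIVE_RE.match(line)
        if match is None:
            continue
        key, value = match.groups()
        # Strip the surrounding double quotes of string literals.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        metadata[key] = value
    return metadata


def is_loadable(metadata):
    """A script is only a driver candidate if the two core directives are set."""
    return "NAME" in metadata and "SUPPORTED_API_VERSION" in metadata
```

Because the match is anchored at a single leading #, a commented-out directive such as "# # gdal: DRIVER_DMD_EXTENSIONS = ..." is ignored by this sketch.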
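From this short metadata, the driver registration code can instantiate GDALDriver C++ objects. When the Identify() or Open() method is invoked on such an object, the C++ code will:
if not already done, find the Python symbols or start Python (see the paragraph below for more details);
if not already done, load the .py file as a Python module;
if not already done, instantiate the Python class of the module that derives from gdal_python_driver.BaseDriver;
call the identify or open method, depending on the API call that was made.
The open method returns a Python BaseDataset object, with required and optional methods that will be invoked by the corresponding GDAL API calls. The same applies to the BaseLayer object. See the example driver below.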
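The deferred loading described above can be illustrated with the following Python sketch. The real logic lives in GDAL's C++ code; the LazyPythonDriver class, its module name "lazy_driver_module" and the base_class parameter (standing in for gdal_python_driver.BaseDriver, which only exists inside a GDAL process) are hypothetical.

```python
import importlib.util


class LazyPythonDriver:
    """Loads a driver script only when identify() or open() is first called."""

    def __init__(self, path, base_class):
        self.path = path
        self.base_class = base_class  # stand-in for gdal_python_driver.BaseDriver
        self._instance = None

    def _driver(self):
        if self._instance is None:
            # Load the .py file as a Python module.
            spec = importlib.util.spec_from_file_location("lazy_driver_module", self.path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            # Instantiate the class of the module deriving from the base class.
            candidates = [obj for obj in vars(module).values()
                          if isinstance(obj, type)
                          and issubclass(obj, self.base_class)
                          and obj is not self.base_class]
            if not candidates:
                raise RuntimeError("no driver class found in " + self.path)
            self._instance = candidates[0]()
        return self._instance

    def identify(self, filename, first_bytes=b"", open_flags=0, open_options=None):
        return self._driver().identify(filename, first_bytes, open_flags, open_options or {})

    def open(self, filename, first_bytes=b"", open_flags=0, open_options=None):
        return self._driver().open(filename, first_bytes, open_flags, open_options or {})
```

Constructing the object costs nothing beyond storing the path, which mirrors the intent that registering a driver should not start or search for Python.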
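Connection with the Python interpreter
The logic is shared with the "VRT pixel functions written in Python" functionality. It relies on runtime linking to the Python symbols already available in the process (for example the python executable, or a binary that embeds Python and uses GDAL, such as QGIS), or on loading the Python library if no Python symbols are found, rather than on compile-time linking. The reason is that we do not know in advance which Python version GDAL may end up being used with, and we do not want gdal.so/gdal.dll to be explicitly linked against a particular Python library.
This is both embedding and extending Python.
The steps are:
Through dlopen() + dlsym() on Unix, and EnumProcessModules() + GetProcAddress() on Windows, look for Python symbols. If they are found, use them. This is the case, for example, when GDAL is used from a Python module (GDAL Python bindings, rasterio, etc.) or from an application like QGIS that starts a Python interpreter.
Otherwise, look for the PYTHONSO environment variable, which should point to a pythonX.Y[...].so/.dll
Otherwise, look for the python binary in the path and try to identify the corresponding Python .so/.dll
Otherwise, try to load well-known names of the Python .so/.dll with dlopen()/LoadLibrary()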
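The order of these four steps can be summarised with a small decision function. It only models the ordering; it does not load any library. The function name, the returned labels and the WELL_KNOWN_NAMES tuple are hypothetical placeholders, and the actual probing is done by GDAL in C++ with dlopen()/dlsym() or their Windows equivalents.

```python
import os
import shutil

# Hypothetical placeholder names; the list GDAL actually tries is not given here.
WELL_KNOWN_NAMES = ("libpython3.so", "python3.dll")


def choose_python_source(symbols_in_process, environ=None, which=shutil.which):
    """Return (strategy, detail) following the order of the four steps."""
    environ = os.environ if environ is None else environ
    if symbols_in_process:
        # Step 1: the host process (python, QGIS, ...) already provides Python.
        return ("process", None)
    if environ.get("PYTHONSO"):
        # Step 2: explicit library location given by the user.
        return ("PYTHONSO", environ["PYTHONSO"])
    binary = which("python")
    if binary:
        # Step 3: derive the library from the python binary found in the path.
        return ("binary", binary)
    # Step 4: last resort, try well-known library names.
    return ("well-known", WELL_KNOWN_NAMES)
```

Passing environ and which as parameters keeps the sketch testable without touching the real environment.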
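Impacts on GDAL core
They are minimal. The GDALAllRegister() method has an added call to GDALDriverManager::AutoLoadPythonDrivers(), which implements the logic described above. The GDALDriver class has been extended to support a new function pointer, IdentifyEx(), which is used by the C++ shim that loads the Python code.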
int (*pfnIdentifyEx)( GDALDriver*, GDALOpenInfo * );
This extended IdentifyEx() function pointer, which adds the GDALDriver* argument, is used in priority by the GDALIdentify() and GDALOpen() methods. The reason for it is mundane. For normal C++ drivers there is no need to pass the driver, as there is a one-to-one correspondence between a driver and the function that implements it. For the Python drivers, however, a single C++ method interfaces with the Python identify() method of several Python drivers, hence the need for a GDALDriver* argument to forward the call to the appropriate driver.
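The dispatch problem can be modelled in Python as follows. This is an analogy only: PythonDriverShim, register() and identify_ex() are hypothetical names, and the driver_name argument plays the role that the GDALDriver* argument plays in the C++ function pointer.

```python
class PythonDriverShim:
    """One shared entry point serving every registered Python driver."""

    def __init__(self):
        self._python_drivers = {}  # driver name -> Python driver object

    def register(self, name, python_driver):
        self._python_drivers[name] = python_driver

    def identify_ex(self, driver_name, filename, first_bytes=b""):
        # Without driver_name, the shared function could not know which
        # Python driver's identify() it is supposed to forward to.
        driver = self._python_drivers[driver_name]
        return driver.identify(filename, first_bytes, 0, {})
```

A regular C++ driver does not need this extra argument because its identify function is already specific to that driver.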
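Example of such a driver
Note that prefixing the connection string with the driver name is absolutely not a requirement. It is something specific to this particular driver, which is a bit artificial. The CityJSON driver mentioned below does not need it.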
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# This code is in the public domain, so as to serve as a template for
# real-world plugins.
# or, at the choice of the licensee,
# Copyright 2019 Even Rouault
# SPDX-License-Identifier: MIT

# Metadata parsed by GDAL C++ code at driver pre-loading, starting with '# gdal: '
# Required and with that exact syntax since it is parsed by non-Python
# aware code. So just literal values, no expressions, etc.
# gdal: DRIVER_NAME = "DUMMY"
# API version(s) supported. Must include 1 currently
# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
# gdal: DRIVER_DCAP_VECTOR = "YES"
# gdal: DRIVER_DMD_LONGNAME = "my super plugin"

# Optional driver metadata items.
# # gdal: DRIVER_DMD_EXTENSIONS = "ext1 ext2"
# # gdal: DRIVER_DMD_HELPTOPIC = "http://example.com/my_help.html"

# The gdal_python_driver module is defined by the GDAL library at runtime
from gdal_python_driver import BaseDriver, BaseDataset, BaseLayer


class Layer(BaseLayer):

    def __init__(self):

        # Reserved attribute names. Either those or the corresponding method
        # must be defined
        self.name = 'my_layer'  # Required, or name() method

        self.fid_name = 'my_fid'  # Optional

        self.fields = [{'name': 'boolField', 'type': 'Boolean'},
                       {'name': 'int16Field', 'type': 'Integer16'},
                       {'name': 'int32Field', 'type': 'Integer'},
                       {'name': 'int64Field', 'type': 'Integer64'},
                       {'name': 'realField', 'type': 'Real'},
                       {'name': 'floatField', 'type': 'Float'},
                       {'name': 'strField', 'type': 'String'},
                       {'name': 'strNullField', 'type': 'String'},
                       {'name': 'strUnsetField', 'type': 'String'},
                       {'name': 'binaryField', 'type': 'Binary'},
                       {'name': 'timeField', 'type': 'Time'},
                       {'name': 'dateField', 'type': 'Date'},
                       {'name': 'datetimeField', 'type': 'DateTime'}]  # Required, or fields() method

        self.geometry_fields = [{'name': 'geomField',
                                 'type': 'Point',  # optional
                                 'srs': 'EPSG:4326'  # optional
                                 }]  # Required, or geometry_fields() method

        self.metadata = {'foo': 'bar'}  # optional

        # uncomment if __iter__() honours self.attribute_filter
        #self.iterator_honour_attribute_filter = True

        # uncomment if __iter__() honours self.spatial_filter
        #self.iterator_honour_spatial_filter = True

        # uncomment if feature_count() honours self.attribute_filter
        #self.feature_count_honour_attribute_filter = True

        # uncomment if feature_count() honours self.spatial_filter
        #self.feature_count_honour_spatial_filter = True

        # End of reserved attribute names

        self.count = 5

    # Required, unless self.name attribute is defined
    # def name(self):
    #    return 'my_layer'

    # Optional. If not defined, fid name is 'fid'
    # def fid_name(self):
    #    return 'my_fid'

    # Required, unless self.geometry_fields attribute is defined
    # def geometry_fields(self):
    #    return [...]

    # Required, unless self.fields attribute is defined
    # def fields(self):
    #    return [...]

    # Optional. Only to be used if self.metadata field is not defined
    # def metadata(self, domain):
    #    if domain is None:
    #        return {'foo': 'bar'}
    #    return None

    # Optional. Called when self.attribute_filter is changed by GDAL
    # def attribute_filter_changed(self):
    #     # You may change self.iterator_honour_attribute_filter
    #     # or feature_count_honour_attribute_filter
    #     pass

    # Optional. Called when self.spatial_filter is changed by GDAL
    # def spatial_filter_changed(self):
    #     # You may change self.iterator_honour_spatial_filter
    #     # or feature_count_honour_spatial_filter
    #     pass

    # Optional
    def test_capability(self, cap):
        if cap == BaseLayer.FastGetExtent:
            return True
        if cap == BaseLayer.StringsAsUTF8:
            return True
        # if cap == BaseLayer.FastSpatialFilter:
        #    return False
        # if cap == BaseLayer.RandomRead:
        #    return False
        if cap == BaseLayer.FastFeatureCount:
            return self.attribute_filter is None and self.spatial_filter is None
        return False

    # Optional
    def extent(self, force_computation):
        return [2.1, 49, 3, 50]  # minx, miny, maxx, maxy

    # Optional.
    def feature_count(self, force_computation):
        # As we did not declare feature_count_honour_attribute_filter and
        # feature_count_honour_spatial_filter, the below case cannot happen
        # But this is to illustrate that you can callback the default implementation
        # if needed
        # if self.attribute_filter is not None or \
        #   self.spatial_filter is not None:
        #    return super(Layer, self).feature_count(force_computation)

        return self.count

    # Required. You do not need to handle the case of simultaneous iterators on
    # the same Layer object.
    def __iter__(self):
        for i in range(self.count):
            properties = {
                'boolField': True,
                'int16Field': 32767,
                'int32Field': i + 2,
                'int64Field': 1234567890123,
                'realField': 1.23,
                'floatField': 1.2,
                'strField': 'foo',
                'strNullField': None,
                'binaryField': b'\x01\x00\x02',
                'timeField': '12:34:56.789',
                'dateField': '2017-04-26',
                'datetimeField': '2017-04-26T12:34:56.789Z'}

            yield {"type": "OGRFeature",
                   "id": i + 1,
                   "fields": properties,
                   "geometry_fields": {"geomField": "POINT(2 49)"},
                   "style": "SYMBOL(a:0)" if i % 2 == 0 else None,
                   }

    # Optional
    # def feature_by_id(self, fid):
    #    return {}


class Dataset(BaseDataset):

    # Optional, but implementations will generally need it
    def __init__(self, filename):
        # If the layers member is set, layer_count() and layer() will not be used
        self.layers = [Layer()]
        self.metadata = {'foo': 'bar'}

    # Optional, called on native object destruction
    def __del__(self):
        pass

    # Optional. Only to be used if self.metadata field is not defined
    # def metadata(self, domain):
    #    if domain is None:
    #        return {'foo': 'bar'}
    #    return None

    # Required, unless a layers attribute is set in __init__
    # def layer_count(self):
    #    return len(self.layers)

    # Required, unless a layers attribute is set in __init__
    # def layer(self, idx):
    #    return self.layers[idx]


# Required: class deriving from BaseDriver
class Driver(BaseDriver):

    # Optional. Called the first time the driver is loaded
    def __init__(self):
        pass

    # Required
    def identify(self, filename, first_bytes, open_flags, open_options={}):
        return filename == 'DUMMY:'

    # Required
    def open(self, filename, first_bytes, open_flags, open_options={}):
        if not self.identify(filename, first_bytes, open_flags):
            return None
        return Dataset(filename)
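When developing such a driver, it can help to check the dictionaries yielded by __iter__() against the declared fields before involving GDAL at all. The helper below is a hypothetical convenience for driver authors, not part of the gdal_python_driver module; it only relies on the dictionary keys shown in the example above ("type", "fields", "geometry_fields").

```python
def feature_problems(layer_fields, layer_geometry_fields, feature):
    """Return a list of human-readable problems found in one feature dict."""
    declared = {field["name"] for field in layer_fields}
    declared_geoms = {geom["name"] for geom in layer_geometry_fields}
    problems = []
    if feature.get("type") != "OGRFeature":
        problems.append("unexpected type: %r" % feature.get("type"))
    # Field names used by the feature but never declared on the layer.
    for name in sorted(set(feature.get("fields", {})) - declared):
        problems.append("undeclared field: " + name)
    for name in sorted(set(feature.get("geometry_fields", {})) - declared_geoms):
        problems.append("undeclared geometry field: " + name)
    return problems
```

Fields that are declared but missing from a feature, like strUnsetField in the example, are deliberately not reported, since the example leaves that field unset.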
Other examples:
a PASSTHROUGH driver that forwards calls to the GDAL SWIG Python API: https://github.com/OSGeo/gdal/blob/master/examples/pydrivers/ogr_PASSTHROUGH.py
a driver implementing a simple parsing of CityJSON: https://github.com/OSGeo/gdal/blob/master/examples/pydrivers/ogr_CityJSON.py
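Limitations and scope
Vector and read-only for now. This could of course be extended later.
There is no connection between the Python code of the plugin and the OGR Python API that is built on top of SWIG. This does not appear to be doable in a reasonable way. Nothing prevents people from using the GDAL/OGR/OSR Python API, but the objects exchanged between the OGR core and the Python code will not be OGR Python SWIG objects. A typical example is that a plugin will return its CRS as a string (WKT, PROJJSON, or deprecated PROJ.4 string), and not as an osgeo.osr.SpatialReference object. It is however possible to use the osgeo.osr.SpatialReference API to generate this WKT string.
This RFC does not try to cover the management of Python dependencies. It is up to the user to do the needed "pip install", or to use whatever Python package management solution they prefer.
The Python "Global Interpreter Lock" is held in the Python drivers, as required for safe use of Python. Consequently, the scalability of such drivers is limited.
Given the above restrictions, this will remain an "experimental" feature, and the GDAL project will not accept such Python drivers for inclusion in the GDAL repository. This is similar to the situation of the QGIS project, which allows Python plugins outside of the main QGIS repository. If a QGIS plugin is to be moved into the main repository, it has to be converted to C++. The rationale is that the correctness of Python code can mostly be checked only at runtime, whereas C++ benefits from static analysis (at compile time, and through other checkers). This rationale also applies in the context of GDAL. GDAL drivers are also stress-tested by the OSS-Fuzz infrastructure, which requires them to be written in C++.
The interface between the C++ and Python code might break between GDAL feature releases. In that case, the expected API version number will be incremented to avoid loading incompatible Python drivers. We will likely not make any effort to deal with plugins written for incompatible (previous) API versions.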
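The version gate mentioned in the last item could look like the following sketch, which accepts both the scalar form (1) and the array form ([1,2]) of DRIVER_SUPPORTED_API_VERSION. The function name and the use of ast.literal_eval are choices made for this illustration; GDAL performs the equivalent check in C++ without running Python.

```python
import ast

EXPECTED_API_VERSION = 1  # the only version supported at the time of this RFC


def declares_api_version(raw_value, expected=EXPECTED_API_VERSION):
    """Tell whether a DRIVER_SUPPORTED_API_VERSION value includes `expected`."""
    try:
        # Only literals are accepted, in line with "no expressions" in directives.
        value = ast.literal_eval(raw_value.strip())
    except (ValueError, SyntaxError):
        return False
    if isinstance(value, int) and not isinstance(value, bool):
        return value == expected
    if isinstance(value, (list, tuple)):
        return expected in value
    return False
```

A driver whose declared versions do not include the expected one is simply skipped in this sketch, matching the stated intent of not loading incompatible drivers.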
SWIG binding changes
None
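Security implications
Similar to the existing native code plugin mechanism of GDAL. If the user defines the GDAL_PYTHON_DRIVER_PATH environment variable or GDAL_DRIVER_PATH, and puts .py scripts in those directories (or in {prefix}/lib/gdalplugins/python as a fallback), they will be executed.
However, opening a .py file with GDALOpen() or similar mechanisms will not lead to its execution, so this is safe for normal GDAL usage.
The GDAL_NO_AUTOLOAD compile-time #define, already used to disable the loading of native plugins, is also honoured to disable the loading of Python plugins.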
Performance impact
If no .py script exists in the searched locations, the performance impact on GDALAllRegister() should be within the noise.
Backward compatibility
No backward incompatibility. Only new functionality is added.
Documentation
A tutorial will be added to explain how to write such a Python driver: https://github.com/rouault/gdal/blob/pythondrivers/gdal/doc/source/tutorials/vector_python_driver.rst
Testing
The gdal autotest suite will be extended with the above test Python driver, and with a few error cases: https://github.com/rouault/gdal/blob/pythondrivers/autotest/ogr/ogr_pythondrivers.py
Previous discussions
This topic has been discussed in the past in:
https://lists.osgeo.org/pipermail/gdal-dev/2017-April/thread.html#46526
https://lists.osgeo.org/pipermail/gdal-dev/2018-November/thread.html#49294
Implementation
A candidate implementation is available at https://github.com/rouault/gdal/tree/pythondrivers
https://github.com/OSGeo/gdal/compare/master...rouault:pythondrivers
Voting history
+1 from EvenR, JukkaR, MateuzL, DanielM
-0 from SeanG
+0 from HowardB
Credits
Sponsored by OpenGeoGroep