RFC 76: OGR Python drivers

Author:

Even Rouault

Contact:

even.rouault@spatialys.com

Started:

November 5, 2019

Last updated:

November 15, 2019

Status:

Adopted, implemented in GDAL 3.1

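Summary

This RFC adds the capability to write OGR/vector drivers in Python.

Motivation

For some use cases that do not require lightning speed, or that deal with very niche formats (possibly in-house formats), it might be faster and more efficient to write a vector driver in Python than to write a GDAL C++ driver, as currently required, or an ad-hoc converter.

Note

QGIS now has a way to create Python-based providers, such as in https://github.com/QGIS/QGIS/blob/master/tests/src/Python/provider_Python.py. Having a way to do this in GDAL itself also allows the other GDAL/OGR-based tools to use the OGR Python driver.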

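How does that work?

Driver registration

The driver registration mechanism is extended to look for .py scripts in a dedicated directory:

  • the directory pointed to by the GDAL_PYTHON_DRIVER_PATH configuration option (there may be several paths, separated by : on Unix or ; on Windows)

  • if not defined, the directory pointed to by the GDAL_DRIVER_PATH configuration option.

  • if not defined, the directory where native plugins are located (hardcoded at compilation time on Unix builds).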
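
The lookup order above can be sketched in a few lines of Python. This is only an illustration of the precedence between the options, not GDAL's C++ code: the function name `python_driver_dirs`, the `native_plugin_dir` parameter and the use of a plain dictionary for configuration options are all hypothetical, and splitting GDAL_DRIVER_PATH into several paths is an assumption.

```python
import os


def python_driver_dirs(config, native_plugin_dir, pathsep=os.pathsep):
    """Return the directories to scan for .py drivers, in lookup order.

    config: dict standing in for GDAL configuration options (assumption).
    native_plugin_dir: stand-in for the directory hardcoded at build time.
    """
    for key in ("GDAL_PYTHON_DRIVER_PATH", "GDAL_DRIVER_PATH"):
        value = config.get(key)
        if value:
            # Several paths may be given, separated by ':' (Unix) or ';' (Windows).
            return [p for p in value.split(pathsep) if p]
    # Neither option is defined: fall back to the native plugin directory.
    return [native_plugin_dir]
```

The first defined option wins; a later option is only consulted when the earlier one is not defined at all.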

These Python scripts must set at least 2 directives in their first lines:

  • # gdal: DRIVER_NAME = "short_name"

  • # gdal: DRIVER_SUPPORTED_API_VERSION = 1 . Currently only 1 is supported. If the interface changes in a backward-incompatible way, the supported API version number will be incremented internally. This item makes it possible to check whether a Python driver can be loaded "safely". If a Python driver supported several API versions (it is not clear whether that is really possible at this point), it might use an array syntax to indicate that, like [1,2]

  • # gdal: DRIVER_DCAP_VECTOR = "YES"

  • # gdal: DRIVER_DMD_LONGNAME = "my super plugin"

Optional metadata such as # gdal: DRIVER_DMD_EXTENSIONS or # gdal: DRIVER_DMD_HELPTOPIC can be defined (basically, any driver metadata key string prefixed by # gdal: DRIVER_).

These directives will be parsed in a purely textual way, without invoking the Python interpreter, both for efficiency and because we want to delay looking up or launching the Python interpreter as long as possible (a typical use case is GDAL used by QGIS: we want to make sure that QGIS has itself started Python, so that this Python interpreter is reused).
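
To illustrate what purely textual parsing of these directives could look like, here is a minimal Python sketch. GDAL does this in C++ without any interpreter; the function name `parse_directives`, the regular expression and the `max_lines` limit of 50 are assumptions made for the example, not values taken from GDAL.

```python
import ast
import re

# Matches lines such as:  # gdal: DRIVER_NAME = "DUMMY"
_DIRECTIVE = re.compile(r'^#\s*gdal:\s*DRIVER_(\w+)\s*=\s*(.+?)\s*$')


def parse_directives(text, max_lines=50):
    """Collect '# gdal: DRIVER_*' directives from the first lines of a script."""
    found = {}
    for line in text.splitlines()[:max_lines]:
        match = _DIRECTIVE.match(line)
        if match:
            key, raw = match.groups()
            # literal_eval only accepts literals ("x", 1, [1, 2]); it never
            # executes code from the script being inspected.
            found[key] = ast.literal_eval(raw)
    return found
```

Because only literal values are accepted, a directive cannot contain an expression, which matches the "just literal values" constraint stated in the example driver.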

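From this short metadata, the driver registration code can instantiate GDALDriver C++ objects. When the Identify() or Open() method is invoked on such an object, the C++ code will:

  • if not already done, find Python symbols or start Python (see the paragraph below for more details)

  • if not already done, load the .py file as a Python module

  • if not already done, instantiate the Python class of the module that derives from gdal_python_driver.BaseDriver

  • call the identify or open method, depending on the API call that originated the request.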
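
The "if not already done" steps above amount to lazy initialization. The following Python sketch shows the idea under stated assumptions: the class name `LazyPythonDriver`, the module name "lazy_plugin" and the rule for picking the driver class are hypothetical, and the real logic lives in GDAL's C++ code.

```python
import importlib.util


class LazyPythonDriver:
    """Defers loading the .py file until identify() or open() is first called."""

    def __init__(self, path, base_class):
        self.path = path
        self.base_class = base_class
        self._instance = None  # created on first use only

    def _driver(self):
        if self._instance is None:
            # Load the .py file as a Python module.
            spec = importlib.util.spec_from_file_location("lazy_plugin", self.path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            # Pick the class of the module that derives from the base driver class.
            cls = next(
                obj for obj in vars(module).values()
                if isinstance(obj, type)
                and issubclass(obj, self.base_class)
                and obj is not self.base_class
            )
            self._instance = cls()
        return self._instance

    def identify(self, *args):
        return self._driver().identify(*args)

    def open(self, *args):
        return self._driver().open(*args)
```

Constructing the wrapper is cheap, so many drivers can be registered without touching any script until one of them is actually asked to identify or open a file.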

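The open method will return a Python BaseDataset object with required and optional methods that will be invoked by the corresponding GDAL API calls. The same goes for the BaseLayer object. See the example driver later in this document.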

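Connection with the Python interpreter

The logic will be shared with the functionality of VRT pixel functions written in Python. It relies on run-time linking to the Python symbols already available in the process (for example the python executable, or a binary that embeds Python and uses GDAL, such as QGIS), or on loading the Python library when no Python symbols are found, rather than on compile-time linking. The reason is that we do not know in advance which Python version GDAL may end up being used with, and we do not want gdal.so/gdal.dll to be explicitly linked against a particular Python library.

This is both embedding and extending Python.

The steps are:

  1. Through dlopen() + dlsym() on Unix and EnumProcessModules() + GetProcAddress() on Windows, look for Python symbols. If found, use them. This is for example the case if GDAL is used from a Python module (GDAL Python bindings, rasterio, etc.) or from an application like QGIS that starts a Python interpreter.

  2. Otherwise, look for the PYTHONSO environment variable, which should point to a pythonX.Y[...].so/.dll

  3. Otherwise, look for the python binary in the path and try to identify the corresponding Python .so/.dll

  4. Otherwise, try to load well-known names of the Python .so/.dll with dlopen()/LoadLibrary()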
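
The order of these fallbacks can be summarized with a small Python sketch. It only models the decision order, not the actual symbol lookup done in C++: the function name `locate_python`, its parameters and the library names in `well_known` are hypothetical, and it simplifies step 3 by assuming that finding the binary is enough to identify its library.

```python
import os
import shutil


def locate_python(process_has_python, env=os.environ, which=shutil.which,
                  well_known=("libpython3.so", "python3.dll")):
    """Return (strategy, detail) for the first lookup strategy that applies."""
    if process_has_python:
        # Step 1: Python symbols are already loaded in the process.
        return ("process", None)
    if env.get("PYTHONSO"):
        # Step 2: the user points explicitly at a Python library.
        return ("PYTHONSO", env["PYTHONSO"])
    binary = which("python")
    if binary:
        # Step 3: derive the library from the python binary found in the path.
        return ("binary", binary)
    # Step 4: try commonly used library names (hypothetical list).
    return ("well-known", list(well_known))
```

The environment and the path lookup are passed in as parameters so that the ordering can be checked without depending on the machine running the example.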

Impacts on GDAL core

They are minimal. The GDALAllRegister() method has an added call to GDALDriverManager::AutoLoadPythonDrivers(), which implements the logic described above. The GDALDriver class has been extended to support a new function pointer, IdentifyEx(), which is used by the C++ shim that loads the Python code.

int                 (*pfnIdentifyEx)( GDALDriver*, GDALOpenInfo * );

This extended IdentifyEx() function pointer, which adds the GDALDriver* argument, is used in priority by the GDALIdentify() and GDALOpen() methods. The reason for it is mundane. For normal C++ drivers, there is no need to pass the driver, as there is a one-to-one correspondence between a driver and the function that implements it. But for Python drivers, a single C++ method interfaces with the Python identify() method of several Python drivers, hence the need for a GDALDriver* argument to forward the call to the appropriate driver.
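
The need for the extra driver argument can be illustrated with a Python analogy. This is not the C++ shim itself: `PythonDriverShim`, `identify_ex` and `PrefixDriver` are hypothetical names, and the drivers are reduced to a single-argument identify() for brevity.

```python
class PrefixDriver:
    """Toy plugin that recognizes connection strings by their prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def identify(self, filename):
        return filename.startswith(self.prefix)


class PythonDriverShim:
    """One shared entry point serving several Python drivers."""

    def __init__(self):
        self._plugins = {}  # driver name -> Python driver object

    def register(self, name, plugin):
        self._plugins[name] = plugin

    def identify_ex(self, driver_name, filename):
        # Without the driver argument, the shared function could not tell
        # which plugin's identify() the call is meant for.
        return self._plugins[driver_name].identify(filename)
```

A regular C++ driver would have its own identify function and would not need this lookup; the shared shim is what makes the driver argument necessary.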

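Example of such a driver

Note that prefixing the connection string with the driver name is absolutely not a requirement, but something specific to this particular driver, which is a bit artificial. The CityJSON driver mentioned below does not need it.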

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# This code is in the public domain, so as to serve as a template for
# real-world plugins.
# or, at the choice of the licensee,
# Copyright 2019 Even Rouault
# SPDX-License-Identifier: MIT

# Metadata parsed by GDAL C++ code at driver pre-loading, starting with '# gdal: '
# Required and with that exact syntax since it is parsed by non-Python
# aware code. So just literal values, no expressions, etc.
# gdal: DRIVER_NAME = "DUMMY"
# API version(s) supported. Must include 1 currently
# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
# gdal: DRIVER_DCAP_VECTOR = "YES"
# gdal: DRIVER_DMD_LONGNAME = "my super plugin"

# Optional driver metadata items.
# # gdal: DRIVER_DMD_EXTENSIONS = "ext1 ext2"
# # gdal: DRIVER_DMD_HELPTOPIC = "http://example.com/my_help.html"

# The gdal_python_driver module is defined by the GDAL library at runtime
from gdal_python_driver import BaseDriver, BaseDataset, BaseLayer

class Layer(BaseLayer):
    def __init__(self):

        # Reserved attribute names. Either those or the corresponding method
        # must be defined
        self.name = 'my_layer'  # Required, or name() method

        self.fid_name = 'my_fid'  # Optional

        self.fields = [{'name': 'boolField', 'type': 'Boolean'},
                    {'name': 'int16Field', 'type': 'Integer16'},
                    {'name': 'int32Field', 'type': 'Integer'},
                    {'name': 'int64Field', 'type': 'Integer64'},
                    {'name': 'realField', 'type': 'Real'},
                    {'name': 'floatField', 'type': 'Float'},
                    {'name': 'strField', 'type': 'String'},
                    {'name': 'strNullField', 'type': 'String'},
                    {'name': 'strUnsetField', 'type': 'String'},
                    {'name': 'binaryField', 'type': 'Binary'},
                    {'name': 'timeField', 'type': 'Time'},
                    {'name': 'dateField', 'type': 'Date'},
                    {'name': 'datetimeField', 'type': 'DateTime'}]  # Required, or fields() method

        self.geometry_fields = [{'name': 'geomField',
                                'type': 'Point',  # optional
                                'srs': 'EPSG:4326'  # optional
                                }]  # Required, or geometry_fields() method

        self.metadata = {'foo': 'bar'}  # optional

        # uncomment if __iter__() honours self.attribute_filter
        #self.iterator_honour_attribute_filter = True

        # uncomment if __iter__() honours self.spatial_filter
        #self.iterator_honour_spatial_filter = True

        # uncomment if feature_count() honours self.attribute_filter
        #self.feature_count_honour_attribute_filter = True

        # uncomment if feature_count() honours self.spatial_filter
        #self.feature_count_honour_spatial_filter = True

        # End of reserved attribute names

        self.count = 5

    # Required, unless self.name attribute is defined
    # def name(self):
    #    return 'my_layer'

    # Optional. If not defined, fid name is 'fid'
    # def fid_name(self):
    #    return 'my_fid'

    # Required, unless self.geometry_fields attribute is defined
    # def geometry_fields(self):
    #    return [...]

    # Required, unless self.fields attribute is defined
    # def fields(self):
    #    return [...]

    # Optional. Only to be used if self.metadata field is not defined
    # def metadata(self, domain):
    #    if domain is None:
    #        return {'foo': 'bar'}
    #    return None

    # Optional. Called when self.attribute_filter is changed by GDAL
    # def attribute_filter_changed(self):
    #     # You may change self.iterator_honour_attribute_filter
    #     # or feature_count_honour_attribute_filter
    #     pass

    # Optional. Called when self.spatial_filter is changed by GDAL
    # def spatial_filter_changed(self):
    #     # You may change self.iterator_honour_spatial_filter
    #     # or feature_count_honour_spatial_filter
    #     pass

    # Optional
    def test_capability(self, cap):
        if cap == BaseLayer.FastGetExtent:
            return True
        if cap == BaseLayer.StringsAsUTF8:
            return True
        # if cap == BaseLayer.FastSpatialFilter:
        #    return False
        # if cap == BaseLayer.RandomRead:
        #    return False
        if cap == BaseLayer.FastFeatureCount:
            return self.attribute_filter is None and self.spatial_filter is None
        return False

    # Optional
    def extent(self, force_computation):
        return [2.1, 49, 3, 50]  # minx, miny, maxx, maxy

    # Optional.
    def feature_count(self, force_computation):
        # As we did not declare feature_count_honour_attribute_filter and
        # feature_count_honour_spatial_filter, the below case cannot happen
        # But this is to illustrate that you can callback the default implementation
        # if needed
        # if self.attribute_filter is not None or \
        #   self.spatial_filter is not None:
        #    return super(Layer, self).feature_count(force_computation)

        return self.count

    # Required. You do not need to handle the case of simultaneous iterators on
    # the same Layer object.
    def __iter__(self):
        for i in range(self.count):
            properties = {
                'boolField': True,
                'int16Field': 32767,
                'int32Field': i + 2,
                'int64Field': 1234567890123,
                'realField': 1.23,
                'floatField': 1.2,
                'strField': 'foo',
                'strNullField': None,
                'binaryField': b'\x01\x00\x02',
                'timeField': '12:34:56.789',
                'dateField': '2017-04-26',
                'datetimeField': '2017-04-26T12:34:56.789Z'}

            yield {"type": "OGRFeature",
                "id": i + 1,
                "fields": properties,
                "geometry_fields": {"geomField": "POINT(2 49)"},
                "style": "SYMBOL(a:0)" if i % 2 == 0 else None,
                }

    # Optional
    # def feature_by_id(self, fid):
    #    return {}


class Dataset(BaseDataset):

    # Optional, but implementations will generally need it
    def __init__(self, filename):
        # If the layers member is set, layer_count() and layer() will not be used
        self.layers = [Layer()]
        self.metadata = {'foo': 'bar'}

    # Optional, called on native object destruction
    def __del__(self):
        pass

    # Optional. Only to be used if self.metadata field is not defined
    # def metadata(self, domain):
    #    if domain is None:
    #        return {'foo': 'bar'}
    #    return None

    # Required, unless a layers attribute is set in __init__
    # def layer_count(self):
    #    return len(self.layers)

    # Required, unless a layers attribute is set in __init__
    # def layer(self, idx):
    #    return self.layers[idx]


# Required: class deriving from BaseDriver
class Driver(BaseDriver):

    # Optional. Called the first time the driver is loaded
    def __init__(self):
        pass

    # Required
    def identify(self, filename, first_bytes, open_flags, open_options={}):
        return filename == 'DUMMY:'

    # Required
    def open(self, filename, first_bytes, open_flags, open_options={}):
        if not self.identify(filename, first_bytes, open_flags):
            return None
        return Dataset(filename)

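Other examples:

Limitations and scope

  • Vector and read-only for now. This could of course be extended later.

  • There is no connection between the Python code of the plugin and the OGR Python API that is built on top of SWIG. This does not appear to be doable in a reasonable way. Nothing prevents people from using the GDAL/OGR/OSR Python API, but the objects exchanged between the OGR core and the Python code will not be OGR Python SWIG objects. A typical example is that a plugin will return its CRS as a string (WKT, PROJJSON, or the deprecated PROJ.4 string), not as an osgeo.osr.SpatialReference object. But it is possible to use the osgeo.osr.SpatialReference API to generate this WKT string.

  • This RFC does not try to cover the management of Python dependencies. It is up to users to do the needed "pip install", or to use whatever Python package management solution they prefer.

  • The Python "Global Interpreter Lock" is held in the Python drivers, as required for safe use of Python. Consequently, the scalability of such drivers is limited.

  • Given the above restrictions, this will remain an "experimental" feature, and the GDAL project will not accept such Python drivers for inclusion in the GDAL repository. This is similar to the situation of the QGIS project, which allows Python plugins outside of the main QGIS repository. If a QGIS plugin is to be moved into the main repository, it has to be converted to C++. The rationale is that the correctness of Python code can mostly be checked only at run time, whereas C++ benefits from static analysis (at compile time, and from other checkers). This rationale also applies in the context of GDAL. GDAL drivers are also stress-tested by the OSS-Fuzz infrastructure, which requires them to be written in C++.

  • The interface between the C++ and Python code might break between GDAL feature releases. In that case we will increment the expected API version number to avoid loading incompatible Python drivers. We will likely not make any effort to deal with plugins that use an incompatible (previous) API version.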
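
The API version check described in the last item can be sketched as follows. This is an illustration only: the function name `is_loadable` is hypothetical, and the handling of the list form follows the [1,2] array syntax that this RFC mentions as a possibility for DRIVER_SUPPORTED_API_VERSION.

```python
SUPPORTED_API_VERSION = 1  # the only version this RFC describes as supported


def is_loadable(declared, expected=SUPPORTED_API_VERSION):
    """Accept a single declared version (1) or a list of versions ([1, 2])."""
    versions = declared if isinstance(declared, (list, tuple)) else [declared]
    # The driver is loaded only if it declares the version GDAL expects.
    return expected in versions
```

If the expected version were incremented in a later release, a driver declaring only version 1 would no longer be loaded.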

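SWIG binding changes

None

Security implications

Similar to the existing native code plugin mechanism of GDAL. If the user defines the GDAL_PYTHON_DRIVER_PATH or GDAL_DRIVER_PATH environment variable and .py scripts are placed in the corresponding directories (or in {prefix}/lib/gdalplugins/python as a fallback), those scripts will be executed.

However, opening a .py file with GDALOpen() or a similar mechanism will not lead to its execution, so this is safe for normal GDAL usage.

The GDAL_NO_AUTOLOAD compile-time #define, already used to disable the loading of native plugins, is also honoured to disable the loading of Python plugins.

Performance impact

If no .py script exists in the searched locations, the performance impact on GDALAllRegister() should be within the noise.

Backward compatibility

No backward incompatibility. Only functionality is added.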

Documentation

A tutorial will be added to explain how to write such a Python driver: https://github.com/rouault/gdal/blob/pythondrivers/gdal/doc/source/tutorials/vector_Python_driver.rst

Testing

The gdal autotest suite will be extended with the above test Python driver and a few error cases: https://github.com/rouault/gdal/blob/pythondrivers/autotest/ogr/ogr_pythondrivers.py

Previous discussions

This topic has been discussed in the past in:

Implementation

A candidate implementation is available at https://github.com/rouault/gdal/tree/pythondrivers

https://github.com/OSGeo/gdal/compare/master...rouault:pythondrivers

Voting history

  • +1 from EvenR, JukkaR, MateuzL, DanielM

  • -0 from SeanG

  • +0 from HowardB

Credits

Sponsored by OpenGeoGroep