几何集合与多边形的交集
我正在使用Shapely来处理地理数据的交集,同时用Pyshp将结果写入一个shapefile。我对“交集”是怎么回事还不是很明白。
首先,我创建了一些点,并通过以下方式计算Voronoi多边形:
# (1)
from shapely import LineString, MultiPoint, Point, Polygon, MultiPolygon
from shapely.geometry import shape
from shapely.constructive import voronoi_polygons
from shapely import intersection
import shapefile
output_shapefile = "/tmp/simple_test_01.shp"
points = MultiPoint([Point(1,1), Point(2,2), Point(4,3), Point(2,4), Point(2,5), Point(6,5), Point(5,4), Point(7,1)])
myVoronoi = voronoi_polygons(points)
#>>> myVoronoi
#<GEOMETRYCOLLECTION (POLYGON ((-5 -5, -5 4.625, -4.5 4.5, 0 3, 4 -1, 4 -5, -...>
n = 0
with shapefile.Writer(output_shapefile, shapeType=5) as w:
w.field("ID", "N")
for geom in myVoronoi.geoms:
w.shape(geom)
w.record(n)
n += 1
当我在ArcMap上绘制这个shapefile时,得到的结果是这样的(绿色点是我标记的,不在shapefile中):
接着,我创建了一个多边形,用来和Voronoi多边形进行交集:
# (2)
polygon_to_intersect = Polygon([Point(0,0), Point(2,8), Point(4,3.7), Point(10,4.5), Point(5,-4), Point(0,0)])
output_shapefile = "/tmp/simple_test_02.shp"
with shapefile.Writer(output_shapefile, shapeType=5) as w:
w.field("ID", "N")
w.shape(polygon_to_intersect)
w.record(1)
在ArcMap中显示成这样(用红色轮廓标出):
然后我进行交集操作:
# (3)
intersected = intersection(myVoronoi, polygon_to_intersect)
>>> intersected
<POLYGON ((0.6 2.4, 0.75 3, 1.125 4.5, 2 8, 3.553 4.66, 3.739 4.261, 4 3.7, ...>
output_shapefile = "/tmp/simple_test_03.shp"
n = 0
with shapefile.Writer(output_shapefile) as w:
w.field("ID", "N")
w.shape(intersected)
w.record(n)
simple_test_03.shp是一个单一的多边形。它的顶点可以在这里看到:
我原本期待“交集”能给我Voronoi多边形和polygon_to_intersect的交集。然而,我得到的是polygon_to_intersect,带有新顶点,这些新顶点出现在多边形的外部与Voronoi多边形交叉的地方。
>>> intersected
<POLYGON ((0.6 2.4, 0.75 3, 1.125 4.5, 2 8, 3.553 4.66, 3.739 4.261, 4 3.7, ...>
为什么“交集”的结果是一个Polygon而不是MultiPolygon呢?
不过,如果我这样做:
# (4)
output_shapefile = "/tmp/simple_test_04.shp"
multipol = []
for geom in myVoronoi.geoms:
multipol.append(intersection(geom, polygon_to_intersect))
n = 0
with shapefile.Writer(output_shapefile) as w:
w.field("ID", "N")
for pol in multipol:
w.shape(pol)
w.record(n)
n += 1
我得到了这个结果,这正是我最开始想要的:
每个有颜色的区域都是与“polygon_to_intersect”交集的Voronoi多边形。
为什么我需要做(4)才能得到我想要的结果,而我原本期待(3)能这样工作?是因为myVoronoi是一个GEOMETRYCOLLECTION而不是POLYGON或MULTIPOLYGON吗?
我尝试在交集之前将myVoronoi转换为MultiPolygon:
# (5)
output_shapefile = "/tmp/simple_test_05.shp"
myVoronoi2 = MultiPolygon(myVoronoi)
>>> myVoronoi2
<MULTIPOLYGON (((-5 -5, -5 4.625, -4.5 4.5, 0 3, 4 -1, 4 -5, -5 -5)), ((-5 1...>
intersected2 = intersection(myVoronoi2, polygon_to_intersect)
>>> intersected2
<POLYGON ((0.6 2.4, 0.75 3, 1.125 4.5, 2 8, 3.553 4.66, 3.739 4.261, 4 3.7, ...>
n = 0
with shapefile.Writer(output_shapefile) as w:
w.field("ID", "N")
w.shape(intersected2)
w.record(n)
>>> intersected2 == intersected
True
但结果和(3)是一样的。
这是一个简化的例子。在我的实际情况中,“myVoronoi”有18000多个多边形,而“polygon_to_intersect”要大得多。(3)没有按我想要的方式工作,而(4)虽然有效,但性能问题很严重。
[编辑] 这是我的真实情况。我的Voronoi多边形:

一些放大:

更多放大:

而这是我的“polygon_to_intersect”(橙色轮廓):

再放大一些:

正如@Pieter提到的,交集是按照“并集语义”来工作的,这意味着我的Voronoi多边形首先被合并掉,然后再与我的polygon_to_intersect进行交集。
但这不是我想要的。我想要的是这样的结果(每个Voronoi多边形在结果文件中;其中一些在polygon_to_intersect的边界处交集):

所以我使用了这段代码:
output_shapefile = 'intersected.shp'
n = 0
w = 0
i = 0
skipped = 0
with shapefile.Writer(output_shapefile) as writer:
writer.field("ID", "N")
for poly in myVoronoi:
n += 1
if not (n % 500):
print(f'Processed {n}, within = {w}, intersected = {i}, skipped = {skipped}')
# If the Voronoi polygon is fully inside polygon_to_intersect, don't waste time intersecting
# Just write the polygon into the output file
if within(poly, polygon_to_intersect):
writer.shape(poly)
writer.record(n)
w += 1
else:
# Perform the intersection
intersected = poly.intersection(polygon_to_intersect)
if intersected:
writer.shape(intersected)
writer.record(n + 50000)
i += 1
else:
skipped += 1
这很好用,但完成这个操作几乎花了25分钟。myVoronoi中有18600多个多边形,而polygon_to_intersect有几十万个顶点。
Pieter建议使用Rtree在性能上没有太大帮助。
tree = shapely.STRtree(voronoi_polys)
intersecting_idx = tree.query(polygon_to_intersect, predicate="intersects")
intersections = shapely.intersection(voronoi_polys.take(intersecting_idx), polygon_to_intersect)
因为它执行的交集操作比我上面的代码要多得多。
所以我去创建了一个大大简化的“polygon_to_intersect”:

这个多边形只有76个顶点。
然后我修改了我的代码:
# Read the new simplified shapefile into memory
simplified_file = 'simplified.shp'
with shapefile.Reader(simplified_file, shapeType=5) as w:
for shapeRec in w.shapeRecords():
polygon_to_intersect_simplified = shape(shapeRec.shape)
prepare(polygon_to_intersect_simplified)
output_shapefile = 'intersected.shp'
n = 0
w = 0
i = 0
skipped = 0
with shapefile.Writer(output_shapefile) as writer:
writer.field("ID", "N")
for poly in myVoronoi:
n += 1
if not (n % 500):
print(f'Processed {n}, within = {w}, intersected = {i}, skipped = {skipped}')
# If the Voronoi polygon is fully inside the simplified shape, don't waste time intersecting
# Just write the polygon into the output file
if within(poly, polygon_to_intersect_simplified):
writer.shape(poly)
writer.record(n)
w += 1
else:
# Perform the intersection with the real (not simplified) polygon
intersected = poly.intersection(polygon_to_intersect)
if intersected:
writer.shape(intersected)
writer.record(n + 50000)
i += 1
else:
skipped += 1
“within”被计算了18600多次,针对的是这个简化的多边形,这样速度更快。那些在“within”中的多边形不需要交集,因此节省了时间。
然后,那些不在“within”中的多边形,才与真正的polygon_to_intersect进行交集。
修改后的代码在4分钟内完成。
没有简化多边形的情况下:
- within = 16257
- intersected = 2357
有了简化多边形:
- within = 13669 <--- 针对简化多边形非常快
- intersected = 4945 <--- 交集数量比之前的代码多,但整体执行时间从25分钟降到4分钟。
1 个回答
在使用几何集合进行叠加时,内部的边界会被(故意地)先合并掉,所以这个结果是可以预料的(来源)。
通过使用一个叫做r树索引的东西,可以解决循环中的性能问题,下面的代码示例展示了这一点:
import numpy as np
import shapely
from shapely import MultiPoint, Polygon
points = MultiPoint([(1,1), (2,2), (4,3), (2,4), (2,5), (6,5), (5,4), (7,1)])
myVoronoi = shapely.voronoi_polygons(points)
polygon_to_intersect = Polygon([(0,0), (2,8), (4,3.7), (10,4.5), (5,-4), (0,0)])
voronoi_polys = np.array(myVoronoi.geoms)
tree = shapely.STRtree(voronoi_polys)
intersecting_idx = tree.query(polygon_to_intersect, predicate="intersects")
intersections = shapely.intersection(voronoi_polys.take(intersecting_idx), polygon_to_intersect)
print(intersections)