Python高效创建密度图

2024-04-24 11:09:15 发布

您现在位置:Python中文网/ 问答频道 /正文

我希望得到一些帮助,使我的代码运行得更快。在

基本上我在列表中有一个长点的正方形网格insideoceanlist。然后有一个目录,其中包含lat, long坐标的数据文件,这些坐标表示某一天的雷击。我们的想法是每天,我们想知道在方格网的每个点周围有多少次雷击。目前只有两个for循环,所以对于正方形网格上的每个点,你都要检查那天每一次雷击的距离。如果它在40公里之内,我在那一点上加一个,就可以得到一张密度图。在

起始栅格的整体形状为矩形,由宽度为0.11、长度为0.11的正方形组成。整个矩形大约是50x30。最后,我有一个shapefile,它概述了澳大利亚的“预测区域”,如果网格中的任何一个点在这个区域之外,那么我们会忽略它。所以所有剩下的点(insideoceanlist)都是澳大利亚的。在

方格网上大约有10万个点,即使是一个缓慢的一天,也有大约1000次雷击,因此需要很长时间来处理。有没有一种方法可以更有效地做到这一点?我真的很感激你的建议。在

顺便说一下,我把list2改成了list3,因为我听说在python中迭代列表比数组快。在

for i in range(len(list1)): #list1 is a list of data files containing lat,long coords for lightning strikes for each day
    dict_density = {}
    for k in insideoceanlist: #insideoceanlist is a grid of ~100000 lat,long points
        dict_density[k] = 0
    list2 = np.loadtxt(list1[i],delimiter = ",") #this open one of the files containing lat,long coords and puts it into an array
    list3 = map(list,list2) #converts the array into a list
    # the following part is what I wanted to improve
    for j in insideoceanlist:
        for l in list3:
            if great_circle(l,j).meters < 40000: #great_circle is a function which measures distance between points the two lat,long points
                dict_density[j] += 1
    #
    filename = 'example' +str(i) + '.txt'
        with open(filename, 'w') as f:
            for m in range(len(insideoceanlist)):
                f.write('%s\n' % (dict_density[insideoceanlist[m]])) #writes each point in the same order as the insideoceanlist
    f.close()

Tags: thein网格forisdensitydictlong
3条回答

如果知道在栅格上生成点的公式,则可以通过反转该公式来快速找到距给定点最近的栅格点。在

下面是一个激励人心的例子,这对你的目的来说是不太正确的,因为地球是一个球体,不是平的或圆柱形的。如果无法轻松地反转网格点公式以找到最近的网格点,则可以执行以下操作:

  • 创建第二个网格(我们称之为G2),它是一个简单的公式,如下所示,有足够大的长方体,这样你就可以确信离一个长方体中任何一个点最近的网格点要么在同一个框中,要么在8个相邻框中的一个中。在
  • 创建一个dict,它存储哪些原始网格(G1)点在G2网格的哪个框中
  • 取你想要分类的点p,找到它将进入的G2
  • p与这个G2框中的所有G1点以及该框的所有近邻进行比较
  • 选择最接近pG1

用完美的平面网格来激励示例

如果在平面上有一个完美的正方形网格,它没有旋转,边长d,那么它们的点可以用一个简单的数学公式来定义。它们的纬度值都是

lat0 + d * i

对于某些整数值i,其中lat0是编号最低的纬度,其经度值的形式相同:

^{pr2}$

对于某个整数j。要找到给定(lat, long)对的最近网格点是什么,可以分别找到它的纬度和经度。网格上最近的纬度数是

i = round((lat - lat0) / d)

同样地,j = round((long - long0) / d)表示经度。在

所以你可以继续前进的一个方法是把它插入上面的公式,然后得到

grid_point = (lat0 + d * round((lat - lat0) / d),
              long0 + d * round((long - long0) / d)

只需在网格点增加dict中的计数。这将使您的代码比以前快得多,因为您不必检查数千个网格点的距离,而是通过一些计算直接找到网格点。在

使用ij作为多维数组的索引,而不是在dict中使用grid_point,这样做可能会更快一些。在

为了详细说明@DanGetz的答案,这里有一些代码使用罢工数据作为驱动程序,而不是为每个罢工点迭代整个网格。我假设你是以澳大利亚的中心点为中心,有0.11度的方格网格,,尽管一个度的大小随纬度而变化!

在维基百科上快速查阅维基百科的一些背面计算告诉我,你40公里的距离是从北到南的±4格方格范围,从东到西是±5格方格的范围。(在低纬度地区下降到4平方,但是。。。嗯!)在

这里的诀窍,如前所述,是从打击位置(lat/lon)直接转换成方格。计算出网格的一个角点的位置,从删除线中减去该位置,然后除以网格的大小-0.11度,截断,就得到了行/列索引。现在访问周围的所有方块,直到距离变得太大,最多为1+(2*2*4*5)=81格检查距离。在范围内增加方块。在

结果是,我最多做了81次访问乘以1000次攻击(或者不管你有多少次),而不是访问100000个网格正方形乘以1000次攻击。这是一个显著的性能提升。在

请注意,您没有描述传入的数据格式,所以我只是随机生成数字。你会想把它修好的。;—)

#!python3

"""
Per WikiPedia (https://en.wikipedia.org/wiki/Centre_points_of_Australia)

Median point
============

The median point was calculated as the midpoint between the extremes of
latitude and longitude of the continent.

    24 degrees 15 minutes south latitude, 133 degrees 25 minutes east
    longitude (24°15′S 133°25′E); position on SG53-01 Henbury 1:250 000
    and 5549 James 1:100 000 scale maps.

"""
MEDIAN_LAT = -(24.00 + 15.00/60.00)
MEDIAN_LON = (133 + 25.00/60.00)

"""
From the OP:

The starting grid has the overall shape of a rectangle, made up of
squares with width of 0.11 and length 0.11. The entire rectange is about
50x30. Lastly I have a shapefile which outlines the 'forecast zones' in
Australia, and if any point in the grid is outside this zone then we
omit it. So all the leftover points (insideoceanlist) are the ones in
Australia.
"""

DELTA_LAT = 0.11
DELTA_LON = 0.11

GRID_WIDTH = 50.0 # degrees
GRID_HEIGHT = 30.0 # degrees

GRID_ROWS = int(GRID_HEIGHT / DELTA_LAT) + 1
GRID_COLS = int(GRID_WIDTH / DELTA_LON) + 1

LAT_SIGN = 1.0 if MEDIAN_LAT >= 0 else -1.0
LON_SIGN = 1.0 if MEDIAN_LON >= 0 else -1.0

GRID_LOW_LAT = MEDIAN_LAT - (LAT_SIGN * GRID_HEIGHT / 2.0)
GRID_HIGH_LAT = MEDIAN_LAT + (LAT_SIGN * GRID_HEIGHT / 2.0)
GRID_MIN_LAT = min(GRID_LOW_LAT, GRID_HIGH_LAT)
GRID_MAX_LAT = max(GRID_LOW_LAT, GRID_HIGH_LAT)

GRID_LOW_LON = MEDIAN_LON - (LON_SIGN * GRID_WIDTH / 2.0)
GRID_HIGH_LON = MEDIAN_LON + (LON_SIGN * GRID_WIDTH / 2.0)
GRID_MIN_LON = min(GRID_LOW_LON, GRID_HIGH_LON)
GRID_MAX_LON = max(GRID_LOW_LON, GRID_HIGH_LON)

GRID_PROXIMITY_KM = 40.0

"""https://en.wikipedia.org/wiki/Longitude#Length_of_a_degree_of_longitude"""
_Degree_sizes_km = (
    (0,  110.574, 111.320),
    (15, 110.649, 107.551),
    (30, 110.852, 96.486),
    (45, 111.132, 78.847),
    (60, 111.412, 55.800),
    (75, 111.618, 28.902),
    (90, 111.694, 0.000),
)

# For the Australia situation, +/- 15 degrees means that our worst
# case scenario is about 40 degrees south. At that point, a single
# degree of longitude is smallest, with a size about 80 km. That
# in turn means a 40 km distance window will span half a degree or so.
# Since grid squares a 0.11 degree across, we have to check +/- 5
# cols.

GRID_SEARCH_COLS = 5

# Latitude degrees are nice and constant-like at about 110km. That means
# a .11 degree grid square is 12km or so, making our search range +/- 4
# rows.

GRID_SEARCH_ROWS = 4

def make_grid(rows, cols):
    return [[0 for col in range(cols)] for row in range(rows)]

Grid = make_grid(GRID_ROWS, GRID_COLS)

def _col_to_lon(col):
    return GRID_LOW_LON + (LON_SIGN * DELTA_LON * col)

Col_to_lon = [_col_to_lon(c) for c in range(GRID_COLS)]

def _row_to_lat(row):
    return GRID_LOW_LAT + (LAT_SIGN * DELTA_LAT * row)

Row_to_lat = [_row_to_lat(r) for r in range(GRID_ROWS)]

def pos_to_grid(pos):
    lat, lon = pos

    if lat < GRID_MIN_LAT or lat >= GRID_MAX_LAT:
        print("Lat limits:", GRID_MIN_LAT, GRID_MAX_LAT)
        print("Position {} is outside grid.".format(pos))
        return None

    if lon < GRID_MIN_LON or lon >= GRID_MAX_LON:
        print("Lon limits:", GRID_MIN_LON, GRID_MAX_LON)
        print("Position {} is outside grid.".format(pos))
        return None

    row = int((lat - GRID_LOW_LAT) / DELTA_LAT)
    col = int((lon - GRID_LOW_LON) / DELTA_LON)

    return (row, col)


def visit_nearby_grid_points(pos, dist_km):
    row, col = pos_to_grid(pos)

    # +0, +0 is not symmetric - don't increment twice
    Grid[row][col] += 1

    for dr in range(1, GRID_SEARCH_ROWS):
        for dc in range(1, GRID_SEARCH_COLS):
            misses = 0
            gridpos = Row_to_lat[row+dr], Col_to_lon[col+dc]
            if great_circle(pos, gridpos).meters <= dist_km:
                Grid[row+dr][col+dc] += 1
            else:
                misses += 1
            gridpos = Row_to_lat[row+dr], Col_to_lon[col-dc]
            if great_circle(pos, gridpos).meters <= dist_km:
                Grid[row+dr][col-dc] += 1
            else:
                misses += 1
            gridpos = Row_to_lat[row-dr], Col_to_lon[col+dc]
            if great_circle(pos, gridpos).meters <= dist_km:
                Grid[row-dr][col+dc] += 1
            else:
                misses += 1
            gridpos = Row_to_lat[row-dr], Col_to_lon[col-dc]
            if great_circle(pos, gridpos).meters <= dist_km:
                Grid[row-dr][col-dc] += 1
            else:
                misses += 1
            if misses == 4:
                break

def get_pos_from_line(line):
    """
    FIXME: Don't know the format of your data, just random numbers
    """
    import random
    return (random.uniform(GRID_LOW_LAT, GRID_HIGH_LAT),
            random.uniform(GRID_LOW_LON, GRID_HIGH_LON))

with open("strikes.data", "r") as strikes:
    for line in strikes:
        pos = get_pos_from_line(line)
        visit_nearby_grid_points(pos, GRID_PROXIMITY_KM)

你试过用Numpy做索引吗?您可以使用多维数组,并且索引应该更快,因为Numpy数组本质上是一个围绕C数组的Python包装器。在

如果您需要进一步提高速度,请看一下Cython,一个Python到优化的C转换器。它特别适合于多维索引,并且应该能够将这种代码的速度提高一个数量级。它将向代码中添加一个附加的依赖项,但安装很快,实现起来并不困难。在

Benchmarks),(Tutorial using Numpy with Cython

作为一个快速的旁白,使用

for listI in list1:
    ...
    list2 = np.loadtxt(listI, delimiter=',')
 # or if that doesn't work, at least use xrange() rather than range()

实际上,只有当您明确需要range()函数生成的列表时,才应该使用range()。在您的情况下,它不应该做太多,因为它是最外层的循环。在

相关问题 更多 >