Unable to get a dynamically populated number from a web page after performing certain steps

2 answers

User

Answer 1 · edited 2024-05-12 18:12:01

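There are two ways to get the information you are looking for. One of them is Selenium, which you probably already know.

For the other, open the Network tab and watch the requests the browser sends while you hover over the map, to see whether it calls the server at all. With requests and BS4, your best chance is that the data is already present in the page that was loaded; if it is not, the check below will not work.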

import re
# r is the requests response for the page
print(re.findall(r'628086906', r.text))

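If this prints the number, the data is delivered with the page load, most likely as JSON, and you can either load the JSON or find the value with a regular expression. Otherwise, your only option is Selenium.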
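As a minimal sketch of that check, the helper below tests whether a known number appears anywhere in the raw page source. The name `value_in_page` and the sample HTML are made up for illustration; in practice you would pass `r.text` from requests.

```python
import re


def value_in_page(page_text, value):
    """Return True if the literal value appears in the raw page text."""
    # re.escape keeps the value literal even if it contains regex metacharacters.
    return re.search(re.escape(str(value)), page_text) is not None


# Made-up page source for illustration only.
sample = '<script>var data = {"title": 628086906};</script>'
print(value_in_page(sample, 628086906))
print(value_in_page(sample, 555))
```

If the check fails, the value is probably filled in later by a separate request, which is the case the second answer deals with.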

User

Answer 2 · edited 2024-05-12 18:12:01

The data is fetched from:

POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx

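The content is encoded in a custom format before the OpenLayers library uses it. All of the decoding lives in this JS file. If you beautify it, you can look for WayTo.Wtb.Format.WTB, the OpenLayers.Class that decodes it. The binary is decoded byte by byte, like this: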

switch(elementType){
    case 1:
        var lineColor = new WayTo.Wtb.Element.LineColor();
        byteOffset = lineColor.parse(dataReader, byteOffset);
        outputElement = lineColor;
        break;
    case 2:
        var lineStyle = new WayTo.Wtb.Element.LineStyle();
        byteOffset = lineStyle.parse(dataReader, byteOffset);
        outputElement = lineStyle;
        break;
    case 3:
        var ellipse = new WayTo.Wtb.Element.Ellipse();
        byteOffset = ellipse.parse(dataReader, byteOffset);
        outputElement = ellipse;
        break;
    ........
}

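To get the raw data, we have to reproduce this decoding algorithm. We don't need to decode every object; we only need to keep the offsets right and extract the strings correctly. Here is a Python script for the decoding part, which decodes the data from a file (the output of curl):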

with open("wtb.bin", mode='rb') as file:
    encodedData = file.read()
    offset = 0
    objects = []

    while offset < len(encodedData):

        elementSize = encodedData[offset]
        offset+=1
        elementType = encodedData[offset]
        offset+=1

        if elementType == 0:
            break

        curElemSize = elementSize
        curElemType = elementType

        if elementType== 114:
            largeElementSize = int.from_bytes(encodedData[offset:offset + 4], "big")
            offset+=4
            largeElementType = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            curElemSize = largeElementSize
            curElemType = largeElementType

        print(f"type {curElemType} | size {curElemSize}")
        offsetInit = offset

        if curElemType == 1:
            offset+=4
        elif curElemType == 2:
            offset+=2
        elif curElemType == 3:
            offset+=20
        elif curElemType == 4:
            offset+=28
        elif curElemType == 5:
            offset+=12
        elif curElemType == 6:
            textLength = curElemSize - 3
            objects.append({
                "type": "Text",
                "x_position": int.from_bytes(encodedData[offset:offset+2], "little"),
                "y_position": int.from_bytes(encodedData[offset+2:offset+4], "little"),
                "rotation": int.from_bytes(encodedData[offset+4:offset+6], "little"),
                "text": encodedData[offset+6:offset+6+(textLength*2)].decode("utf-8").replace('\x00','')
            })
            offset+=6+(textLength*2)
        elif curElemType == 7:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 27:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 8:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 28:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 13:
            offset+=4
        elif curElemType == 14:
            offset+=2
        elif curElemType == 15:
            offset+=2
        elif curElemType == 100:
            pass
        elif curElemType == 101:
            offset+=20
        elif curElemType == 102:
            offset+=2
        elif curElemType == 103:
            pass
        elif curElemType == 104:
            highShort = int.from_bytes(encodedData[offset+2:offset+4], "little")
            lowShort = int.from_bytes(encodedData[offset+4:offset+6], "little")
            objects.append({
                "type": "StartNumericCell",
                "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
                "occurrence": (highShort << 16) + lowShort
            })
            offset+=6
        elif curElemType == 105:
            #end cell
            pass
        elif curElemType == 109:
            textLength = curElemSize - 1
            objects.append({
                "type": "StartAlphanumericCell",
                "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
                "occurrence":encodedData[offset+2:offset+2+(textLength*2)].decode("utf-8").replace('\x00','')
            })
            offset+=2+(textLength*2)
        elif curElemType == 111:
            offset+=40
        elif curElemType == 112:
            objects.append({
                "type": "CoordinatePlane",
                "projection_code": encodedData[offset+48:offset+52].decode("utf-8").replace('\x00','')
            })
            offset+=52
        elif curElemType == 113:
            offset+=24
        elif curElemType == 256:
            nameLength = int.from_bytes(encodedData[offset+14:offset+16], "little")
            objects.append({
                "type": "LargePolygon",
                "name": encodedData[offset+16:offset+16+nameLength].decode("utf-8").replace('\x00',''),
                "occurence": int.from_bytes(encodedData[offset+2:offset+6], "little")
            })
            if nameLength > 0:
                offset+= 16 + nameLength
                if encodedData[offset] == 0:
                    offset+=1
            else:
                offset+= 16
            numberOfPoints = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            offset+=numberOfPoints*8
        elif curElemType == 257:
            pass
        else:
            offset+= curElemSize*2
        print(f"offset diff {offset-offsetInit}")
        print("                ")

    print(objects)
    print(len(encodedData))
    print(offset)

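(Side note: the element size is big-endian; in the script above this is the 4-byte large element size. All other values are little-endian.)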
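To make the mixed byte order concrete, here is a small sketch of just the header-reading step, pulled out of the script above into a hypothetical helper named `read_element_header`. The input bytes are synthetic, not real server output.

```python
def read_element_header(data, offset):
    """Read one element header; return (size, type, offset_after_header)."""
    size = data[offset]
    elem_type = data[offset + 1]
    offset += 2
    if elem_type == 114:
        # Extended header: 4-byte big-endian size, then 2-byte little-endian type.
        size = int.from_bytes(data[offset:offset + 4], "big")
        elem_type = int.from_bytes(data[offset + 4:offset + 6], "little")
        offset += 6
    return size, elem_type, offset


# Synthetic headers for illustration only.
short_header = bytes([3, 6])
large_header = bytes([0, 114]) + (300).to_bytes(4, "big") + (256).to_bytes(2, "little")

print(read_element_header(short_header, 0))   # (3, 6, 2)
print(read_element_header(large_header, 0))   # (300, 256, 8)
```

Reading the size with the wrong byte order would give a huge value and throw off every offset that follows, which is why this detail matters.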

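Run this repl.it to see how it decodes the file.

Building on this, here are the steps to get the data. For clarity I will describe all of them (even the ones you already know):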

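Legal notice

The legal notice is not needed to get the map values, but it is needed to get the item information (the last step in this post).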

GET https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx

Scrape the names and values of the input tags, set cmdYES.x and cmdYES.y, then call

POST https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx
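The form-scraping step can be sketched with the standard library alone. This is an illustration, not the answer's own method (the full code in this post uses BeautifulSoup); `InputCollector`, `build_legal_notice_payload` and the sample form are hypothetical, while the click coordinates 82 and 3 are the values used in the full code.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are sent as empty strings.
            self.fields[name] = attributes.get("value") or ""


def build_legal_notice_payload(html, x=82, y=3):
    """Build the POST payload: scraped inputs plus the image-button click position."""
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload["cmdYES.x"] = x
    payload["cmdYES.y"] = y
    return payload


# Made-up form for illustration; the real page has its own hidden fields.
sample_form = (
    '<form><input type="hidden" name="__VIEWSTATE" value="abc">'
    '<input type="image" name="cmdYES"></form>'
)
print(build_legal_notice_payload(sample_form))
```

Skipping inputs that have no name attribute avoids a KeyError on decorative or unnamed controls.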

Map data

Call the map server API:

POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx

with the following data:

{
    "mt":"titleresults",
    "qt":"lincNo",
    "LINCNumber": lincNumber,
    "rights": "B", #not required
    "cx": 1920, #screen definition
    "cy": 1080,
}

cx/cy is the canvas size.

Decode the encoded data using the method above. You will get:

[{'type': 'LargePolygon', 'name': '0010495134 8722524;1;162', 'entity': 23, 'occurence': 628079167, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012170859 8022146;8;99', 'entity': 23, 'occurence': 628048595, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010691822 8722524;1;163', 'entity': 23, 'occurence': 628222354, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012169736 8022146;8;89', 'entity': 23, 'occurence': 628021327, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694454 8722524;1;179', 'entity': 23, 'occurence': 628191678, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694362 8722524;1;178', 'entity': 23, 'occurence': 628307403, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010433381 8722524;1;177', 'entity': 23, 'occurence': 628209696, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012169710 8022146;8;88A', 'entity': 23, 'occurence': 628021328, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694355 8722524;1;176', 
'entity': 23, 'occurence': 628315826, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012170866 8022146;8;100', 'entity': 23, 'occurence': 628163431, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694347 8722524;1;175', 'entity': 23, 'occurence': 628132810, 'line_color_green': 0, 'line_color_red': 129,

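Extract information

If you want to target one specific lincNumber, you need to look at the polygon's style, because for "multiple" values (e.g. those with several items) the response does not mention the lincNumber id, only a link reference. The following gets the selected item: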

selectedZone = [
    t 
    for t in objects 
    if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
][0]
print(selectedZone)
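The list comprehension above raises an IndexError when no polygon matches the style. As a hedged variation, the hypothetical helper `find_selected_zone` below applies the same filter but returns None in that case. The sample objects are made up.

```python
def find_selected_zone(objects):
    """Return the first polygon styled as selected, or None if there is none."""
    return next(
        (
            t
            for t in objects
            if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
        ),
        None,
    )


# Made-up decoded objects for illustration only.
sample_objects = [
    {"type": "LargePolygon", "occurence": 1, "fill_color_green": 255, "line_color_red": 129},
    {"type": "LargePolygon", "occurence": 2, "fill_color_green": 0, "line_color_red": 255},
]
print(find_selected_zone(sample_objects))
print(find_selected_zone([]))
```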

Call the URL you mentioned in your post to get the data and extract the table:

GET https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}
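For illustration, the URL can be assembled from the selected object like this. `title_search_url` is a hypothetical helper; the "occurence" key is spelled the way the decoder in this post emits it.

```python
from urllib.parse import urlencode

BASE_URL = "https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx"


def title_search_url(zone):
    """Build the popup URL from a decoded polygon's occurrence number."""
    return f"{BASE_URL}?{urlencode({'title': zone['occurence']})}"


print(title_search_url({"occurence": 628079167}))
```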

Full code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

lincNumber = "0030278592"
#lincNumber = "0010661156"

s = requests.Session()

# 1) login
r = s.get("https://alta.registries.gov.ab.ca/spinii/logon.aspx")
soup = BeautifulSoup(r.text, "html.parser")

payload = dict([
    (t["name"], t.get("value", ""))
    for t in soup.findAll("input")
])
payload["uctrlLogon:cmdLogonGuest.x"] = 76
payload["uctrlLogon:cmdLogonGuest.y"] = 25
s.post("https://alta.registries.gov.ab.ca/spinii/logon.aspx",data=payload)

# 2) legal notice
r = s.get("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx")
soup = BeautifulSoup(r.text, "html.parser")
payload = dict([
    (t["name"], t.get("value", ""))
    for t in soup.findAll("input")
])
payload["cmdYES.x"] = 82
payload["cmdYES.y"] = 3
s.post("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx", data = payload)

# 3) map data
r = s.post("http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx",
    data= {
        "mt":"titleresults",
        "qt":"lincNo",
        "LINCNumber": lincNumber,
        "rights": "B", #not required
        "cx": 1920, #screen definition
        "cy": 1080,
    })

def decodeWtb(encodedData):
    offset = 0

    objects = []
    iteration = 0

    while offset < len(encodedData):

        elementSize = encodedData[offset]
        offset+=1
        elementType = encodedData[offset]
        offset+=1

        if elementType == 0:
            break

        curElemSize = elementSize
        curElemType = elementType

        if elementType== 114:
            largeElementSize = int.from_bytes(encodedData[offset:offset + 4], "big")
            offset+=4
            largeElementType = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            curElemSize = largeElementSize
            curElemType = largeElementType

        offsetInit = offset

        if curElemType == 1:
            offset+=4
        elif curElemType == 2:
            offset+=2
        elif curElemType == 3:
            offset+=20
        elif curElemType == 4:
            offset+=28
        elif curElemType == 5:
            offset+=12
        elif curElemType == 6:
            textLength = curElemSize - 3
            offset+=6+(textLength*2)
        elif curElemType == 7:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 27:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 8:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 28:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 13:
            offset+=4
        elif curElemType == 14:
            offset+=2
        elif curElemType == 15:
            offset+=2
        elif curElemType == 100:
            pass
        elif curElemType == 101:
            offset+=20
        elif curElemType == 102:
            offset+=2
        elif curElemType == 103:
            pass
        elif curElemType == 104:
            offset+=6
        elif curElemType == 105:
            pass
        elif curElemType == 109:
            textLength = curElemSize - 1
            offset+=2+(textLength*2)
        elif curElemType == 111:
            offset+=40
        elif curElemType == 112:
            offset+=52
        elif curElemType == 113:
            offset+=24
        elif curElemType == 256:
            nameLength = int.from_bytes(encodedData[offset+14:offset+16], "little")
            objects.append({
                "type": "LargePolygon",
                "name": encodedData[offset+16:offset+16+nameLength].decode("utf-8").replace('\x00',''),
                "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
                "occurence": int.from_bytes(encodedData[offset+2:offset+6], "little"),
                "line_color_green": encodedData[offset + 8],
                "line_color_red": encodedData[offset + 7],
                "line_color_blue": encodedData[offset + 9],
                "fill_color_green": encodedData[offset + 10],
                "fill_color_red": encodedData[offset + 11],
                "fill_color_blue": encodedData[offset + 13]
            })
            if nameLength > 0:
                offset+= 16 + nameLength
                if encodedData[offset] == 0:
                    offset+=1
            else:
                offset+= 16
            numberOfPoints = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            offset+=numberOfPoints*8
        elif curElemType == 257:
            pass
        else:
            offset+= curElemSize*2

    return objects

# 4) decode custom format
objects = decodeWtb(r.content)

# 5) get the selected area
selectedZone = [
    t 
    for t in objects 
    if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
][0]
print(selectedZone)

# 6) get the info about item
r = s.get(f'https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}')
df = pd.read_html(r.content, attrs = {'class': 'bodyText'}, header =0)[0]
del df['Add to Cart']
del df['View']
print(df[:-1])

Run this on repl.it

Output

  Title Number           Type LINC Number Short Legal   Rights Registration Date Change/Cancel Date
0    052400228  Current Title  0030278592  0420091;16  Surface        19/09/2005         13/11/2019
1    072294084  Current Title  0030278551  0420091;12  Surface        22/05/2007         21/08/2007
2    072400529  Current Title  0030278469   0420091;3  Surface        05/07/2007         28/08/2007
3    072498228  Current Title  0030278501   0420091;7  Surface        18/08/2007         08/02/2008
4    072508699  Current Title  0030278535  0420091;10  Surface        23/08/2007         13/12/2007
5    072559500  Current Title  0030278477   0420091;4  Surface        17/09/2007         19/11/2007
6    072559508  Current Title  0030278576  0420091;14  Surface        17/09/2007         09/01/2009
7    072559521  Current Title  0030278519   0420091;8  Surface        17/09/2007         07/11/2007
8    072559530  Current Title  0030278493   0420091;6  Surface        17/09/2007         25/08/2008
9    072559605  Current Title  0030278485   0420091;5  Surface        17/09/2007         23/12/2008

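If you want more entries, look at the objects list. If you want more information about the items, such as coordinates, you can extend the decoder.

You can also find the other lincNumbers around the target by looking at the name field, which contains the lincNumber unless it holds a "multiple" name.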
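A rough sketch of that matching follows. It assumes, based only on the sample output in this post, that a regular name is a 10-digit LINC number, a space, and then the short legal description; anything that does not fit is treated as a "multiple" name and skipped. `split_name` and `nearby_lincs` are hypothetical helpers.

```python
import re

# Assumed layout: 10-digit LINC number, one space, short legal description.
NAME_PATTERN = re.compile(r"^(\d{10}) (\S+)$")


def split_name(name):
    """Return (linc_number, short_legal), or None if the name does not fit the layout."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group(1), match.group(2)


def nearby_lincs(objects):
    """Collect the LINC numbers of all decoded objects with a parsable name."""
    lincs = []
    for obj in objects:
        parsed = split_name(obj.get("name", ""))
        if parsed is not None:
            lincs.append(parsed[0])
    return lincs


print(split_name("0010495134 8722524;1;162"))
print(nearby_lincs([{"name": "0012169710 8022146;8;88A"}, {"name": "something else"}]))
```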

Fun fact:

No HTTP headers need to be set in this flow.
