How to find JSON objects without actually parsing the JSON (regex)

2 answers

User

Answer 1 · edited 2024-04-19 09:05:07

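The code below uses str.find() to step through the string. Whenever it finds both the start and the end of a target substring, it extracts that substring as a dictionary. If it finds a start but no end, it assumes the string has been cut off or interrupted and breaks out of the loop, since there is nothing more to do.

I use the ast module to convert each substring into a dictionary. That is not strictly needed to answer the question, but I think it makes the end result more useful.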

import ast

testdata = '{"routeProps":{"b723":{"navDataResource":[{"catalogId":48,"parentCatalogId":null,"icon":"https://bin.bnbstatic.com/image/20200609/bbjy2x.png","catalogName":"New Crypto Listings","total":762,"articles":[{"id":54572,"code":"0ef69e1d334c4d8c9ffbd088843bf2dd","title":"Binance Will List GYEN"},{"id":54548,"code":"e5607624f4614c3f9fd2562c8beb8660","title":"BTG, DEXE \u0026 SHIB Enabled on Binance Isolated Margin"},{"id":54394,"code":"a176d4cfd4c74a7fb8238e63d71c062a","title":"Binance Futures Will Launch USDT-Margined ICP Perpetual Contracts with Up to 25X Leverage"},{"id":54392,"code":"4fa91d953fd0484ab9a48cca0a41c192","title":"Binance Will Open Trading for Internet Computer (ICP)"},{"id":54382,"code":"33b6e8116ce54705ac89e898d1a05510","title":"Binance Will List Internet Computer (ICP)"}],"catalogs":[]},{"catalogId":49,"parentCatalogId":null,"icon":"https://bin.bnbstatic.com/image/20200609/zxgg2x.png","catalogName":"Latest News","total":1164,"articles":[{"id":54649,"code":"2291f02b964f45b195fd6d4685db80bb","title":"Update on Trading Suspension for GYEN"},{"id":54646,"code":"724346d139b041198a441dc149133c7d","title":"Binance Liquid Swap Adds RAMP/BUSD Liquidity Pool"},{"id":54643,"code":"bc9f313c04cc40d2b7e598c831fd721f","title":"Notice on Trading Suspension for GYEN"},{"id":54591,"code":"b3c6998066af43078c63a5498bfd80b1","title":"Binance P2P Supports New Payment Methods for Mongolia"},{"id":54586,"code":"d4418be0b9ea4d1b8e92cbbfe8468a17","title":"Dual Investment (42nd Phase) - Earn Up to 56% APY"}]'

# Create a list to hold the dictionary objects
itemlist = []

# Create variable to keep track of our position in the string
strMarker = 0

# Loop until no further complete target string can be found
while True:

    # Find occurrence of the beginning of a target string
    strStart = testdata.find('{"id":',strMarker)
    if strStart != -1:
        
        # If we've found the start, now look for the end marker of the string,
        # starting from the location we identified as the beginning of that string
        strEnd = testdata.find('}', strStart)
        
        # If it does not exist, this suggests it might be an interrupted string
        # so we don't do anything further with it, just allow the loop to break
        if strEnd != -1:

            # Save this marker as it will be used as the starting point
            # for the next search cycle.
            strMarker = strEnd

            # Extract the substring based on the start and end positions, +1 to capture
            # the final '}'; as this string is nicely formatted as a dictionary object
            # already, we are using ast.literal_eval() to turn it into an actual usable
            # dictionary object
            itemlist.append(ast.literal_eval(testdata[strStart:strEnd+1]))

            # We're happy to keep searching so jump to the next loop
            continue

    # If nothing happened to trigger a jump to the next loop, break out of the
    # while loop
    break

# Print out the first entry in the list as a demo
print(itemlist[0])
print(itemlist[0]["title"])

The output of this code should be a well-formed dict, followed by its title:

{'id': 54572, 'code': '0ef69e1d334c4d8c9ffbd088843bf2dd', 'title': 'Binance Will List GYEN'}
Binance Will List GYEN
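
Two caveats with this approach: ast.literal_eval() reads Python literals, so a fragment containing JSON's null, true, or false would raise an error, and a '}' inside a title would cut the fragment short. Below is a minimal sketch of one possible variant, not the answer author's method. It assumes the same '{"id":' start marker and hands each fragment to json.JSONDecoder.raw_decode() from the standard library. The function name find_objects and the sample string are made up for illustration.

```python
import json

def find_objects(text, start_marker='{"id":'):
    """Collect every complete JSON object that begins with start_marker."""
    decoder = json.JSONDecoder()
    found = []
    pos = text.find(start_marker)
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # returns it together with the index just past its end
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Fragment is cut off or malformed: skip past this marker
            end = pos + len(start_marker)
        else:
            found.append(obj)
        pos = text.find(start_marker, end)
    return found

# Hypothetical truncated input: the outer object never closes
sample = '{"articles":[{"id":1,"title":"a } b","pinned":null},{"id":2,"title":"c"},{"id":3,"ti'
items = find_objects(sample)
print(items)
```

Only the fragment that starts at each marker is decoded, so the broken outer document is never parsed as a whole, and the truncated last object is skipped.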

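User

Answer 2 · edited 2024-04-19 09:05:07

A regular expression should work here. Try matching with the regex below. When I tried it at https://regexr.com/ it matched the required sections. regexr can also help you understand the expression if you are not familiar with regex.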

(\{"id":\d{5},"code":".{32}","title":"[^"]*"\})

Here is a small sample Python script that finds all the sections:

import re

# Raw string, so that \{ and \d reach the regex engine unchanged
pattern = r'(\{"id":\d{5},"code":".{32}","title":"[^"]*"\})'
string_to_parse='...'
sections = re.findall(pattern, string_to_parse, re.DOTALL)
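
To make the script above concrete, here is a hedged sketch that runs the same pattern against a shortened excerpt of the data from the first answer, cut off mid-object. Converting each match with json.loads() is an addition for illustration, not part of the original answer. Note that the pattern assumes five-digit ids, 32-character codes, and titles that contain no double quote.

```python
import json
import re

# Same expression as above, without the outer capture group
pattern = r'\{"id":\d{5},"code":".{32}","title":"[^"]*"\}'

# Shortened excerpt of the first answer's data; the second object is truncated
sample = ('"articles":[{"id":54572,"code":"0ef69e1d334c4d8c9ffbd088843bf2dd",'
          '"title":"Binance Will List GYEN"},{"id":54548,"code":"e56076')

# Each match is a complete JSON object, so it can be decoded on its own
sections = [json.loads(m.group(0)) for m in re.finditer(pattern, sample)]
print(sections[0]["title"])
```

The truncated second object does not match the pattern, so only the complete one is returned.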
