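I'm scraping data from a website and searching it for keywords. The site serves one long JSON body, and I only need to parse everything that comes before the base64-encoded images. I can't parse the JSON object in the usual way, because the structure changes often and the body is sometimes cut off.
Here is a snippet of what I'm parsing: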
<script id="__APP_DATA" type="application/json">{"routeProps":{"b723":{"navDataResource":[{"catalogId":48,"parentCatalogId":null,"icon":"https://bin.bnbstatic.com/image/20200609/bbjy2x.png","catalogName":"New Crypto Listings","total":762,"articles":[{"id":54572,"code":"0ef69e1d334c4d8c9ffbd088843bf2dd","title":"Binance Will List GYEN"},{"id":54548,"code":"e5607624f4614c3f9fd2562c8beb8660","title":"BTG, DEXE \u0026 SHIB Enabled on Binance Isolated Margin"},{"id":54394,"code":"a176d4cfd4c74a7fb8238e63d71c062a","title":"Binance Futures Will Launch USDT-Margined ICP Perpetual Contracts with Up to 25X Leverage"},{"id":54392,"code":"4fa91d953fd0484ab9a48cca0a41c192","title":"Binance Will Open Trading for Internet Computer (ICP)"},{"id":54382,"code":"33b6e8116ce54705ac89e898d1a05510","title":"Binance Will List Internet Computer (ICP)"}],"catalogs":[]},{"catalogId":49,"parentCatalogId":null,"icon":"https://bin.bnbstatic.com/image/20200609/zxgg2x.png","catalogName":"Latest News","total":1164,"articles":[{"id":54649,"code":"2291f02b964f45b195fd6d4685db80bb","title":"Update on Trading Suspension for GYEN"},{"id":54646,"code":"724346d139b041198a441dc149133c7d","title":"Binance Liquid Swap Adds RAMP/BUSD Liquidity Pool"},{"id":54643,"code":"bc9f313c04cc40d2b7e598c831fd721f","title":"Notice on Trading Suspension for GYEN"},{"id":54591,"code":"b3c6998066af43078c63a5498bfd80b1","title":"Binance P2P Supports New Payment Methods for Mongolia"},{"id":54586,"code":"d4418be0b9ea4d1b8e92cbbfe8468a17","title":"Dual Investment (42nd Phase) - Earn Up to 56% APY"}]
As you can see, I'm trying to strip out everything except objects like this one:
{"id":54382,"code":"33b6e8116ce54705ac89e898d1a05510","title":"Binance Will List Internet Computer (ICP)"}
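Since the JSON is very long, parsing the whole thing isn't sensible. Is there a way to find strings like this without parsing the JSON object? Ideally, I'd like everything in an array. Would a regular expression work?
The ID is 5 digits long, the code is 32 characters long, and there is also a title.
Thanks in advance.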
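The following uses string.find() to step through the string. Whenever it finds both the start and the end of a target string, it extracts that string as a dictionary. If it finds only a start and no end, it assumes the string was broken or cut off and breaks out of the loop, since there is nothing more to do.
I'm using the ast module to convert the strings to dictionaries. That isn't strictly needed to answer the question, but I think it makes the final result more useful.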
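A minimal sketch of that approach, assuming every target object starts with `{"id":` and ends with `"}`, as in the snippet from the question. The function name `extract_articles` and the `sample` string are hypothetical, and `ast.literal_eval` only works here because the objects contain nothing but integers and strings (JSON `null`, `true` or `false` would not evaluate as Python literals).

```python
import ast

START_MARKER = '{"id":'   # assumed start of every article object
END_MARKER = '"}'         # assumed end: closing quote of the title plus brace


def extract_articles(text):
    """Return a list of dicts, one per complete article object in text."""
    articles = []
    pos = 0
    while True:
        start = text.find(START_MARKER, pos)
        if start == -1:
            break  # no further article objects
        end = text.find(END_MARKER, start)
        if end == -1:
            break  # a start without an end: the input was cut off
        end += len(END_MARKER)
        try:
            # The fragment is also a valid Python dict literal, so
            # literal_eval can build the dict without running any code.
            articles.append(ast.literal_eval(text[start:end]))
        except (ValueError, SyntaxError):
            pass  # skip a fragment that is not a clean dict literal
        pos = end
    return articles


# Hypothetical shortened input; the last object is cut off on purpose.
sample = (
    r'{"articles":[{"id":54572,"code":"0ef69e1d334c4d8c9ffbd088843bf2dd",'
    r'"title":"Binance Will List GYEN"},'
    r'{"id":54548,"code":"e5607624f4614c3f9fd2562c8beb8660",'
    r'"title":"BTG, DEXE \u0026 SHIB Enabled on Binance Isolated Margin"},'
    r'{"id":54394,"code":"a176d4cf'
)

for article in extract_articles(sample):
    print(article)
```

Because the loop only ever looks at one small fragment at a time, a truncated tail costs nothing more than the final `find()` call.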
The output of this code should be well-formed dicts.
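A regular expression should work here. Try matching against the following regular expression. When I tried it at https://regexr.com/, it matched the required parts. regexr can also help you understand the regular expression, in case you're not familiar with them.
Here is a small sample Python script that finds all the parts: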
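One possible pattern and script, as a sketch rather than a definitive answer. It assumes the shape described in the question: a 5-digit `id`, a 32-character `code` (taken to be lowercase hex, as in the snippet), and a `title` that may contain backslash escapes. The names `ARTICLE_RE`, `find_articles` and `sample` are hypothetical. Each small match is handed to `json.loads`, which decodes escapes such as `\u0026` without ever parsing the full body.

```python
import json
import re

# One article object: 5-digit id, 32 hex chars of code, then a title that
# may contain escaped characters (for example \" or \u0026).
ARTICLE_RE = re.compile(
    r'\{"id":\d{5},"code":"[0-9a-f]{32}","title":"(?:[^"\\]|\\.)*"\}'
)


def find_articles(text):
    """Return a list of dicts, one per article object matched in text."""
    return [json.loads(match.group(0)) for match in ARTICLE_RE.finditer(text)]


# Hypothetical shortened input; the last object is cut off on purpose.
sample = (
    r'{"articles":[{"id":54572,"code":"0ef69e1d334c4d8c9ffbd088843bf2dd",'
    r'"title":"Binance Will List GYEN"},'
    r'{"id":54548,"code":"e5607624f4614c3f9fd2562c8beb8660",'
    r'"title":"BTG, DEXE \u0026 SHIB Enabled on Binance Isolated Margin"},'
    r'{"id":54394,"code":"a176d4cf'
)

for article in find_articles(sample):
    print(article["id"], article["title"])
```

If the id can be shorter or longer than 5 digits, `\d{5}` can be relaxed to `\d+`; the truncated object at the end simply fails to match and is ignored.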