Finding the parent tag for a given text position in an HTML string

1 answer

User

#1 · Posted 2024-04-24 08:32:04

Just off the top of my head, since you already have the offsets (I think you may need to adjust them, because I had to use (28, 48)):

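Create a substring from the offsets
Use split() to split the whole HTML string, with the offset substring as the delimiter
Take the first piece that split() returns and split it again on >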

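The second-to-last item in the resulting list is the parent tag (when the delimiter sits at the very end of the string being split, the last item in the list is an empty string):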

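 html_string = '<html><body><span id="1234">The Dormouse\'s story</span></body></html>'
 # The text whose parent tag we want, selected by its known offsets
 offset_string = html_string[28:48]
 # Everything before the first occurrence of that text
 tags_together = html_string.split(offset_string)[0]
 # Splitting on '>' leaves a trailing empty string, so the tag is second to last
 list_of_tags = tags_together.split('>')
 parent_tag = list_of_tags[-2]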

Note that the result will be missing its closing ">", because split() consumes the delimiter. If you need it, add it back:

parent_tag = parent_tag + ">"
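One thing to watch with the steps above: split() cuts at the first place the text appears, which is not necessarily the place your offsets point to. A minimal sketch of a variant that slices by the start offset instead is below. The function name `nearest_tag_before` and the sample string are made up for illustration, and like the original approach it only finds the nearest preceding tag, which is not always the true parent.

```python
def nearest_tag_before(html_string, start):
    """Return the last complete tag that ends before index `start`, or None.

    Hypothetical helper: it looks at raw characters only and does not
    understand nesting, comments, or '>' inside attribute values.
    """
    prefix = html_string[:start]          # everything before the text
    close_idx = prefix.rfind('>')         # end of the nearest preceding tag
    if close_idx == -1:
        return None
    open_idx = prefix.rfind('<', 0, close_idx)
    if open_idx == -1:
        return None
    return prefix[open_idx:close_idx + 1]


# The word "story" appears twice; the offsets (28, 33) point at the second one.
html_string = '<p>story</p><span id="1234">story</span>'
print(nearest_tag_before(html_string, 28))
```

With this sample string, splitting on the text itself would stop at the first "story" and hand back the `<p` tag, while slicing at offset 28 reaches the `<span id="1234">` tag.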

Also, the reason I put html_string in single quotes is that it already contains double quotes.

This is ugly and a bit brute-force, but it should get the job done. I am sure there exists a Python library out there that can do this kind of task for you. You just need to look hard enough!
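As one possible candidate, Python's standard library ships `html.parser`, which can track which tags are open while it walks the string. The following is only a rough sketch under several assumptions: `ParentFinder` and `VOID_TAGS` are names invented here, the void-tag set is deliberately incomplete, and character references are left unconverted so that text lengths match the raw string.

```python
from html.parser import HTMLParser

# Small, incomplete set of tags that never get a closing tag (assumption).
VOID_TAGS = {'br', 'hr', 'img', 'input', 'link', 'meta'}


class ParentFinder(HTMLParser):
    """Record the start tag that encloses the character at index `target`."""

    def __init__(self, html_string, target):
        super().__init__(convert_charrefs=False)
        self.target = target
        self.stack = []      # (tag name, raw start-tag text) of open tags
        self.parent = None
        # Index where each line starts, to turn getpos() into an absolute index
        self.line_starts = [0]
        for i, ch in enumerate(html_string):
            if ch == '\n':
                self.line_starts.append(i + 1)
        self.feed(html_string)
        self.close()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.get_starttag_text()))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        line, col = self.getpos()                 # position where this text starts
        start = self.line_starts[line - 1] + col
        if (self.parent is None and self.stack
                and start <= self.target < start + len(data)):
            self.parent = self.stack[-1][1]


html_string = '<html><body><span id="1234">The Dormouse\'s story</span></body></html>'
print(ParentFinder(html_string, 28).parent)
```

Unlike the split() approach, this keeps a stack of open tags, so text that follows a closed sibling element is still attributed to the element that actually encloses it.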

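I'd suggest opening a Python shell and printing each variable right after you create it, so you can see what split() is doing. The Python docs for str.split() are worth a look too.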
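For instance, here is roughly what that inspection looks like for the string of tags that comes before the text in this example (the variable name `pieces` is just for illustration):

```python
tags_together = '<html><body><span id="1234">'

# The string ends with the delimiter, so split() adds an empty string at the end
pieces = tags_together.split('>')
print(pieces)      # ['<html', '<body', '<span id="1234"', '']

print(pieces[-2])  # <span id="1234"
```

The trailing empty string is why the tag sits at the second-to-last position rather than the last one.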

Now that I think about it, a regex combined with the known offsets could also get you the tag.
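A minimal sketch of that regex idea, assuming the start offset is known and the tags contain no stray `<` or `>` inside attribute values. The helper name `last_tag_before` is hypothetical, and the pattern is a simplification rather than a real HTML tokenizer:

```python
import re

# Matches anything that looks like a tag: '<', some non-bracket characters, '>'
TAG_PATTERN = re.compile(r'<[^<>]+>')


def last_tag_before(html_string, start):
    """Return the last tag-like match in html_string[:start], or None."""
    tags = TAG_PATTERN.findall(html_string[:start])
    return tags[-1] if tags else None


html_string = '<html><body><span id="1234">The Dormouse\'s story</span></body></html>'
print(last_tag_before(html_string, 28))  # <span id="1234">
```

Because the match includes the closing bracket, there is no need to add ">" back afterwards.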
