Python: rearranging and removing characters in a title taken from an HTML page

Posted 2024-05-15 14:35:06


I'm running Python 2.7.11 on Windows 10 with beautifulsoup4 and lxml.

import urllib2
import re
from bs4 import BeautifulSoup

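# Fetch the episode page and parse it with the lxml parser
soup = BeautifulSoup(urllib2.urlopen("http://www.daisuki.net/us/en/anime/watch.GUNDAMUNICORNRE0096.13142.html"), "lxml")
# Text of the page's <title> element
Name = soup.title.string

# Remove every '#' from the title and print the result
print(Name.replace('#', ""))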

Output:

01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI

Desired output:

MOBILE SUIT GUNDAM UNICORN RE:0096 - 01 DEPARTURE 0096

How can I remove the trailing " - DAISUKI" and rearrange the string?


2 answers

A hacky solution:

Name = "01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI"
# Split on '-', keep the first two pieces, swap them, rejoin, and trim spaces
print("- ".join(reversed(Name.split('-')[:2])).strip())
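Splitting on a bare '-' works for this title but is fragile: a hyphen inside the episode or series name shifts the pieces. The minimal Python 3 sketch below compares that approach with splitting only on the " - " separator the page uses between parts. The function names and the second, hyphenated title are made up for illustration.

```python
def swap_hacky(title):
    # Split on every hyphen, keep the first two pieces, swap them.
    return "- ".join(reversed(title.split('-')[:2])).strip()

def swap_on_separator(title):
    # Split only on the " - " separator, take parts 1 and 0 in that order.
    parts = title.split(" - ")
    return " - ".join(parts[1::-1])

title = "01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI"
print(swap_hacky(title))         # MOBILE SUIT GUNDAM UNICORN RE:0096 - 01 DEPARTURE 0096
print(swap_on_separator(title))  # MOBILE SUIT GUNDAM UNICORN RE:0096 - 01 DEPARTURE 0096

# Hypothetical title with a hyphen inside the episode name
hyphenated = "02 X-RAY - SOME SHOW - DAISUKI"
print(swap_hacky(hyphenated))         # RAY - 02 X  (wrong)
print(swap_on_separator(hyphenated))  # SOME SHOW - 02 X-RAY
```

Note also that `print(...).strip()` only works in Python 2, where `print` is a statement; calling `.strip()` inside the parentheses works in both versions.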

Split the title on " - " and rearrange the parts:

>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> 
>>> soup = BeautifulSoup(urllib2.urlopen("http://www.daisuki.net/us/en/anime/watch.GUNDAMUNICORNRE0096.13142.html"), "lxml")
>>> Name = soup.title.string
>>> 
>>> " - ".join(Name.replace('#', "").split(" - ")[1::-1])
u'MOBILE SUIT GUNDAM UNICORN RE:0096 - 01 DEPARTURE 0096'
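The `[1::-1]` slice keeps only the first two parts, so a series name that itself contains " - " would be truncated. A minimal Python 3 sketch of a variant, assuming the title always ends with the site suffix " - DAISUKI" and starts with the episode: remove that suffix only if it is present, then split once at the first separator with `str.partition`. The function name `rearrange_title` and its `site_suffix` parameter are hypothetical.

```python
def rearrange_title(title, site_suffix=" - DAISUKI"):
    # Remove the '#' characters, as the question's code does.
    title = title.replace("#", "")
    # Drop the site name only when it actually ends the title.
    if title.endswith(site_suffix):
        title = title[: -len(site_suffix)]
    # Split once, at the first " - ": episode | series (series may contain " - ").
    episode, sep, series = title.partition(" - ")
    if not sep:
        # No separator found: nothing to rearrange.
        return title
    return series + " - " + episode

print(rearrange_title("01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI"))
# MOBILE SUIT GUNDAM UNICORN RE:0096 - 01 DEPARTURE 0096
```

In Python 2 the same function works on the `unicode` string returned by `soup.title.string`.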
