I want to remove values that are not URLs from a CSV file. Our df['url'] column contains values such as 'https://stackoverflow.com/questions/ask', 'https://www.linkedin.com/feed/' and '345', and I want to remove '345'.
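import re
import pandas as pd

def Find_url(string):
    # Return every http(s) URL found in the string (empty list if none).
    # '#' is added to the character class so URL fragments such as '#History' are kept.
    return re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+#]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', str(string))

if __name__ == "__main__":
    df = pd.read_csv('url_file.csv')  # read_csv already returns a DataFrame
    # Keep the first URL found in each cell; cells with no URL become None.
    # Assigning the whole column avoids the chained df.loc[i]['url'] = ... write, which may not update df.
    df['url'] = df['url'].apply(lambda s: next(iter(Find_url(s)), None))
    # Drop rows such as '345' that contained no URL.
    df = df.dropna(subset=['url'])
    df.to_csv('clean_url.csv')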
Sample input:
'https://www.zaubacorp.com/company/HINDUSTAN-CABLES-LTD/L31300WB1952GOI020560'
'http://www.indianrailways.gov.in/railwayboard/view_section.jsp?lang=0&id=0
1
304
365'
'https://en.wikipedia.org/wiki/Railway_Board'
'https://en.wikipedia.org/wiki/Railway_Board#History'
I want output like the following sample output:
'https://www.zaubacorp.com/company/HINDUSTAN-CABLES-LTD/L31300WB1952GOI020560'
'http://www.indianrailways.gov.in/railwayboard/view_section.jsp?lang=0&id=0'
'https://en.wikipedia.org/wiki/Railway_Board'
'https://en.wikipedia.org/wiki/Railway_Board#History'
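You can use a URL parser from the standard library to try to parse each string as a URL and check that the result has the necessary attributes.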
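A minimal sketch of that idea, assuming the standard-library parser in question is `urllib.parse.urlparse` and that "necessary attributes" means an http or https scheme plus a non-empty host. The helper name `is_url` and the sample list `values` are made up for illustration.

```python
from urllib.parse import urlparse

def is_url(value):
    """Return True if value parses as an http(s) URL with a host (assumed criteria)."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlparse(value.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs, e.g. a bad IPv6 host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)

# Hypothetical sample values modelled on the question's data.
values = [
    "https://en.wikipedia.org/wiki/Railway_Board",
    "https://en.wikipedia.org/wiki/Railway_Board#History",
    "304",
    "365",
]

# Keep only the entries that look like URLs.
clean = [v for v in values if is_url(v)]
print(clean)
```

With pandas, the same predicate could be applied as a row filter, for example `df = df[df['url'].apply(is_url)]`, which drops rows like '345' instead of rewriting each cell in a loop.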