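I'm trying to get this code to create a .csv file whose name I declare through the second argument when I run the script with cmd+r. So far, I've managed to get the script to accept the first argument when I type: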
winkey+r -> pyscript https://stackoverflow.com
But when I type:
winkey+r -> pyscript https://stackoverflow.com ideal_filename.csv
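nothing happens. The output still ends up on my clipboard, but no new .csv file is created. If I hard-code the filename in the script it works fine, but that makes the script less useful.
I really don't know what to do here. I only started learning recently and had some help getting this far.
The full code is below: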
import bs4 as bs
import urllib.request
import requests
from requests_html import HTMLSession
import pyperclip
import sys
import pandas as pd

# First argument: the URL to crawl; second argument: the output .csv filename
url = sys.argv[1]
docTitle = sys.argv[2]

try:
    session = HTMLSession()
    response = session.get(url)
except requests.exceptions.RequestException as error:
    print(error)


def crawl(url, docTitle=docTitle):
    source = urllib.request.urlopen(url).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    csv_from_soup(soup, output_filename=docTitle)


def csv_from_soup(soup, output_filename, print_to_console=True):
    title = soup.find('title')
    desc = soup.findAll(attrs={"name": "description"})
    h1Tag = soup.find_all('h1')[0].text.strip()

    # Page-level metadata, written as key/value blocks above the tag table
    metadata = {
        'Canonical': response.html.xpath("//link[@rel='canonical']/@href"),
        'Page Title': title.string,
        'PT Length': len(title.string),
        'Meta Description': desc[0]['content'],
        'MD Length': len(desc[0]['content']),
        'H1 Tag': h1Tag,
    }
    metadata_strings = ["\n".join([str(k), str(v)]) for k, v in metadata.items()]
    metadata_strings = '\n--------------\n'.join(metadata_strings)

    # One row per h2-h6 heading: tag name and its text
    tag_names = ["h2", "h3", "h4", "h5", "h6"]
    tag_data = [(tags.name + ' ', ' ' + tags.text.strip()) for tags in soup.find_all(tag_names)]
    tag_df = pd.DataFrame(tag_data, columns=["H1-H6 Tags", " Text"])

    full_csv = "\n\n".join([metadata_strings, tag_df.to_csv(index=False)])
    pyperclip.copy(full_csv)
    if print_to_console:
        print(full_csv)

    # Write the combined output to the file named by the second argument
    with open(output_filename, "wb") as f:
        f.write(full_csv.encode('utf-8', errors='replace'))
    return full_csv


crawl(url)
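
One way to narrow this down, as a minimal sketch rather than a confirmed fix: print the arguments the script actually receives and the absolute path the file would be written to. This assumes two possible causes, neither verified here: the second argument may not be reaching the script when it is launched from the Run dialog, or a relative filename may be landing in a different working directory than expected. `resolve_output_path` and `default_name` are hypothetical names and are not part of the script above.

```python
import sys
from pathlib import Path


def resolve_output_path(argv, default_name="output.csv"):
    """Return an absolute path for the CSV named by the second argument.

    `default_name` is a hypothetical fallback used only when no second
    argument was passed; the original script has no such fallback.
    """
    name = argv[2] if len(argv) > 2 else default_name
    # A relative name is resolved against the current working directory,
    # which may not be the folder the script lives in (assumption to check).
    return Path(name).expanduser().resolve()


if __name__ == "__main__":
    # Show exactly what the script was given and where the file would go
    print("argv received:", sys.argv)
    print("working directory:", Path.cwd())
    print("CSV would be written to:", resolve_output_path(sys.argv))
```

If the printed `argv` has only two entries, the filename is being dropped before Python sees it; if it has three, the printed path shows where to look for the file.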