How to save scraped data to CSV

import requests
from bs4 import BeautifulSoup
import pandas as pd

link = ("https://sofifa.com/team/1/arsenal/?&showCol%5B%5D=ae&showCol%5B%5D=hi&showCol%5B%5D=le&showCol%5B%5D=vl&showCol%5B%5D=wg&showCol%5B%5D=rc")
get_text = requests.get(link)
soup = BeautifulSoup(get_text.content, "lxml")
table = soup.find("table", {"class":"table table-hover persist-area"})
table1 = table.get_text()
table1.to_csv("Arsenal_players.csv")

2 answers

User

#1 · edited 2024-04-25 13:16:11

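You should explain more and then ask an actual question, for example by stating which error you get; that makes it much easier to give an answer. Anyway, I ran your code and got the error I expected. The table1 variable now holds nothing but a string, because of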

table1 = table.get_text()

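A plain str has no to_csv method, so the last line fails with AttributeError: 'str' object has no attribute 'to_csv'. In your case there is no single function that writes all of that text to CSV, but you can find help here. Just remember to make your question precise next time.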
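If you want to stay with BeautifulSoup instead of pandas, one option is to keep the table as a tag (not as a string), walk its rows yourself and write them with the standard csv module. The sketch below assumes the table's rows use ordinary th/td cells; SAMPLE_HTML is hypothetical markup standing in for the real page, and table_to_csv is an illustrative helper name, not part of any library.

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the sofifa table.
SAMPLE_HTML = """
<table class="table table-hover persist-area">
  <thead><tr><th>Name</th><th>Age</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>25</td></tr>
    <tr><td>Player B</td><td>31</td></tr>
  </tbody>
</table>
"""


def table_to_csv(html: str, path: str) -> int:
    """Write every row of the matching table to a CSV file; return the row count."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "table table-hover persist-area"})
    if table is None:
        # e.g. the request was blocked, or the page markup changed
        raise ValueError("table not found")
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (th) and data cells (td) are collected in document order.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    # newline="" is what the csv module expects when writing files.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

Usage: table_to_csv(get_text.text, "Arsenal_players.csv") with the response from the question. Each tr becomes one CSV line, so the header row is written first.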

User

#2 · edited 2024-04-25 13:16:11

You first need to read the HTML into a DataFrame with read_html, and then write it to a file with to_csv. Here is an example:

import requests
from io import StringIO
from bs4 import BeautifulSoup
import pandas as pd

link = ("https://sofifa.com/team/1/arsenal/?&showCol%5B%5D=ae&showCol%5B%5D=hi&showCol%5B%5D=le&showCol%5B%5D=vl&showCol%5B%5D=wg&showCol%5B%5D=rc")
get_text = requests.get(link)
soup = BeautifulSoup(get_text.content, "lxml")
table = soup.find("table", {"class":"table table-hover persist-area"})

# produces a list of dataframes from the html, see docs for more options
# (pandas 2.1+ deprecates passing a literal HTML string, so wrap it in StringIO)
dfs = pd.read_html(StringIO(str(table)))
dfs[0].to_csv("Arsenal_players.csv")

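read_html has many options that change its behaviour. You can also use it to read the link directly instead of going through requests/BeautifulSoup first (it can fetch the page itself under the hood).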
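As a small offline illustration of two of those options, the sketch below parses a hypothetical page (PAGE is made-up markup, not the real sofifa HTML) that contains two tables. The attrs argument keeps only the table whose class matches, read_html still returns a list, and index=False stops to_csv from writing the default 0..n-1 row index as an extra first column. It assumes a parser backend for read_html (lxml, or BeautifulSoup with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second carries the target class.
PAGE = """
<table><tr><th>Other</th></tr><tr><td>x</td></tr></table>
<table class="table table-hover persist-area">
  <thead><tr><th>Name</th><th>Age</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>25</td></tr>
    <tr><td>Player B</td><td>31</td></tr>
  </tbody>
</table>
"""

# attrs filters tables by their HTML attributes; the result is always a list.
dfs = pd.read_html(StringIO(PAGE), attrs={"class": "table table-hover persist-area"})
players = dfs[0]

# index=False drops the row index that to_csv writes by default.
players.to_csv("Arsenal_players.csv", index=False)
```

The header row is taken from the thead, so the resulting CSV starts with Name,Age.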

It might look like this, but it is untested, because when I tried it the link returned 403 Forbidden (possibly they block requests based on the user agent):

dfs = pd.read_html(link, attrs={"class":"table table-hover persist-area"})

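Edit: since read_html does not let you specify a user agent (at least in the pandas version I was using), I believe this is the most concise way to handle this particular link: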

from io import StringIO

dfs = pd.read_html(
    StringIO(requests.get(link).text),
    attrs={"class":"table table-hover persist-area"}
)
dfs[0].to_csv("Arsenal_players.csv")
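If the plain requests.get call is also answered with 403, one thing to try, assuming the block really is based on the User-Agent as suspected above, is to send a browser-like User-Agent header through a requests Session. The header string, and the helper names make_session and scrape_table, are illustrative assumptions; nothing here is known to be what the site requires.

```python
from io import StringIO

import pandas as pd
import requests

# Assumption: the 403 comes from User-Agent filtering. This value is an
# illustrative browser-like string, not one the site is known to require.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}


def make_session() -> requests.Session:
    """Return a Session that sends HEADERS with every request."""
    session = requests.Session()
    session.headers.update(HEADERS)
    return session


def scrape_table(url: str, css_class: str) -> pd.DataFrame:
    """Fetch url with the custom User-Agent and parse the table with css_class."""
    with make_session() as session:
        resp = session.get(url, timeout=30)
        # Fail loudly on 403/404 instead of trying to parse an error page.
        resp.raise_for_status()
    return pd.read_html(StringIO(resp.text), attrs={"class": css_class})[0]
```

Usage: scrape_table(link, "table table-hover persist-area").to_csv("Arsenal_players.csv", index=False). If the site blocks scrapers for other reasons, a different header alone will not help.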
