Using Python to Export Values from a Spreadsheet for Web Scraping (BeautifulSoup4)

Published 2024-04-26 14:49:21


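A. My goal: Use Python to extract unique OCPO IDs from an Excel spreadsheet, then use those IDs to search the web for the corresponding company names and NIN IDs. (Note: both the NIN and the OCPO ID are unique to a company.)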

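B. Details: i. Extract the OCPO IDs from the Excel spreadsheet using openpyxl. ii. Search for each OCPO ID, one at a time, in the business registry (https://focus.kontur.ru/) and use BeautifulSoup4 to find the corresponding company name and company ID (NIN).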

Example: A search for OCPO ID "00044428" yields a matching company name ПАО "НК "РОСНЕФТЬ" and corresponding NIN ID "7706107510."

iii. Save the list of company names and IDs in Excel.

C. My progress: i. I was able to extract the list of OCPO IDs from Excel into Python.

# Pull the Packages
import openpyxl
import requests
import sys
from bs4 import BeautifulSoup

# Pull OCPO from the Spreadsheet
wb = openpyxl.load_workbook(r"C:\Users\ksong\Desktop\book1.xlsx")
sheet = wb.active
# sheet.columns is a generator in current openpyxl and cannot be indexed,
# so select column A by its letter instead
for cellobjc in sheet["A"]:
    print(cellobjc.value)
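
Since the goal is a list of unique IDs, the values read from column A may need de-duplicating before they are searched. Excel can also return an ID such as "00044428" as the number 44428, which drops the leading zeros. The following is a minimal sketch under stated assumptions: the helper names `unique_ids` and `read_first_column` are hypothetical, and the 8-digit width is only inferred from the example ID in this question.

```python
def unique_ids(values, width=8):
    """Return unique ID strings in first-seen order, zero-padded to `width`.

    width=8 is an assumption based on the example ID "00044428".
    """
    seen = set()
    result = []
    for value in values:
        if value is None:
            continue  # skip empty cells
        # A numeric cell may come back as 44428 or 44428.0 instead of "00044428"
        if isinstance(value, float) and value.is_integer():
            value = int(value)
        text = str(value).strip()
        if not text:
            continue
        text = text.zfill(width)  # restore leading zeros
        if text not in seen:
            seen.add(text)
            result.append(text)
    return result


def read_first_column(path):
    """Read column A of the active sheet (openpyxl is imported lazily)."""
    import openpyxl
    sheet = openpyxl.load_workbook(path).active
    return [cell.value for cell in sheet["A"]]
```

With these two helpers, `unique_ids(read_first_column(r"C:\Users\ksong\Desktop\book1.xlsx"))` would give the list of IDs to search.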

ii. I can search for an OCPO ID and have Python find the matching company name and the corresponding company ID.

# Part 1a: Pull the Website 
r = requests.get("https://focus.kontur.ru/search?query=" + "00044428")
r.encoding = "UTF-8"

# Part 1b: Pull the Content
c = r.content
soup = BeautifulSoup(c, "html.parser", from_encoding="UTF-8")

# Part 2a: Pull Company name
name = soup.find("a", attrs={'class':"js-subject-link"})
name_box = name.text.strip()
print(name_box)
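
The hard-coded ID in Part 1a can be factored out so that the same request works for any ID. This is only a sketch: `build_search_url` is a hypothetical helper, and it assumes the search page accepts the ID in the `query` parameter exactly as in the URL above.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://focus.kontur.ru/search"


def build_search_url(ocpo_id):
    """Build the search URL for one OCPO ID, URL-encoding the value."""
    return SEARCH_URL + "?" + urlencode({"query": str(ocpo_id)})


print(build_search_url("00044428"))
```

Note that `soup.find` returns `None` when no matching link is on the page, so `name.text` would raise an `AttributeError` for an ID with no search result; a check such as `if name is not None:` avoids that.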

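D. Help

i. How do I write the code so that each OCPO ID is searched individually in a loop, so that I get a list of search results rather than a list of OCPO IDs? In other words, each OCPO ID is searched and matched with the corresponding company name and NIN ID. The loop must work as ##########("https://focus.kontur.ru/search?query=" + "#############").

ii. Also, what code should Python use to save all the search results in an Excel spreadsheet?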


1 Answer

User

#1 · Posted 2024-04-26 14:49:21

1) Create an empty workbook to write to:

from openpyxl import Workbook

wb2 = Workbook()
ws1 = wb2.active

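2) Put all of the code from your second box inside the for loop from your first box.

3) Change "00044428" to str(cellobjc.value)

4) At the end of each loop iteration, append a row to the new worksheet: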

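# name_box is the company name found above; other_variables is a placeholder
# for any further values you extract (for example, the NIN)
row = [cellobjc.value, name_box, other_variables]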
ws1.append(row)

5) After the loop finishes, save the file:

wb2.save("results.xlsx")
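
Putting the five steps together, the whole flow might look like the sketch below. It is not a definitive implementation: the function names, the one-second pause between requests, and the header row are assumptions, and the `js-subject-link` selector is simply the one used in the question, so it may no longer match the live page. Extracting the NIN is left out because the question does not show which element holds it.

```python
import time


def lookup_company_name(ocpo_id):
    """Search one OCPO ID and return the first company name, or None."""
    import requests                  # third-party, imported lazily
    from bs4 import BeautifulSoup    # third-party, imported lazily

    r = requests.get("https://focus.kontur.ru/search",
                     params={"query": ocpo_id}, timeout=30)
    r.raise_for_status()
    soup = BeautifulSoup(r.content, "html.parser")
    # Selector taken from the question; returns None if nothing matches
    link = soup.find("a", attrs={"class": "js-subject-link"})
    return link.text.strip() if link else None


def collect_rows(ocpo_ids, lookup=lookup_company_name, delay=1.0):
    """Run one search per ID and return [id, name] rows."""
    rows = []
    for ocpo_id in ocpo_ids:
        rows.append([ocpo_id, lookup(ocpo_id)])
        time.sleep(delay)  # assumed pause to avoid hammering the site
    return rows


def save_rows(rows, path="results.xlsx"):
    """Write the rows to a new workbook, one row per searched ID."""
    from openpyxl import Workbook    # third-party, imported lazily

    wb2 = Workbook()
    ws1 = wb2.active
    ws1.append(["OCPO", "Company name"])  # assumed header row
    for row in rows:
        ws1.append(row)
    wb2.save(path)
```

For example, `save_rows(collect_rows(["00044428"]))` would search one ID and write `results.xlsx`. Passing the lookup in as a parameter also makes it possible to try the loop with a stand-in function before sending real requests.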
