Pandas does not read tables from HTML files in a folder

Posted 2024-04-19 10:21:21


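I am trying to use pandas to read the tables in each individual HTML file in a folder, to find out how many tables each file contains.

This works when I point it at a single file, but when I run it over the folder, it says no tables were found.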

Here is the code for a single file:

import pandas as pd


file = r'C:\Users\Ahmed_Abdelmuniem\Desktop\XXX.html'
table = pd.read_html(file)

print ('tables found:', len(table))

Here is the output:

C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\python.exe C:/Users/Ahmed_Abdelmuniem/PycharmProjects/PandaHTML/main.py
tables found: 72

Process finished with exit code 0

Here is the code that loops over every file in the folder:

import pandas as pd
import shutil
import os

source_dir = r'C:\Users\Ahmed_Abdelmuniem\Desktop\TMorning'
target_dir = r'C:\Users\Ahmed_Abdelmuniem\Desktop\TAfternoon'

file_names = os.listdir(source_dir)

for file_name in file_names:
    table = pd.read_html(file_name)
    print ('tables found:', len(table))

Here is the error log:

C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\python.exe "C:/Users/Ahmed_Abdelmuniem/PycharmProjects/File mover V2.0/main.py"
Traceback (most recent call last):
  File "C:\Users\Ahmed_Abdelmuniem\PycharmProjects\File mover V2.0\main.py", line 12, in <module>
    table = pd.read_html(file_name)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 1085, in read_html
    return _parse(
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 913, in _parse
    raise retained
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 893, in _parse
    tables = p.parse_tables()
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 213, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 543, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found

Process finished with exit code 1

1 answer
Anonymous user
#1 · Posted 2024-04-19 10:21:21

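os.listdir returns a list of the names of the entries in the directory, and those entries include subdirectories and any other files, not just HTML files. The names are also bare names, not paths, so pd.read_html(file_name) looks for each file relative to the current working directory instead of source_dir. When a string is not an existing file path or URL, read_html (in the pandas version shown in the traceback) parses the string itself as HTML, and a bare file name contains no <table> element, hence "No tables found". If you only want the HTML files, glob.glob is a better fit, because with a joined pattern it returns full paths: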

import glob
import os

# these are full paths, so pd.read_html(file_name) works unchanged in the loop
file_names = glob.glob(os.path.join(source_dir, '*.html'))
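A small stdlib-only sketch of the difference between the two listings. The temporary folder, the file names, and the helper list_html_files are made up for illustration. Note that glob.glob also matches a subdirectory whose name ends in .html, so the helper additionally filters with os.path.isfile.

```python
import glob
import os
import tempfile


def list_html_files(source_dir):
    """Return sorted full paths of the regular *.html files directly in source_dir."""
    pattern = os.path.join(source_dir, '*.html')
    # glob.glob keeps the source_dir prefix on every match, so the results are
    # usable paths; os.path.isfile drops a subdirectory whose name ends in .html
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


with tempfile.TemporaryDirectory() as source_dir:
    # Hypothetical sample folder: two HTML files, a text file, and a
    # subdirectory that happens to be named like an HTML file
    for name in ('a.html', 'b.html', 'notes.txt'):
        with open(os.path.join(source_dir, name), 'w') as f:
            f.write('<p>placeholder</p>')
    os.mkdir(os.path.join(source_dir, 'archive.html'))

    # os.listdir: bare names of every entry, including the subdirectory
    bare_names = sorted(os.listdir(source_dir))
    # glob + isfile: full paths of the HTML files only
    full_paths = list_html_files(source_dir)

    print(bare_names)  # ['a.html', 'archive.html', 'b.html', 'notes.txt']
    print([os.path.basename(p) for p in full_paths])  # ['a.html', 'b.html']
```

The bare names from os.listdir only resolve if the current working directory happens to be source_dir, which is why the single-file script (with an absolute path) worked while the loop did not.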

Edit: if you want to stay with os.listdir, you have to build the actual path of each file:

for file_name in file_names:
    table = pd.read_html(os.path.join(source_dir, file_name))
    print ('tables found:', len(table))
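A fuller sketch of the os.listdir approach, assuming pandas with lxml installed; the function name count_tables_per_file and the sample files are hypothetical. It joins each name onto source_dir, skips anything that is not a regular .html file, and records 0 instead of stopping the loop when a page genuinely has no tables, since read_html raises ValueError in that case. It pins flavor='lxml' (an assumption about which parser you have) so a table-less page surfaces as that ValueError rather than pandas moving on to its bs4 backend.

```python
import os
import tempfile

import pandas as pd


def count_tables_per_file(source_dir):
    """Map each .html file name in source_dir to the number of tables pandas finds."""
    counts = {}
    for file_name in sorted(os.listdir(source_dir)):
        # os.listdir gives bare names; join them onto the folder to get real paths
        path = os.path.join(source_dir, file_name)
        if not (os.path.isfile(path) and file_name.lower().endswith('.html')):
            continue  # skip subdirectories and non-HTML files
        try:
            counts[file_name] = len(pd.read_html(path, flavor='lxml'))
        except ValueError:
            # read_html raises ValueError ("No tables found ...") for pages without tables
            counts[file_name] = 0
    return counts


TABLE = '<table><tr><th>x</th></tr><tr><td>1</td></tr></table>'

with tempfile.TemporaryDirectory() as source_dir:
    # Hypothetical sample folder: one page with two tables, one with none, one non-HTML file
    with open(os.path.join(source_dir, 'two.html'), 'w') as f:
        f.write('<html><body>' + TABLE + TABLE + '</body></html>')
    with open(os.path.join(source_dir, 'none.html'), 'w') as f:
        f.write('<html><body><p>no tables here</p></body></html>')
    with open(os.path.join(source_dir, 'readme.txt'), 'w') as f:
        f.write('ignored')
    counts = count_tables_per_file(source_dir)

for name, n in counts.items():
    print('tables found in', name + ':', n)
```

Catching ValueError keeps one odd file from aborting the whole folder; if you would rather see such files fail loudly, drop the try/except.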
