I'm trying to include a DataFrame with a MultiIndex in a PDF report, and I'd like nicely formatted table output.
I found these two solutions:
pandas.df -> HTML -> pdf
import pandas as pd
from IPython.display import HTML
import pdfkit
# df generation
df = pd.read_csv(path_to_csv, sep =',')
groupeddf = df.groupby('Cluster')
res = groupeddf.describe([0.05, 0.5, 0.95])
res.index.rename(['Cluster', 'stats'], inplace=True)
res['Cluster'] = res.index.get_level_values('Cluster')
res['stats'] = res.index.get_level_values('stats')
populations = (res.iloc[(res.index.get_level_values('stats') == 'count'), \
0].values).tolist()
res['population'] = [populations[i] for i in res.index.labels[0].values()]
total_pop = sum(populations)
res['frequency'] =(res['population']/total_pop).round(3)
res.set_index(['Cluster', 'population','frequency', 'stats'], inplace=True)
res1 = res.iloc[(res.index.get_level_values('stats') == '5%') |
(res.index.get_level_values('stats') == 'mean') |
(res.index.get_level_values('stats') == '50%') |
(res.index.get_level_values('stats') == '95%')]
res1 = res1.round(2)
# saving the df
h = HTML(res1.to_html())
with open('test.html', 'w') as my_file:
    my_file.write(h.data)
options = {
    'orientation': 'Landscape'
}
with open('test.html') as f:
    pdfkit.from_file(f, 'out.pdf', options=options)
But this relies on pdfkit, which makes things hard for us. That's why I tried pandas.df -> tex -> pdf instead (as described in "Export a Pandas dataframe as a table image"):
import pandas as pd
import os
# df generation
df = pd.read_csv(path_to_csv, sep =',')
groupeddf = df.groupby('Cluster')
res = groupeddf.describe([0.05, 0.5, 0.95])
res.index.rename(['Cluster', 'stats'], inplace=True)
res['Cluster'] = res.index.get_level_values('Cluster')
res['stats'] = res.index.get_level_values('stats')
populations = (res.iloc[(res.index.get_level_values('stats') == 'count'), \
0].values).tolist()
res['population'] = [populations[i] for i in res.index.labels[0].values()]
total_pop = sum(populations)
res['frequency'] =(res['population']/total_pop).round(3)
res.set_index(['Cluster', 'population','frequency', 'stats'], inplace=True)
res1 = res.iloc[(res.index.get_level_values('stats') == '5%') |
(res.index.get_level_values('stats') == 'mean') |
(res.index.get_level_values('stats') == '50%') |
(res.index.get_level_values('stats') == '95%')]
res1 = res1.round(2)
res1.rename(columns=lambda x: x.replace('_', ' '), inplace=True)
#latex
template = r'''\documentclass[preview]{{standalone}}
\usepackage{{booktabs}}
\begin{{document}}
{}
\end{{document}}
'''
# open in text mode: str.format() returns a str, which cannot be written to a 'wb' file on Python 3
with open("outputfile.tex", "w") as afile:
    afile.write(template.format(res1.to_latex()))
os.system("pdflatex outputfile.tex")
However, I'm not familiar with LaTeX, and I get this error:
! LaTeX Error: File `standalone.cls' not found.
Type X to quit or <RETURN> to proceed,
or enter a new name. (Default extension: cls)
Any idea what causes the error, or is there a standard way to do pandas.df -> pdf?
The solution that worked for me, with pandas >= 0.17: I installed pdflatex and copied in the LaTeX packages it needed, such as booktabs.sty, geography.sty and pdflscape.sty.
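A minimal sketch of that route, under a few assumptions that are mine rather than the answer's: it swaps the `standalone` class for `article` (which ships with every LaTeX distribution, so the `standalone.cls` error cannot occur), it wraps the table in the `landscape` environment from `pdflscape`, and the helper names `build_tex_document` and `compile_pdf` are hypothetical. If you would rather keep `standalone`, installing that package (on Debian/Ubuntu it is typically part of `texlive-latex-extra`) should also resolve the error.

```python
import shutil
import subprocess
from pathlib import Path

# Doubled braces are literal braces for str.format(); {table} is the only placeholder.
# Assumption: booktabs and pdflscape are installed alongside pdflatex.
TEMPLATE = r"""\documentclass{{article}}
\usepackage{{booktabs}}
\usepackage{{pdflscape}}
\begin{{document}}
\begin{{landscape}}
{table}
\end{{landscape}}
\end{{document}}
"""


def build_tex_document(table_tex):
    """Wrap a LaTeX tabular (e.g. the string from DataFrame.to_latex()) in a full document."""
    return TEMPLATE.format(table=table_tex)


def compile_pdf(tex_source, stem="outputfile"):
    """Write <stem>.tex and run pdflatex on it; return the PDF path, or None if pdflatex is missing."""
    tex_path = Path(stem + ".tex")
    tex_path.write_text(tex_source, encoding="utf-8")
    if shutil.which("pdflatex") is None:
        return None
    # nonstopmode keeps pdflatex from waiting for input when it hits an error
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", tex_path.name],
        check=True,
    )
    return Path(stem + ".pdf")
```

With the question's DataFrame this would be called as `compile_pdf(build_tex_document(res1.to_latex()))`. Using `subprocess.run(..., check=True)` instead of `os.system` means a failed LaTeX run raises an exception rather than passing silently.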
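One way is to use Markdown. You can use df.to_html(), which converts the DataFrame into an HTML table. From there, you can put the generated HTML into a Markdown file (.md) and use a package to convert the Markdown to PDF. https://www.npmjs.com/package/markdown-pdf may be a good option.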
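As a rough illustration of the Markdown route, the sketch below embeds the output of `df.to_html()` in a Markdown string. The helper name `build_markdown`, the report title, and the small demo DataFrame are assumptions made for the example; they are not from the thread.

```python
import pandas as pd


def build_markdown(df, title="Report"):
    """Return Markdown text: a heading followed by the DataFrame as a raw HTML table."""
    html_table = df.to_html()  # MultiIndex levels are rendered as merged header cells
    return "# {}\n\n{}\n".format(title, html_table)


# Hypothetical demo data with a two-level index, similar in shape to the question's table
index = pd.MultiIndex.from_product(
    [["A", "B"], ["mean", "50%"]], names=["Cluster", "stats"]
)
demo = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)

markdown_text = build_markdown(demo)
```

Writing `markdown_text` to a `.md` file and handing that file to a Markdown-to-PDF converter completes the pipeline. Markdown allows raw HTML blocks, which is why the table can be pasted in unchanged.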