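<p>I think what is needed here is a consistent way to output tables to a PDF file alongside the figures that are written to it.</p>
<p>My first thought is not to use the matplotlib backend, i.e.</p>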
<pre><code>from matplotlib.backends.backend_pdf import PdfPages
</code></pre>
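<p>because it seems somewhat limited in its formatting options and tends to render tables as images (so the text of the table is not selectable).</p>
<p>If you want to mix dataframe output and matplotlib plots in a PDF without using the matplotlib PDF backend, I can think of two ways.</p>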
<ol>
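<li>Generate the PDF of matplotlib figures as before, then insert pages containing the dataframe table afterwards. I think this is the harder option.</li>
<li>Use a different library to generate the PDF. I illustrate one way of doing this below.</li>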
</ol>
<hr/>
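<p>First, install the <code>xhtml2pdf</code> library. Its support looks a little patchy, but it is <a href="https://github.com/chrisglass/xhtml2pdf/" rel="nofollow noreferrer">active on GitHub</a> and has some <a href="https://github.com/chrisglass/xhtml2pdf/blob/master/doc/usage.rst" rel="nofollow noreferrer">basic usage documentation here</a>. You can install it with <code>pip</code>, i.e. <code>pip install xhtml2pdf</code>.</p>
<p>Once that is done, here is a bare-bones example that embeds a matplotlib figure, then the table (all text selectable), then another figure. You can play around with CSS and so on to adjust the formatting to your exact requirements, but I think this fulfils the brief:</p>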
<pre><code>from xhtml2pdf import pisa  # this is the module that will do the work
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# Utility function
def convertHtmlToPdf(sourceHtml, outputFilename):
    # open output file for writing (truncated binary)
    with open(outputFilename, "w+b") as resultFile:
        # convert HTML to PDF
        pisaStatus = pisa.CreatePDF(
            sourceHtml,        # the HTML to convert
            dest=resultFile,   # file handle to receive result
            path='.')          # this path is needed so relative paths for
                               # temporary image sources work
    # pisaStatus.err is the error count: 0 on success, non-zero on errors
    return pisaStatus.err


# Main program
if __name__ == '__main__':
    arrays = [np.hstack([['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.zeros((3, 6)), columns=columns,
                      index=pd.date_range('20000103', periods=3))

    # Define your data
    sourceHtml = '<html><head>'
    # add some table CSS in head
    sourceHtml += '''<style>
        table, td, th {
            border-style: double;
            border-width: 3px;
        }
        td, th {
            padding: 5px;
        }
    </style>'''
    sourceHtml += '</head><body>'

    # Add a matplotlib figure
    plt.plot(range(20))
    plt.savefig('tmp1.jpg')
    plt.close()  # start the next plot on a fresh figure
    sourceHtml += '\n<p><img src="tmp1.jpg"></p>'

    # Add the dataframe
    sourceHtml += '\n<p>' + df.to_html() + '</p>'

    # Add another matplotlib figure
    plt.plot(range(70, 100))
    plt.savefig('tmp2.jpg')
    plt.close()
    sourceHtml += '\n<p><img src="tmp2.jpg"></p>'

    sourceHtml += '</body></html>'
    outputFilename = 'test.pdf'
    convertHtmlToPdf(sourceHtml, outputFilename)
</code></pre>
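<p>The example above writes <code>tmp1.jpg</code> and <code>tmp2.jpg</code> to disk so the HTML can reference them. A possible variation, sketched here, is to render each figure to PNG in memory and embed it as a base64 data URI. The helper name <code>figure_to_img_tag</code> is my own invention, and I am assuming that your xhtml2pdf version resolves <code>data:</code> URIs in <code>img</code> tags, so check that before relying on it.</p>

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def figure_to_img_tag(fig):
    """Hypothetical helper: render a figure to PNG in memory and
    return an HTML <img> tag that carries it as a base64 data URI."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return '<p><img src="data:image/png;base64,' + encoded + '"></p>'


fig, ax = plt.subplots()
ax.plot(range(20))
img_tag = figure_to_img_tag(fig)
plt.close(fig)  # release the figure once it has been encoded
```

<p>The resulting string could be appended to <code>sourceHtml</code> in place of the <code>&lt;img src="tmp1.jpg"&gt;</code> line, which avoids leaving temporary image files behind.</p>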
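<p><strong><em>NOTE</em></strong> At the time of writing there seems to be a bug in xhtml2pdf which means that some CSS is not respected. Particularly relevant to this question: it does not seem possible to get double borders around the table.</p>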
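<p>Since some CSS may be ignored, one thing that might be worth trying is to push part of the formatting into the HTML that pandas generates, using arguments of <code>DataFrame.to_html</code>. The sketch below uses a simplified two-column frame, and the class name <code>report</code> is an arbitrary choice of mine. Whether xhtml2pdf honours the resulting <code>border</code> attribute is an assumption I have not verified.</p>

```python
import numpy as np
import pandas as pd

# Simplified stand-in for the dataframe used in the example above
df = pd.DataFrame(np.zeros((3, 2)), columns=["Dog", "Cat"],
                  index=pd.date_range("20000103", periods=3))

table_html = df.to_html(
    border=1,                      # emitted as border="1" on the <table> tag
    classes="report",              # extra CSS class for targeting in <style>
    float_format="{:.2f}".format,  # fixed two-decimal number formatting
)
```

<p>The string in <code>table_html</code> can then be concatenated into the page exactly as <code>df.to_html()</code> is in the example.</p>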
<hr/>
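<h2>Edit</h2>
<p>In response to the comments, it became clear that some users (well, at least @Keith, who both answered and awarded the bounty!) want the table to be selectable, but definitely on a matplotlib axis. This is somewhat more in keeping with the original approach. So here is a method that uses only the <code>pdf</code> backend for matplotlib and matplotlib objects. I do not think the table looks as good, in particular the display of the hierarchical column headers, but I guess that is a matter of choice. I am indebted to <a href="https://stackoverflow.com/a/17237728/838992">this answer</a> and its comments for the way to format axes for table display.</p>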
<pre><code>import numpy as np
import pandas as pd
# pd.tools.plotting.table in old pandas; pandas.plotting.table in newer releases
from pandas.plotting import table
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt

# Main program
if __name__ == '__main__':
    pp = PdfPages('Output.pdf')
    arrays = [np.hstack([['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.zeros((3, 6)), columns=columns,
                      index=pd.date_range('20000103', periods=3))

    # First matplotlib figure
    plt.plot(range(20))
    pp.savefig()
    plt.close()

    # Calculate some sizes for formatting - constants are arbitrary - play around
    nrows, ncols = len(df) + 1, len(df.columns) + 10
    hcell, wcell = 0.3, 1.
    hpad, wpad = 0, 0

    # Put the table on a correctly sized figure
    fig = plt.figure(figsize=(ncols*wcell + wpad, nrows*hcell + hpad))
    ax = plt.gca()
    ax.axis('off')
    matplotlib_tab = table(ax, df, loc='center')
    pp.savefig()
    plt.close()

    # Add another matplotlib figure
    plt.plot(range(70, 100))
    pp.savefig()
    plt.close()

    pp.close()
</code></pre>
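<p>On the hierarchical column headers: one way to tidy them up, sketched below, is to flatten the <code>MultiIndex</code> into single strings and shorten the date labels before handing the frame to the table call. The <code>' / '</code> separator and the name <code>flat</code> are arbitrary choices of mine, not part of the original approach.</p>

```python
import numpy as np
import pandas as pd

arrays = [['one'] * 3 + ['two'] * 3, ['Dog', 'Bird', 'Cat'] * 2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
df = pd.DataFrame(np.zeros((3, 6)), columns=columns,
                  index=pd.date_range('20000103', periods=3))

# Work on a copy so the original hierarchical frame is left untouched
flat = df.copy()
# Each MultiIndex column is a tuple such as ('one', 'Dog'); join its levels
flat.columns = [' / '.join(levels) for levels in df.columns]
# Drop the time-of-day part from the row labels
flat.index = flat.index.strftime('%Y-%m-%d')
```

<p>Passing <code>flat</code> instead of <code>df</code> to the table call should then show labels like <code>one / Dog</code> rather than the tuple representation, at the cost of losing the visual grouping of the top level.</p>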