Answers to the question: Save the "Out[]" table of a pandas DataFrame as a figu



1 answer

Anonymous, 1 day ago

I think what is needed here is a consistent way to output a table to a PDF file alongside figures that are also written to PDF. My first thought was not to use the matplotlib backend
<pre><code>from matplotlib.backends.backend_pdf import PdfPages
</code></pre>
because it seems somewhat limited in its formatting options and tends to render the table as an image (so the table text cannot be selected). If you want to mix DataFrame output and matplotlib plots in one PDF without using the matplotlib PDF backend, I can think of two approaches.
<ol>
<li>Generate the PDF of matplotlib figures as before, then insert pages containing the DataFrame table. I consider this the harder option.</li>
<li>Use a different library to generate the PDF. One way to do that is described below.</li>
</ol>
<hr/>
First, install the <code>xhtml2pdf</code> library. Its support looks a little patchy, but it is <a href="https://github.com/chrisglass/xhtml2pdf/" rel="nofollow noreferrer">active on Github</a> and has some <a href="https://github.com/chrisglass/xhtml2pdf/blob/master/doc/usage.rst" rel="nofollow noreferrer">basic usage documentation here</a>. You can install it with <code>pip</code>: <code>pip install xhtml2pdf</code>.

Once that is done, here is a simple example that embeds a matplotlib figure, then the table (with all text selectable), then another figure. You can adjust the CSS and so on to change the formatting to suit your exact needs, but I think this meets the requirement:
<pre><code>from xhtml2pdf import pisa  # this is the module that will do the work
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# Utility function
def convertHtmlToPdf(sourceHtml, outputFilename):
    # open output file for writing (truncated binary); closed automatically
    with open(outputFilename, "w+b") as resultFile:
        # convert HTML to PDF
        pisaStatus = pisa.CreatePDF(
            sourceHtml,       # the HTML to convert
            dest=resultFile,  # file handle to receive the result
            path='.')         # this path is needed so relative paths for
                              # temporary image sources work
    # pisaStatus.err is falsy on success and truthy if errors occurred
    return pisaStatus.err


# Main program
if __name__ == '__main__':
    # Define your data
    arrays = [np.hstack([['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.zeros((3, 6)), columns=columns,
                      index=pd.date_range('20000103', periods=3))

    sourceHtml = '<html><head>'
    # add some table CSS in head
    sourceHtml += '''<style>
                     table, td, th {
                         border-style: double;
                         border-width: 3px;
                     }
                     td, th {
                         padding: 5px;
                     }
                     </style>'''
    sourceHtml += '</head><body>'

    # Add a matplotlib figure
    plt.plot(range(20))
    plt.savefig('tmp1.jpg')
    plt.close()  # start the next plot on a fresh figure
    sourceHtml += '\n<img src="tmp1.jpg">'

    # Add the dataframe
    sourceHtml += '\n' + df.to_html()

    # Add another matplotlib figure
    plt.plot(range(70, 100))
    plt.savefig('tmp2.jpg')
    plt.close()
    sourceHtml += '\n<img src="tmp2.jpg">'

    sourceHtml += '</body></html>'
    outputFilename = 'test.pdf'
    convertHtmlToPdf(sourceHtml, outputFilename)
</code></pre>
Note that at the time of writing there appears to be a bug in xhtml2pdf that causes some CSS to be ignored. Particularly relevant to this question: it does not seem to be possible to get a double border around the table.
<hr/>
<h2>Edit</h2>
In response to the comments, it became clear that some users (at least @Keith, who also answered and was awarded the bounty!) want the table to be selectable but definitely on a matplotlib axis. That is somewhat closer to the original approach. So here is a method that uses only the <code>pdf</code> backend for matplotlib and matplotlib objects. I don't think the table looks as good, particularly the display of the hierarchical column headers, but I suppose that is a matter of preference. I am indebted to <a href="https://stackoverflow.com/a/17237728/838992">this answer</a> and its comments for the way to format the axes for the table display.
<pre><code>import numpy as np
import pandas as pd
from pandas.plotting import table  # replaces the removed pd.tools.plotting.table
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt

# Main program
if __name__ == '__main__':
    pp = PdfPages('Output.pdf')
    arrays = [np.hstack([['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.zeros((3, 6)), columns=columns,
                      index=pd.date_range('20000103', periods=3))

    plt.plot(range(20))
    pp.savefig()
    plt.close()

    # Calculate some sizes for formatting - constants are arbitrary - play around
    nrows, ncols = len(df) + 1, len(df.columns) + 10
    hcell, wcell = 0.3, 1.
    hpad, wpad = 0, 0

    # put the table on a correctly sized figure
    fig = plt.figure(figsize=(ncols*wcell + wpad, nrows*hcell + hpad))
    plt.gca().axis('off')
    matplotlib_tab = table(plt.gca(), df, loc='center')
    pp.savefig()
    plt.close()

    # Add another matplotlib figure
    plt.plot(range(70, 100))
    pp.savefig()
    plt.close()

    pp.close()
</code></pre>
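The answer notes that the hierarchical column headers do not display well in the matplotlib table. One possible workaround, offered only as a sketch and not part of the original answer, is to join each multi-level label into a single string before drawing the table. The helper name `flatten_labels` and its `sep` parameter are hypothetical, introduced here for illustration; plain tuples stand in for the `MultiIndex` so the snippet runs without pandas.

```python
def flatten_labels(columns, sep="\n"):
    """Join each multi-level column label into a single string.

    `columns` can be any iterable of tuples, for example a pandas
    MultiIndex; labels that are not tuples are converted with str().
    (Hypothetical helper, not part of pandas or matplotlib.)
    """
    flat = []
    for label in columns:
        if isinstance(label, tuple):
            flat.append(sep.join(str(level) for level in label))
        else:
            flat.append(str(label))
    return flat


# Plain tuples stand in for the MultiIndex used in the answer.
example_columns = [("one", "Dog"), ("one", "Bird"), ("two", "Cat")]
print(flatten_labels(example_columns, sep=" / "))
# prints ['one / Dog', 'one / Bird', 'two / Cat']
```

With the DataFrame from the answer, one could presumably make a copy, assign `flat_df.columns = flatten_labels(df.columns)`, and pass `flat_df` to the table call instead of `df`. Whether multi-line labels fit the cells depends on the row height, so the `hcell` constant may need adjusting.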
