Cash flow statement
                                                                   Q3       Q3     Dev      YTD      YTD     Dev
SEK M                                                            2017     2016       %     2017     2016       %
Operating earnings (EBIT)                                         977      506      93    1 921    1 379      39
Depreciation                                                      163       40     308      245      120     104
Amortization and revaluation of purchased debt                    866      389     123    1 845    1 137      62
Income tax paid                                                   -97      -33     194     -283     -187      51
Changes in factoring receivables                                    7      -25    -128      -39      -45     -13
Other changes in working capital                                    5      -60    -108       -8     -119     n/a
Financial net & other non-cash items                             -125       -6    1983     -486      -74     557
Cash flow from operating activities (CFFO)                      1 796      811     121    3 195    2 211      45
Purchases of tangible and intangible fixed assets (CAPEX)         -38      -33      15     -115     -103      12
Purchases of debt                                              -1 124     -732      54   -4 317   -2 188      97
Purchases of shares in subsidiaries and associated companies       -2       -1     100     -171      -89      92
Liquid assets in acquired subsidiaries                              0        0              975        1
Other cash flow form investing activities                          -1        2    -150       -2        6    -133
Cash flow from investing activities (CFFI)                     -1 165     -764      52   -3 630   -2 373      53
Cash flow from investing activities (CFFI)
excl liquid assets in acquired subsidiaries                    -1 165     -764      52   -4 605   -2 374      94
Free cash flow (CFFO - CFFI)                                      631       47   1 243     -435     -167     160
Free cash flow (CFFO - CFFI) excl liquid
assets in acquired subsidiaries                                   631       47   1 243   -1 410     -168     739
17
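You can see that this closely resembles your string, but with much wider spacing between the columns.
We can then use some Python to parse it into a two-dimensional array: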
from tabulate import tabulate
import re

# Build a template line: a position is ' ' only if every line has a space there, otherwise 'X'.
template = ''
with open('C:\\parsed_output.txt') as f:
    # skip blank lines and drop the trailing newline so it is not mistaken for cell content
    raw_lines = [line.rstrip('\n') for line in f.readlines() if line.strip() != '']
lines = raw_lines[1:-1]  # ignore first and last lines
for raw_line in lines:
    length = max([len(template), len(raw_line)])
    old_template = template.ljust(length)
    line = raw_line.ljust(length)
    template = ''
    for i in range(0, length):
        template += ' ' if (old_template[i] == ' ' and line[i] == ' ') else 'X'

# try to work out the column widths, based on alignment of spaces:
column_widths = [len(x) for x in template.split()]
column_count = len(column_widths)
column_starts = [0]
start = 0
for i in range(1, column_count):
    start = template.find(' X', start) + 1
    column_starts.append(start)
column_starts.append(len(template))  # add final value to terminate right-most column

# now divide up each line using our column widths
rows = []
for raw_line in lines:
    line = raw_line.ljust(len(template))
    row = []
    for i in range(0, column_count):
        value = line[column_starts[i]:column_starts[i+1]].strip()
        if i > 0:
            # numeric columns: remove the spaces used as thousands separators
            value = re.sub(r'\s+', '', value)
        row.append(value)
    rows.append(row)
print(tabulate(rows, tablefmt='grid'))
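Without access to the actual data it is hard to solve this exactly, but basically you need to parse the PDF table with something other than copy-and-paste, because copy-and-paste makes the gaps between columns indistinguishable from the spaces used as thousands separators.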
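To make the ambiguity concrete, here is a small sketch using one row of the table. The `spaced` string is my own assumed layout (wide gaps between columns), and `split_on_wide_gaps` is a hypothetical helper, not part of any library.

```python
import re

# One data row as it looks after copy-and-paste: every gap is a single space.
pasted = "Operating earnings (EBIT) 977 506 93 1 921 1 379 39"

# The same row with wide column gaps (assumed layout, for illustration only).
spaced = "Operating earnings (EBIT)      977      506      93    1 921    1 379      39"


def split_on_wide_gaps(line):
    """Split on runs of two or more spaces, then drop thousands-separator spaces."""
    cells = re.split(r" {2,}", line.strip())
    # The first cell is the label; the rest are numbers such as '1 921'.
    return [cells[0]] + [c.replace(" ", "") for c in cells[1:]]


# Numeric part of the pasted row: 8 tokens for 6 columns, because
# '1 921' and '1 379' are each broken into two pieces.
naive = pasted.split(" ")[3:]
print(naive)

# With wide gaps preserved, the six numeric cells come out intact.
print(split_on_wide_gaps(spaced))
```

Splitting on two or more spaces only works when every column gap really is that wide, which is why a layout-preserving extraction step matters.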
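First, I suggest using something like Xpdf tools, a set of command-line utilities for parsing PDF documents. One of these utilities is called pdftotext.exe, which I have tested on a sample PDF file called intrum_q317_presentation.pdf. For example, to extract the table on page 17 of this document, you can run the following command: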
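The exact command did not survive in this copy of the answer, so the following is only a sketch of what such a call might look like when driven from Python. It assumes `pdftotext` is on your PATH and uses its `-f`, `-l` and `-layout` options; the helper names `build_pdftotext_command` and `extract_page` are hypothetical.

```python
import subprocess


def build_pdftotext_command(pdf_path, out_path, page, exe="pdftotext"):
    """Return the argument list for extracting a single page as text."""
    return [
        exe,
        "-f", str(page),   # first page to convert
        "-l", str(page),   # last page to convert
        "-layout",         # keep the physical layout so column gaps stay wide
        pdf_path,
        out_path,
    ]


def extract_page(pdf_path, out_path, page):
    """Run pdftotext; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(build_pdftotext_command(pdf_path, out_path, page), check=True)


# Assumed file names, taken from the text of this answer.
cmd = build_pdftotext_command("intrum_q317_presentation.pdf", "parsed_output.txt", 17)
print(" ".join(cmd))
```

Calling `extract_page("intrum_q317_presentation.pdf", "parsed_output.txt", 17)` would then write the page text to the output file, provided the tool is installed.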
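The command produces the output listed at the top of this answer (saved in parsed_output.txt). The Python script that follows that listing parses the output into a two-dimensional array and prints the result as a grid table.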
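Of course, it isn't perfect (for example, "Q3 2017" should be in a single cell), and it isn't guaranteed to work with your exact data (for example, you may need to adjust the column widths manually), but it should get you started.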
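If the automatic column detection does not suit your data, one option is to supply the column offsets by hand and merge the two header rows afterwards. This is a rough sketch on a tiny made-up sample: the offsets in `starts` and the helpers `slice_row`, `merge_header` and `make_line` are all hypothetical, and the real offsets have to be read off your own file.

```python
def slice_row(line, starts):
    """Cut one line at fixed character offsets; the last entry marks the right edge."""
    line = line.ljust(starts[-1])
    cells = [line[a:b].strip() for a, b in zip(starts, starts[1:])]
    # Drop thousands-separator spaces in every cell except the label.
    return [cells[0]] + [c.replace(" ", "") for c in cells[1:]]


def merge_header(top, bottom):
    """Combine two header rows cell by cell, e.g. 'Q3' and '2017' into 'Q3 2017'."""
    return [" ".join(part for part in (t, b) if part) for t, b in zip(top, bottom)]


def make_line(label, a, b):
    """Build a fixed-width sample line: 20-char label, two right-aligned 8-char cells."""
    return f"{label:<20}{a:>8}{b:>8}"


# Manually chosen offsets matching the sample layout above (an assumption).
starts = [0, 20, 28, 36]
sample = [
    make_line("", "Q3", "Q3"),
    make_line("SEK M", "2017", "2016"),
    make_line("Purchases of debt", "-1 124", "-732"),
]

rows = [slice_row(line, starts) for line in sample]
table = [merge_header(rows[0], rows[1])] + rows[2:]
print(table)
```

Hard-coded offsets are brittle if the layout changes between reports, but they are easy to check by eye against the extracted text.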