How can I read 10 records at a time from a CSV file using pandas?

time,line_id,high,low,avg,total,split_counts
1468332421098000,206,50879,50879,50879,2,"[50000,2]"
1468332421195000,206,39556,39556,39556,2,"[30000,2]"
1468332421383000,206,61636,61636,61636,2,"[60000,2]"
1468332423568000,206,47315,38931,43123,4,"[30000,2][40000,2]"
1468332423489000,206,38514,38445,38475,6,"[30000,6]"
1468332421672000,206,60079,60079,60079,2,"[60000,2]"
1468332421818000,206,44664,44664,44664,2,"[40000,2]"
1468332422164000,206,48500,48500,48500,2,"[40000,2]"
1468332423490000,206,39469,37894,38206,12,"[30000,12]"
1468332422538000,206,44023,44023,44023,2,"[40000,2]"
1468332423491000,206,38813,38813,38813,2,"[30000,2]"
1468332423528000,206,75970,75970,75970,2,"[70000,2]"
1468332423533000,206,42546,42470,42508,4,"[40000,4]"
1468332423536000,206,41065,40888,40976,4,"[40000,4]"
1468332423566000,206,66401,62453,64549,6,"[60000,6]"

2 answers

User

#1 · edited 2024-06-12 06:46:54

You can use the chunksize parameter of read_csv:

import pandas as pd
import io

temp=u'''time,line_id,high,low,avg,total,split_counts
1468332421098000,206,50879,50879,50879,2,"[50000,2]"
1468332421195000,206,39556,39556,39556,2,"[30000,2]"
1468332421383000,206,61636,61636,61636,2,"[60000,2]"
1468332423568000,206,47315,38931,43123,4,"[30000,2][40000,2]"
1468332423489000,206,38514,38445,38475,6,"[30000,6]"
1468332421672000,206,60079,60079,60079,2,"[60000,2]"
1468332421818000,206,44664,44664,44664,2,"[40000,2]"
1468332422164000,206,48500,48500,48500,2,"[40000,2]"
1468332423490000,206,39469,37894,38206,12,"[30000,12]"
1468332422538000,206,44023,44023,44023,2,"[40000,2]"
1468332423491000,206,38813,38813,38813,2,"[30000,2]"
1468332423528000,206,75970,75970,75970,2,"[70000,2]"
1468332423533000,206,42546,42470,42508,4,"[40000,4]"
1468332423536000,206,41065,40888,40976,4,"[40000,4]"
1468332423566000,206,66401,62453,64549,6,"[60000,6]"'''
# After testing, replace io.StringIO(temp) with the filename.

# chunksize=2 is used here only for testing; use chunksize=10 to get 10 records at a time.
reader = pd.read_csv(io.StringIO(temp), chunksize=2)
print(reader)
# <pandas.io.parsers.TextFileReader object at 0x000000000AD1CD68>

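The reader returned by read_csv with chunksize set is an iterator of DataFrames. A minimal sketch of looping over it 10 records at a time is shown here; the `sample` string and the `chunk_sizes` list are made up for illustration and are not part of the original answer.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: a header line plus 25 data rows.
# With a real file, pass the filename instead of io.StringIO(sample).
sample = "time,line_id,high\n" + "\n".join(
    f"{1000 + i},206,{50000 + i}" for i in range(25)
)

chunk_sizes = []
# chunksize=10 makes read_csv yield DataFrames of up to 10 rows each.
for chunk in pd.read_csv(io.StringIO(sample), chunksize=10):
    chunk_sizes.append(len(chunk))
    print(chunk.shape)

print(chunk_sizes)  # [10, 10, 5]
```

Each chunk keeps the column names from the header line, and the last chunk simply holds whatever rows remain.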

See the pandas documentation.

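User

#2 · edited 2024-06-12 06:46:54

The first iteration should work fine, but any further iteration is problematic.

read_csv has a header kwarg whose default value is 'infer' (which is basically 0). This means the first line of the parsed CSV is used as the column names of the DataFrame.

read_csv also has another kwarg, names.

As stated in the documentation: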

header : int or list of ints, default ‘infer’ Row number(s) to use as the column names, and the start of the data. Default behavior is as if set to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
names : array-like, default None List of column names to use. If file contains no header row, then you should explicitly pass header=None

You should pass header=None together with the names kwarg (the list of column names) to read_csv.
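As a rough sketch of this advice, assume the file is read page by page with skiprows and nrows (an assumption about the asker's approach, which is not shown here). Passing header=None and names keeps the first row of every later page from being consumed as column names. The `sample` data, the `columns` list, and the `read_page` helper are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: a header line plus 25 data rows.
sample = "time,line_id,high\n" + "\n".join(
    f"{1000 + i},206,{50000 + i}" for i in range(25)
)
columns = ["time", "line_id", "high"]


def read_page(page, page_size=10):
    """Read one page of rows, supplying the column names explicitly."""
    return pd.read_csv(
        io.StringIO(sample),
        header=None,                    # do not treat any parsed row as a header
        names=columns,                  # use these column names instead
        skiprows=1 + page * page_size,  # skip the header line and earlier pages
        nrows=page_size,                # read at most page_size rows
    )


pages = [read_page(p) for p in range(3)]
print([len(p) for p in pages])  # [10, 10, 5]
```

Unlike the chunksize iterator, this reopens and rescans the input for every page, so it is mainly useful when pages are needed out of order.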
