How can I periodically skip lines when reading a txt file with pandas?

User

Answer 1 · edited 2024-04-25 01:17:03

Use itertools.islice, where N below means "read every N-th line":

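from itertools import islice

import pandas as pd

N = 3      # keep every N-th line
sep = ','  # field separator

# file_path is assumed to hold the path to the text file
with open(file_path, 'r') as f:
    # islice(f, None, None, N) yields lines 0, N, 2N, ...
    lines_gen = islice(f, None, None, N)
    df = pd.DataFrame([x.strip().split(sep) for x in lines_gen])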
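
A possible variation on the snippet above, offered only as a sketch: splitting the lines by hand leaves every column as a string, so the lines kept by islice can instead be joined and handed to pd.read_csv through io.StringIO, which lets pandas infer the dtypes. The helper name read_every_nth and the sample text are assumptions for illustration, not part of the original answer.

```python
import io
from itertools import islice

import pandas as pd


def read_every_nth(lines, n, start=0, **read_csv_kwargs):
    """Keep every n-th line of `lines`, starting at index `start`, and parse it with pandas.

    `lines` is any iterable of text lines, for example an open file object.
    """
    kept = islice(lines, start, None, n)
    return pd.read_csv(io.StringIO("".join(kept)), **read_csv_kwargs)


# Hypothetical sample: the data rows sit on lines 0, 3 and 6.
sample = io.StringIO("1,2\nx\ny\n3,4\nx\ny\n5,6\nx\ny\n")
df = read_every_nth(sample, 3, header=None)
print(df)
```

With a real file, the same helper could be called as read_every_nth(f, N, sep=sep) inside the with block.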

User

Answer 2 · edited 2024-04-25 01:17:03

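Just count how many lines the file has, then build a list of the rows that should be skipped (call it useless_rows, say) and pass it to pandas.read_csv(..., skiprows=useless_rows).

The tricky part for me was counting the rows. There are a few ways to do it:

The Linux command "wc -l" (here is a guide on how to call it from code: Running "wc -l <filename>" within Python Code)
A generator. My relevant rows contain a key, located in the last column. That is not very general, but it helped me. I can use it to count the matching strings: the file appears to be about 500,000 rows, and counting takes 0.00011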
```
def rows_without_key(filename, key='2147483647'):
    """Yield only the rows that do not contain the key."""
    with open(filename) as f:
        for row in f:
            if key in row:
                continue
            yield row
```
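
As a rough sketch of the approach described in this answer: count the lines in pure Python, build the list of row indices to skip, and pass it to pandas.read_csv through skiprows. The file contents, the period of 4, and the names count_lines and useless_rows are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd


def count_lines(path):
    """Count lines by streaming the file, a portable stand-in for `wc -l`."""
    with open(path) as f:
        return sum(1 for _ in f)


# Hypothetical file: three junk lines followed by one data line, repeated three times.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("junk\njunk\njunk\n1,2\n" * 3)
    path = tmp.name

n_lines = count_lines(path)
# Skip every row that is not the 4th of its group, so rows 3, 7 and 11 are kept.
useless_rows = [i for i in range(n_lines) if i % 4 != 3]
df = pd.read_csv(path, skiprows=useless_rows, header=None)
os.remove(path)
print(df)
```

Streaming the file this way keeps memory use flat, since only the counter and the list of indices are held in memory.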

User

Answer 3 · edited 2024-04-25 01:17:03

I repeated your data three times. It sounds like you need every 4th row (not counting from 0), because that is where the data sits. The documentation for skiprows says:

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

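So what if we pass a not in to the lambda function? That is what I do below: I build a collection of the row indices I want to keep and pass not in to the skiprows parameter. In plain English: skip every row that is not a 4th row.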

import pandas as pd

# Build the indexes of every 4th row (3, 7, 11, ...). If you need more than 1 million rows, just raise the range limit.
# A set is used so that each membership test inside the lambda stays fast.
rows_to_keep = set(range(3, 1000000, 4))

# Pass the collection to the lambda function using not in.
df = pd.read_csv(r'PATH_To_CSV.csv', skiprows=lambda x: x not in rows_to_keep)
df.head()

#output
0  data
1  data
2  data
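
A possible alternative, sketched under assumptions: since the rows to keep follow a fixed period, the callable can test the row index with modulo arithmetic instead of looking it up in a precomputed collection, so no upper bound on the row count is needed. The sample text and the names PERIOD and OFFSET are hypothetical.

```python
import io

import pandas as pd

PERIOD = 4  # assumed: one data line in every block of 4 lines
OFFSET = 3  # assumed: the data line is the last line of each block

# Hypothetical sample standing in for the real file.
text = "a\nb\nc\n10,20\n" * 3

df = pd.read_csv(
    io.StringIO(text),
    skiprows=lambda i: i % PERIOD != OFFSET,  # True means "skip this row"
    header=None,  # treat the first kept row as data rather than column names
)
print(df)
```

Note that with the default header setting, the first kept row would be consumed as the column names, which is why header=None is passed in this sketch.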
