How to generate a flat file(?) from a hierarchical CSV

Posted 2024-04-18 01:08:39


I need to do some data processing on a CSV file whose structure is shown in the following screenshot:

[Image: screenshot of the CSV structure]

I need to collapse the TEXT column data from every row whose FIELD entry is empty, so that the result looks like this:

FIELD              TEXT

P0190001, RACE OF HOUSEHOLDER BY HOUSEHOLD TYPE(8) Universe:Households White Family Households: Married-couple family: With related children

P0190002, RACE OF HOUSEHOLDER BY HOUSEHOLD TYPE(8) Universe:Households White Family Households: Married-couple family: No related children

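...and so on. (The number of blank FIELD entries before the first valid entry is not always two; it may be more or fewer.)

For a large CSV file (60,000 unique FIELD values), is there a simple and efficient way to do this? I am looking for something I can run on the command line rather than writing a program.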


Tags: of, csv, by, type, family, white, related, race
1 answer
User
#1 · Posted 2024-04-18 01:08:39

This is not a command-line solution, but it was a fun script to write.

import csv

# Python 3: open CSV files in text mode with newline='' (not 'rb'/'wb').
in_file = open('data.csv', newline='')
csv_reader = csv.reader(in_file)

# Read first two rows of field text out as a prefix.
prefix = ' '.join(next(csv_reader)[2].strip() for _ in range(2))

def collapsed_row_iter():
    depth_value_list = []
    # Each row is expected to have four columns; only the 2nd and 3rd are used.
    for (_, field_id, field_text, _) in csv_reader:
        # Count number of leading <SPACE> chars to determine depth.
        pre_strip_text_len = len(field_text)
        field_text = field_text.lstrip()
        depth = pre_strip_text_len - len(field_text)

        depth_value_list_len = len(depth_value_list)
        if depth == depth_value_list_len + 1:
            # Append a new depth value.
            depth_value_list.append(field_text.rstrip())
        elif depth <= depth_value_list_len:
            # Truncate list to depth, append new value.
            # (This must be elif: with a plain if, the branch above would
            # always fall through to the else and raise.)
            del depth_value_list[depth:]
            depth_value_list.append(field_text.rstrip())
        else:
            # Depth value is greater than current depth + 1.
            raise ValueError('unexpected depth %d' % depth)

        # Only yield the row if field_id value is non-empty.
        if field_id:
            yield (field_id, '%s %s' % (prefix, ' '.join(depth_value_list)))

# Get CSV writer object, write the header.
out_file = open('collapsed.csv', 'w', newline='')
csv_writer = csv.writer(out_file)
csv_writer.writerow(['FIELD', 'TEXT'])

# Iterate over collapsed rows, writing each to the output CSV.
for (field_id, collapsed_text) in collapsed_row_iter():
    csv_writer.writerow([field_id, collapsed_text])

in_file.close()
out_file.close()

Output:

[output listing not preserved in the source]
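To see the collapsing idea on a small input, here is a minimal, self-contained sketch. It is a variant rather than the script above: instead of treating the number of leading spaces as a list index, it keeps a stack of (indent, label) pairs, which may be more forgiving when indentation widths vary. The function name `collapse_rows`, the two-column layout, and the sample rows are all assumptions made for illustration; the real file's columns may differ.

```python
import csv
import io

def collapse_rows(rows, prefix=''):
    """Yield (field_id, collapsed_text) for each row that has a field id.

    rows: iterable of (field_id, text) pairs; text keeps its leading spaces.
    Hypothetical helper, not part of the original answer.
    """
    stack = []  # (indent, label) pairs forming the current ancestor chain
    for field_id, text in rows:
        label = text.strip()
        if not label:
            continue  # skip rows with no text at all
        indent = len(text) - len(text.lstrip(' '))
        # Drop every entry at the same or a deeper indentation level.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, label))
        if field_id:
            parts = ([prefix] if prefix else []) + [lbl for _, lbl in stack]
            yield field_id, ' '.join(parts)

# Hypothetical sample data, loosely modeled on the question's example.
SAMPLE = '''FIELD,TEXT
,White Family Households:
,  Married-couple family:
P0190001,    With related children
P0190002,    No related children
,  Other family:
P0190003,    Male householder
'''

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
result = list(collapse_rows((row[0], row[1]) for row in reader))
for field_id, text in result:
    print(field_id, text, sep=', ')
```

Because the function only needs an iterable of pairs, the same logic could be fed from `csv.reader(sys.stdin)` if a pipeline-style command is preferred.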
