How to convert a CSV into a dictionary in Apache Beam

# Standard imports
import apache_beam as beam

# Create a pipeline executing on a direct runner (local, non-cloud).
p = beam.Pipeline('DirectPipelineRunner')

# Create a PCollection with names and write it to a file.
(p
 | 'read solar data' >> beam.Read(beam.io.TextFileSource('./sensor1_121116.csv'))
 # How do you do this??
 | 'convert to dictionary' >> beam.Map(lambda (k, v): {'luminosity': k, 'datetime': v})
 | 'save' >> beam.Write(
     beam.io.BigQuerySink(
         output_table,
         schema='month:INTEGER, tornado_count:INTEGER',
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)))

p.run()

2 answers

User

Answer 1 · edited 2024-05-19 02:11:32

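As a complement to Pablo's post, I'd like to share a small change I made to his sample. (+1 to you!)

Change: reader = csv.reader(self._file) to reader = csv.DictReader(self._file)

csv.DictReader uses the first row of the CSV file as the dict keys. Every following row becomes a dict whose values are matched to those keys by column order.

One small detail is that every value in the dict is stored as a string. This can conflict with your BigQuery schema if some fields use, for example, INTEGER, so you need to take care of the proper casting afterwards.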
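The casting step mentioned above can be sketched in plain Python as follows. The column names `luminosity` and `datetime` come from the question; the sample rows and the `cast_row` helper are made up for illustration, and the sketch assumes `luminosity` is meant to be an INTEGER column.

```python
import csv
import io

# Hypothetical sample data; the first row supplies the dict keys.
SAMPLE_CSV = (
    "luminosity,datetime\n"
    "120,2016-11-12 08:00:00\n"
    "340,2016-11-12 09:00:00\n"
)


def cast_row(row):
    """Return a copy of the row with `luminosity` converted from str to int."""
    casted = dict(row)
    casted["luminosity"] = int(casted["luminosity"])
    return casted


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
raw_rows = [dict(r) for r in reader]          # every value is still a string
typed_rows = [cast_row(r) for r in raw_rows]  # luminosity is now an int

print(raw_rows[0])
print(typed_rows[0])
```

In a pipeline, a function like `cast_row` could be applied with `beam.Map` before the rows are written to BigQuery.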

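User

Answer 2 · edited 2024-05-19 02:11:32

Edit: as of version 2.12.0, Beam ships with the new fileio transforms, which let you read from CSV without having to reimplement a source. You can do it like this: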

import csv
import io
import sys

import apache_beam as beam
from apache_beam.io import fileio

def get_csv_reader(readable_file):
  # You can return whichever kind of reader you want here
  # a DictReader, or a normal csv.reader.
  if sys.version_info >= (3, 0):
    # On Python 3 the file is opened in binary mode, so wrap it as text.
    return csv.reader(io.TextIOWrapper(readable_file.open()))
  else:
    return csv.reader(readable_file.open())

with beam.Pipeline(...) as p:
  content_pc = (p
                | fileio.MatchFiles("/my/file/name")
                | fileio.ReadMatches()
                | beam.Reshuffle()  # Useful if you expect many matches
                | beam.FlatMap(get_csv_reader))
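As the comment in `get_csv_reader` notes, the reader can just as well be a `csv.DictReader`, which yields one dictionary per row. A minimal sketch of that variant follows. To keep it runnable without Beam, it is exercised against `FakeReadableFile`, a hypothetical stand-in that only mimics an `open()` method returning a binary stream; the function name `get_csv_dict_reader` and the sample row are likewise made up for illustration.

```python
import csv
import io


def get_csv_dict_reader(readable_file):
    """Yield one dict per CSV row, keyed by the header row."""
    # open() is assumed to return a binary stream, so decode it to text first.
    text_stream = io.TextIOWrapper(
        readable_file.open(), encoding="utf-8", newline="")
    for row in csv.DictReader(text_stream):
        yield dict(row)


class FakeReadableFile:
    """Hypothetical stand-in for the file object (illustration only)."""

    def __init__(self, data):
        self._data = data

    def open(self):
        return io.BytesIO(self._data)


sample = FakeReadableFile(b"luminosity,datetime\n120,2016-11-12 08:00:00\n")
rows = list(get_csv_dict_reader(sample))
print(rows)
```

In the pipeline, such a function would take the place of `get_csv_reader` in the `beam.FlatMap` step, so that each element of the resulting PCollection is a dictionary.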

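I recently wrote a test for this for Apache Beam. You can take a look at the GitHub repository.

The old answer relied on reimplementing a source. This is no longer the main recommended way of doing this :)

The idea is to have a source that returns parsed CSV rows. You can do this by subclassing the FileBasedSource class to include CSV parsing. In particular, the read_records function would look something like this: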

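import csv
import apache_beam.io.filebasedsource

class MyCsvFileSource(apache_beam.io.filebasedsource.FileBasedSource):
  def read_records(self, file_name, range_tracker):
    # Open the file through the helper provided by the base class.
    self._file = self.open_file(file_name)

    # On Python 3, csv.reader expects text; if open_file returns a binary
    # stream, wrap it in io.TextIOWrapper first.
    reader = csv.reader(self._file)

    for rec in reader:
      yield rec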
