How do I read a simple text file of strings in PySpark?

Posted 2024-05-14 00:49:00


I have a list of strings saved in a text file with no header, and I want to open it in a PySpark notebook on Databricks and print every line.

abcdef 
vcdfgrs 
vcvdfrs 
vfdedsew 
kgsldkflfdlfd

text = sc.textFile("path.../filename.txt")
print(text.collect()) 

This code does not print the lines. I would appreciate your help.


Tags: string, text, header, list, notebook, pyspark, sc, text file
1 Answer
User
#1 · Posted 2024-05-14 00:49:00

Here you go:

# Define a function that takes a line and prints it
def f(line):
    print(line)

# Build stand-in file contents from a list
my_list = [['my text line-1'],['line-2 text2 my2'],['some junk line-3']]

# Create an RDD from the list (in your case it comes from sc.textFile)
txt_file = sc.parallelize(my_list)

# Use foreach to call the function on every element so each one is printed
txt_file.foreach(f)
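
One caveat worth adding: on a cluster, foreach runs on the executors, so the print output usually lands in the executor logs rather than in the notebook cell. A minimal sketch of printing on the driver instead, assuming the file is small enough to bring back to the driver. print_lines is a hypothetical helper name, not part of PySpark, and sc is the SparkContext that Databricks notebooks predefine.

```python
def print_lines(rdd, limit=None):
    """Bring the RDD's elements back to the driver and print one per line.

    print_lines is a hypothetical helper for illustration only.
    """
    # take(n) fetches at most n elements; collect() fetches everything,
    # so collect() is only safe when the data fits in driver memory.
    lines = rdd.take(limit) if limit is not None else rdd.collect()
    for line in lines:
        print(line)
    return lines


# Usage in a notebook (the path stays elided, as in the question):
# print_lines(sc.textFile("path.../filename.txt"))
```

Looping over the collected list prints one line per row, whereas print(text.collect()) prints the whole list on a single line.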

# If you want each word of every line instead, use flatMap
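
The original screenshot for the flatMap step is missing, so here is a rough sketch of what that step could look like. It assumes each RDD element is a plain string, which is what sc.textFile produces. split_words and words_of are hypothetical names chosen for this example.

```python
def split_words(line):
    """Split one line into its whitespace-separated words."""
    return line.split()


def words_of(rdd):
    """Return an RDD with one element per word (hypothetical helper)."""
    # flatMap applies split_words to every line and flattens the
    # resulting per-line lists into a single RDD of words.
    return rdd.flatMap(split_words)


# Usage in a notebook, with sc predefined by Databricks:
# lines = sc.parallelize(["my text line-1", "line-2 text2 my2"])
# for word in words_of(lines).collect():
#     print(word)
```

With map instead of flatMap you would get one list of words per line rather than a flat sequence of words.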

