How do I correctly remove tabs and split lines in Python?

Posted 2024-04-18 11:20:14

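I'm using the code below to try out the FP-growth algorithm, but I get "" (an empty string) as an item in my baskets, and I want to remove it. What is the correct way to do this?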

from pyspark.mllib.fpm import FPGrowth
from pyspark import SparkConf
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))
data = sc.textFile("C:\\Users\\marka\\Downloads\\Assig2.txt")
# Note: this result is never assigned, so this line has no effect.
data.map(lambda line: line.strip().split())
transactions = data.map(lambda line: line.strip().split('\t'))
# Attempted filter (commented out): map() turns each basket into a single boolean,
# and `is not` compares identity, not equality, so this cannot drop '' items.
#notempty = transactions.map(lambda x: x is not '')
# FPGrowth requires the items in each transaction to be unique.
unique = transactions.map(lambda x: list(set(x))).cache()
model = FPGrowth.train(unique, minSupport=0.7, numPartitions=10)
result = model.freqItemsets().collect()
for fi in result:
    print(fi)

Output:

FreqItemset(items=[''], freq=100)
FreqItemset(items=['Soap'], freq=99)
FreqItemset(items=['Soap', ''], freq=99)
FreqItemset(items=['Water'], freq=99)
FreqItemset(items=['Water', 'Soap'], freq=99)
FreqItemset(items=['Water', 'Soap', ''], freq=99)
FreqItemset(items=['Water', ''], freq=99)
FreqItemset(items=['Beer'], freq=88)
FreqItemset(items=['Beer', 'Water'], freq=88)
FreqItemset(items=['Beer', 'Water', 'Soap'], freq=88)
FreqItemset(items=['Beer', 'Water', 'Soap', ''], freq=88)
FreqItemset(items=['Beer', 'Water', ''], freq=88)
FreqItemset(items=['Beer', 'Soap'], freq=88)
FreqItemset(items=['Beer', 'Soap', ''], freq=88)
FreqItemset(items=['Beer', ''], freq=88)
FreqItemset(items=['Rock_Salt'], freq=80)
FreqItemset(items=['Rock_Salt', 'Water'], freq=79)
FreqItemset(items=['Rock_Salt', 'Water', 'Soap'], freq=79)
FreqItemset(items=['Rock_Salt', 'Water', 'Soap', ''], freq=79)
FreqItemset(items=['Rock_Salt', 'Water', ''], freq=79)
FreqItemset(items=['Rock_Salt', 'Soap'], freq=79)
FreqItemset(items=['Rock_Salt', 'Soap', ''], freq=79)
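Why does almost every itemset appear twice, once with '' and once without, at the same frequency? '' is the most frequent item above (freq=100) and appears to occur in every basket, so adding it to any frequent itemset never lowers the count. The sketch below reproduces that on a few hypothetical baskets (not the real Assig2.txt data) using a brute-force itemset count, which is a swapped-in illustration and not FP-growth itself; the helper name `frequent_itemsets` is also hypothetical.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Brute-force frequent-itemset count (NOT FP-growth), for tiny inputs only.

    Returns {frozenset(itemset): count} for every itemset whose support
    (count / number of baskets) is at least min_support.
    """
    n = len(baskets)
    sets = [set(b) for b in baskets]
    items = sorted(set().union(*sets))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            candidate = set(combo)
            count = sum(1 for s in sets if candidate <= s)
            if count / n >= min_support:
                result[frozenset(combo)] = count
                found_any = True
        if not found_any:
            # No frequent itemset of this size means none of a larger size either.
            break
    return result


# Hypothetical baskets as produced by split('\t') on lines with doubled tabs.
raw_baskets = [
    ["Soap", "Water", ""],
    ["Soap", "Water", "Beer", ""],
    ["Soap", "Beer", ""],
    ["Water", ""],
]
with_empty = frequent_itemsets(raw_baskets, min_support=0.7)
cleaned = frequent_itemsets([[i for i in b if i] for b in raw_baskets], min_support=0.7)

for itemset, count in sorted(with_empty.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

With '' in every basket, each frequent itemset such as {'Soap'} gets a twin {'Soap', ''} with the same count; once the empty strings are dropped, only the real items remain.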

1 answer
User
#1 · Posted 2024-04-18 11:20:14
>>> s = 'Rock_Salt\tFlashlight\t\tWater\t\t'
>>> s.split('\t')
['Rock_Salt', 'Flashlight', '', 'Water', '', '']
>>> import re
>>> re.split(r'[\t]+', s)
['Rock_Salt', 'Flashlight', 'Water', '']

# potential solutions?
>>> [a for a in s.split('\t') if a]
['Rock_Salt', 'Flashlight', 'Water']
>>> list(filter(None, s.split('\t')))
['Rock_Salt', 'Flashlight', 'Water']
>>> s.split()  # no argument: splits on any run of whitespace (spaces too) and drops empty strings
['Rock_Salt', 'Flashlight', 'Water']
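Applied to the question's pipeline, the filtering can live in one parsing function passed to `map`. This is a minimal sketch, assuming one transaction per line with tab-separated items; `parse_basket` and `train_fpgrowth` are hypothetical helper names. Note that `line.strip()` in the original code only trims the ends of the line, so it cannot remove the '' produced by two adjacent tabs in the middle of a line.

```python
def parse_basket(line):
    """Split one tab-separated transaction line into unique, non-empty items.

    Each token is stripped (this also removes a stray carriage return left by
    Windows line endings), empty tokens from doubled or trailing tabs are
    dropped, and duplicates are removed while keeping first-seen order.
    """
    tokens = (token.strip() for token in line.split('\t'))
    return [item for item in dict.fromkeys(tokens) if item]


def train_fpgrowth(sc, path, min_support=0.7, num_partitions=10):
    """Hypothetical helper: the question's pipeline with parse_basket swapped in.

    pyspark is imported inside the function so parse_basket can be used
    (and tested) without Spark installed.
    """
    from pyspark.mllib.fpm import FPGrowth

    # parse_basket already de-duplicates, so the list(set(x)) step is not needed.
    transactions = sc.textFile(path).map(parse_basket).cache()
    return FPGrowth.train(transactions, minSupport=min_support,
                          numPartitions=num_partitions)
```

Usage would mirror the question: `model = train_fpgrowth(sc, "C:\\Users\\marka\\Downloads\\Assig2.txt")`, then `model.freqItemsets().collect()`. `dict.fromkeys` is used instead of `set` so the item order stays stable, which makes the output easier to compare between runs.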