I currently have the following Python code:
    def get(x):
        up, up1, up2, up3, up4 = "", "", "", "", ""
        x = x.split(", ")
        for i in x:
            if "Up_" in i:
                # print(i)
                up = str(i) + ', '
            if "Up1_" in i:
                # print(i)
                up1 = str(i) + ', '
            if "Up2_" in i:
                # print(i)
                up2 = str(i) + ', '
            if "Up3_" in i:
                # print(i)
                up3 = str(i) + ', '
            if "Up4_" in i:
                # print(i)
                up4 = str(i) + ', '
        return (str(up) + str(up1) + str(up2) + str(up3) + str(up4))[:-2]
Although this function works well for what I have at the moment, it will stop working as soon as tags numbered 5 to 10 are added.
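What I'm trying to do is create a function that searches the "tags" column for any tag containing "Up_" or "Up*_" (in SQL terms, something that returns anything with a value between Up and _; I'm not sure whether Python has something for this), places whatever it finds in another array that contains only the Up_ and Up*_ tags, and then applies that to another column.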
+------------+-------+------------+-----------+--------------+
| product_id | sku   | total_sold | tags      | total_images |
+------------+-------+------------+-----------+--------------+
| geggre     | rgerg | 456        | Up1_, Up2 | 5            |
+------------+-------+------------+-----------+--------------+
I would like it to look like this:
+------------+-------+------------+-----------+--------------+-------+
| product_id | sku   | total_sold | tags      | total_images | Count |
+------------+-------+------------+-----------+--------------+-------+
| ggeggre    | rgerg | 456        | Up1_, Up2 | 5            | 2     |
+------------+-------+------------+-----------+--------------+-------+
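Thanks to another user, I already have the tag count:
    data["total_tags"] = data["tags"].apply(lambda x: len(x.split(',')))
I just need to know how to build the array described above so that the if statements are simplified and up to 10 tags are covered.
Also, this is the Python that uses get to overwrite the "tags" column so that it includes only the Up tags:
    data['tags'] = data['tags'].apply(get)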
The full script, for context:
    # import the pandas module with the alias pd
    import pandas as pd

    # get takes a comma-separated tag string x and keeps only the
    # Up_, Up1_, Up2_, Up3_ and Up4_ tags
    def get(x):
        up, up1, up2, up3, up4 = "", "", "", "", ""
        x = x.split(", ")
        for i in x:
            if "Up_" in i:
                # print(i)
                up = str(i) + ', '
            if "Up1_" in i:
                # print(i)
                up1 = str(i) + ', '
            if "Up2_" in i:
                # print(i)
                up2 = str(i) + ', '
            if "Up3_" in i:
                # print(i)
                up3 = str(i) + ', '
            if "Up4_" in i:
                # print(i)
                up4 = str(i) + ', '
        # joins the matched tags into one string and drops the trailing ", " (last 2 characters)
        return (str(up) + str(up1) + str(up2) + str(up3) + str(up4))[:-2]

    # data holds the contents of data200.csv, loaded with pandas' read_csv function
    data = pd.read_csv('data200.csv')
    # replaces the tags column with only the Up tags found by the get function
    data['tags'] = data['tags'].apply(get)
    # drops the rows that have no Up tags left
    data = data[data['tags'] != ""]
    # creates a new column called total_tags holding the number of comma-separated elements
    data["total_tags"] = data["tags"].apply(lambda x: len(x.split(',')))
    # prints the first 5 rows of the data
    print(data.head())
    # exports everything to test.csv without the index column
    data.to_csv("test.csv", index=False)
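You can use a regular expression. Is this what you're looking for? If you only need a single digit 0-9 in the tags, you can change the `*` in the regex to `?`.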
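The answer's original snippet is not shown here, so the following is only a minimal sketch of the regular-expression idea. It assumes each tag starts with `Up`, optional digits, and `_`; the pattern `^Up\d*_` and the sample tag names are illustrative assumptions, not taken from the original answer.

```python
import re

# Assumed pattern: "Up", any number of digits, then "_" at the start of a tag.
# Changing \d* to \d? would accept at most one digit (Up_, Up1_ ... Up9_).
UP_TAG = re.compile(r"^Up\d*_")

def get(tags):
    """Keep only the tags that start with Up_, Up1_, Up2_, ... and rejoin them."""
    kept = [tag for tag in tags.split(", ") if UP_TAG.match(tag)]
    return ", ".join(kept)

# Hypothetical tag names, for illustration only.
print(get("Up1_red, Down_x, Up10_blue, Up_green"))
# prints: Up1_red, Up10_blue, Up_green
```

Unlike the function in the question, this keeps the tags in their original order and only accepts the pattern at the start of a tag rather than anywhere inside it.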
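Edit:
After your edit I understand better what you mean. You can simply call findall() with a pattern that allows either at most one digit between `Up` and `_` or any number of digits, depending on which you need. Note that for the findall() method the `^` is removed from the pattern, because we are no longer searching only from the start of the string but for all occurrences throughout the whole string.
Edit 2: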
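OK, to sum up the comments and the additional information gained from them: you probably want to pull every Up tag, whatever its number, out of each row's tag string and write only those tags back to the column.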
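As a rough sketch of the findall() approach described above (not the answer's original code): the two patterns, the helper name `find_up_tags`, and the sample tags are assumptions made for illustration. Each pattern is assumed to capture `Up`, optional digits, `_`, and then everything up to the next comma.

```python
import re

# Assumed tag shape: "Up", digits, "_", then everything up to the next comma.
# No ^ anchor, so findall() picks up every occurrence in the whole string.
ANY_DIGITS = re.compile(r"Up\d*_[^,]*")  # Up_, Up1_, Up10_, ...
ONE_DIGIT = re.compile(r"Up\d?_[^,]*")   # Up_, Up1_ ... Up9_ only

def find_up_tags(tags, pattern=ANY_DIGITS):
    """Return every Up tag found in a comma-separated tag string."""
    return [tag.strip() for tag in pattern.findall(tags)]

def get(tags):
    """Drop-in replacement for the question's get: the Up tags joined by ', '."""
    return ", ".join(find_up_tags(tags))

# Hypothetical tag names, for illustration only.
row = "Up1_red, Down_x, Up10_blue, Up_green"
print(find_up_tags(row))             # ['Up1_red', 'Up10_blue', 'Up_green']
print(find_up_tags(row, ONE_DIGIT))  # ['Up1_red', 'Up_green']
print(get(row))                      # Up1_red, Up10_blue, Up_green
```

With this version the line `data['tags'] = data['tags'].apply(get)` from the question can stay as it is. Because the pattern is unanchored, it would also match `Up` inside a longer word such as `SetUp_x`, so the anchored per-tag check may be the safer choice if such tags exist. The count could be taken as `len(find_up_tags(x))`, which gives 0 for a row without Up tags, whereas `len("".split(','))` gives 1.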