How to create an array in Python that searches a string for specific tags and puts the matches in the return value

def get(x):
    up, up1, up2, up3, up4 = "", "", "", "", ""
    x = x.split(", ")
    for i in x:
        if "Up_" in i:
            # print(i)
            up = str(i) + ', '
        if "Up1_" in i:
            # print(i)
            up1 = str(i) + ', '
        if "Up2_" in i:
            # print(i)
            up2 = str(i) + ', '
        if "Up3_" in i:
            # print(i)
            up3 = str(i) + ', '
        if "Up4_" in i:
            # print(i)
            up4 = str(i) + ', '
    return (str(up) + str(up1) + str(up2) + str(up3) + str(up4))[:-2]
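Since the title asks for an array (in Python, a list), here is a minimal sketch that keeps the five prefixes in a list instead of five separate variables. It assumes those five prefixes are the only ones of interest; the name `PREFIXES` is hypothetical. Like the original, it keeps the last tag containing each prefix and returns the results in prefix order, joined with ", ".

```python
# Hypothetical list of prefixes to look for (replaces up, up1, ..., up4)
PREFIXES = ["Up_", "Up1_", "Up2_", "Up3_", "Up4_"]


def get(x: str) -> str:
    tags = x.split(", ")
    found = []
    for prefix in PREFIXES:
        # Like the original, keep only the last tag containing this prefix
        matches = [tag for tag in tags if prefix in tag]
        if matches:
            found.append(matches[-1])
    # ", ".join adds no trailing separator, so no [:-2] slice is needed
    return ", ".join(found)


print(get("Up1_, AS3_, Up2_, Up_, AS_"))  # Up_, Up1_, Up2_
```

Adding another prefix then only means adding one string to the list.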

+------------+-------+------------+-----------+--------------+
| product_id | sku   | total_sold | tags      | total_images |
+------------+-------+------------+-----------+--------------+
| geggre     | rgerg | 456        | Up1_, Up2 | 5            |
+------------+-------+------------+-----------+--------------+

+------------+-------+------------+-----------+--------------+-------+
| product_id | sku   | total_sold | tags      | total_images | Count |
+------------+-------+------------+-----------+--------------+-------+
| ggeggre    | rgerg | 456        | Up1_, Up2 | 5            | 2     |
+------------+-------+------------+-----------+--------------+-------+

# importing the pandas module with the alias pd
import pandas as pd

# get() takes a comma-separated tag string x and keeps only the Up_, Up1_ ... Up4_ tags
def get(x):
    up, up1, up2, up3, up4 = "", "", "", "", ""
    x = x.split(", ")
    for i in x:
        if "Up_" in i:
            # print(i)
            up = str(i) + ', '
        if "Up1_" in i:
            # print(i)
            up1 = str(i) + ', '
        if "Up2_" in i:
            # print(i)
            up2 = str(i) + ', '
        if "Up3_" in i:
            # print(i)
            up3 = str(i) + ', '
        if "Up4_" in i:
            # print(i)
            up4 = str(i) + ', '
    # joins the matched tags (each followed by ', ') and drops the last 2 characters, i.e. the trailing ', '
    return (str(up) + str(up1) + str(up2) + str(up3) + str(up4))[:-2]

# data holds the contents of data200.csv, read with pandas' read_csv function
data = pd.read_csv('data200.csv')
# replaces the tags column with only the Up_ tags it contains, using get()
data['tags'] = data['tags'].apply(get)
# data = data[ (data['tags'] == "") == False]
# creates a new column called total_tags with the number of comma-separated elements
data["total_tags"] = data["tags"].apply(lambda x: len(x.split(',')))
# prints the first 5 rows
print(data.head())
# exports everything to test.csv without the index column
data.to_csv("test.csv", index=False)
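A self-contained sketch of the same pipeline that can be run without data200.csv, assuming a small in-memory DataFrame with hypothetical rows and a `product_id` column only for illustration. It also guards the count: `"".split(",")` returns `['']`, so the original lambda reports 1 for a row whose tags end up empty.

```python
import pandas as pd

# Prefixes to keep, in output order (the same five as in the question)
PREFIXES = ["Up_", "Up1_", "Up2_", "Up3_", "Up4_"]


def get(x: str) -> str:
    tags = x.split(", ")
    found = []
    for prefix in PREFIXES:
        matches = [tag for tag in tags if prefix in tag]
        if matches:
            found.append(matches[-1])  # last match wins, as in the original
    return ", ".join(found)


# Hypothetical rows standing in for data200.csv
data = pd.DataFrame({
    "product_id": ["p1", "p2"],
    "tags": ["Up1_, AS3_, Up2_", "AS_, AS3_"],
})

data["tags"] = data["tags"].apply(get)
# "".split(",") is [''], so report 0 instead of 1 when no tag survived
data["total_tags"] = data["tags"].apply(lambda s: len(s.split(",")) if s else 0)
print(data)
```

The first row keeps "Up1_, Up2_" with a count of 2; the second row has no Up tags and gets 0.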

1 answer

User

Reply #1 · Posted on 2024-05-28 20:17:20

You can use regular expressions:

import re

def get(x):
    x = x.split(", ")
    out_str = ''
    for tag in x:
        if re.search(r"^Up\d*_", tag):
            # the tag starts with "Up", any number of digits, then "_"
            t = re.match(r"^Up\d*_", tag)
            t = t.group(0)
            out_str += t + ','
    return out_str[:-1]  # drop the trailing comma
print(get("Up1_, AS3_, Up2_, Up_, AS_"))

Output:

Up1_,Up2_,Up_

Is this what you are looking for? If you only want a single digit 0-9 in the tag, you can change the * in the regex to ?:

if re.search(r"^Up\d?_", tag):
    t = re.match(r"^Up\d?_", tag)
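A quick check of the difference between the two quantifiers, using a hypothetical two-digit tag `Up12_`: `\d*` accepts any number of digits, while `\d?` allows at most one, so it rejects `Up12_` because `_` must follow immediately.

```python
import re

STAR = re.compile(r"^Up\d*_")  # "Up", any number of digits, then "_"
OPT = re.compile(r"^Up\d?_")   # "Up", at most one digit, then "_"

for tag in ["Up_", "Up1_", "Up12_"]:
    print(tag, bool(STAR.match(tag)), bool(OPT.match(tag)))
```

Both patterns accept `Up_` and `Up1_`; only `\d*` accepts `Up12_`.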

Edit:

After your edit I understand better what you mean. You can simply do:

data['tags'] = data['tags'].apply(lambda x: ",".join(re.findall(r"Up\d*_", x)))

Or:

data['tags'] = data['tags'].apply(lambda x: ",".join(re.findall(r"Up\d?_", x)))

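It depends on whether you want at most one digit between Up and _, or allow any number of digits. Note that the ^ is removed in the findall() call, because we are no longer searching only at the beginning of the string but for every occurrence anywhere in it.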
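A small demonstration of why the ^ has to go, on the same sample string as above: with ^ (and no `re.MULTILINE`), the pattern can only match at the very start of the string, so `findall()` returns just the first tag.

```python
import re

s = "Up1_, AS3_, Up2_, Up_, AS_"

# With ^ the pattern can only match at position 0 of the string
print(re.findall(r"^Up\d*_", s))           # ['Up1_']
# Without ^ findall returns every non-overlapping match in the string
print(re.findall(r"Up\d*_", s))            # ['Up1_', 'Up2_', 'Up_']
print(",".join(re.findall(r"Up\d*_", s)))  # Up1_,Up2_,Up_
```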

Edit 2:

OK, summarizing the comments and the additional information from them, you probably want something like this:

data['tags'] = data['tags'].apply(lambda x: ",".join(re.findall(r"[Uu]p\d?_\S*?(?=,|$)", x)))
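The same idea can also be written with pandas' vectorized string methods `Series.str.findall` and `Series.str.join` instead of `apply` with a lambda. This is a sketch on hypothetical tag strings. The pattern follows Edit 2, but uses a lazy `\S*?` with `(?=,|$)` so that a tag at the very end of the string, with no comma after it, is also kept; that is an assumption about the intended behaviour.

```python
import pandas as pd

# Hypothetical tag strings; the last "Up" tag in each row has no trailing comma
tags = pd.Series(["Up1_foo, AS3_bar, up_baz", "AS_x, Up2_y", "AS_only"])

# [Uu]p    -> "Up" or "up"
# \d?_     -> at most one digit, then "_"
# \S*?     -> the rest of the tag, as few characters as possible
# (?=,|$)  -> stop right before a comma or at the end of the string
pattern = r"[Uu]p\d?_\S*?(?=,|$)"

# findall gives a list of matches per row; str.join glues each list with ","
result = tags.str.findall(pattern).str.join(",")
print(result.tolist())  # ['Up1_foo,up_baz', 'Up2_y', '']
```

Rows without any matching tag end up as an empty string.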
