How can I iteratively partition the same string multiple times?

Posted 2024-03-28 23:45:02


I have a string of unknown length in which a pattern of interest can repeat any number of times. The string looks like this:

blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah

blahblahblahblahblahblahblahblahblahblah

JOHNNYSMITH has entered the above notes on 12/05/2017 14:18 blahblahblahblahblahblahblahblahblahblahblahblahblahblahblah

blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah

JOHNNYSMITH has entered the above notes on 12/05/2017 14:19

SARAHJOHNSON has entered the above notes on 12/05/2017 17:45 blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah

SARAHJOHNSON has entered the above notes on 12/05/2017 17:46

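I am trying to separate the notes, usernames, and dates so that I can build a nicer-looking notes box (with some CSS). Here is what I use to split out the username: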

before_keyword, keyword, after_keyword = stringg.partition("has entered the above notes on ")
namedate = before_keyword.split()[-1] + "--" + after_keyword.split()[0] + after_keyword.split()[1]
comment = before_keyword.replace(before_keyword.split()[-1], '').rstrip()
print(comment)
print(namedate)

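This works for the first occurrence of "has entered the above notes" in the string. How can I loop over the string to collect every note, username, and date in it and print each of them separately?

Thanks.

Edit: I replaced USERNAME2389 with made-up names to show what the names actually look like.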


3 answers

Bernz's solution worked, and I ended up using the code shown below. The answer from 数据摔跤手 would also work.

import re

pattern = re.compile(r'(\w+) has entered the above notes on (\d{1,2}/\d{1,2}/\d{4}) (\d{1,2}:\d{1,2})')
for line in stringg.split('\n'):
    match = pattern.search(line)
    if match:
        # print the captured username, date and time
        print(match.group(1) + " " + match.group(2) + match.group(3))
    else:
        print(line)

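You can simply iterate over the lines and accumulate the text in a placeholder. Whenever a username line is hit, append the accumulated text to a DataFrame, so that you end up with a clean, workable dataset. You can also convert the date and time directly to a datetime object, which lets you do further analysis on times, dates, and so on.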

import re
import pandas as pd
from datetime import datetime

string = """
    blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah

    blahblahblahblahblahblahblahblahblahblah

    USERNAME2398 has entered the above notes on 12/05/2017 14:18 

    blahblahblahblahblahblahblahblahblahblahblahblahblahblahblah

    blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah

    USERNAME2839 has entered the above notes on 12/05/2017 14:19

    USERNAME7348 has entered the above notes on 12/05/2017 17:45 

    blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah

    USERNAME857 has entered the above notes on 12/05/2017 17:46
    """


# define regex for username matching (unused below; kept for reference)
username = re.compile(r'USERNAME.*?\s', re.IGNORECASE)
# define regex for datetime matching
datetime_re = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/(20|19)[0-9]{2}\s[0-9]{1,2}:[0-9]{1,2}')
# create placeholder dataframe
masterdf = pd.DataFrame()
# define text placeholder
cur_text = ''
for line in string.split('\n'):
    # str.isupper() on the whole token ignores digits, so USERNAME2398 still qualifies
    if datetime_re.search(line) and line.split()[0].isupper():
        # pull out username
        cur_user = line.split()[0].strip()# username.search(line).group(0)
        # pull out datetime
        cur_datetime = datetime_re.search(line).group(0)
        # convert to datetime object
        cur_datetime = datetime.strptime(cur_datetime, '%m/%d/%Y %H:%M')
        # create row to append to dataframe
        row = pd.DataFrame({'user': cur_user,
                       'datetime': cur_datetime,
                       'text': cur_text}, index = [0])
        # append row to dataframe (DataFrame.append was removed in pandas 2.0)
        masterdf = pd.concat([masterdf, row], ignore_index=True)
        # reinit cur_text
        cur_text = ''

    else:
        # if not a username line, continue appending the commentary for the user
        cur_text += line
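
As a small illustration of why converting to datetime objects is useful, here is a sketch that computes the gap in minutes between consecutive entries. The `rows` list is hypothetical: it is written by hand to mirror the sample string above rather than taken from the DataFrame, so that the example runs with the standard library only.

```python
from datetime import datetime

# Hypothetical rows mirroring the sample string: (user, timestamp string).
rows = [
    ("USERNAME2398", "12/05/2017 14:18"),
    ("USERNAME2839", "12/05/2017 14:19"),
    ("USERNAME7348", "12/05/2017 17:45"),
    ("USERNAME857", "12/05/2017 17:46"),
]

# Parse with the same format string used in the answer above.
stamps = [datetime.strptime(ts, "%m/%d/%Y %H:%M") for _, ts in rows]

# Minutes elapsed between each entry and the next one.
gaps = [
    (later - earlier).total_seconds() / 60
    for earlier, later in zip(stamps, stamps[1:])
]
print(gaps)
```

With plain strings this kind of arithmetic would need manual parsing; with datetime objects the subtraction yields a `timedelta` directly.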

I would use a regular expression for this.

Just loop over each line (FOREACH) and test the line against this expression:

(USERNAME\S*) has entered the above notes on (\d{1,2}/\d{1,2}/\d{4}) (\d{1,2}:\d{1,2})

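If the line matches, you have your three pieces of information (the parenthesized groups): username, date, and time. Store the preceding lines in an array (a buffer) so that you also have the note text.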
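
The buffer idea described above could be sketched roughly as follows. This is only one possible reading of it, under a few assumptions: the name is matched with `(\S+)` instead of `USERNAME\S*` because the names in the question are not prefixed with USERNAME, `split_notes` is a hypothetical helper name, and at most one signature per line is handled. Text that follows a signature on the same line is treated as the start of the next note.

```python
import re

# Assumption: a signature looks like
# "<NAME> has entered the above notes on <date> <time>".
SIGNATURE = re.compile(
    r"(\S+) has entered the above notes on "
    r"(\d{1,2}/\d{1,2}/\d{4}) (\d{1,2}:\d{2})"
)

def split_notes(text):
    """Return a list of (comment, user, date, time) tuples (hypothetical helper)."""
    notes = []
    buffer = []  # non-empty lines seen since the previous signature
    for line in text.splitlines():
        match = SIGNATURE.search(line)
        if match is None:
            if line.strip():
                buffer.append(line.strip())
            continue
        # Text before the signature on the same line belongs to this note.
        before = line[:match.start()].strip()
        if before:
            buffer.append(before)
        notes.append(("\n".join(buffer),) + match.groups())
        # Text after the signature starts the next note.
        after = line[match.end():].strip()
        buffer = [after] if after else []
    return notes

sample = """first note, line one
first note, line two
JOHNNYSMITH has entered the above notes on 12/05/2017 14:18 second note
JOHNNYSMITH has entered the above notes on 12/05/2017 14:19
SARAHJOHNSON has entered the above notes on 12/05/2017 17:45"""

for comment, user, date, time in split_notes(sample):
    print(user, date, time, repr(comment))
```

A signature with nothing before it (the last line of the sample) yields an empty comment, which the caller can skip or display as it sees fit.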
