Extract the timestamp if the corresponding row contains a keyword

Posted 2024-06-16 12:12:09

Whenever col1 contains one of the keywords, I want to extract the timestamp in the same row of col2.

keywords=["i can help you with that", "i can surely help you with that", "i can check and help you with that", "i will be more than happy to help you", "let me assist you on this", "to assist you better"]

Given this Excel data:

    col1                                                                                                                            
1.agent enters(as arrin)
2.
3.I'll be happy to assist you. Give me a moment to review your request.
4.I see that the light in your Modem is Blinking Red. Am I right ?
5.Thank you for the detailed information.
6.Please do not worry.
7.Don't worry johny. I can help you with that.
8.Let me connect this chat to the concern team to help you out with this, 
  Please stay connected.

   col2
1. 2018-10-14 21:16:58
2. 2018-10-14 21:17:00
3. 2018-10-14 21:17:40
4. 2018-10-14 21:18:25
5. 2018-10-14 21:19:39
6. 2018-10-14 21:19:43
7. 2018-10-14 21:21:04
8. 2018-10-14 21:22:00

For example, one of the keywords appears in row 7, so the corresponding timestamp in col2 should be extracted.

The output should look like this:

[out]: 2018-10-14 21:21:04

Thanks in advance.


2 Answers

Given:

keywords=[
    "i can help you with that", 
    "i can surely help you with that", 
    "i can check and help you with that", 
    "i will be more than happy to help you", 
    "let me assist you on this", 
    "to assist you better"
]

col1 = [
    "agent enters(as arrin)",
    "",
    "I'll be happy to assist you. Give me a moment to review your request.",
    "I see that the light in your Modem is Blinking Red. Am I right ?",
    "Thank you for the detailed information.",
    "Please do not worry.",
    "Don't worry johny. I can help you with that.",
    "Let me connect this chat to the concern team to help you out with this, Please stay connected."
]

col2 = [
    '2018-10-14 21:16:58',
    '2018-10-14 21:17:00',
    '2018-10-14 21:17:40',
    '2018-10-14 21:18:25',
    '2018-10-14 21:19:39',
    '2018-10-14 21:19:43',
    '2018-10-14 21:21:04',
    '2018-10-14 21:22:00'
]

you can run:

for i, col in enumerate(col1):
    if any(keyword in col for keyword in keywords):
        print(col2[i])

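any lets you test concisely whether any of the keywords occurs in the string. Note that the in operator is case-sensitive, so with the sample data above this loop prints nothing: row 7 contains "I can help you with that" with a capital "I", while the keyword is all lowercase.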

Depending on your needs, you may want to convert the strings to lowercase before searching, or do something like the following in the for loop:

for i, col in enumerate(col1):
    if any(keyword.lower() in col.lower() for keyword in keywords):
        print(col2[i])
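
The case-insensitive loop above can also be wrapped in a small helper that returns the matching timestamps instead of printing them. This is only a minimal sketch: the function name `find_timestamps` is hypothetical and not part of the original answer, and it uses `zip` in place of `enumerate`. The sample rows are a subset of the question's data.

```python
def find_timestamps(messages, timestamps, keywords):
    """Return the timestamps whose message contains any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]  # lowercase the keywords once
    return [
        ts
        for msg, ts in zip(messages, timestamps)
        if any(k in msg.lower() for k in lowered)
    ]


# Subset of the question's data, for illustration only.
keywords = ["i can help you with that", "to assist you better"]
col1 = ["Please do not worry.", "Don't worry johny. I can help you with that."]
col2 = ["2018-10-14 21:19:43", "2018-10-14 21:21:04"]

print(find_timestamps(col1, col2, keywords))  # ['2018-10-14 21:21:04']
```

Returning a list makes it easy to handle the case where several rows match, or none do.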

This should work.

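Convert everything to either upper case or lower case, because the match is case-sensitive. Be careful: punctuation may also need to be handled.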

import pandas as pd

keywords=["i can help you with that", "i can surely help you with that", "i can check and help you with that", "i will be more than happy to help you", "let me assist you on this", "to assist you better"]

############## Read in excel file ##########################
col1 = ["agent enters(as arrin)",
"",
"I'll be happy to assist you. Give me a moment to review your request.",
"I see that the light in your Modem is Blinking Red. Am I right ?",
"Thank you for the detailed information.",
"Please do not worry.",
"Don't worry johny. I can help you with that.",
"Let me connect this chat to the concern team to help you out with this, Please stay connected."]

col2 = ['2018-10-14 21:16:58',
'2018-10-14 21:17:00',
'2018-10-14 21:17:40',
'2018-10-14 21:18:25',
'2018-10-14 21:19:39',
'2018-10-14 21:19:43',
'2018-10-14 21:21:04',
'2018-10-14 21:22:00']

df = pd.DataFrame({'col1': col1, 'col2': col2})

#####################################################

# lower case keywords and col1 strings
lower_keywords = [x.lower() for x in keywords]
df['low_col1'] = df['col1'].str.lower()

df_filter = df[df['low_col1'].str.contains('|'.join(lower_keywords))]

print(df_filter['col2'])

Output:

In  [38]: print (df_filter['col2'])
Out [38]: 6    2018-10-14 21:21:04
          Name: col2, dtype: object
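
A variation on the filter above, offered as a sketch rather than as the answer's own method: `str.contains` treats its pattern as a regular expression by default, so keywords containing characters such as "." or "?" would need escaping. The version below escapes each keyword with `re.escape`, uses `case=False` instead of a separate lowercase column, and passes `na=False` so that empty cells count as non-matches. Representing the empty row as `None` is an assumption about how the empty Excel cell is loaded.

```python
import re

import pandas as pd

keywords = ["i can help you with that", "to assist you better"]

# Three rows taken from the question's data; the empty cell is assumed to load as missing.
df = pd.DataFrame({
    "col1": [
        "agent enters(as arrin)",
        None,
        "Don't worry johny. I can help you with that.",
    ],
    "col2": ["2018-10-14 21:16:58", "2018-10-14 21:17:00", "2018-10-14 21:21:04"],
})

# Escape each keyword so regex metacharacters are matched literally.
pattern = "|".join(re.escape(k) for k in keywords)

# case=False ignores case; na=False treats missing cells as non-matches.
mask = df["col1"].str.contains(pattern, case=False, na=False)
matches = df.loc[mask, "col2"].tolist()
print(matches)  # ['2018-10-14 21:21:04']
```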
