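Extracting row values that match the first two parts of a string
Hi everyone :) Hope you're having a good day!
I'm trying to do something with Python and Pandas.
I have a CSV file. When the user enters an address such as 10.50.x.x (for example 10.50.155.12), the code should find the row in the CSV whose first two parts are 10.50 and print "china". I've attached my code below; this is as far as I got after many attempts.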
import pandas as pd
df = pd.read_csv('test.csv', names=['col1','col2'], sep=',', index_col=[0])
#print(df)
def ipsheet():
    val = input ('Enter the ip address :')
    #out = df.loc[val, 'col2']
    #print (out)
    val1 = val
    val1 = ".".join(val.split(".")[0:2])
    print(val1)
    for row in df.iterrows:
        print(row)
        if row[0].startswith(val1):
            return row[1]
ipsheet()
3 Answers
0
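If you just want to loop over some rows and look for a value, Python's built-in csv module and its DictReader make this easy.
First import the csv module, then use DictReader to turn your CSV file into a list of dicts. Your data doesn't appear to have a header row, so you need to tell DictReader the column names with the fieldnames= keyword. Next, I suggest first extracting just the network part of the IP address (its first two dot-separated parts) with a small helper function: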
import csv
def netpart(x: str) -> str:
    return ".".join(x.split(".")[:2])
val = input("Enter the ip address: ").strip()
np = netpart(val)
with open("input.csv", newline="") as f:
    reader = csv.DictReader(f, fieldnames=["IP", "Country"])
    data = list(reader)
for row in data:
    if netpart(row["IP"]) == np:
        print(row["Country"])
The data variable is simply a list of dicts:
[ {"IP": "10.10.0.1", "Country": "Columbia"}, {"IP": "10.20.0.1", "Country": "Australia"}, {"IP": "10.30.0.1", "Country": "Singapore"}, ... ]
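The same lookup can be sketched as a function that takes the CSV text directly, which makes it easy to try without a file on disk. The name find_countries and the SAMPLE rows are made up for this illustration; only netpart comes from the code above.

```python
import csv
import io

def netpart(ip: str) -> str:
    """Return the first two dot-separated parts of an address, e.g. '10.50'."""
    return ".".join(ip.split(".")[:2])

def find_countries(csv_text: str, ip: str) -> list:
    """Return the Country of every row whose IP shares its first two parts with ip."""
    # fieldnames is needed because the data has no header row.
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=["IP", "Country"])
    wanted = netpart(ip.strip())
    return [row["Country"] for row in reader if netpart(row["IP"]) == wanted]

# Hypothetical sample data, invented for this sketch.
SAMPLE = "10.10.0.1,Columbia\n10.20.0.1,Australia\n10.50.0.1,china\n"

print(find_countries(SAMPLE, "10.50.155.12"))  # ['china']
print(find_countries(SAMPLE, "192.168.1.1"))   # []
```

Returning a list rather than printing inside the loop leaves it to the caller to decide what to do when several rows, or none, match.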
1
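You just need to add parentheses after iterrows, or you can use the itertuples() method. Without the parentheses, the loop tries to iterate over the method object itself, which raises a TypeError.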
import pandas as pd
df = pd.read_csv('test.csv', names=['col1','col2'], sep=',', index_col=[0])
# print(df)
def ipsheet():
    val = input ('Enter the ip address :')
    # left commented out: an exact lookup raises KeyError unless val itself is in the index
    # out = df.loc[val, 'col2']
    # print (out)
    val1 = val
    val1 = ".".join(val.split(".")[0:2])
    # print(val1)
    # you need to use parentheses after iterrows or you can use itertuples() method
    # for row in df.iterrows():
    for row in df.itertuples():
        # print(row)
        # with itertuples(), row[0] is the index (col1) and row[1] is col2
        if row[0].startswith(val1):
            return row[1]
country = ipsheet()
print(country)
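To see why itertuples() fits this loop better than iterrows(), here is a small sketch of what each one yields. The DataFrame is hypothetical stand-in data for test.csv, with col1 already used as the index.

```python
import pandas as pd

# Hypothetical data standing in for test.csv, with col1 as the index.
df = pd.DataFrame({"col2": ["china", "japan"]},
                  index=["10.50.0.0", "10.144.0.0"])

# iterrows() yields (index, Series) pairs, so element 1 is a whole Series.
first_pair = next(df.iterrows())
print(type(first_pair[1]).__name__)  # Series
print(first_pair[1]["col2"])         # china

# itertuples() yields namedtuples: position 0 is the index,
# followed by one field per column.
first_tuple = next(df.itertuples())
print(first_tuple[0], first_tuple[1])  # 10.50.0.0 china
```

So with iterrows(), returning row[1] would hand back a Series rather than the country string, whereas with itertuples() it is the col2 value itself.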
2
Use:
import pandas as pd
#create DataFrame with 2 columns
df = pd.read_csv('test.csv', names=['col1','col2'])
def ipsheet():
    val = input ('Enter the ip address :')
    #get first 2 integers with . between (raw string avoids invalid-escape warnings)
    df['col1'] = df['col1'].str.extract(r'(^\d+\.\d+)', expand=False)
    #create dictionary
    d = df.set_index('col1')['col2'].to_dict()
    #if match get value from dictionary
    return d.get(".".join(val.split(".")[:2]), 'no match')
out = ipsheet()
print (out)
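One thing worth knowing about the dictionary approach: if several rows share the same two-part prefix, a dict can hold only one value per key, so the last row wins. A minimal sketch, assuming a hypothetical helper named lookup and made-up rows, that also leaves the DataFrame unmodified:

```python
import pandas as pd

def lookup(df: pd.DataFrame, ip: str, default: str = "no match") -> str:
    """Map each row's two-part prefix to col2 and look up the prefix of ip."""
    # Extract the leading 'digits.digits' of col1 without overwriting the column.
    prefixes = df["col1"].str.extract(r"^(\d+\.\d+)", expand=False)
    # Later rows overwrite earlier ones when prefixes repeat.
    mapping = dict(zip(prefixes, df["col2"]))
    return mapping.get(".".join(ip.split(".")[:2]), default)

# Hypothetical rows; two of them share the prefix 10.50.
df = pd.DataFrame({
    "col1": ["10.50.0.0", "10.60.0.0", "10.50.9.9"],
    "col2": ["china", "japan", "india"],
})

print(lookup(df, "10.50.155.12"))  # india (the last 10.50 row wins)
print(lookup(df, "10.60.1.1"))     # japan
print(lookup(df, "192.168.1.1"))   # no match
```

If every matching row should be reported, filtering the rows is a better fit than a dictionary.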
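Edit: a solution that matches on the first 4 parts of the IP address and, if nothing matches, falls back to the first 3 parts and then the first 2: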
#sample DataFrame
print (df)
             col1             col2
0    10.54.0.0/16            chine
1  10.144.0.10/16            japan
2  10.144.0.11/16   hongkong /a234
3  171.18.48.0/24  australine gwan
4  171.18.49.0/24        india dmk
5  10.54.192.0/23     pakistan mmk
6  171.18.51.0/24     sdwan router
7     10.3.0.0/19      first route
import re
def ipsheet():
    val = input ('Enter the ip address :')
    #find first 4, 3, 2 parts of ip address
    for i in range(4, 1, -1): #[4,3,2]
        v = val.split('.')[:i]
        #extract parts from column col1, if no match created NaNs
        df['col3'] = df['col1'].str.extract(rf'({re.escape(".".join(v))})', expand=False)
        s = ','.join(df.loc[df['col3'].notna(), 'col2'])
        #if found return joined strings from col2 column
        if s:
            return s
    #if no match any strings return default string - 'no match'
    else:
        return 'no match'
Test:
#171.18.0.0
out = ipsheet()
print (out)
australine gwan,india dmk,sdwan router
#10.144.0.0
out = ipsheet()
print (out)
japan,hongkong /a234
#171.18.51.0
out = ipsheet()
print (out)
sdwan router
#10.10.51.0
out = ipsheet()
print (out)
no match
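For comparison, the same 4, 3, 2 fallback can be sketched in plain Python without pandas. This is not the code from the answer: it compares whole dot-separated parts instead of using a regex, so a prefix such as 10.14 does not accidentally match 10.144. The names ROWS and match_by_prefix are made up for this sketch, and ROWS is an in-memory copy of the sample rows above.

```python
# In-memory copy of the sample rows shown above.
ROWS = [
    ("10.54.0.0/16", "chine"),
    ("10.144.0.10/16", "japan"),
    ("10.144.0.11/16", "hongkong /a234"),
    ("171.18.48.0/24", "australine gwan"),
    ("171.18.49.0/24", "india dmk"),
    ("10.54.192.0/23", "pakistan mmk"),
    ("171.18.51.0/24", "sdwan router"),
    ("10.3.0.0/19", "first route"),
]

def match_by_prefix(ip: str, rows=ROWS, default: str = "no match") -> str:
    """Try the first 4, then 3, then 2 parts of ip; return the joined matches."""
    wanted = ip.strip().split(".")
    for n in range(4, 1, -1):  # 4, 3, 2
        hits = [
            name
            for net, name in rows
            # Drop the '/mask' suffix, then compare the first n parts exactly.
            if net.split("/")[0].split(".")[:n] == wanted[:n]
        ]
        if hits:
            return ",".join(hits)
    return default

print(match_by_prefix("171.18.0.0"))   # australine gwan,india dmk,sdwan router
print(match_by_prefix("10.144.0.0"))   # japan,hongkong /a234
print(match_by_prefix("171.18.51.0"))  # sdwan router
print(match_by_prefix("10.10.51.0"))   # no match
```

On the sample data this gives the same results as the test runs above.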