Extract a row's value by matching the first two parts of a string

2 votes
3 answers
79 views
Asked 2025-04-14 18:16

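Hello everyone :) Hope you're having a good day!

I'm trying to do something with Python and pandas.

I have a CSV file. When the user enters an address like 10.50.x.x (for example 10.50.155.12), the code should find the row in the CSV file whose first two parts are 10.50 and output "china". I've attached my code below; this is as far as I got after many attempts.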

import pandas as pd
df = pd.read_csv('test.csv', names=['col1','col2'], sep=',', index_col=[0])
#print(df)

def ipsheet():
    val = input ('Enter the ip address :')
    #out = df.loc[val, 'col2']
    #print (out)
    val1 = val


    val1 = ".".join(val.split(".")[0:2])
    print(val1)

    for row in df.iterrows:
        print(row)
        if row[0].startswith(val1):
            return row[1]

ipsheet()


3 Answers

0

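If you just want to iterate over some rows and look for a value, Python's built-in csv module and its DictReader make that easy.

First import the csv module, then use DictReader to turn your CSV file into a list of dicts. Your data doesn't appear to have a header row, so you need to tell DictReader the column names with the fieldnames= keyword. I'd also suggest writing a small helper function that extracts just the network part of the IP: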

import csv

def netpart(x: str) -> str:
    return ".".join(x.split(".")[:2])


val = input("Enter the ip address: ").strip()
np = netpart(val)


with open("input.csv", newline="") as f:
    reader = csv.DictReader(f, fieldnames=["IP", "Country"])
    data = list(reader)

for row in data:
    if netpart(row["IP"]) == np:
        print(row["Country"])

The data variable is simply a list of dicts:

[
   {"IP": "10.10.0.1", "Country": "Columbia"},
   {"IP": "10.20.0.1", "Country": "Australia"},
   {"IP": "10.30.0.1", "Country": "Singapore"},
   ...
]
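
To try the approach above without a file on disk, here is a minimal sketch that feeds csv.DictReader from an in-memory string. The sample rows and the helper name find_countries are assumptions made for illustration; they are not part of the original data.

```python
import csv
import io


def netpart(x: str) -> str:
    """Return the first two dot-separated parts of an address, e.g. '10.50'."""
    return ".".join(x.split(".")[:2])


def find_countries(csv_text: str, ip: str) -> list[str]:
    """Return the Country of every row whose IP shares its first two parts with `ip`."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=["IP", "Country"])
    wanted = netpart(ip.strip())
    return [row["Country"] for row in reader if netpart(row["IP"]) == wanted]


# Hypothetical sample data standing in for input.csv
SAMPLE = "10.10.0.1,Columbia\n10.20.0.1,Australia\n10.50.0.1,China\n"

print(find_countries(SAMPLE, "10.50.155.12"))  # ['China']
```

Comparing the extracted parts for equality, rather than using startswith on the raw string, avoids a prefix such as 10.5 accidentally matching 10.50.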
1

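You just need to add parentheses after iterrows, or you can use the itertuples() method instead. Note that the two yield different things: iterrows() yields (index, Series) pairs, while itertuples() yields namedtuples whose first field is the index.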

import pandas as pd
df = pd.read_csv('test.csv', names=['col1','col2'], sep=',', index_col=[0])
# print(df)

def ipsheet():
    val = input ('Enter the ip address :')
    # out = df.loc[val, 'col2']  # raises KeyError unless the full address is an index label
    # print (out)
    val1 = ".".join(val.split(".")[0:2])
    # print(val1)
    # you need to use parentheses after iterrows or you can use itertuples() method
    # for row in df.iterrows():
    for row in df.itertuples():
        # print(row)
        # row[0] is the index (col1), row[1] is col2
        if row[0].startswith(val1):
            return row[1]

country = ipsheet()
print(country)
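
As a small illustration of why the choice between the two iterators matters for `row[1]`, the sketch below prints what each one yields. The two-row frame is hypothetical and only mimics the shape of a CSV read with the first column as the index.

```python
import pandas as pd

# Hypothetical two-row frame, indexed by the address column like the code above
df = pd.DataFrame(
    {"col2": ["china", "japan"]},
    index=pd.Index(["10.50.0.0/16", "10.144.0.0/16"], name="col1"),
)

# iterrows() yields (index, Series) pairs: row[1] is the whole row as a Series
idx, series = next(df.iterrows())
print(type(series).__name__, series["col2"])

# itertuples() yields namedtuples: field 0 is the index, field 1 is col2
tup = next(df.itertuples())
print(tup[0], tup[1])
```

So with iterrows() the function would return a Series rather than the country string, unless it returns `row[1]['col2']` instead.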
2

Use:

import pandas as pd

#create DataFrame with 2 columns
df = pd.read_csv('test.csv', names=['col1','col2'])

def ipsheet():
    val = input ('Enter the ip address :')

    #get the first 2 integers with . between them
    df['col1'] = df['col1'].str.extract(r'(^\d+\.\d+)', expand=False)
    
    #create dictionary
    d = df.set_index('col1')['col2'].to_dict()
    
    #if match get value from dictionary
    return d.get(".".join(val.split(".")[:2]), 'no match')

out = ipsheet()
print (out)
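
One thing to be aware of with the dictionary approach: if several rows share the same two-part prefix, the dictionary keeps only the last one. The sketch below shows this on three hypothetical rows and, as one possible alternative, groups the values per prefix instead. The column name `prefix` is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical rows; two of them share the prefix 10.54
df = pd.DataFrame({
    "col1": ["10.54.0.0/16", "10.144.0.10/16", "10.54.192.0/23"],
    "col2": ["china", "japan", "pakistan"],
})

# Same extraction as above, kept in a separate Series so df['col1'] is untouched
prefix = df["col1"].str.extract(r"(^\d+\.\d+)", expand=False)

# set_index(...).to_dict() keeps only the last value for a repeated key
d = df.assign(prefix=prefix).set_index("prefix")["col2"].to_dict()
print(d["10.54"])

# Grouping by the prefix keeps every value
groups = df["col2"].groupby(prefix).agg(list).to_dict()
print(groups["10.54"])
```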

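Edit: a solution that matches on the first 4 parts of the IP address, falls back to the first 3 parts if there is no match, and finally to the first 2: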

#sample DataFrame
print (df)
             col1             col2
0    10.54.0.0/16            chine
1  10.144.0.10/16            japan
2  10.144.0.11/16   hongkong /a234
3  171.18.48.0/24  australine gwan
4  171.18.49.0/24        india dmk
5  10.54.192.0/23     pakistan mmk
6  171.18.51.0/24     sdwan router
7     10.3.0.0/19      first route
import re

def ipsheet():

    val = input ('Enter the ip address :')

    #find first 4, 3, 2 parts of ip address
    for i in range(4, 1, -1): #[4,3,2]
        v = val.split('.')[:i]

        #extract parts from column col1, if no match created NaNs
        df['col3'] = df['col1'].str.extract(rf'({re.escape(".".join(v))})', expand=False)
        s = ','.join(df.loc[df['col3'].notna(), 'col2'])

        #if found return joined strings from col2 column
        if s:
            return s
        
    #if no prefix matched any row, return the default string - 'no match'
    return 'no match'

Test:

#171.18.0.0
out = ipsheet()
print (out)
australine gwan,india dmk,sdwan router

#10.144.0.0
out = ipsheet()
print (out)
japan,hongkong /a234

#171.18.51.0
out = ipsheet()
print (out)
sdwan router

#10.10.51.0
out = ipsheet()
print (out)
no match
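
Since the sample col1 values are written in CIDR notation, a different option is to test network membership with the standard-library ipaddress module instead of comparing text prefixes. This is only a sketch under that assumption, and it answers a slightly different question from the code above: it returns the networks that actually contain the address, so 171.18.0.0 would not match 171.18.48.0/24. The ROUTES list and the function name lookup are hypothetical; the rows are copied from the sample DataFrame.

```python
import ipaddress

# Rows copied from the sample DataFrame above: (CIDR string, label)
ROUTES = [
    ("10.54.0.0/16", "chine"),
    ("10.144.0.10/16", "japan"),
    ("171.18.51.0/24", "sdwan router"),
    ("10.54.192.0/23", "pakistan mmk"),
]


def lookup(ip: str) -> str:
    """Return the labels of all networks containing `ip`, most specific first."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for cidr, label in ROUTES:
        # strict=False accepts entries such as 10.144.0.10/16 that have host bits set
        net = ipaddress.ip_network(cidr, strict=False)
        if addr in net:
            hits.append((net.prefixlen, label))
    # Longest prefix (most specific network) first
    hits.sort(key=lambda h: h[0], reverse=True)
    return ",".join(label for _, label in hits) or "no match"


print(lookup("10.54.192.7"))  # pakistan mmk,chine
print(lookup("10.10.51.0"))   # no match
```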
