Pandas DataFrame: finding the most common string from another DataFrame

for offer in all_offers:
    # get the interesting data and write to file
    title = offer.find('a', class_='offer-title__link').text.strip()
    price = (offer.find('span', class_='offer-price__number').text.strip()
             .replace(' ', '').replace('\nPLN', '').replace('\nEUR', ''))
    location = offer.find('span', class_='ds-location-region').text.strip()
    item = [title, float(price.replace(",", ".")), location]
    data.append(item)
    print(item)

for name in ['Alfa Romeo', 'Aston Martin', 'Audi']:
    print('---', name, 'pojemność od 2000cm3', '---')
    cars = df[df['title'].str.contains(name)]
    print('count:', len(cars))
    print('price min :', cars['price'].min())
    print('price average:', cars['price'].mean())
    print('price max :', cars['price'].max())
    cars.plot.hist(title=name)
    plt.show()

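for voivodship in ['(Śląskie)', '(Świętokrzyskie)', '(Warmińsko-mazurskie)', '(Wielkopolskie)', '(Zachodniopomorskie)']:
    # regex=False so the parentheses are matched literally, not as a regex group
    locations = df[df['location'].str.contains(voivodship, regex=False)]
    print('---', voivodship, '---')
    # count the matching rows, not the characters of the region name
    print('ilość ogłoszeń w :', len(locations))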

1 answer

User

#1 · posted 2024-05-14 10:18:14

I don't think you need a loop here, and you shouldn't use one.

Here is a solution that uses only pandas. I extended your dataset.

import pandas as pd

df = pd.DataFrame.from_records([
    ['Mercedes-Benz Klasa S', 4390.0, '(Dolnośląskie)'],
    ['Mitsubishi Carisma 1.8', 3999.0, '(Pomorskie)'],
    ['Audi A4 1.6', 5790.0, '(Łódzkie)'],
    ['Mercedes-Benz Klasa S', 2390.0, '(Dolnośląskie)'],
    ['Mitsubishi Carisma 1.8', 3999.0, '(Dolnośląskie)'],
    ['Audi A6 2.0', 7000.0, '(Łódzkie)'],
    ['Mercedes-Benz Klasa S', 5390.0, '(Łódzkie)'],
    ['Audi A3 1.6', 4000.0, '(Dolnośląskie)'],
], columns=['make_model', 'price', 'region'])
# strip the surrounding parentheses from the region name
df.region = df.region.str.slice(1,-1)
# the make is the first word of the listing title
df['make'] = df.make_model.str.split(' ').str[0]
# aggregate per (region, make): price statistics plus a count of listings
df.pivot_table(
    index=['region', 'make'], 
    aggfunc={'price':['mean', 'min', 'max'], 'make':'count'}
)

See the inline comments for further explanation. Result:

                           count     max    mean     min
region       make                                       
Dolnośląskie Audi              1  4000.0  4000.0  4000.0
             Mercedes-Benz     2  4390.0  3390.0  2390.0
             Mitsubishi        1  3999.0  3999.0  3999.0
Pomorskie    Mitsubishi        1  3999.0  3999.0  3999.0
Łódzkie      Audi              2  7000.0  6395.0  5790.0
             Mercedes-Benz     1  5390.0  5390.0  5390.0
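A per-region, per-make summary like the one in the table can also be sketched with groupby and named aggregation. This is an alternative, not the code the answer used; the column names count, price_min, price_mean and price_max are chosen here for illustration only.

```python
import pandas as pd

# Same sample data as in the answer.
df = pd.DataFrame.from_records([
    ['Mercedes-Benz Klasa S', 4390.0, '(Dolnośląskie)'],
    ['Mitsubishi Carisma 1.8', 3999.0, '(Pomorskie)'],
    ['Audi A4 1.6', 5790.0, '(Łódzkie)'],
    ['Mercedes-Benz Klasa S', 2390.0, '(Dolnośląskie)'],
    ['Mitsubishi Carisma 1.8', 3999.0, '(Dolnośląskie)'],
    ['Audi A6 2.0', 7000.0, '(Łódzkie)'],
    ['Mercedes-Benz Klasa S', 5390.0, '(Łódzkie)'],
    ['Audi A3 1.6', 4000.0, '(Dolnośląskie)'],
], columns=['make_model', 'price', 'region'])

# Strip the parentheses from the region and take the first word as the make.
df['region'] = df['region'].str.slice(1, -1)
df['make'] = df['make_model'].str.split(' ').str[0]

# Named aggregation: each keyword becomes one flat output column.
# The output column names are illustrative, not from the original answer.
summary = df.groupby(['region', 'make']).agg(
    count=('price', 'count'),
    price_min=('price', 'min'),
    price_mean=('price', 'mean'),
    price_max=('price', 'max'),
)
print(summary)
```

Named aggregation gives flat, explicitly named columns, which can be easier to select from afterwards than a two-level column header.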

If you only want to see the most common make per region, you can do this:

df.groupby('region')['make'].agg([
    lambda x: x.value_counts().index[0],
    lambda x: x.value_counts().values[0]
])

Here you group by region and then, within each group, count the values of make and take the first one, which is the most frequent.
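As a small variation, the two results can be given readable column names by passing the aggregations as keywords. The following is a minimal sketch on a reduced copy of the sample data; the names top_make and listings are made up for this example. When two makes are tied for first place in a region, which one value_counts lists first should not be relied on.

```python
import pandas as pd

# Reduced copy of the sample data: only the region and make columns.
df = pd.DataFrame({
    'region': ['Dolnośląskie', 'Pomorskie', 'Łódzkie', 'Dolnośląskie',
               'Dolnośląskie', 'Łódzkie', 'Łódzkie', 'Dolnośląskie'],
    'make': ['Mercedes-Benz', 'Mitsubishi', 'Audi', 'Mercedes-Benz',
             'Mitsubishi', 'Audi', 'Mercedes-Benz', 'Audi'],
})

# value_counts() sorts by frequency in descending order, so the first
# index entry is the most common make and the first value is its count.
most_common = df.groupby('region')['make'].agg(
    top_make=lambda s: s.value_counts().index[0],
    listings=lambda s: s.value_counts().iloc[0],
)
print(most_common)
```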
