Pandas DataFrame: loop over rows and update from a database

Posted 2024-06-16 08:25:58


Given a DataFrame df that looks like this:

p_id | sales | salesperson | year
1    | 10,000| None        | 2017
2    | 15,000| None        | 2016
5    | 7,000 | None        | 2014
5    | 3,000 | None        | 2015

There is also a SQL table, persons, that looks like this:

p_id | p_name        | from_year | to_year
1    | Brian Griffin | 2017      | Null
2    | Quagmire      | 2016      | Null
5    | Cleveland     | 2014      | 2015
5    | Lois Griffin  | 2015      | Null

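I am trying to fill in the missing salesperson values in the DataFrame from the SQL table. A p_id can be reused, as long as only one person holds it at any given time.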
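To make the reuse rule concrete, here is a minimal pure-Python sketch of the lookup I have in mind. It assumes each assignment covers a half-open range (from_year <= year < to_year, open-ended when to_year is NULL); the names `PERSONS` and `holder_of` are made up for illustration only.

```python
from typing import Optional

# Rows copied from the persons table above; None stands in for SQL NULL.
PERSONS = [
    (1, "Brian Griffin", 2017, None),
    (2, "Quagmire", 2016, None),
    (5, "Cleveland", 2014, 2015),
    (5, "Lois Griffin", 2015, None),
]


def holder_of(p_id: int, year: int) -> Optional[str]:
    """Return the person holding p_id in the given year, or None if nobody did."""
    for pid, name, from_year, to_year in PERSONS:
        # Assumed half-open range: from_year <= year < to_year,
        # where a missing to_year means the assignment is still open.
        if pid == p_id and from_year <= year and (to_year is None or year < to_year):
            return name
    return None
```

Under that assumption, p_id 5 resolves to Cleveland for 2014 and to Lois Griffin for 2015, which is the result I expect for the sample rows.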

Here is what I have done:

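import sqlalchemy
from sqlalchemy import and_, or_

# data_engine is an existing SQLAlchemy engine, created elsewhere

def fetch_name(pid, year):
    # Reflect the persons table (note: this happens on every call)
    meta = sqlalchemy.MetaData()
    persons = sqlalchemy.Table('persons', meta, autoload=True, autoload_with=data_engine)

    # Find who held pid in the given year:
    # from_year <= year and (year < to_year or to_year IS NULL)
    stmt = sqlalchemy.select([persons.c.p_name]).where(
        and_(persons.c.p_id == pid,
             year >= persons.c.from_year,
             or_(year < persons.c.to_year, persons.c.to_year.is_(None))))

    name = data_engine.execute(stmt).scalar()

    return name

# One database query per DataFrame row
for index, row in df.iterrows():
    df.at[index, 'salesperson'] = fetch_name(row['p_id'], row['year'])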

This works, but it is very slow. For a DataFrame with 30,000 rows, mapping and filling in the missing data takes about 20 minutes.

Is there a better way to achieve the same result?
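One direction that might avoid the per-row round trips is to load persons once and do the matching in pandas with a single merge. This is only a sketch under assumptions, not a confirmed answer: the persons frame is built by hand here (in practice it could perhaps be loaded once with pd.read_sql), NULL to_year values are assumed to arrive as NaN, and the names `fill_salesperson` and `_row` are hypothetical.

```python
import pandas as pd

# Sample data mirroring the tables in the question.
df = pd.DataFrame({
    "p_id": [1, 2, 5, 5],
    "sales": [10000, 15000, 7000, 3000],
    "salesperson": [None, None, None, None],
    "year": [2017, 2016, 2014, 2015],
})
nan = float("nan")  # stand-in for SQL NULL in to_year
persons = pd.DataFrame({
    "p_id": [1, 2, 5, 5],
    "p_name": ["Brian Griffin", "Quagmire", "Cleveland", "Lois Griffin"],
    "from_year": [2017, 2016, 2014, 2015],
    "to_year": [nan, nan, 2015, nan],
})


def fill_salesperson(df, persons):
    """Fill df['salesperson'] using one merge instead of one query per row."""
    # Tag each row with its position so matches can be aligned back later.
    left = df[["p_id", "year"]].copy()
    left["_row"] = range(len(df))
    # Pair every row with every persons entry that shares its p_id.
    candidates = left.merge(persons, on="p_id", how="left")
    # Same filter as the per-row query:
    # from_year <= year and (to_year is NULL or year < to_year).
    in_range = (candidates["year"] >= candidates["from_year"]) & (
        candidates["to_year"].isna() | (candidates["year"] < candidates["to_year"])
    )
    # Keep the first match per row, similar to taking the first query result.
    matched = (
        candidates[in_range]
        .drop_duplicates(subset="_row")
        .set_index("_row")["p_name"]
    )
    out = df.copy()
    # Rows without any match end up as NaN rather than None.
    out["salesperson"] = matched.reindex(range(len(df))).to_numpy()
    return out


result = fill_salesperson(df, persons)
```

The merge temporarily creates one candidate row per (DataFrame row, persons entry) pair with the same p_id, so memory use grows with how often each p_id is reused. I have not measured this against the 30,000-row case.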

