Losing "type" information in a Polars DataFrame
Apologies if my question doesn't make much sense. I don't have much experience with Python.
I have some code that looks like this:
import polars as pl
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    description: str


def event_table(num) -> list[Event]:
    # Note: `num` is not used; every call returns five identical events.
    events = []
    for i in range(5):
        events.append(Event("name", "description"))
    return events


def pretty_string(events: list[Event]) -> str:
    pretty = ""
    for event in events:
        pretty += f"{event.name}: {event.description}\n"
    return pretty


# This does work
print(pretty_string(event_table(5)))

# But then it doesn't work if I have my `list[Event]` in a dataframe
data = {"events": [0, 1, 2, 3, 4]}
df = pl.DataFrame(data).select(events=pl.col("events").map_elements(event_table))

# This doesn't work
pretty_df = df.select(events=pl.col("events").map_elements(pretty_string))
print(pretty_df)

# Neither does this
print(pretty_string(df["events"][0]))
But running it fails with this error:
Traceback (most recent call last):
File "path/to/script.py", line 32, in <module>
pretty_df = df.select(events=pl.col("events").map_elements(pretty_string))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "path/to/.venv/lib/python3.11/site-packages/polars/dataframe/frame.py", line 8116, in select
return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "path/to/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1934, in collect
return wrap_df(ldf.collect())
^^^^^^^^^^^^^
polars.exceptions.ComputeError: AttributeError: 'dict' object has no attribute 'name'
It looks like my list[Event] no longer exists inside df. I'm not sure what I need to do to get this working.
1 Answer
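You can keep the Event objects by passing return_dtype=pl.Object. Without it, Polars converts the returned named tuples to its native list[struct[2]] dtype, so each item comes back as a dict rather than an Event, as the two outputs below show.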
df.select(pl.col("events").map_elements(event_table))
shape: (5, 1)
┌───────────────────────────────────┐
│ events │
│ --- │
│ list[struct[2]] │
╞═══════════════════════════════════╡
│ [{"name","description"}, {"name"… │
│ [{"name","description"}, {"name"… │
│ [{"name","description"}, {"name"… │
│ [{"name","description"}, {"name"… │
│ [{"name","description"}, {"name"… │
└───────────────────────────────────┘
df.select(pl.col("events").map_elements(event_table, return_dtype=pl.Object))
shape: (5, 1)
┌───────────────────────────────────┐
│ events │
│ --- │
│ object │
╞═══════════════════════════════════╡
│ [Event(name='name', description=… │
│ [Event(name='name', description=… │
│ [Event(name='name', description=… │
│ [Event(name='name', description=… │
│ [Event(name='name', description=… │
└───────────────────────────────────┘
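If you would rather keep the native list[struct[2]] column, a possible alternative is to rebuild each Event inside pretty_string. The sketch below is plain Python and only assumes that each item arrives as a dict with the keys name and description, which is what the AttributeError in the question suggests. The to_event helper is hypothetical, not part of Polars.

```python
from typing import Iterable, Mapping, NamedTuple, Union


class Event(NamedTuple):
    name: str
    description: str


def to_event(item: Union[Event, Mapping[str, str]]) -> Event:
    # Hypothetical helper: pass Event instances through unchanged and
    # rebuild an Event from a dict-like item with matching keys.
    if isinstance(item, Event):
        return item
    return Event(**item)


def pretty_string(events: Iterable[Union[Event, Mapping[str, str]]]) -> str:
    # Works on both real Event objects and the dicts a struct column yields.
    return "".join(
        f"{event.name}: {event.description}\n" for event in map(to_event, events)
    )


# Stand-in for one row of the list[struct[2]] column (assumed shape).
row = [{"name": "name", "description": "description"}] * 2
print(pretty_string(row))
```

This keeps the data in a dtype Polars can work with natively, at the cost of converting back to Event whenever you need the named-tuple interface.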