Losing "type" information in a Polars DataFrame

2 votes
1 answer
58 views
Asked 2025-04-14 15:59

Apologies if my question doesn't make much sense. I don't have much experience with Python.

I have some code that looks like this:

import polars as pl
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    description: str


def event_table(num) -> list[Event]:
    events = []
    for i in range(5):
        events.append(Event("name", "description"))
    return events


def pretty_string(events: list[Event]) -> str:
    pretty = ""
    for event in events:
        pretty += f"{event.name}: {event.description}\n"
    return pretty

# This does work
print(pretty_string(event_table(5)))

# But then it doesn't work if I have my `list[Event]` in a dataframe
data = {"events": [0, 1, 2, 3, 4]}
df = pl.DataFrame(data).select(events=pl.col("events").map_elements(event_table))

# This doesn't work
pretty_df = df.select(events=pl.col("events").map_elements(pretty_string))
print(pretty_df)

# Neither does this
print(pretty_string(df["events"][0]))

But it fails at runtime with the following error:

Traceback (most recent call last):
  File "path/to/script.py", line 32, in <module>
    pretty_df = df.select(events=pl.col("events").map_elements(pretty_string))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/to/.venv/lib/python3.11/site-packages/polars/dataframe/frame.py", line 8116, in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/to/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1934, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
polars.exceptions.ComputeError: AttributeError: 'dict' object has no attribute 'name'

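It looks like my `list[Event]` no longer exists once it is inside the DataFrame: the elements come back as `dict` objects instead of `Event` instances. I'm not sure what I need to do to make this work.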

1 Answer

2

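You can keep the Event objects by passing return_dtype=pl.Object. Without it, Polars infers the column type from the returned values and converts the NamedTuples into structs (list[struct[2]]), which is why the `Event` type is lost. With pl.Object, the column stores the original Python objects: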

df.select(pl.col("events").map_elements(event_table))
shape: (5, 1)
┌───────────────────────────────────┐
│ events                            │
│ ---                               │
│ list[struct[2]]                   │
╞═══════════════════════════════════╡
│ [{"name","description"}, {"name"… │
│ [{"name","description"}, {"name"… │
│ [{"name","description"}, {"name"… │
│ [{"name","description"}, {"name"… │
│ [{"name","description"}, {"name"… │
└───────────────────────────────────┘
df.select(pl.col("events").map_elements(event_table, return_dtype=pl.Object))
shape: (5, 1)
┌───────────────────────────────────┐
│ events                            │
│ ---                               │
│ object                            │
╞═══════════════════════════════════╡
│ [Event(name='name', description=… │
│ [Event(name='name', description=… │
│ [Event(name='name', description=… │
│ [Event(name='name', description=… │
│ [Event(name='name', description=… │
└───────────────────────────────────┘
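
If you would rather keep the native list[struct[2]] column, another option is to rebuild the Event objects inside the function itself. The following is only a minimal sketch, not part of the answer above. It assumes the struct values reach your function as plain dicts keyed by field name, which is what the 'dict' object has no attribute 'name' error suggests. `to_event` is a hypothetical helper introduced for illustration, and plain dicts stand in for the Polars values so the snippet runs on its own.

```python
from typing import Iterable, Mapping, NamedTuple, Union


class Event(NamedTuple):
    name: str
    description: str


# Hypothetical helper: accept either an Event or a dict-like row.
def to_event(row: Union[Event, Mapping[str, str]]) -> Event:
    """Return `row` unchanged if it is already an Event, else rebuild it."""
    if isinstance(row, Event):
        return row
    # Assumption: the mapping has exactly the NamedTuple's field names as keys.
    return Event(name=row["name"], description=row["description"])


def pretty_string(rows: Iterable[Union[Event, Mapping[str, str]]]) -> str:
    """Format one 'name: description' line per event."""
    return "".join(f"{e.name}: {e.description}\n" for e in map(to_event, rows))


# Plain dicts stand in for the struct values taken from the DataFrame.
rows = [{"name": "name", "description": "description"}] * 2
print(pretty_string(rows))
```

The trade-off, as far as I understand it: a pl.Object column keeps your Python objects intact but is largely opaque to Polars expressions, while a struct column stays fully usable in Polars at the cost of converting back to `Event` whenever you need the Python type.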
