Fabric Notebook throws errors when extracting features with TSFresh

Asked 2025-04-12 18:30

Whenever I run TSFresh's extract_features in a Microsoft Fabric notebook, I get the same series of errors:

Dependency not available for matrix_profile, this feature will be disabled!
Feature Extraction:   0%|          | 0/20 [00:00<?, ?it/s]2024-03-26:07:49:13,37 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,36 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,95 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,103 ERROR    [synapse_mlflow_utils.py:348] 'c'
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 345, in set_envs
    config = MLConfig(sc)
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 128, in __init__
    self.env_configs = self.get_mlflow_configs()
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 163, in get_mlflow_configs
    region = self._get_spark_config("spark.cluster.region")
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 135, in _get_spark_config
    value = self.sc.getConf().get(key, "")
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 2375, in getConf
    conf.setAll(self._conf.getAll())
  File "/opt/spark/python/lib/pyspark.zip/pyspark/conf.py", line 238, in getAll
    return [(elem._1(), elem._2()) for elem in cast(JavaObject, self._jconf).getAll()]
  File "/opt/spark/python/lib/pyspark.zip/pyspark/conf.py", line 238, in <listcomp>
    return [(elem._1(), elem._2()) for elem in cast(JavaObject, self._jconf).getAll()]
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/protocol.py", line 342, in get_return_value
    return OUTPUT_CONVERTER[type](answer[2:], gateway_client)
KeyError: 'c'
2024-03-26:07:49:13,192 ERROR    [synapse_mlflow_utils.py:349] ## Not In PBI Synapse Platform ##
2024-03-26:07:49:13,336 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,341 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,342 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,344 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,346 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,347 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,350 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,351 ERROR    [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
    url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,357 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,348 ERROR    [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
    url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,360 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>._call_endpoint exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,361 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,362 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>._call_endpoint exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,362 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>.create_run exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,364 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>.create_run exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,343 ERROR    [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
    url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,371 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,369 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,369 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,372 ERROR    [tracking_store.py:67] get_host_credentials fatal error

If I let the program keep running, the last few errors repeat indefinitely.

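The output also keeps mentioning MLflow, a package I know is integrated into Fabric notebooks, but I never call it myself. I tried to use set_mlflow_env_config as the message suggests, but I cannot find it in any documentation.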

The sample code below reproduces my problem exactly (source: https://tsfresh.readthedocs.io/en/latest/text/quick_start.html):

import pandas as pd
import numpy as np

import tsfresh
from tsfresh import extract_features
from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute


# Example Dataset from Tsfresh
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, load_robot_execution_failures
download_robot_execution_failures()
timeseries, y = load_robot_execution_failures()

#Extract features, in the style of the documentation: https://tsfresh.readthedocs.io/en/latest/text/quick_start.html 
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
impute(extracted_features)
features_filtered = select_features(extracted_features, y)

How can I fix this and keep MLflow from interfering?

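I also tried importing MLflow and setting up an experiment to see whether that would help, but it did not. It created many ML models for data I could not trace, and my features were still not extracted.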

My best guess so far is that TSFresh uses scikit-learn or something similar to fit the features, and MLflow assumes it should track that.

1 Answer


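It turns out the code actually does run. MLflow keeps logging errors, but they do not seem to affect TSFresh's ability to extract features. My dataset was so large that processing took a long time, and the flood of error messages made the progress hard to see.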
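If the log noise is the main obstacle, one option is to filter it out so the progress bar stays readable. The following is only a sketch using the standard library's logging filters. It assumes the messages pass through handlers attached to the root logger, which I have not verified for Fabric. The path fragment "synapse/ml/mlflow" is taken from the file paths in the traceback; DropByPath and quiet_handlers are hypothetical helper names, not part of any library.

```python
import logging


class DropByPath(logging.Filter):
    """Drop log records emitted from source files whose path contains a fragment."""

    def __init__(self, fragment: str) -> None:
        super().__init__()
        self.fragment = fragment

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the handler to discard the record.
        return self.fragment not in record.pathname.replace("\\", "/")


def quiet_handlers(fragment: str = "synapse/ml/mlflow") -> int:
    """Attach the filter to every root-logger handler; return how many were patched."""
    # Assumption: the noisy messages are emitted through root-logger handlers.
    handlers = logging.getLogger().handlers
    for handler in handlers:
        handler.addFilter(DropByPath(fragment))
    return len(handlers)
```

Calling quiet_handlers() before extract_features would then hide records coming from that package while leaving other log output alone. If it returns 0, no root handlers were found and the filter had nothing to attach to.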

That said, if you want to turn off MLflow autologging (which I strongly recommend), you can use the following (source):

import mlflow
mlflow.autolog(disable=True)
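Putting this together with the question's code, a minimal sketch might look like the following. The helper name extract_without_autolog is hypothetical. Passing n_jobs=0 is my own addition: tsfresh documents it as disabling parallelization, and since the warnings mention "workers" it may reduce the noise, but I have not confirmed that on Fabric.

```python
def extract_without_autolog(timeseries, column_id="id", column_sort="time"):
    """Disable MLflow autologging, then run tsfresh in the current process."""
    # Imports are local so the helper can be defined before the libraries load.
    import mlflow
    from tsfresh import extract_features

    # Same call as in the answer: switch off MLflow's automatic logging.
    mlflow.autolog(disable=True)

    # Assumption: n_jobs=0 makes tsfresh skip multiprocessing, so no worker
    # processes are started. Whether this silences the warnings is untested.
    return extract_features(
        timeseries,
        column_id=column_id,
        column_sort=column_sort,
        n_jobs=0,
    )
```

With the robot execution failures dataset from the question, this would be called as extracted_features = extract_without_autolog(timeseries). Running in a single process is slower on large datasets, so drop n_jobs=0 if the autolog call alone is enough.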
