Errors in a Fabric Notebook when extracting features with TSfresh
Whenever I run TSFresh's extract_features in a Microsoft Fabric notebook, I get the same series of errors:
Dependency not available for matrix_profile, this feature will be disabled!
Feature Extraction: 0%| | 0/20 [00:00<?, ?it/s]2024-03-26:07:49:13,37 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,36 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,95 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,103 ERROR [synapse_mlflow_utils.py:348] 'c'
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 345, in set_envs
config = MLConfig(sc)
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 128, in __init__
self.env_configs = self.get_mlflow_configs()
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 163, in get_mlflow_configs
region = self._get_spark_config("spark.cluster.region")
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 135, in _get_spark_config
value = self.sc.getConf().get(key, "")
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 2375, in getConf
conf.setAll(self._conf.getAll())
File "/opt/spark/python/lib/pyspark.zip/pyspark/conf.py", line 238, in getAll
return [(elem._1(), elem._2()) for elem in cast(JavaObject, self._jconf).getAll()]
File "/opt/spark/python/lib/pyspark.zip/pyspark/conf.py", line 238, in <listcomp>
return [(elem._1(), elem._2()) for elem in cast(JavaObject, self._jconf).getAll()]
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
File "/opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/protocol.py", line 342, in get_return_value
return OUTPUT_CONVERTER[type](answer[2:], gateway_client)
KeyError: 'c'
2024-03-26:07:49:13,192 ERROR [synapse_mlflow_utils.py:349] ## Not In PBI Synapse Platform ##
2024-03-26:07:49:13,336 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,341 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,342 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,344 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,346 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,347 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,350 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,351 ERROR [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,357 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,348 ERROR [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,360 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>._call_endpoint exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,361 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,362 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>._call_endpoint exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,362 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>.create_run exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,364 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>.create_run exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,343 ERROR [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,371 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,369 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,369 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,372 ERROR [tracking_store.py:67] get_host_credentials fatal error
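If I let the program keep running, the last few errors repeat over and over.
The output also keeps mentioning MLflow, a package I know is integrated into Fabric notebooks but which I never call explicitly. I tried to use set_mlflow_env_config as the warning suggests, but I cannot find anything about it in the documentation.
The example code below reproduces my problem exactly (source: https://tsfresh.readthedocs.io/en/latest/text/quick_start.html):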
import pandas as pd
import numpy as np
import tsfresh
from tsfresh import extract_features
from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute
# Example Dataset from Tsfresh
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, load_robot_execution_failures
download_robot_execution_failures()
timeseries, y = load_robot_execution_failures()
#Extract features, in the style of the documentation: https://tsfresh.readthedocs.io/en/latest/text/quick_start.html
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
impute(extracted_features)
features_filtered = select_features(extracted_features, y)
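How can I fix this and stop MLflow from interfering?
I also tried importing MLflow and setting up an experiment to see whether that would solve the problem, but it did not. It created many ML models for data I could not trace back, and my features were still not extracted.
My best guess so far is that TSFresh uses scikit-learn or a similar tool to fit the features, and MLflow thinks it should track that.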
1 Answer
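It turns out that the code does actually run. MLflow keeps logging errors, but they do not seem to affect TSFresh's ability to extract features. My dataset was very large, so processing took a long time, and the flood of error messages made the progress hard to see.
That said, if you want to turn MLflow autologging off (which I strongly recommend), you can use the following (source):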
import mlflow
mlflow.autolog(disable=True)
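As a rough sketch of how this could be combined with the extraction call: the helper names below (disable_mlflow_autolog, extract_in_main_process) are made up for illustration and are not part of tsfresh or MLflow. The n_jobs=0 setting is tsfresh's documented way to skip multiprocessing. Whether that also silences the "workers" warnings in Fabric is an untested assumption, based only on the fact that the warnings refer to worker processes.

```python
def disable_mlflow_autolog() -> bool:
    """Turn off MLflow autologging if mlflow is importable.

    Hypothetical helper: returns True if autologging was disabled,
    False if mlflow is not installed in this environment.
    """
    try:
        import mlflow
    except ImportError:
        return False
    mlflow.autolog(disable=True)
    return True


def extract_in_main_process(timeseries):
    """Run tsfresh feature extraction without spawning worker processes.

    Hypothetical helper: assumes the same "id" and "time" columns as the
    robot execution failures example in the question.
    """
    from tsfresh import extract_features  # imported lazily; needs tsfresh installed

    return extract_features(
        timeseries,
        column_id="id",
        column_sort="time",
        n_jobs=0,  # 0 means no parallelization; everything runs in this process
    )


# Disable autologging before any feature extraction is started.
mlflow_disabled = disable_mlflow_autolog()
```

With the dataset from the question loaded, extracted_features = extract_in_main_process(timeseries) would replace the original extract_features call. Running in a single process is slower on large datasets, so this is a trade-off between speed and readable output.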