How to handle exceptions in PySpark for data science problems

def rename_columnsName(df, columns):  # provide names in dictionary format
    if isinstance(columns, dict):
        for old_name, new_name in columns.items():
            df = df.withColumnRenamed(old_name, new_name)
        df.show()  # show() prints the rows and returns None
        return df
    else:
        raise ValueError("'columns' should be a dict, like {'old_name':'new_name', 'old_name_one more':'new_name_1'}")

2 answers

User

Answer 1 · edited 2024-06-08 04:18:35

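Here is an example of how to test a PySpark function that throws an exception. In this example, we verify that an exception is raised when the sort order is "cats":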

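import pytest
import quinn
from quinn.extensions import *  # patches SparkSession with create_df (older quinn releases)
from pyspark.sql.types import StringType


# `spark` is a pytest fixture (e.g. in conftest.py) that yields a SparkSession.
# The it_ prefix is collected by pytest-describe when nested in a describe_ block.
def it_throws_an_error_if_the_sort_order_is_invalid(spark):
    source_df = spark.create_df(
        [
            ("jose", "oak", "switch"),
            ("li", "redwood", "xbox"),
            ("luisa", "maple", "ps4"),
        ],
        [
            ("name", StringType(), True),
            ("tree", StringType(), True),
            ("gaming_system", StringType(), True),
        ]
    )
    # sort_columns only accepts "asc" or "desc", so "cats" must raise ValueError
    with pytest.raises(ValueError) as excinfo:
        quinn.sort_columns(source_df, "cats")
    assert excinfo.value.args[0] == "['asc', 'desc'] are the only valid sort orders and you entered a sort order of 'cats'"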

Note that the test checks the exact error message that is raised, not just the exception type.

You can pass invalid input to your rename_columnsName function in the same way and verify that the error message is what you expect.
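A minimal pytest sketch of that idea, mirroring the sort-order test above. The function is redefined here so the block is self-contained; it validates `columns` before touching `df` and returns the DataFrame itself rather than `df.show()` (which returns `None`). Because validation fails first, this particular test needs no SparkSession. The test name is hypothetical.

```python
import pytest

EXPECTED_MSG = (
    "'columns' should be a dict, like {'old_name':'new_name', "
    "'old_name_one more':'new_name_1'}"
)


def rename_columnsName(df, columns):
    """Rename DataFrame columns using an {old_name: new_name} dict."""
    if not isinstance(columns, dict):
        # Validate before using df, so the check itself needs no Spark session
        raise ValueError(EXPECTED_MSG)
    for old_name, new_name in columns.items():
        df = df.withColumnRenamed(old_name, new_name)
    return df


def test_rename_columnsName_rejects_non_dict_columns():
    # A list is invalid input; df is never touched, so None is fine here
    with pytest.raises(ValueError) as excinfo:
        rename_columnsName(None, ["old_name", "new_name"])
    # Check the exact message, as in the sort-order example
    assert excinfo.value.args[0] == EXPECTED_MSG
```

Run it with `pytest`; a happy-path test (a real dict of names) would additionally need the `spark` fixture to build a DataFrame.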

Some other tips:

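- Rename columns as shown in the examples here and here. You shouldn't call withColumnRenamed in a loop.
- Write DataFrame transformations in the standard transform format so they can be chained with DataFrame#transform.
- Use pytest-describe to organize these kinds of tests.
- See this test file for a range of examples.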
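The first two tips can be combined: a single `select` with `alias` renames every column in one projection instead of looping over `withColumnRenamed`, and wrapping that in a function that returns a `DataFrame -> DataFrame` callable lets it chain with `DataFrame.transform`. This is a sketch under those assumptions; `rename_columns` is a hypothetical helper, not a PySpark or quinn API, and raising on unknown column names is an added design choice (`withColumnRenamed` is a no-op when the column does not exist).

```python
def rename_columns(mapping):
    """Return a DataFrame -> DataFrame function that renames columns.

    Hypothetical helper written in "transform" (curried) style so it can be
    chained with DataFrame.transform.
    """
    if not isinstance(mapping, dict):
        raise ValueError("'mapping' should be a dict like {'old_name': 'new_name'}")

    def _rename(df):
        unknown = set(mapping) - set(df.columns)
        if unknown:
            # Fail loudly instead of silently ignoring a misspelled column name
            raise ValueError(f"columns not found in DataFrame: {sorted(unknown)}")
        # One select builds a single projection; unmapped columns keep their names
        return df.select([df[c].alias(mapping.get(c, c)) for c in df.columns])

    return _rename
```

Usage with a real DataFrame: `df.transform(rename_columns({"tree": "tree_type"})).transform(...)`, where each `.transform` call takes another function of the same shape.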

User

Answer 2 · edited 2024-06-08 04:18:35

I found a solution: we can handle exceptions in PySpark the same way as in plain Python. For example:

def rename_columnsName(df, columns):  # provide names in dictionary format
    try:
        if isinstance(columns, dict):
            for old_name, new_name in columns.items():
                df = df.withColumnRenamed(old_name, new_name)
            df.show()  # show() prints the rows and returns None
            return df
        else:
            raise ValueError("'columns' should be a dict, like {'old_name':'new_name', "
                             "'old_name_one more':'new_name_1'}")
    except Exception as e:
        # The error is printed, not re-raised, so the caller just gets None back
        print(e)
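Because the `except` block above only prints the error, a caller receives `None` and cannot tell that the rename failed. A minimal sketch of an alternative, assuming you want the failure to stay visible: log the full traceback with Python's standard `logging` module and re-raise. It also returns the DataFrame instead of calling `show()`, leaving display to the caller.

```python
import logging

logger = logging.getLogger(__name__)


def rename_columnsName(df, columns):
    """Rename columns from an {old_name: new_name} dict; log and re-raise on failure."""
    try:
        if not isinstance(columns, dict):
            raise ValueError(
                "'columns' should be a dict, like {'old_name':'new_name', "
                "'old_name_one more':'new_name_1'}"
            )
        for old_name, new_name in columns.items():
            df = df.withColumnRenamed(old_name, new_name)
        return df
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback
        logger.exception("rename_columnsName failed for columns=%r", columns)
        # A bare raise re-raises the original exception with its traceback intact
        raise
```

Keep in mind that Spark evaluates lazily: runtime errors that only surface when an action such as `show()` or a write runs are raised at that call, outside this `try`.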
