源代码（机器学习）（Python）输出不同

2 投票

1 回答

25 浏览

提问于 2025-04-13 00:59

我现在正在做一个小的图像机器学习项目。我找到了一位朋友的Kaggle代码，然后尝试从头开始复制它。不过，在主要部分我就遇到了一个错误。

我觉得可能是我这边的设置有问题，但我搞不清楚具体是什么。

我的代码是：

#Import Libraries

#Data processing modules
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import cv2
#File directory modules
import glob as gb
import os
#Training and testing (machine learning) modules
import tensorflow as tf 
import keras

#Importing the images into the code

trainDataset = 'melanoma_cancer_dataset/train'
testDataset = 'melanoma_cancer_dataset/test'
predictionDataset = 'melanoma_cancer_dataset/skinTest'

#creating empty lists for the images to fall into for processing
training_List = []
testing_list = []
#making a classification dictionary for the two keys, benign and malignant
#used for inserting into the images
diction = {'benign' : 0, 'malignant' : 1}

#Read through the folder's length contents
for folder in os.listdir(trainDataset):
    data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
    print(f'{len(data)} in folder {folder}')
    #read the images, resize them in a uniform order, and store them in the empty lists
    for data in data:
        image = cv2.imread(data)
        imageList = cv2.resize(image(120,120))
        training_List.append(list(imageList))

笔记本的输出显示文件夹里存储的图片/内容数量为0。现在我有点怀疑发生了什么，希望能得到一些解答。谢谢大家。我也在用自己的VScode。

这是我文件的截图：

图像处理文件管理计算机视觉错误调试机器学习 kaggle VSCode 数据集

1 个回答

根据你提供的文件夹结构和代码，问题在于你没有在文件夹路径的末尾加上斜杠。
在你提供的代码中，你试图直接把文件夹名称和路径连接在一起。但是，如果你漏掉了斜杠，或者文件夹变量没有包含末尾的斜杠，就会导致路径不正确。

你可以这样更新路径：

trainDataset = 'melanoma_cancer_dataset/train/'
testDataset = 'melanoma_cancer_dataset/test/'
predictionDataset = 'melanoma_cancer_dataset/skinTest/'

你的代码在这里做的事情是：

for folder in os.listdir(trainDataset):
    data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))

它是去训练数据集的路径，然后使用 os.listdir() 列出那里的文件夹（这些文件夹叫做 malignant 和 benign）。
这些路径被连接在一起，生成最终的图片路径：

data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))

另外，在这一行有一个小的语法错误：

imageList = cv2.resize(image(120,120))

应该是

cv2.resize(image, (120, 120))

还有，你在添加到 training_List 的方式可能也不对。你需要先把 imageList 转换成列表再添加，或者如果你想保持图片数组的结构，可以直接添加 imageList。

完整的更新代码：

# Data processing modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2
# File directory modules
import glob as gb
import os
# Training and testing (machine learning) modules
import tensorflow as tf
import keras

# Directories
trainDataset = 'melanoma_cancer_dataset/train/'
testDataset = 'melanoma_cancer_dataset/test/'
predictionDataset = 'melanoma_cancer_dataset/skinTest/'

# Empty list for the images
training_List = []
testing_list = []

# Classification dictionary
diction = {'benign': 0, 'malignant': 1}

# Read through the folder's contents
for folder in os.listdir(trainDataset):
    # Corrected the path pattern and added a slash
    data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
    print(f'{len(data)} in folder {folder}')
    # Read the images, resize them, and store them in the list
    for file_path in data:
        image = cv2.imread(file_path)
        # Corrected the resize function call
        imageList = cv2.resize(image, (120, 120))
        # Append the image array directly
        training_List.append(imageList)

print(f'Total images in training set: {len(training_List)}')

回答于 2025-04-13 由 Python大师

分享举报

源代码（机器学习）（Python）输出不同

1 个回答

撰写回答