Using data for machine learning in a Jupyter notebook

Posted 2024-06-16 10:43:15


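I'm very new to working with Jupyter notebooks. Overall I like it, although I sometimes run into strange errors that appear on some runs and not on others. For example, I have a dataset that looks like this (output of .head()):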

[Screenshot: output of data.head()]

Now, if I set, say, volume = data["avg_volume"] and then call volume.head(), I get:

[Screenshot: output of volume.head()]

But if I delete that line and put it somewhere else, I sometimes get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-9c1c4c11ebf0> in <module>()
----> 1 volume = data["avg_volume"]
      2 volume.head()

TypeError: 'float' object is not subscriptable

I've noticed that the problem starts after this code runs:

pnl = data["MTM_pnl"]
for data in pnl:
    if(data > 0):
        profit = np.sum(data)
print(profit)

I just don't understand why this happens. It makes no sense to me and makes me think Jupyter notebooks are garbage. Here is the full code:

# coding: utf-8

# In[1]:


# import modules
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
import tensorflow as tf


# In[2]:


# import dataset
data = pd.read_csv('output.csv')
data.head()


# In[3]:


# Goal with data set: The goal is to maximize the PNL column, secondary goals are to minimize MAE (Maximum Adverse Excursion)
# and maximize MFE (Maximum Favorable Excursion). Once a predictable model is established the next step is to work on adding
# alpha by optimizing the stop/take profit logic.
# Assumptions: The thesis is that an earning stock (a stock that has published an earnings report in the past 24 hours) 
# that gaps on open, continues in the direction of the gap.


# In[4]:


# Get statistical information
data.describe()


# In[5]:


# See how correlated each variable is to MTM_pnl
data.corr(method='pearson', min_periods=1)


# In[6]:


# create some histograms
data[data.dtypes[(data.dtypes=="float64")|(data.dtypes=="int64")]
                        .index.values].hist(figsize=[11,11])


# In[7]:


# def maximize_profit(data):
#     LIR = data["LIR"]
#     volume = data["avg_volume"]
#     earnings = data["earning_time"]
volume = data["avg_volume"]
volume.head()


# In[8]:


pnl = data["MTM_pnl"]
for data in pnl:
    if(data > 0):
        profit = np.sum(data)
print(profit)


# In[9]:


volume = data["avg_volume"]
volume.head()

The dataset can be found here. No, the GitHub repository itself isn't relevant; it was just the first place I thought of to host the dataset.


1 answer
User
Answer #1 · Posted 2024-06-16 10:43:15

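In your code, for data in pnl redefines the variable data, so it is no longer a DataFrame and can't be indexed by column name. After the loop finishes, data holds the last value from pnl (a float), which is why data["avg_volume"] raises TypeError: 'float' object is not subscriptable.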
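A minimal reproduction of the name clash, using a small made-up DataFrame in place of output.csv (the column values are illustrative only, not taken from the real dataset). Because a Jupyter kernel keeps variables alive between cells, the error only shows up once the loop cell has actually been executed, which is why it seemed to come and go depending on the order in which cells were run.

```python
import pandas as pd

# Hypothetical stand-in for output.csv; values are made up for illustration
data = pd.DataFrame({"avg_volume": [1000.0, 2500.0, 1800.0],
                     "MTM_pnl": [12.5, -3.0, 7.25]})

pnl = data["MTM_pnl"]
for data in pnl:      # the loop target rebinds the name `data` on every pass
    pass

# `data` now refers to the last element of pnl, a float, not the DataFrame
print(type(data))

try:
    data["avg_volume"]  # same line as in the failing cell
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```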

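By the way, many errors like this get caught when you try to produce a minimal, complete, and verifiable example. You would notice that the bug disappears when you remove the for loop.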
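A sketch of cell 8 without the name clash, assuming the goal was the total of all positive MTM_pnl values. Note that in the original loop, profit = np.sum(data) is applied to a single number on each pass, so profit ends up as the last positive value rather than a sum. The vectorized version uses a boolean mask, the usual pandas idiom; the loop version is kept for comparison, with a loop variable named value (a name chosen here for illustration) so that data stays bound to the DataFrame. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for output.csv
data = pd.DataFrame({"avg_volume": [1000.0, 2500.0, 1800.0],
                     "MTM_pnl": [12.5, -3.0, 7.25]})

pnl = data["MTM_pnl"]

# Option 1: explicit loop whose variable does not shadow `data`
profit_loop = 0.0
for value in pnl:
    if value > 0:
        profit_loop += value

# Option 2: boolean mask keeps only positive rows, then sums them
profit = pnl[pnl > 0].sum()
print(profit)

# `data` is still the DataFrame, so later cells keep working
volume = data["avg_volume"]
print(volume.head())
```

Either option leaves data untouched, so re-running cell 9 in any order no longer fails.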
