Pandas: concatenating DataFrames raises "The truth value of a DataFrame is ambiguous"

Posted 2024-05-21 05:02:43


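My goal is to concatenate multiple DataFrames into a single DataFrame, one per iteration. I am scraping a table and creating a DataFrame from it. Here is the commented code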

def visit_table_links():
    links = grab_initial_links()

    df_final = None
    for obi in links:

        resp = requests.get(obi[1])
        tree = html.fromstring(resp.content)

        dflist = []

        for attr in tree.xpath('//th[contains(normalize-space(text()),  "sometext")]/ancestor::table/tbody/tr'):
            population = attr.xpath('normalize-space(string(.//td[2]))')
            try:
                population = population.replace(',', '')
                population = int(population)
                year = attr.xpath('normalize-space(string(.//td[1]))')
                year = re.findall(r'\d+', year)
                year = ''.join(year)
                year = int(year)


                # append one row to the list: three values, the first two integers, the last a string
                dflist.append([year, population, obi[0]])

            except Exception as e:
                pass

        #creating a dataframe which works fine

        df = pd.DataFrame(dflist, columns = ['Year', 'Population', 'Municipality'])

        #first time df_final is none so just make first df = df_final
        #next time df_final is previous dataframe so concat with the new one

        if df_final != None:
            df_final = pd.concat(df_final, df)
        else:

            df_final = df


visit_table_links()

Here are the DataFrames being produced

First DataFrame

   Year  Population Municipality
0  1970       10193   Cape Coral
1  1980       32103   Cape Coral
2  1990       74991   Cape Coral
3  2000      102286   Cape Coral
4  2010      154305   Cape Coral
5  2018      189343   Cape Coral

Second DataFrame

    Year  Population Municipality
0   1900         383   Clearwater
1   1910        1171   Clearwater
2   1920        2427   Clearwater
3   1930        7607   Clearwater
4   1940       10136   Clearwater
5   1950       15581   Clearwater
6   1960       34653   Clearwater
7   1970       52074   Clearwater
8   1980       85170   Clearwater
9   1990       98669   Clearwater
10  2000      108787   Clearwater
11  2010      107685   Clearwater
12  2018      116478   Clearwater

Trying to concatenate them produces this error

ValueError                                Traceback (most recent call last)
<ipython-input-93-429ad4d9bce8> in <module>
     75 
     76 
---> 77 visit_table_links()
     78 
     79 

<ipython-input-93-429ad4d9bce8> in visit_table_links()
     62         print(df)
     63 
---> 64         if df_final != None:
     65             df_final = pd.concat(df_final, df)
     66         else:

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __nonzero__(self)
   1476         raise ValueError("The truth value of a {0} is ambiguous. "
   1477                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478                          .format(self.__class__.__name__))
   1479 
   1480     __bool__ = __nonzero__

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

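I have searched many threads and exhausted my resources. I am new to pandas and do not understand why this happens

At first I thought it was caused by duplicate indexes, so I used uuid.uuid4.int() as the index with df.set_index('ID', drop=True, inplace=True), but the same error still occurs

Any guidance would be very helpful, thanks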

Edit 1:

Sorry for being unclear. The error is raised by

df_final = pd.concat(df_final, df)

when I try to concatenate the current DataFrame with the previous one

Edit 2:

Passing the arguments as a list

df_final = pd.concat([df_final, df])

Still the same error


2 Answers

Try using len(df_final) == 0 instead of df_final != None
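
To illustrate why the check matters, here is a minimal sketch (the one-row DataFrame reuses a value from the question, and the helper name is_set is hypothetical). The traceback in the question points at the line "if df_final != None:", not at pd.concat, which would explain why changing the concat call alone did not change the error. Comparing a DataFrame with != works element by element and yields another DataFrame, and an if statement cannot reduce that to a single True or False.

```python
import pandas as pd

df = pd.DataFrame(
    {"Year": [1970], "Population": [10193], "Municipality": ["Cape Coral"]}
)

# `df != None` is an element-wise comparison: the result is a DataFrame of booleans.
comparison = df != None
print(type(comparison).__name__)

# Using that boolean DataFrame in an `if` asks pandas for one truth value,
# which it refuses to guess.
try:
    if df != None:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)


def is_set(frame):
    """Identity check: never triggers element-wise comparison (hypothetical helper)."""
    return frame is not None


print(is_set(df), is_set(None))
```

Note that len(df_final) == 0 only works when df_final starts out as an empty DataFrame; with df_final = None, len(None) raises a TypeError, so "df_final is not None" is probably the closer drop-in replacement for the original check.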

Also, in the pd.concat call, try passing the arguments as a list, i.e. df_final = pd.concat([df_final, df])
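
As a small sketch of the difference (the two one-row frames are cut-down stand-ins built from values in the question): pd.concat expects its first argument to be an iterable of pandas objects, so passing two DataFrames positionally fails with a TypeError, while wrapping them in a list works.

```python
import pandas as pd

first = pd.DataFrame({"Year": [1970], "Population": [10193]})
second = pd.DataFrame({"Year": [1900], "Population": [383]})

# Two positional DataFrames: the second one is not treated as "another frame".
try:
    pd.concat(first, second)
    positional_ok = True
except TypeError as exc:
    positional_ok = False
    print(exc)

# A list (or any iterable) of frames is what concat is designed to take.
combined = pd.concat([first, second])
print(len(combined))  # 2
```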

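Following Sayan's suggestion of len(df_final) == 0

I had an idea: would it make any difference whether I initially set df_final to None or to an empty DataFrame with the same columns?

It turns out that it does

Here is the new code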

def visit_table_links():
    links = grab_initial_links()

    df_final = pd.DataFrame(columns=['Year', 'Population', 'Municipality'])
    for obi in links:
        resp = requests.get(obi[1])
        tree = html.fromstring(resp.content)

        dflist = []

        for attr in tree.xpath('//th[contains(normalize-space(text()),  "sometext")]/ancestor::table/tbody/tr'):
            population = attr.xpath('normalize-space(string(.//td[2]))')
            try:
                population = population.replace(',', '')
                population = int(population)
                year = attr.xpath('normalize-space(string(.//td[1]))')
                year = re.findall(r'\d+', year)
                year = ''.join(year)
                year = int(year)

                dflist.append([year, population, obi[0]])

            except Exception as e:
                pass

        df = pd.DataFrame(dflist, columns = ['Year', 'Population', 'Municipality'])

        df_final = pd.concat([df_final, df])

    # return the combined DataFrame so the caller can use it
    return df_final

visit_table_links()

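For some reason, setting df_final = None causes pandas to throw that error, even though in the first iteration, when df_final is None, I assign df_final = df

So in the next iteration, what df_final was initially should not matter

For some reason, it does matter

So using this line, df_final = pd.DataFrame(columns=['Year', 'Population', 'Municipality']), in place of this line, df_final = None, solved the problem

Here is the merged DataFrame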
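
Another way to avoid needing any initial value for df_final is to collect the per-page DataFrames in a list and concatenate once after the loop. This is only a sketch of that pattern, not the code from this answer: the scraped dictionary is a hypothetical stand-in for the rows returned by the scraping loop, filled with a few values from the question.

```python
import pandas as pd

columns = ["Year", "Population", "Municipality"]

# Hypothetical stand-in for the scraped rows, keyed by municipality name.
scraped = {
    "Cape Coral": [[1970, 10193], [1980, 32103]],
    "Clearwater": [[1900, 383], [1910, 1171]],
}

frames = []
for municipality, rows in scraped.items():
    data = [[year, population, municipality] for year, population in rows]
    frames.append(pd.DataFrame(data, columns=columns))

# One concat at the end: no None sentinel and no empty starter frame to compare against.
df_final = pd.concat(frames)
print(df_final)
```

With this layout there is no "is df_final set yet?" check at all, so the ambiguous truth-value comparison never comes up.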

    Year Population   Municipality
0   1970      10193     Cape Coral
1   1980      32103     Cape Coral
2   1990      74991     Cape Coral
3   2000     102286     Cape Coral
4   2010     154305     Cape Coral
5   2018     189343     Cape Coral
0   1900        383     Clearwater
1   1910       1171     Clearwater
2   1920       2427     Clearwater
3   1930       7607     Clearwater
4   1940      10136     Clearwater
5   1950      15581     Clearwater
6   1960      34653     Clearwater
7   1970      52074     Clearwater
8   1980      85170     Clearwater
9   1990      98669     Clearwater
10  2000     108787     Clearwater
11  2010     107685     Clearwater
12  2018     116478     Clearwater
0   1970       1489  Coral Springs
1   1980      37349  Coral Springs
2   1990      79443  Coral Springs
3   2000     117549  Coral Springs
4   2010     121096  Coral Springs
5   2018     133507  Coral Springs
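
The merged output above repeats index labels (0 to 5, then 0 to 12, and so on) because pd.concat keeps each frame's own index by default. If a single running index is wanted, passing ignore_index=True is one option; a minimal sketch with two shortened frames built from the values above:

```python
import pandas as pd

first = pd.DataFrame(
    {"Year": [1970, 1980], "Population": [10193, 32103], "Municipality": ["Cape Coral"] * 2}
)
second = pd.DataFrame(
    {"Year": [1900, 1910], "Population": [383, 1171], "Municipality": ["Clearwater"] * 2}
)

# Default: each frame keeps its own labels, so they repeat.
repeated = pd.concat([first, second])
print(list(repeated.index))    # [0, 1, 0, 1]

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = pd.concat([first, second], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```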
