Reading values from a file in list format

Posted 2024-04-25 14:59:04


This may already have been answered; if so, please point me to the solution page with a link.

I have a file containing details of the 100 largest countries by total area (land and water surface):

('1','Russia','17,098,242(6,601,668)','Asia/Europe','Azerbaijan, Belarus, China, Estonia, Finland, Georgia, Kazakhstan, Latvia, Lithuania, Mongolia, North Korea, Norway, Poland, Ukraine')
('2','Canada','9,984,670(3,855,100)','North America','United States')
('3','United States(incl. overseas territories)','9,857,348(3,805,943)','North America','Canada, Mexico')
('4','China','9,596,961(3,705,407)','Asia','Afghanistan, Bhutan, India, Kazakhstan, Kyrgyzstan, Laos, Mongolia, Myanmar, Nepal, North Korea, Pakistan, Russia, Tajikistan, Vietnam')
('5','Brazil','8,515,770(3,287,957)','South America','Argentina, Bolivia, Colombia, France (French Guiana), Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela'), 
....
....

Yes, the input file has ( and ) at the beginning and end of each line.

Any help would be greatly appreciated.

So far, I have tried this:

onlyCountries = 'allcountries.txt'
print([x.split(',')[1] for x in open(onlyCountries)])

But this gives me the output:

["'Russia'", "'Canada'", "'United States(incl. overseas territories)'", "'China'", "'Brazil'"...]

Notice the extra double quotes I get from the sample input file given above? I would like output like this:

['Russia','Canada','United States','China','Brazil',....]

2 Answers

You can use pandas for this:

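import pandas as pd

# read_html returns a list of tables found on the page; take the first one
df = pd.read_html("https://www.countries-ofthe-world.com/largest-countries.html", header=0, index_col=0)[0]
# drop everything from the first "(" onward, e.g. "(incl. overseas territories)"
clist = df.Country.str.replace(r"\(.*", "", regex=True).tolist()
print(clist)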

Output (recorded under Python 2, hence the u prefixes):

[u'Russia', u'Canada', u'United States ', u'China', u'Brazil', u'Australia ', u'India', u'Argentina', u'Kazakhstan', u'Algeria', u'Democratic Republic of the Congo', u'Denmark ', u'Saudi Arabia', u'Mexico', u'Indonesia', u'Sudan', u'Libya', u'Iran', u'Mongolia', u'Peru', u'Chad', u'Niger', u'Angola', u'Mali', u'South Africa', u'Colombia', u'Ethiopia', u'Bolivia', u'Mauritania', u'Egypt', u'Tanzania', u'Nigeria', u'Venezuela', u'Namibia', u'Mozambique', u'Pakistan', u'Turkey', u'Chile', u'Zambia', u'Myanmar', u'Afghanistan', u'France ', u'Somalia', u'Central African Republic', u'South Sudan', u'Ukraine', u'Madagascar', u'Botswana', u'Kenya', u'Yemen', u'Thailand', u'Spain', u'Turkmenistan', u'Cameroon', u'Papua New Guinea', u'Sweden', u'Uzbekistan', u'Morocco', u'Iraq', u'Paraguay', u'Zimbabwe', u'Japan', u'Germany', u'Republic of the Congo', u'Finland ', u'Vietnam', u'Malaysia', u'Norway ', u"Cote d'Ivoire", u'Poland', u'Oman', u'Italy', u'Philippines', u'Ecuador', u'Burkina Faso', u'New Zealand ', u'Gabon', u'United Kingdom ', u'Guinea', u'Uganda', u'Ghana', u'Romania', u'Laos', u'Guyana', u'Belarus', u'Kyrgyzstan', u'Senegal', u'Syria', u'Cambodia', u'Uruguay', u'Suriname', u'Tunisia', u'Nepal', u'Bangladesh', u'Tajikistan', u'Greece', u'Nicaragua', u'North Korea', u'Malawi', u'Eritrea']
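
You can also read the file line by line and strip the quotes from the second field:

countries = []
with open('text.txt', 'r') as f:
    for line in f:
        fields = line.split(',')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # the second field looks like 'Russia'; strip the surrounding single quotes
        countries.append(fields[1].strip().strip("'"))
print(countries)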
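
The double quotes in the question's output are only how Python displays a string that itself contains single-quote characters. Because each line of the file looks like a Python tuple literal, another option is to parse the whole line rather than split on commas. The following is a minimal sketch, not a definitive solution: the helper name country_names is made up for illustration, the sample lines are copied from the question, and it assumes that no country name in the file contains an unescaped apostrophe (such a line would make ast.literal_eval raise an error).

```python
import ast
import re


def country_names(lines):
    """Return the cleaned country name from each tuple-formatted line."""
    names = []
    for line in lines:
        # drop the newline, surrounding spaces, and any trailing comma
        text = line.strip().rstrip(',')
        if not text.startswith('('):
            continue  # skip blank lines and filler such as "...."
        # safely parse the line into a tuple of strings (no eval)
        record = ast.literal_eval(text)
        # remove a parenthetical note such as "(incl. overseas territories)"
        names.append(re.sub(r'\s*\(.*\)', '', record[1]).strip())
    return names


# Sample lines copied from the question; a real run would pass the open file.
sample = [
    "('2','Canada','9,984,670(3,855,100)','North America','United States')\n",
    "('3','United States(incl. overseas territories)','9,857,348(3,805,943)','North America','Canada, Mexico')\n",
    "('5','Brazil','8,515,770(3,287,957)','South America','Argentina, Bolivia, Colombia, France (French Guiana), Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela'), \n",
    "....\n",
]

print(country_names(sample))  # ['Canada', 'United States', 'Brazil']
```

To run it against the real file, pass the open file object, for example country_names(open('allcountries.txt')). Parsing the full tuple also keeps commas inside a field (such as the neighbour list) from shifting the column positions, which a plain split(',') cannot guarantee.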
