Splitting the input file gives the wrong result

#prompt the user for the file name of keywords file
keywordsinputfile = input("Please input file name: ")
tweetsinputfile = input ("Please input tweets file name: ")

#try to open given input file
try:
    k=open(keywordsinputfile, "r")
except IOError:
    print ("{} file not found".format(keywordsinputfile))
try:
    t=open(tweetsinputfile, "r")
except IOError:
    print ("{} file not found".format(tweetsinputfile))
    exit()

def main (): #main function
    kinputfile = open(keywordsinputfile, "r") #Opens File for keywords
    tinputfile = open(tweetsinputfile, "r") #Opens file for tweets
    HappyWords = {}
    HappyValues = {}
    for line in kinputfile: #splits keywords
        entries = line.split(",")
        hvwords = str(entries[0])
        hvalues = int(entries[1])
        HappyWords["keywords"] = hvwords #stores Happiness keywords
        HappyValues["values"] = hvalues #stores Happiness Values
    for line in tinputfile:
        twoparts = line.split("]") #splits tweet file by ] creating a location and tweet parts, tweets are ignored for now
        startlocation = (twoparts[0]) #takes the first part (the locations)
    def testing(startlocation):
        for line in startlocation:
            intlocation = line.split("[") #then gets rid of the "[" at the beginning of the locations
            print (intlocation)
    testing(startlocation)

main()

['', ''] ['2'] ['7'] ['.'] ['9'] ['9'] ['4'] ['1'] ['9'] ['5'] ['6'] ['9'] ['9'] ['9'] ['9'] ['9'] ['9'] ['9'] ['9'] [','] [' '] ['-'] ['8'] ['2'] ['.'] ['5'] ['6'] ['9'] ['4'] ['3'] ['4'] ['9'] ['0'] ['0'] ['0'] ['0'] ['0'] ['0'] ['0'] ['5']

2 Answers

User

#1 · Edited 2024-05-26 19:54:13

You just need to insert a few tracing print statements to show what is happening. Here is what I did:

for line in tinputfile:
    twoparts = line.split("]")  #splits tweet file by ] creating a location and tweet parts, tweets are ignored for now
    startlocation = (twoparts[0])   #takes the first part (the locations)
    print ("     -")
    print ("twoparts", twoparts) 
    print ("startlocation", startlocation)
def testing(startlocation):
    for line in startlocation:     
        print ("line", line)
        intlocation = line.split("[")      #then gets rid of the "[" at the beginning of the locations
        print ("intlocation", intlocation)
testing(startlocation)

... and got a trace that begins:

     -
twoparts ['[41.298669629999999, -81.915329330000006', " 6 2011-08-28 19:02:36 Work needs to fly by ... I'm so excited to see Spy Kids 4 with then love of my life ... ARREIC\n"]
startlocation [41.298669629999999, -81.915329330000006
     -
twoparts ['[33.702900329999999, -117.95095704000001', " 6 2011-08-28 19:03:13 Today is going to be the greatest day of my life. Hired to take pictures at my best friend's gparents 50th anniversary. 60 old people. Woo.\n"]
startlocation [33.702900329999999, -117.95095704000001
     -
twoparts ['[38.809954939999997, -77.125144050000003', ' 6 2011-08-28 19:07:05 I just put my life in like 5 suitcases\n']
startlocation [38.809954939999997, -77.125144050000003
     -
twoparts ['[27.994195699999999, -82.569434900000005', ' 6 2011-08-28 19:08:02 @Miss_mariiix3 is the love of my life\n']
startlocation [27.994195699999999, -82.569434900000005
line [
intlocation ['', '']
line 2
intlocation ['2']
line 7

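Analysis:

There are two basic problems:

1. The processing statement testing(startlocation) sits outside the loop, so it only ever works on the last input line.
2. As you can see in the "twoparts" output, the coordinates you want are still a string, not a list of floats. You need to strip the bracket, split the two values apart, and then convert them to float. As written, when you loop over startlocation you are iterating through the characters of the string, not through two floats.

Also: why define a function inside the loop? That redefines the function on every pass. Move it ahead of the main program; that is where well-behaved functions live. :-)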
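The first problem can be reproduced in isolation. This is a minimal sketch with two made-up lines shaped like the tweets file; `sample_lines`, `after_loop`, and `inside_loop` are illustrative names, not part of the original program.

```python
# Two made-up lines shaped like the tweets file (illustrative data only).
sample_lines = [
    "[41.29, -81.91] 6 2011-08-28 19:02:36 first tweet\n",
    "[27.99, -82.56] 6 2011-08-28 19:08:02 last tweet\n",
]

# Processing placed after the loop: only the final value survives.
for line in sample_lines:
    startlocation = line.split("]")[0]
after_loop = [startlocation]

# Processing placed inside the loop: every line is handled.
inside_loop = []
for line in sample_lines:
    startlocation = line.split("]")[0]
    inside_loop.append(startlocation)

print(after_loop)   # one entry, taken from the last line
print(inside_loop)  # one entry per input line
```

The loop variable simply keeps whatever it was assigned on the final pass, which is why the original program printed characters from the last tweet only.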

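Added information on point 2:

Let's walk through your code using the last line of the sample input. Start at the top of the loop over the lines in tinputfile: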

twoparts = line.split("]")

twoparts is now a pair of elements, both strings:

['[27.994195699999999, -82.569434900000005',
 ' 6 2011-08-28 19:08:02 @Miss_mariiix3 is the love of my life\n']

startlocation is then set to the first element:

'[27.994195699999999, -82.569434900000005'

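Next comes the redundant redefinition of the function testing, which changes nothing. The following statement calls testing, so we enter the routine.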

testing(startlocation)
for line in startlocation:

The important point here is that startlocation is a string:

'[27.994195699999999, -82.569434900000005'

... so when you execute this loop, you iterate over the string one character at a time.
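A minimal sketch of that behaviour, using a shortened stand-in for the coordinate string rather than the real data (the name `pieces` is only for illustration):

```python
startlocation = "[27.9, -82.5"  # shortened stand-in for the real string

# Iterating over a str yields one-character strings, so split("[")
# is applied to each character separately.
pieces = [ch.split("[") for ch in startlocation]

print(pieces[:3])
```

Only the "[" character itself splits into two empty strings; every other character comes back as a one-element list, which matches the shape of the output in the question.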

Correction:

Honestly, I don't know what testing is supposed to do. It looks as though all you need is to strip off that bracket:

intlocation = startlocation.split('[')[1]   # [1] selects the text after the bracket

... or just

intlocation = startlocation[1:]

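If instead you want the float values as a two-element list, strip the bracket as above, split the elements at the comma, and then convert each one to float: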

intlocation = [ float(x) for x in startlocation[1:].split(',') ]
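Putting the two corrections together, one possible shape for the loop is sketched below. The helper name `parse_location` and the list `sample_lines` are hypothetical; the sample lines are taken from the trace above, with the first tweet's text shortened.

```python
def parse_location(line):
    """Return [latitude, longitude] as floats from one tweet line."""
    startlocation = line.split("]")[0]  # '[lat, lon'
    # Drop the leading '[', split at the comma, convert each piece.
    return [float(x) for x in startlocation[1:].split(",")]

sample_lines = [
    "[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 Work needs to fly by\n",
    "[27.994195699999999, -82.569434900000005] 6 2011-08-28 19:08:02 @Miss_mariiix3 is the love of my life\n",
]

locations = []
for line in sample_lines:
    # The call is inside the loop, so every line is processed.
    locations.append(parse_location(line))

print(locations)
```

The function is defined once, before the loop, and float() tolerates the space that follows the comma, so no extra stripping is needed.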

User
#2 · Edited 2024-05-26 19:54:13

It looks like what you really need is ast.literal_eval.
import ast

for line in tinputfile:
    twoparts = line.split("]")
    startlocation = ast.literal_eval(twoparts[0] + ']')  # add the ']' back in
    # startlocation is now a list of two coordinates.
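A self-contained version of the same idea, run on a single line from the sample data instead of an open file (the variable `line` stands in for one iteration of the loop):

```python
import ast

line = "[27.994195699999999, -82.569434900000005] 6 2011-08-28 19:08:02 @Miss_mariiix3 is the love of my life\n"

twoparts = line.split("]")
# Restore the ']' removed by split so the text is a valid list literal.
startlocation = ast.literal_eval(twoparts[0] + "]")

print(startlocation)
```

ast.literal_eval only evaluates Python literals such as lists, tuples, and numbers, so it is a safer choice than eval for text read from a file.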
But you would probably be better off using re.
> import re
> example = '[27.994195699999999, -82.569434900000005] 6 2011-08-28 19:02:36 text text text text'
> fmt = re.split(r'\[(-?[0-9.]+),\s?(-?[0-9.]+).\s*\d\s*(\d{4}-\d{1,2}-\d{1,2}\s+\d{2}:\d{2}:\d{2})',example)
> fmt
['', '27.994195699999999', '-82.569434900000005', '2011-08-28 19:02:36', ' text text text text']
> location = (float(fmt[1]), float(fmt[2]))
> time = fmt[3]
> text = fmt[4]
What's going on?
Each (...) group in the regular expression tells re.split to return that piece as its own element of the result list.
The first and second groups are -?[0-9.]+. That matches an optional minus sign followed by digits and decimal points (we could be stricter, but you don't really need to be).
The next () group matches any date: \d{4} means "four digits" and \d{1,2} means "one or two digits".
Or you can use both approaches together:
> from ast import literal_eval
> fmt = re.split(r'\[(-?[0-9.]+,\s?-?[0-9.]+).\s*\d\s*(\d{4}-\d{1,2}-\d{1,2}\s+\d{2}:\d{2}:\d{2})',example)
> fmt # watch what happens when I change the grouping.
['', '27.994195699999999, -82.569434900000005', '2011-08-28 19:02:36', ' text text text text']
> location = literal_eval('(' + fmt[1] + ')')
> time = fmt[2]
> text = fmt[3]
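For use inside the file loop, the same pattern can be wrapped in a function. The sketch below is one possible variant, not the answer's exact code: it uses re.match instead of re.split, escapes the closing bracket explicitly, and assumes the number after the bracket may have more than one digit. `TWEET_RE` and `parse_tweet` are hypothetical names.

```python
import re

TWEET_RE = re.compile(
    r"\[(-?[0-9.]+),\s*(-?[0-9.]+)\]"               # [lat, lon]
    r"\s*\d+\s+"                                    # numeric field (ignored; assumed 1+ digits)
    r"(\d{4}-\d{1,2}-\d{1,2}\s+\d{2}:\d{2}:\d{2})"  # timestamp
    r"\s?(.*)"                                      # tweet text
)

def parse_tweet(line):
    """Return ((lat, lon), time, text), or None if the line does not match."""
    m = TWEET_RE.match(line)
    if m is None:
        return None
    lat, lon, time, text = m.groups()
    return (float(lat), float(lon)), time, text

example = "[27.994195699999999, -82.569434900000005] 6 2011-08-28 19:02:36 text text text text"
print(parse_tweet(example))
print(parse_tweet("not a tweet line"))
```

Returning None for lines that do not match lets the caller skip malformed input instead of failing on a missing index.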
