Reading a .dat file and performing calculations in Python

Posted 2024-04-19 12:42:07


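I need to read a .dat file in Python. It has 12 columns in total and millions of rows. I need to divide columns 2, 3, and 4 by column 1. Do I have to delete all the columns I don't need before loading the .dat file? If not, how can I select specific columns and have Python do the calculation?

An example of the .dat file is data.dat

I'm new to Python, so some guidance on opening, reading, and calculating would be appreciated.

Following your suggestions, I have added the code I am using as a beginner: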

from sys import argv

import pandas as pd



script, filename = argv

txt = open(filename)

print "Here's your file %r:" % filename
print txt.read()

def your_func(row):
    return row['x-momentum'] / row['mass']

columns_to_keep = ['mass', 'x-momentum']
dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

And the error I get:

Traceback (most recent call last):
  File "flash.py", line 18, in <module>
    dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 529, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 295, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 612, in __init__
    self._make_engine(self.engine)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1119, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 518, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5030)
ValueError: No columns to parse from file

3 answers

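Consider using the general read_table() function (of which read_csv() is a special case). With the whitespace separator sep='\s+', pandas can import this particular .dat file easily. Column-wise calculations also do not need a function passed to apply().

NumPy is used below to guard the division against a zero denominator. Note that in the sample .dat file the first column is #time, and columns 2, 3, and 4 are x-momentum, y-momentum, and mass. That differs from the expression in your code, so adjust as needed.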

import numpy as np
import pandas as pd

# Load only the four columns needed; the file is whitespace-separated
columns_to_keep = ['#time', 'x-momentum', 'y-momentum', 'mass']
df = pd.read_table("flash.dat", sep=r"\s+", usecols=columns_to_keep)

# Divide each quantity by time, leaving NaN where time is not positive
df['mass_per_time'] = np.where(df['#time'] > 0, df['mass']/df['#time'], np.nan)
df['x-momentum_per_time'] = np.where(df['#time'] > 0, df['x-momentum']/df['#time'], np.nan)
df['y-momentum_per_time'] = np.where(df['#time'] > 0, df['y-momentum']/df['#time'], np.nan)
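
To try the approach above without the real file, here is a minimal sketch that runs on a tiny in-memory sample. The sample values and the extra energy column are invented for illustration, and read_csv() with sep=r"\s+" is used in place of read_table(). Instead of np.where, it masks non-positive times as NaN before dividing, which is one possible way to avoid a zero denominator.

```python
import io

import pandas as pd

# Invented stand-in for flash.dat: whitespace-separated, header on the first line.
sample = io.StringIO(
    "#time x-momentum y-momentum mass energy\n"
    "0.0 1.0 2.0 4.0 9.0\n"
    "2.0 3.0 5.0 8.0 9.0\n"
    "4.0 6.0 2.0 10.0 9.0\n"
)

# usecols skips the unwanted column at parse time, so nothing has to be
# deleted from the file beforehand.
columns_to_keep = ["#time", "x-momentum", "y-momentum", "mass"]
df = pd.read_csv(sample, sep=r"\s+", usecols=columns_to_keep)

# Replace non-positive times with NaN so the division never sees a zero.
time = df["#time"].where(df["#time"] > 0)
for name in ["x-momentum", "y-momentum", "mass"]:
    df[name + "_per_time"] = df[name] / time

print(df)
```

The first sample row has a time of zero, so its three derived values come out as NaN rather than infinity.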

After looking at your flash.dat file, it is clear that it needs some cleanup before you process it. The following code converts it to a CSV file:

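import csv

# Read flash.dat into a list of lists, splitting each line on whitespace
with open("./flash.dat") as src:
    datContent = [line.strip().split() for line in src]

# Write it as a new CSV file ("wb" is the Python 2 csv convention;
# on Python 3 use open("./flash.csv", "w", newline="") instead)
with open("./flash.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)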
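
The conversion above is written for Python 2, which matches the traceback in the question. On Python 3, a sketch of the same idea could look like the following. The function name dat_to_csv is made up for this example, and it writes one line at a time rather than building the whole list in memory, which may matter for a file with millions of rows.

```python
import csv


def dat_to_csv(dat_path, csv_path):
    """Rewrite a whitespace-separated text file as CSV, one line at a time."""
    # newline="" lets the csv module control line endings (Python 3 convention).
    with open(dat_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()
            if fields:  # skip blank lines
                writer.writerow(fields)
```

Calling dat_to_csv("./flash.dat", "./flash.csv") would then produce the same CSV file as the code above.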

Now use pandas to compute the new column.

import pandas as pd

def your_func(row):
    return row['x-momentum'] / row['mass']

columns_to_keep = ['#time', 'x-momentum', 'mass']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

print(dataframe)
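
With millions of rows, calling a Python function on every row through apply() can be slow. Dividing whole columns gives the same values. The sketch below compares the two on invented sample data standing in for flash.csv.

```python
import io

import pandas as pd

# Invented stand-in for the converted flash.csv.
sample = io.StringIO(
    "#time,x-momentum,y-momentum,mass\n"
    "1.0,2.0,0.5,4.0\n"
    "2.0,9.0,0.5,3.0\n"
)

dataframe = pd.read_csv(sample, usecols=["#time", "x-momentum", "mass"])


def your_func(row):
    return row["x-momentum"] / row["mass"]


# Row-wise version, as in the answer above.
row_wise = dataframe.apply(your_func, axis=1)

# Column-wise version: one division over the whole column.
dataframe["new_column"] = dataframe["x-momentum"] / dataframe["mass"]

# Both approaches produce the same values.
same = row_wise.tolist() == dataframe["new_column"].tolist()
print(dataframe)
print(same)
```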

Try the following:

datContent = [i.strip().split() for i in open("filename.dat").readlines()]

Your data will then be in a list of lists, one inner list per line.
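
The values in that list are still strings, so they need converting before any arithmetic. A rough sketch follows, assuming the first line is a header and that the first four columns are time, x-momentum, y-momentum, and mass, as described in another answer. The sample lines are invented.

```python
# Invented sample lines standing in for the contents of filename.dat.
lines = [
    "#time x-momentum y-momentum mass",
    "2.0 3.0 5.0 8.0",
    "4.0 6.0 2.0 10.0",
]

datContent = [line.strip().split() for line in lines]

# Separate the header row from the data rows.
header, rows = datContent[0], datContent[1:]

results = []
for row in rows:
    time = float(row[0])
    # Divide columns 2, 3 and 4 by column 1, skipping rows where time is zero.
    if time != 0:
        results.append([float(value) / time for value in row[1:4]])

print(results)
```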

If you want something more sophisticated, you can use pandas; see the linked cookbook.
