Deriving multiple columns from single-column data in Python 3

Posted 2024-06-09 13:25:12


Source data:

20  7369    CLERK
30  7499    SALESMAN
30  7521    SALESMAN
20  7566    MANAGER
30  7654    SALESMAN
30  7698    MANAGER
10  7782    MANAGER
20  7788    ANALYST
10  7839    PRESIDENT
30  7844    SALESMAN
20  7876    CLERK
30  7900    CLERK
20  7902    ANALYST

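Requirement: 012345678901234567890123456789

Hi everyone,

I am reading the data from this .dat file into Python. Each line is 30 characters long from left to right (012345678901234567890123456789). My requirement is to derive 3 columns: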

From left to right: positions 1 to 4 (length 4) as DEPTNO
From left to right: positions 5 to 13 (length 9) as EMPNO
From left to right: positions 14 to 30 (length 17) as JOB

I tried this code:

import pandas as pd    
with open('Emp.dat','r') as f:
    next(f) # skip first row
    df = pd.DataFrame(l.rstrip().split() for l in f)

Desired output:

DEPTNO  EMPNO   JOB
20      7369    CLERK
30      7499    SALESMAN
30      7521    SALESMAN
20      7566    MANAGER
30      7654    SALESMAN
30      7698    MANAGER
10      7782    MANAGER
20      7788    ANALYST
10      7839    PRESIDENT
30      7844    SALESMAN
20      7876    CLERK
30      7900    CLERK
20      7902    ANALYST

2 answers

You can use the columns parameter:

import pandas as pd    
with open('Emp.dat','r') as f:
    next(f) # skip first row
    df = pd.DataFrame((l.rstrip().split() for l in f), columns=['DEPTNO', 'EMPNO', 'JOB'])

Output:

   DEPTNO EMPNO        JOB
0      20  7369      CLERK
1      30  7499   SALESMAN
2      30  7521   SALESMAN
3      20  7566    MANAGER
4      30  7654   SALESMAN
5      30  7698    MANAGER
6      10  7782    MANAGER
7      20  7788    ANALYST
8      10  7839  PRESIDENT
9      30  7844   SALESMAN
10     20  7876      CLERK
11     30  7900      CLERK
12     20  7902    ANALYST
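
The split() approach above relies on whitespace between fields. Because the question describes fixed character positions instead, a variant could slice each line by offset. The following is only a minimal sketch: the sample rows, the SLICES table, and the parse_fixed_width helper are hypothetical names introduced for illustration, and the offsets assume the 4/9/17 layout described in the question.

```python
import io

# Hypothetical sample, padded to the 4/9/17 character layout from the question.
rows = [("20", "7369", "CLERK"), ("30", "7499", "SALESMAN"), ("10", "7839", "PRESIDENT")]
sample = "".join(f"{d:<4}{e:<9}{j:<17}\n" for d, e, j in rows)

# Zero-based, end-exclusive offsets for each field (an assumption based on the question).
SLICES = {"DEPTNO": (0, 4), "EMPNO": (4, 13), "JOB": (13, 30)}

def parse_fixed_width(handle):
    """Slice each non-blank line by character position and strip the padding."""
    return [
        {name: line[start:end].strip() for name, (start, end) in SLICES.items()}
        for line in handle
        if line.strip()  # skip blank lines
    ]

records = parse_fixed_width(io.StringIO(sample))
print(records[0])
```

Passing records to pd.DataFrame(records) should then give the three named columns; the values remain strings unless converted.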

Here are two approaches.

  1. Use df = pd.read_csv('emp.dat', sep=r'\s+') to split each line on any run of whitespace characters (for more details, see "How to make separator in pandas read_csv more flexible wrt whitespace?").

  2. Use fixed-width fields: df = pd.read_fwf(io.StringIO(t), widths=[4,9,9])

In both cases, the first row is used as the header row. To ignore it entirely, use pd.read...(..., header=None, skiprows=[0]).
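
As a rough illustration of both approaches, here is a small self-contained sketch. The in-memory sample text and the names list are assumptions made for the example (the real input would be the .dat file), and header=None with names= is used here so that no data row is consumed as a header. The widths 4, 9 and 17 follow the 30-character layout from the question; trailing padding is omitted from the sample.

```python
import io
import pandas as pd

# Hypothetical sample: DEPTNO padded to 4 characters, EMPNO to 9, JOB left unpadded.
rows = [("20", "7369", "CLERK"), ("30", "7499", "SALESMAN"), ("10", "7839", "PRESIDENT")]
text = "".join(f"{d:<4}{e:<9}{j}\n" for d, e, j in rows)
names = ["DEPTNO", "EMPNO", "JOB"]

# Approach 1: split each line on runs of whitespace.
df_csv = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, names=names)

# Approach 2: cut each line at fixed character widths.
df_fwf = pd.read_fwf(io.StringIO(text), widths=[4, 9, 17], header=None, names=names)

print(df_csv)
print(df_fwf)
```

The whitespace split is simpler but would break if a field ever contained a space (for example a two-word job title); the fixed-width reader does not have that problem as long as the widths match the file.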
