Selecting specific columns from scattered data in Python

Posted 2024-05-14 09:40:33


I have a space-delimited file from which I need to extract the data in specific columns. My file looks like this:

chr1.trna124 (75052562-75052633)        Length: 72 bp
Type: His       Anticodon: ATG at 33-35 (75052594-75052596)     Score: 35.2
HMM Sc=29.40    Sec struct Sc=5.80
     *    |    *    |    *    |    *    |    *    |    *    |    *    |
Seq: TGGGGTATAGCTCCATGGTAGAGCGCATGCCTATGAAGCGTGAGGtCCTGGGTTTGATCCCCAGAACCACAA
Str: >>>>>>>..>>>>.......<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<.

chr1.trna131 (78297795-78297866)        Length: 72 bp
Type: Pro       Anticodon: AGG at 33-35 (78297827-78297829)     Score: 39.1
HMM Sc=24.30    Sec struct Sc=14.80
     *    |    *    |    *    |    *    |    *    |    *    |    *    |
Seq: GGCTTGTTGGTCTAGGGGTATGATTCTCACTTAGGGTGTGAGAGGtCCTGGGTTCAAATCTTGGACGAGTCC
Str: >>>>>>>..>>>>.......<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<.

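From the above I want to extract the ID column (i.e. "chr1.trna124") and, from the second line of each record, the Anticodon field: out of "ATG at 33-35" I only need "33-35". This should be done for every record until the end of the file. What is the best way to do this? I am trying to merge the lines from one line matching the pattern "chr" up to the next "chr" line and then pick out the columns. I tried following "How to grab the lines AFTER a matched line in python", but I could not even get that to work. Is there a better way? Is the approach different in Python 2.x and 3.x?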


Tags: file, type, sec, length, seq, struct, at, score
1 answer
User
#1 · Posted 2024-05-14 09:40:33
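You can use re.findall with a pattern that starts with "(?ms)" and is built from three parts:
 (1) "^([\w.]+)\s\(\d+-\d+\)" matches the ID at the start of a line, followed by the coordinate range in parentheses; the group captures the ID itself;
 (2) ".+?" lazily matches everything between the ID and "Anticodon"; because of the 's' in "(?ms)", '.' matches newlines too;
 (3) "(Anticodon:.+?)$" matches from "Anticodon" to the end of that line. To keep only the position range (e.g. "33-35"), narrow this part to "Anticodon:\s+\w+\s+at\s+(\d+-\d+)".
 Because of the 'm' in "(?ms)", '^' and '$' match at the start and end of every line, not only at the start and end of the whole string.
 Concatenate the parts in the order (1), (2), (3) to get the full expression.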
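
For reference, here is one way the assembled expression might look. This is only a sketch of the approach described in this answer, not a tested solution for every tRNAscan-style file. The names SAMPLE, RECORD_RE and extract_ids_and_positions are made up for illustration, the sample text is an abridged copy of the two records from the question, and the last part of the pattern is narrowed so that it captures only the position range after "at".

```python
import re

# Abridged copy of the two records shown in the question (illustration only).
SAMPLE = """\
chr1.trna124 (75052562-75052633)        Length: 72 bp
Type: His       Anticodon: ATG at 33-35 (75052594-75052596)     Score: 35.2
HMM Sc=29.40    Sec struct Sc=5.80

chr1.trna131 (78297795-78297866)        Length: 72 bp
Type: Pro       Anticodon: AGG at 33-35 (78297827-78297829)     Score: 39.1
HMM Sc=24.30    Sec struct Sc=14.80
"""

# Hypothetical name; adjacent raw strings are concatenated into one pattern.
RECORD_RE = re.compile(
    r"(?ms)"                               # m: ^ and $ work per line; s: '.' matches newlines
    r"^([\w.]+)\s\(\d+-\d+\)"              # group 1: the ID at the start of a record
    r".+?"                                 # lazily skip ahead to the Anticodon field
    r"Anticodon:\s+\w+\s+at\s+(\d+-\d+)"   # group 2: the position range after "at"
)


def extract_ids_and_positions(text):
    """Return a list of (ID, position range) tuples, one per matched record."""
    return RECORD_RE.findall(text)


pairs = extract_ids_and_positions(SAMPLE)
```

With two capture groups, re.findall returns one (ID, range) tuple per record. For a real file you would read the whole content into a single string first (for example with the file object's read() method) and pass that string to the function. One caveat: if a record has no "Anticodon:" field, the lazy ".+?" will run on into the next record and pair the ID with the wrong range. The re calls used here are available in both Python 2.x and 3.x.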
