For the following .bin file (available for download here):
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA AD AE AG AI AN BI BL CF CH CL CS CT EC HI IM IP ME PD PK PO RE SD ST TO TU UR
ENTRY = A-23187|T109|T195|LAB|NRW|NLM (1991)|900308|abbcdef
ENTRY = A23187|T109|T195|LAB|NRW|UNK (19XX)|741111|abbcdef
ENTRY = Antibiotic A23187|T109|T195|NON|NRW|NLM (1991)|900308|abbcdef
ENTRY = A 23187
ENTRY = A23187, Antibiotic
MN = D03.633.100.221.173
PA = Anti-Bacterial Agents
PA = Calcium Ionophores
MH_TH = FDA SRS (2014)
MH_TH = NLM (1975)
ST = T109
ST = T195
N1 = 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S-(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))-
RN = 37H9VM9WZL
RR = 52665-69-7 (Calcimycin)
PI = Antibiotics (1973-1974)
PI = Carboxylic Acids (1973-1974)
MS = An ionophorous, polyether antibiotic from Streptomyces chartreusensis. It binds and transports CALCIUM and other divalent cations across membranes and uncouples oxidative phosphorylation while inhibiting ATPase of rat liver mitochondria. The substance is used mostly as a biochemical tool to study the role of divalent cations in various biological systems.
OL = use CALCIMYCIN to search A 23187 1975-90
PM = 91; was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
HN = 91(75); was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
MR = 20160527
DA = 19741119
DC = 1
DX = 19840101
UI = D000001
*NEWRECORD
RECTYPE = D
MH = Temefos
AQ = AA AD AE AG AI AN BL CF CH CL CS CT EC HI IM IP ME PD PK RE SD ST TO TU UR
ENTRY = Abate|T109|T131|TRD|NRW|NLM (1996)|941114|abbcdef
ENTRY = Difos|T109|T131|TRD|NRW|UNK (19XX)|861007|abbcdef
ENTRY = Temephos|T109|T131|TRD|EQV|NLM (1996)|941201|abbcdef
MN = D02.705.400.625.800
MN = D02.705.539.345.800
MN = D02.886.300.692.800
PA = Insecticides
MH_TH = FDA SRS (2014)
MH_TH = INN (19XX)
MH_TH = USAN (1974)
ST = T109
ST = T131
N1 = Phosphorothioic acid, O,O'-(thiodi-4,1-phenylene) O,O,O',O'-tetramethyl ester
RN = ONP3ME32DL
RR = 3383-96-8 (Temefos)
AN = for use to kill or control insects, use no qualifiers on the insecticide or the insect; appropriate qualifiers may be used when other aspects of the insecticide are discussed such as the effect on a physiologic process or behavioral aspect of the insect; for poisoning, coordinate with ORGANOPHOSPHATE POISONING
PI = Insecticides (1966-1971)
MS = An organothiophosphate insecticide.
PM = 96; was ABATE 1972-95 (see under INSECTICIDES, ORGANOTHIOPHOSPHATE 1972-90)
HN = 96; was ABATE 1972-95 (see under INSECTICIDES, ORGANOTHIOPHOSPHATE 1972-90)
MR = 20130708
DA = 19990101
DC = 1
DX = 19910101
UI = D000002
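Although the file has a .bin extension, the sample above is plain text. Each record starts with *NEWRECORD, and every following line is a `KEY = value` pair in which keys such as ENTRY, MN and PA can repeat. Below is a minimal sketch of parsing that layout into one dict per record. It assumes every field line contains " = " and that repeated keys should be collected into lists. The function name `parse_records` is hypothetical and is not part of any MeSH tooling.

```python
from collections import defaultdict


def parse_records(lines):
    """Group 'KEY = value' lines into one dict per *NEWRECORD block.

    Hypothetical helper: every key maps to a list because keys such as
    ENTRY, MN and PA can appear several times in one record.
    """
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop Windows (\r\n) or Unix (\n) endings
        if line == "*NEWRECORD":
            current = defaultdict(list)
            records.append(current)
        elif current is not None and " = " in line:
            key, value = line.split(" = ", 1)  # split on the first ' = ' only
            current[key].append(value)
    return records


sample = """*NEWRECORD
RECTYPE = D
MH = Temefos
MN = D02.705.400.625.800
MN = D02.705.539.345.800
UI = D000002
""".splitlines(keepends=True)

records = parse_records(sample)
print(records[0]["MH"])  # ['Temefos']
print(records[0]["MN"])  # ['D02.705.400.625.800', 'D02.705.539.345.800']
```

Collecting values into lists makes repeated fields such as MN straightforward to handle: a record's tree numbers are simply `record["MN"]`.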
I have the following Python code:
import re

terms = {}
numbers = {}

meshFile = 'd2017.bin'
with open(meshFile, mode='rb') as file:
    mesh = file.readlines()

outputFile = open('mesh.txt', 'w')

for line in mesh:
    meshTerm = re.search(b'MH = (.+)$', line)
    if meshTerm:
        term = meshTerm.group(1)
    meshNumber = re.search(b'MN = (.+)$', line)
    if meshNumber:
        number = meshNumber.group(1)
        numbers[str(number)] = term
        if term in terms:
            terms[term] = terms[term] + ' ' + str(number)
        else:
            terms[term] = str(number)

cumlist = []
keylist = terms.keys()
for key in keylist:
    #print('THE ORIGIN FOR ', key, file=outputFile)
    item_list = terms[key].split(" ")
    for phrase in item_list:
        cumlist.append(phrase)
print(cumlist)

for item in cumlist:
    print(numbers[str(item)], '\n', item, file=outputFile)
The output looks like this:
b'Calcimycin\r'
b'D03.633.100.221.173\r'
b'Temefos\r'
b'D02.705.400.625.800\r'
b'Temefos\r'
b'D02.705.539.345.800\r'
b'Temefos\r'
b'D02.886.300.692.800\r'
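The `b'...'` wrapper and the trailing `\r` have two separate causes. First, the file is opened in binary mode (`'rb'`), so the regex groups are `bytes` objects, and calling `str()` on `bytes` returns their repr instead of decoding them. Second, the `.+` group also captures the carriage return from the file's Windows (`\r\n`) line endings, because `.` matches `\r` (it only stops at `\n`). The small illustration below assumes UTF-8 (or plain ASCII) content for the decode step:

```python
import re

# One line as returned by readlines() on a file opened in 'rb' mode.
line = b"MH = Calcimycin\r\n"

match = re.search(b"MH = (.+)$", line)
term = match.group(1)  # bytes, and '.+' has swallowed the '\r'

print(str(term))                     # b'Calcimycin\r'  (repr, not decoded text)
print(term.decode("utf-8").strip())  # Calcimycin
```

Decoding and stripping each value fixes the appearance. Opening the file in text mode avoids both problems in the first place.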
How can I reformat the output to look like this:
Calcimycin
D03.633.100.221.173
Temefos
D02.705.400.625.800
D02.705.539.345.800
D02.886.300.692.800
Thanks.
You can try this regex:
Demo
Sample code: (Run it here)
Sample output:
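Separately from the regex answer, here is a minimal sketch of one way to produce the requested layout. It reads the file as text, remembers the most recent MH heading, and appends each MN tree number to that heading. Each heading is therefore printed once, followed by all of its tree numbers. It assumes, as in the sample records, that a record's MH line precedes its MN lines and that the file is ASCII/UTF-8 text. The helper names `group_tree_numbers` and `write_grouped` are hypothetical.

```python
import re
from io import StringIO

# Anchored at line start so that "MH_TH = ..." lines are not matched.
MH_RE = re.compile(r"^MH = (.+)$")
MN_RE = re.compile(r"^MN = (.+)$")


def group_tree_numbers(lines):
    """Return {heading: [tree numbers]} in file order (hypothetical helper).

    Assumes each record's MH line comes before its MN lines.
    """
    terms = {}
    term = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # text, not bytes, and no stray '\r'
        mh = MH_RE.match(line)
        if mh:
            term = mh.group(1)
            continue
        mn = MN_RE.match(line)
        if mn and term is not None:
            terms.setdefault(term, []).append(mn.group(1))
    return terms


def write_grouped(terms, out):
    """Write each heading once, then each of its tree numbers on its own line."""
    for term, tree_numbers in terms.items():
        print(term, file=out)
        for number in tree_numbers:
            print(number, file=out)


sample = [
    "*NEWRECORD\r\n",
    "MH = Calcimycin\r\n",
    "MN = D03.633.100.221.173\r\n",
    "*NEWRECORD\r\n",
    "MH = Temefos\r\n",
    "MN = D02.705.400.625.800\r\n",
    "MN = D02.705.539.345.800\r\n",
    "MN = D02.886.300.692.800\r\n",
]
buf = StringIO()
write_grouped(group_tree_numbers(sample), buf)
print(buf.getvalue(), end="")

# For the real file (file names taken from the question):
# with open("d2017.bin", encoding="utf-8") as src, open("mesh.txt", "w") as dst:
#     write_grouped(group_tree_numbers(src), dst)
```

Keying a dict by heading replaces the question's `numbers`/`terms` pair and the space-joined strings. Because dicts keep insertion order (Python 3.7+), headings come out in file order.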