Is it possible to convert specific text data to CSV format and give it header names using Python?

PMID- 20301691 STAT- Publisher DA - 20100320 DRDT- 20210311 CTDT- 20000204 PB - University of Washington, Seattle DP - 1993 TI - Classic Galactosemia and Clinical Variant Galactosemia BTI - GeneReviews((R)) AB - CLINICAL CHARACTERISTICS: The term "galactosemia" refers to disorders of galactose metabolism that include classic galactosemia, clinical variant galactosemia, and biochemical variant galactosemia (not covered in this chapter). This GeneReview focuses on: Classic galactosemia, which can result in life-threatening complications including feeding problems, failure to thrive, hepatocellular damage, bleeding, and E coli sepsis in untreated infants. If a lactose-restricted diet is provided during the first ten days of life, the neonatal signs usually quickly resolve and the complications of liver failure, sepsis, and neonatal death are prevented; however, despite adequate treatment from an early age, children with classic galactosemia remain at increased risk for developmental delays, speech problems (termed childhood apraxia of speech and dysarthria), and abnormalities of motor function. Almost all females with classic galactosemia manifest hypergonadatropic hypogonadism or premature ovarian insufficiency (POI). Clinical variant galactosemia, which can result in life-threatening complications including feeding problems, failure to thrive, hepatocellular damage including cirrhosis, and bleeding in untreated infants. This is exemplified by the disease that occurs in African Americans and native Africans in South Africa. Persons with clinical variant galactosemia may be missed with newborn screening as the hypergalactosemia is not as marked as in classic galactosemia and breath testing is normal. If a lactose-restricted diet is provided during the first ten days of life, the severe acute neonatal complications are usually prevented. 
African Americans with clinical variant galactosemia and adequate early treatment do not appear to be at risk for long-term complications, including POI. DIAGNOSIS/TESTING: The diagnosis of classic galactosemia and clinical variant galactosemia is established by detection of elevated erythrocyte galactose-1-phosphate concentration, reduced erythrocyte galactose-1-phosphate uridylyltranserase (GALT) enzyme activity, and/or biallelic pathogenic variants in GALT. In classic galactosemia, erythrocyte galactose-1-phosphate is usually >10 mg/dL and erythrocyte GALT enzyme activity is absent or barely detectable. In clinical variant galactosemia, erythrocyte GALT enzyme activity is close to or above 1% of control values but probably never >10%-15%. However, in African Americans with clinical variant galactosemia, the erythrocyte GALT enzyme activity may be absent or barely detectable but is often much higher in liver and in intestinal tissue (e.g., 10% of control values). Virtually 100% of infants with classic galactosemia or clinical variant galactosemia can be detected in newborn screening programs that include testing for galactosemia in their panel. However, infants with clinical variant galactosemia may be missed if the program only measures blood total galactose level and not erythrocyte GALT enzyme activity. MANAGEMENT: Treatment of manifestations: Standard of care in any newborn who is "screen-positive" for galactosemia is immediate dietary intervention while diagnostic testing is under way. Once a diagnosis is confirmed, restriction of galactose intake is continued and all milk products are replaced with lactose-free formulas (e.g., Isomil((R)) or Prosobee((R))) containing non-galactose carbohydrates; dietary restrictions on all lactose-containing foods and other dairy products should continue throughout life, although management of the diet becomes less important after infancy and early childhood. 
In rare instances, cataract surgery may be needed in the first year of life. Childhood apraxia of speech and dysarthria require expert speech therapy. Developmental assessment at age one year by a psychologist and/or developmental pediatrician is recommended in order to formulate a treatment plan with the speech therapist and treating physician. For school-age children, an individual education plan and/or professional help with learning skills and special classrooms as needed. Hormone replacement therapy as needed for delayed pubertal development and/or primary or secondary amenorrhea. Stimulation with follicle-stimulating hormone may be useful in producing ovulation in some women. Prevention of secondary complications: Recommended calcium, vitamin D, and vitamin K intake to help prevent decreased bone mineralization; standard treatment for gastrointestinal dysfunction. Surveillance: Biochemical genetics clinic visits every three months for the first year of life or as needed depending on the nature of the potential acute complications; every six months during the second year of life; yearly thereafter. Routine monitoring for: the accumulation of toxic analytes (e.g., erythrocyte galactose-1-phosphate and urinary galactitol); cataracts; speech and development; movement disorder; POI; nutritional deficiency; and osteoporosis. Agents/circumstances to avoid: Breast milk, proprietary infant formulas containing lactose, cow's milk, dairy products, and casein or whey-containing foods; medications with lactose and galactose. Evaluation of relatives at risk: To allow for earliest possible diagnosis and treatment of at-risk sibs: Perform prenatal diagnosis when the GALT pathogenic variants in the family are known; or If prenatal testing has not been performed, test the newborn for either the family-specific GALT pathogenic variants or erythrocyte GALT enzyme activity. 
Pregnancy management: Women with classic galactosemia should maintain a lactose-restricted diet during pregnancy. GENETIC COUNSELING: Classic galactosemia and clinical variant galactosemia are inherited in an autosomal recessive manner. Couples who have had one affected child have a 25% chance of having an affected child in each subsequent pregnancy. Molecular genetic carrier testing for at-risk sibs and prenatal testing for pregnancies at increased risk are an option if the GALT pathogenic variants in the family are known. If the GALT pathogenic variants in a family are not known, prenatal testing can rely on assay of GALT enzyme activity in cultured amniotic fluid cells. CI - Copyright (c) 1993-2021, University of Washington, Seattle. GeneReviews is a registered trademark of the University of Washington, Seattle. All rights reserved. FED - Adam, Margaret P ED - Adam MP FED - Ardinger, Holly H ED - Ardinger HH FED - Pagon, Roberta A

2 Answers

User

Answer 1 · edited 2024-06-08 23:30:41

Perhaps.

Given a file such as text.csv containing the following text:

PMID- 20301691 
STAT- Publisher
DA  - 20100320
DRDT- 20210311
CTDT- 20000204
PB  - University of Washington, Seattle
DP  - 1993
TI  - Classic Galactosemia and Clinical Variant Galactosemia
BTI - GeneReviews((R))


PMID- 33237688
STAT- Publisher
DA  - 20201126
CTDT- 20201125
PB  - University of Washington, Seattle
DP  - 1993
TI  - MIRAGE Syndrome
BTI - GeneReviews((R))

Try this:

import pandas as pd

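# split each line on '-' into a tag and a value; blank lines are skipped by default
# note: this only works as long as no value contains a hyphen itself
df = pd.read_csv('text.csv', sep='-', header=None)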

# clean up
df[0] = df[0].str.strip()
df[1] = df[1].str.strip()

# create a dictionary
data = df.groupby(0)[1].apply(list).to_dict()

# create a dataframe; wrapping each list in a Series pads shorter columns with NaN
# borrowed from https://stackoverflow.com/a/19736406/9192284
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})

print(df)

Output:

                BTI      CTDT        DA    DP      DRDT  \
0  GeneReviews((R))  20000204  20100320  1993  20210311   
1  GeneReviews((R))  20201125  20201126  1993       NaN   

                                  PB      PMID       STAT  \
0  University of Washington, Seattle  20301691  Publisher   
1  University of Washington, Seattle  33237688  Publisher   

                                                  TI  
0  Classic Galactosemia and Clinical Variant Gala...  
1                                    MIRAGE Syndrome
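
One caveat with the approach above: grouping by tag pairs values with rows by position rather than by record, so a field missing from an earlier record would shift later values up a row, and any value that contains a hyphen breaks the sep='-' split. Below is a minimal sketch of a record-aware alternative. It assumes records are separated by blank lines and that every line has the form TAG - value. The function name parse_records and the '; ' joiner for repeated tags (such as FED or ED in the question's data) are my own choices, not part of the original answer.

```python
import io
import pandas as pd

def parse_records(handle):
    """Parse 'TAG - value' lines into one dict per blank-line-separated record."""
    records, current = [], {}
    for raw in handle:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current record (if there is one).
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first hyphen only, so hyphens inside values survive.
        tag, sep, value = line.partition("-")
        if not sep:
            continue  # not a 'TAG - value' line; skipped in this sketch
        tag, value = tag.strip(), value.strip()
        # Repeated tags within one record are joined rather than overwritten.
        current[tag] = f"{current[tag]}; {value}" if tag in current else value
    if current:
        records.append(current)
    return records

# A shortened copy of the sample file from this answer.
sample = """PMID- 20301691
DRDT- 20210311
TI  - Classic Galactosemia and Clinical Variant Galactosemia


PMID- 33237688
TI  - MIRAGE Syndrome
"""

df = pd.DataFrame(parse_records(io.StringIO(sample)))
print(df)
```

Because each record becomes its own dict, pandas fills the missing DRDT of the second record with NaN in the correct row, wherever the gap occurs.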

User

Answer 2 · edited 2024-06-08 23:30:41

The simplest approach I know of is:

Read the data file with:

with open("sepsis2015.txt") as file:
    lines = file.readlines()
lines = ''.join(lines).split('\n\n')

This gives you a list of records:

['PMID- 20301691 \nSTAT- Publisher\nDA  - 20100320\nDRDT- 20210311\nCTDT- 20000204\nPB  - University of Washington, Seattle\nDP  - 1993\nTI  - Classic Galactosemia and Clinical Variant Galactosemia\nBTI - GeneReviews((R))', '\nPMID- 33237688\nSTAT- Publisher\nDA  - 20201126\nCTDT- 20201125\nPB  - University of Washington, Seattle\nDP  - 1993\nTI  - MIRAGE Syndrome\nBTI - GeneReviews((R))']
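
Note that the second element above starts with '\n', because the sample file has two blank lines between records and split('\n\n') consumes only one of them. A minimal sketch of a more tolerant split, assuming records are separated by one or more blank or whitespace-only lines (the text literal is a shortened stand-in for the file contents):

```python
import re

# Shortened stand-in for the file contents, with two blank lines between records.
text = (
    "PMID- 20301691 \nSTAT- Publisher\n"
    "\n\n"
    "PMID- 33237688\nSTAT- Publisher\n"
)

# Split wherever one or more blank (or whitespace-only) lines occur,
# and drop any empty blocks that remain.
records = [block for block in re.split(r"\n\s*\n", text.strip()) if block]
print(records)
```

The per-line parsing in the next step works either way, since lines without a hyphen are skipped, but this keeps the record list free of stray leading newlines and empty entries.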

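Convert the data stored in the lines list into a data dictionary:

# split each line on the first '-' only, and strip whitespace from both the tag and the value
data = {i: {item.split('-', 1)[0].strip(): item.split('-', 1)[1].strip() for item in row.split('\n') if '-' in item} for i, row in enumerate(lines)}

So you get: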

{0: {'PMID': '20301691', 'STAT': 'Publisher', 'DA': '20100320', 'DRDT': '20210311', 'CTDT': '20000204', 'PB': 'University of Washington, Seattle', 'DP': '1993', 'TI': 'Classic Galactosemia and Clinical Variant Galactosemia', 'BTI': 'GeneReviews((R))'}, 1: {'PMID': '33237688', 'STAT': 'Publisher', 'DA': '20201126', 'CTDT': '20201125', 'PB': 'University of Washington, Seattle', 'DP': '1993', 'TI': 'MIRAGE Syndrome', 'BTI': 'GeneReviews((R))'}}

Finally, convert this dictionary to a pandas.DataFrame with:
```
df = pd.DataFrame.from_dict(data, orient = 'index')
```

Full code

import pandas as pd

with open(r'data/data.csv') as file:
    lines = file.readlines()
lines = ''.join(lines).split('\n\n')

# split each line on the first '-' only, and strip whitespace from both the tag and the value
data = {i: {item.split('-', 1)[0].strip(): item.split('-', 1)[1].strip() for item in row.split('\n') if '-' in item} for i, row in enumerate(lines)}
print(data)
df = pd.DataFrame.from_dict(data, orient = 'index')
print(df)

       PMID       STAT        DA      DRDT      CTDT                                 PB    DP                                                      TI               BTI
0  20301691  Publisher  20100320  20210311  20000204  University of Washington, Seattle  1993  Classic Galactosemia and Clinical Variant Galactosemia  GeneReviews((R))
1  33237688  Publisher  20201126       NaN  20201125  University of Washington, Seattle  1993                                         MIRAGE Syndrome  GeneReviews((R))
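
Neither answer shows the last step the question asks about: writing a CSV with chosen header names. A minimal sketch of that step follows, assuming a DataFrame shaped like the one above. The header_names mapping is purely illustrative (pick whatever titles you need), and the in-memory buffer stands in for a real output path such as "output.csv".

```python
import io
import pandas as pd

# Two rows shaped like the DataFrame above (a subset of columns for brevity).
df = pd.DataFrame(
    {
        "PMID": ["20301691", "33237688"],
        "DP": ["1993", "1993"],
        "TI": [
            "Classic Galactosemia and Clinical Variant Galactosemia",
            "MIRAGE Syndrome",
        ],
    }
)

# Hypothetical header names; adjust the mapping to the titles you want.
header_names = {"PMID": "pmid", "DP": "publication_date", "TI": "title"}

buffer = io.StringIO()  # replace with a file path to write to disk
df.rename(columns=header_names).to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False keeps the row numbers out of the file; values that contain commas (for example the PB field) are quoted automatically by to_csv.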
