I have a pandas DataFrame that was built by appending a series of lists. It consists mostly of strings that contain a delimiter ('\n'), as shown below:
content
0 American Regent/Luitpold (Reverified 10/26/2016)\nCompany Contact Information:\n800-645-1706\n\nPresentation Availability and Estimated Shortage Duration Related Information Shortage Reason (per FDASIA)\n2 mL single-dose vial, package of 10 (NDC 00517-2502-10) Available for NDC 00517-2502-10. Demand increase for the drug
1 Amphastar Pharmaceuticals, Inc./IMS (Reverified 08/18/2016)\nCompany Contact Information:\n800-423-4136\n\nPresentation Availability and Estimated Shortage Duration Related Information Shortage Reason (per FDASIA)\nCalcium Chloride Inj. USP, 10%, 10mL Luer-Jet Prefilled Syringe, (NDC 0548-3304-00), new (NDC 76329-3304-1) Product available Demand increase for the drug\nHospira, Inc. (Reverified 10/21/2016)
2 American Regent/Luitpold (Reverified 10/26/2016)\nCompany Contact Information:\n800-645-1706\n\nPresentation Availability and Estimated Shortage Duration Related Information Shortage Reason (per FDASIA)\n10%, 50 mL vial; Calcium (0.465 mEq/mL), Preservative Free (NDC 0517-3950-25) Unavailable for NDC 00517-3950-25. No product available for release. No plan to manufacture. American Regent is currently not releasing Calcium Gluconate 50 mL vial (NDC 00517-3950-25). Other\n10%, 100 mL vial; Calcium (0.465 mEq/mL), Preservative Free (NDC 0517-3900-25) Unavailable for NDC 00517-3900-25. American Regent is currently not releasing Calcium Gluconate 100 mL vial (NDC 0517-3900-25). Other\nFresenius Kabi USA, LLC (Revised 11/01/2016)
.......
n Apotex Corp. (Revised 05/16/2016)\nCompany Contact Information:\n800-706-5575\n\nPresentation\n1gm; (25 Vials) (NDC 60505-0749-5)\n1gm; (25 Vials)(NDC 60505-6093-5)\n10 gm; (10 Vials) (NDC 60505-0769-0)\n10 gm; (10 Vials) (NDC 60505-6094-0)\nNote:\nAvailable\nB. Braun Medical Inc. (Revised 05/16/2016)\n\n\nBaxter Healthcare (Revised 05/16/2016)\n\n\nFresenius Kabi USA, LLC (Revised 05/16/2016)\n\n\nHospira, Inc. (Revised 05/16/2016)\n\n\nSagent Pharmaceuticals (Revised 05/16/2016)\n\n\nSandoz (Revised 05/16/2016)\n\n\nWest-Ward Pharmaceuticals (Revised 05/16/2016)\n\n\nWG Critical Care (Revised 05/16/2016)
n-1 Apotex Corp. (Reverified 10/26/2016)\nCompany Contact Information:\n800-706-5575\n\nPresentation Availability and Estimated Shortage Duration Related Information Shortage Reason (per FDASIA)\nCefepime for Injection, USP 1 gm (10 Vials) (NDC 60505-6030-4) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for Injection, USP 2 gm (10 Vials)(NDC 60505-6031-4) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for injection, USP 1 gm (10 Vials) (NDC 60605-0834-04) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for injection, USP 2 gm (10 Vials) (NDC 60505-0681-4) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for injection, USP 1 gm (1 Vial) (NDC 60505-0834-00) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nCefepime for injection, USP 2 gm (10 Vials) (NDC 60505-0681-0) On backorder. Shortage duration is unknown. Requirements relating to complying with current good manufacturing practices (cGMP).\nB. Braun Medical Inc. (New 07/22/2015)\n\n\nBaxter Healthcare (Reverified 10/25/2016)\n\n\nFresenius Kabi USA, LLC (Revised 11/01/2016)\n\n\nHospira, Inc. (Reverified 10/21/2016)\n\n\nSagent Pharmaceuticals (Revised 08/29/2016)\n\n\nWG Critical Care (Revised 06/08/2016)
How can I split the content of the DataFrame into more columns on the newline \n?
I tried:
df['col'] = df['content'].str.split('\n', expand=True)
Obviously this fails with "Wrong number of items passed 45, placement implies 1". Also, because I am doing:
df = pd.DataFrame(lis, columns = ['content'])
I cannot use sep, as suggested in the similar question here.
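The error in the attempt above arises because str.split with expand=True returns a DataFrame with one column per piece, which cannot be assigned to the single column df['col']. A minimal sketch of one way around it, assuming the strings contain real newline characters; the sample rows and the part_ column prefix are made up for illustration:

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data.
lis = [
    "Company A (Reverified 10/26/2016)\nCompany Contact Information:\n800-000-0000",
    "Company B (Revised 05/16/2016)\nCompany Contact Information:",
]
df = pd.DataFrame(lis, columns=["content"])

# expand=True yields one column per piece; rows with fewer pieces are padded
# with missing values.
parts = df["content"].str.split("\n", expand=True)

# Give the new columns readable names (part_0, part_1, ...) and attach them
# to the original frame instead of assigning them to a single column.
parts = parts.add_prefix("part_")
result = df.join(parts)
```

Joining keeps the original content column next to the pieces, which makes it easy to check that nothing was lost in the split.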
EDIT: Following the discussion, here is the updated code for loading multiple files into a single DataFrame:
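[code block missing from the source]
Key things to note: the "\n" is literal text in the raw data, so Python reads it in as "\\n". The sep keyword of read_csv interprets a separator longer than one character as a regular expression rather than as literal text, which is why you had problems with it.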
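The difference between a literal backslash followed by n and a real newline can be illustrated with a small sketch. The sample strings are made up. One assumption worth flagging: pandas str.split may treat a pattern longer than one character as a regular expression, so the literal two-character text is escaped with re.escape before splitting:

```python
import re
import pandas as pd

# Hypothetical sample: the raw string holds a backslash and the letter n,
# not a newline character.
raw = pd.Series([r"Company A\nContact\n800-000-0000"])

# Splitting on a real newline finds nothing to split on.
unsplit = raw.str.split("\n", expand=True)

# Escaping the two-character text makes the pattern match a literal
# backslash followed by n.
literal_split = raw.str.split(re.escape("\\n"), expand=True)
```

If the first split leaves everything in a single column, that is a hint the data holds the two-character text rather than real line breaks.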
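The updated code outputs the file and line number that each string came from. It assumes the files variable contains a list of file names with paths.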
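Since the original code is not preserved, the following is only a sketch of how such a loader might look, not the author's version. It assumes each file is plain text with a literal "\n" marker inside each line, and that "line number" means the line's position within its file. The load_files function and the demo file are hypothetical:

```python
import os
import tempfile
import pandas as pd

def load_files(files):
    """Split every line of every file on the literal text backslash + n."""
    rows = []
    for path in files:
        with open(path, encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                # Drop the real line ending, then split on the two-character
                # marker that is part of the data itself.
                for piece in line.rstrip("\n").split("\\n"):
                    rows.append({"file": path, "line": line_no, "content": piece})
    return pd.DataFrame(rows, columns=["file", "line", "content"])

# Small demonstration with a temporary file (made-up content).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as fh:
        fh.write("Company A\\nContact\\n800-000-0000\n")
        fh.write("Company B\\nContact\n")
    files = [demo_path]
    df = load_files(files)
```

This version produces one row per piece rather than one column per piece, which avoids ragged rows when the number of pieces differs between lines.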