How do I delete lines that contain a tab?
I have a file like this:
0 absinth
Bohemian-style absinth
Bohemian-style or Czech-style absinth (also called anise-free absinthe, or just “absinth” without the “e”) is an ersatz version of the traditional spirit absinthe, though is more accurately described as a kind of wormwood bitters.
It is produced mainly in the Czech Republic, from which it gets its designations as “Bohemian” or “Czech,” although not all absinthe from the Czech Republic is Bohemian-style.
1 acidophilus milk
Sweet acidophilus milk is consumed by individuals who suffer from lactose intolerance or maldigestion, which occurs when enzymes (lactase) cannot break down lactose (milk sugar) in the intestine.
To aid digestion in those with lactose intolerance, milk with added bacterial cultures such as "Lactobacillus acidophilus" ("acidophilus milk") and bifidobacteria ("a/B milk") is available in some areas.
High Activity of Lactobacillus Acidophilus Milk
2 adobo
Adobo
Adobo (Spanish: marinade, sauce, or seasoning) is the immersion of raw food in a stock (or sauce) composed variously of paprika, oregano, salt, garlic, and vinegar to preserve and enhance its flavor.
In the Philippines, the name "adobo" was given by the Spanish colonists to an indigenous cooking method that also uses vinegar, which although superficially similar had developed independent of Spanish influence.
The desired output has the lines containing tabs removed, i.e.:
Bohemian-style absinth
Bohemian-style or Czech-style absinth (also called anise-free absinthe, or just “absinth” without the “e”) is an ersatz version of the traditional spirit absinthe, though is more accurately described as a kind of wormwood bitters.
It is produced mainly in the Czech Republic, from which it gets its designations as “Bohemian” or “Czech,” although not all absinthe from the Czech Republic is Bohemian-style.
Sweet acidophilus milk is consumed by individuals who suffer from lactose intolerance or maldigestion, which occurs when enzymes (lactase) cannot break down lactose (milk sugar) in the intestine.
To aid digestion in those with lactose intolerance, milk with added bacterial cultures such as "Lactobacillus acidophilus" ("acidophilus milk") and bifidobacteria ("a/B milk") is available in some areas.
High Activity of Lactobacillus Acidophilus Milk
Adobo
Adobo (Spanish: marinade, sauce, or seasoning) is the immersion of raw food in a stock (or sauce) composed variously of paprika, oregano, salt, garlic, and vinegar to preserve and enhance its flavor.
In the Philippines, the name "adobo" was given by the Spanish colonists to an indigenous cooking method that also uses vinegar, which although superficially similar had developed independent of Spanish influence.
I can do the following in Python to get the same result:
with open('file.txt', 'r') as fin, open('file2.txt', 'w') as fout:
    for line in fin:
        if '\t' in line:
            continue
        else:
            fout.write(line)
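But I have millions of lines, and this is not very efficient. So I tried using cut to drop the second column and then removing the lines that are left with a single character: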
$ cut -f1 WIKI_WN_food | awk 'length>1' | less
What is a more suitable way to get the desired output?
Is there a more efficient way than the cut+awk pipeline I showed above?
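Your code is fine. As an optimization, you can try searching for the tab only at the beginning of each line.
If the length of the substring is chosen according to the largest record number, this may make a difference for the long lines that do not match, who knows...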
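The idea of looking for the tab only near the start of each line could be sketched as follows. This is a minimal sketch, not the answerer's original code: the function name `drop_tabbed_lines` and the `prefix_len` parameter are hypothetical, and it assumes that in every line to be dropped the tab follows a short numeric index, as in the sample data.

```python
def drop_tabbed_lines(lines, prefix_len=10):
    """Yield only the lines with no tab in their first prefix_len characters.

    Assumption: in lines to be dropped, the tab follows a short index,
    so it always falls inside the prefix. prefix_len is a hypothetical knob.
    """
    for line in lines:
        # Bounded str.find avoids scanning long lines and avoids copying a slice
        if line.find('\t', 0, prefix_len) == -1:
            yield line


sample = [
    "0\tabsinth\n",
    "Bohemian-style absinth\n",
    "1\tacidophilus milk\n",
    "High Activity of Lactobacillus Acidophilus Milk\n",
]
kept = list(drop_tabbed_lines(sample))
print(kept)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly and the result handed to `fout.writelines(...)`. Note that a tab occurring after the first `prefix_len` characters is deliberately not detected, so the prefix length has to cover the longest index in the file.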
Also, you may want to test mawk, grep, etc., to see whether they are faster than the Python solution.
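Tests
On my system, using a test file made by repeatedly copying your sample file, with a size of 1418973184, I got the following approximate timings: grep 1.6 s, sed 6.4 s, Python 4.6 s.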
Addendum
I tested Jidder's awk solution (given in the comments) using mawk, and my approximate timing is 3.2 s... the winner is grep -vF.
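Test transcript
Running times vary by about 0.1 s between executions, and here I report only one run for each command... for results that are close to each other, one cannot draw a clear-cut conclusion.
On the other hand, the different tools give results that differ by far more than the experimental error, so I think we can draw some conclusions...
My sample file has a truncated last line, hence the line counts differ by one between Python and sed on one side and all the other tools on the other.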
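To check whether the whole-line search and the start-of-line search really differ on the Python side, a rough timing harness along these lines may help. It is only a sketch under stated assumptions: the data is synthetic, the function names `full_scan` and `prefix_scan` are hypothetical, and it does not reproduce the grep, sed, or mawk measurements reported above.

```python
import timeit

# Synthetic data loosely modelled on the sample file: a short "index<TAB>title"
# line followed by a long line of prose. The sizes are arbitrary assumptions.
lines = []
for i in range(2000):
    lines.append(f"{i}\ttitle {i}\n")
    lines.append("some long line of prose without any tab " * 20 + "\n")


def full_scan(lines):
    # Baseline: search the whole line, as in the question's loop
    return [line for line in lines if '\t' not in line]


def prefix_scan(lines, prefix_len=10):
    # Variant: search only the first prefix_len characters
    return [line for line in lines if line.find('\t', 0, prefix_len) == -1]


# Both variants must keep exactly the same lines before timing is meaningful
assert full_scan(lines) == prefix_scan(lines)

for func in (full_scan, prefix_scan):
    # Best of several repeats, to dampen run-to-run noise
    best = min(timeit.repeat(lambda: func(lines), number=20, repeat=5))
    print(f"{func.__name__}: {best:.4f} s for 20 passes")
```

Taking the minimum over several repeats is one common way to reduce the run-to-run variation mentioned above; the absolute numbers depend on the machine and on how long the non-matching lines are.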
You can do this with sed.
It looks for "\t" and deletes the lines that contain it.