Answers to the question: How to find a specific file in Python


1 answer

Anonymous, 1 day ago

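<blockquote> What I need to do is correctly identify the gene name. So somehow, I need to quickly search over that directory, find the file that has the line[i+1] value in it's uniprot field and then pull out the gene name. </blockquote>

Think about how you'd do it in the shell:

<pre><code>$ ls mutation_directory/*_A8K2U0_MutationOutput.txt
mutation_directory/A2ML1_A8K2U0_MutationOutput.txt
</code></pre>

Or, if you're on Windows:

<pre><code>D:\Somewhere> dir mutation_directory\*_A8K2U0_MutationOutput.txt
A2ML1_A8K2U0_MutationOutput.txt
</code></pre>

In Python, you can do exactly the same thing with the <a href="https://docs.python.org/3/library/glob.html" rel="nofollow"><code>glob</code></a> module:

<pre><code>>>> import glob
>>> glob.glob('mutation_directory/*_A8K2U0_MutationOutput.txt')
['mutation_directory/A2ML1_A8K2U0_MutationOutput.txt']
</code></pre>

And of course you can wrap this up in a function:

<pre><code>>>> def find_gene(uniprot):
...     pattern = 'mutation_directory/*_{}_MutationOutput.txt'.format(uniprot)
...     # Raises IndexError if no file matches the pattern.
...     return glob.glob(pattern)[0]
</code></pre>

<hr/>

<blockquote> But is there a way I can do that smarter? Should I use a dictionary? </blockquote>

Whether that's "smarter" depends on your usage pattern.

If you're looking up thousands of files per run, it would certainly be more efficient to read the directory just once and use a dictionary instead of searching repeatedly. But if you're planning on, say, reading in an entire file anyway, that will take orders of magnitude longer than looking it up, so it probably won't matter. And you know what they say about premature optimization.

But if you want to, you can easily build a dictionary keyed by the UniProt ID:

<pre><code>import os

# Scan the directory once and map each UniProt ID to its file path.
d = {}
for f in os.listdir('mutation_directory'):
    gene, uniprot, suffix = f.split('_')
    d[uniprot] = os.path.join('mutation_directory', f)
</code></pre>

And then:

<pre><code>>>> d['A8K2U0']
'mutation_directory/A2ML1_A8K2U0_MutationOutput.txt'
</code></pre>

<hr/>

<blockquote> Can I just do a quick regex search? </blockquote>

For your simple case, you don't need regular expressions.*

More importantly, what are you going to search? Either you're going to loop, in which case you might as well use <code>glob</code>, or you're going to have to build up an artificial giant string to search, in which case you're better off just building the dictionary.

<hr/>

* In fact, at least on some platforms/implementations, <code>glob</code> is implemented by generating a regular expression from your simple wildcard pattern, but you don't have to worry about that.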
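Since the question ultimately asks for the gene name rather than the file path, the dictionary approach can be extended to store both. The following is only a minimal sketch under stated assumptions: the helper name `build_uniprot_index` is hypothetical, and it assumes every relevant file is named `GENE_UNIPROT_MutationOutput.txt`, as in the example above. It splits from the right so that a gene name containing an underscore still parses, and it skips files that do not fit the pattern.

```python
import os
import tempfile

# Assumed filename ending, taken from the example filenames in the answer.
SUFFIX = "MutationOutput.txt"


def build_uniprot_index(directory):
    """Scan `directory` once and map UniProt ID -> (gene_name, full_path).

    Hypothetical helper: assumes files are named GENE_UNIPROT_MutationOutput.txt.
    """
    index = {}
    for name in os.listdir(directory):
        # Split from the right so a gene name containing '_' stays intact.
        parts = name.rsplit("_", 2)
        if len(parts) != 3 or parts[2] != SUFFIX:
            continue  # not a mutation output file under the assumed scheme
        gene, uniprot, _ = parts
        index[uniprot] = (gene, os.path.join(directory, name))
    return index


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with one matching file.
    with tempfile.TemporaryDirectory() as tmp:
        for fname in ("A2ML1_A8K2U0_MutationOutput.txt", "notes.txt"):
            open(os.path.join(tmp, fname), "w").close()
        gene, path = build_uniprot_index(tmp)["A8K2U0"]
        print(gene)  # A2ML1
```

Building the index costs one directory listing; each lookup after that is a plain dictionary access, which is the trade-off described above for runs that look up many IDs.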
