How to find a specific file in Python

2 answers

User

Answer 1 · edited 2024-05-23 22:35:30

You can use glob:

In [4]: import glob

In [5]: files = glob.glob('*_Q9UNA3_*')

In [6]: files
Out[6]: ['A4GNT_Q9UNA3_MutationOutput.txt']

User

Answer 2 · edited 2024-05-23 22:35:30

What I need to do is correctly identify the gene name. So somehow, I need to quickly search over that directory, find the file that has the line[i+1] value in its uniprot field and then pull out the gene name.

Think about how you would do it in the shell:

$ ls mutation_directory/*_A8K2U0_MutationOutput.txt
mutation_directory/A2ML1_A8K2U0_MutationOutput.txt

Or, if you're on Windows:

D:\Somewhere> dir mutation_directory\*_A8K2U0_MutationOutput.txt
A2ML1_A8K2U0_MutationOutput.txt

In Python, you can do exactly the same thing with the glob module:

>>> import glob
>>> glob.glob('mutation_directory/*_A8K2U0_MutationOutput.txt')
['mutation_directory/A2ML1_A8K2U0_MutationOutput.txt']

Of course, you can wrap this in a function:

>>> def find_gene(uniprot):
...     pattern = 'mutation_directory/*_{}_MutationOutput.txt'.format(uniprot)
...     return glob.glob(pattern)[0]
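
The function above returns the matching file name and raises IndexError when nothing matches. As a minimal sketch of going one step further and pulling out the gene name, assuming every file is named GENE_UNIPROT_MutationOutput.txt (the name find_gene_name and the directory parameter are made up for illustration):

```python
import glob
import os


def find_gene_name(uniprot, directory="mutation_directory"):
    """Return the gene name for a UniProt ID, or None if no file matches."""
    pattern = os.path.join(directory, "*_{}_MutationOutput.txt".format(uniprot))
    matches = glob.glob(pattern)
    if not matches:
        return None
    # Split from the right so the last two fields are the UniProt ID and
    # the fixed suffix; whatever is left in front is the gene name.
    return os.path.basename(matches[0]).rsplit("_", 2)[0]
```

With the example directory shown earlier, find_gene_name('A8K2U0') would give 'A2ML1'. Returning None for a missing file is just one choice; raising an exception may suit your script better.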

But is there a way I can do that smarter? Should I use a dictionary?

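Whether this is "smart" depends on your usage pattern.

If each run looks up thousands of files, then reading the directory once and using a dictionary will certainly be more efficient than searching repeatedly. But if you are going to, say, read an entire file anyway, that will take orders of magnitude longer than finding it, so it probably won't matter. You know what they say about premature optimization.

But if you want to, you can easily build a dictionary keyed by UniProt ID: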

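import os

d = {}
for f in os.listdir('mutation_directory'):
    gene, uniprot, suffix = f.split('_')
    # Store the path including the directory, not just the bare file name
    d[uniprot] = os.path.join('mutation_directory', f)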

Then:

>>> d['A8K2U0']
'mutation_directory/A2ML1_A8K2U0_MutationOutput.txt'
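
Since the goal is the gene name rather than the path, the dictionary can carry both. The following is a rough sketch, again assuming the GENE_UNIPROT_MutationOutput.txt naming scheme; build_index is a hypothetical helper name, and skipping non-matching files is an assumption about what you would want:

```python
import os


def build_index(directory="mutation_directory"):
    """Map each UniProt ID to (gene name, file path) with one directory scan."""
    index = {}
    for name in os.listdir(directory):
        parts = name.rsplit("_", 2)
        # Skip anything that is not GENE_UNIPROT_MutationOutput.txt
        if len(parts) != 3 or parts[2] != "MutationOutput.txt":
            continue
        gene, uniprot, _suffix = parts
        index[uniprot] = (gene, os.path.join(directory, name))
    return index
```

Build the index once at startup, then each lookup is a plain dictionary access such as gene, path = index['A8K2U0'].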

Can I just do a quick regex search?

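For simple cases, you don't need a regex.*

More importantly, what would you be searching? Either you loop over the file names, in which case you could just use glob, or you build an artificial giant string to search, in which case you would be better off just building the dictionary.

* In fact, on at least some platforms/implementations, glob is implemented by generating a regular expression from the simple wildcard pattern, but you don't need to worry about that.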
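
If you are curious what that wildcard-to-regex translation looks like, the standard library exposes it as fnmatch.translate. A small sketch follows; the names list is hard-coded for illustration and stands in for a real directory listing:

```python
import fnmatch
import re

# fnmatch.translate turns a shell-style wildcard pattern into a regex string.
regex = re.compile(fnmatch.translate("*_A8K2U0_MutationOutput.txt"))

# Illustrative file names, standing in for os.listdir('mutation_directory')
names = ["A2ML1_A8K2U0_MutationOutput.txt", "A4GNT_Q9UNA3_MutationOutput.txt"]
matching = [n for n in names if regex.match(n)]
```

Here matching ends up holding only the A8K2U0 file. In practice glob.glob does this for you, so this is mainly useful for seeing what happens underneath.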
