Processing a large text file with Python

3 answers

User

Answer 1 · edited 2024-04-20 11:22:57

Assuming every entry always has the same structure, you can simply do the following:

import csv

# Open the input file and create the output file.
# The with block closes both files, even if an error occurs;
# newline="" lets the csv module control line endings itself.
with open("/path/to/large.file", "r") as f, \
        open("/desired/path/to/final/file", "w", newline="") as output_file:

    # Use the CSV module to make use of existing functionality.
    final_file = csv.writer(output_file)

    # Write the header row - can be skipped if headers not needed.
    final_file.writerow(["LoginID", "EmailAddress"])

    # Set up our temporary cache for a user
    current_user = []

    # Iterate over the large file
    # Note that we are avoiding loading the entire file into memory
    for line in f:
        if line.startswith("LoginID"):
            # Drop the 9-character "LoginID: " prefix.
            current_user.append(line[9:].strip())
        # If more information is desired, simply add it to the conditions here
        # (additional elif's should do)
        # and add it to the current user.

        elif line.startswith("mail"):
            # Drop the 6-character "mail: " prefix.
            current_user.append(line[6:].strip())
            # Once you know you have reached the end of a user entry
            # write the row to the final file
            # and clear your temporary list.
            final_file.writerow(current_user)
            current_user = []

        # Skip lines that aren't interesting.
        else:
            continue
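
One way to try the loop above without touching the real file is to wrap it in a generator and feed it a small in-memory sample. This is only a sketch: it assumes the same `LoginID: ` and `mail: ` prefixes as the code above, and the helper name `iter_users` and the sample entries are made up for illustration, since the question's actual file layout is not shown here.

```python
import csv
import io


def iter_users(lines):
    """Yield [login_id, mail] lists from an iterable of text lines."""
    current_user = []
    for line in lines:
        if line.startswith("LoginID"):
            current_user.append(line[9:].strip())
        elif line.startswith("mail"):
            current_user.append(line[6:].strip())
            # "mail" is assumed to be the last interesting line of an entry.
            yield current_user
            current_user = []


# Hypothetical sample data standing in for the large file.
sample = io.StringIO(
    "dn: uid=jdoe,ou=people\n"
    "LoginID: jdoe\n"
    "cn: John Doe\n"
    "mail: jdoe@example.com\n"
    "\n"
    "LoginID: asmith\n"
    "mail: asmith@example.com\n"
)

output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["LoginID", "EmailAddress"])
# writerows consumes the generator one entry at a time.
writer.writerows(iter_users(sample))
print(output.getvalue())
```

Because `iter_users` accepts any iterable of lines, the same function should work unchanged with a real file object in place of the `io.StringIO` sample.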

User

Answer 2 · edited 2024-04-20 11:22:57

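This actually looks like an LDIF file to me. The python-ldap library includes a pure-Python LDIF handling module that can help if your file has some of the nasty quirks that LDIF allows, such as Base64-encoded values, folded entries, and so on.

You could use it like so: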

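import csv
import ldif

class ParseRecords(ldif.LDIFParser):
    def __init__(self, input_file, csv_writer):
        # Let the base class set up the parser around the input file.
        ldif.LDIFParser.__init__(self, input_file)
        self.csv_writer = csv_writer

    def handle(self, dn, entry):
        # Each attribute maps to a list of values; take the first one.
        # Values may be bytes (as in python-ldap 3.x), so decode if needed.
        row = []
        for attr in ('LoginId', 'mail'):
            value = entry[attr][0]
            row.append(value.decode('utf-8') if isinstance(value, bytes) else value)
        self.csv_writer.writerow(row)

# Python 3: open the CSV in text mode with newline='' instead of 'wb'.
with open('/path/to/large_file') as input_file, open('output_file', 'w', newline='') as output:
    csv_writer = csv.writer(output)
    csv_writer.writerow(['LoginId', 'Mail'])
    ParseRecords(input_file, csv_writer).parse()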
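
To make the LDIF quirks mentioned above more concrete, here is a small standard-library sketch of the two that most often break naive line-by-line parsing: folded lines (a continuation line starts with a single space) and Base64 values (marked by a double colon, `attr:: ...`). The helper names `unfold` and `parse_attr` and the sample lines are hypothetical, and the sketch ignores other LDIF features such as comments and `attr:< url` values, which is one reason to prefer the library for real data.

```python
import base64


def unfold(lines):
    """Join LDIF continuation lines (starting with one space) onto the previous line."""
    logical = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(" ") and logical:
            # Drop the single leading space and append to the previous line.
            logical[-1] += line[1:]
        else:
            logical.append(line)
    return logical


def parse_attr(line):
    """Split 'attr: value' or 'attr:: base64value' into (attr, value)."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):
        # A second colon marks a Base64-encoded value (assumed UTF-8 text here).
        return attr, base64.b64decode(rest[1:].strip()).decode("utf-8")
    return attr, rest.strip()


# Hypothetical sample lines, not taken from the original question.
raw_lines = [
    "LoginId: jdoe\n",
    "mail: a.very.long.address@exam\n",
    " ple.com\n",
    "cn:: Sm9zw6k=\n",
]

for line in unfold(raw_lines):
    print(parse_attr(line))
```

A plain `line.startswith("mail")` check would see only the first half of the folded address and would pass the Base64 text through undecoded.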

Edit

So, to pull from a live LDAP directory using the python-ldap library, you would do something like this:

[code example missing in the source]

It is probably worth reading through the documentation for the ldap module, especially the example.

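Note that in the example above I skipped providing a filter entirely, which you would probably want to do in production. A filter in LDAP is similar to the WHERE clause in a SQL statement; it restricts which objects are returned. Microsoft actually has a good guide on LDAP filters. The canonical reference for LDAP filters is RFC 4515.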
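
As a rough illustration of what such a filter looks like, the sketch below builds a filter string and escapes the characters that RFC 4515 reserves inside a value. The function names, the `objectClass=person` clause, and the `mail` attribute are assumptions chosen for the example, not details from the question; python-ldap also ships `ldap.filter.escape_filter_chars` for the escaping step.

```python
def escape_filter_value(value):
    """Escape the characters RFC 4515 reserves inside an assertion value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def mail_domain_filter(domain):
    # Hypothetical filter: person entries whose mail ends with @<domain>.
    # "&" means AND; the unescaped "*" is a wildcard, like LIKE '%...' in SQL.
    return "(&(objectClass=person)(mail=*@%s))" % escape_filter_value(domain)


print(mail_domain_filter("example.com"))
```

Escaping matters whenever part of the filter comes from user input; otherwise a stray `*` or `)` can change what the filter matches.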

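Similarly, if there could still be several thousand entries even after applying an appropriate filter, you may need to look into the LDAP paging control, although using it would again make the example more complex. Hopefully this is enough to get you started, but if anything comes up, feel free to ask here or open a new question.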

Good luck.

User

Answer 3 · edited 2024-04-20 11:22:57

Again assuming your file is well-formed:

with open(inputfilename) as inputfile, open(outputfilename, 'w') as outputfile:
    mail = loginid = ''
    for line in inputfile:
        # Split on the first colon only, so values containing ':' stay intact.
        fields = line.split(':', 1)
        if fields[0] not in ('LoginId', 'mail'):
            continue
        if fields[0] == 'LoginId':
            loginid = fields[1].strip()
        if fields[0] == 'mail':
            mail = fields[1].strip()
        # Write a row once both values of an entry have been seen.
        if mail and loginid:
            outputfile.write(loginid + ',' + mail + '\n')
            mail = loginid = ''

This is essentially equivalent to the other approaches.
