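<p><a href="https://pandas.pydata.org/" rel="nofollow noreferrer">Pandas</a> is an option worth considering, if you don't mind spending a little time up front loading the data into a DataFrame.</p>
<p>Below is a solution using <strong>Pandas</strong>, followed by a simple comparison of its time cost against the <strong>map</strong> solution.</p>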
<pre><code>import pandas as pd
from datetime import datetime
data = [
["10","2018-03-22 14:38:18.329963","name 10","url10","True"],
["11","2018-03-22 14:38:18.433497","name 11","url11","False"],
["12","2018-03-22 14:38:18.532312","name 12","url12","False"]
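]*10000  # repeat the rows 10000 times to simulate a large dataset; try a larger multiplier as well
# Pandas: the DataFrame is built first, so its construction time is not part of the measurement below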
df = pd.DataFrame(data=data, columns=['seq', 'datetime', 'name', 'url', 'boolean'])
pandas_beg = datetime.now()
df['seq'] = df['seq'].astype(int)
df['url'] = 'http://' + df['url']
df['boolean'] = df['boolean'] == 'True'
pandas_end = datetime.now()
print('pandas: ', (pandas_end - pandas_beg))
# map: convert each row with a plain Python function
def clean_data(row):
    val, date, name, url, truthy = row
    return [int(val), date, name, 'http://{}'.format(url), truthy == 'True']
map_beg = datetime.now()
result = list(map(clean_data, data))
map_end = datetime.now()
print('map: ', (map_end - map_beg))
</code></pre>
<p><strong>Output:</strong></p>
<pre><code>pandas: 0:00:00.016091
map: 0:00:00.036025
</code></pre>
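<p>Single-run timings taken with <code>datetime.now()</code> can vary noticeably between runs. As a rough sketch, assuming the standard-library <code>timeit</code> module is acceptable here, the map solution could be timed several times and the fastest run reported. The helper names <code>run_map</code> and <code>best_time</code> are hypothetical and not part of the original answer.</p>

```python
import timeit

# Same sample rows as in the answer, repeated to simulate a large dataset.
data = [
    ["10", "2018-03-22 14:38:18.329963", "name 10", "url10", "True"],
    ["11", "2018-03-22 14:38:18.433497", "name 11", "url11", "False"],
    ["12", "2018-03-22 14:38:18.532312", "name 12", "url12", "False"],
] * 10000


def clean_data(row):
    """Convert one raw row of strings into typed values."""
    val, date, name, url, truthy = row
    return [int(val), date, name, 'http://{}'.format(url), truthy == 'True']


def run_map():
    """The map-based solution, wrapped in a function so it can be timed repeatedly."""
    return list(map(clean_data, data))


def best_time(func, repeat=5):
    """Hypothetical helper: call func `repeat` times and return the fastest run in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=1))


# Report the fastest of five runs; the exact value depends on the machine.
print('map: {:.6f}s (best of 5)'.format(best_time(run_map)))
```

<p>The Pandas transformation could be wrapped in a function of its own and passed to <code>best_time</code> in the same way. It would need a fresh copy of the DataFrame on each run, since the transformation overwrites the columns in place.</p>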