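<p>Passing the <code>encoding</code> parameter to <code>pd.read_sas()</code> gave me very large DataFrames, which led to memory-related errors.</p>
<p>An alternative way to handle this is to read the file without <code>encoding</code> (in which case <code>pd.read_sas()</code> keeps text data as raw bytes) and then convert the byte strings to regular strings using the appropriate encoding (e.g. <code>utf8</code>).</p>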
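<p>As a small illustration of the decoding step, <code>Series.str.decode</code> turns a column of byte strings into regular strings, but the codec you pass has to match the one the bytes were written with. The byte values below are made up for this sketch: the same word is encoded once as UTF-8 and once as Latin-1.</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd

# Hypothetical data: "café" encoded two different ways
utf8_bytes = pd.Series([b"caf\xc3\xa9", b"abc"])
latin1_bytes = pd.Series([b"caf\xe9", b"abc"])

# Decode each Series with the codec that produced its bytes
print(utf8_bytes.str.decode("utf8").tolist())      # ['café', 'abc']
print(latin1_bytes.str.decode("latin-1").tolist())  # ['café', 'abc']
</code></pre>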
<h3>Example:</h3>
<p>Example DataFrame:</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [b"a", b"b", b"c"],
                   "C": ["a", "b", "c"]})
</code></pre>
<p>Convert the byte strings to strings:</p>
<pre class="lang-py prettyprint-override"><code>for col in df:
    # Check the first value (by position) to see whether the column holds bytes
    if isinstance(df[col].iloc[0], bytes):
        print(col, "will be transformed from bytestring to string")
        df[col] = df[col].str.decode("utf8")  # or any other encoding
print(df)
</code></pre>
<p>Output:</p>
<pre><code>B will be transformed from bytestring to string
   A  B  C
0  1  a  a
1  2  b  b
2  3  c  c
</code></pre>
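<p>The loop above only inspects the first value of each column, so it would skip a bytes column whose first value is missing and would fail on an empty DataFrame. A slightly more defensive variant is sketched below as a hypothetical helper, <code>decode_bytes_columns</code> (not a pandas function): it checks every value of each object column and returns a decoded copy, leaving the original DataFrame untouched.</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd


def decode_bytes_columns(df, encoding="utf8"):
    """Hypothetical helper: return a copy of df with bytes columns decoded to str."""
    out = df.copy()
    for col in out.columns:
        # Only object columns can hold Python bytes objects
        if not pd.api.types.is_object_dtype(out[col]):
            continue
        # Look at every value, not just the first one
        if out[col].map(lambda v: isinstance(v, bytes)).any():
            out[col] = out[col].str.decode(encoding)
    return out


df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [b"a", b"b", b"c"],
                   "C": ["a", "b", "c"]})
decoded = decode_bytes_columns(df)
print(decoded["B"].tolist())  # ['a', 'b', 'c']
</code></pre>
<p>In practice you would call it on the result of <code>pd.read_sas(path)</code> read without <code>encoding</code>, where <code>path</code> stands for your own file.</p>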
<p>Useful links:</p>
<ol>
<li><p><a href="https://www.geeksforgeeks.org/python-pandas-series-str-decode/" rel="nofollow noreferrer">Pandas Series.str.decode() page of GeeksforGeeks</a> (where I found the solution)</p></li>
<li><p><a href="https://stackoverflow.com/questions/6224052/what-is-the-difference-between-a-string-and-a-byte-string">What is the difference between a string and a byte string?</a></p></li>
</ol>