Nonexistent delimiters randomly inserted when reading a TSV with pandas

Posted 2024-05-29 03:06:45


I'm really scratching my head over this one; it makes no sense to me. I'm reading in a TSV with pandas in a very simple way. Here is the minimal code:

import pandas as pd

source = pd.read_csv("neimanmarcus.csv", sep="\t")
images = source["image_link"]

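Every line in this file has exactly 53 tab characters (i.e. 54 fields). For some reason, pandas thinks that about 2% of the lines have exactly 72 tabs, which produces the following error: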

pandas.parser.CParserError: Error tokenizing data. C error: Expected 54 fields in line x, saw 73

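That said, when I inspect the affected lines by hand I can't find any difference. Skipping the bad lines is a real problem in this case, so I'm trying to fix the cause, but I'm at my wit's end. Apologies if this is something silly, but here are examples of a "correct" and an "incorrect" line: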

Correct:

sku157001669    Tango Dancer-Print A-Line Dress, Size: 4, TANGO - Carolina Herrera  Carolina Herrera Tango Dancer-Print A-Line Dress Details Carolina Herrera tango dancer-print woven dress. Approx. measurements: 35.5"L center back to hem, 35.5"L center front to hem. V'd jewel neckline. Cap sleeves. Self-tie belt at natural waist; ties at left. Inverted center pleat at A-line skirt. Straight hem. Fit and flare silhouette. Hidden back zip. Cotton/spandex; dry clean. Made in Italy. Model's measurements: Height 5'10"/177cm, bust 34"/86cm, waist 26"/66cm, hips 35.5"/90cm, dress size US 2. Designer About Carolina Herrera: The empress of classically refined looks for both day and evening, Carolina Herrera launched her eponymous line in 1980 after encouragement from her friend, legendary Vogue editor Diana Vreeland. Over the years she has collected a number of fashion's highest accolades as well as a star-studded client list. With both a global focus and adoration for the sum of all things beautiful, Carolina Herrera has been hailed as "Fashion's First Lady." Size: 4. Color: TANGO. Age Group: Adult. Material: 97% COTTON, 3% ELASTANE. Apparel & Accessories > Clothing > Dresses  Women's Apparel > Mid-Length > Daytime Dresses > Mid    1390.00 USD 1390.00 USD     http://www.neimanmarcus.com/en-us/Carolina-Herrera-Tango-Dancer-Print-A-Line-Dress/prod177890243/p.prod     http://images.neimanmarcus.com/product_assets/B/2/W/Y/K/NMB2WYK_mz.jpg  http://images.neimanmarcus.com/product_assets/B/2/W/Y/K/NMB2WYK_az.jpg  Carolina Herrera    07667702164817  prod177890243       new in stock        prod177890243   TANGO   97% COTTON, 3% ELASTANE     4           female  Adult       US::Ground:0.00 USD                                                                                             

Incorrect:

sku158601482    Sleeveless Faux-Wrap Jersey Dress, Women's, Size: 2X, BLACK - Eileen Fisher Eileen Fisher Sleeveless Faux-Wrap Jersey Dress, Women's Details Eileen Fisher jersey dress in your choice of color. Round neckline; sleeveless. Faux-wrap style. Shift silhouette. Viscose/spandex; machine wash. Made in USA of imported materials. Model's measurements: Height 5'10.5"/179cm, bust 32"/81cm, waist 24"/61cm, hips 35.5"/90cm, dress size US 2/4. Necklace not included. Designer Please note: Apparel may be available in more sizes: Shop Eileen Fisher Petite Shop Eileen Fisher Women's About Eileen Fisher: Former interior and graphic designer Eileen Fisher launched her self-named collection in 1984. The acclaimed designer made her mark with clean lines, simple shapes, and a timeless, functional style. Size: 2X. Color: BLACK. Age Group: Adult. Material: " 92% Viscose/8% Spandex F4VF-D3502 / D2502X: Body: 92% Viscose, 8% Spandex Hem: 80% Recycled Polyester, 20% Lycra? F4VF-S1496: Body: 92% Viscose, 8% Spandex Hem Panel: 80% Recycled Polyester, 20% Lycra?. Apparel & Accessories > Clothing > Dresses  Women's Apparel > Women's > Special Sizes > Mid 198.00 USD  198.00 USD      http://www.neimanmarcus.com/en-us/Eileen-Fisher-Sleeveless-Faux-Wrap-Jersey-Dress-Women-s/prod179830418/p.prod      http://images.neimanmarcus.com/product_assets/T/A/6/X/8/NMTA6X8_mz.jpg  http://images.neimanmarcus.com/product_assets/T/A/6/X/8/NMTA6X8_az.jpg  Eileen Fisher   00713259663697  prod179830418       new in stock        prod179830418   BLACK   " 92% Viscose/8% Spandex F4VF-D3502 / D2502X: Body: 92% Viscose, 8% Spandex Hem: 80% Recycled Polyester, 20% Lycra? F4VF-S1496: Body: 92% Viscose, 8% Spandex Hem Panel: 80% Recycled Polyester, 20 Graphic 2X          female  Adult       US::Ground:0.00 USD                                 

On these lines, simply calling line.split('\t') works as expected; pandas just seems to choke on them for some reason.
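The line.split('\t') check can be scripted to confirm that every raw line really has the same number of tabs. This is a minimal sketch; tab_counts is a hypothetical helper name, and it only looks at raw text, so it says nothing about how pandas will interpret quotes.

```python
from collections import Counter
from typing import Iterable


def tab_counts(lines: Iterable[str]) -> Counter:
    """Map 'number of tabs on a line' -> 'how many lines have that many tabs'."""
    counts = Counter()
    for line in lines:
        # Strip only the line terminator so trailing empty fields still count.
        counts[line.rstrip("\r\n").count("\t")] += 1
    return counts
```

Usage, assuming the file is UTF-8 (the encoding is not stated in the question): `with open("neimanmarcus.csv", encoding="utf-8", newline="") as fh: print(tab_counts(fh))`. A single key of 53 would match the claim that every line has 53 tabs.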


1 Answer
User
#1 · Posted 2024-05-29 03:06:45

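Your data contains unmatched double-quote characters (it seems to use " to mean inches, e.g. Height 5'10.5"). By default pandas treats " as the quote character, so the parser thinks it has hit a quoted field. Because the quotes aren't paired, it keeps reading past tabs and line breaks until it finds another ", which corrupts the row.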
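To locate the rows that trigger this, it helps to know that a " only opens a quoted field when it is the first character of a field; a mid-field " such as 5'10.5" is kept literally. In the "incorrect" line, the field after BLACK appears to begin with " 92% Viscose, while no field in the "correct" line starts with ". The sketch below lists such fields; fields_starting_with_quote is a hypothetical helper, and UTF-8 is an assumed encoding.

```python
from typing import Iterable, Iterator, Tuple


def fields_starting_with_quote(
    lines: Iterable[str],
) -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, field_index, preview) for every tab-separated
    field whose first character is a double quote."""
    for lineno, line in enumerate(lines, start=1):
        for idx, field in enumerate(line.rstrip("\r\n").split("\t")):
            # Only a leading quote switches a CSV parser into quoted-field mode.
            if field.startswith('"'):
                yield lineno, idx, field[:30]
```

Usage: `with open("neimanmarcus.csv", encoding="utf-8", newline="") as fh: print(list(fields_starting_with_quote(fh)))`.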

Try passing quoting=csv.QUOTE_NONE as an additional argument to read_csv (you need to import csv first; alternatively, pass quoting=3, which is the value of csv.QUOTE_NONE).
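A small self-contained reproduction of the failure and the fix. It uses made-up sample data rather than the real file, and read_tsv_no_quoting is a hypothetical helper name.

```python
import csv
import io

import pandas as pd


def read_tsv_no_quoting(source):
    """Read a tab-separated file, treating '"' as an ordinary character."""
    # quoting=csv.QUOTE_NONE is equivalent to quoting=3.
    return pd.read_csv(source, sep="\t", quoting=csv.QUOTE_NONE)


# Made-up 3-column sample: on the third line a field *starts* with '"',
# and that quote is only "closed" by the '"' on the following line.
SAMPLE = 'a\tb\tc\n0\t0\t0\n1\t2\t"x\ny"\t5\t6\n'

if __name__ == "__main__":
    # Default quoting: the quoted field swallows the line break, so the merged
    # row has 5 fields instead of 3 and the C parser raises ParserError.
    try:
        pd.read_csv(io.StringIO(SAMPLE), sep="\t")
    except pd.errors.ParserError as exc:
        print("default quoting failed:", exc)

    # QUOTE_NONE: three rows of three fields, quote characters kept verbatim.
    print(read_tsv_no_quoting(io.StringIO(SAMPLE)))

    # Applied to the file from the question:
    # source = read_tsv_no_quoting("neimanmarcus.csv")
    # images = source["image_link"]
```

Note that with QUOTE_NONE the " characters stay in the data (e.g. the value "x), so genuinely quoted fields containing tabs would no longer be supported. That is acceptable here because the file is split purely on tabs.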
