Python: parse(date) returns "signed integer is greater than maximum"


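Given the following list of items (tab-separated columns):

<ul>
<li>9123456780\tJohn Dude\tCity of Address\tJuly 19, 1980\tM</li>
<li>9123456781\tJane Dudette\tProvince of Address\tAugust 19, 1980\tf</li>
<li>9123456782\tSam Pol Data\tCity of Etc\t1/1/91\t</li>
<li>9123456783\tMaybe Some City 1993\t</li>
<li>9123456784\tMark Mywards\tCity of Address\tM</li>
<li>9123456785\tCity of Address\tJuly 19, 1980\tM</li>
<li>9123456780\tM</li>
<li>Mira Nova\tCity of Address\tJuly 19, 1980</li>
</ul>

I want to determine which field is the MSISDN (a 10-digit number), the name, the address, the date, and the gender.

I'm fairly sure this can't be done with 100% accuracy, because there are few points of comparison and data is often missing.

So this is what I did: go through the list line by line, split each line on the tab character (\t) into a list, and then test every item of that list in a for loop:

<pre><code>for item in csv_cols:
    if reg_msisdn.match(item):
        s_msisdn = item
    if item.lower() in list_male or item.lower() in list_female:
        s_gender = item
    if parse(item):
        s_birthdate = item
    if any(ext in item.lower() for ext in list_place) or any(ext in item.lower() for ext in list_ad):
        s_address = item
    else:
        s_name = item
s_all = s_msisdn + "^" + s_name + "^" + s_address + "^" + s_birthdate + "^" + s_gender
</code></pre>

Edit: I added a <code>csv_cols.remove(item)</code> after every <code>s_(value) = item</code> so that items already tested are removed. It didn't change anything.

<ol>
<li>Every <code>s_(value)</code> starts out as the text <code>NULL</code>.</li>
<li>If an item is a 10-digit number (regex), it is treated as <code>s_msisdn</code>.</li>
<li>If an item is just m, f, male, or female, it is treated as <code>s_gender</code>.</li>
<li>If an item contains a keyword such as city, ave, etc. (list_ad) or matches an entry in the list of places (list_place), it is treated as <code>s_address</code>.</li>
<li>If an item can be <a href="https://stackoverflow.com/questions/25341945/check-if-string-has-date-any-format?">parsed as a date</a>, it automatically becomes <code>s_birthdate</code>.</li>
<li>Otherwise, it is probably <code>s_name</code>.</li>
<li>Edit: remove the item from the list.</li>
<li>The whole thing sits inside a try/except block.</li>
</ol>

I'm sure my logic has obvious holes, but I really can't think of another way to do it.

That said, even with this scattershot logic I'm running into problems, especially with item 5 above, which returns the following unhelpful error: <code>signed integer is greater than maximum</code>

I know this because when I take that check out of the loop, the rest of the code works.

Can I get some help?

Thanks.

Note: I'm on Mac/UNIX.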


1 Answer


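It is almost impossible to tell a name from an address in every case (is a given entry a name or an address?). The best approach is to assume that the name always appears before the address; otherwise you face all kinds of complicated lookups.

I would process the data line by line, first converting each line to a list like this:

<pre><code>>>> row = "9123456780 \t John Dude \t City of Address \t July 19, 1980 \t M"
>>> row = [entry.strip() for entry in row.split("\t")]
>>> row
['9123456780', 'John Dude', 'City of Address', 'July 19, 1980', 'M']
</code></pre>

Now I would define a set of functions that decide what each entry represents. Identifying the MSISDN number, the gender, and the date should be relatively straightforward.

To determine whether an entry in the row is a 10-digit MSISDN number, use a helper <code>is_msisdn(entry)</code> that returns <code>True</code> only for a string of exactly ten digits.

To determine whether an entry in the row represents a gender:

<pre><code>def is_gender(entry):
    # Always return a bool: True for a one-letter gender code, else False.
    return entry in ("m", "f", "M", "F")
</code></pre>

To determine whether an entry in the row represents a date:

<pre><code>from dateutil.parser import parse

def is_date(entry):
    try:
        parse(entry)
        return True
    except (ValueError, OverflowError):
        # dateutil's parse() is documented to raise OverflowError when a parsed
        # value exceeds the largest valid C integer; this is the likely source
        # of the error in the question.
        return False
</code></pre>

Now use these functions to build another function that parses the entries of a row:

<pre><code>def parse_row(row):
    # Slots: [msisdn, name, address, birthdate, gender]
    s_all = ["<blank>"] * 5
    for entry in row:
        if is_msisdn(entry):
            s_all[0] = entry
        elif is_gender(entry):
            s_all[4] = entry
        elif is_date(entry):
            s_all[3] = entry
        elif s_all[1] == "<blank>":
            # The first unclassified entry is assumed to be the name...
            s_all[1] = entry
        else:
            # ...and any later one the address.
            s_all[2] = entry
    return " ^ ".join(s_all)
</code></pre>

For example:

<pre><code>>>> row = ['Mira Nova', 'City of Address', 'July 19, 1980']
>>> parse_row(row)
'<blank> ^ Mira Nova ^ City of Address ^ July 19, 1980 ^ <blank>'
>>> row = ['9123456784', 'Mark Mywards', 'City of Address', 'M']
>>> parse_row(row)
'9123456784 ^ Mark Mywards ^ City of Address ^ <blank> ^ M'
</code></pre>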
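The body of the `is_msisdn` helper called by `parse_row` does not appear in this copy of the answer. Below is a minimal sketch of what it might look like, assuming an MSISDN in this data is exactly ten ASCII digits, as the question describes. The regex and the `MSISDN_RE` name are assumptions for illustration, not the original author's code.

```python
import re

# Assumption: an MSISDN in this data set is exactly ten ASCII digits.
MSISDN_RE = re.compile(r"[0-9]{10}")


def is_msisdn(entry):
    """Return True if entry consists of exactly ten digits, else False."""
    # fullmatch() only succeeds when the whole string matches the pattern,
    # so longer or shorter digit runs are rejected.
    return MSISDN_RE.fullmatch(entry) is not None


print(is_msisdn("9123456780"))  # True
print(is_msisdn("John Dude"))   # False
```

Because `parse_row` tests `is_msisdn` before `is_date` in an `if`/`elif` chain, a ten-digit number is claimed as the MSISDN and never reaches the date parser.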
