Complex data extraction in Python

Posted 2024-05-12 21:44:24


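I need some help getting a program started. I play several online poker tournaments every week. It turns out that the site I use records hand histories and saves them to my hard drive as .txt files. Unfortunately, the data format is somewhat rough. I would like to create a program that goes through each hand and tells me how much I won or lost. I have pasted a sample of one hand below, and I would like to extract the following information from each hand.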

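  1. Blinds and ante. Scrolling down in the sample, you can see "Player Player8 has small blind (250)" followed by "Player Player1 has big blind (500)". Above those, each player's ante is listed, e.g. "Player Hero ante (50)". In this case, small blind = 250, big blind = 500, ante = 50.

  2. My stack size. My player is labelled "Hero". My stack size is on line 6, which reads "Seat 3: Hero (17595)." In this example my stack size is 17595.

  3. My hand. In this example it is given by "Player Hero received card: [10c]" and "Player Hero received card: [7h]", so my hand is "10c7h".

  4. Number of players. There are 8 players in the sample.

  5. My position. This one could be tricky. I decided to start from the big blind and assign it the value 0; small blind = 1, button = 2, and so on. This runs somewhat against "poker logic", but it makes more sense to me from a programming standpoint, because there will always be a big blind, whereas some of the other positions depend on how many players are at the table.

  6. Profit/loss. This appears near the bottom of the text, under the "Summary" label: "Player Hero does not show cards.Bets: 50. Collects: 0. Loses: 50." In this case my profit is -50 (i.e. a loss of 50), meaning I paid the 50 ante and folded my hand.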
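
The position convention in item 5 (big blind = 0, small blind = 1, button = 2, and so on) can be sketched as follows. This is only an illustration: it assumes the seat lines and the big-blind line look like those in the sample hand and that player names contain no spaces, and `hero_position` is a hypothetical helper name.

```python
import re

def hero_position(hand_text, hero="Hero"):
    """Return the hero's position counted back from the big blind (big blind = 0)."""
    # Occupied seats, e.g. "Seat 3: Hero (17595)."; "Seat 7 is the button" is skipped.
    seats = re.findall(r"^Seat (\d+): (\S+) \(\d+\)", hand_text, flags=re.MULTILINE)
    names = [name for _, name in sorted(seats, key=lambda s: int(s[0]))]
    # The big-blind line names the player at position 0.
    bb = re.search(r"^Player (\S+) has big blind", hand_text, flags=re.MULTILINE)
    if bb is None or bb.group(1) not in names or hero not in names:
        return None
    # Positions count down the seat order from the big blind:
    # the small blind sits one seat before it, the button two, and so on.
    return (names.index(bb.group(1)) - names.index(hero)) % len(names)

sample = """\
Seat 7 is the button
Seat 1: Player1 (9650).
Seat 2: Player2 (19433).
Seat 3: Hero (17595).
Seat 4: Player4 (8900).
Seat 5: Player5 (12670).
Seat 6: Player6 (11187).
Seat 7: Player7 (11300).
Seat 8: Player8 (17603).
Player Player8 has small blind (250)
Player Player1 has big blind (500)
"""
print(hero_position(sample))  # 6
```

Indexing into the list of occupied seats, rather than subtracting raw seat numbers, keeps the count correct when some seats at the table are empty.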

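Below is what the .txt file looks like. Note that this is a single hand. In the actual .txt file, this hand would be followed by dozens or hundreds of others. The start of a hand is always marked by "Game started" and the last line always says "Game ended".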

Game started at: 2018/1/9 10:14:10
Game ID: 1094127759 250/500 $5,000 GTD, Table 4 (Hold'em)
Seat 7 is the button
Seat 1: Player1 (9650).
Seat 2: Player2 (19433).
Seat 3: Hero (17595).
Seat 4: Player4 (8900).
Seat 5: Player5 (12670).
Seat 6: Player6 (11187).
Seat 7: Player7 (11300).
Seat 8: Player8 (17603).
Player Player8 ante (50)
Player Player1 ante (50)
Player Player2 ante (50)
Player Hero ante (50)
Player Player4 ante (50)
Player Player5 ante (50)
Player Player6 ante (50)
Player Player7 ante (50)
Player Player8 has small blind (250)
Player Player1 has big blind (500)
Player Player8 received a card.
Player Player8 received a card.
Player Player1 received a card.
Player Player1 received a card.
Player Player2 received a card.
Player Player2 received a card.
Player Hero received card: [10c]
Player Hero received card: [7h]
Player Player4 received a card.
Player Player4 received a card.
Player Player5 received a card.
Player Player5 received a card.
Player Player6 received a card.
Player Player6 received a card.
Player Player7 received a card.
Player Player7 received a card.
Player Player2 folds
Player Hero folds
Player Player4 raises (1000)
Player Player5 folds
Player Player6 folds
Player Player7 folds
Player Player8 folds
Player Player1 folds
Uncalled bet (500) returned to Player4
Player Player4 mucks cards
------ Summary ------
Pot: 1650
Player Player1 does not show cards.Bets: 550. Collects: 0. Loses: 550.
Player Player2 does not show cards.Bets: 50. Collects: 0. Loses: 50.
Player Hero does not show cards.Bets: 50. Collects: 0. Loses: 50.
*Player Player4 mucks (does not show cards). Bets: 550. Collects: 1650. Wins: 1100.
Player Player5 does not show cards.Bets: 50. Collects: 0. Loses: 50.
Player Player6 does not show cards.Bets: 50. Collects: 0. Loses: 50.
Player Player7 does not show cards.Bets: 50. Collects: 0. Loses: 50.
Player Player8 does not show cards.Bets: 300. Collects: 0. Loses: 300.
Game ended at: 2018/1/9 10:14:52

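Any help is appreciated, even just some ideas about how to approach this or what I should study. The way I picture it, the output would look something like this: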

HandNumber = 000001
BigBlind = 500
Ante = 50
Players = 8
StackSize = 17595
Hand = 10c7h
Position = 6    # small blind = 1; add 5 since I'm 5 positions removed
Profit = -50

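My experience level: I have been taking online courses in Python development, data science, and SQL for about six months. I have some familiarity with classes but not much experience creating my own. I have designed a few programs that use regular expressions to extract data from financial statements.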


1 answer
Answer 1 · posted 2024-05-12 21:44:24

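The easiest way to solve this is to use a regular expression to split the file into individual games, and then further regular expressions to extract the information. I would make a class to hold all of this information. You can then use a database or JSON to store it.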

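import datetime
import re
from io import StringIO

def split_file(file_handle):
    # One match per hand: start time, body, summary, end time.
    pat_str = '''\
^Game started at: (?P<game_start>.*?)
(?P<game>.*?)
^------ Summary ------
(?P<summary>.*?)
^Game ended at: (?P<game_end>.*?)$\
'''
    pat = re.compile(pat_str, flags=re.MULTILINE | re.DOTALL)
    text = file_handle.read()
    # Non-greedy groups keep each match inside a single hand.
    yield from pat.finditer(text)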

class Pokergame:
    def __init__(self, game_info, playername = 'Hero'):
        self.game_start = datetime.datetime.strptime(game_info['game_start'], "%Y/%m/%d %H:%M:%S")
        self.game_end = datetime.datetime.strptime(game_info['game_end'], "%Y/%m/%d %H:%M:%S")
        self.game_info = _parse_game(game_info['game'], playername)
        self.summary = _parse_summary(game_info['summary'], playername)

def _parse_game(game_str, playername):
    name = re.escape(playername)
    # Seat line, e.g. "Seat 3: Hero (17595)."
    pattern_seat = rf'Seat (\d+): {name} \((\d+)\)\.'
    seat_match = re.search(pattern_seat, game_str)
    # Fall back to None when the player is not seated in this hand.
    seat, stack = seat_match.groups() if seat_match else (None, None)
    # Hole cards, e.g. "Player Hero received card: [10c]"
    pattern_cards = rf'Player {name} received card: \[(?P<card>\w+)\]'
    cards = tuple(m['card'] for m in re.finditer(pattern_cards, game_str))

    result = {
        'seat': seat,
        'stack': stack,
        'cards': cards,
        'text': game_str,
    }
    return result

def _parse_summary(summary_str, playername):
    # Placeholder: returns the raw summary text until real parsing is added.
    return summary_str


# hand_text is assumed to hold the sample hand history shown in the question.
games = []
with StringIO(hand_text) as file_handle:
    for game_info in split_file(file_handle):
        games.append(Pokergame(game_info))

I used StringIO to simulate open(file). You will have to flesh out __init__ and the _parse_... functions further, but this should set you on the right path.

If you have multiple files, you can use itertools.chain to concatenate the games.
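
For example, `itertools.chain.from_iterable` can flatten the hands from several files into one stream. This is a minimal sketch: `iter_hands` is a hypothetical, simplified stand-in for `split_file` that only splits on the start and end markers, and in-memory `StringIO` objects with made-up contents stand in for real files.

```python
import itertools
import re
from io import StringIO

def iter_hands(file_handle):
    """Yield the text of each hand, from 'Game started at' to 'Game ended at'."""
    pattern = re.compile(r'^Game started at:.*?^Game ended at:[^\n]*',
                         flags=re.MULTILINE | re.DOTALL)
    for match in pattern.finditer(file_handle.read()):
        yield match.group(0)

# Two fake history files; real code would use open(path) instead.
file_a = StringIO(
    "Game started at: 2018/1/9 10:14:10\nGame ID: 1\nGame ended at: 2018/1/9 10:14:52\n"
)
file_b = StringIO(
    "Game started at: 2018/1/9 10:15:00\nGame ID: 2\nGame ended at: 2018/1/9 10:15:40\n"
    "Game started at: 2018/1/9 10:16:00\nGame ID: 3\nGame ended at: 2018/1/9 10:16:30\n"
)

# chain.from_iterable consumes one file's hands, then moves on to the next.
all_hands = list(itertools.chain.from_iterable(iter_hands(f) for f in (file_a, file_b)))
print(len(all_hands))  # 3
```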

games[0].game_info
{'cards': ('10c', '7h'),
 'seat': '3',
 'stack': '17595',
 'text': "Game ID: 1094127759 250/500 $5,000 GTD, ...\nPlayer Player4 mucks cards"}
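
A dictionary like the one above can be written out as JSON, one of the storage options the answer mentions. The following is a minimal sketch, assuming the numeric strings are converted to integers first; the `to_record` helper name and the choice of output fields are illustrative only.

```python
import json

def to_record(game_info):
    """Convert a parsed-hand dict into a JSON-friendly record."""
    return {
        'seat': int(game_info['seat']),
        'stack': int(game_info['stack']),
        # Join the hole cards the way the question writes them, e.g. "10c7h".
        'hand': ''.join(game_info['cards']),
    }

game_info = {'cards': ('10c', '7h'), 'seat': '3', 'stack': '17595'}
record = to_record(game_info)
print(json.dumps(record))
# {"seat": 3, "stack": 17595, "hand": "10c7h"}
```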
