How can I identify colors in a string with nltk in Python?

3 answers

User

#1 · edited 2024-06-16 11:39:36

You can use the webcolors package to get all the CSS color names it recognizes. Just check for membership in webcolors.CSS3_NAMES_TO_HEX:

>>> import webcolors
>>> "green" in webcolors.CSS3_NAMES_TO_HEX
True
>>> "deepskyblue" in webcolors.CSS3_NAMES_TO_HEX
True
>>> "aquamarine" in webcolors.CSS3_NAMES_TO_HEX
True
>>> len(webcolors.CSS3_NAMES_TO_HEX)
147

This means that webcolors.CSS3_NAMES_TO_HEX.keys() gives you a list in Python 2, or a dict_keys view in Python 3, containing all CSS3 color names.
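To go from the membership test to actually finding colors in a string, one option is to split the text into lowercase words and keep the ones that are color names. A minimal sketch, assuming a small hand-written set COLOR_NAMES as a stand-in for the keys of webcolors.CSS3_NAMES_TO_HEX (the helper name find_colors is hypothetical, not part of webcolors):

```python
import re

# Stand-in for set(webcolors.CSS3_NAMES_TO_HEX); a tiny sample for illustration only.
COLOR_NAMES = {"green", "deepskyblue", "aquamarine", "red", "blue"}

def find_colors(text, names=COLOR_NAMES):
    """Return the color names found in text, in order of appearance."""
    # Lowercase the text and keep only runs of letters as candidate words.
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word in names]

print(find_colors("A green car under a DeepSkyBlue sky, not a redwood."))
```

Because whole words are compared, "redwood" is not reported as "red". Multi-word spellings such as "deep sky blue" are not handled by this approach.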

User

#2 · edited 2024-06-16 11:39:36

Solution (for me, anyway):

Note: If you simply need the colors without spaces ('deepskyblue' instead of 'deep sky blue'), any of the previous answers will work. However, since I'm using this in combination with speech recognition, I need the color names separated by spaces as in natural language. The following code (Python 3), which I consider more complete, does that:

import urllib.request
from bs4 import BeautifulSoup

def getColors():
    html = urllib.request.urlopen('http://www.w3schools.com/colors/colors_names.asp').read()
    soup = BeautifulSoup(html, 'html.parser')
    children = [item.findChildren() for item in soup.find_all('tr')]
    colors = [''.join( ' '+x if 'A' <= x <= 'Z' else x for x in item[0].text.replace(u'\xa0', '')).strip().lower() for item in children]
    return colors[1:]

Calling getColors() and printing the result gives:

['alice blue', 'antique white', 'aqua', 'aquamarine', 'azure', 'beige', 'bisque', 'black', 'blanched almond', 'blue', 'blue violet', 'brown', 'burly wood', 'cadet blue', 'chartreuse', 'chocolate', 'coral', 'cornflower blue', 'cornsilk', 'crimson', 'cyan', 'dark blue', 'dark cyan', 'dark golden rod', 'dark gray', 'dark grey', 'dark green', 'dark khaki', 'dark magenta', 'dark olive green', 'dark orange', 'dark orchid', 'dark red', 'dark salmon', 'dark sea green', 'dark slate blue', 'dark slate gray', 'dark slate grey', 'dark turquoise', 'dark violet', 'deep pink', 'deep sky blue', 'dim gray', 'dim grey', 'dodger blue', 'fire brick', 'floral white', 'forest green', 'fuchsia', 'gainsboro', 'ghost white', 'gold', 'golden rod', 'gray', 'grey', 'green', 'green yellow', 'honey dew', 'hot pink', 'indian red', 'indigo', 'ivory', 'khaki', 'lavender', 'lavender blush', 'lawn green', 'lemon chiffon', 'light blue', 'light coral', 'light cyan', 'light golden rod yellow', 'light gray', 'light grey', 'light green', 'light pink', 'light salmon', 'light sea green', 'light sky blue', 'light slate gray', 'light slate grey', 'light steel blue', 'light yellow', 'lime', 'lime green', 'linen', 'magenta', 'maroon', 'medium aqua marine', 'medium blue', 'medium orchid', 'medium purple', 'medium sea green', 'medium slate blue', 'medium spring green', 'medium turquoise', 'medium violet red', 'midnight blue', 'mint cream', 'misty rose', 'moccasin', 'navajo white', 'navy', 'old lace', 'olive', 'olive drab', 'orange', 'orange red', 'orchid', 'pale golden rod', 'pale green', 'pale turquoise', 'pale violet red', 'papaya whip', 'peach puff', 'peru', 'pink', 'plum', 'powder blue', 'purple', 'rebecca purple', 'red', 'rosy brown', 'royal blue', 'saddle brown', 'salmon', 'sandy brown', 'sea green', 'sea shell', 'sienna', 'silver', 'sky blue', 'slate blue', 'slate gray', 'slate grey', 'snow', 'spring green', 'steel blue', 'tan', 'teal', 'thistle', 'tomato', 'turquoise', 'violet', 'wheat', 'white', 'white smoke', 'yellow', 'yellow green']
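The list comprehension in getColors() packs the name conversion into a single line: it inserts a space before each uppercase letter and then lowercases the result. The same step in isolation, as a sketch that needs no network access (the function name split_color_name is hypothetical):

```python
def split_color_name(name):
    """Turn a CamelCase color name such as 'DeepSkyBlue' into 'deep sky blue'."""
    cleaned = name.replace("\xa0", "")  # drop non-breaking spaces, as getColors() does
    # Put a space in front of every ASCII uppercase letter, leave other characters alone.
    spaced = "".join(" " + ch if "A" <= ch <= "Z" else ch for ch in cleaned)
    return spaced.strip().lower()

print(split_color_name("DeepSkyBlue"))           # deep sky blue
print(split_color_name("LightGoldenRodYellow"))  # light golden rod yellow
```

The split follows the capitalization of the input, which is why names such as 'golden rod' and 'aqua marine' appear as two words in the output above.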

Hope this helps!

User

#3 · edited 2024-06-16 11:39:36

I wouldn't use nltk for this, but a regex.

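Get a list of all CSS colors (the code below uses http://colours.neilorangepeel.com/)
Extract the color names and build a list (using BeautifulSoup)
Build the regex pattern
Use this regex pattern to match what you need in the string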
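The last two steps can be tried offline with a hand-written list of names. This is only a sketch: the sample list is hypothetical, and the word boundaries, the longest-first ordering, and the case-insensitive flag are additions that go beyond what this answer describes.

```python
import re

# Hypothetical sample; in practice the names would come from the scraped page.
color_names = ["blue", "deep sky blue", "sky blue", "red"]

# Longest names first so 'deep sky blue' wins over 'sky blue' and 'blue'.
ordered = sorted(color_names, key=len, reverse=True)

# re.escape guards against regex metacharacters in names; \b keeps 'blue' out of 'bluebird'.
pattern = r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b"
color_regex = re.compile(pattern, re.IGNORECASE)

text = "She wore a Deep Sky Blue dress and red shoes; the bluebird watched."
print(color_regex.findall(text))
```

findall returns the matched text as it appears in the input, so lowercase the results if they need to be compared against the name list.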

My working code
(if needed, just change the last two lines and the proxy settings)

import re
import urllib.request

from bs4 import BeautifulSoup

color_url = 'http://colours.neilorangepeel.com/'
proxies = {'http': 'http://proxy.foobar.fr:3128'}  # only needed behind a proxy

# Get the HTML file
authinfo = urllib.request.HTTPBasicAuthHandler()  # set up authentication info
proxy_support = urllib.request.ProxyHandler(proxies)
# Build an opener that adds proxy, authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)
urllib.request.install_opener(opener)  # install the opener
colorfile = urllib.request.urlopen(color_url)

soup = BeautifulSoup(colorfile, 'html.parser')

# Build the regex pattern
colors = soup.find_all('h1')
# Skip headings that have no plain-text content (color.string is None for those)
colorsnames = [color.string for color in colors if color.string]
colorspattern = '|'.join(re.escape(name) for name in colorsnames)
colorregex = re.compile(colorspattern)

# Match what you need
yourstring = 'replace this with the text to search'  # placeholder input
match = colorregex.search(yourstring)
if match:
    print(match.group())  # do what you want with the match
