How do I identify colors in a string with NLTK in Python?

Posted 2024-06-16 11:39:36


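The title says it all, but my problem is that I want to identify colors in a string using NLTK, and all I can find is how to classify parts of speech. I know I could list every color I want to support, but since I want to support all the colors available in CSS, that would be a rather long list (and some of the colors get obscure, such as aqua and aquamarine). If there is an easier way than writing them all out, I would appreciate it. Thanks!

Edit:

When I first asked my question, I forgot to mention that I need the color names separated by spaces, as in natural language, rather than run together, because this is for a speech recognition application. I have accepted the answer by "Tadhg McDonald-Jensen" as the best answer because it answers my original question well. However, I have also posted my own answer, which provides the color names with spaces. Hope this helps!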


3 answers

You can use the webcolors package to get all the CSS color names it recognizes. Just check for membership in webcolors.CSS3_NAMES_TO_HEX:

>>> import webcolors
>>> "green" in webcolors.CSS3_NAMES_TO_HEX
True
>>> "deepskyblue" in webcolors.CSS3_NAMES_TO_HEX
True
>>> "aquamarine" in webcolors.CSS3_NAMES_TO_HEX
True
>>> len(webcolors.CSS3_NAMES_TO_HEX)
147

This means that webcolors.CSS3_NAMES_TO_HEX.keys() will give you a list in Python 2, or a dict_keys view in Python 3, containing all the CSS3 color names.
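To apply this membership test to a whole string, one option is to split the text into lowercase word tokens and keep the ones found in the name set. The following is only a minimal sketch: `CSS_NAMES` is a small hand-written stand-in for the keys of `webcolors.CSS3_NAMES_TO_HEX`, and `find_colors` is a hypothetical helper name, not part of webcolors or NLTK.

```python
import re

# Assumption: a tiny stand-in for set(webcolors.CSS3_NAMES_TO_HEX),
# so the sketch runs without the webcolors package installed.
CSS_NAMES = {"green", "deepskyblue", "aquamarine", "red", "teal"}

def find_colors(text, names=CSS_NAMES):
    """Return the color names that occur as whole words in text, in order."""
    # Lowercase the text and keep only runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok in names]

print(find_colors("Paint the door DeepSkyBlue and the frame green."))
```

Because whole tokens are compared, a word such as "reddish" is not reported as "red". This only works for the run-together CSS spellings.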

Solution (for me, anyway):

Note: If you simply need the colors without spaces ('deepskyblue' instead of 'deep sky blue'), any of the previous answers will work. However, since I'm using this in combination with speech recognition, I need the colors separated by spaces as in natural language. This can be achieved with the following code (in Python 3), which I view as more complete:

import urllib.request
from bs4 import BeautifulSoup

def getColors():
    html = urllib.request.urlopen('http://www.w3schools.com/colors/colors_names.asp').read()
    soup = BeautifulSoup(html, 'html.parser')
    children = [item.findChildren() for item in soup.find_all('tr')]
    colors = [''.join( ' '+x if 'A' <= x <= 'Z' else x for x in item[0].text.replace(u'\xa0', '')).strip().lower() for item in children]
    return colors[1:]
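The name-splitting expression inside getColors is dense, so here it is isolated as a sketch that runs without network access. `split_camel_case` is a hypothetical helper name used only for illustration; the logic mirrors the expression above (drop non-breaking spaces, put a space before each ASCII capital, then trim and lowercase).

```python
def split_camel_case(name):
    """Turn 'DeepSkyBlue' into 'deep sky blue'."""
    cleaned = name.replace("\xa0", "")  # remove non-breaking spaces from the HTML text
    # Prefix every ASCII capital letter with a space, leave other characters alone.
    spaced = "".join(" " + ch if "A" <= ch <= "Z" else ch for ch in cleaned)
    return spaced.strip().lower()

print(split_camel_case("DeepSkyBlue"))
print(split_camel_case("LightGoldenRodYellow"))
```

Note that the split depends entirely on how the source page capitalizes each name, which is why the result contains entries such as 'golden rod' and 'burly wood'.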

If you run

print(getColors())

you will get:

['alice blue', 'antique white', 'aqua', 'aquamarine', 'azure', 'beige', 'bisque', 'black', 'blanched almond', 'blue', 'blue violet', 'brown', 'burly wood', 'cadet blue', 'chartreuse', 'chocolate', 'coral', 'cornflower blue', 'cornsilk', 'crimson', 'cyan', 'dark blue', 'dark cyan', 'dark golden rod', 'dark gray', 'dark grey', 'dark green', 'dark khaki', 'dark magenta', 'dark olive green', 'dark orange', 'dark orchid', 'dark red', 'dark salmon', 'dark sea green', 'dark slate blue', 'dark slate gray', 'dark slate grey', 'dark turquoise', 'dark violet', 'deep pink', 'deep sky blue', 'dim gray', 'dim grey', 'dodger blue', 'fire brick', 'floral white', 'forest green', 'fuchsia', 'gainsboro', 'ghost white', 'gold', 'golden rod', 'gray', 'grey', 'green', 'green yellow', 'honey dew', 'hot pink', 'indian red', 'indigo', 'ivory', 'khaki', 'lavender', 'lavender blush', 'lawn green', 'lemon chiffon', 'light blue', 'light coral', 'light cyan', 'light golden rod yellow', 'light gray', 'light grey', 'light green', 'light pink', 'light salmon', 'light sea green', 'light sky blue', 'light slate gray', 'light slate grey', 'light steel blue', 'light yellow', 'lime', 'lime green', 'linen', 'magenta', 'maroon', 'medium aqua marine', 'medium blue', 'medium orchid', 'medium purple', 'medium sea green', 'medium slate blue', 'medium spring green', 'medium turquoise', 'medium violet red', 'midnight blue', 'mint cream', 'misty rose', 'moccasin', 'navajo white', 'navy', 'old lace', 'olive', 'olive drab', 'orange', 'orange red', 'orchid', 'pale golden rod', 'pale green', 'pale turquoise', 'pale violet red', 'papaya whip', 'peach puff', 'peru', 'pink', 'plum', 'powder blue', 'purple', 'rebecca purple', 'red', 'rosy brown', 'royal blue', 'saddle brown', 'salmon', 'sandy brown', 'sea green', 'sea shell', 'sienna', 'silver', 'sky blue', 'slate blue', 'slate gray', 'slate grey', 'snow', 'spring green', 'steel blue', 'tan', 'teal', 'thistle', 'tomato', 'turquoise', 'violet', 'wheat', 'white', 'white smoke', 'yellow', 'yellow green']
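With space-separated names, a spoken phrase such as 'dark sea green' also contains the shorter names 'sea green' and 'green', so a matcher probably wants to prefer the longest name. A minimal sketch of one way to do that, assuming `SPACED_NAMES` is a short excerpt of the list above and `find_spoken_colors` is a hypothetical helper (it is not part of the code in this answer):

```python
import re

# Assumption: a short excerpt of the full list returned by getColors().
SPACED_NAMES = ["green", "sea green", "dark sea green",
                "blue", "sky blue", "deep sky blue"]

def find_spoken_colors(text, names=SPACED_NAMES):
    """Return the color names found in text, preferring the longest name."""
    # Longest names first, so 'dark sea green' is tried before 'green'.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [m.group(0).lower() for m in pattern.finditer(text)]

print(find_spoken_colors("make it Deep Sky Blue with a dark sea green border"))
```

Sorting by length is a simple heuristic. It works here because every shorter name that overlaps a longer one is a suffix or prefix of it.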

Hope this helps!

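I wouldn't use NLTK for this; I'd use a regex.

  1. Get a list of all CSS colors (for example, from the page at color_url in the code below)
  2. Extract the color names and build a list (using BeautifulSoup)
  3. Build the regular expression pattern
  4. Use this regex pattern to match what you need in the string

My working code
(if needed, just change the last two lines and the proxy settings)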

import re
import urllib.request

from bs4 import BeautifulSoup

color_url = 'http://colours.neilorangepeel.com/'
proxies = {'http': 'http://proxy.foobar.fr:3128'}  # if needed

# Get the HTML file
authinfo = urllib.request.HTTPBasicAuthHandler()  # set up authentication info
proxy_support = urllib.request.ProxyHandler(proxies)
# Build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)
urllib.request.install_opener(opener)  # install the opener
colorfile = urllib.request.urlopen(color_url)

soup = BeautifulSoup(colorfile, 'html.parser')

# Build the regex pattern
colors = soup.find_all('h1')
# Skip headings without a single text child (their .string is None)
colorsnames = [color.string for color in colors if color.string]
colorspattern = '|'.join(re.escape(name) for name in colorsnames)
colorregex = re.compile(colorspattern)

# Match what you need
yourstring = 'a string that may mention a color'  # placeholder input
if colorregex.search(yourstring):
    pass  # do what you want
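One caveat with a plain alternation like the one above: it also matches color names embedded in longer words, for example 'red' inside 'hundred'. A sketch of one way to guard against that with word boundaries, assuming a short hand-written name list in place of the scraped one; `build_color_regex` is a hypothetical helper name:

```python
import re

def build_color_regex(names):
    """Compile a case-insensitive regex matching any name as a whole word."""
    alternatives = "|".join(re.escape(name) for name in names)
    # \b on both sides rejects matches inside longer words.
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

# Assumption: a few sample names instead of the list scraped from the page.
color_regex = build_color_regex(["red", "blue", "blueviolet", "tan"])

print(bool(color_regex.search("a hundred tangled wires")))
print(color_regex.search("a BlueViolet scarf").group(0))
```

Whether case-insensitive matching is wanted depends on how the names appear on the scraped page and in your input, so treat the `re.IGNORECASE` flag as a choice rather than a requirement.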
