Unicode category database

Project description for the unicategories Python package


unicategories

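Unicode category database, generated at installation time.

This module exposes a categories dictionary containing RangeGroup instances, keyed by category code (e.g. 'Lu').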
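The sketch below is not unicategories' own generator. It is a minimal illustration, assuming only the standard library's unicodedata module, of how such a database could be produced at install time: scan every code point and collapse each run of consecutive code points sharing a category into a half-open (start, end) pair. The name build_category_ranges is hypothetical, and the result reflects whichever Unicode version the running Python bundles.

```python
import sys
import unicodedata
from collections import defaultdict


def build_category_ranges():
    """Map each two-letter category code to a tuple of half-open (start, end) pairs.

    Hypothetical helper: a sketch of install-time generation, not unicategories' code.
    """
    ranges = defaultdict(list)
    start = 0
    current = unicodedata.category(chr(0))
    for code in range(1, sys.maxunicode + 1):
        category = unicodedata.category(chr(code))
        if category != current:
            # Category changed: close the run [start, code) and open a new one.
            ranges[current].append((start, code))
            start, current = code, category
    # Close the final run, which ends just past the last code point.
    ranges[current].append((start, sys.maxunicode + 1))
    return {name: tuple(spans) for name, spans in ranges.items()}


categories = build_category_ranges()
print(sorted(categories))  # the two-letter category codes that were found
```

Storing only run boundaries means the scan happens once, at installation, and lookups later work on a few thousand pairs instead of over a million code points.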

Example

```python
from unicategories import categories

upperchars = categories['Lu'].characters()  # iterator
print('Unicode uppercase characters are "%s"' % ''.join(upperchars))
# Unicode uppercase characters are "ABCDEF..."
```

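RangeGroup

An immutable iterable (tuple-based, with some useful methods) of (start, end) tuples which, like Python's range, are open at the end: start is included, end is excluded.

We chose this approach for storage efficiency: storing every character individually would take a large amount of memory.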
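A quick standard-library check of both claims, using a hypothetical runs() helper rather than unicategories itself: a half-open pair behaves like range(), and collapsing a large category such as Lo into pairs needs far fewer entries than storing each character.

```python
import sys
import unicodedata


def runs(codes):
    """Collapse a sorted iterable of int code points into half-open (start, end) pairs."""
    pairs = []
    for code in codes:
        if pairs and pairs[-1][1] == code:
            # Code point directly follows the current run: extend it by one.
            pairs[-1] = (pairs[-1][0], code + 1)
        else:
            # Gap (or first item): open a new run covering just this code point.
            pairs.append((code, code + 1))
    return tuple(pairs)


# Like range(), the end is exclusive: (65, 91) covers 'A'..'Z' but not '[' (91).
assert ''.join(map(chr, range(65, 91))) == 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

other_letters = [code for code in range(sys.maxunicode + 1)
                 if unicodedata.category(chr(code)) == 'Lo']
pairs = runs(other_letters)
print(len(other_letters), 'Lo code points stored as', len(pairs), 'ranges')
```

The exact numbers depend on the Unicode version bundled with Python, so none are quoted here.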

The RangeGroup class provides the following methods:

range_group.characters()

Get an iterator over all characters in this range group.

:yields: iterator of characters (str of size 1)
:ytype: str

range_group.codes()

Get an iterator over all Unicode code points contained in this range group.

:yields: iterator of code points (int)
:ytype: int

range_group.has(character)

Check whether a character (or character code point) is contained in any range of this range group.

:param character: character or Unicode code point to look for
:type character: str or int
:returns: True if the character is contained in any range, False otherwise
:rtype: bool
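A minimal stand-in for these three methods, assuming (as the description suggests but does not spell out) that the ranges are sorted and non-overlapping. SimpleRangeGroup is a hypothetical name, not the library's class, and the binary search in has() is a choice of this sketch.

```python
import bisect


class SimpleRangeGroup(tuple):
    """Hypothetical RangeGroup stand-in: a tuple of sorted, non-overlapping
    half-open (start, end) code point pairs."""

    def codes(self):
        """Yield every code point (int) covered by the group."""
        for start, end in self:
            yield from range(start, end)  # end is exclusive, as in range()

    def characters(self):
        """Yield every covered character as a str of size 1."""
        for code in self.codes():
            yield chr(code)

    def has(self, character):
        """Return True if a character (str) or code point (int) is in any range."""
        code = ord(character) if isinstance(character, str) else character
        # Index of the last pair whose start is <= code (pairs sort by start).
        index = bisect.bisect_right(self, (code, float('inf'))) - 1
        return index >= 0 and code < self[index][1]


group = SimpleRangeGroup(((65, 91), (192, 215)))
print(''.join(group.characters())[:5])                # ABCDE
print(group.has('Z'), group.has(91), group.has(200))  # True False True
```

Subclassing tuple keeps instances immutable and lets the pairs be iterated directly; bisect keeps has() logarithmic in the number of ranges instead of scanning them all.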

Unicode categories


Taken from Wikipedia

| Value | Category (major, minor) | Basic type | Character assigned | Fixed count | Remarks |
|---|---|---|---|---|---|
| Lu | Letter, uppercase | Graphic | Character | | |
| Ll | Letter, lowercase | Graphic | Character | | |
| Lt | Letter, titlecase | Graphic | Character | | Ligatures containing uppercase followed by lowercase letters (e.g., ǅ, ǈ, ǋ, and ǲ) |
| Lm | Letter, modifier | Graphic | Character | | |
| Lo | Letter, other | Graphic | Character | | |
| Mn | Mark, nonspacing | Graphic | Character | | |
| Mc | Mark, spacing combining | Graphic | Character | | |
| Me | Mark, enclosing | Graphic | Character | | |
| Nd | Number, decimal digit | Graphic | Character | | All these, and only these, have Numeric Type = De (Decimal) |
| Nl | Number, letter | Graphic | Character | | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
| No | Number, other | Graphic | Character | | E.g., vulgar fractions, superscript and subscript digits |
| Pc | Punctuation, connector | Graphic | Character | | Includes "_" underscore |
| Pd | Punctuation, dash | Graphic | Character | | Includes several hyphen characters |
| Ps | Punctuation, open | Graphic | Character | | Opening bracket characters |
| Pe | Punctuation, close | Graphic | Character | | Closing bracket characters |
| Pi | Punctuation, initial quote | Graphic | Character | | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
| Pf | Punctuation, final quote | Graphic | Character | | Closing quotation mark. May behave like Ps or Pe depending on usage |
| Po | Punctuation, other | Graphic | Character | | |
| Sm | Symbol, math | Graphic | Character | | |
| Sc | Symbol, currency | Graphic | Character | | |
| Sk | Symbol, modifier | Graphic | Character | | |
| So | Symbol, other | Graphic | Character | | |
| Zs | Separator, space | Graphic | Character | | Includes the space, but not TAB, CR, or LF, which are Cc |
| Zl | Separator, line | Format | Character | | Only U+2028 LINE SEPARATOR (LSEP) |
| Zp | Separator, paragraph | Format | Character | | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
| Cc | Other, control | Control | Character | Fixed: 65 | No name |
| Cf | Other, format | Format | Character | | Includes the soft hyphen, control characters to support bidirectional text, and language tag characters |
| Cs | Other, surrogate | Surrogate | Not (but abstract) | Fixed: 2,048 | No name |
| Co | Other, private use | Private-use | Not (but abstract) | Fixed: 137,468 total (6,400 in BMP, 131,068 in Planes 15–16) | No name |
| Cn | Other, not assigned | Noncharacter | Not | Fixed: 66 | No name |
| Cn | Other, not assigned | Reserved | Not | Not fixed | No name |
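Several rows of the table can be checked against Python's own unicodedata module. The sample characters below are this sketch's own choices (they are not part of unicategories), and only the fixed counts that the category code alone can identify are counted: noncharacters and reserved code points are both reported as Cn, so the 66 noncharacters cannot be isolated this way.

```python
import sys
import unicodedata
from collections import Counter

# One illustrative character per row of the table (choices are this sketch's own).
samples = {
    'A': 'Lu', 'a': 'Ll', '\u01c5': 'Lt', '\u02b0': 'Lm', '\u05d0': 'Lo',
    '\u0301': 'Mn', '\u0903': 'Mc', '\u20dd': 'Me',
    '5': 'Nd', '\u216b': 'Nl', '\u00bd': 'No',
    '_': 'Pc', '-': 'Pd', '(': 'Ps', ')': 'Pe', '\u00ab': 'Pi', '\u00bb': 'Pf', '!': 'Po',
    '+': 'Sm', '$': 'Sc', '^': 'Sk', '\u00a9': 'So',
    ' ': 'Zs', '\u2028': 'Zl', '\u2029': 'Zp',
    '\t': 'Cc', '\u00ad': 'Cf', '\ud800': 'Cs', '\ue000': 'Co', '\ufffe': 'Cn',
}
for char, expected in samples.items():
    assert unicodedata.category(char) == expected, (hex(ord(char)), expected)

# Count every code point per category in a single pass.
totals = Counter(unicodedata.category(chr(code)) for code in range(sys.maxunicode + 1))

# The table's fixed counts: 65 controls, 2,048 surrogates and
# 6,400 (BMP) + 131,068 (planes 15-16) = 137,468 private-use code points.
print(totals['Cc'], totals['Cs'], totals['Co'])  # 65 2048 137468
```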

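In addition, unicategories also provides the major categories L, M, N, P, S, Z and C, each covering every two-letter category that shares its first letter.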
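One plausible way to build such a major category, sketched with a hypothetical merge_major() helper (not necessarily how unicategories does it): gather the ranges of every two-letter category sharing the first letter, sort them, and join pairs that touch or overlap. The input is a small excerpt of Latin-1 ranges, not complete category data.

```python
def merge_major(categories, major):
    """Union the half-open ranges of every two-letter category starting with `major`."""
    spans = sorted(
        span
        for name, ranges in categories.items()
        if name.startswith(major)
        for span in ranges
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return tuple(merged)


# Excerpt only: A-Z, U+00C0-U+00D6 and U+00D8-U+00DE (Lu); a-z and U+00DF-U+00F6 (Ll).
latin1_excerpt = {
    'Lu': ((65, 91), (192, 215), (216, 223)),
    'Ll': ((97, 123), (223, 247)),
    'Nd': ((48, 58),),
}
print(merge_major(latin1_excerpt, 'L'))
# ((65, 91), (97, 123), (192, 215), (216, 247))
```

Because the Lu run ending at 223 and the Ll run starting at 223 touch, the merged L category stores them as one pair.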
