//Java 8 java.lang.String source code
public int codePointAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointAtImpl(value, index, value.length);
}
//...
public int codePointBefore(int index) {
    int i = index - 1;
    if ((i < 0) || (i >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointBeforeImpl(value, index, 0);
}
The corresponding methods in Character identify and combine the multiple char values that belong to a single code point:
//Java 8 java.lang.Character source code
static int codePointAtImpl(char[] a, int index, int limit) {
    char c1 = a[index];
    if (isHighSurrogate(c1) && ++index < limit) {
        char c2 = a[index];
        if (isLowSurrogate(c2)) {
            return toCodePoint(c1, c2);
        }
    }
    return c1;
}
//...
static int codePointBeforeImpl(char[] a, int index, int start) {
    char c2 = a[--index];
    if (isLowSurrogate(c2) && index > start) {
        char c1 = a[--index];
        if (isHighSurrogate(c1)) {
            return toCodePoint(c1, c2);
        }
    }
    return c2;
}
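To make the combining step concrete, here is a minimal sketch that uses only the public java.lang.Character API. The class name SurrogatePairDemo and the helper combine() are hypothetical names chosen for illustration; they are not part of the JDK. The helper only mimics the fallback of the Impl methods by returning the first char unchanged when the two values do not form a valid pair.

```java
public class SurrogatePairDemo {
    // Hypothetical helper: combines a high/low surrogate pair into one code point,
    // or returns the first char unchanged if the two do not form a valid pair.
    static int combine(char first, char second) {
        if (Character.isHighSurrogate(first) && Character.isLowSurrogate(second)) {
            return Character.toCodePoint(first, second);
        }
        return first;
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane, so it needs two chars.
        char[] pair = Character.toChars(0x1F600);
        System.out.println(pair.length);                                    // prints 2
        System.out.println(Integer.toHexString(pair[0]));                   // prints d83d
        System.out.println(Integer.toHexString(pair[1]));                   // prints de00
        System.out.println(Integer.toHexString(combine(pair[0], pair[1]))); // prints 1f600
        System.out.println(Integer.toHexString(combine('a', 'b')));         // prints 61
    }
}
```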
# Answer 1
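A code point can be made up of more than one char, because each char is still only a 16-bit UTF-16 code unit. The index passed to the String methods is an index into the underlying array char[] value, not the index of a code point. The String methods only check bounds and delegate to the corresponding methods in Character, which identify and combine the char values that belong to a single code point. The difference between the two matters because index-1 is not always the start of the preceding code point: codePointBefore() has to start at index-1 and look backward, whereas codePointAt() has to start at index and look forward.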
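The look-forward and look-backward behaviour can be observed with a short sketch. It assumes a sample string containing one supplementary character; the class name CodePointIndexDemo and its helper methods are hypothetical and exist only for this illustration.

```java
public class CodePointIndexDemo {
    // 'a', U+1F600 (stored as the surrogate pair D83D DE00), 'b':
    // four char values, three code points.
    static final String SAMPLE = "a\uD83D\uDE00b";

    // Formats a code point in the usual U+XXXX notation.
    static String hex(int codePoint) {
        return "U+" + Integer.toHexString(codePoint).toUpperCase();
    }

    // Walks forward by code point: codePointAt(i) looks at i and possibly i + 1.
    static int countForward(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            count++;
        }
        return count;
    }

    // Walks backward by code point: codePointBefore(i) looks at i - 1 and possibly i - 2.
    static int countBackward(String s) {
        int count = 0;
        for (int i = s.length(); i > 0; i -= Character.charCount(s.codePointBefore(i))) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                // prints 4
        System.out.println(hex(SAMPLE.codePointAt(1)));     // prints U+1F600 (index 1 starts the pair)
        System.out.println(hex(SAMPLE.codePointAt(2)));     // prints U+DE00 (lone low surrogate)
        System.out.println(hex(SAMPLE.codePointBefore(3))); // prints U+1F600 (pair ends at index 2)
        System.out.println(hex(SAMPLE.codePointBefore(2))); // prints U+D83D (lone high surrogate)
        System.out.println(countForward(SAMPLE));           // prints 3
        System.out.println(countBackward(SAMPLE));          // prints 3
    }
}
```

Note that an index falling in the middle of a surrogate pair does not throw; both methods simply return the isolated surrogate, which is why the loops advance by Character.charCount() rather than by one.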