Java: how to use a regular expression to replace part of a string


I'm not a beginner with regular expressions, but the way they are used in Perl seems slightly different from the way they are used in Java.

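Anyway, I basically have a dictionary of shorthand words and their definitions. I want to iterate over the words in the dictionary and replace them with their meanings. What is the best way to do this in Java?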

I've seen String.replaceAll(), String.replace(), and the Pattern/Matcher classes. I'd like to do a case-insensitive replacement, roughly along these lines:

word =~ s/\s?\Q$short_word\E\s?/ \Q$short_def\E /sig
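For comparison, here is a rough Java counterpart of that Perl line. It is a sketch that assumes the intent is to insert the definition literally, and the class and method names are made up for illustration. Pattern.quote plays the role of \Q...\E, the CASE_INSENSITIVE flag the role of /i, and replaceAll the role of /g; the /s flag has no effect here because the pattern contains no dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerlStyleSubstitution {

    // Rough counterpart of: $word =~ s/\s?\Q$short_word\E\s?/ $short_def /ig
    static String substitute(String text, String shortWord, String shortDef) {
        // Pattern.quote ~ \Q...\E, CASE_INSENSITIVE ~ /i
        Pattern pattern = Pattern.compile("\\s?" + Pattern.quote(shortWord) + "\\s?",
                Pattern.CASE_INSENSITIVE);
        // quoteReplacement stops '$' and '\' in the definition from being read
        // as group references or escapes; replaceAll ~ /g
        return pattern.matcher(text)
                .replaceAll(" " + Matcher.quoteReplacement(shortDef) + " ");
    }

    public static void main(String[] args) {
        // prints "I was rolling on the floor laughing " (note the trailing space)
        System.out.println(substitute("I was rofl", "rofl", "rolling on the floor laughing"));
    }
}
```

Like the Perl version, it swallows one whitespace character on each side (when present) and puts a single space back, so the result can carry stray spaces, and nothing stops a short word from matching inside a longer one.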

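While doing this, do you think it's better to extract all the words from the string and then apply my dictionary, or to just apply the dictionary to the string? I know I need to be careful, because the shorthand words could match parts of other shorthand meanings.

Hope this all makes sense.

Thanks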

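Clarification:

The dictionary is something like: lol: laugh out loud, rofl: rolling on the floor laughing

The string is: lol, I was rofl

The replaced text: laugh out loud, I was rolling on the floor laughing

Note that ll was not added anywhere.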

3 answers

# Answer 1

If you insist on using regular expressions, this will work (taking Zoltan Balazs's dictionary-map approach):

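```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Maps a shorthand ("lol") to its expansion ("laugh out loud"); `input` holds the text
Map<String, String> substitutions = loadDictionaryFromSomewhere();
int lengthOfShortestKeyInMap = 3; // calculate from the map's keys
int lengthOfLongestKeyInMap = 3;  // calculate from the map's keys

StringBuffer output = new StringBuffer(input.length());
// Match whole words whose length lies between the shortest and longest key
Pattern pattern = Pattern.compile("\\b(\\w{" + lengthOfShortestKeyInMap + "," + lengthOfLongestKeyInMap + "})\\b");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String candidate = matcher.group(1);
    String substitute = substitutions.get(candidate);
    if (substitute == null)
        substitute = candidate; // no match, use original
    // quoteReplacement keeps '$' and '\' in the expansion literal
    matcher.appendReplacement(output, Matcher.quoteReplacement(substitute));
}
matcher.appendTail(output);
// output now contains the text with substituted words
```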

If you plan to process many inputs, precompiling the pattern once is more efficient than using String.split(), which compiles a new Pattern on every call.
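A minimal sketch of that idea, assuming Java 9+ (for Map.of): the pattern is built once in the constructor, with the shortest and longest key lengths actually computed from the map rather than hard-coded, and is then reused by every call to expand. The class name WordLengthExpander is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordLengthExpander {

    private final Map<String, String> substitutions;
    private final Pattern pattern; // compiled once, reused for every input

    WordLengthExpander(Map<String, String> substitutions) {
        if (substitutions.isEmpty()) {
            throw new IllegalArgumentException("dictionary must not be empty");
        }
        this.substitutions = substitutions;
        // Compute the key lengths that the answer leaves as "calculate"
        int shortest = Integer.MAX_VALUE;
        int longest = 0;
        for (String key : substitutions.keySet()) {
            shortest = Math.min(shortest, key.length());
            longest = Math.max(longest, key.length());
        }
        this.pattern = Pattern.compile("\\b(\\w{" + shortest + "," + longest + "})\\b");
    }

    String expand(String input) {
        StringBuffer output = new StringBuffer(input.length());
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            String candidate = matcher.group(1);
            // Words that are not keys are written back unchanged
            String substitute = substitutions.getOrDefault(candidate, candidate);
            matcher.appendReplacement(output, Matcher.quoteReplacement(substitute));
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        WordLengthExpander expander = new WordLengthExpander(Map.of(
                "lol", "laugh out loud",
                "rofl", "rolling on the floor laughing"));
        // The same precompiled pattern serves every call
        System.out.println(expander.expand("lol, I was rofl"));
        System.out.println(expander.expand("lollipop"));
    }
}
```

Because \b is required on both sides, a key never matches inside a longer word such as "lollipop".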

(Edit) Compiling all the keys into a single pattern gives a more efficient approach:

```java
Pattern pattern = Pattern.compile("\\b(lol|rtfm|rofl|wtf)\\b");
// rest of the method unchanged; the shortest/longest key lengths are no longer needed
```

This lets the regex engine skip any word that happens to be short enough but isn't in the list, which saves a lot of map lookups.
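Typing the keys into the pattern by hand does not scale, so here is a sketch that builds the alternation from the map itself. It also adds the case-insensitive matching the question asked for, under the assumption that every dictionary key is stored in lower case; Pattern.quote keeps any regex metacharacters in a key literal. The class and method names are hypothetical, and appendReplacement(StringBuilder, String) needs Java 9 or later.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationExpander {

    // Builds \b(key1|key2|...)\b from the dictionary keys, matching case-insensitively
    static Pattern buildPattern(Map<String, String> dictionary) {
        if (dictionary.isEmpty()) {
            throw new IllegalArgumentException("dictionary must not be empty");
        }
        StringJoiner alternation = new StringJoiner("|");
        for (String key : dictionary.keySet()) {
            alternation.add(Pattern.quote(key)); // match each key literally
        }
        return Pattern.compile("\\b(" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    // Assumes the dictionary keys are all lower case, so "LOL" is looked up as "lol"
    static String expand(String input, Map<String, String> dictionary, Pattern pattern) {
        StringBuilder output = new StringBuilder(input.length());
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            String found = matcher.group(1);
            String key = found.toLowerCase(Locale.ROOT);
            String substitute = dictionary.getOrDefault(key, found);
            matcher.appendReplacement(output, Matcher.quoteReplacement(substitute));
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = Map.of(
                "lol", "laugh out loud",
                "rofl", "rolling on the floor laughing");
        Pattern pattern = buildPattern(dictionary); // build once, reuse for every input
        System.out.println(expand("LOL, I was rofl", dictionary, pattern));
    }
}
```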

# Answer 2

The first thing that comes to mind:

```java
// ...
// e.g. "lol" -> "laugh out loud"
Map<String, String> dictionary = new HashMap<>();

ArrayList<String> originalText = new ArrayList<>(); // the input, split into words
ArrayList<String> replacedText = new ArrayList<>();

for (String string : originalText) {
    if (dictionary.containsKey(string)) {
        replacedText.add(dictionary.get(string));
    } else {
        replacedText.add(string);
    }
}
// ...
```

Or you could use a StringBuffer instead of replacedText.
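A sketch of that variant, using StringBuilder (the unsynchronized counterpart of StringBuffer) and assuming the text has already been split into words; the class name is hypothetical and Java 9+ is assumed for List.of/Map.of in main.

```java
import java.util.List;
import java.util.Map;

public class ListLookupExpander {

    // The loop from the answer, appending to a StringBuilder instead of a second list.
    // Words are re-joined with single spaces, so the original spacing is not kept.
    static String expand(List<String> words, Map<String, String> dictionary) {
        StringBuilder output = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            if (i > 0) {
                output.append(' ');
            }
            String word = words.get(i);
            output.append(dictionary.getOrDefault(word, word));
        }
        return output.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = Map.of("rofl", "rolling on the floor laughing");
        System.out.println(expand(List.of("I", "was", "rofl"), dictionary));
    }
}
```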

# Answer 3
The danger is false positives in ordinary words: "Fell" != "Felix Lemon"
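A small demonstration of that false positive, using a hypothetical dictionary entry ll mapped to the made-up expansion "like lemons": plain String.replace rewrites the ll inside "Fell", while wrapping the quoted key in \b word boundaries leaves it alone. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FalsePositiveDemo {

    // Plain substring replacement: also rewrites the key inside longer words
    static String naiveExpand(String text, String key, String expansion) {
        return text.replace(key, expansion);
    }

    // \b on both sides: the key only matches as a whole word
    static String boundedExpand(String text, String key, String expansion) {
        return text.replaceAll("\\b" + Pattern.quote(key) + "\\b",
                Matcher.quoteReplacement(expansion));
    }

    public static void main(String[] args) {
        System.out.println(naiveExpand("Fell", "ll", "like lemons"));   // Felike lemons
        System.out.println(boundedExpand("Fell", "ll", "like lemons")); // Fell
    }
}
```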

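One approach is to split the text into words on whitespace (do you need to preserve runs of multiple spaces?) and then loop over the list, applying the 'if contains() { replace } else { output original }' idea above.

I would use a StringBuffer for the output: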
```
StringBuffer outputBuffer = new StringBuffer();
// split() is the "smart" helper described below: it returns words and separators alike
for (String s : split(inputText)) {
    outputBuffer.append(dictionary.containsKey(s) ? dictionary.get(s) : s);
}
```
Make the split method smart enough to return the word separators as well:
```
split("now is the  time") -> now,<space>,is,<space>,the,<space><space>,time
```
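That way you don't have to worry about preserving whitespace: the loop above simply appends anything that isn't a dictionary word to the StringBuffer unchanged.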
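A sketch of such a split, using the lookaround technique for keeping delimiters: the pattern matches the empty position between a word character (\w) and a non-word character (\W), so String.split cuts there without consuming anything and the separators come back as tokens of their own. The class name is hypothetical, Map.of assumes Java 9+, and lookups are case-sensitive.

```java
import java.util.Map;

public class DelimiterPreservingExpander {

    // Cut at every word/non-word boundary; zero-width lookarounds consume nothing,
    // so separators (spaces, punctuation) survive as tokens
    static String[] split(String text) {
        return text.split("(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");
    }

    // The loop from the answer: look each token up and append anything that
    // is not a dictionary key (separators, ordinary words) unchanged
    static String expand(String text, Map<String, String> dictionary) {
        StringBuilder output = new StringBuilder();
        for (String token : split(text)) {
            output.append(dictionary.containsKey(token) ? dictionary.get(token) : token);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = Map.of(
                "lol", "laugh out loud",
                "rofl", "rolling on the floor laughing");
        System.out.println(String.join(",", split("now is the  time")));
        System.out.println(expand("lol, I was rofl", dictionary));
    }
}
```

Because \w is only [a-zA-Z_0-9] by default, the comma in "lol," becomes its own token and the lookup still hits, but a word such as "don't" is cut into three tokens.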

Here is a recent SO post on retaining delimiters when regexing.
