Need a little help translating code (Python to C#)

数据工程师

Asked 2025-04-16 05:34

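Good evening everyone,

I'm a bit embarrassed to ask this, since I know I should be able to work out the answer myself. However, my knowledge of Python is very limited, so I need help from someone more experienced than me...

The code below comes from Norvig's chapter "Natural Language Corpus Data". It turns a string such as "likethisone" into "[like, this, one]" (that is, it splits the text into the correct words)...

I've ported all of the code to C# (actually, I rewrote the program myself) except for one function, segment, whose syntax I'm having a lot of trouble understanding. Could someone help me translate it into a more readable form in C#?

Many thanks for your help.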

################ Word Segmentation (p. 223)

@memo
def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)

def splits(text, L=20):
    "Return a list of all possible (first, rem) pairs, len(first)<=L."
    return [(text[:i+1], text[i+1:]) 
            for i in range(min(len(text), L))]

def Pwords(words): 
    "The Naive Bayes probability of a sequence of words."
    return product(Pw(w) for w in words)

#### Support functions (p. 224)

def product(nums):
    "Return the product of a sequence of numbers."
    return reduce(operator.mul, nums, 1)

class Pdist(dict):
    "A probability distribution estimated from counts in datafile."
    def __init__(self, data=[], N=None, missingfn=None):
        for key,count in data:
            self[key] = self.get(key, 0) + int(count)
        self.N = float(N or sum(self.itervalues()))
        self.missingfn = missingfn or (lambda k, N: 1./N)
    def __call__(self, key): 
        if key in self: return self[key]/self.N  
        else: return self.missingfn(key, self.N)

def datafile(name, sep='\t'):
    "Read key,value pairs from file."
    for line in file(name):
        yield line.split(sep)

def avoid_long_words(key, N):
    "Estimate the probability of an unknown word."
    return 10./(N * 10**len(key))

N = 1024908267229 ## Number of tokens

Pw  = Pdist(datafile('count_1w.txt'), N, avoid_long_words)


2 Answers

I don't know C# at all, but I can explain how this Python code works.

@memo
def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)

The first line,

@memo

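is a decorator. A decorator wraps the function defined on the following lines in another function. Decorators are commonly used to filter inputs and outputs. In this case, judging by the name and by what the function does, I'd guess that it memoizes calls to segment.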
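The question doesn't show how memo is defined, so the following is only a minimal sketch of what such a memoizing decorator could look like, assuming all arguments are hashable. The names table, fmemo, square and calls are made up for illustration.

```python
import functools

def memo(f):
    """Cache f's results, keyed by its positional arguments."""
    table = {}

    @functools.wraps(f)
    def fmemo(*args):
        if args not in table:
            table[args] = f(*args)  # compute once, then reuse
        return table[args]
    return fmemo

calls = []

@memo
def square(x):
    calls.append(x)  # record each real (uncached) evaluation
    return x * x

square(4)
square(4)
print(calls)  # [4]: the second call was served from the cache
```

For segment this matters because the same remainder string shows up in many different splits, and without a cache each of them would be segmented again from scratch.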

Next:

def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []

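This declares the function, gives it a docstring, and sets the termination condition for the function's recursion.

Next comes the most complicated line, and probably the one that gave you trouble: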

    candidates = ([first]+segment(rem) for first,rem in splits(text))

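The outer parentheses, combined with the for..in construct, form a generator expression. This is an efficient way to iterate over a sequence, in this case splits(text). A generator expression is something like a compact for loop that yields values one at a time. Here, those values become the items of the iterable candidates. Generator expressions are similar to list comprehensions, but they use less memory because they don't retain every value they produce.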
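A small side-by-side may make the difference concrete. This sketch uses plain numbers rather than the segmentation code, and the variable names are made up for illustration.

```python
squares_list = [n * n for n in range(4)]   # list comprehension: built immediately
squares_gen = (n * n for n in range(4))    # generator expression: lazy

print(squares_list)        # [0, 1, 4, 9]
print(next(squares_gen))   # 0
print(list(squares_gen))   # [1, 4, 9]
print(list(squares_gen))   # [] (a generator can be consumed only once)
```

In segment the generator is consumed exactly once, by max, so the one-pass limitation is not a problem there.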

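So, for each value returned by splits(text), the generator expression produces a list.

Each value from splits(text) is a (first, rem) pair.

Each list produced starts with the object first; this is expressed by putting first inside a list literal, i.e. [first]. Another list is then added to it; that second list is determined by a recursive call to segment. In Python, adding lists concatenates them, so [1, 2] + [3, 4] gives [1, 2, 3, 4].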
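To see the shape of the data, here is splits from the question applied to a short string, followed by the candidate lists built from its pairs. The recursive call is replaced by a hypothetical stand-in, fake_segment, so that the example stays small.

```python
def splits(text, L=20):
    "Return a list of all possible (first, rem) pairs, len(first)<=L."
    return [(text[:i+1], text[i+1:]) for i in range(min(len(text), L))]

pairs = splits("cat")
print(pairs)  # [('c', 'at'), ('ca', 't'), ('cat', '')]

# Stand-in for the recursive call: keep the remainder as a single word.
# (Hypothetical helper, used only to show the shape of each candidate.)
def fake_segment(rem):
    return [rem] if rem else []

candidates = [[first] + fake_segment(rem) for first, rem in pairs]
print(candidates)  # [['c', 'at'], ['ca', 't'], ['cat']]
```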

Finally, in

    return max(candidates, key=Pwords)

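the iterable of recursively generated lists and a key function are passed to max. The key function is called on each value in the iterable to obtain the value used to decide which list ranks highest.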
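As an illustration of max with a key function, the sketch below scores three candidate lists with made-up probabilities. toy_prob and toy_pwords are hypothetical stand-ins for Pw and Pwords, and the numbers are not real word frequencies.

```python
candidates = [["c", "at"], ["ca", "t"], ["cat"]]

# Toy scores, invented for illustration; they are not real word probabilities.
toy_prob = {"cat": 0.5, "at": 0.2, "c": 0.01, "ca": 0.01, "t": 0.01}

def toy_pwords(words):
    """Product of the toy probabilities of the words."""
    result = 1.0
    for w in words:
        result *= toy_prob[w]
    return result

best = max(candidates, key=toy_pwords)
print(best)  # ['cat']
```

max returns the original item (the list of words), not the score that the key function computed for it.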

Answered 2025-04-16 by Python大师


Let's look at the first function first:

def segment(text): 
    "Return a list of words that is the best segmentation of text." 
    if not text: return [] 
    candidates = ([first]+segment(rem) for first,rem in splits(text)) 
    return max(candidates, key=Pwords)

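This function takes a string and returns its most probable segmentation as a list of words, so its signature is static IEnumerable<string> segment(string text). Clearly, if text is an empty string, the result should be an empty list. Otherwise, it recursively builds the candidate word lists (a generator expression in Python, a LINQ query in C#) and returns the candidate with the highest probability.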

static IEnumerable<string> segment(string text)
{
    if (text == "") return new string[0]; // C# idiom for empty list of strings
    var candidates = from pair in splits(text)
                     select new[] {pair.Item1}.Concat(segment(pair.Item2));
    return candidates.OrderByDescending(Pwords).First(); // highest probability, like Python's max
}

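Of course, now we need to translate the splits function. Its job is to return a list of every possible way to split the text into a first part and a remainder, with the first part at most L characters long. The translation is fairly straightforward: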

static IEnumerable<Tuple<string, string>> splits(string text, int L = 20)
{
    return from i in Enumerable.Range(1, Math.Min(text.Length, L))
           select Tuple.Create(text.Substring(0, i), text.Substring(i));
}

Next is Pwords, which simply calls Pw on each word in its input list and passes the results to the product function:

static double Pwords(IEnumerable<string> words)
{
    return product(from w in words select Pw(w));
}

And the product function is simple as well:

static double product(IEnumerable<double> nums)
{
    return nums.Aggregate(1.0, (a, b) => a * b); // seed 1.0 mirrors reduce(operator.mul, nums, 1)
}
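In the Python original, reduce is given 1 as its initial value, so the product of an empty sequence is 1. LINQ's Aggregate has an overload that takes a seed and plays the same role, whereas the seedless overload throws on an empty sequence. A Python 3 sketch of the original helper, where reduce has to be imported from functools:

```python
import operator
from functools import reduce

def product(nums):
    """Product of a sequence of numbers; 1 for an empty sequence."""
    return reduce(operator.mul, nums, 1)

print(product([2, 3, 4]))  # 24
print(product([]))         # 1
```

Inside segment every candidate contains at least one word, so the empty case should not come up there, but Pwords may be called on its own.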

Addendum:

Judging from the full source code, Norvig apparently wants the results of segment to be cached for speed. Here is a version that provides this speed-up:

static Dictionary<string, IEnumerable<string>> segmentTable =
   new Dictionary<string, IEnumerable<string>>();

static IEnumerable<string> segment(string text)
{
    if (text == "") return new string[0]; // C# idiom for empty list of strings
    if (!segmentTable.ContainsKey(text))
    {
        var candidates = from pair in splits(text)
                         select new[] {pair.Item1}.Concat(segment(pair.Item2));
        segmentTable[text] = candidates.OrderByDescending(Pwords).First().ToList(); // keep the most probable candidate
    }
    return segmentTable[text];
}
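To check a C# port against the Python behaviour, a small Python 3 reference with toy data can help. The sketch below makes several assumptions: the COUNTS table is invented (the original reads count_1w.txt), functools.lru_cache stands in for the @memo decorator, segments are tuples so that cached results cannot be mutated, and math.prod requires Python 3.8 or later.

```python
from functools import lru_cache
from math import prod

# Toy counts, invented for this sketch (the original reads count_1w.txt).
COUNTS = {"like": 50, "this": 60, "one": 40, "on": 30, "is": 45, "he": 20}
N = sum(COUNTS.values())

def Pw(word):
    """Probability of a word; unknown words are penalized by their length."""
    if word in COUNTS:
        return COUNTS[word] / N
    return 10. / (N * 10 ** len(word))

def Pwords(words):
    """Naive Bayes probability of a sequence of words."""
    return prod(Pw(w) for w in words)

def splits(text, L=20):
    """All (first, rem) pairs with len(first) <= L."""
    return [(text[:i+1], text[i+1:]) for i in range(min(len(text), L))]

@lru_cache(maxsize=None)  # plays the role of @memo
def segment(text):
    """Best segmentation of text, as a tuple of words."""
    if not text:
        return ()
    candidates = ((first,) + segment(rem) for first, rem in splits(text))
    return max(candidates, key=Pwords)

print(list(segment("likethisone")))  # ['like', 'this', 'one']
```

With the real count file the probabilities differ, but the control flow is the same as in the cached C# version above.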

Answered 2025-04-16 by Python大师

