Need some help with code translation (Python to C#)


Good evening, everyone. This question embarrasses me a little, because I know I should be able to work out the answer on my own. However, my knowledge of Python is still far from sufficient, so I need help from someone more experienced than I am.

The code below comes from <a href="http://norvig.com/ngrams" rel="nofollow">Norvig's "Natural Language Corpus Data"</a> chapter of a recently edited book. It is about turning a string such as "likethisone" into "[like, this, one]" (that is, segmenting the words correctly).

I have ported all of the code to C# except for the function <code>segment</code> (in fact, I rewrote the program myself), and I am having a lot of trouble even understanding its syntax. Could someone help me translate it into a more readable C# form?

Thanks a lot in advance.
<pre><code>################ Word Segmentation (p. 223)

@memo
def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)

def splits(text, L=20):
    "Return a list of all possible (first, rem) pairs, len(first)<=L."
    return [(text[:i+1], text[i+1:])
            for i in range(min(len(text), L))]

def Pwords(words):
    "The Naive Bayes probability of a sequence of words."
    return product(Pw(w) for w in words)

#### Support functions (p. 224)

def product(nums):
    "Return the product of a sequence of numbers."
    return reduce(operator.mul, nums, 1)

class Pdist(dict):
    "A probability distribution estimated from counts in datafile."
    def __init__(self, data=[], N=None, missingfn=None):
        for key,count in data:
            self[key] = self.get(key, 0) + int(count)
        self.N = float(N or sum(self.itervalues()))
        self.missingfn = missingfn or (lambda k, N: 1./N)
    def __call__(self, key):
        if key in self: return self[key]/self.N
        else: return self.missingfn(key, self.N)

def datafile(name, sep='\t'):
    "Read key,value pairs from file."
    for line in file(name):
        yield line.split(sep)

def avoid_long_words(key, N):
    "Estimate the probability of an unknown word."
    return 10./(N * 10**len(key))

N = 1024908267229 ## Number of tokens

Pw = Pdist(datafile('count_1w.txt'), N, avoid_long_words)
</code></pre>


1 Answer


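Let's tackle the first function first:
<pre><code>def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)
</code></pre>
It takes a string and returns the most probable list of words that the string could be split into, so its signature will be <code>static IEnumerable<string> segment(string text)</code>. Obviously, if <code>text</code> is an empty string, the result should be an empty list. Otherwise, it uses a recursive comprehension to build the candidate word lists (each possible first word followed by the best segmentation of the remainder) and returns the candidate with the highest probability. The direct C# translation of this body appears, wrapped in a cache, in the addendum below.

Of course, now we need to translate the <code>splits</code> function. Its job is to return a list of all possible (beginning, rest) tuples of the text, where the beginning is at most <code>L</code> characters long. It is fairly straightforward to translate:
<pre><code>static IEnumerable<Tuple<string, string>> splits(string text, int L = 20)
{
    // Range(1, n) yields the prefix lengths 1..n, matching Python's i+1
    return from i in Enumerable.Range(1, Math.Min(text.Length, L))
           select Tuple.Create(text.Substring(0, i), text.Substring(i));
}
</code></pre>
Next is <code>Pwords</code>, which simply calls the <code>product</code> function on the result of <code>Pw</code> for each word in its input list:
<pre><code>static double Pwords(IEnumerable<string> words)
{
    return product(from w in words select Pw(w));
}
</code></pre>
And <code>product</code> is very simple:
<pre><code>static double product(IEnumerable<double> nums)
{
    // The seed 1.0 mirrors reduce(operator.mul, nums, 1) and avoids
    // an exception on an empty sequence
    return nums.Aggregate(1.0, (a, b) => a * b);
}
</code></pre>
<h2>Addendum:</h2>
Looking at the full source code, it is clear that Norvig intended the results of the <code>segment</code> function to be memoized for speed. Here is a version that provides this speed-up:
<pre><code>static Dictionary<string, IEnumerable<string>> segmentTable =
    new Dictionary<string, IEnumerable<string>>();

static IEnumerable<string> segment(string text)
{
    if (text == "") return new string[0]; // C# idiom for empty list of strings
    if (!segmentTable.ContainsKey(text))
    {
        var candidates = from pair in splits(text)
                         select new[] {pair.Item1}.Concat(segment(pair.Item2));
        // Descending order so that First() is the most probable candidate,
        // like Python's max(candidates, key=Pwords)
        segmentTable[text] = candidates.OrderByDescending(Pwords).First().ToList();
    }
    return segmentTable[text];
}
</code></pre>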
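If you want something to compare the C# port against, one option is to run the same algorithm in Python 3 on a tiny vocabulary and check that both programs agree. The following is only a sketch under a few assumptions, not Norvig's code verbatim: <code>COUNTS</code> is a made-up toy table standing in for <code>count_1w.txt</code>, <code>Pw</code> is a plain function instead of a <code>Pdist</code> instance, <code>functools.lru_cache</code> stands in for the book's <code>@memo</code> decorator, and <code>segment</code> returns tuples rather than lists so that cached results cannot be mutated by callers.

```python
from functools import lru_cache
from math import prod  # Python 3.8+

# Hypothetical toy counts, for illustration only; the real program
# reads its counts from count_1w.txt.
COUNTS = {"like": 50, "this": 80, "one": 60, "is": 90, "on": 40, "i": 30}
N = sum(COUNTS.values())  # total number of tokens in the toy corpus


def Pw(word):
    """Probability of one word; unknown words are penalised by length."""
    if word in COUNTS:
        return COUNTS[word] / N
    return 10.0 / (N * 10 ** len(word))


def Pwords(words):
    """Naive Bayes probability of a sequence of words."""
    return prod(Pw(w) for w in words)


def splits(text, L=20):
    """All (first, rem) pairs with 1 <= len(first) <= L."""
    return [(text[:i + 1], text[i + 1:]) for i in range(min(len(text), L))]


@lru_cache(maxsize=None)  # stands in for the book's @memo decorator
def segment(text):
    """Best segmentation of text, returned as a tuple of words."""
    if not text:
        return ()
    candidates = ((first,) + segment(rem) for first, rem in splits(text))
    return max(candidates, key=Pwords)


print(splits("abc"))                 # [('a', 'bc'), ('ab', 'c'), ('abc', '')]
print(list(segment("likethisone")))  # ['like', 'this', 'one']
```

Feeding the same toy counts and the same input to the C# version should give the same word list. If the port returns something like a run of single letters instead, the candidate selection is probably picking the minimum rather than the maximum.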
