How to Calculate the BLEU Score in Python

Author: Jayant Verma

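The BLEU score is a metric for measuring the quality of a machine translation model's output. Although it was originally designed only for translation models, BLEU is now also used in other natural language processing applications.

BLEU compares a candidate sentence with one or more reference sentences and reports how well the candidate matches the list of references. It outputs a score between 0 and 1.

A BLEU score of 1 means that the candidate sentence perfectly matches one of the reference sentences.

BLEU is also commonly used as a metric for image captioning models.

In this tutorial, we will use the sentence_bleu() function from the nltk library to calculate the BLEU score. Let's get started.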

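Calculating the BLEU Score in Python

To calculate the BLEU score, we need to provide the reference sentences and the candidate sentence as lists of tokens.

The following sections show how to do that and how to compute the score. Let's start by importing the necessary module.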

from nltk.translate.bleu_score import sentence_bleu

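Now we can enter the reference sentences as a list. Before passing the sentences to sentence_bleu(), we also need to split them into tokens.

Entering and Splitting the Sentences

The sentences in our reference list are: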

'this is a dog'
'it is dog'
'dog it is'
'a dog, it is'

We can split them into tokens with the split() method.

reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split()
]
print(reference)

Output:

[['this', 'is', 'a', 'dog'], ['it', 'is', 'dog'], ['dog', 'it', 'is'], ['a', 'dog,', 'it', 'is']]

These are the sentences in tokenized form. Now we can call sentence_bleu() to calculate the score.
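Note that split() only breaks on whitespace, so punctuation stays attached to the neighboring word: the last reference contains the token 'dog,' rather than 'dog'. If that is not what you want, one option is a small regex-based tokenizer. The sketch below is only an illustration. The simple_tokenize helper is our own hypothetical name, not part of nltk, and the rest of this tutorial keeps using plain split().

```python
import re

def simple_tokenize(sentence):
    """Lowercase a sentence and return its word tokens, dropping punctuation."""
    # \w+ matches runs of letters, digits, and underscores, so commas are skipped
    return re.findall(r"\w+", sentence.lower())

print('a dog, it is'.split())           # whitespace split keeps the comma on 'dog,'
print(simple_tokenize('a dog, it is'))  # regex tokenizer yields a clean 'dog'
```

Whichever tokenizer you choose, apply the same one to the references and the candidate, since BLEU only counts exact token matches.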

Calculating the BLEU Score in Python

Use the following code to calculate the score:

candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

Output:

BLEU score -> 1.0

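Because the candidate sentence is identical to one of the reference sentences, we get a perfect score of 1. Let's try another candidate.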

candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

Output:

BLEU score -> 0.8408964152537145

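The candidate closely resembles the reference 'this is a dog', but it does not match any reference exactly, so the score drops to about 0.84.

Complete Code for Calculating the BLEU Score in Python

The complete code for the sections above is as follows: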

from nltk.translate.bleu_score import sentence_bleu
reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split()
]
candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

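Calculating n-gram Scores

When matching sentences, you can choose how many words are matched at a time. For example, you can match one word at a time (1-gram), two words at a time (2-gram), or three words at a time (3-gram).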
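To make the idea of an n-gram concrete, here is a minimal sketch that lists the n-grams of a tokenized sentence. The ngrams_of helper is a hypothetical name introduced for illustration; it is not the function nltk uses internally.

```python
def ngrams_of(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = 'it is a dog'.split()
for n in (1, 2, 3):
    # A sentence of 4 tokens has 4 unigrams, 3 bigrams, and 2 trigrams
    print(n, ngrams_of(tokens, n))
```

A sentence with k tokens has k - n + 1 n-grams, so longer n-grams give fewer chances to match and are harder to get right.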

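The following section shows how to calculate these n-gram scores.

You can pass the weight for each n-gram order to sentence_bleu() through the weights argument.

For example, the following weights compute each individual n-gram score.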

Individual 1-gram: (1, 0, 0, 0)
Individual 2-gram: (0, 1, 0, 0)
Individual 3-gram: (0, 0, 1, 0)
Individual 4-gram: (0, 0, 0, 1)

The Python code for this calculation is shown below:

from nltk.translate.bleu_score import sentence_bleu
reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split()
]
candidate = 'it is a dog'.split()

print('Individual 1-gram: %f' % sentence_bleu(reference, candidate, weights=(1, 0, 0, 0)))
print('Individual 2-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 1, 0, 0)))
print('Individual 3-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 0, 1, 0)))
print('Individual 4-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 0, 0, 1)))

Output:

Individual 1-gram: 1.000000
Individual 2-gram: 1.000000
Individual 3-gram: 0.500000
Individual 4-gram: 1.000000
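To see where these numbers come from, the individual scores can be checked by hand. The sketch below is our own simplified reimplementation of BLEU's clipped ("modified") n-gram precision, not NLTK's code. The helper names ngrams_of and clipped_precision are hypothetical, and the brevity penalty is ignored (it equals 1 here, because one reference has the same length as the candidate).

```python
from collections import Counter

def ngrams_of(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(references, candidate, n):
    """Return (matched, total) n-gram counts for the candidate."""
    cand_counts = Counter(ngrams_of(candidate, n))
    # For each n-gram, remember the highest count seen in any single reference
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams_of(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in a reference
    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched, sum(cand_counts.values())

reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split()
]
candidate = 'it is a dog'.split()

for n in range(1, 5):
    matched, total = clipped_precision(reference, candidate, n)
    print('%d-gram: %d/%d' % (n, matched, total))
```

This gives 4/4, 3/3, 1/2, and 0/1. The first three agree with the output above. For 4-grams the count is 0/1, because 'it is a dog' does not occur in any reference, so the 1.000000 printed above most likely reflects how the NLTK version used for this tutorial treated n-gram orders with no matches. We believe newer releases print a warning and return a value close to 0 instead, but treat that as an assumption to verify against your installed version.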

By default, sentence_bleu() computes the cumulative 4-gram BLEU score, also known as BLEU-4. The weights for BLEU-4 are:

(0.25, 0.25, 0.25, 0.25)

Let's look at the code for BLEU-4:

score = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25))
print(score)

Output:

0.8408964152537145

This is exactly the score we got earlier when we did not pass any n-gram weights.
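For reference, the cumulative score combines the individual n-gram precisions p_n as a weighted geometric mean, multiplied by a brevity penalty BP. The formula below is the standard BLEU definition rather than something stated in this tutorial. The worked line assumes BP = 1 and precisions of 1, 1, and 1/2 for 1-grams to 3-grams, with the 4-gram term left out because the candidate 'it is a dog' has no matching 4-gram. That last step is our reading of how the score above was obtained, not documented behavior.

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \ln p_n \right)

\mathrm{BLEU} = 1 \cdot \exp\!\left( 0.25 \ln 1 + 0.25 \ln 1 + 0.25 \ln 0.5 \right) = 0.5^{0.25} \approx 0.8409
```

If your NLTK version reports a much smaller value for this candidate, the zero 4-gram count is the likely cause; the SmoothingFunction class in nltk.translate.bleu_score offers several ways to handle such cases.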

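Conclusion

This tutorial covered how to calculate the BLEU score in Python. We looked at what the BLEU score is and how to compute both individual and cumulative n-gram BLEU scores. We hope you enjoyed learning with us!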
