Test Report

Model Name: BERTNLU-RuleDST-RulePolicy-TemplateNLG

Dataset: multiwoz

Time: 2020-04-24 21:20:09

Overall Results

Success Rate: 85.5 %

(Precision, Recall, F1) : (0.798, 0.928, 0.838)

Average Dialog Turn (Succ): 12.681

Average Dialog Turn (All): 13.814
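The success rate and the two average-turn figures are simple aggregates over the evaluated dialogs. A minimal sketch, assuming each dialog is summarized by a hypothetical `DialogRecord` holding its turn count and a success flag (the record type, helper names, and sample values are illustrative, not the evaluation tool's own):

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DialogRecord:
    """Hypothetical per-dialog summary; not the evaluation tool's own data type."""
    turns: int
    success: bool


def summarize(records: list[DialogRecord]) -> dict[str, float]:
    """Success rate (%) plus average turns over successful and over all dialogs."""
    succ_turns = [r.turns for r in records if r.success]
    return {
        # Share of dialogs judged successful, as a percentage.
        "success_rate": 100.0 * len(succ_turns) / len(records),
        # Averaged over successful dialogs only: "Average Dialog Turn (Succ)".
        "avg_turn_succ": mean(succ_turns) if succ_turns else 0.0,
        # Averaged over every dialog: "Average Dialog Turn (All)".
        "avg_turn_all": mean(r.turns for r in records),
    }


def implied_failed_avg(avg_all: float, avg_succ: float, succ_rate: float) -> float:
    """Average turns of failed dialogs, solving avg_all = s*avg_succ + (1 - s)*avg_fail."""
    return (avg_all - succ_rate * avg_succ) / (1.0 - succ_rate)


if __name__ == "__main__":
    sample = [DialogRecord(10, True), DialogRecord(14, True), DialogRecord(20, False)]
    print(summarize(sample))
    # Overall figures from this report, with the success rate as a fraction.
    print(round(implied_failed_avg(13.814, 12.681, 0.855), 1))
```

Rearranging the same definitions, and assuming both averages are taken over the same set of dialogs, the overall figures imply that failed dialogs averaged roughly (13.814 - 0.855 × 12.681) / 0.145 ≈ 20.5 turns.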

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       430        0.758      0.504      0.669   0.546  0.051                    5.890               6.586
attraction  334        0.937      0.888      0.950   0.910  0.021                    5.974               6.431
taxi        175        0.994      1.000      0.997   0.998  0.006                    6.115               6.149
restaurant  413        0.959      0.792      0.840   0.808  0.036                    5.116               6.223
train       387        0.995      0.845      0.928   0.869  0.005                    6.478               6.491
police      23         1.000      1.000      1.000   1.000  0.000                    4.087               4.087
hospital    30         1.000      1.000      1.000   1.000  0.000                    4.067               4.067
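For the overall result and for the hotel, attraction, restaurant, and train rows, the reported F1 is below the harmonic mean of the reported precision and recall; overall, 2 × 0.798 × 0.928 / (0.798 + 0.928) ≈ 0.858, against the reported 0.838. That gap is what one would expect if precision, recall, and F1 are each computed per dialog and then averaged separately: the harmonic mean is concave, so the average of per-dialog F1 cannot exceed the F1 of the averaged precision and recall. This is an assumption about the evaluator, not something the report states. The sketch below uses hypothetical (true positive, false positive, false negative) slot counts per dialog to contrast the two aggregations:

```python
from statistics import mean


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one dialog's informed slots (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(dialogs: list[tuple[int, int, int]]) -> tuple[float, float, float]:
    """Average each metric over dialogs independently (the assumed aggregation)."""
    scores = [prf(*d) for d in dialogs]
    return tuple(mean(s[i] for s in scores) for i in range(3))


def harmonic(p: float, r: float) -> float:
    """F1 recomputed from already-averaged precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Hypothetical (true positive, false positive, false negative) counts per dialog.
    dialogs = [(4, 0, 0), (1, 3, 0), (2, 0, 2)]
    p, r, f1 = macro_average(dialogs)
    print(f"avg P={p:.3f} avg R={r:.3f} avg F1={f1:.3f} H(avg P, avg R)={harmonic(p, r):.3f}")
```

With these made-up counts the averaged F1 is about 0.689 while the F1 of the averaged precision and recall is about 0.789, the same direction of gap seen in the table.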

Dialogue Loop
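The report does not define how a dialog loop is detected. One simple heuristic, assumed here for illustration and not taken from the evaluation tool, flags a dialog as looping when the system produces the same set of dialog acts in `window` consecutive turns, and counts a dialog toward the loop failure rate only if it also failed. The act encoding, `window`, and both helper names are hypothetical:

```python
from collections.abc import Sequence

# Assumed encoding of one dialog act: (intent, domain, slot, value).
Act = tuple[str, str, str, str]
Turn = frozenset[Act]


def has_loop(system_turns: Sequence[Turn], window: int = 3) -> bool:
    """True if one system act set repeats in `window` consecutive turns (hypothetical rule)."""
    if window < 2:
        raise ValueError("window must be at least 2")
    run = 1
    for prev, cur in zip(system_turns, system_turns[1:]):
        # Extend the run of identical turns, or restart it.
        run = run + 1 if cur == prev else 1
        if run >= window:
            return True
    return False


def loop_failed_rate(dialogs: Sequence[tuple[Sequence[Turn], bool]], window: int = 3) -> float:
    """Share of all dialogs that both failed and contain a loop."""
    flagged = sum(1 for turns, success in dialogs if not success and has_loop(turns, window))
    return flagged / len(dialogs)


if __name__ == "__main__":
    ask_area = frozenset({("Request", "Hotel", "Area", "?")})
    give_area = frozenset({("Inform", "Hotel", "Area", "north")})
    looping = [ask_area, ask_area, ask_area]
    normal = [ask_area, give_area]
    print(has_loop(looping), has_loop(normal))
    print(loop_failed_rate([(looping, False), (normal, True)]))
```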

Domain hotel

Overall Results

Success Rate: 75.8 %

(Precision, Recall, F1) : (0.504, 0.669, 0.546)

Average Dialog Turn (Succ): 5.890

Average Dialog Turn (All): 6.586

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop:

Bad Inform Dialog Act:

Request But Not Inform Dialog Act:

Inform But Not Request Dialog Act:
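The "System NLU Failed Dialog Act" and "User NLU Failed Dialog Act" entries presumably list acts that the system's and the user simulator's NLU modules failed to recover. The report does not define the comparison, so the sketch below assumes acts are (intent, domain, slot, value) tuples, treats any act missing from or spuriously added to the prediction as a failure, and tallies the most frequent offenders; `nlu_errors` and `most_common_failures` are hypothetical helpers:

```python
from collections import Counter
from collections.abc import Iterable

# Assumed encoding of one dialog act: (intent, domain, slot, value).
Act = tuple[str, str, str, str]


def nlu_errors(gold: set[Act], predicted: set[Act]) -> tuple[set[Act], set[Act]]:
    """Split one utterance's NLU mistakes into missed and spurious acts."""
    missed = gold - predicted      # intended by the speaker but not recognized
    spurious = predicted - gold    # recognized but never intended
    return missed, spurious


def most_common_failures(
    turns: Iterable[tuple[set[Act], set[Act]]], k: int = 5
) -> list[tuple[Act, int]]:
    """Tally every act involved in an NLU error and return the k most frequent."""
    counts: Counter[Act] = Counter()
    for gold, predicted in turns:
        missed, spurious = nlu_errors(gold, predicted)
        counts.update(missed | spurious)
    return counts.most_common(k)


if __name__ == "__main__":
    area = ("Inform", "Hotel", "Area", "north")
    stars = ("Inform", "Hotel", "Stars", "4")
    price = ("Inform", "Hotel", "Price", "4")
    # (gold acts, predicted acts) for three hypothetical user turns.
    turns = [({area}, {area}), ({stars}, {price}), ({stars}, set())]
    print(most_common_failures(turns))
```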

Domain attraction

Overall Results

Success Rate: 93.7 %

(Precision, Recall, F1) : (0.888, 0.950, 0.910)

Average Dialog Turn (Succ): 5.974

Average Dialog Turn (All): 6.431

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop:

Bad Inform Dialog Act:

Request But Not Inform Dialog Act:

Inform But Not Request Dialog Act:

Domain taxi

Overall Results

Success Rate: 99.4 %

(Precision, Recall, F1) : (1.000, 0.997, 0.998)

Average Dialog Turn (Succ): 6.115

Average Dialog Turn (All): 6.149

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop:

Bad Inform Dialog Act: Nothing

Request But Not Inform Dialog Act:

Inform But Not Request Dialog Act: Nothing
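The remaining categories relate the slots the user requested to the slots the system informed. The report does not spell out the exact definitions; the sketch below assumes that a "bad inform" is an informed slot whose value disagrees with the entity the user's goal points to, and that the other two categories are set differences between requested and informed slots. `DomainSummary` and `classify` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class DomainSummary:
    """Hypothetical slot-level summary of one domain within one dialog."""
    requested: set[str] = field(default_factory=set)        # slots the user asked for
    informed: dict[str, str] = field(default_factory=dict)  # slot -> value the system gave
    truth: dict[str, str] = field(default_factory=dict)     # slot -> value of the goal entity


def classify(s: DomainSummary) -> dict[str, set[str]]:
    """Sort slots into the three report categories under the assumed definitions."""
    informed_slots = set(s.informed)
    return {
        # Informed with a value that disagrees with the goal entity.
        "bad_inform": {k for k, v in s.informed.items() if k in s.truth and s.truth[k] != v},
        # Requested by the user but never answered.
        "request_but_not_inform": s.requested - informed_slots,
        # Answered although the user never asked.
        "inform_but_not_request": informed_slots - s.requested,
    }


if __name__ == "__main__":
    s = DomainSummary(
        requested={"phone", "postcode"},
        informed={"phone": "01223 000000", "area": "north"},
        truth={"phone": "01223 111111", "area": "north"},
    )
    print(classify(s))
```

Under these assumed definitions, empty "Bad Inform" and "Inform But Not Request" lists, as reported for taxi, would mean no informed slot was wrong or unsolicited, which fits the taxi precision of 1.000.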

Domain restaurant

Overall Results

Success Rate: 95.9 %

(Precision, Recall, F1) : (0.792, 0.840, 0.808)

Average Dialog Turn (Succ): 5.116

Average Dialog Turn (All): 6.223

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop:

Bad Inform Dialog Act:

Request But Not Inform Dialog Act:

Inform But Not Request Dialog Act:

Domain train

Overall Results

Success Rate: 99.5 %

(Precision, Recall, F1) : (0.845, 0.928, 0.869)

Average Dialog Turn (Succ): 6.478

Average Dialog Turn (All): 6.491

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop:

Bad Inform Dialog Act:

Request But Not Inform Dialog Act: Nothing

Inform But Not Request Dialog Act: Nothing

Domain police

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (1.000, 1.000, 1.000)

Average Dialog Turn (Succ): 4.087

Average Dialog Turn (All): 4.087

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop: Nothing

Bad Inform Dialog Act: Nothing

Request But Not Inform Dialog Act: Nothing

Inform But Not Request Dialog Act: Nothing

Domain hospital

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (1.000, 1.000, 1.000)

Average Dialog Turn (Succ): 4.067

Average Dialog Turn (All): 4.067

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop: Nothing

Bad Inform Dialog Act: Nothing

Request But Not Inform Dialog Act: Nothing

Inform But Not Request Dialog Act: Nothing