Test Report

Model Name: MILU-RuleDST-RulePolicy-TemplateNLG

Dataset: multiwoz

Time: 2020-04-24 21:12:47

Overall Results

Success Rate: 83.1 %

(Precision, Recall, F1) : (0.783, 0.917, 0.825)

Average Dialog Turn (Succ): 12.144

Average Dialog Turn (All): 13.854
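The overall figures are aggregates over individual simulated dialogs. The sketch below shows how Success Rate, Average Dialog Turn (Succ) and Average Dialog Turn (All) relate to one another. It assumes a hypothetical per-dialog record that holds a success flag and a turn count; `DialogResult` and `summarize` are illustrative names, not the evaluation tool's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class DialogResult:
    # Hypothetical per-dialog record; the real analyzer's data layout may differ.
    success: bool
    turns: int


def summarize(results: list[DialogResult]) -> dict[str, float]:
    """Aggregate success rate and the two average-turn figures."""
    if not results:
        raise ValueError("no dialogs to summarize")
    succ = [r for r in results if r.success]
    return {
        "success_rate": len(succ) / len(results),
        # average over successful dialogs only
        "avg_turn_succ": mean(r.turns for r in succ) if succ else 0.0,
        # average over every dialog, failures included
        "avg_turn_all": mean(r.turns for r in results),
    }


if __name__ == "__main__":
    demo = [DialogResult(True, 10), DialogResult(True, 12), DialogResult(False, 20)]
    print(summarize(demo))
```

Because failed dialogs are included only in the "All" average, long failures pull it above the "Succ" average. The overall results above show this pattern (13.854 vs 12.144).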

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       429        0.751      0.482      0.649   0.526  0.058                    5.596               6.410
restaurant  411        0.954      0.801      0.850   0.818  0.032                    4.949               5.781
attraction  331        0.940      0.878      0.953   0.905  0.027                    5.633               6.278
taxi        173        0.879      0.908      0.893   0.898  0.121                    7.026               9.399
train       389        0.995      0.840      0.928   0.867  0.005                    5.767               5.779
police      23         0.913      0.913      0.913   0.913  0.087                    5.524               8.522
hospital    30         1.000      1.000      1.000   1.000  0.000                    4.067               4.067
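The per-domain table also lets you back out some quantities that it does not print. This relies on two assumptions: "Dialog Turn (All)" averages over every dialog in a domain, and "Dialog Turn (Succ)" averages only over the successful ones. Under those assumptions, the number of successful dialogs is roughly Succ Rate × Total Num. The mean length of the failed dialogs then follows as T_fail = (N·T_all − N_succ·T_succ) / N_fail. The sketch below applies this arithmetic to each row. `derive` is a hypothetical helper, and because the printed figures are rounded, the results are approximate.

```python
# Rows copied from the per-domain table:
# (domain, total_num, succ_rate, turn_succ, turn_all, loop_failed_rate)
ROWS = [
    ("hotel", 429, 0.751, 5.596, 6.410, 0.058),
    ("restaurant", 411, 0.954, 4.949, 5.781, 0.032),
    ("attraction", 331, 0.940, 5.633, 6.278, 0.027),
    ("taxi", 173, 0.879, 7.026, 9.399, 0.121),
    ("train", 389, 0.995, 5.767, 5.779, 0.005),
    ("police", 23, 0.913, 5.524, 8.522, 0.087),
    ("hospital", 30, 1.000, 4.067, 4.067, 0.000),
]


def derive(total, succ_rate, turn_succ, turn_all, loop_rate):
    """Back out dialog counts and the implied mean turns of failed dialogs.

    Assumes 'Dialog Turn (All)' averages over all dialogs in the domain and
    'Dialog Turn (Succ)' over successful ones only.
    """
    n_succ = round(succ_rate * total)
    n_fail = total - n_succ
    n_loop = round(loop_rate * total)
    # no failed dialogs means there is no failed-dialog length to compute
    turn_fail = None if n_fail == 0 else (turn_all * total - turn_succ * n_succ) / n_fail
    return n_succ, n_fail, n_loop, turn_fail


for name, total, sr, ts, ta, lr in ROWS:
    n_succ, n_fail, n_loop, tf = derive(total, sr, ts, ta, lr)
    tf_text = "n/a" if tf is None else f"{tf:.1f}"
    print(f"{name:<11} succ={n_succ:<4} fail={n_fail:<3} loop={n_loop:<3} fail_turns~{tf_text}")
```

Under these assumptions, police has 21 successful and 2 failed dialogs, and the failed ones average about 40 turns. Its 0.087 × 23 ≈ 2 also matches the Dialog Loop Failed Rate. For taxi, the implied loop count (21) equals the implied failure count. Hospital has no failures, so no failed-dialog length is computed.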

Domain hotel

Overall Results

Success Rate: 75.1 %

(Precision, Recall, F1) : (0.482, 0.649, 0.526)

Average Dialog Turn (Succ): 5.596

Average Dialog Turn (All): 6.410
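The hotel figures suggest that F1 is averaged per dialog rather than computed from the averaged precision and recall. The harmonic mean of 0.482 and 0.649 is about 0.553, which is higher than the reported 0.526. This is an inference from the numbers; the report itself does not state how F1 is aggregated. The sketch below reproduces the check. It also uses hypothetical toy values to show that the mean of per-dialog F1 scores can differ from the F1 of the mean precision and recall.

```python
from statistics import mean


def harmonic_f1(p: float, r: float) -> float:
    """F1 as the harmonic mean of a single precision/recall pair."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


# Hotel averages as printed in the report
hotel_p, hotel_r, hotel_f1 = 0.482, 0.649, 0.526
print(f"harmonic mean of averaged P/R: {harmonic_f1(hotel_p, hotel_r):.3f}")  # 0.553
print(f"reported F1:                   {hotel_f1:.3f}")

# Toy illustration with hypothetical per-dialog (precision, recall) pairs
per_dialog = [(1.0, 1.0), (0.2, 0.6)]
avg_of_f1 = mean(harmonic_f1(p, r) for p, r in per_dialog)
f1_of_avg = harmonic_f1(mean(p for p, _ in per_dialog), mean(r for _, r in per_dialog))
print(f"mean of per-dialog F1: {avg_of_f1:.3f}, F1 of means: {f1_of_avg:.3f}")  # 0.650, 0.686
```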

System NLU Failed Dialog Act:
User NLU Failed Dialog Act:
Dialog Loop:
Bad Inform Dialog Act:
Request But Not Inform Dialog Act:
Inform But Not Request Dialog Act:
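The failure-category headings are printed without definitions. One plausible slot-level reading, assumed here rather than taken from the evaluation tool, compares what the simulated user requested with what the system informed:

- Requested slots that the system never informed count as "Request But Not Inform".
- Informed slots that nobody requested count as "Inform But Not Request".
- Requested slots answered with a wrong value count as "Bad Inform".

The `categorize` function, its arguments and the slot names in the example are hypothetical. The NLU-failure and Dialog Loop categories are not modelled.

```python
from __future__ import annotations


def categorize(requested: set[str], informed: dict[str, str],
               expected: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical slot-level split into three of the report's categories.

    requested: slots the simulated user asked for
    informed:  slot -> value the system actually told the user
    expected:  slot -> value the user goal / database says is correct
    """
    informed_slots = set(informed)
    return {
        # user asked, system never answered
        "request_but_not_inform": requested - informed_slots,
        # system volunteered slots nobody asked for
        "inform_but_not_request": informed_slots - requested,
        # system answered a requested slot with a non-matching value
        "bad_inform": {slot for slot in requested & informed_slots
                       if expected.get(slot) != informed[slot]},
    }


if __name__ == "__main__":
    print(categorize(
        requested={"phone", "postcode", "area"},
        informed={"phone": "01223", "area": "north", "price": "cheap"},
        expected={"phone": "01223", "postcode": "cb21ab", "area": "south"},
    ))
```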

Domain restaurant

Overall Results

Success Rate: 95.4 %

(Precision, Recall, F1) : (0.801, 0.850, 0.818)

Average Dialog Turn (Succ): 4.949

Average Dialog Turn (All): 5.781

System NLU Failed Dialog Act:
User NLU Failed Dialog Act:
Dialog Loop:
Bad Inform Dialog Act:
Request But Not Inform Dialog Act:
Inform But Not Request Dialog Act:

Domain attraction

Overall Results

Success Rate: 94.0 %

(Precision, Recall, F1) : (0.878, 0.953, 0.905)

Average Dialog Turn (Succ): 5.633

Average Dialog Turn (All): 6.278

System NLU Failed Dialog Act:
User NLU Failed Dialog Act:
Dialog Loop:
Bad Inform Dialog Act:
Request But Not Inform Dialog Act:
Inform But Not Request Dialog Act:

Domain taxi

Overall Results

Success Rate: 87.9 %

(Precision, Recall, F1) : (0.908, 0.893, 0.898)

Average Dialog Turn (Succ): 7.026

Average Dialog Turn (All): 9.399

System NLU Failed Dialog Act:
User NLU Failed Dialog Act:
Dialog Loop:
Bad Inform Dialog Act: Nothing
Request But Not Inform Dialog Act:
Inform But Not Request Dialog Act: Nothing
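Taxi has the highest Dialog Loop Failed Rate in the table (0.121), but the report does not say how a loop is detected. As an illustration only, the sketch below flags a dialog when the system produces an identical set of dialog acts in several consecutive turns. The `has_dialog_loop` function, the (intent, domain, slot, value) act format and the default window of 3 are all assumptions, not the analyzer's actual rule.

```python
from __future__ import annotations

from typing import Sequence


def has_dialog_loop(system_acts: Sequence[object], window: int = 3) -> bool:
    """Hypothetical loop heuristic: report a loop when the system emits the
    same dialog act set in `window` consecutive turns."""
    if window < 2:
        raise ValueError("window must be at least 2")
    run = 1
    for prev, cur in zip(system_acts, system_acts[1:]):
        # extend the streak on an exact repeat, otherwise start over
        run = run + 1 if cur == prev else 1
        if run >= window:
            return True
    return False


if __name__ == "__main__":
    # Each turn is a set of (intent, domain, slot, value) tuples (assumed format).
    ask_leave = frozenset({("Request", "Taxi", "Leave", "?")})
    ask_dest = frozenset({("Request", "Taxi", "Dest", "?")})
    print(has_dialog_loop([ask_leave, ask_dest, ask_dest, ask_dest]))  # True
    print(has_dialog_loop([ask_leave, ask_dest, ask_dest]))            # False
```

A larger window makes the check more conservative: short clarification repeats are tolerated, and only longer stalls are flagged.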

Domain train

Overall Results

Success Rate: 99.5 %

(Precision, Recall, F1) : (0.840, 0.928, 0.867)

Average Dialog Turn (Succ): 5.767

Average Dialog Turn (All): 5.779

System NLU Failed Dialog Act:
User NLU Failed Dialog Act:
Dialog Loop:
Bad Inform Dialog Act:
Request But Not Inform Dialog Act: Nothing
Inform But Not Request Dialog Act: Nothing

Domain police

Overall Results

Success Rate: 91.3 %

(Precision, Recall, F1) : (0.913, 0.913, 0.913)

Average Dialog Turn (Succ): 5.524

Average Dialog Turn (All): 8.522

System NLU Failed Dialog Act:
User NLU Failed Dialog Act:
Dialog Loop:
Bad Inform Dialog Act: Nothing
Request But Not Inform Dialog Act:
Inform But Not Request Dialog Act: Nothing

Domain hospital

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (1.000, 1.000, 1.000)

Average Dialog Turn (Succ): 4.067

Average Dialog Turn (All): 4.067

System NLU Failed Dialog Act:
User NLU Failed Dialog Act:
Dialog Loop: Nothing
Bad Inform Dialog Act: Nothing
Request But Not Inform Dialog Act: Nothing
Inform But Not Request Dialog Act: Nothing