Test Report

Model Name: SVMNLU-RuleDST-RulePolicy-TemplateNLG

Dataset: multiwoz

Time: 2020-04-24 22:13:40

Overall Results

Success Rate: 70.4 %

(Precision, Recall, F1) : (0.791, 0.888, 0.815)
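
A quick arithmetic check on the three figures above: the reported F1 (0.815) is lower than the harmonic mean of the reported precision and recall. One plausible reading, which is an assumption and not something this report states, is that precision, recall and F1 are computed per dialog and then averaged. In that case the averaged F1 need not equal the harmonic mean of the averaged precision and recall. The helper name `harmonic_f1` is hypothetical.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of a single precision/recall pair."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall figures copied from the report.
reported_precision, reported_recall, reported_f1 = 0.791, 0.888, 0.815

# F1 that a single (precision, recall) pair would give.
implied_f1 = harmonic_f1(reported_precision, reported_recall)
gap = implied_f1 - reported_f1
print(f"harmonic mean: {implied_f1:.3f}, reported: {reported_f1:.3f}, gap: {gap:.3f}")
```

The same check on the police and hospital rows gives no gap, since all three values are 1.000 there.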

Average Dialog Turn (Succ): 14.759

Average Dialog Turn (All): 17.736
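
The two turn averages and the success rate together imply an average length for the unsuccessful dialogs. The sketch below assumes that "All" is the plain mean over successful and failed dialogs, and that the success rate is the fraction of dialogs counted under "Succ". The report does not spell this out, and its inputs are rounded, so the result is only approximate. The function name `implied_failed_turns` is hypothetical.

```python
def implied_failed_turns(succ_rate: float, turns_succ: float, turns_all: float) -> float:
    """Solve turns_all = p * turns_succ + (1 - p) * turns_failed for turns_failed."""
    if not 0.0 <= succ_rate < 1.0:
        raise ValueError("needs at least one failed dialog (succ_rate < 1)")
    return (turns_all - succ_rate * turns_succ) / (1.0 - succ_rate)


# Overall figures copied from the report (success rate 70.4 %).
overall_failed = implied_failed_turns(0.704, 14.759, 17.736)
print(f"implied average turns for failed dialogs: {overall_failed:.1f}")
```

The guard on `succ_rate` matters for domains such as police and hospital, where every dialog succeeded and no failed-dialog average exists.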

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       425        0.605      0.539      0.647   0.563  0.122                    7.230               8.551
attraction  333        0.931      0.891      0.940   0.907  0.015                    6.361               6.541
taxi        167        0.844      0.886      0.865   0.872  0.156                    7.362               10.060
restaurant  410        0.849      0.771      0.807   0.781  0.054                    5.891               6.941
train       389        0.895      0.825      0.879   0.842  0.054                    11.247              12.036
police      23         1.000      1.000      1.000   1.000  0.000                    5.652               5.652
hospital    30         1.000      1.000      1.000   1.000  0.000                    4.067               4.067
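
For further analysis, the per-domain metrics can be held as typed records. The sketch below is one way to do that. It reads the first two numeric columns as "Total Num" and "Succ Rate", which is an interpretation of the table header, and it recovers an approximate success count per domain from the rounded rate. `DomainRow` and `estimated_successes` are hypothetical names, not part of any evaluation toolkit.

```python
from typing import NamedTuple


class DomainRow(NamedTuple):
    domain: str
    total_num: int
    succ_rate: float
    precision: float
    recall: float
    f1: float
    loop_failed_rate: float
    turns_succ: float
    turns_all: float


# Rows copied from the metric table in this report.
ROWS = [
    DomainRow("hotel", 425, 0.605, 0.539, 0.647, 0.563, 0.122, 7.230, 8.551),
    DomainRow("attraction", 333, 0.931, 0.891, 0.940, 0.907, 0.015, 6.361, 6.541),
    DomainRow("taxi", 167, 0.844, 0.886, 0.865, 0.872, 0.156, 7.362, 10.060),
    DomainRow("restaurant", 410, 0.849, 0.771, 0.807, 0.781, 0.054, 5.891, 6.941),
    DomainRow("train", 389, 0.895, 0.825, 0.879, 0.842, 0.054, 11.247, 12.036),
    DomainRow("police", 23, 1.000, 1.000, 1.000, 1.000, 0.000, 5.652, 5.652),
    DomainRow("hospital", 30, 1.000, 1.000, 1.000, 1.000, 0.000, 4.067, 4.067),
]


def estimated_successes(row: DomainRow) -> int:
    """Approximate count of successful dialogs, recovered from the rounded rate."""
    return round(row.total_num * row.succ_rate)


# List domains from weakest to strongest success rate.
for row in sorted(ROWS, key=lambda r: r.succ_rate):
    print(f"{row.domain:<11} {estimated_successes(row):>3}/{row.total_num:<3} "
          f"loop-failed {row.loop_failed_rate:.3f}")
```

Sorting by `succ_rate` puts hotel first, the only domain in this report with a success rate below 0.8.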

Domain hotel

Overall Results

Success Rate: 60.5 %

(Precision, Recall, F1) : (0.539, 0.647, 0.563)

Average Dialog Turn (Succ): 7.230

Average Dialog Turn (All): 8.551

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 93.1 %

(Precision, Recall, F1) : (0.891, 0.940, 0.907)

Average Dialog Turn (Succ): 6.361

Average Dialog Turn (All): 6.541

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain taxi

Overall Results

Success Rate: 84.4 %

(Precision, Recall, F1) : (0.886, 0.865, 0.872)

Average Dialog Turn (Succ): 7.362

Average Dialog Turn (All): 10.060

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain restaurant

Overall Results

Success Rate: 84.9 %

(Precision, Recall, F1) : (0.771, 0.807, 0.781)

Average Dialog Turn (Succ): 5.891

Average Dialog Turn (All): 6.941

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain train

Overall Results

Success Rate: 89.5 %

(Precision, Recall, F1) : (0.825, 0.879, 0.842)

Average Dialog Turn (Succ): 11.247

Average Dialog Turn (All): 12.036

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (1.000, 1.000, 1.000)

Average Dialog Turn (Succ): 5.652

Average Dialog Turn (All): 5.652

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Nothing

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Nothing

Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (1.000, 1.000, 1.000)

Average Dialog Turn (Succ): 4.067

Average Dialog Turn (All): 4.067

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Nothing

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Nothing

Inform But Not Request Dialog Act

Nothing