Test Report

Model Name: SUMBT-RulePolicy-TemplateNLG

Dataset: multiwoz

Time: 2020-04-27 11:53:17

Overall Results

Success Rate: 33.8 %

(Precision, Recall, F1) : (0.523, 0.506, 0.473)
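The reported F1 (0.473) is lower than the harmonic mean of the reported precision and recall. One plausible reading, which is an assumption and not stated in this report, is that each of the three values is averaged over dialogs, in which case the mean of per-dialog F1 scores need not equal the F1 computed from the mean precision and mean recall. A minimal sketch of that check follows; the two-dialog example uses made-up values purely for illustration.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of one precision/recall pair."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the overall results in this report.
reported_p, reported_r, reported_f1 = 0.523, 0.506, 0.473

# F1 computed from the already-averaged precision and recall.
f1_of_means = harmonic_f1(reported_p, reported_r)
gap = f1_of_means - reported_f1
print(f"F1 of averaged P/R: {f1_of_means:.3f}, reported F1: {reported_f1:.3f}")

# Hypothetical per-dialog (precision, recall) pairs, NOT taken from the report.
toy_dialogs = [(1.0, 0.2), (0.2, 1.0)]
mean_p = sum(p for p, _ in toy_dialogs) / len(toy_dialogs)
mean_r = sum(r for _, r in toy_dialogs) / len(toy_dialogs)

# Averaging per-dialog F1 gives a lower value than F1 of the averages here.
mean_of_f1 = sum(harmonic_f1(p, r) for p, r in toy_dialogs) / len(toy_dialogs)
f1_of_mean_pr = harmonic_f1(mean_p, mean_r)
print(f"mean of per-dialog F1: {mean_of_f1:.3f}, F1 of mean P/R: {f1_of_mean_pr:.3f}")
```

If the per-dialog averaging assumption holds, a gap of this kind is expected and does not indicate an error in the report.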

Average Dialog Turn (Succ): 12.089

Average Dialog Turn (All): 26.552

Metric

Domain     | Total Num | Succ Rate | Precision | Recall | F1    | Dialog Loop Failed Rate | Dialog Turn (Succ) | Dialog Turn (All)
hotel      | 338       | 0.340     | 0.174     | 0.252  | 0.184 | 0.459                   | 10.104             | 20.107
restaurant | 368       | 0.554     | 0.550     | 0.555  | 0.531 | 0.399                   | 5.000              | 17.685
attraction | 299       | 0.311     | 0.569     | 0.516  | 0.514 | 0.545                   | 7.527              | 21.993
train      | 278       | 0.946     | 0.809     | 0.891  | 0.833 | 0.022                   | 7.202              | 7.647
taxi       | 46        | 0.891     | 1.000     | 0.946  | 0.964 | 0.109                   | 3.463              | 7.435
police     | 23        | 0.000     | 0.000     | 0.000  | 0.000 | 1.000                   | 0.000              | 40.000
hospital   | 30        | 0.367     | 0.367     | 0.367  | 0.367 | 0.633                   | 18.364             | 32.067
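For further analysis, the per-domain figures in the Metric table can be loaded into a small data structure. The sketch below is illustrative only: the `DomainRow` type and the `approx_successes` helper are hypothetical names, the values are as read from the table, and the success counts are approximations recovered from rates rounded to three decimals.

```python
from typing import NamedTuple


class DomainRow(NamedTuple):
    """One row of the Metric table (subset of columns; hypothetical type)."""
    domain: str
    total: int
    succ_rate: float
    loop_failed_rate: float


# Values as read from the Metric table in this report.
ROWS = [
    DomainRow("hotel", 338, 0.340, 0.459),
    DomainRow("restaurant", 368, 0.554, 0.399),
    DomainRow("attraction", 299, 0.311, 0.545),
    DomainRow("train", 278, 0.946, 0.022),
    DomainRow("taxi", 46, 0.891, 0.109),
    DomainRow("police", 23, 0.000, 1.000),
    DomainRow("hospital", 30, 0.367, 0.633),
]


def approx_successes(row: DomainRow) -> int:
    """Approximate number of successful dialogs, from the rounded rate."""
    return round(row.total * row.succ_rate)


# Rank domains from highest to lowest success rate.
ranked = sorted(ROWS, key=lambda r: r.succ_rate, reverse=True)
for row in ranked:
    print(f"{row.domain:<11} succ_rate={row.succ_rate:.3f} "
          f"approx_successes={approx_successes(row)} "
          f"loop_failed_rate={row.loop_failed_rate:.3f}")
```

Sorting this way places train and taxi at the top and police at the bottom, which matches the per-domain sections of the report.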

Domain hotel

Overall Results

Success Rate: 34.0 %

(Precision, Recall, F1) : (0.174, 0.252, 0.184)

Average Dialog Turn (Succ): 10.104

Average Dialog Turn (All): 20.107

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 55.4 %

(Precision, Recall, F1) : (0.550, 0.555, 0.531)

Average Dialog Turn (Succ): 5.000

Average Dialog Turn (All): 17.685

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 31.1 %

(Precision, Recall, F1) : (0.569, 0.516, 0.514)

Average Dialog Turn (Succ): 7.527

Average Dialog Turn (All): 21.993

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain train

Overall Results

Success Rate: 94.6 %

(Precision, Recall, F1) : (0.809, 0.891, 0.833)

Average Dialog Turn (Succ): 7.202

Average Dialog Turn (All): 7.647

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain taxi

Overall Results

Success Rate: 89.1 %

(Precision, Recall, F1) : (1.000, 0.946, 0.964)

Average Dialog Turn (Succ): 3.463

Average Dialog Turn (All): 7.435

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 0.0 %

(Precision, Recall, F1) : (0.000, 0.000, 0.000)

Average Dialog Turn (Succ): 0.000

Average Dialog Turn (All): 40.000

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 36.7 %

(Precision, Recall, F1) : (0.367, 0.367, 0.367)

Average Dialog Turn (Succ): 18.364

Average Dialog Turn (All): 32.067

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing