Test Report

Model Name: BERTNLU-RuleDST-PGPolicy-TemplateNLG

Dataset: multiwoz

Time: 2020-04-27 13:55:52

Overall Results

Success Rate: 43.3 %

(Precision, Recall, F1) : (0.619, 0.668, 0.604)

Average Dialog Turn (Succ): 14.693

Average Dialog Turn (All): 29.068
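The overall F1 (0.604) is lower than both the overall precision (0.619) and the overall recall (0.668). A harmonic mean always lies between its two inputs, so this F1 is not the harmonic mean of the two reported averages, which would be about 0.643. The numbers fit a setup where precision, recall and F1 are scored per dialog and then averaged. The report does not state how it aggregates, so treat that as an assumption. The sketch below uses two hypothetical dialogs, not data from this report, to show how per-dialog averaging produces this pattern:

```python
from statistics import mean


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


# Hypothetical per-dialog (precision, recall) pairs, NOT taken from this report.
dialogs = [(1.0, 0.2), (0.2, 1.0)]

avg_p = mean(p for p, _ in dialogs)
avg_r = mean(r for _, r in dialogs)
macro_f1 = mean(f1(p, r) for p, r in dialogs)  # average of per-dialog F1 scores
f1_of_avgs = f1(avg_p, avg_r)  # harmonic mean of the averaged P and R

print(f"P={avg_p:.3f} R={avg_r:.3f} macro F1={macro_f1:.3f} F1(avg P, avg R)={f1_of_avgs:.3f}")

# Harmonic mean of the overall precision and recall reported above.
print(f"{f1(0.619, 0.668):.3f}")
```

Each made-up dialog has an F1 of 0.333, while the averaged precision and recall are both 0.600. The averaged F1 therefore falls below both, which is the same pattern as the overall results.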

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       345        0.400      0.323      0.465   0.353  0.490                    7.522               21.154
restaurant  383        0.603      0.497      0.713   0.556  0.389                    11.247              21.770
train       314        0.710      0.810      0.790   0.790  0.280                    12.404              18.930
attraction  269        0.844      0.764      0.897   0.799  0.141                    6.907               11.346
taxi        86         0.721      0.767      0.744   0.752  0.267                    7.258               12.558
police      23         1.000      0.935      1.000   0.957  0.000                    2.000               2.000
hospital    30         0.933      0.933      0.933   0.933  0.067                    38.429              38.533
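Multiplying each domain's Total Num by its Succ Rate gives the approximate number of successful dialogs in that domain. For hotel, 345 × 0.400 = 138. The snippet below does this for every row of the table. Rounding to the nearest integer assumes the reported rates were themselves rounded to three decimals, and the snippet checks that dividing each count by its total reproduces the reported rate:

```python
# (total dialogs, success rate) per domain, copied from the table above.
rows = {
    "hotel": (345, 0.400),
    "restaurant": (383, 0.603),
    "train": (314, 0.710),
    "attraction": (269, 0.844),
    "taxi": (86, 0.721),
    "police": (23, 1.000),
    "hospital": (30, 0.933),
}

# Implied number of successful dialogs, rounded to the nearest integer.
succ_counts = {domain: round(total * rate) for domain, (total, rate) in rows.items()}

for domain, count in succ_counts.items():
    total, rate = rows[domain]
    # Recomputing the rate from the integer count should match the reported value.
    assert round(count / total, 3) == rate, domain
    print(f"{domain:<11} {count:>4} / {total:<4} succeeded")
```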

Domain hotel

Overall Results

Success Rate: 40.0 %

(Precision, Recall, F1) : (0.323, 0.465, 0.353)

Average Dialog Turn (Succ): 7.522

Average Dialog Turn (All): 21.154

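System NLU Failed Dialog Act: (details not reproduced)
User NLU Failed Dialog Act: (details not reproduced)
Dialog Loop: (details not reproduced)
Bad Inform Dialog Act: (details not reproduced)
Request But Not Inform Dialog Act: (details not reproduced)
Inform But Not Request Dialog Act: (details not reproduced)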

Domain restaurant

Overall Results

Success Rate: 60.3 %

(Precision, Recall, F1) : (0.497, 0.713, 0.556)

Average Dialog Turn (Succ): 11.247

Average Dialog Turn (All): 21.770

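System NLU Failed Dialog Act: (details not reproduced)
User NLU Failed Dialog Act: (details not reproduced)
Dialog Loop: (details not reproduced)
Bad Inform Dialog Act: (details not reproduced)
Request But Not Inform Dialog Act: (details not reproduced)
Inform But Not Request Dialog Act: (details not reproduced)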

Domain train

Overall Results

Success Rate: 71.0 %

(Precision, Recall, F1) : (0.810, 0.790, 0.790)

Average Dialog Turn (Succ): 12.404

Average Dialog Turn (All): 18.930

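System NLU Failed Dialog Act: (details not reproduced)
User NLU Failed Dialog Act: (details not reproduced)
Dialog Loop: (details not reproduced)
Bad Inform Dialog Act: (details not reproduced)
Request But Not Inform Dialog Act: (details not reproduced)
Inform But Not Request Dialog Act: Nothing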

Domain attraction

Overall Results

Success Rate: 84.4 %

(Precision, Recall, F1) : (0.764, 0.897, 0.799)

Average Dialog Turn (Succ): 6.907

Average Dialog Turn (All): 11.346

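System NLU Failed Dialog Act: (details not reproduced)
User NLU Failed Dialog Act: (details not reproduced)
Dialog Loop: (details not reproduced)
Bad Inform Dialog Act: (details not reproduced)
Request But Not Inform Dialog Act: (details not reproduced)
Inform But Not Request Dialog Act: (details not reproduced)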

Domain taxi

Overall Results

Success Rate: 72.1 %

(Precision, Recall, F1) : (0.767, 0.744, 0.752)

Average Dialog Turn (Succ): 7.258

Average Dialog Turn (All): 12.558

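System NLU Failed Dialog Act: (details not reproduced)
User NLU Failed Dialog Act: (details not reproduced)
Dialog Loop: (details not reproduced)
Bad Inform Dialog Act: Nothing
Request But Not Inform Dialog Act: (details not reproduced)
Inform But Not Request Dialog Act: Nothing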

Domain police

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (0.935, 1.000, 0.957)

Average Dialog Turn (Succ): 2.000

Average Dialog Turn (All): 2.000

System NLU Failed Dialog Act: (details not reproduced)
User NLU Failed Dialog Act: (details not reproduced)
Dialog Loop: Nothing
Bad Inform Dialog Act: Nothing
Request But Not Inform Dialog Act: Nothing
Inform But Not Request Dialog Act: (details not reproduced)

Domain hospital

Overall Results

Success Rate: 93.3 %

(Precision, Recall, F1) : (0.933, 0.933, 0.933)

Average Dialog Turn (Succ): 38.429

Average Dialog Turn (All): 38.533

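System NLU Failed Dialog Act: (details not reproduced)
User NLU Failed Dialog Act: (details not reproduced)
Dialog Loop: (details not reproduced)
Bad Inform Dialog Act: Nothing
Request But Not Inform Dialog Act: (details not reproduced)
Inform But Not Request Dialog Act: Nothing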