Test Report

Model Name: BERTNLU-RuleDST-GDPLPolicy-TemplateNLG

Dataset: multiwoz

Time: 2020-04-27 02:32:42

Overall Results

Success Rate: 49.5 %

(Precision, Recall, F1) : (0.670, 0.764, 0.682)

Average Dialog Turn (Succ): 11.479

Average Dialog Turn (All): 24.328
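
Note that the overall F1 (0.682) is lower than the harmonic mean of the overall precision and recall. One plausible explanation, stated here as an assumption rather than something the report confirms, is that precision, recall and F1 are each computed per dialog and then averaged. The short Python sketch below checks the arithmetic on the reported numbers and uses two made-up dialogs (hypothetical values, not taken from this run) to show how averaging per-dialog F1 can fall below the F1 of the averaged precision and recall.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Overall values as reported above.
reported_p, reported_r, reported_f1 = 0.670, 0.764, 0.682

# F1 recomputed from the averaged precision and recall.
f1_of_averages = f1(reported_p, reported_r)
print(f"F1 from averaged P/R: {f1_of_averages:.3f} (reported F1: {reported_f1:.3f})")

# Hypothetical per-dialog (precision, recall) pairs, for illustration only.
toy_dialogs = [(1.0, 1.0), (0.2, 0.6)]
mean_p = sum(p for p, _ in toy_dialogs) / len(toy_dialogs)
mean_r = sum(r for _, r in toy_dialogs) / len(toy_dialogs)
mean_of_f1 = sum(f1(p, r) for p, r in toy_dialogs) / len(toy_dialogs)

print(f"toy mean of per-dialog F1: {mean_of_f1:.3f}")
print(f"toy F1 of mean P/R:        {f1(mean_p, mean_r):.3f}")
```

If the per-dialog assumption holds, the reported F1 should not be expected to match a value recomputed from the reported precision and recall.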

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       400        0.378      0.406      0.478   0.420  0.532                    7.338               21.630
restaurant  390        0.674      0.531      0.779   0.605  0.310                    7.658               15.036
attraction  302        0.881      0.764      0.920   0.807  0.099                    6.541               9.642
taxi        104        0.558      0.596      0.577   0.583  0.433                    7.276               15.308
train       344        0.820      0.904      0.914   0.906  0.102                    5.809               8.331
police      23         1.000      0.587      1.000   0.733  0.000                    2.000               2.000
hospital    30         0.933      0.933      0.933   0.933  0.067                    4.000               6.400
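
For further analysis, the per-domain metrics can be loaded into a small data structure. The Python sketch below copies the values from the table above. The names `DomainResult` and `ROWS` and the field names are illustrative choices, not part of the evaluation tool. It sorts the domains by success rate and prints the dialog loop failed rate next to it, which makes it easy to see that the domains with the lowest success rates (hotel, taxi) also have the highest loop failure rates.

```python
from typing import NamedTuple

class DomainResult(NamedTuple):
    # Field names are illustrative; values are copied from the metric table.
    domain: str
    total: int
    succ_rate: float
    precision: float
    recall: float
    f1: float
    loop_fail_rate: float
    turn_succ: float
    turn_all: float

ROWS = [
    DomainResult("hotel", 400, 0.378, 0.406, 0.478, 0.420, 0.532, 7.338, 21.630),
    DomainResult("restaurant", 390, 0.674, 0.531, 0.779, 0.605, 0.310, 7.658, 15.036),
    DomainResult("attraction", 302, 0.881, 0.764, 0.920, 0.807, 0.099, 6.541, 9.642),
    DomainResult("taxi", 104, 0.558, 0.596, 0.577, 0.583, 0.433, 7.276, 15.308),
    DomainResult("train", 344, 0.820, 0.904, 0.914, 0.906, 0.102, 5.809, 8.331),
    DomainResult("police", 23, 1.000, 0.587, 1.000, 0.733, 0.000, 2.000, 2.000),
    DomainResult("hospital", 30, 0.933, 0.933, 0.933, 0.933, 0.067, 4.000, 6.400),
]

# Sort from the weakest to the strongest domain by success rate.
by_success = sorted(ROWS, key=lambda r: r.succ_rate)
worst = by_success[0]
most_loops = max(ROWS, key=lambda r: r.loop_fail_rate)

print(f"{'domain':<12}{'succ':>7}{'loop':>7}")
for r in by_success:
    print(f"{r.domain:<12}{r.succ_rate:>7.3f}{r.loop_fail_rate:>7.3f}")
```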

Domain hotel

Overall Results

Success Rate: 37.8 %

(Precision, Recall, F1) : (0.406, 0.478, 0.420)

Average Dialog Turn (Succ): 7.338

Average Dialog Turn (All): 21.630

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 67.4 %

(Precision, Recall, F1) : (0.531, 0.779, 0.605)

Average Dialog Turn (Succ): 7.658

Average Dialog Turn (All): 15.036

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 88.1 %

(Precision, Recall, F1) : (0.764, 0.920, 0.807)

Average Dialog Turn (Succ): 6.541

Average Dialog Turn (All): 9.642

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain taxi

Overall Results

Success Rate: 55.8 %

(Precision, Recall, F1) : (0.596, 0.577, 0.583)

Average Dialog Turn (Succ): 7.276

Average Dialog Turn (All): 15.308

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain train

Overall Results

Success Rate: 82.0 %

(Precision, Recall, F1) : (0.904, 0.914, 0.906)

Average Dialog Turn (Succ): 5.809

Average Dialog Turn (All): 8.331

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (0.587, 1.000, 0.733)

Average Dialog Turn (Succ): 2.000

Average Dialog Turn (All): 2.000

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Nothing

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Nothing

Inform But Not Request Dialog Act

Domain hospital

Overall Results

Success Rate: 93.3 %

(Precision, Recall, F1) : (0.933, 0.933, 0.933)

Average Dialog Turn (Succ): 4.000

Average Dialog Turn (All): 6.400

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing