Test Report

Model Name: BERTNLU-RuleDST-PPOPolicy-TemplateNLG

Dataset: multiwoz

Time: 2020-04-27 02:25:52

Overall Results

Success Rate: 56.6 %

(Precision, Recall, F1) : (0.648, 0.790, 0.681)

Average Dialog Turn (Succ): 12.880

Average Dialog Turn (All): 22.110

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       402        0.500      0.357      0.515   0.391  0.343                    7.851               16.343
restaurant  402        0.677      0.534      0.739   0.594  0.296                    8.213               14.756
attraction  309        0.883      0.753      0.920   0.800  0.100                    6.388               9.502
taxi        121        0.802      0.884      0.843   0.857  0.174                    7.052               9.868
train       349        0.857      0.900      0.919   0.907  0.129                    7.391               8.395
police      23         1.000      0.935      1.000   0.957  0.000                    2.000               2.000
hospital    30         0.933      0.933      0.933   0.933  0.067                    2.000               4.533
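
The per-domain rows above give only a total and a success rate, not the number of successful dialogs. The following is a minimal sketch of how approximate success counts could be recovered from those two columns. It assumes that "Total Num" counts the dialogs evaluated for each domain and that "Succ Rate" is successes divided by that total. Both the helper name `estimated_successes` and that interpretation are assumptions, not part of the report. The results are exact only up to the three-decimal rounding of the rates.

```python
# Rows copied from the per-domain table: (domain, Total Num, Succ Rate).
ROWS = [
    ("hotel", 402, 0.500),
    ("restaurant", 402, 0.677),
    ("attraction", 309, 0.883),
    ("taxi", 121, 0.802),
    ("train", 349, 0.857),
    ("police", 23, 1.000),
    ("hospital", 30, 0.933),
]


def estimated_successes(total, rate):
    """Hypothetical helper: estimate the success count as round(total * rate).

    Assumes rate == successes / total, rounded to three decimals in the report.
    """
    return round(total * rate)


counts = {domain: estimated_successes(total, rate) for domain, total, rate in ROWS}

for domain, n in counts.items():
    print(f"{domain:<12}{n:>5}")
```

Because the rates are rounded, these counts should be read as estimates rather than as figures taken from the report.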

Domain hotel

Overall Results

Success Rate: 50.0 %

(Precision, Recall, F1) : (0.357, 0.515, 0.391)

Average Dialog Turn (Succ): 7.851

Average Dialog Turn (All): 16.343

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 67.7 %

(Precision, Recall, F1) : (0.534, 0.739, 0.594)

Average Dialog Turn (Succ): 8.213

Average Dialog Turn (All): 14.756

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 88.3 %

(Precision, Recall, F1) : (0.753, 0.920, 0.800)

Average Dialog Turn (Succ): 6.388

Average Dialog Turn (All): 9.502

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain taxi

Overall Results

Success Rate: 80.2 %

(Precision, Recall, F1) : (0.884, 0.843, 0.857)

Average Dialog Turn (Succ): 7.052

Average Dialog Turn (All): 9.868

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain train

Overall Results

Success Rate: 85.7 %

(Precision, Recall, F1) : (0.900, 0.919, 0.907)

Average Dialog Turn (Succ): 7.391

Average Dialog Turn (All): 8.395

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (0.935, 1.000, 0.957)

Average Dialog Turn (Succ): 2.000

Average Dialog Turn (All): 2.000

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Nothing

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Nothing

Inform But Not Request Dialog Act

Domain hospital

Overall Results

Success Rate: 93.3 %

(Precision, Recall, F1) : (0.933, 0.933, 0.933)

Average Dialog Turn (Succ): 2.000

Average Dialog Turn (All): 4.533

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing