Test Report

Model Name: MDBT_RulePolicy_TemplateNLG

Dataset: multiwoz

Time: 2020-04-28 15:55:48

Overall Results

Success Rate: 21.2 %

(Precision, Recall, F1) : (0.522, 0.410, 0.424)

Average Dialog Turn (Succ): 11.802

Average Dialog Turn (All): 32.096
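
Note on the F1 value: the reported F1 (0.424) is lower than the harmonic mean of the reported precision and recall (about 0.459). One plausible reading, which the report itself does not confirm, is that precision, recall, and F1 are each averaged over dialogs, so the F1 shown is a mean of per-dialog F1 scores rather than the F1 of the mean precision and recall. The sketch below illustrates the difference. The helper `f1_score` and the `toy_dialogs` values are hypothetical and are not taken from the evaluation data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values copied from the Overall Results of this report.
reported_p, reported_r, reported_f1 = 0.522, 0.410, 0.424
f1_from_reported = f1_score(reported_p, reported_r)

# Hypothetical per-dialog (precision, recall) pairs, for illustration only.
toy_dialogs = [(1.0, 1.0), (0.5, 0.25), (0.0, 0.0)]
n = len(toy_dialogs)
mean_p = sum(p for p, _ in toy_dialogs) / n
mean_r = sum(r for _, r in toy_dialogs) / n

# Averaging per-dialog F1 is not the same as taking F1 of the averages.
mean_of_f1 = sum(f1_score(p, r) for p, r in toy_dialogs) / n
f1_of_means = f1_score(mean_p, mean_r)

print(f"F1 from reported P and R: {f1_from_reported:.3f} (report lists {reported_f1})")
print(f"toy data: mean of per-dialog F1 = {mean_of_f1:.3f}, F1 of means = {f1_of_means:.3f}")
```

On the toy data the mean of per-dialog F1 comes out below the F1 of the mean precision and recall, which is the same direction as the gap seen in the report.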

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       334        0.225      0.171      0.173   0.163  0.599                    9.573               24.581
restaurant  356        0.528      0.618      0.581   0.581  0.388                    5.266               17.938
attraction  298        0.174      0.499      0.346   0.388  0.779                    8.577               30.779
train       267        0.828      0.880      0.901   0.887  0.105                    7.900               8.172
police      23         0.000      0.000      0.000   0.000  1.000                    0.000               40.000
hospital    30         0.067      0.067      0.067   0.067  0.933                    2.000               37.467
taxi        51         0.000      0.000      0.000   0.000  0.882                    0.000               35.294
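
The per-domain table above can be used to back out two quantities that the report does not list: the approximate number of successful dialogs per domain and the average turn count of failed dialogs. This is a minimal sketch under two assumptions that the report does not state explicitly: that Dialog Turn (All) is the mean over all dialogs in the domain, and that Dialog Turn (Succ) is the mean over successful dialogs only. Because the rates are rounded to three decimals, the results are approximate. The function name `implied_failed_turn` is hypothetical.

```python
# (total_num, succ_rate, turn_succ, turn_all) copied from the per-domain table.
rows = {
    "hotel":      (334, 0.225, 9.573, 24.581),
    "restaurant": (356, 0.528, 5.266, 17.938),
    "attraction": (298, 0.174, 8.577, 30.779),
    "train":      (267, 0.828, 7.900, 8.172),
    "police":     (23,  0.000, 0.000, 40.000),
    "hospital":   (30,  0.067, 2.000, 37.467),
    "taxi":       (51,  0.000, 0.000, 35.294),
}

def implied_failed_turn(succ_rate, turn_succ, turn_all):
    """Mean turns of failed dialogs, assuming
    turn_all = succ_rate * turn_succ + (1 - succ_rate) * turn_failed.
    Returns None when every dialog succeeded."""
    fail_rate = 1.0 - succ_rate
    if fail_rate == 0:
        return None
    return (turn_all - succ_rate * turn_succ) / fail_rate

summary = {}
for domain, (total, rate, t_succ, t_all) in rows.items():
    n_succ = round(total * rate)  # approximate, since the rate is rounded
    t_fail = implied_failed_turn(rate, t_succ, t_all)
    summary[domain] = (n_succ, t_fail)
    print(f"{domain:<11} successes ~ {n_succ:>3}  failed-dialog turns ~ {t_fail:.2f}")
```

Under these assumptions, the train domain stands out: its failed dialogs end after roughly 9.5 turns on average, while failed dialogs in the other domains run for about 29 turns or more.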

Domain hotel

Overall Results

Success Rate: 22.5 %

(Precision, Recall, F1) : (0.171, 0.173, 0.163)

Average Dialog Turn (Succ): 9.573

Average Dialog Turn (All): 24.581

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 52.8 %

(Precision, Recall, F1) : (0.618, 0.581, 0.581)

Average Dialog Turn (Succ): 5.266

Average Dialog Turn (All): 17.938

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 17.4 %

(Precision, Recall, F1) : (0.499, 0.346, 0.388)

Average Dialog Turn (Succ): 8.577

Average Dialog Turn (All): 30.779

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain train

Overall Results

Success Rate: 82.8 %

(Precision, Recall, F1) : (0.880, 0.901, 0.887)

Average Dialog Turn (Succ): 7.900

Average Dialog Turn (All): 8.172

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 0.0 %

(Precision, Recall, F1) : (0.000, 0.000, 0.000)

Average Dialog Turn (Succ): 0.000

Average Dialog Turn (All): 40.000

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 6.7 %

(Precision, Recall, F1) : (0.067, 0.067, 0.067)

Average Dialog Turn (Succ): 2.000

Average Dialog Turn (All): 37.467

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain taxi

Overall Results

Success Rate: 0.0 %

(Precision, Recall, F1) : (0.000, 0.000, 0.000)

Average Dialog Turn (Succ): 0.000

Average Dialog Turn (All): 35.294

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing