Test Report

Model Name: BERTNLU-RuleDST-LaRL

Dataset: multiwoz

Time: 2020-04-27 19:16:35

Overall Results

Success Rate: 33.1 %

(Precision, Recall, F1) : (0.485, 0.560, 0.488)

Average Dialog Turn (Succ): 15.547

Average Dialog Turn (All): 28.734
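The overall F1 (0.488) is lower than the harmonic mean of the overall precision and recall (about 0.520). That gap is what one would expect if precision, recall and F1 are each computed per dialog and then averaged with equal weight: the harmonic mean is concave, so the mean of per-dialog F1 is at most the F1 of the mean precision and recall. The sketch below assumes this per-dialog averaging (it is inferred from the numbers, not confirmed from the evaluator's code). The helper `f1` and the `dialogs` values are hypothetical and only illustrate the effect.

```python
from statistics import mean


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall, defined as 0 when both are 0."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


# Overall figures copied from this report.
precision, recall, reported_f1 = 0.485, 0.560, 0.488
f1_of_means = f1(precision, recall)
print(f"F1 of overall P/R: {f1_of_means:.3f} vs reported F1: {reported_f1:.3f}")

# Hypothetical per-dialog (precision, recall) pairs; NOT data from this report.
dialogs = [(1.0, 1.0), (0.0, 0.5), (0.5, 0.2)]
mean_p = mean(p for p, _ in dialogs)
mean_r = mean(r for _, r in dialogs)
mean_of_f1 = mean(f1(p, r) for p, r in dialogs)
print(f"F1 of means: {f1(mean_p, mean_r):.3f}, mean of per-dialog F1: {mean_of_f1:.3f}")
```

With these toy values the mean of per-dialog F1 is about 0.429 while the F1 of the averaged precision and recall is about 0.531, the same direction as the gap in this report.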

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       343        0.364      0.282      0.471   0.326  0.458                    10.096              16.711
attraction  294        0.595      0.619      0.693   0.631  0.350                    9.337               18.197
restaurant  377        0.618      0.521      0.643   0.554  0.308                    10.876              16.271
train       328        0.634      0.746      0.676   0.696  0.354                    9.490               18.470
police      23         0.783      0.870      0.848   0.843  0.217                    6.333               13.652
hospital    30         0.900      0.900      0.900   0.900  0.100                    3.852               7.467
taxi        94         0.085      0.138      0.112   0.121  0.862                    8.750               27.872
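The rates in this table are printed to three decimals. Assuming both Succ Rate and Dialog Loop Failed Rate are fractions of the domain's Total Num (an assumption; the report does not state the denominator), the underlying dialog counts can be recovered by rounding and then checked against the printed rates. The names `ROWS`, `implied_count` and `counts` are hypothetical; the counts are derived, not read from evaluator logs.

```python
# Rows copied from the per-domain table:
# domain -> (total dialogs, success rate, dialog loop failed rate)
ROWS = {
    "hotel": (343, 0.364, 0.458),
    "attraction": (294, 0.595, 0.350),
    "restaurant": (377, 0.618, 0.308),
    "train": (328, 0.634, 0.354),
    "police": (23, 0.783, 0.217),
    "hospital": (30, 0.900, 0.100),
    "taxi": (94, 0.085, 0.862),
}


def implied_count(total: int, rate: float) -> int:
    """Nearest integer to total * rate."""
    return round(total * rate)


counts = {}
for domain, (total, succ_rate, loop_rate) in ROWS.items():
    succ = implied_count(total, succ_rate)
    loops = implied_count(total, loop_rate)
    # The recovered count must reproduce the printed rate to 3 decimals.
    assert abs(succ / total - succ_rate) <= 0.0005
    assert abs(loops / total - loop_rate) <= 0.0005
    counts[domain] = (succ, loops)
    print(f"{domain:<11} {succ:>4}/{total:<4} succeeded, {loops:>3} dialog-loop failures")
```

For example, the police row implies 18 successful dialogs out of 23, and the taxi row implies 8 out of 94.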

Domain hotel

Overall Results

Success Rate: 36.4 %

(Precision, Recall, F1) : (0.282, 0.471, 0.326)

Average Dialog Turn (Succ): 10.096

Average Dialog Turn (All): 16.711

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 59.5 %

(Precision, Recall, F1) : (0.619, 0.693, 0.631)

Average Dialog Turn (Succ): 9.337

Average Dialog Turn (All): 18.197

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 61.8 %

(Precision, Recall, F1) : (0.521, 0.643, 0.554)

Average Dialog Turn (Succ): 10.876

Average Dialog Turn (All): 16.271

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain train

Overall Results

Success Rate: 63.4 %

(Precision, Recall, F1) : (0.746, 0.676, 0.696)

Average Dialog Turn (Succ): 9.490

Average Dialog Turn (All): 18.470

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 78.3 %

(Precision, Recall, F1) : (0.870, 0.848, 0.843)

Average Dialog Turn (Succ): 6.333

Average Dialog Turn (All): 13.652

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 90.0 %

(Precision, Recall, F1) : (0.900, 0.900, 0.900)

Average Dialog Turn (Succ): 3.852

Average Dialog Turn (All): 7.467

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain taxi

Overall Results

Success Rate: 8.5 %

(Precision, Recall, F1) : (0.138, 0.112, 0.121)

Average Dialog Turn (Succ): 8.750

Average Dialog Turn (All): 27.872

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing