Test Report

Model Name: BERTNLU-RuleDST-HDSA

Dataset: multiwoz

Time: 2020-04-26 07:28:10

Overall Results

Success Rate: 27.5 %

(Precision, Recall, F1) : (0.478, 0.572, 0.488)

Average Dialog Turn (Succ): 12.996

Average Dialog Turn (All): 31.536

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       327        0.333      0.362      0.521   0.408  0.599                    7.321               20.856
attraction  285        0.442      0.592      0.571   0.562  0.540                    10.603              25.046
restaurant  368        0.535      0.466      0.714   0.536  0.454                    6.579               16.848
train       314        0.459      0.617      0.596   0.593  0.525                    11.472              22.656
police      23         0.870      0.920      0.913   0.904  0.130                    10.700              14.522
hospital    30         0.867      0.867      0.867   0.867  0.133                    8.000               12.267
taxi        76         0.342      0.684      0.513   0.570  0.632                    2.154               18.763
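
The per-domain figures in the Metric table can be cross-checked with a little arithmetic. The sketch below is not part of the evaluation tool; the names ROWS, harmonic_f1 and approx_successes are hypothetical helpers introduced here. It estimates the number of successful dialogs per domain from Total Num and Succ Rate, and compares each reported F1 with the harmonic mean of the reported precision and recall. The two generally differ (for the overall results, the harmonic mean of 0.478 and 0.572 is about 0.521, against a reported F1 of 0.488), which would be consistent with the scores being averaged per dialog, although the report itself does not state how they are aggregated.

```python
# Rows transcribed from the Metric table:
# domain -> (total dialogs, success rate, precision, recall, reported F1)
ROWS = {
    "hotel":      (327, 0.333, 0.362, 0.521, 0.408),
    "attraction": (285, 0.442, 0.592, 0.571, 0.562),
    "restaurant": (368, 0.535, 0.466, 0.714, 0.536),
    "train":      (314, 0.459, 0.617, 0.596, 0.593),
    "police":     (23,  0.870, 0.920, 0.913, 0.904),
    "hospital":   (30,  0.867, 0.867, 0.867, 0.867),
    "taxi":       (76,  0.342, 0.684, 0.513, 0.570),
}


def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def approx_successes(total, rate):
    """Estimate successful dialogs from a rounded success rate (assumption:
    the rate is successes / total, rounded to three decimals)."""
    return round(total * rate)


for domain, (total, rate, p, r, f1) in ROWS.items():
    print(
        f"{domain:<11} ~{approx_successes(total, rate):>3} successes, "
        f"reported F1 {f1:.3f}, harmonic mean of P and R {harmonic_f1(p, r):.3f}"
    )
```

Because the rates in the table are rounded to three decimals, the estimated success counts are approximate.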

Domain hotel

Overall Results

Success Rate: 33.3 %

(Precision, Recall, F1) : (0.362, 0.521, 0.408)

Average Dialog Turn (Succ): 7.321

Average Dialog Turn (All): 20.856

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 44.2 %

(Precision, Recall, F1) : (0.592, 0.571, 0.562)

Average Dialog Turn (Succ): 10.603

Average Dialog Turn (All): 25.046

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 53.5 %

(Precision, Recall, F1) : (0.466, 0.714, 0.536)

Average Dialog Turn (Succ): 6.579

Average Dialog Turn (All): 16.848

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain train

Overall Results

Success Rate: 45.9 %

(Precision, Recall, F1) : (0.617, 0.596, 0.593)

Average Dialog Turn (Succ): 11.472

Average Dialog Turn (All): 22.656

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 87.0 %

(Precision, Recall, F1) : (0.920, 0.913, 0.904)

Average Dialog Turn (Succ): 10.700

Average Dialog Turn (All): 14.522

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 86.7 %

(Precision, Recall, F1) : (0.867, 0.867, 0.867)

Average Dialog Turn (Succ): 8.000

Average Dialog Turn (All): 12.267

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain taxi

Overall Results

Success Rate: 34.2 %

(Precision, Recall, F1) : (0.684, 0.513, 0.570)

Average Dialog Turn (Succ): 2.154

Average Dialog Turn (All): 18.763

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing