Test Report

Model Name: BERTNLU-RuleDST-MDRG

Dataset: multiwoz

Time: 2020-04-28 06:00:57

Overall Results

Success Rate: 25.2 %

(Precision, Recall, F1) : (0.466, 0.431, 0.420)

Average Dialog Turn (Succ): 13.635

Average Dialog Turn (All): 33.580
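
The report does not state how the overall F1 is aggregated. As a quick sanity check, the sketch below recomputes F1 as the harmonic mean of the reported overall precision and recall. The helper name harmonic_f1 is hypothetical and not part of the evaluation toolkit. The recomputed value (about 0.448) is higher than the reported 0.420. That would be consistent with F1 being averaged per dialog rather than derived from the averaged precision and recall, but this is an assumption the report neither confirms nor rules out.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of a single precision/recall pair."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall values copied from the report above.
reported_precision, reported_recall, reported_f1 = 0.466, 0.431, 0.420

recomputed = harmonic_f1(reported_precision, reported_recall)
print(f"recomputed F1 = {recomputed:.3f}, reported F1 = {reported_f1:.3f}")
```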

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       332        0.449      0.306      0.414   0.339  0.497                    7.960               20.464
attraction  285        0.512      0.623      0.603   0.592  0.467                    11.082              21.551
restaurant  378        0.706      0.505      0.652   0.552  0.288                    6.412               15.106
train       329        0.377      0.515      0.411   0.445  0.614                    8.000               25.088
police      23         0.000      0.000      0.000   0.000  1.000                    0.000               40.000
hospital    30         0.000      0.000      0.000   0.000  1.000                    0.000               40.000
taxi        74         0.000      0.000      0.000   0.000  0.946                    0.000               30.973
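
To make the per-domain numbers easier to compare, the rows of the Metric table can be loaded into a small data structure. This is a minimal sketch: the DomainRow type, the helper functions, and the 0.5 threshold for dialog loop failures are all illustrative assumptions, not part of the evaluation toolkit. The values are copied from the table.

```python
from typing import List, NamedTuple


class DomainRow(NamedTuple):
    domain: str
    total: int              # Total Num
    succ_rate: float        # Succ Rate
    precision: float
    recall: float
    f1: float
    loop_fail_rate: float   # Dialog Loop Failed Rate
    turn_succ: float        # Dialog Turn (Succ)
    turn_all: float         # Dialog Turn (All)


ROWS = [
    DomainRow("hotel", 332, 0.449, 0.306, 0.414, 0.339, 0.497, 7.960, 20.464),
    DomainRow("attraction", 285, 0.512, 0.623, 0.603, 0.592, 0.467, 11.082, 21.551),
    DomainRow("restaurant", 378, 0.706, 0.505, 0.652, 0.552, 0.288, 6.412, 15.106),
    DomainRow("train", 329, 0.377, 0.515, 0.411, 0.445, 0.614, 8.000, 25.088),
    DomainRow("police", 23, 0.000, 0.000, 0.000, 0.000, 1.000, 0.000, 40.000),
    DomainRow("hospital", 30, 0.000, 0.000, 0.000, 0.000, 1.000, 0.000, 40.000),
    DomainRow("taxi", 74, 0.000, 0.000, 0.000, 0.000, 0.946, 0.000, 30.973),
]


def rank_by_success(rows: List[DomainRow]) -> List[str]:
    """Domain names ordered from highest to lowest success rate."""
    return [r.domain for r in sorted(rows, key=lambda r: r.succ_rate, reverse=True)]


def never_succeeded(rows: List[DomainRow]) -> List[str]:
    """Domains in which no dialog was completed successfully."""
    return [r.domain for r in rows if r.succ_rate == 0.0]


def loop_prone(rows: List[DomainRow], threshold: float = 0.5) -> List[str]:
    """Domains whose dialog loop failure rate is at or above the
    (hypothetical) threshold."""
    return [r.domain for r in rows if r.loop_fail_rate >= threshold]


print(rank_by_success(ROWS))
print(never_succeeded(ROWS))
print(loop_prone(ROWS))
```

On these rows, the three domains with a 0.0 success rate (police, hospital, taxi) are also the ones where nearly every dialog ends in a loop and the average dialog length sits at or near 40 turns.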

Domain hotel

Overall Results

Success Rate: 44.9 %

(Precision, Recall, F1) : (0.306, 0.414, 0.339)

Average Dialog Turn (Succ): 7.960

Average Dialog Turn (All): 20.464

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 51.2 %

(Precision, Recall, F1) : (0.623, 0.603, 0.592)

Average Dialog Turn (Succ): 11.082

Average Dialog Turn (All): 21.551

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 70.6 %

(Precision, Recall, F1) : (0.505, 0.652, 0.552)

Average Dialog Turn (Succ): 6.412

Average Dialog Turn (All): 15.106

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain train

Overall Results

Success Rate: 37.7 %

(Precision, Recall, F1) : (0.515, 0.411, 0.445)

Average Dialog Turn (Succ): 8.000

Average Dialog Turn (All): 25.088

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 0.0 %

(Precision, Recall, F1) : (0.000, 0.000, 0.000)

Average Dialog Turn (Succ): 0.000

Average Dialog Turn (All): 40.000

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 0.0 %

(Precision, Recall, F1) : (0.000, 0.000, 0.000)

Average Dialog Turn (Succ): 0.000

Average Dialog Turn (All): 40.000

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain taxi

Overall Results

Success Rate: 0.0 %

(Precision, Recall, F1) : (0.000, 0.000, 0.000)

Average Dialog Turn (Succ): 0.000

Average Dialog Turn (All): 30.973

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing