Test Report

Model Name: test

Dataset: multiwoz

Time: 2020-04-26 12:42:46

Overall Results

Success Rate: 41.0 %

(Precision, Recall, F1) : (0.685, 0.565, 0.591)

Average Dialog Turn (Succ): 11.624

Average Dialog Turn (All): 29.224

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       351        0.328      0.307      0.262   0.273  0.655                    10.835              27.875
restaurant  355        0.780      0.633      0.694   0.653  0.214                    4.931               12.039
attraction  280        0.529      0.742      0.670   0.693  0.471                    6.297               21.229
taxi        58         1.000      1.000      1.000   1.000  0.000                    4.552               4.552
train       314        0.583      0.755      0.644   0.682  0.404                    5.956               18.510
police      23         0.174      0.739      0.457   0.551  0.826                    4.000               33.739
hospital    30         1.000      1.000      1.000   1.000  0.000                    4.000               4.000
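
For readers who want to compare domains programmatically, the following is a minimal sketch, not part of the original report. It copies a subset of the columns from the Metric table and orders the domains from lowest to highest success rate. The names ROWS and rank_by_success are hypothetical, and reading "Total Num" as the number of dialogs per domain is an assumption.

```python
# Per-domain values copied from the Metric table.
# Field order: domain, Total Num (assumed: number of dialogs), Succ Rate,
# Dialog Loop Failed Rate, Dialog Turn (All).
ROWS = [
    ("hotel", 351, 0.328, 0.655, 27.875),
    ("restaurant", 355, 0.780, 0.214, 12.039),
    ("attraction", 280, 0.529, 0.471, 21.229),
    ("taxi", 58, 1.000, 0.000, 4.552),
    ("train", 314, 0.583, 0.404, 18.510),
    ("police", 23, 0.174, 0.826, 33.739),
    ("hospital", 30, 1.000, 0.000, 4.000),
]


def rank_by_success(rows):
    """Return the rows sorted from lowest to highest success rate.

    sorted() is stable, so domains with equal success rates keep
    their original table order.
    """
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for domain, total, succ, loop_failed, turns_all in rank_by_success(ROWS):
        print(
            f"{domain:<11} n={total:<4} succ={succ:.3f} "
            f"loop_failed={loop_failed:.3f} turns_all={turns_all:.3f}"
        )
```

Sorting this way puts police and hotel first, the two domains that also show the highest Dialog Loop Failed Rate and the longest Average Dialog Turn (All) in the table.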

Dialogue Loop

Domain hotel

Overall Results

Success Rate: 32.8 %

(Precision, Recall, F1) : (0.307, 0.262, 0.273)

Average Dialog Turn (Succ): 10.835

Average Dialog Turn (All): 27.875

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 78.0 %

(Precision, Recall, F1) : (0.633, 0.694, 0.653)

Average Dialog Turn (Succ): 4.931

Average Dialog Turn (All): 12.039

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 52.9 %

(Precision, Recall, F1) : (0.742, 0.670, 0.693)

Average Dialog Turn (Succ): 6.297

Average Dialog Turn (All): 21.229

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain taxi

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (1.000, 1.000, 1.000)

Average Dialog Turn (Succ): 4.552

Average Dialog Turn (All): 4.552

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Nothing

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Nothing

Inform But Not Request Dialog Act

Nothing

Domain train

Overall Results

Success Rate: 58.3 %

(Precision, Recall, F1) : (0.755, 0.644, 0.682)

Average Dialog Turn (Succ): 5.956

Average Dialog Turn (All): 18.510

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 17.4 %

(Precision, Recall, F1) : (0.739, 0.457, 0.551)

Average Dialog Turn (Succ): 4.000

Average Dialog Turn (All): 33.739

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 100.0 %

(Precision, Recall, F1) : (1.000, 1.000, 1.000)

Average Dialog Turn (Succ): 4.000

Average Dialog Turn (All): 4.000

System NLU Failed Dialog Act:

User NLU Failed Dialog Act:

Dialog Loop

Nothing

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Nothing

Inform But Not Request Dialog Act

Nothing