Test Report

Model Name: Sequicity

Dataset: multiwoz

Time: 2020-04-26 00:56:51

Overall Results

Success Rate: 10.5 %

(Precision, Recall, F1) : (0.414, 0.308, 0.313)

Average Dialog Turn (Succ): 12.876

Average Dialog Turn (All): 38.320
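
The reported F1 (0.313) is not the harmonic mean of the reported precision and recall. The sketch below recomputes the harmonic mean from the two overall values so the gap is visible. One plausible explanation, which this report does not confirm, is that each metric is averaged over dialogs separately. The helper name harmonic_f1 is hypothetical and is not part of the evaluation tool.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of one precision/recall pair."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall values copied from the report above.
reported_precision, reported_recall, reported_f1 = 0.414, 0.308, 0.313

recomputed_f1 = harmonic_f1(reported_precision, reported_recall)
gap = recomputed_f1 - reported_f1

print(f"recomputed F1 = {recomputed_f1:.3f}")
print(f"reported F1   = {reported_f1:.3f}")
print(f"gap           = {gap:+.3f}")
```

For the police and hospital domains, precision, recall, and F1 are identical, so the same check shows no gap there.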

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       244        0.164      0.272      0.328   0.275  0.803                    10.550              31.336
restaurant  332        0.265      0.494      0.464   0.456  0.720                    6.409               29.229
train       251        0.191      0.345      0.272   0.295  0.789                    15.500              32.430
attraction  234        0.162      0.635      0.430   0.488  0.833                    5.842               33.103
police      23         0.087      0.087      0.087   0.087  0.913                    16.000              37.913
hospital    30         0.867      0.867      0.867   0.867  0.133                    7.692               12.000
taxi        44         0.000      0.000      0.000   0.000  1.000                    0.000               40.000
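
The Metric table gives only rates, so the number of successful dialogs per domain has to be derived. A minimal sketch, assuming the count is the rounded product of Total Num and Succ Rate (the report does not list the counts, and the rates are rounded to three decimals). The names domain_metrics and estimated_successes are hypothetical.

```python
# (Total Num, Succ Rate) per domain, copied from the Metric table.
domain_metrics = {
    "hotel": (244, 0.164),
    "restaurant": (332, 0.265),
    "train": (251, 0.191),
    "attraction": (234, 0.162),
    "police": (23, 0.087),
    "hospital": (30, 0.867),
    "taxi": (44, 0.000),
}


def estimated_successes(total: int, rate: float) -> int:
    """Approximate success count; an estimate because the rate is rounded."""
    return round(total * rate)


for domain, (total, rate) in domain_metrics.items():
    count = estimated_successes(total, rate)
    print(f"{domain:<11} {count:>3} / {total}")
```

Small domains such as police (23 dialogs) and hospital (30 dialogs) rest on very few dialogs, so their rates are less stable than those of hotel or restaurant.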

Domain hotel

Overall Results

Success Rate: 16.4 %

(Precision, Recall, F1) : (0.272, 0.328, 0.275)

Average Dialog Turn (Succ): 10.550

Average Dialog Turn (All): 31.336

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 26.5 %

(Precision, Recall, F1) : (0.494, 0.464, 0.456)

Average Dialog Turn (Succ): 6.409

Average Dialog Turn (All): 29.229

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain train

Overall Results

Success Rate: 19.1 %

(Precision, Recall, F1) : (0.345, 0.272, 0.295)

Average Dialog Turn (Succ): 15.500

Average Dialog Turn (All): 32.430

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain attraction

Overall Results

Success Rate: 16.2 %

(Precision, Recall, F1) : (0.635, 0.430, 0.488)

Average Dialog Turn (Succ): 5.842

Average Dialog Turn (All): 33.103

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain police

Overall Results

Success Rate: 8.7 %

(Precision, Recall, F1) : (0.087, 0.087, 0.087)

Average Dialog Turn (Succ): 16.000

Average Dialog Turn (All): 37.913

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 86.7 %

(Precision, Recall, F1) : (0.867, 0.867, 0.867)

Average Dialog Turn (Succ): 7.692

Average Dialog Turn (All): 12.000

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain taxi

Overall Results

Success Rate: 0.0 %

(Precision, Recall, F1) : (0.000, 0.000, 0.000)

Average Dialog Turn (Succ): 0.000

Average Dialog Turn (All): 40.000

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing