Test Report

Model Name: TRADE-RulePolicy-TemplateNLG

Dataset: multiwoz

Time: 2020-04-29 21:10:04

Overall Results

Success Rate: 25.3 %

(Precision, Recall, F1) : (0.493, 0.481, 0.444)
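
The reported F1 is lower than the harmonic mean of the reported precision and recall. One plausible reading, which the report does not state and is assumed here, is that all three values are averaged per dialog rather than derived from one another. The sketch below only checks the arithmetic; `harmonic_f1` is a hypothetical helper, not part of the evaluation toolkit.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of a single precision/recall pair."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall values copied from this report.
reported_precision, reported_recall, reported_f1 = 0.493, 0.481, 0.444

derived_f1 = harmonic_f1(reported_precision, reported_recall)
gap = derived_f1 - reported_f1

# Prints: derived F1 = 0.487, reported F1 = 0.444, gap = 0.043
print(f"derived F1 = {derived_f1:.3f}, reported F1 = {reported_f1:.3f}, gap = {gap:.3f}")
```

The same gap appears in the taxi domain (precision 1.000, recall 0.967, F1 0.978), so the three numbers are better read as separate averages than as one formula applied to the other two.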

Average Dialog Turn (Succ): 12.735

Average Dialog Turn (All): 24.684

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       330        0.285      0.167      0.286   0.188  0.473                    9.617               16.448
attraction  291        0.299      0.614      0.541   0.547  0.533                    6.368               20.192
restaurant  361        0.335      0.479      0.490   0.465  0.427                    4.744               14.964
train       273        0.897      0.795      0.865   0.815  0.084                    9.902               11.773
police      23         0.000      0.000      0.000   0.000  1.000                    0.000               40.000
hospital    30         0.000      0.000      0.000   0.000  1.000                    0.000               40.000
taxi        46         0.935      1.000      0.967   0.978  0.065                    5.209               7.478
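
The per-domain rows can be combined to see how they relate to the overall success rate. This is a minimal sketch using only the Total Num and Succ Rate columns from the table; the variable names are illustrative. Because the rates are rounded to three decimals, the recovered success counts are approximate.

```python
# (domain, total dialogs involving the domain, success rate), copied from the Metric table.
rows = [
    ("hotel", 330, 0.285),
    ("attraction", 291, 0.299),
    ("restaurant", 361, 0.335),
    ("train", 273, 0.897),
    ("police", 23, 0.000),
    ("hospital", 30, 0.000),
    ("taxi", 46, 0.935),
]

total = sum(n for _, n, _ in rows)
weighted_rate = sum(n * rate for _, n, rate in rows) / total

for domain, n, rate in rows:
    # Approximate number of successful dialogs in this domain.
    print(f"{domain:<12}{n:>5}{round(n * rate):>6}")

# Prints: domain-level total = 1354, weighted success rate = 0.436
print(f"domain-level total = {total}, weighted success rate = {weighted_rate:.3f}")
```

The weighted per-domain rate (about 0.436) is well above the overall 25.3 %. That would be consistent with a dialog covering several domains and counting as successful only when every domain in it succeeds, though this is an assumption and is not stated in the report.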

Dialogue Loop

Domain hotel

Overall Results

Success Rate: 28.5 %

(Precision, Recall, F1) : (0.167, 0.286, 0.188)

Average Dialog Turn (Succ): 9.617

Average Dialog Turn (All): 16.448

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 29.9 %

(Precision, Recall, F1) : (0.614, 0.541, 0.547)

Average Dialog Turn (Succ): 6.368

Average Dialog Turn (All): 20.192

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 33.5 %

(Precision, Recall, F1) : (0.479, 0.490, 0.465)

Average Dialog Turn (Succ): 4.744

Average Dialog Turn (All): 14.964

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain train

Overall Results

Success Rate: 89.7 %

(Precision, Recall, F1) : (0.795, 0.865, 0.815)

Average Dialog Turn (Succ): 9.902

Average Dialog Turn (All): 11.773

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 0.0 %

(Precision, Recall, F1) : (0.000, 0.000, 0.000)

Average Dialog Turn (Succ): 0.000

Average Dialog Turn (All): 40.000

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 0.0 %

(Precision, Recall, F1) : (0.000, 0.000, 0.000)

Average Dialog Turn (Succ): 0.000

Average Dialog Turn (All): 40.000

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain taxi

Overall Results

Success Rate: 93.5 %

(Precision, Recall, F1) : (1.000, 0.967, 0.978)

Average Dialog Turn (Succ): 5.209

Average Dialog Turn (All): 7.478

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing