Test Report

Model Name: DAMD

Dataset: multiwoz

Time: 2020-04-26 13:13:21

Overall Results

Success Rate: 33.6 %

(Precision, Recall, F1) : (0.621, 0.607, 0.574)

Average Dialog Turn (Succ): 10.440

Average Dialog Turn (All): 28.206

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       329        0.587      0.566      0.668   0.597  0.185                    6.093               9.909
attraction  289        0.716      0.746      0.802   0.754  0.197                    6.251               12.097
restaurant  381        0.921      0.693      0.832   0.737  0.071                    5.516               7.507
taxi        129        0.140      0.537      0.341   0.406  0.845                    10.889              26.388
train       373        0.214      0.293      0.321   0.272  0.769                    18.150              32.048
police      23         0.783      0.783      0.783   0.783  0.217                    3.556               11.478
hospital    30         0.533      0.533      0.533   0.533  0.467                    7.250               22.533
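
The per-domain figures in the Metric table can be easier to compare once they are loaded into a small script. The following is a minimal sketch, not part of the report tooling: the values are copied from the table, while the tuple layout, the function names, and the 0.5 loop-failure threshold are hypothetical choices made only for illustration.

```python
# Per-domain rows copied from the Metric table.
# Tuple layout (an assumption made for this sketch):
# (domain, total_num, succ_rate, precision, recall, f1,
#  loop_failed_rate, turn_succ, turn_all)
ROWS = [
    ("hotel", 329, 0.587, 0.566, 0.668, 0.597, 0.185, 6.093, 9.909),
    ("attraction", 289, 0.716, 0.746, 0.802, 0.754, 0.197, 6.251, 12.097),
    ("restaurant", 381, 0.921, 0.693, 0.832, 0.737, 0.071, 5.516, 7.507),
    ("taxi", 129, 0.140, 0.537, 0.341, 0.406, 0.845, 10.889, 26.388),
    ("train", 373, 0.214, 0.293, 0.321, 0.272, 0.769, 18.150, 32.048),
    ("police", 23, 0.783, 0.783, 0.783, 0.783, 0.217, 3.556, 11.478),
    ("hospital", 30, 0.533, 0.533, 0.533, 0.533, 0.467, 7.250, 22.533),
]


def rank_by_success(rows):
    """Return the domain names ordered from highest to lowest success rate."""
    return [r[0] for r in sorted(rows, key=lambda r: r[2], reverse=True)]


def high_loop_domains(rows, threshold=0.5):
    """Return the domains whose dialog-loop failed rate exceeds `threshold`.

    The default threshold is an arbitrary illustrative cutoff.
    """
    return [r[0] for r in rows if r[6] > threshold]


if __name__ == "__main__":
    print("Domains by success rate:", rank_by_success(ROWS))
    print("Domains with loop failed rate > 0.5:", high_loop_domains(ROWS))
```

With these values, the two domains flagged by the loop-failure filter are also the two with the lowest success rates and the longest average dialogs.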

Dialogue Loop

Domain hotel

Overall Results

Success Rate: 58.7 %

(Precision, Recall, F1) : (0.566, 0.668, 0.597)

Average Dialog Turn (Succ): 6.093

Average Dialog Turn (All): 9.909

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 71.6 %

(Precision, Recall, F1) : (0.746, 0.802, 0.754)

Average Dialog Turn (Succ): 6.251

Average Dialog Turn (All): 12.097

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 92.1 %

(Precision, Recall, F1) : (0.693, 0.832, 0.737)

Average Dialog Turn (Succ): 5.516

Average Dialog Turn (All): 7.507

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Domain taxi

Overall Results

Success Rate: 14.0 %

(Precision, Recall, F1) : (0.537, 0.341, 0.406)

Average Dialog Turn (Succ): 10.889

Average Dialog Turn (All): 26.388

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain train

Overall Results

Success Rate: 21.4 %

(Precision, Recall, F1) : (0.293, 0.321, 0.272)

Average Dialog Turn (Succ): 18.150

Average Dialog Turn (All): 32.048

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 78.3 %

(Precision, Recall, F1) : (0.783, 0.783, 0.783)

Average Dialog Turn (Succ): 3.556

Average Dialog Turn (All): 11.478

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 53.3 %

(Precision, Recall, F1) : (0.533, 0.533, 0.533)

Average Dialog Turn (Succ): 7.250

Average Dialog Turn (All): 22.533

System NLU Failed Dialog Act:

Nothing

User NLU Failed Dialog Act:

Nothing

Dialog Loop

Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act

Inform But Not Request Dialog Act

Nothing