Test Report

Model Name: BERTNLU-RuleDST-HDSA

Dataset: multiwoz

Time: 2020-04-26 00:12:41

Overall Results

Success Rate: 34.0 %

(Precision, Recall, F1) : (0.478, 0.541, 0.476)
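
Note that the reported F1 (0.476) is lower than the harmonic mean of the reported precision and recall. The report does not say how the three values are aggregated; one plausible reading, stated here only as an assumption, is that each is averaged over dialogs separately. The sketch below just recomputes the harmonic mean from the two reported numbers so the gap is visible. The function name `harmonic_f1` is illustrative and not part of any evaluation toolkit.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of a single precision/recall pair."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall values copied from this report.
reported_p, reported_r, reported_f1 = 0.478, 0.541, 0.476

recomputed = harmonic_f1(reported_p, reported_r)
print(f"harmonic mean of reported P and R: {recomputed:.3f}")  # 0.508
print(f"reported F1:                       {reported_f1:.3f}")  # 0.476
```

The same check applies to the per-domain triples further down; only hospital, where precision and recall are equal, reproduces the reported F1 exactly.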

Average Dialog Turn (Succ): 15.000

Average Dialog Turn (All): 28.600

Metric

Domain      Total Num  Succ Rate  Precision  Recall  F1     Dialog Loop Failed Rate  Dialog Turn (Succ)  Dialog Turn (All)
hotel       334        0.377      0.277      0.448   0.317  0.440                    9.857               17.437
attraction  285        0.614      0.623      0.719   0.642  0.337                    8.651               17.684
restaurant  377        0.584      0.482      0.559   0.502  0.347                    9.618               17.416
train       320        0.631      0.754      0.670   0.697  0.362                    9.356               18.812
police      23         0.870      0.913      0.913   0.901  0.130                    9.700               13.652
hospital    30         0.933      0.933      0.933   0.933  0.067                    3.286               5.733
taxi        88         0.136      0.182      0.159   0.167  0.830                    8.167               26.500
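
For further analysis, the per-domain rows above can be loaded into a small structure. The following is a minimal sketch, not part of the tool that generated this report: the `DomainRow` type and the `estimated_successes` helper are hypothetical names, and the success counts are only estimates recovered by rounding Total Num × Succ Rate, since the report lists rates rather than counts.

```python
from typing import NamedTuple


class DomainRow(NamedTuple):
    domain: str
    total: int               # "Total Num" column
    succ_rate: float         # "Succ Rate" column
    loop_failed_rate: float  # "Dialog Loop Failed Rate" column


# Values copied from the Metric table in this report.
ROWS = [
    DomainRow("hotel", 334, 0.377, 0.440),
    DomainRow("attraction", 285, 0.614, 0.337),
    DomainRow("restaurant", 377, 0.584, 0.347),
    DomainRow("train", 320, 0.631, 0.362),
    DomainRow("police", 23, 0.870, 0.130),
    DomainRow("hospital", 30, 0.933, 0.067),
    DomainRow("taxi", 88, 0.136, 0.830),
]


def estimated_successes(row: DomainRow) -> int:
    """Approximate count of successful dialogs (rates are rounded to 3 digits)."""
    return round(row.total * row.succ_rate)


# Rank domains from highest to lowest success rate.
ranked = sorted(ROWS, key=lambda r: r.succ_rate, reverse=True)
for row in ranked:
    print(f"{row.domain:<11} {row.succ_rate:.3f}  "
          f"~{estimated_successes(row)}/{row.total} dialogs, "
          f"loop failed rate {row.loop_failed_rate:.3f}")
```

Sorted this way, hospital comes first and taxi last; taxi also has the highest Dialog Loop Failed Rate (0.830) in the table.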

Dialogue Loop

Domain hotel

Overall Results

Success Rate: 37.7 %

(Precision, Recall, F1) : (0.277, 0.448, 0.317)

Average Dialog Turn (Succ): 9.857

Average Dialog Turn (All): 17.437

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain attraction

Overall Results

Success Rate: 61.4 %

(Precision, Recall, F1) : (0.623, 0.719, 0.642)

Average Dialog Turn (Succ): 8.651

Average Dialog Turn (All): 17.684

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain restaurant

Overall Results

Success Rate: 58.4 %

(Precision, Recall, F1) : (0.482, 0.559, 0.502)

Average Dialog Turn (Succ): 9.618

Average Dialog Turn (All): 17.416

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Domain train

Overall Results

Success Rate: 63.1 %

(Precision, Recall, F1) : (0.754, 0.670, 0.697)

Average Dialog Turn (Succ): 9.356

Average Dialog Turn (All): 18.812

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain police

Overall Results

Success Rate: 87.0 %

(Precision, Recall, F1) : (0.913, 0.913, 0.901)

Average Dialog Turn (Succ): 9.700

Average Dialog Turn (All): 13.652

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act | Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain hospital

Overall Results

Success Rate: 93.3 %

(Precision, Recall, F1) : (0.933, 0.933, 0.933)

Average Dialog Turn (Succ): 3.286

Average Dialog Turn (All): 5.733

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing

Domain taxi

Overall Results

Success Rate: 13.6 %

(Precision, Recall, F1) : (0.182, 0.159, 0.167)

Average Dialog Turn (Succ): 8.167

Average Dialog Turn (All): 26.500

System NLU Failed Dialog Act: User NLU Failed Dialog Act:

Nothing

Dialog Loop | Bad Inform Dialog Act

Nothing

Request But Not Inform Dialog Act | Inform But Not Request Dialog Act

Nothing