Run `python analyze.py` with one of the following arguments to analyze the image file `data/analyze.png` with the ground-truth text "are":
* `--relevance`: compute the pixel relevance for the correct prediction.
* `--invariance`: check if the model is invariant to horizontal translations of the text.
* No argument provided: show the results.
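For example, the two analysis modes and the result viewer are invoked like this (run from the directory that contains `analyze.py`):

```
python analyze.py --relevance    # compute and store the pixel relevance
python analyze.py --invariance   # check invariance to horizontal translations
python analyze.py                # show the results
```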
Results are shown in the plots below.
The pixel relevance (left) shows how a pixel influences the score for the correct class.
Red pixels vote for the correct class, while blue pixels vote against it.
It can be seen that the white space above vertical lines is important for the classifier to decide against the "i" character with its superscript dot.
Draw a dot above the "a" (red region in the plot) and you will get "aive" instead of "are".
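The exact relevance computation is not shown in this section. As a rough illustration only, an occlusion-style relevance map can be built by covering small patches with background color and measuring how the probability of the ground-truth text drops. The `predict_prob(img, gt_text)` callable below is a placeholder for whatever function returns that probability; it is not part of the code shown here, and the patch size and fill value are arbitrary choices.

```python
import numpy as np

def occlusion_relevance(img, gt_text, predict_prob, patch=8, step=4, fill=255):
    """Illustrative occlusion-style pixel relevance (not necessarily the
    repository's exact method): cover a small patch with background color
    and record how much the probability of the ground-truth text drops."""
    base = predict_prob(img, gt_text)            # probability on the unmodified image
    heat = np.zeros(img.shape[:2], dtype=np.float32)
    for y in range(0, img.shape[0], step):
        for x in range(0, img.shape[1], step):
            occluded = img.copy()
            occluded[y:y + patch, x:x + patch] = fill     # blank out the patch
            heat[y:y + patch, x:x + patch] = base - predict_prob(occluded, gt_text)
    return heat  # >0: patch supported the correct text, <0: it voted against it
```

Positive values in such a map correspond to the red regions in the plot, negative values to the blue ones.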
The second plot (right) shows how the probability of the ground-truth text changes when the text is shifted to the right.
As can be seen, the model is not translation invariant, because all training images from IAM are left-aligned.
Adding data augmentation that uses random text alignments can improve the translation invariance of the model.
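A translation check like the one described above can be sketched as follows: shift the text content to the right by a growing number of pixels and record the probability of the ground-truth text for each shift. Again, `predict_prob(img, gt_text)` is a placeholder for a function returning that probability, and the maximum shift and fill value are arbitrary.

```python
import numpy as np

def translation_check(img, gt_text, predict_prob, max_shift=32, fill=255):
    """Shift the text to the right by 0..max_shift pixels (padding the left
    border with background color) and record the probability of the
    ground-truth text. A flat curve would indicate translation invariance."""
    probs = []
    for shift in range(max_shift + 1):
        shifted = np.full_like(img, fill)                   # blank canvas in background color
        shifted[:, shift:] = img[:, :img.shape[1] - shift]  # paste content shifted to the right
        probs.append(predict_prob(shifted, gt_text))
    return probs
```

The random-alignment augmentation mentioned above amounts to applying such a shift with a random value to the training images, so the model also sees text that is not left-aligned.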