Commit 330c3dba authored by Harald Scheidl

cleanup: remove analyze code, and only keep code really relevant for HTR

parent a7e85ba7
@@ -41,7 +41,7 @@ If neither `--train` nor `--validate` is specified, the NN infers the text from
 ## Integrate word beam search decoding
-It is possible to use the word beam search decoder \[4\] instead of the two decoders shipped with TF.
+It is possible to use the [word beam search decoder](https://repositum.tuwien.ac.at/obvutwoa/download/pdf/2774578) instead of the two decoders shipped with TF.
 Words are constrained to those contained in a dictionary, but arbitrary non-word character strings (numbers, punctuation marks) can still be recognized.
 The following illustration shows a sample for which word beam search is able to recognize the correct text, while the other decoders fail.
@@ -61,7 +61,7 @@ Beam width is set to 50 to conform with the beam width of vanilla beam search de
 ## Train model with IAM dataset
-Follow these instructions to get the IAM dataset \[5\]:
+Follow these instructions to get the IAM dataset:
 * Register for free at this [website](http://www.fki.inf.unibe.ch/databases/iam-handwriting-database)
 * Download `words/words.tgz`
@@ -88,7 +88,7 @@ Using the `--fast` option and a GTX 1050 Ti training takes around 3h with a batc
 ## Information about model
 ### Overview
-The model \[1\] is a stripped-down version of the HTR system I implemented for my thesis \[2\]\[3\].
+The model is a stripped-down version of the HTR system I implemented for [my thesis]((https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742)).
 What remains is what I think is the bare minimum to recognize text with an acceptable accuracy.
 It consists of 5 CNN layers, 2 RNN (LSTM) layers and the CTC loss and decoding layer.
 The illustration below gives an overview of the NN (green: operations, pink: data flowing through NN) and here follows a short description:
@@ -102,33 +102,15 @@ The illustration below gives an overview of the NN (green: operations, pink: dat
 ![nn_overview](./doc/nn_overview.png)
-### Analyze model
-Run `python analyze.py` with the following arguments to analyze the image file `data/analyze.png` with the ground-truth text "are":
-* `--relevance`: compute the pixel relevance for the correct prediction
-* `--invariance`: check if the model is invariant to horizontal translations of the text
-* No argument provided: show the results
-Results are shown in the plots below.
-For more information see [this article](https://towardsdatascience.com/6c04864b8a98).
-![analyze](./doc/analyze.png)
 ## FAQ
 * I get the error message "... TFWordBeamSearch.so: cannot open shared object file: No such file or directory": if you want to use word beam search decoding, you have to compile the custom TF operation from source
 * Where can I find the file `words.txt` of the IAM dataset: it is located in the subfolder `ascii` on the IAM website
-* I want to recognize text of line (or sentence) images: this is not possible with the provided model. The size of the input image is too small. For more information read [this article](https://medium.com/@harald_scheidl/27648fb18519) or have a look at the [lamhoangtung/LineHTR](https://github.com/lamhoangtung/LineHTR) repository
+* I want to recognize the text contained in a text-line: the model is too small for this, you have to first segment the line into words, e.g. using the model from the [WordDetectorNN](https://github.com/githubharald/WordDetectorNN) repository
 * I get an error when running the script more than once from an interactive Python session: do **not** call function `main()` in file `main.py` from an interactive session, as the TF computation graph is created multiple times when calling `main()` multiple times. Run the script by executing `python main.py` instead
 ## References
-\[1\] [Build a Handwritten Text Recognition System using TensorFlow](https://towardsdatascience.com/2326a3487cd5)
-\[2\] [Scheidl - Handwritten Text Recognition in Historical Documents](https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742)
-\[3\] [Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf)
-\[4\] [Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm](https://repositum.tuwien.ac.at/obvutwoa/download/pdf/2774578)
-\[5\] [Marti - The IAM-database: an English sentence database for offline handwriting recognition](http://www.fki.inf.unibe.ch/databases/iam-handwriting-database)
+* [Build a Handwritten Text Recognition System using TensorFlow](https://towardsdatascience.com/2326a3487cd5)
+* [Scheidl - Handwritten Text Recognition in Historical Documents](https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742)
+* [Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm](https://repositum.tuwien.ac.at/obvutwoa/download/pdf/2774578)
data/analyze.png

4.27 KiB

doc/analyze.png

61.5 KiB

import copy
import math
import pickle
import sys
import cv2
import matplotlib.pyplot as plt
import numpy as np
from DataLoaderIAM import Batch
from Model import Model, DecoderType
from SamplePreprocessor import preprocess


# constants like filepaths
class Constants:
    "filenames and paths to data"
    fnCharList = '../model/charList.txt'
    fnAnalyze = '../data/analyze.png'
    fnPixelRelevance = '../data/pixelRelevance.npy'
    fnTranslationInvariance = '../data/translationInvariance.npy'
    fnTranslationInvarianceTexts = '../data/translationInvarianceTexts.pickle'
    gtText = 'are'
    distribution = 'histogram'  # 'histogram' or 'uniform'


def odds(val):
    return val / (1 - val)


def weightOfEvidence(origProb, margProb):
    return math.log2(odds(origProb)) - math.log2(odds(margProb))


def analyzePixelRelevance():
    "simplified implementation of paper: Zintgraf et al - Visualizing Deep Neural Network Decisions: Prediction Difference Analysis"

    # setup model
    model = Model(open(Constants.fnCharList).read(), DecoderType.BestPath, mustRestore=True)

    # read image and specify ground-truth text
    img = cv2.imread(Constants.fnAnalyze, cv2.IMREAD_GRAYSCALE)
    # note: img.shape is (rows, columns), so w holds the number of rows and h the number of columns
    (w, h) = img.shape
    assert Model.imgSize[1] == w

    # compute probability of gt text in original image
    batch = Batch([Constants.gtText], [preprocess(img, Model.imgSize)])
    (_, probs) = model.inferBatch(batch, calcProbability=True, probabilityOfGT=True)
    origProb = probs[0]

    # gray values to try for each pixel, and the probability assigned to each of them
    grayValues = [0, 63, 127, 191, 255]
    if Constants.distribution == 'histogram':
        # one histogram bin around each gray value
        bins = [0, 31, 95, 159, 223, 255]
        (hist, _) = np.histogram(img, bins=bins)
        pixelProb = hist / sum(hist)
    elif Constants.distribution == 'uniform':
        pixelProb = [1.0 / len(grayValues) for _ in grayValues]
    else:
        raise Exception('unknown value for Constants.distribution')

    # iterate over all pixels in image
    pixelRelevance = np.zeros(img.shape, np.float32)
    for x in range(w):
        for y in range(h):

            # try a subset of possible grayvalues of pixel (x,y)
            imgsMarginalized = []
            for g in grayValues:
                imgChanged = copy.deepcopy(img)
                imgChanged[x, y] = g
                imgsMarginalized.append(preprocess(imgChanged, Model.imgSize))

            # put them all into one batch
            batch = Batch([Constants.gtText] * len(imgsMarginalized), imgsMarginalized)

            # compute probabilities
            (_, probs) = model.inferBatch(batch, calcProbability=True, probabilityOfGT=True)

            # marginalize over pixel value (weighted by pixelProb: histogram or uniform)
            margProb = sum([probs[i] * pixelProb[i] for i in range(len(grayValues))])

            pixelRelevance[x, y] = weightOfEvidence(origProb, margProb)

            print(x, y, pixelRelevance[x, y], origProb, margProb)

    np.save(Constants.fnPixelRelevance, pixelRelevance)


def analyzeTranslationInvariance():
    # setup model
    model = Model(open(Constants.fnCharList).read(), DecoderType.BestPath, mustRestore=True)

    # read image and specify ground-truth text
    img = cv2.imread(Constants.fnAnalyze, cv2.IMREAD_GRAYSCALE)
    # note: img.shape is (rows, columns), so w holds the number of rows and h the number of columns
    (w, h) = img.shape
    assert Model.imgSize[1] == w

    # paste the image onto a white canvas at every possible horizontal offset
    imgList = []
    for dy in range(Model.imgSize[0] - h + 1):
        targetImg = np.ones((Model.imgSize[1], Model.imgSize[0])) * 255
        targetImg[:, dy:h + dy] = img
        imgList.append(preprocess(targetImg, Model.imgSize))

    # put images and gt texts into batch
    batch = Batch([Constants.gtText] * len(imgList), imgList)

    # compute probabilities
    (texts, probs) = model.inferBatch(batch, calcProbability=True, probabilityOfGT=True)

    # save results to file
    with open(Constants.fnTranslationInvarianceTexts, 'wb') as f:
        pickle.dump(texts, f)
    np.save(Constants.fnTranslationInvariance, probs)


def showResults():
    # 1. pixel relevance
    pixelRelevance = np.load(Constants.fnPixelRelevance)
    plt.figure('Pixel relevance')

    plt.imshow(pixelRelevance, cmap=plt.cm.jet, vmin=-0.25, vmax=0.25)
    plt.colorbar()

    # overlay the analyzed image on top of the relevance map
    img = cv2.imread(Constants.fnAnalyze, cv2.IMREAD_GRAYSCALE)
    plt.imshow(img, cmap=plt.cm.gray, alpha=.4)

    # 2. translation invariance
    probs = np.load(Constants.fnTranslationInvariance)
    with open(Constants.fnTranslationInvarianceTexts, 'rb') as f:
        texts = pickle.load(f)
    texts = ['%d:' % i + texts[i] for i in range(len(texts))]

    plt.figure('Translation invariance')

    plt.plot(probs, 'o-')
    plt.xticks(np.arange(len(texts)), texts, rotation='vertical')
    plt.xlabel('horizontal translation and best path')
    plt.ylabel('text probability of "%s"' % Constants.gtText)

    # show both plots
    plt.show()


if __name__ == '__main__':
    if len(sys.argv) > 1:
        if sys.argv[1] == '--relevance':
            print('Analyze pixel relevance')
            analyzePixelRelevance()
        elif sys.argv[1] == '--invariance':
            print('Analyze translation invariance')
            analyzeTranslationInvariance()
    else:
        print('Show results')
        showResults()
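
The relevance score that the removed script assigns to a pixel can be checked by hand on small numbers. The following is a minimal sketch, not code from the repository: it restates the `odds` and `weightOfEvidence` formulas from the listing under snake_case names, adds a hypothetical helper `marginal_probability` for the weighted sum, and uses made-up probabilities in place of real model output.

```python
import math


def odds(p):
    """Odds of an event with probability p (requires 0 < p < 1)."""
    return p / (1 - p)


def weight_of_evidence(orig_prob, marg_prob):
    """Log2 odds of the original prediction minus log2 odds of the marginalized one."""
    return math.log2(odds(orig_prob)) - math.log2(odds(marg_prob))


def marginal_probability(probs, pixel_prob):
    """Hypothetical helper: probability of the ground-truth text averaged over
    the gray values tried for one pixel, weighted by how likely each value is."""
    return sum(p * w for p, w in zip(probs, pixel_prob))


# Made-up numbers for illustration only (not model output).
orig_prob = 0.8                      # probability of the text in the unmodified image
probs = [0.8, 0.7, 0.5, 0.3, 0.2]    # probability after setting the pixel to each gray value
pixel_prob = [0.2] * 5               # uniform distribution over the five gray values

marg_prob = marginal_probability(probs, pixel_prob)
relevance = weight_of_evidence(orig_prob, marg_prob)
print(f"marginal probability: {marg_prob:.3f}, relevance: {relevance:.3f}")
```

With these numbers the marginal probability is about 0.5 and the relevance is about 2. A positive score means that the pixel's actual value supports the ground-truth text, a negative score means it speaks against it, and a score near zero means the pixel hardly matters.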