Commit a7e85ba7 authored by Harald Scheidl

single option for decoders

parent 00d7232d
@@ -6,7 +6,7 @@
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset.
This Neural Network (NN) model recognizes the text contained in the images of segmented words as shown in the illustration below.
-3/4 of the words from the validation-set are correctly recognized and the character error rate is around 11%.
+3/4 of the words from the validation-set are correctly recognized, and the character error rate is around 10%.
![htr](./doc/htr.png)
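
The character error rate quoted above is usually computed as the summed edit distance between recognized and ground-truth texts divided by the total number of ground-truth characters. A minimal sketch using the `editdistance` package that the FAQ below refers to (the helper name is illustrative, not part of the repository):

```python
import editdistance

def char_error_rate(gt_texts, recognized_texts):
    # summed edit distance over all samples, normalized by ground-truth length
    num_char_err = sum(editdistance.eval(gt, rec) for gt, rec in zip(gt_texts, recognized_texts))
    num_char_total = sum(len(gt) for gt in gt_texts)
    return num_char_err / num_char_total

# toy example: one wrong character out of ten gives a CER of 10%
print(char_error_rate(['Hello', 'world'], ['Hello', 'worlc']))  # 0.1
```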
@@ -21,34 +21,24 @@ The input image and the expected output are shown below.
```
> python main.py
-Init with stored values from ../model/snapshot-76
+Init with stored values from ../model/snapshot-39
Recognized: "Hello"
-Probability: 0.8462573289871216
+Probability: 0.42098119854927063
```
-Tested with:
-* Python 2 (commit <= 97c2512) and Python 3
-* TF 1.3, 1.10 and 1.12 (commit <= 97c2512)
-* TF 1.14, 1.15, 2.3.1, 2.4 (commit >= ec00c1a)
-* Ubuntu 16.04, 18.04, 20.04 and Windows 7, 10
## Command line arguments
* `--train`: train the NN on 95% of the dataset samples and validate on the remaining 5%
* `--validate`: validate the trained NN
-* `--beamsearch`: use vanilla beam search decoding (better, but slower) instead of best path decoding
-* `--wordbeamsearch`: use word beam search decoding (only outputs words contained in a dictionary) instead of best path decoding. This is a custom TF operation and must be compiled from source; for more information see the corresponding section below. It should **not** be used when training the NN
-* `--dump`: dumps the output of the NN to CSV file(s) saved in the `dump` folder. Can be used as input for the [CTCDecoder](https://github.com/githubharald/CTCDecoder)
+* `--decoder`: select one of the CTC decoders "bestpath", "beamsearch" and "wordbeamsearch". Defaults to "bestpath". For the "wordbeamsearch" option, see the details below
* `--batch_size`: batch size
-* `--fast`: use LMDB to load images (faster than loading image files from disk)
* `--data_dir`: directory containing IAM dataset (with subdirectories `img` and `gt`)
+* `--fast`: use LMDB to load images (faster than loading image files from disk)
+* `--dump`: dumps the output of the NN to CSV file(s) saved in the `dump` folder. Can be used as input for the [CTCDecoder](https://github.com/githubharald/CTCDecoder)
If neither `--train` nor `--validate` is specified, the NN infers the text from the test image (`data/test.png`).
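
With the single `--decoder` option introduced above, typical invocations would look as follows (the data directory path is a placeholder; the batch size of 500 is the one mentioned in the training section below):

```
> python main.py --decoder beamsearch
> python main.py --decoder wordbeamsearch
> python main.py --train --data_dir /path/to/iam --batch_size 500 --fast
```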
## Integrate word beam search decoding
It is possible to use the word beam search decoder \[4\] instead of the two decoders shipped with TF.
@@ -69,9 +59,7 @@ Further, the (manually created) list of word-characters can be found in the file
Beam width is set to 50 to conform with the beam width of vanilla beam search decoding.
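
For comparison, the two decoders shipped with TF (best path / greedy decoding and vanilla beam search) can be called directly on a CTC output matrix. A small sketch with illustrative shapes, not the repository's code, using the same beam width of 50:

```python
import tensorflow as tf

# CTC output matrix: time-major logits of shape (max_time, batch_size, num_classes),
# where the last class is the CTC blank label
logits = tf.random.normal([32, 1, 80])
seq_len = tf.fill([1], 32)

# best path (greedy) decoding
greedy_decoded, _ = tf.nn.ctc_greedy_decoder(logits, seq_len)

# vanilla beam search decoding with beam width 50
beam_decoded, _ = tf.nn.ctc_beam_search_decoder(logits, seq_len, beam_width=50)

print(tf.sparse.to_dense(greedy_decoded[0]).numpy())
print(tf.sparse.to_dense(beam_decoded[0]).numpy())
```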
-## Train model
+## Train model with IAM dataset
-### IAM dataset
Follow these instructions to get the IAM dataset \[5\]:
@@ -97,16 +85,9 @@ The database LMDB is used to speed up image loading:
Using the `--fast` option and a GTX 1050 Ti, training takes around 3 hours with a batch size of 500.
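
How the `--fast` path actually reads images is not shown in this diff; the sketch below only illustrates the general LMDB access pattern with the `lmdb` and `opencv-python` packages. The database path, the key scheme and the way images are serialized are assumptions, not necessarily what `DataLoaderIAM` does:

```python
import cv2
import lmdb
import numpy as np

# open the database read-only; one environment can be shared by all readers
env = lmdb.open('../lmdb', readonly=True)

def load_img(key: str) -> np.ndarray:
    """Fetch one grayscale image stored as an encoded PNG/JPEG buffer."""
    with env.begin() as txn:  # read transaction
        buf = txn.get(key.encode('utf-8'))
    return cv2.imdecode(np.frombuffer(buf, dtype=np.uint8), cv2.IMREAD_GRAYSCALE)
```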
-### Other datasets
-Either convert your dataset to the IAM format (look at `words.txt` and the corresponding directory structure) or change the class `DataLoaderIAM` according to your dataset format.
-More information can be found in [this article](https://medium.com/@harald_scheidl/27648fb18519).
## Information about model
### Overview
The model \[1\] is a stripped-down version of the HTR system I implemented for my thesis \[2\]\[3\].
What remains is what I think is the bare minimum to recognize text with an acceptable accuracy.
It consists of 5 CNN layers, 2 RNN (LSTM) layers and the CTC loss and decoding layer.
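
As a rough Keras sketch of such a 5 CNN + 2 LSTM + CTC pipeline (layer sizes, pooling steps and the 128x32 input are illustrative and not guaranteed to match the repository's exact configuration):

```python
import tensorflow as tf
from tensorflow.keras import layers

num_chars = 80  # character set size + 1 for the CTC blank (illustrative)
inputs = tf.keras.Input(shape=(128, 32, 1))  # grayscale word image

# 5 CNN layers: convolution + pooling, shrinking the height while keeping time steps
x = inputs
for filters, pool in [(32, (2, 2)), (64, (2, 2)), (128, (1, 2)), (128, (1, 2)), (256, (1, 2))]:
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(pool)(x)

# collapse the height dimension -> sequence of feature vectors along the image width
x = layers.Reshape((32, 256))(x)

# 2 bidirectional LSTM layers
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)

# per-time-step character scores; CTC loss and decoding are applied on top of these
outputs = layers.Dense(num_chars, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```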
@@ -122,7 +103,6 @@ The illustration below gives an overview of the NN (green: operations, pink: dat
### Analyze model
Run `python analyze.py` with the following arguments to analyze the image file `data/analyze.png` with the ground-truth text "are":
* `--relevance`: compute the pixel relevance for the correct prediction
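
One generic way to obtain such a pixel relevance map is occlusion analysis: cover one image patch at a time and measure how much the probability of the ground-truth text drops. Whether `analyze.py` uses exactly this scheme is not visible here, so the sketch below is only illustrative and `score_fn` is a hypothetical stand-in for the NN:

```python
import numpy as np

def occlusion_relevance(img: np.ndarray, score_fn, patch: int = 8) -> np.ndarray:
    """Relevance map: drop in the ground-truth score when each patch is covered."""
    base = score_fn(img)
    relevance = np.zeros_like(img, dtype=np.float32)
    for y in range(0, img.shape[0], patch):
        for x in range(0, img.shape[1], patch):
            occluded = img.copy()
            occluded[y:y + patch, x:x + patch] = img.mean()  # cover one patch
            relevance[y:y + patch, x:x + patch] = base - score_fn(occluded)
    return relevance
```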
@@ -136,20 +116,13 @@ For more information see [this article](https://towardsdatascience.com/6c04864b8
## FAQ
-1. I get the error message "Exception: No saved model found in: ... ": unzip the file `model.zip`. All files contained must be placed directly into the `model` directory and **not** in some subdirectory created by the unzip program.
-2. I get the error message "... TFWordBeamSearch.so: cannot open shared object file: No such file or directory": if you want to use word beam search decoding, you have to compile the custom TF operation from source.
-3. I get the error message "... ModuleNotFoundError: No module named 'editdistance'": you have to install the mentioned module by executing `pip install editdistance`.
-4. Where can I find the file `words.txt` of the IAM dataset: it is located in the subfolder `ascii` of the IAM website.
-5. I want to recognize text of line (or sentence) images: this is not possible with the provided model. The size of the input image is too small. For more information read [this article](https://medium.com/@harald_scheidl/27648fb18519) or have a look at the [lamhoangtung/LineHTR](https://github.com/lamhoangtung/LineHTR) repository.
-6. I need a confidence score for the recognized text: after recognizing the text, you can calculate the loss value for the NN output and the recognized text. The loss is simply the negative logarithm of the score. See [this article](https://medium.com/@harald_scheidl/27648fb18519).
-7. I use a custom image of handwritten text, but the NN outputs a wrong result: the NN is trained on the IAM dataset. The NN not only learns to recognize text, but it also learns properties of the dataset-images. Some obvious properties of the IAM dataset are: text is tightly cropped, contrast is very high, most of the characters are lower-case. Either you preprocess your image to look like an IAM image, or you train the NN on your own dataset. See [this article](https://medium.com/@harald_scheidl/27648fb18519).
-8. I get an error when running the script more than once from an interactive Python session: do **not** call function `main()` in file `main.py` from an interactive session, as the TF computation graph is created multiple times when calling `main()` multiple times. Run the script by executing `python main.py` instead.
-9. How to get support for this repository: I do not provide any support for this repository (also not via mail).
+* I get the error message "... TFWordBeamSearch.so: cannot open shared object file: No such file or directory": if you want to use word beam search decoding, you have to compile the custom TF operation from source
+* Where can I find the file `words.txt` of the IAM dataset: it is located in the subfolder `ascii` on the IAM website
+* I want to recognize text of line (or sentence) images: this is not possible with the provided model. The size of the input image is too small. For more information read [this article](https://medium.com/@harald_scheidl/27648fb18519) or have a look at the [lamhoangtung/LineHTR](https://github.com/lamhoangtung/LineHTR) repository
+* I get an error when running the script more than once from an interactive Python session: do **not** call function `main()` in file `main.py` from an interactive session, as the TF computation graph is created multiple times when calling `main()` multiple times. Run the script by executing `python main.py` instead
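
The confidence-score item above states that the loss is the negative logarithm of the score, so the probability printed by `main.py` can be recovered from the CTC loss of the recognized text. A tiny illustration (the loss value is made up to roughly match the sample run above):

```python
import math

ctc_loss = 0.865                   # CTC loss of the recognized text (illustrative value)
probability = math.exp(-ctc_loss)  # loss = -log(p)  =>  p = exp(-loss)
print(probability)                 # ~0.42, the kind of score printed by main.py
```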
## References
\[1\] [Build a Handwritten Text Recognition System using TensorFlow](https://towardsdatascience.com/2326a3487cd5)
\[2\] [Scheidl - Handwritten Text Recognition in Historical Documents](https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742)
@@ -110,24 +110,23 @@ def infer(model, fnImg):
def main():
    "main function"
-    # optional command line args
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', help='train the NN', action='store_true')
    parser.add_argument('--validate', help='validate the NN', action='store_true')
-    parser.add_argument('--beamsearch', help='use beam search instead of best path decoding', action='store_true')
-    parser.add_argument('--wordbeamsearch', help='use word beam search instead of best path decoding',
-                        action='store_true')
-    parser.add_argument('--dump', help='dump output of NN to CSV file(s)', action='store_true')
-    parser.add_argument('--fast', help='use lmdb to load images', action='store_true')
-    parser.add_argument('--data_dir', help='directory containing IAM dataset', type=Path, required=False)
+    parser.add_argument('--decoder', choices=['bestpath', 'beamsearch', 'wordbeamsearch'], default='bestpath',
+                        help='CTC decoder')
    parser.add_argument('--batch_size', help='batch size', type=int, default=100)
+    parser.add_argument('--data_dir', help='directory containing IAM dataset', type=Path, required=False)
+    parser.add_argument('--fast', help='use lmdb to load images', action='store_true')
+    parser.add_argument('--dump', help='dump output of NN to CSV file(s)', action='store_true')
    args = parser.parse_args()
-    decoderType = DecoderType.BestPath
-    if args.beamsearch:
-        decoderType = DecoderType.BeamSearch
-    elif args.wordbeamsearch:
-        decoderType = DecoderType.WordBeamSearch
+    # set chosen CTC decoder
+    if args.decoder == 'bestpath':
+        decoderType = DecoderType.BestPath
+    elif args.decoder == 'beamsearch':
+        decoderType = DecoderType.BeamSearch
+    elif args.decoder == 'wordbeamsearch':
+        decoderType = DecoderType.WordBeamSearch

    # train or validate on IAM dataset
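
The new if/elif chain could also be written as a dictionary lookup keyed by the argparse choices. A small alternative sketch, not the committed code, assuming the same `DecoderType` enum already imported in `main.py`:

```python
# map the value of --decoder directly to the corresponding DecoderType member
decoder_mapping = {'bestpath': DecoderType.BestPath,
                   'beamsearch': DecoderType.BeamSearch,
                   'wordbeamsearch': DecoderType.WordBeamSearch}
decoderType = decoder_mapping[args.decoder]
```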