Document some more config options for tesseract

Clarify also the name(s) of the generated OCR result file(s):
Tesseract does not create a file named outbase.txt by default.

Fix also a sentence in the language section.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
jiapei100 · Oct 5, 2018 · 383dcf7
1 parent e03ee93
Showing 1 changed file with 17 additions and 4 deletions.
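
The output naming rules documented by this change can be illustrated with a small sketch. It is not part of the commit, and the helper names `build_command` and `expected_outputs` are hypothetical. It models only the four config files for which the diff states a file extension (`hocr`, `pdf`, `tsv`, `txt`) and assumes that any other config file leaves the default `.txt` output in place, which the man page text does not confirm. It also places `-l` and `--psm` before the config files, as the Nota Bene in the man page requires.

```python
# Extensions as listed in the man page diff; config files without a stated
# extension (get.images, logfile, lstm.train, makebox, quiet) are not modeled.
OUTPUT_EXTENSIONS = {"hocr": ".hocr", "pdf": ".pdf", "tsv": ".tsv", "txt": ".txt"}


def build_command(image, outputbase, configfiles=(), lang=None, psm=None):
    """Return a tesseract argument list with options placed before config files."""
    cmd = ["tesseract", image, outputbase]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    # Config files come last, after -l and --psm.
    cmd += list(configfiles)
    return cmd


def expected_outputs(outputbase, configfiles=()):
    """Predict result file names for the config files that select an output format."""
    names = [
        outputbase + OUTPUT_EXTENSIONS[name]
        for name in configfiles
        if name in OUTPUT_EXTENSIONS
    ]
    # Assumption: without an output-selecting config file, the default is plain text.
    return names or [outputbase + ".txt"]


if __name__ == "__main__":
    configs = ["hocr", "pdf", "txt"]
    print(" ".join(build_command("image.png", "demo", configs)))
    print(expected_outputs("demo", configs))
```

The list returned by `build_command` could be passed to `subprocess.run` on a machine where Tesseract is installed; the sketch itself only constructs the arguments and does not run the program.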
diff --git a/doc/tesseract.1.asc b/doc/tesseract.1.asc
@@ -34,7 +34,9 @@ IN/OUT ARGUMENTS
 
 'outputbase'::
 	The basename of the output file (to which the appropriate extension
-	will be appended).  By default the output will be named 'outbase.txt'.
+	will be appended).  By default the output will be a text file
+	with `.txt` added to the basename unless there are one or more
+	'configfile' options which explicitly specify the desired output.
 
 'stdout'::
 	Instruction to send output data to standard output
@@ -88,8 +90,19 @@ OPTIONS
 	contains a list of variables and their values, one per line, with a
 	space separating variable from value.  Interesting config files
 	include: +
-  * hocr - Output in hOCR format instead of as a text file.
-  * pdf  - Output in pdf instead of a text file.
+  * `hocr` - Output in hOCR format (file extension `.hocr`).
+  * `pdf` - Output PDF (file extension `.pdf`).
+  * `tsv` - Output TSV (file extension `.tsv`).
+  * `txt` - Output plain text (file extension `.txt`).
+  * `get.images` - Write images.
+  * `logfile` - Write debug file `tesseract.log`.
+  * `lstm.train` - Used for LSTM training.
+  * `makebox` - Output box file.
+  * `quiet` - Write debug file to /dev/null.
+
+It is possible to select several config files, for example
+`tesseract image.png demo hocr pdf txt` will create three output files
+`demo.hocr`, `demo.pdf` and `demo.txt` with the OCR results.
 
 *Nota Bene:*   The options `-l lang` and `--psm N` must occur
 before any 'configfile'.
@@ -122,7 +135,7 @@ LANGUAGES
 
 The currently available traineddata files for tesseract 4.0
 for the following languages are in
-(in https://github.com/tesseract-ocr/tessdata_fast):
+https://github.com/tesseract-ocr/tessdata_fast:
 
 *afr* (Afrikaans),
 *amh* (Amharic),