
Is there a rectification method that can process arbitrary-length samples in a batch, without a fixed input size (e.g. 32x100)? #102

Closed
liangzimei opened this issue Oct 17, 2019 · 6 comments

Comments

@liangzimei

Thanks for your great work; it does work.
However, in practice, to handle very long text at inference time (the training set has no such long samples), we often train a model that keeps the aspect ratio and pads, rather than using a fixed input size (e.g. 32x100).
Without the rectification module this works fine, but once the rectification module is added, keeping the ratio and padding becomes difficult.
Is there a rectification method that can process arbitrary-length samples in a batch without a fixed input size (e.g. 32x100)?
Thanks in advance.
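
A minimal sketch of the keep-ratio-and-pad batching described above, assuming PyTorch, torchvision, and PIL (the fixed height of 32 and the zero padding value are illustrative assumptions, not MORAN's code):

```python
import torch
from PIL import Image
from torchvision import transforms

def keep_ratio_pad_batch(pil_images, height=32, pad_value=0.0):
    """Resize each image to a fixed height keeping its aspect ratio,
    then right-pad every tensor to the widest image in the batch."""
    to_tensor = transforms.ToTensor()
    tensors = []
    for img in pil_images:
        w = max(1, int(img.size[0] * height / img.size[1]))  # PIL size is (W, H)
        tensors.append(to_tensor(img.convert('L').resize((w, height), Image.BILINEAR)))
    max_w = max(t.size(2) for t in tensors)
    batch = torch.full((len(tensors), 1, height, max_w), pad_value)
    for i, t in enumerate(tensors):
        batch[i, :, :, :t.size(2)] = t  # copy image, leave right side as padding
    return batch
```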

@Canjie-Luo
Owner

Thanks for your support!

Actually, I proposed MORAN to address small-range deformation of text. Since your irregular text is very long, I am afraid that text bent into a semicircle is too difficult to rectify; you may need a curved-text detector.

The output size of MORAN's rectification network is not fixed (unlike ASTER, which fixes the number of control points). Theoretically, the rectification network of MORAN can be trained with a fixed input and still generalize well to text of variable length.

For long text, a CRNN-based recognizer trained with CTC loss usually performs better. (Several papers report that the attention mechanism performs well only on short text.)
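
To make the CTC point concrete, here is a minimal PyTorch sketch using the built-in nn.CTCLoss; the shapes, class count, and random tensors are assumptions for illustration, not MORAN's code. Because CTC marginalizes over alignments rather than attending per character, output length is not tied to a fixed decoder horizon:

```python
import torch
import torch.nn as nn

# toy shapes: T encoder time steps, N batch size, C classes (index 0 = CTC blank)
T, N, C = 50, 4, 37
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)  # (T, N, C)
targets = torch.randint(1, C, (N, 20), dtype=torch.long)             # padded labels
input_lengths = torch.full((N,), T, dtype=torch.long)                # frames per sample
target_lengths = torch.randint(5, 21, (N,), dtype=torch.long)        # true label lengths

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```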

@liangzimei
Author

Thanks for your reply. I will try a curved-text detector and use its outline to rectify.

@liangzimei
Author

@Canjie-Luo Hello, sorry to bother you. Could you give some links to the papers saying that the attention mechanism performs well only on short text? Thanks.

@Canjie-Luo
Owner

[ICDAR 2019] A Comparative Study of Attention-based Encoder-Decoder Approaches to Natural Scene Text Recognition.pdf

@jake221

jake221 commented Nov 28, 2019

> Theoretically, the rectification network of MORAN can be trained with a fixed input and still generalize well to text of variable length.

Thanks for your great work and for sharing it. I am trying to use your code to recognize images of variable width, such as this picture of size 32×487:
[image: a 32×487 sample]

I modified demo.py (with your pretrained model) as follows:

```python
cuda_flag = torch.cuda.is_available()  # set the flag in both branches
if cuda_flag:
    # target width raised from the default 100 to 800 for wide images
    MORAN = MORAN(1, len(alphabet.split(':')), 256, 32, 800, BidirDecoder=True, CUDA=cuda_flag)
    MORAN = MORAN.cuda()
else:
    MORAN = MORAN(1, len(alphabet.split(':')), 256, 32, 800, BidirDecoder=True,
                  inputDataType='torch.FloatTensor', CUDA=cuda_flag)
```

```python
# resize the image keeping its aspect ratio: fixed height 32, proportional width
image = Image.open(img_path).convert('L')
scale = image.size[1] * 1.0 / 32
w = int(image.size[0] / scale)

converter = utils.strLabelConverterForAttention(alphabet, ':')
transformer = dataset.resizeNormalize((w, 32))
image = transformer(image)

if cuda_flag:
    image = image.cuda()
image = image.view(1, *image.size())
image = Variable(image)

# decoder buffers; utils.loadData below fills them with the encoded data
text = torch.LongTensor(1 * 50)
length = torch.IntTensor(1 * 5)
text = Variable(text)
length = Variable(length)

max_iter = 100
t, l = converter.encode('0' * max_iter)
utils.loadData(text, t)
utils.loadData(length, l)
output = MORAN(image, length, text, text, test=True, debug=True)
```

However, I still get bad output, such as:

Left to Right: ronaltherlyth

Could you tell me where I went wrong?

@Canjie-Luo
Copy link
Owner

Can you give the rectified image?
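
In case it helps with inspecting it: a generic sketch for turning a normalized grayscale tensor back into a viewable PIL image, assuming resizeNormalize's usual (x - 0.5) / 0.5 normalization; the `rectified` variable is hypothetical, since the debug output format isn't shown in this thread:

```python
import torch
from PIL import Image

def tensor_to_pil(t):
    """Undo (x - 0.5) / 0.5 normalization and return a grayscale PIL image."""
    t = t.detach().cpu().squeeze()                   # drop batch/channel dims -> (H, W)
    t = ((t * 0.5 + 0.5).clamp(0, 1) * 255).byte()   # back to uint8 pixel range
    return Image.fromarray(t.numpy(), mode='L')

# hypothetical usage, assuming `rectified` holds the rectifier's output tensor:
# tensor_to_pil(rectified).save('rectified.png')
```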
