
Is there a rectification method that can process arbitrary-length samples in a batch, without a fixed input size (e.g. 32x100)? #102

Closed
liangzimei opened this issue Oct 17, 2019 · 6 comments

Comments

@liangzimei

Thanks for your great work; it does work.
However, in practice, to handle very long text at inference time (the training set has no such long samples), we often train a model that keeps the aspect ratio and pads, rather than using a fixed input size (e.g. 32x100).
Without the rectification module this works fine, but once the rectification module is added, keeping the ratio and padding becomes difficult.
Is there a rectification method that can process arbitrary-length samples in a batch without a fixed input size (e.g. 32x100)?
Thanks in advance.
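
A minimal sketch of the keep-ratio-and-pad batching described above, assuming PyTorch, torchvision, and PIL (the fixed height of 32 and the zero padding value are illustrative assumptions, not MORAN's code):

```python
import torch
from PIL import Image
from torchvision import transforms

def keep_ratio_pad_batch(pil_images, height=32, pad_value=0.0):
    """Resize each image to a fixed height keeping its aspect ratio,
    then right-pad every tensor to the widest image in the batch."""
    to_tensor = transforms.ToTensor()
    tensors = []
    for img in pil_images:
        w = max(1, int(img.size[0] * height / img.size[1]))  # PIL size is (W, H)
        tensors.append(to_tensor(img.convert('L').resize((w, height), Image.BILINEAR)))
    max_w = max(t.size(2) for t in tensors)
    batch = torch.full((len(tensors), 1, height, max_w), pad_value)
    for i, t in enumerate(tensors):
        batch[i, :, :, :t.size(2)] = t  # copy image, leave right side as padding
    return batch
```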

@Canjie-Luo
Owner

Thanks for your support!

Actually, I proposed MORAN to address small-range deformation of text. Since your irregular text is very long, I am afraid that text bent into a semicircle is too difficult to rectify; you may need a curved-text detector.

The output size of MORAN's rectification network is not fixed (unlike ASTER, which fixes the number of control points). Theoretically, the rectification network of MORAN can be trained with a fixed input and still generalize well to text of variable length.

For long text, a CRNN-based recognizer trained with CTC loss usually performs better. (Several papers report that the attention mechanism performs well only on short text.)
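
To make the CTC point concrete, here is a minimal PyTorch sketch using the built-in nn.CTCLoss; the shapes, class count, and random tensors are assumptions for illustration, not MORAN's code. Because CTC marginalizes over alignments rather than attending per character, output length is not tied to a fixed decoder horizon:

```python
import torch
import torch.nn as nn

# toy shapes: T encoder time steps, N batch size, C classes (index 0 = CTC blank)
T, N, C = 50, 4, 37
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)  # (T, N, C)
targets = torch.randint(1, C, (N, 20), dtype=torch.long)             # padded labels
input_lengths = torch.full((N,), T, dtype=torch.long)                # frames per sample
target_lengths = torch.randint(5, 21, (N,), dtype=torch.long)        # true label lengths

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```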

@liangzimei
Author

Thanks for your reply. I will try a curved-text detector and use its outline to rectify.

@liangzimei
Author

@Canjie-Luo Hello, sorry to bother you. Could you give some links to the papers saying that the attention mechanism performs well only on short text? Thanks.

@Canjie-Luo
Owner

[ICDAR 2019] A Comparative Study of Attention-based Encoder-Decoder Approaches to Natural Scene Text Recognition.pdf

@jake221

jake221 commented Nov 28, 2019

> Theoretically, the rectification network of MORAN can be trained with a fixed input and still generalize well to text of variable length.

Thanks for your great work and for sharing it. I am trying to use your code to recognize images of variable width, such as this picture of size 32×487:
[image: a 32×487 sample]

I modified demo.py (with your pretrained model) as follows:

```python
cuda_flag = torch.cuda.is_available()  # set the flag in both branches
if cuda_flag:
    # target width raised from the default 100 to 800 for wide images
    MORAN = MORAN(1, len(alphabet.split(':')), 256, 32, 800, BidirDecoder=True, CUDA=cuda_flag)
    MORAN = MORAN.cuda()
else:
    MORAN = MORAN(1, len(alphabet.split(':')), 256, 32, 800, BidirDecoder=True,
                  inputDataType='torch.FloatTensor', CUDA=cuda_flag)
```

```python
# resize the image keeping its aspect ratio: fixed height 32, proportional width
image = Image.open(img_path).convert('L')
scale = image.size[1] * 1.0 / 32
w = int(image.size[0] / scale)

converter = utils.strLabelConverterForAttention(alphabet, ':')
transformer = dataset.resizeNormalize((w, 32))
image = transformer(image)

if cuda_flag:
    image = image.cuda()
image = image.view(1, *image.size())
image = Variable(image)

# decoder buffers; utils.loadData below fills them with the encoded data
text = torch.LongTensor(1 * 50)
length = torch.IntTensor(1 * 5)
text = Variable(text)
length = Variable(length)

max_iter = 100
t, l = converter.encode('0' * max_iter)
utils.loadData(text, t)
utils.loadData(length, l)
output = MORAN(image, length, text, text, test=True, debug=True)
```

However, I still get bad output, such as:

Left to Right: ronaltherlyth

Could you tell me where I went wrong?

@Canjie-Luo
Copy link
Owner

Can you give the rectified image?
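
In case it helps with inspecting it: a generic sketch for turning a normalized grayscale tensor back into a viewable PIL image, assuming resizeNormalize's usual (x - 0.5) / 0.5 normalization; the `rectified` variable is hypothetical, since the debug output format isn't shown in this thread:

```python
import torch
from PIL import Image

def tensor_to_pil(t):
    """Undo (x - 0.5) / 0.5 normalization and return a grayscale PIL image."""
    t = t.detach().cpu().squeeze()                   # drop batch/channel dims -> (H, W)
    t = ((t * 0.5 + 0.5).clamp(0, 1) * 255).byte()   # back to uint8 pixel range
    return Image.fromarray(t.numpy(), mode='L')

# hypothetical usage, assuming `rectified` holds the rectifier's output tensor:
# tensor_to_pil(rectified).save('rectified.png')
```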
