This is the model pipeline of our self-designed application, "Mlator", which aims to help manga fans overcome the language barrier and to help publishers lower the cost of translation.
This example image is from <<Q.E.D.iff-proven end-11>>, Episode 1, © Motohiro Katou.
First, we train an object detection model to locate the text inside speech bubbles. We are grateful to the Manga109 project for providing a large amount of high-quality annotated data. As the following image shows, the detected regions are marked with orange bounding boxes, and the content of each box is passed to the next step.
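Whatever detector is used, its raw output typically needs post-processing before the boxes reach OCR. As a minimal sketch (the thresholds and box format here are illustrative assumptions, not the app's actual values), confident detections can be kept while near-duplicate boxes are suppressed with greedy non-maximum suppression:

```python
# Hypothetical post-processing of the text detector's raw output.
# Boxes are (x1, y1, x2, y2) tuples; scores are detector confidences.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_detections(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Keep confident boxes, suppressing near-duplicates (greedy NMS)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if scores[i] < score_thresh:
            continue
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]
```

For example, two heavily overlapping detections of the same bubble collapse to the higher-scoring one, while a distant bubble is kept separately.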
Next, we use a state-of-the-art OCR engine to convert the image segments identified in step 1 into machine-readable text. In addition, a few tricks are needed to help the engine handle vertically oriented Japanese text and stylized comic fonts.
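One such trick is restoring the reading order: vertical Japanese runs top-to-bottom within a column, with columns read right-to-left. A minimal sketch of that reordering, assuming the OCR engine returns per-character boxes as (x, y, w, h) tuples (the box format and tolerance are illustrative assumptions):

```python
def vertical_reading_order(char_boxes, col_tolerance=10):
    """Sort character boxes (x, y, w, h) into Japanese vertical reading
    order: columns right-to-left, then top-to-bottom within a column."""
    columns = []  # list of (column_centre_x, [boxes in that column])
    # Visit boxes right-to-left so columns are created in reading order.
    for box in sorted(char_boxes, key=lambda b: -(b[0] + b[2] / 2)):
        cx = box[0] + box[2] / 2
        for col in columns:
            if abs(col[0] - cx) <= col_tolerance:
                col[1].append(box)
                break
        else:
            columns.append((cx, [box]))
    # Within each column, read top-to-bottom.
    ordered = []
    for _, col_boxes in columns:
        ordered.extend(sorted(col_boxes, key=lambda b: b[1]))
    return ordered
```

A production system would likely get column grouping from the OCR engine itself; this sketch only shows the geometry involved.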
All the extracted Japanese text is then translated into English. This is a crucial stage of the pipeline, since translation quality determines whether readers can actually enjoy the result.
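Since manga pages often repeat short phrases, one simple way to cut translation cost is to deduplicate lines before sending them to the machine-translation backend. The sketch below uses a stub translator with a tiny hypothetical glossary in place of a real MT model or API call:

```python
def translate_page(lines, translate_batch):
    """Translate all text on a page, deduplicating repeated lines so
    each unique string is sent to the (possibly paid) MT backend once."""
    unique = list(dict.fromkeys(lines))  # preserves first-seen order
    translations = dict(zip(unique, translate_batch(unique)))
    return [translations[line] for line in lines]

def fake_mt(batch):
    """Stub standing in for a real MT model or translation API."""
    glossary = {"こんにちは": "Hello", "ありがとう": "Thank you"}
    return [glossary.get(text, text) for text in batch]
```

For a page containing ["こんにちは", "ありがとう", "こんにちは"], the backend is called with only two unique strings, yet all three lines get translated.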
If we simply used the bounding boxes as the background for the translated text, some boxes would leak beyond the bounds of the bubble, making the page uncomfortable to read. Ideally, the bubble itself serves as the background, which is why we need to remove the original text first.
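A real pipeline might use a proper inpainting method here; as a simplified sketch of the idea (the flat-fill approach and grayscale representation are assumptions for illustration), text pixels inside the bubble can be replaced with the bubble's dominant background shade:

```python
from statistics import median

def erase_text(pixels, mask):
    """Replace masked (text) pixels with the median of the unmasked
    pixels, i.e. the bubble's background shade. `pixels` is a 2-D list
    of grayscale values; `mask` marks text pixels with True."""
    background = [p for row, mrow in zip(pixels, mask)
                  for p, m in zip(row, mrow) if not m]
    fill = median(background)
    return [[fill if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(pixels, mask)]
```

Because speech bubbles are usually a flat white, a flat fill is often close enough; textured backgrounds would need real inpainting.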
Finally, the English text is broken into lines of appropriate length and resized to fit comfortably inside its speech bubble. At this point, the comic is translated and ready to read!
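The wrapping-and-resizing step can be sketched as a greedy word wrap plus a loop that shrinks the font until the text fits the bubble. The character width and line height below are hypothetical fixed-width metrics; a real renderer would measure actual glyphs:

```python
def wrap(words, chars_per_line):
    """Greedy word wrap: pack words into lines up to chars_per_line."""
    lines, current = [], ""
    for word in words:
        candidate = (current + " " + word).strip()
        if len(candidate) <= chars_per_line or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

def fit_text(text, bubble_w, bubble_h, char_w=10, line_h=14, min_scale=0.5):
    """Shrink the font scale in steps until the wrapped text fits the
    bubble, never going below min_scale."""
    scale = 1.0
    while scale >= min_scale:
        per_line = max(1, int(bubble_w / (char_w * scale)))
        lines = wrap(text.split(), per_line)
        if len(lines) * line_h * scale <= bubble_h:
            return lines, scale
        scale -= 0.1
    per_line = max(1, int(bubble_w / (char_w * min_scale)))
    return wrap(text.split(), per_line), min_scale
```

Hyphenation and centring are left out for brevity, but the same fit loop extends naturally to them.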