Question about case sensitivity #84

Closed
jiaying96 opened this issue Jul 11, 2019 · 2 comments

jiaying96 commented Jul 11, 2019

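The training and test sets are clearly case-sensitive, so why does the code convert everything to lowercase? Was the test-set lmdb preprocessed so that the labels of the test sets (e.g. CUTE80) are all lowercase, even though the ground-truth labels contain uppercase letters?

In dataset.py, label = str(txn.get(label_key.encode()).decode('utf-8')) returns labels with both uppercase and lowercase letters for the training set, but only lowercase labels for the test set, although the original labels contain both. At what point are they converted to lowercase, and why is this done?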

@Canjie-Luo
Owner

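Hello. This setting simply follows previous work: earlier papers evaluated without distinguishing case, so for a fair comparison we do not distinguish it either. In addition, ICDAR officially provides a case-insensitive ranking, which is also an important metric. That is why this repo does not distinguish case. If you want case-sensitive behavior, you can modify the code slightly; the original dataset files are all available, so it is not hard to do.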
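
The difference between the two evaluation protocols mentioned above can be sketched as follows. This is a minimal illustration, not code from this repo; the function name word_accuracy and the sample strings are hypothetical.

```python
def word_accuracy(predictions, labels, case_sensitive=False):
    """Fraction of predictions that exactly match their labels.

    Hypothetical helper for illustration only (not from the repo).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for pred, gt in zip(predictions, labels):
        if not case_sensitive:
            # Case-insensitive protocol: fold both strings before comparing.
            pred, gt = pred.lower(), gt.lower()
        correct += pred == gt
    return correct / len(labels)


# Made-up sample data, chosen so the two protocols disagree.
preds = ["Hello", "world", "OCR"]
gts = ["hello", "World", "OCR"]

print(word_accuracy(preds, gts))                       # case-insensitive score
print(word_accuracy(preds, gts, case_sensitive=True))  # case-sensitive score
```

On this sample, the case-insensitive score counts all three words as correct, while the case-sensitive score counts only the last one, which is why results from the two protocols are not directly comparable.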

@wuxiaolianggit

Regarding case insensitivity: would changing chara = text_tmp[i][j].lower() to chara = text_tmp[i][j] solve it? @Canjie-Luo
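
To illustrate what that one-line change affects, here is a hedged sketch of a label encoder with a switch for lowercasing. The names encode_label, alphabet, and keep_case are hypothetical and not taken from this repo. It assumes that dropping .lower() only helps if the character set the model uses also contains uppercase letters; whether that holds for this repo is something to check in its own alphabet definition.

```python
import string


def encode_label(text, alphabet, keep_case=False):
    """Map each character to its 1-based index in `alphabet`.

    Hypothetical helper for illustration; index 0 is left unused here,
    as many text-recognition encoders reserve it for a special symbol.
    """
    char_to_index = {ch: i + 1 for i, ch in enumerate(alphabet)}
    indices = []
    for ch in text:
        # Mirrors the change asked about: keep the character as-is,
        # or fold it to lowercase before the lookup.
        chara = ch if keep_case else ch.lower()
        if chara not in char_to_index:
            raise KeyError(f"character {chara!r} is not in the alphabet")
        indices.append(char_to_index[chara])
    return indices


# Two assumed character sets: lowercase-only and mixed-case.
lower_only = string.digits + string.ascii_lowercase
mixed_case = string.digits + string.ascii_lowercase + string.ascii_uppercase

print(encode_label("Ab", lower_only))                  # lowercased before lookup
print(encode_label("Ab", mixed_case, keep_case=True))  # 'A' and 'b' get distinct indices
```

With a lowercase-only alphabet, calling encode_label("Ab", lower_only, keep_case=True) raises KeyError, which shows why the alphabet would have to be extended together with removing .lower().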
