[Champion] ICDAR 2019 Arbitrary-Shaped Text Recognition Competition: First-Place Solution (with Source Code)

Author: Jyouhou
GitHub project: ICDAR2019-ArT-Recognition-Alchemy
Paper link: https://arxiv.org/abs/1908.11834


This article is a detailed walkthrough of the winning solution from PKU Team Zero, the first-place team in the ICDAR 2019 arbitrary-shaped text recognition competition.


ICDAR2019-ArT-Recognition: Code for PKU Team Zero

Introduction

This is the code repository for our algorithm that ranked No. 1 in the ICDAR 2019 Robust Reading Challenge on Arbitrary-Shaped Text (Latin scripts). Our team name is PKU Team Zero.

For more technical details, please refer to our paper: Alchemy: Techniques for Rectification Based Irregular Scene Text Recognition. We hope our efforts will act as a stepping stone to better recognition algorithms.


Team

  • Shangbang Long (龙上邦), master student at CMU MLD.
  • Yushuo Guan (关玉烁), master student at Peking University, EECS.
  • Bingxuan Wang (王炳宣), junior undergraduate student at Peking University, Yuanpei College.
  • Kaigui Bian (边凯归), associate professor at Peking University, EECS.
  • Cong Yao (姚聪), algorithm team leader at MEGVII (Face++) Inc.

Competition Ranking

For the full list, click here.

| Method | Result | Total words | Correct words |
| --- | --- | --- | --- |
| PKU Team Zero (ours) | 74.30% | 35284 | 26216 |
| CUTeOCR | 73.91% | 35284 | 26078 |
| CRAFT (Preprocessing) + TPS-ResNet | 73.87% | 35284 | 26063 |
| serial_rec | 72.89% | 35284 | 25717 |

Experiment Replication

Environment

Find the enclosed environment file, and use the following command to install:

conda env create -f environment.yml

You will need Anaconda to do so.

Data

In this section, we introduce how to prepare data for experiments. For our datasets, please refer to the Pretrained Models and Data section below.

All datasets should be placed under the dataset folder. The datasets should be arranged as follows:

- File Tree

|-dataset
|    |-dataset_name
|         | Label.json
|         |-IMG
|                1.jpg
|                2.jpg
|                ...

- Label.json File

The structure of Label.json should be:

[
    {
        "img": "IMG/x.jpg",
        "word": str,
        "poly_x": [int, int, int, ...], 
        "poly_y": [int, int, int, ...],
        "chars": list(list(list(int)))
    }
]

The attributes are:

  • img: path to the image file
  • word: text content of the image
  • poly_x: x coordinates of the bounding polygon, if it exists.
  • poly_y: y coordinates of the bounding polygon, if it exists.
  • chars: a 3-D 2x4xN array (list) representing the bounding boxes of N characters. The three dimensions are: x/y coordinates, 4 corners, and N characters.
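For concreteness, here is a minimal Python sketch of parsing one such entry. The coordinates are made up for illustration and are not taken from the real dataset:

```python
import json

# A hand-made Label.json entry following the schema above
# (values are illustrative only).
label_json = json.dumps([
    {
        "img": "IMG/1.jpg",
        "word": "art",
        "poly_x": [10, 92, 92, 10],
        "poly_y": [5, 5, 40, 40],
        # chars is 2 x 4 x N: [x or y][corner][character]; N = 3 here.
        "chars": [
            [[10, 38, 66], [36, 64, 92], [36, 64, 92], [10, 38, 66]],  # x
            [[5, 5, 5],    [5, 5, 5],    [40, 40, 40], [40, 40, 40]],  # y
        ],
    }
])

annotations = json.loads(label_json)
sample = annotations[0]

# Word-level polygon as (x, y) vertex pairs.
polygon = list(zip(sample["poly_x"], sample["poly_y"]))

# Character boxes: corner c of character i is (xs[c][i], ys[c][i]).
xs, ys = sample["chars"]
num_chars = len(xs[0])

print(sample["word"], num_chars, polygon[0])  # → art 3 (10, 5)
```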

How to Replicate the Experiments Presented in Our Paper:

Once the paper is published, you can refer to it for more details. To run an experiment, find the training scripts under the corresponding folder and call them from the root folder, e.g. bash Experiment/Experiment1/Exper_1_CRNN_all_synth.sh.

Exp 1:

Experiments on the new synthetic datasets, as described in Sections 3.1.2-3.1.3, can be found in Experiment/Experiment1.

Exp 2:

Experiments on mixing synthetic datasets with real-world data, as described in Section 3.2, can be found in Experiment/Experiment2.

Exp 3&4:

Experiments with model modifications as described in Section 4 can be found in Experiment/Experiment3 and Experiment/Experiment4.

ICDAR 2019-ArT:

To replicate our ICDAR 2019 models, readers can use scripts in Experiment/Experiment6.

Pretrained Models and Data

Models

We will release some pretrained models shortly.

Data

We will release all datasets we used for the convenience of the research community.

However, as they are large, we only release the following ones for now. More will be added soon.

| Dataset name | Description | Link |
| --- | --- | --- |
| RectTotal | Total-Text rectified by TextSnake | Google Drive |
| CurvedSynth(20K) | The newly proposed synthetic dataset we used | Google Drive |

You can download these and put them under the dataset folder to start trying our code.

Curved SynthText Engine


As discussed in detail in the paper, the Curved SynthText Engine we modified from the original SynthText is our trump card. We also open-source this engine: Jyouhou/CurvedSynthText.

Using synthetic images from this engine, we can expect a 10%+ improvement on Total-Text even with a very simple algorithm.

RectTotal


We propose to evaluate algorithms on images rectified by TextSnake, as an investigation of the key factors in text recognition. TextSnake refers to the following paper:

@inproceedings{long2018textsnake,
  title={Textsnake: A flexible representation for detecting text of arbitrary shapes},
  author={Long, Shangbang and Ruan, Jiaqiang and Zhang, Wenjie and He, Xin and Wu, Wenhao and Yao, Cong},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={20--36},
  year={2018}
}

We release the Rectified Total-Text (RectTotal) dataset for further research.

Citation

If our paper and code help you in your research or help you understand text recognition better, you are highly encouraged (though not required) to cite our paper:

@article{long2019ArT,
  title={Alchemy: Techniques for Rectification Based Irregular Scene Text Recognition},
  author={Long, Shangbang and Guan, Yushuo and Wang, Bingxuan and Bian, Kaigui and Yao, Cong},
  journal={arXiv preprint arXiv:1908.11834},
  year={2019}
}

