Chinese text-line detection from web videos with fully convolutional networks

Chun Yang†,
Wei-Yi Pei†,
Long-Huang Wu and
Xu-Cheng Yin

†Contributed equally

Big Data Analytics 2018, 3:2

https://doi.org/10.1186/s41044-017-0028-2

Received: 8 December 2017

Accepted: 27 December 2017

Published: 5 January 2018

Abstract

Background

In recent years, video has become the dominant source of information on the Web, and the text within video usually carries significant semantic information. Video text extraction and recognition play an essential role in web multimedia understanding and retrieval for big visual data analytics and applications. To deal with challenging backgrounds and embedding noise, most conventional approaches rely on sophisticated pre-processing and post-processing steps before and after text detection. In this paper, we present a simple yet powerful pipeline that directly and uniformly detects Chinese text lines for embedded captions in web videos.

Results

In this Chinese text-line detection system, a fully convolutional network with local context is adopted to localize captions through end-to-end learning. The resulting caption predictions are at the word level and can be fed directly into the character classifier. Text-line construction is then performed by heuristic strategies. A variety of experiments conducted on several real-world web video datasets demonstrate the effectiveness and efficiency of the proposed method.
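The abstract does not say which heuristic strategies are used for text-line construction. As a rough illustration only, the sketch below shows one common kind of heuristic: word-level boxes are merged into a line when their vertical extents overlap and the horizontal gap between them is small. The box format, the function names, and the thresholds `min_overlap` and `max_gap` are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def vertical_overlap(a: Box, b: Box) -> float:
    """Vertical overlap as a fraction of the shorter box's height."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(inter, 0.0) / shorter if shorter > 0 else 0.0


def group_into_lines(
    boxes: List[Box],
    min_overlap: float = 0.5,  # hypothetical threshold
    max_gap: float = 20.0,     # hypothetical threshold, in pixels
) -> List[Box]:
    """Greedily merge word-level boxes into text-line boxes, left to right."""
    lines: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[0]):
        for i, line in enumerate(lines):
            gap = box[0] - line[2]  # horizontal distance from the line's right edge
            if vertical_overlap(line, box) >= min_overlap and gap <= max_gap:
                # Extend the line's bounding box to cover the new word box.
                lines[i] = (
                    min(line[0], box[0]),
                    min(line[1], box[1]),
                    max(line[2], box[2]),
                    max(line[3], box[3]),
                )
                break
        else:
            # No compatible line found: start a new one.
            lines.append(box)
    return lines


if __name__ == "__main__":
    words = [(10, 100, 50, 120), (55, 101, 95, 121), (10, 200, 60, 220)]
    print(group_into_lines(words))
```

In this sketch the first two boxes fall on the same caption row and are merged, while the third starts a separate line. A real system would likely tune the thresholds relative to character height rather than use fixed pixel values.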

Conclusion

The proposed system directly detects English words and Chinese characters in caption text lines without word or character segmentation, and achieves high performance on real-world web video datasets.