
Dataset PATS-A01

The first Printed Arabic Text Set A01 (PATS-A01) consists of 2766 text line images. The text of 2751 of these line images was selected from two standard classic Arabic books; the text of the remaining 15 line images was taken from our minimal Arabic script (see Publications). The line images are available in eight fonts: Arial, Tahoma, Akhbar, Thuluth, Naskh, Simplified Arabic, Andalus, and Traditional Arabic.


Font               | Correctness % | Accuracy % | Line images        | Size (MB) | Ground truth
-------------------|---------------|------------|--------------------|-----------|--------------------
Arial              | 99.94         | 99.90      | Arial              | 8.8       | ArialText.txt
Tahoma             | 99.92         | 99.68      | Tahoma             | 9.7       | TahomaText.txt
Akhbar             | 99.43         | 99.34      | Akhbar             | 7.1       | AkhbarText.txt
Thuluth            | 98.85         | 98.78      | Thuluth            | 10.1      | ThuluthText.txt
Naskh              | 98.19         | 98.09      | Naskh              | 8.4       | NaskhText.txt
Simplified Arabic  | 99.84         | 99.70      | Simplified Arabic  | 8.9       | SimplifiedText.txt
Traditional Arabic | 98.87         | 98.83      | Traditional Arabic | 7.2       | TraditionalText.txt
Andalus            | 99.99         | 97.86      | Andalus            | 8.9       | AndaluslText.txt
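The table reports Correctness and Accuracy without defining them here. A common convention in HMM-based recognition (the one used by HTK's HResults tool) is Correctness = H/N and Accuracy = (H - I)/N, where N is the number of reference symbols, H = N - D - S is the number of hits, and S, D, I are substitutions, deletions, and insertions. The following is a minimal Python sketch of that convention, assuming character-level scoring with unit edit costs (a simplification; HTK weights edits differently). It is an illustration, not the procedure used to produce the figures above.

```python
def edit_counts(ref: str, hyp: str) -> tuple[int, int, int]:
    """Count (substitutions, deletions, insertions) in a minimum-edit-distance
    alignment of hypothesis `hyp` against reference `ref` (unit costs)."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # table[i][j] holds (cost, S, D, I) for aligning ref[:i] with hyp[:j];
    # tuples compare by cost first, so min() picks a cheapest alignment.
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)  # reference chars with no output: deletions
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)  # output chars with no reference: insertions
    for i in range(1, rows):
        for j in range(1, cols):
            cost, s, d, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (cost, s, d, ins)            # match (hit)
            else:
                diag = (cost + 1, s + 1, d, ins)    # substitution
            cost, s, d, ins = table[i - 1][j]
            up = (cost + 1, s, d + 1, ins)          # deletion
            cost, s, d, ins = table[i][j - 1]
            left = (cost + 1, s, d, ins + 1)        # insertion
            table[i][j] = min(diag, up, left)
    _, s, d, ins = table[-1][-1]
    return s, d, ins


def correctness_accuracy(ref: str, hyp: str) -> tuple[float, float]:
    """HTK-style scores in percent: Correctness = H/N, Accuracy = (H - I)/N."""
    if not ref:
        raise ValueError("reference must be non-empty")
    s, d, ins = edit_counts(ref, hyp)
    n = len(ref)
    hits = n - s - d
    return 100.0 * hits / n, 100.0 * (hits - ins) / n


print(correctness_accuracy("abcd", "abxcd"))  # (100.0, 75.0)
```

Because insertions lower accuracy but not correctness, a row where correctness clearly exceeds accuracy (as for Andalus, 99.99 vs 97.86) would, under this convention, point to extra characters being inserted by the recognizer.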
   
Note that the ground-truth text lines are ordered according to the numbers used in the file names of the line images.
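Because the ground-truth lines follow the numbering in the image file names, pairing them requires a numeric rather than alphabetical sort (otherwise line 10 would come before line 2). A minimal Python sketch, assuming hypothetical file names that end in the line number (such as `Arial_10.bmp`), one font's images in a single folder, and a UTF-8 ground-truth file with one line per image; the real naming scheme, image format, and text encoding may differ:

```python
import re
from pathlib import Path


def line_number(path: Path) -> int:
    """Extract the last run of digits in a file name, e.g. 'Arial_10.bmp' -> 10.
    The naming scheme is an assumption; adjust the pattern to the real files."""
    digits = re.findall(r"\d+", path.stem)
    if not digits:
        raise ValueError(f"no line number in {path.name!r}")
    return int(digits[-1])


def pair_lines(image_dir: str, ground_truth: str, encoding: str = "utf-8"):
    """Pair each line image with its ground-truth line, sorting the images
    numerically so that line 10 follows line 9 (unlike a plain string sort)."""
    images = sorted(
        (p for p in Path(image_dir).iterdir() if p.is_file()),
        key=line_number,
    )
    texts = Path(ground_truth).read_text(encoding=encoding).splitlines()
    if len(images) != len(texts):
        raise ValueError(f"{len(images)} images vs {len(texts)} text lines")
    return list(zip(images, texts))
```

For example, `pair_lines("Arial", "ArialText.txt")` would return (image path, text) pairs, where the folder name `Arial` is hypothetical. The length check fails early if the number of images and text lines disagree instead of silently misaligning them.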