KAIST Hanja DB1

 

Built by
Cognitive Systems Group, AI laboratory, Computer Science Department, KAIST, 1996
 
Data Collection
 
The images were written by a little over 200 people. The 800 character classes most frequently used in Korean names, which cover 96.6% of usage, were chosen for collection. The subjects wrote Chinese characters on sheets containing 800 fields, one field per character. The sheets were scanned with a flatbed scanner and segmented into individual characters.
The characters were manually filtered and sorted by quality. Of the 800 classes, 17 were discarded and 783 remain.
 
Quality
 
Database Structure
 
The database contains 783 files, each of which holds 200 samples.
 
Data Format
The data is stored in the KS format, designed by Samsung Electronics Company.
A file is composed of a file header and a data array.
The file header is the 4-byte string 'KS  ', indicating the KS format.
The data array is a sequence of character images, each of which is laid out as follows.
 

Name     Size in bytes        Description
code     2                    KS-5601 code
width    1                    width of the image
height   1                    height of the image
IsOK     1                    0 means poor quality image; 1 means good quality image
image    (width+7)/8*height   1 bit/pixel, row-major coding
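
The record layout above can be read with a short Python sketch. This is not an official reader, and several details are assumptions the description does not settle: the division in (width+7)/8 is treated as integer division (each row padded to a whole byte), the most significant bit of each byte is taken as the leftmost pixel, the 2-byte code is kept as raw bytes because its byte order is not specified, and only the leading 'KS' of the header is checked. The names KSRecord and read_ks are hypothetical.

```python
import io
from typing import BinaryIO, Iterator, List, NamedTuple


class KSRecord(NamedTuple):
    code: bytes             # 2-byte KS-5601 code, kept raw (byte order not specified)
    width: int
    height: int
    is_ok: bool             # True for a good quality image, False for poor quality
    rows: List[List[int]]   # height rows, each with width pixel values (0 or 1)


def read_ks(stream: BinaryIO) -> Iterator[KSRecord]:
    """Yield the character images stored in a KS-format stream."""
    magic = stream.read(4)
    # Assumption: only the leading 'KS' is checked; the padding bytes are not verified.
    if len(magic) != 4 or not magic.startswith(b"KS"):
        raise ValueError("not a KS-format file")
    while True:
        head = stream.read(5)   # code (2) + width (1) + height (1) + IsOK (1)
        if not head:
            return              # clean end of file
        if len(head) < 5:
            raise ValueError("truncated record header")
        code = head[:2]
        width, height, is_ok = head[2], head[3], head[4]
        row_bytes = (width + 7) // 8    # assumed integer division
        data = stream.read(row_bytes * height)
        if len(data) < row_bytes * height:
            raise ValueError("truncated image data")
        rows = []
        for y in range(height):
            row = data[y * row_bytes:(y + 1) * row_bytes]
            # Assumption: bit 7 of each byte is the leftmost pixel of that byte.
            rows.append([(row[x >> 3] >> (7 - (x & 7))) & 1 for x in range(width)])
        yield KSRecord(code, width, height, bool(is_ok), rows)


# Synthetic one-record file for illustration only (not real database content).
sample = b"KS  " + b"\xca\xa1" + bytes([10, 2, 1]) + bytes([0x80, 0x40, 0xFF, 0xC0])
records = list(read_ks(io.BytesIO(sample)))
```

With a real file, the same generator would be used as `with open(path, "rb") as f: for rec in read_ks(f): ...`, where `path` points at one of the database files. If the decoded images look mirrored within each group of eight pixels, the bit-order assumption is the first thing to revisit.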

 
Examples
 
 
 