Oracle Bone Character Recognition Project Milestone

Date post: 02-Jan-2022


Hanh Nguyen, Jing Wei Pan

1. Algorithm and Results

1.1. Preprocessing

1.1.1. Foreign Object Removal

Our collection of rubbings contains red arrows drawn by hand by the previous user of the dataset. To restore the images to their original condition, the first preprocessing step removes the red arrows by replacing each red pixel with the average of its closest grey neighbors.

Figure 1: Red arrow is removed with minimal disturbance to other pixels
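The replacement step can be sketched in a few lines of plain Python. This is a minimal illustration, not the report's implementation: the redness test `is_red`, its `margin`, and the growing search window controlled by `radius` are all assumptions, and "closest grey neighbors" is approximated by the non-red pixels in the smallest window that contains any.

```python
def is_red(pixel, margin=60):
    """Hypothetical redness test: the red channel dominates green and blue."""
    r, g, b = pixel
    return r - max(g, b) > margin


def remove_red(image, radius=2):
    """Replace red pixels with the mean grey level of nearby non-red pixels.

    `image` is a list of rows of (r, g, b) tuples. The result is a greyscale
    image given as a list of rows of ints.
    """
    height, width = len(image), len(image[0])
    grey = [[sum(p) // 3 for p in row] for row in image]
    out = [row[:] for row in grey]
    for y in range(height):
        for x in range(width):
            if not is_red(image[y][x]):
                continue
            # Grow the search window until it contains some non-red pixels.
            r = radius
            while True:
                neighbors = [
                    grey[j][i]
                    for j in range(max(0, y - r), min(height, y + r + 1))
                    for i in range(max(0, x - r), min(width, x + r + 1))
                    if not is_red(image[j][i])
                ]
                if neighbors or r > max(height, width):
                    break
                r += 1
            if neighbors:
                out[y][x] = sum(neighbors) // len(neighbors)
    return out
```

Pixels that are not red keep their own grey level, which matches the goal of disturbing the rest of the rubbing as little as possible.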

1.1.2. Noise Reduction

Approach 1: The image is first smoothed with a Gaussian filter, then converted to binary with a threshold of 0.5 (the optimal threshold varies by a margin of 0.1 between rubbings). Specks are then removed by darkening small objects.

Approach 2: The image is first smoothed with a Gaussian filter, then sharpened with a Laplacian filter. Binarization in this approach uses a much lower threshold (0.1) that does not need to vary between rubbings. Specks are again removed by darkening small objects.


Figure 2: Left: result of approach 1. Right: result of approach 2. Aside from decreased noise near the shell edges, there is little difference between the two approaches, so either output can be passed to the next preprocessing step.
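The binarization and speck-removal steps shared by both approaches might look like the sketch below. It assumes intensities already scaled to [0, 1] and smoothed, treats "small objects" as 4-connected white regions, and uses a hypothetical `min_size` cutoff; none of these details are stated in the report.

```python
from collections import deque


def binarize(image, threshold=0.5):
    """Map intensities in [0, 1] to 1 (white) or 0 (black)."""
    return [[1 if v > threshold else 0 for v in row] for row in image]


def remove_specks(binary, min_size=3):
    """Darken 4-connected white regions smaller than `min_size` pixels."""
    height, width = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected white region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_size:
                for cy, cx in region:
                    out[cy][cx] = 0
    return out
```

Switching between the two approaches would then amount to changing the filtering applied before `binarize` and passing `threshold=0.1` instead of the default.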

1.1.3. Contour Removal

The contour of a shell adds noise to the processed image. We solve this problem by first identifying the contour from the black:white pixel ratio in a small square window, then suppressing all pixels around each contour point. Black markings in the white space of each rubbing are eliminated along the way.

Figure 3: Left: original rubbing; Center: isolated contour; Right: contour erased
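One way to read the ratio test is sketched below, under stated assumptions: a pixel counts as a contour point when the white fraction of its surrounding window falls between hypothetical bounds `low` and `high`, and suppression sets every pixel within `radius` of a contour point to black. The report does not give the window size or the ratio it used, so these values are placeholders.

```python
def white_fraction(binary, y, x, half):
    """Fraction of white (1) pixels in the window centred on (y, x)."""
    height, width = len(binary), len(binary[0])
    values = [
        binary[j][i]
        for j in range(max(0, y - half), min(height, y + half + 1))
        for i in range(max(0, x - half), min(width, x + half + 1))
    ]
    return sum(values) / len(values)


def erase_contour(binary, half=1, low=0.3, high=0.7, radius=1):
    """Find mixed black/white windows and black out their neighbourhood."""
    height, width = len(binary), len(binary[0])
    contour = [
        (y, x)
        for y in range(height)
        for x in range(width)
        if low <= white_fraction(binary, y, x, half) <= high
    ]
    out = [row[:] for row in binary]
    for y, x in contour:
        for j in range(max(0, y - radius), min(height, y + radius + 1)):
            for i in range(max(0, x - radius), min(width, x + radius + 1)):
                out[j][i] = 0
    return contour, out
```

A plain ratio test like this would also fire on thin character strokes, so a real implementation would need a larger window or an additional check to separate the shell outline from the characters.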

1.1.4. Crack Removal

Ancient turtle shells often survive only as broken pieces. The resulting cracks, along with man-made markings, are noise that interferes with character isolation. To locate cracks, we find continuous lines on a rubbing. To differentiate cracks from character strokes, we filter the lines by their length.

Figure 4: Left: rubbing with contour removed; Center left: first derivative in the x and y directions; Center right: zero-crossings of the first derivative; Right: oversized lines, in other words, cracks that need to be removed

Once the cracks are located, their coordinates are mapped onto the original image so that any surrounding light pixels are suppressed. An earlier attempt was made to isolate cracks with MATLAB's built-in Canny edge detector. Since the Canny results were not ideal, we resorted to this modified version of the Marr-Hildreth detector.


Figure 5: Left: rubbing with contour removed; Right: rubbing with cracks removed
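The length filter that separates cracks from strokes can be illustrated as follows. The sketch assumes a binary edge map as input (for example, the zero-crossing image), groups edge pixels into 8-connected components, and approximates line length by the larger side of the bounding box; the function name `long_lines` and the `min_length` cutoff are hypothetical.

```python
def long_lines(edges, min_length=5):
    """Return 8-connected edge components at least `min_length` long.

    Length is approximated by the larger side of the bounding box.
    """
    height, width = len(edges), len(edges[0])
    seen = set()
    cracks = []
    for y in range(height):
        for x in range(width):
            if not edges[y][x] or (y, x) in seen:
                continue
            # Depth-first traversal gathers one connected line.
            stack, component = [(y, x)], []
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and edges[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
            ys = [p[0] for p in component]
            xs = [p[1] for p in component]
            extent = max(max(ys) - min(ys), max(xs) - min(xs)) + 1
            if extent >= min_length:
                cracks.append(component)
    return cracks
```

The returned coordinates are what would be mapped back onto the original image to suppress the light pixels around each crack.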

1.1.5. Character Isolation

Individual characters are found by locating continuous white regions in the processed image. Small regions and regions that lack character spacing (solid blocks) are filtered out. If two regions lie close to each other and their neighboring dimensions match, they are grouped as a single character.

Figure 6: Left: character isolation on the preprocessed image; Right: character isolation on the original rubbing
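The grouping rule can be sketched on bounding boxes. The version below is an assumption about what "close" and "neighboring dimensions match" mean: two boxes `(x0, y0, x1, y1)` merge when they sit side by side within `max_gap` pixels and span the same rows (within `tolerance`), or are stacked within `max_gap` pixels and span the same columns. Both parameters are hypothetical.

```python
def should_merge(a, b, max_gap=2, tolerance=1):
    """Decide whether boxes (x0, y0, x1, y1) belong to one character."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # A non-negative gap means the boxes do not overlap along that axis.
    h_gap = max(bx0 - ax1, ax0 - bx1)
    v_gap = max(by0 - ay1, ay0 - by1)
    same_rows = abs(ay0 - by0) <= tolerance and abs(ay1 - by1) <= tolerance
    same_cols = abs(ax0 - bx0) <= tolerance and abs(ax1 - bx1) <= tolerance
    side_by_side = 0 <= h_gap <= max_gap and same_rows
    stacked = 0 <= v_gap <= max_gap and same_cols
    return side_by_side or stacked


def merge_boxes(boxes, **kwargs):
    """Greedily merge boxes until no pair satisfies `should_merge`."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if should_merge(boxes[i], boxes[j], **kwargs):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

For example, `merge_boxes([(0, 0, 4, 10), (6, 0, 9, 10), (30, 30, 35, 35)])` joins the first two boxes into `(0, 0, 9, 10)` and leaves the distant third box alone.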


On this particular rubbing, the algorithm finds 28 characters, 21 of which are true positives. The remaining detections are noise that the algorithm could not eliminate. No characters are missed.
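These counts can be restated as precision and recall. The short calculation below uses only the reported 28 detections and 21 true positives, and takes "no characters are missed" to mean zero false negatives; the helper name is illustrative.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard definitions: TP / (TP + FP) and TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


detections, true_positives, missed = 28, 21, 0
precision, recall = precision_recall(
    true_positives, detections - true_positives, missed)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints "precision = 0.75, recall = 1.00"
```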

1.1.6. Next Steps

The current preprocessing algorithm reduces noise as much as possible to produce a clean template for character isolation. This approach is fragile, since the accuracy of the output depends entirely on the effectiveness of noise reduction. We plan to implement another character extraction algorithm using SIFT. We also want to try weeding out the noise around shell edges using cross-correlation, since that noise has a common pattern. The goal for the rest of the term is to make preprocessing more robust to different rubbings.

1.2. Shape context

The basic idea of shape context [1] is to pick $n$ points on the contours of a shape. For each point $p_i$ on the shape, define $h_i(k) = \#\{q \neq p_i : (q - p_i) \in \mathrm{bin}(k)\}$ to be the shape context of $p_i$. The bins are normally taken to be uniform in log-polar space.

Figure 7: Finding boundary points with n = 100
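A log-polar histogram of this kind can be computed as sketched below. The bin counts (`n_radial`, `n_angular`), the radius limits, the normalisation by mean pairwise distance, and the clamping of out-of-range radii into the innermost or outermost ring are illustrative choices, not values taken from the report.

```python
import math


def shape_context(points, index, n_radial=5, n_angular=12,
                  r_min=0.125, r_max=2.0):
    """Log-polar histogram h_i(k) of the other points seen from points[index].

    Assumes at least two distinct points. Distances are divided by the mean
    pairwise distance so the descriptor is scale invariant.
    """
    dists = [math.hypot(p[0] - q[0], p[1] - q[1])
             for a, p in enumerate(points)
             for b, q in enumerate(points) if a < b]
    scale = sum(dists) / len(dists)
    # Radial bin edges are uniform in log space between r_min and r_max.
    log_lo, log_hi = math.log(r_min), math.log(r_max)
    px, py = points[index]
    hist = [0] * (n_radial * n_angular)
    for j, (qx, qy) in enumerate(points):
        if j == index:
            continue
        r = max(math.hypot(qx - px, qy - py) / scale, r_min)
        frac = (math.log(r) - log_lo) / (log_hi - log_lo)
        r_bin = min(n_radial - 1, int(frac * n_radial))
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        t_bin = min(n_angular - 1, int(theta / (2 * math.pi) * n_angular))
        hist[r_bin * n_angular + t_bin] += 1
    return hist
```

Each histogram has `n_radial * n_angular` bins and its entries sum to the number of other points, so the descriptors of two shapes sampled with the same $n$ are directly comparable.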

The cost of matching a point $p_i$ on the first shape with a point $q_j$ on the second shape is defined as

$$C_{ij} = C(p_i, q_j) = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - g_j(k)]^2}{h_i(k) + g_j(k)}$$

where $h_i$ and $g_j$ are the shape contexts of $p_i$ and $q_j$ respectively. Given the cost matrix, we can find a one-to-one matching $\pi$ that minimizes the total cost of matching

$$H(\pi) = \sum_i C(p_i, q_{\pi(i)})$$

in $O(n^3)$ time by using the Hungarian method.
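The cost and the matching objective can be made concrete with a small sketch. The chi-square cost follows the formula above, skipping bins where both histograms are empty to avoid division by zero. The matching step is deliberately not the Hungarian method: it is an exhaustive search over permutations, usable only for tiny inputs, that returns the same optimum the Hungarian method would find.

```python
from itertools import permutations


def chi2_cost(h, g):
    """C = 1/2 * sum_k (h(k) - g(k))^2 / (h(k) + g(k)), skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h, g) if a + b > 0)


def cost_matrix(contexts_p, contexts_q):
    """cost[i][j] is the cost of matching point i of P with point j of Q."""
    return [[chi2_cost(h, g) for g in contexts_q] for h in contexts_p]


def best_matching(cost):
    """Exhaustive search for the permutation pi minimising sum_i C[i][pi(i)].

    This takes O(n!) time and stands in for the Hungarian method, which
    reaches the same minimum in O(n^3).
    """
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda pi: sum(cost[i][pi[i]] for i in range(n)))
```

For realistic point counts such as $n = 100$, a proper assignment solver is needed; the brute-force search here only serves to show what is being minimised.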


Figure 8: Finding correspondence

We then estimate the plane transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ using a thin plate spline (TPS).

Figure 9: Thin plate spline transformation

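Finally, we can compute the shape distance

$$D_{sc}(P, Q) = \frac{1}{n} \sum_{p \in P} \min_{q \in Q} C(p, T(q)) + \frac{1}{m} \sum_{q \in Q} \min_{p \in P} C(p, T(q))$$

where $n$ and $m$ are the numbers of points sampled on $P$ and $Q$, and assign the unknown character to the known character with the lowest shape distance.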
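Given a cost matrix computed after the TPS alignment, the shape distance and the final assignment reduce to a few lines. The sketch assumes `cost[i][j]` already holds $C(p_i, T(q_j))$ and reads the inner term of each sum as the minimum cost over the other shape; the `classify` helper and its dictionary input are hypothetical conveniences.

```python
def shape_distance(cost):
    """D_sc from a cost matrix with cost[i][j] = C(p_i, T(q_j)).

    Rows index the n points of P and columns index the m points of Q.
    """
    n, m = len(cost), len(cost[0])
    best_for_p = sum(min(row) for row in cost) / n
    best_for_q = sum(min(cost[i][j] for i in range(n))
                     for j in range(m)) / m
    return best_for_p + best_for_q


def classify(costs_by_label):
    """Return the label whose cost matrix gives the lowest shape distance."""
    return min(costs_by_label,
               key=lambda label: shape_distance(costs_by_label[label]))
```

Here `costs_by_label` maps each known character to the cost matrix between the unknown character and that known character.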


Figure 10: Matching characters

2. Next Steps

• Pre-processing

– Explore alternative approaches for noise reduction and feature extraction

– Reduce the number of hardcoded thresholds with training

• Shape context

– Improve current implementation to reduce run time

– Optimize parameters for best results

– Implement other shape matching algorithm(s) for benchmarking

• Integrating and testing

References

[1] S. Belongie, J. Malik, and J. Puzicha, “Shape Matching and Object Recognition Using Shape Contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, 2002.
