Scott Clements, Monash University Software Engineering, Copyright 2003.
Web Document Analysis: Improving Search Technology using Image Processing
Scott Clements, Bachelor of Software Engineering
Monash University
www.csse.monash.edu.au/~sdcle1/
Supervisor: Dr. Sid Ray
Interests and Expertise
Dr. Sid Ray
• Image Processing expert
Scott Clements
• Internet Technology
• Software Engineering
• Database Management
• Interface Design
Union of Expertise
Engineering a product which uses:
Image processing & Internet Technology
Primary Goals
• To improve search quality using Image Processing.
• To investigate
– Image histogram matching to find similar images
– Colour predominance in images
Secondary Goals
• To engineer a product which has industry potential.
– Project Management
– Interface Design
– Database Management
– Information Retrieval
Background
• Popular search technology [mcbryan94, brin98, pinkerton00]
– Text based
– Quality of results can be poor
– Difficult to find images
• Multimedia search technology [ogle95, smith97]
– Text, image and video based
– Poor interface design
  • Aimed at Image Processing experts
– Good use of Database Management systems
Software Engineering Methods
Stages
• Initial program: Grey-scale image matching
• Refinement 1: Colour image matching
• Refinement 2: Colour predominance image matching
[Figure: System refinement stages, with boxes labelled Initial program, Implement, Test, Integration, and Document Results and Findings]
Image processing technique
Data types:
• Histogram Data
• Colour Predominance Data
[Figure: Image (input) -> Image processing -> Data (output)]
System Architecture
Pre-processing
- Image analysis
- Format information
- Add information to database
Database
Post-processing
- Send query information
- Process query information
- Calculate search results
- Return search results
User
System Architecture continued
Pre-processing
- C
- Monash Image Lib.
- PHP/HTML
Database
- MySQL
Post-processing
- PHP/HTML
User
Colour histogram matching
• Method:
– Using:
  • Group 16 configuration
  • Total difference Algorithm
• Requirements
– Database design
– Histogram analysis
• Investigate:
– Interface design
– Relevance Feedback
Histograms (Group 16 Configuration)
Colour Histograms
- Count the number of occurrences of each colour intensity
- 256 intensities for each RGB component (24-bit image)
- Insert this information into the database
Problem: Excessive amount of information
Solution: Convert to Group 16 Configuration.
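The reduction above can be sketched in Python. The slides do not define the Group 16 configuration, so this sketch assumes it means merging the 256 intensities of each RGB component into 16 groups of 16 consecutive intensities, matching the 16 columns per colour in the database design. The names `group16_histograms`, `GROUPS` and `GROUP_WIDTH` are hypothetical, and the project itself used C with the Monash Image Library rather than Python.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each component 0-255

GROUPS = 16                   # assumed: 16 groups per colour component
GROUP_WIDTH = 256 // GROUPS   # assumed: 16 consecutive intensities per group


def group16_histograms(pixels: Iterable[Pixel]) -> Tuple[List[int], List[int], List[int]]:
    """Count pixels per intensity group for the red, green and blue components."""
    red = [0] * GROUPS
    green = [0] * GROUPS
    blue = [0] * GROUPS
    for r, g, b in pixels:
        # Integer division maps intensities 0-15 to group 0, 16-31 to group 1, and so on.
        red[r // GROUP_WIDTH] += 1
        green[g // GROUP_WIDTH] += 1
        blue[b // GROUP_WIDTH] += 1
    return red, green, blue
```

Under this assumption each image needs 48 counts (3 x 16) instead of 768 (3 x 256), which is the saving the slide is after.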
Database Design
r2red
identification | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | ... | r15 | r16
r2blue
identification | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 | ... | b15 | b16
r2green
identification | g1 | g2 | g3 | g4 | g5 | g6 | g7 | g8 | ... | g15 | g16
r2
identification | name | ... | **Reserved Space**
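One way to realise the four tables above is sketched below. The project used MySQL. SQLite, via Python's standard `sqlite3` module, is swapped in here only so that the example runs without a server. The column types, the primary key, the foreign key and the function name `create_schema` are assumptions, and the elided and reserved columns of the `r2` table are left out rather than guessed.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the image table and one 16-column histogram table per colour."""
    # Main image table; only the columns named on the slide are created.
    conn.execute(
        "CREATE TABLE r2 (identification INTEGER PRIMARY KEY, name TEXT)"
    )
    # One histogram table per colour component, with columns r1..r16, g1..g16, b1..b16.
    for table, prefix in (("r2red", "r"), ("r2green", "g"), ("r2blue", "b")):
        columns = ", ".join(
            f"{prefix}{i} INTEGER NOT NULL" for i in range(1, 17)
        )
        conn.execute(
            f"CREATE TABLE {table} ("
            "identification INTEGER PRIMARY KEY REFERENCES r2(identification), "
            f"{columns})"
        )


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    print([row[1] for row in connection.execute("PRAGMA table_info(r2red)")])
```

Keeping each colour in its own table, keyed by the same identification value, lets a query read one colour's histogram without touching the others.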
Algorithm
•Aim: To find other similar images
•Method: Compare each of the histograms with the query histogram
•Algorithm: Total difference
Total Difference Algorithm
- Query Image versus images in the database
- Compare each histogram
- Find the positive difference between each histogram (Total Difference)
- Convert 0-300% range to a similarity rating between 0-100%
- Return the results which are within a user defined similarity rating
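The steps above can be sketched as follows. The slides do not give the exact formula, so several things here are assumptions: each histogram is normalised by its pixel count, the per-colour difference is half the sum of absolute group differences (which gives 0-100% per colour and 0-300% over red, green and blue), and the similarity rating is 100% minus one third of the total. The function names are hypothetical.

```python
from typing import Sequence

Histogram = Sequence[int]


def channel_difference(query: Histogram, candidate: Histogram) -> float:
    """Positive difference between two histograms of one colour, as 0-100%."""
    q_total, c_total = sum(query), sum(candidate)
    if q_total == 0 or c_total == 0:
        raise ValueError("histograms must contain at least one pixel")
    # Compare the share of pixels in each group so image size does not matter.
    diff = sum(abs(q / q_total - c / c_total) for q, c in zip(query, candidate))
    # The sum of absolute differences of two distributions lies in 0..2; halve it.
    return 100.0 * diff / 2.0


def similarity(query_rgb: Sequence[Histogram], candidate_rgb: Sequence[Histogram]) -> float:
    """Map the 0-300% total difference over three colours to a 0-100% rating."""
    total = sum(channel_difference(q, c) for q, c in zip(query_rgb, candidate_rgb))
    return 100.0 - total / 3.0
```

A search would then keep only the database images whose rating is at or above the user-defined similarity threshold.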
Interface Design
Relevance Feedback
User Feedback:
– Clicking the similarity button
– Proving interest in a particular image
Relevance
– Sorting results:
  • most similar to least similar
Results and Findings
Method | Accuracy
Grey-scale histogram matching | 64%
Colour histogram matching | 84%
• Test Set:
– Real life photos
– Computer generated images
• Weakness
– Grey-scale histogram matching (unacceptable results)
– Images with many different colours
– Spatial arrangements
– Needing to resize the images (standardisation for histograms)
Colour predominance
• Assign each pixel a colour value (if possible)
• Found that RGB was not suitable in this case
• Colour ranges were much easier to find in HSB
• Method: Using an image program, find the Hue, Saturation and Brightness ranges for each colour.
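A minimal sketch of assigning a colour to a pixel through HSB is given below, using Python's standard `colorsys` module (HSB is the same model as HSV). The colour names come from the predominance table in the database refinement. The hue boundaries and the dark, bright and grey thresholds are illustrative assumptions, not the ranges measured in the project, and `classify_pixel` is a hypothetical name.

```python
import colorsys
from typing import Optional

# Assumed hue ranges in degrees (low inclusive, high exclusive); red wraps around 0.
HUE_RANGES = [
    ("red", 0, 15), ("orange", 15, 45), ("yellow", 45, 75),
    ("green", 75, 165), ("cyan", 165, 195), ("blue", 195, 255),
    ("purple", 255, 285), ("magenta", 285, 345), ("red", 345, 360),
]
DARK_BELOW = 0.2        # assumed brightness threshold for "dark"
BRIGHT_ABOVE = 0.9      # assumed brightness threshold for "bright"
GREY_SATURATION = 0.15  # assumed saturation below which hue is ignored


def classify_pixel(r: int, g: int, b: int) -> Optional[str]:
    """Return a colour name for an RGB pixel, or None when no colour applies."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if v < DARK_BELOW:
        return "dark"
    if s < GREY_SATURATION:
        # Nearly colourless pixel: bright if very light, otherwise unclassified.
        return "bright" if v > BRIGHT_ABOVE else None
    degrees = h * 360.0
    for name, low, high in HUE_RANGES:
        if low <= degrees < high:
            return name
    return None
```

In HSB a named colour is mostly a single interval of hue, whereas in RGB the same colour spans an irregular region of all three components. That fits the finding that RGB was not suitable here.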
Algorithm Design
Analysis
• Count each occurrence of a certain colour
• Convert the occurrence result to a percent of predominance between 0-100%
Query
• Query the database to find images which have predominant colours.
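The analysis and query steps can be sketched as follows, assuming every pixel has already been given a colour name, or `None` where no colour could be assigned. The in-memory dictionary stands in for the database, and the function names and the `minimum` threshold parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def predominance(labels: Iterable[Optional[str]]) -> Dict[str, float]:
    """Convert per-pixel colour labels into a 0-100% predominance per colour."""
    labels = list(labels)
    if not labels:
        return {}
    counts = Counter(label for label in labels if label is not None)
    # Percentages are taken over all pixels, including unclassified ones.
    return {colour: 100.0 * n / len(labels) for colour, n in counts.items()}


def find_predominant(
    database: Dict[str, Dict[str, float]], colour: str, minimum: float
) -> List[str]:
    """Return image names whose predominance of `colour` is at least `minimum`."""
    matches = [
        (scores.get(colour, 0.0), name)
        for name, scores in database.items()
        if scores.get(colour, 0.0) >= minimum
    ]
    # Highest predominance first.
    return [name for _, name in sorted(matches, reverse=True)]
```

Only one percentage per named colour is stored for each image, far less than a full set of histograms.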
Database Refinement
r3red
identification | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | ... | r15 | r16
r3blue
identification | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 | ... | b15 | b16
r3green
identification | g1 | g2 | g3 | g4 | g5 | g6 | g7 | g8 | ... | g15 | g16
r3
identification | name | ... | **Reserved Space**
r3predominance
identification | red | magenta | purple | blue | cyan | green | yellow | orange | dark | bright
Interface design
Interface design continued
Relevance Feedback
• Not fully suitable for Colour predominance
• Use a subset of Relevance Feedback to improve usability
• Sort the results from most to least relevant
Results and Findings
• Test set:
– Real life photos
– Computer generated images
• Easy method to understand for users
• Less information stored in the database
• Accurate and efficient method to use
Algorithm | Similarity results
Colour Predominance | 86%
Conclusion and Applications
Small to Medium sized system
• Example: local image database
• Colour histogram matching
• Colour predominance
Medium to Large system
• Example: Internet search engine
• Only Colour predominance
– More efficient
– Less information to store about images
– Easy to understand
Future Research
• Parallelism in image analysis
• Alternative image data for histogram matching (e.g. HSB)
• Replace or extend Monash Image Library (MIL) to directly support popular internet image formats.
• Improve the documentation for colour image manipulation in MIL.
• More extensive testing of colour predominance
• Addition of predominance levels
Questions?
Are there any questions?