
Jeff Hwang, Sean Shi - Stanford University

Date post: 07-Nov-2021

Classification of Photographs Based on Perceived Aesthetic Quality
Jeff Hwang, Sean Shi

Department of Electrical Engineering, Stanford University

Learning Pipeline

Feature Extraction

Features
• Blur: variance of the Laplacian
• Graininess: entropy
• Detail: proportion of edge pixels in the filtered image
• Saliency: proportion of salient pixels
• Subject detection: proportion of strong edge pixels
• Contrast: standard deviation of grayscale intensities
• Color distribution: histogram of RGB tuples
• Shade: histogram of distinct hues
• Exposure: average luminance
• Vividness: average saturation
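Several of these features are simple statistics of the grayscale image. Below is a minimal NumPy sketch of four of them, assuming 8-bit intensities and a 3×3 Laplacian kernel; the poster does not specify these implementation details, so they are assumptions:

```python
import numpy as np

def laplacian_variance(gray):
    """Blur: variance of the Laplacian response (sharper images score higher)."""
    k = np.array([[0, 1, 0],
                  [1, -4, 1],
                  [0, 1, 0]], dtype=float)
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):          # 'valid' 2-D convolution via shifted sums
        for dx in range(3):
            out += k[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())

def entropy(gray, bins=256):
    """Graininess: Shannon entropy of the intensity histogram (bin count assumed)."""
    hist, _ = np.histogram(gray, bins=bins, range=(0.0, 256.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def contrast(gray):
    """Contrast: standard deviation of grayscale intensities."""
    return float(gray.std())

def exposure(gray):
    """Exposure: average luminance (mean grayscale intensity)."""
    return float(gray.mean())
```

A perfectly flat patch scores zero on blur, graininess, and contrast, which matches the intuition behind each statistic.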

Experimental Setup

Dataset
We scraped 1,700 images from photo.net and 3,000 images from DPChallenge.com, with photographs rated between 1-7 and 1-10, respectively. We consider only photographs in the bottom and top 10th percentiles and balance the number of highly rated and poorly rated photographs.
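The percentile cutoff and class balancing described above can be sketched as follows; the 10th/90th-percentile thresholds come from the poster, while the seed and the subsampling scheme for balancing are assumptions:

```python
import numpy as np

def label_and_balance(ratings, seed=0):
    """Keep only the bottom/top 10th percentile of ratings, label them 0/1,
    and subsample the larger class so both classes have equal counts."""
    ratings = np.asarray(ratings, dtype=float)
    lo, hi = np.percentile(ratings, [10, 90])
    neg = np.flatnonzero(ratings <= lo)   # poorly rated -> label 0
    pos = np.flatnonzero(ratings >= hi)   # highly rated -> label 1
    rng = np.random.default_rng(seed)
    n = min(len(neg), len(pos))
    neg = rng.choice(neg, size=n, replace=False)
    pos = rng.choice(pos, size=n, replace=False)
    idx = np.concatenate([neg, pos])
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return idx, labels
```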

Classifier Tuning
We selected the regularization, gamma, and kernel parameters of the SVM via grid search. We selected the number of trees in random forests (RF) empirically, and likewise the subsampling parameter and number of trees in gradient tree boosting (GBRT).
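The poster does not name a library; with scikit-learn, the SVM grid search might be sketched like this. The parameter grids shown are placeholders, and the feature matrix is a synthetic stand-in for the extracted per-tile features:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy stand-in for the per-tile feature matrix and aesthetic labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 6))
y = (X[:, 0] + 0.1 * rng.normal(size=80) > 0).astype(int)

param_grid = {
    "C": [0.1, 1, 10],              # regularization strength (values assumed)
    "gamma": ["scale", 0.01, 0.1],  # RBF kernel width (values assumed)
    "kernel": ["rbf", "linear"],
}
search = GridSearchCV(SVC(), param_grid, cv=5)  # exhaustive cross-validated grid search
search.fit(X, y)
best = search.best_params_
```

`best` then holds the parameter combination with the highest mean cross-validation accuracy.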

Spatial Correlation of Features

We extract features from each tile in the image and allow the learning algorithm to infer relationships between the tiles.
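A sketch of the tiling step, assuming a fixed 3×3 grid (the poster does not state the grid size) and arbitrary per-tile feature functions:

```python
import numpy as np

def tile_features(gray, feature_fns, grid=(3, 3)):
    """Split an image into a grid of tiles and concatenate per-tile features,
    so the classifier can learn spatial relationships between tiles."""
    feats = []
    for row in np.array_split(gray, grid[0], axis=0):
        for tile in np.array_split(row, grid[1], axis=1):
            feats.extend(fn(tile) for fn in feature_fns)
    return np.array(feats)
```

With a 3×3 grid and two feature functions, each image becomes an 18-dimensional vector whose components keep a fixed spatial meaning (e.g. "contrast of the center tile"), which is what lets the learner pick up spatial correlations.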

Abstract

How do we distinguish between good and bad photographs? We explore automated aesthetic evaluation of photographs using machine learning and image processing techniques. We theorize that the spatial distribution of certain visual elements within an image correlates with its aesthetic quality. To this end, we model each photograph as a set of tiles, extract visual features from each tile, and train a classifier on the resulting features along with the images' aesthetics ratings. Our model achieves a 10-fold cross-validation classification accuracy of 85.03%.

Motivation

• Allow websites with community-sourced images to maintain the desired quality of content by programmatically filtering out bad images.
• Allow photo feeds such as Instagram to circulate high-quality photographs identified by the system in addition to user-curated content.
• Provide aspiring photographers with real-time visual feedback to help them improve their photographic skills.

Results

Misclassified Photographs

10-fold CV accuracy across learning algorithms and datasets:

              SVM      RF       GBRT
photo.net     78.71%   78.58%   80.88%
DPChallenge   84.00%   83.15%   85.03%

Best model — GBRT with 500 predictors, subsample = 0.9; 10-fold cross-validation accuracy: 85.03%.

Confusion matrix (rows: actual label, columns: predicted label):

                Predicted 1                Predicted 0
Actual 1   85.80% (true positives)    14.20% (false negatives)
Actual 0   15.73% (false positives)   84.27% (true negatives)
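Assuming scikit-learn (not stated on the poster) and reading the "=0.9" parameter as the subsampling fraction, the reported GBRT configuration and its 10-fold cross-validation can be sketched as:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for the per-tile feature matrix and aesthetic labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)

# 500 boosted trees, each fit on a random 90% subsample of the training data.
clf = GradientBoostingClassifier(n_estimators=500, subsample=0.9)
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
mean_accuracy = scores.mean()
```

Subsampling below 1.0 makes this stochastic gradient boosting, which typically reduces overfitting relative to fitting every tree on the full training set.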

Backward Feature Selection
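Backward feature selection starts from the full feature set and greedily drops the feature whose removal least hurts (or most helps) the cross-validated score. A minimal, classifier-agnostic sketch; the scoring function and the stopping rule are assumptions, not details from the poster:

```python
def backward_select(features, score_fn, min_features=1):
    """Greedy backward elimination: repeatedly drop the feature whose
    removal yields the best score on the remaining subset."""
    selected = list(features)
    history = [(tuple(selected), score_fn(selected))]
    while len(selected) > min_features:
        # Score every subset with exactly one feature removed.
        trials = [(score_fn([f for f in selected if f != drop]), drop)
                  for drop in selected]
        best_score, drop = max(trials)
        selected.remove(drop)
        history.append((tuple(selected), best_score))
    return selected, history
```

In practice `score_fn` would be the 10-fold CV accuracy of the classifier trained on the given feature subset; `history` traces how accuracy evolves as features are removed, which is what a backward-selection plot reports.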

Shortcomings of the System
• Cannot glean semantic meaning
• Cannot infer more complex global features such as leading lines
• Color resolution too coarse
• Saliency detection inconsistent
