
Computational Ceramicology: Matching Image Outlines to Catalog Sketches

Supplementary Material


Figure 5: (a) A profile, characterizing only the rim of the vessel. (b) The extracted profile and rotation axis, with “missing” areas manually annotated in black.

A Synthetic training data

Generating high-quality data with as much similarity as possible to real data is critically important for training a sim2real classification model. To generate synthetic training data using the sketches extracted from the catalogs, our process follows four steps (I–IV):

I. Extraction of sketch lines from catalogs. Extraction of the profile from the catalog sketch is done by tracing the edges of the black area denoting the profile of the vessel (the left half of Fig 1(c)). Note that some profile drawings are incomplete, either because they are used to capture distinctions between subtypes that share similar vessel bodies, or because not enough is known about the full shape. For these, manual annotations mark the edges that are in fact “missing”; see Fig 5(a,b). Finally, the scale is extracted from the ruler present somewhere on the catalog page. The result of this extraction process can be seen in Fig 6(a).

II. Efficient generation of synthetic fracture faces. We developed a direct method for generating synthetic sherd outlines, without first reconstructing a 3D model. It is based on the observation that every point rotated around the rotation axis forms a circle in 3D, perpendicular to the rotation axis. We consider the sketch as placed on the xz plane, with the rotation axis aligned with the z axis. Every outline point (p_x, p_y) defines the 3D geometric locus satisfying the two equations x^2 + y^2 = p_x^2 and z = p_y; see Fig 6(b). To generate a fracture, we consider a random 3D plane P that intersects the model (Fig 6(c)), where the angle between the plane and the z-axis is kept below 20°, to simulate the more distinctive real-world fractures, which are almost vertical.

We then compute the intersection of plane P with the O(n) circles defined above, n being the number of outline points, an operation that can be performed in constant time per circle, thus generating the outline of a 3D sherd in linear time. We note the following: (i) The process is carried out for both the inner profile and the outer profile, resulting in two curves, each annotated as being either inside or outside. (ii) If plane P is not tangent to a circle, there are two intersection points per circle. Since most sherds are not presented with both sides of the cut (Fig 6(d)), we pick the same side for all intersections, by using the same root of the quadratic formula (taking only the positive square root of the discriminant).

To project the sherd back into 2D, we compute the plane coordinates of all the circle intersection points. To form the closed polygon, we connect the intersection points in the same order as their originating points on the profile, while skipping circles that have no intersection with the plane. To add further realism to the generated fracture, we need to reduce its extent to match the dimensions of real potsherds. Therefore, we cut the resulting polygon using two almost-horizontal lines, one at the upper part of the polygon and one at the lower part (also shown in Fig 6(d)).
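As an illustration of the two steps above (a minimal sketch, not the authors' released code; the function names, the plane parameterization, and the NumPy usage are our own assumptions), the following intersects each profile circle with a cutting plane in constant time, consistently keeps one root of the quadratic, and projects the surviving points into plane coordinates. The final trimming by two almost-horizontal lines is omitted for brevity.

```python
import numpy as np

def circle_plane_intersection(px, py, n, d):
    """One 3D intersection point of the circle x^2 + y^2 = px^2 at height z = py
    with the plane n . x = d, or None if they do not intersect."""
    r = abs(px)
    c = d - n[2] * py                  # at height py, n . x = d reduces to n_x*x + n_y*y = c
    m2 = n[0] ** 2 + n[1] ** 2
    if m2 == 0.0:                      # horizontal cutting plane; excluded by the <20 deg constraint
        return None
    h2 = r ** 2 - c ** 2 / m2          # squared half-chord length
    if h2 < 0.0:                       # the plane misses this circle entirely
        return None
    foot = np.array([n[0], n[1]]) * (c / m2)        # point of the chord line closest to the axis
    tang = np.array([-n[1], n[0]]) / np.sqrt(m2)    # unit direction along the chord line
    x, y = foot + np.sqrt(h2) * tang   # always the "+" root, i.e. the same side of the cut
    return np.array([x, y, py])

def fracture_outline(profile, n, d):
    """2D plane coordinates of the fracture outline, in profile order."""
    n = np.asarray(n, float)
    norm = np.linalg.norm(n)
    n, d = n / norm, d / norm
    v = np.array([0.0, 0.0, 1.0]) - n[2] * n        # in-plane "up" direction (projected z axis)
    v /= np.linalg.norm(v)
    u = np.cross(n, v)                              # completes the in-plane orthonormal basis
    pts = (circle_plane_intersection(px, py, n, d) for px, py in profile)
    pts = [p for p in pts if p is not None]         # skip circles that miss the plane
    return np.array([[p @ u, p @ v] for p in pts])

# Toy usage: a crude cylinder-like profile cut by a nearly vertical plane.
profile = [(5.0, z) for z in np.linspace(0.0, 10.0, 50)]
outline2d = fracture_outline(profile, n=[1.0, 0.2, 0.1], d=2.0)
```

In this sketch, each call to circle_plane_intersection is O(1), so the whole outline is produced in time linear in the number of profile points, matching the complexity claim above.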

The entire generation process can be done in linear time, without the need to simplify the outline, thus providing a significant improvement over the method proposed in (Banterle et al. 2017). The process is also much easier to implement, and does not require 2D envelope computations, fracture type analysis, and other complexities of that method, making it fast enough for generating training data on the fly.

All synthetic outlines are translated so as to have their center of mass at the origin. Pottery typology is such that a certain feature may be associated with one class or another depending on the size of the vessel, making scale information crucial for proper classification. Therefore, unlike most recently reported applications of point clouds, we do not normalize each input to fit the unit sphere.
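A minimal sketch of this pre-processing choice (illustrative only; the function name is ours): the outline is re-centered at its center of mass, but, unlike the common unit-sphere normalization for point clouds, its physical scale is preserved.

```python
import numpy as np

def center_outline(points_mm):
    """Translate the outline so its center of mass sits at the origin.
    Note: no division by a bounding radius; absolute size (in mm) remains a feature."""
    points_mm = np.asarray(points_mm, float)
    return points_mm - points_mm.mean(axis=0)
```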

III. Point sampling. As we are developing a point-cloud based architecture, discrete points along the outlines from the previous steps must be sampled in order to create a suitable input. Since the drawings are scanned at high resolution, to capture as many details as possible, artifacts resulting from the printing process may be visible (see Fig 7(a)). As a result, some of these artifacts may be reflected in the traced outline (see Fig 7(b)). Sampling points along the outline at such fine resolutions may therefore capture features that are mere artifacts, unrelated to the actual pottery. To avoid reflecting these artifacts in the point cloud, it is necessary to limit the sampling resolution of the outlines.

Some recent point-cloud architectures, including PointNet++ (Qi et al. 2017b) and PointCNN (Hua, Tran, and Yeung 2018), work on a fixed number of points, while others, including PointNet, require the same number of points in each sample to train efficiently. However, in the case of potsherd identification, using the same number of points for all potsherds is detrimental: to distinguish small features we must sample along the outline at a fine resolution (every 2–3 mm) on potsherds of varying sizes.


Figure 6: Sketch processing. (a) A sketch with the inner and outer profiles and rotation axis. (b) The rotation process. The inner and outer profiles are positioned for rotation around the rotation axis. (c) A cutting plane P through the 3D pottery. (d) The complete fracture face. In practice, only one of the two edges (colored orange and blue) is present in most excavated sherds. We further cut the top and bottom of the fracture, using two lines, to create a sherd with more realistic edges and size.

Using the number of points needed to capture the details of larger sherds also on smaller ones would lead to sampling resolutions of 0.5 mm (or less) on the smaller sherds, which may start to reflect the printing artifacts mentioned above; see Fig 7(c). Furthermore, even if no such artifacts are present in the tracing process, learning such fine features would lead to a loss of robustness to small defects in the clay or to tracing errors.

To overcome this issue, we allow sampling fewer points in cases where the outline is not long enough. We set the number of points during training to a fixed value K, and always sample randomly min{K, length/resolution} points from each outline, as shown in Fig 7(d). If fewer than K points are sampled, we randomly repeat some points to reach K points. The network employs max-pooling, as detailed in the Network Architecture section, and appears able to overcome this inconsistency in sampling.
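This sampling rule can be sketched as follows (illustrative only; the helper name, the perimeter computation, and the padding strategy are our assumptions about details not spelled out above):

```python
import numpy as np

def sample_outline(outline_mm, K=512, resolution_mm=2.0, rng=None):
    """Randomly sample min{K, length/resolution} outline points, padding back to K."""
    rng = rng or np.random.default_rng()
    outline_mm = np.asarray(outline_mm, float)
    # Perimeter of the closed outline polygon, in mm.
    seg = np.linalg.norm(np.diff(outline_mm, axis=0, append=outline_mm[:1]), axis=1)
    length = seg.sum()
    n = min(K, max(1, int(length / resolution_mm)))      # resolution-limited point budget
    idx = rng.choice(len(outline_mm), size=n, replace=n > len(outline_mm))
    pts = outline_mm[idx]
    if n < K:                                            # repeat random points to reach K
        pts = np.concatenate([pts, pts[rng.integers(0, n, size=K - n)]])
    return pts
```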

Finally, we note that while we restricted training to low sampling resolutions (up to 512 points, with a maximal resolution of one point every 2 mm along the outline), when evaluating on real data we use 1024 randomly sampled points, with a lower limit of 1 mm on the resolution. In this way, we gain the efficiency of training on a fixed number of points and avoid overfitting to small details during training, while benefiting from the added information at test time.

IV. Data augmentation and generalization to real data. Since learning is performed on purely synthetic data, strong augmentation is a key factor in obtaining the robustness required to generalize to real data. Aside from adding random uniform jitter to each point, it is critical to also consider human errors throughout the data acquisition process. A typical photography setup is depicted in Fig 8.

When photographing potsherds, it is important that the fracture be aligned with the image: the sherd's vertical axis (that is, the rotation axis) should be aligned with the vertical axis of the image, and the fracture surface should be kept parallel to the horizontal plane, to minimize distortions in the acquired fracture shape (see Fig 1(a)). Note that the user has no difficulty in approximating the vertical axis z, since the manufacturing process creates shapes with dominant circles around z. The ability of users to properly align the vertical axis (aligning both the vertical axis to the rotation axis and the fracture surface to the image plane) was verified in field trials.

Despite the intuitive ability of archeologists to align the fracture correctly, this alignment is inexact, since it is a manual process. To achieve robustness to errors in this alignment process, we simulate a small random 3D rotation (the angle is sampled from a normal distribution with µ = 0° and σ = 10°) on each fracture before projecting it onto a 2D outline.

Another concern regarding data acquisition quality arises from the nature of the field work. With one hand operating the camera and the other holding the potsherd, the ruler used for inferring scale information is often left on the table rather than held at the same distance as the fracture surface; see Fig 1(a). This seemingly small difference in distance from the camera, combined with close-range photography, has been empirically shown to lead to scale computations that make sherds appear up to 50% larger than their actual size. To achieve robustness to this sort of issue, we add a random scale factor, sampled from a Gaussian distribution with a variance of 0.8 and a mean of 1.2 (not 1.0, since the scale extraction process is biased).
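A sketch of the augmentations described in this step (illustrative; the jitter magnitude and the clipping of the scale factor are our own choices, and in the full pipeline the rotation angle is applied to the 3D fracture before projection):

```python
import numpy as np

def augment_outline(points_mm, rng=None):
    """Sample the augmentation parameters described above and apply the 2D ones."""
    rng = rng or np.random.default_rng()
    points_mm = np.asarray(points_mm, float)

    # Small random 3D rotation angle, N(mu=0 deg, sigma=10 deg); applied upstream,
    # to the 3D fracture, before it is projected onto a 2D outline.
    rotation_deg = rng.normal(0.0, 10.0)

    # Biased scale factor: Gaussian with mean 1.2 and variance 0.8 (std = sqrt(0.8)),
    # clipped to stay positive (the clipping is our assumption).
    scale = max(rng.normal(1.2, np.sqrt(0.8)), 0.1)

    # Random uniform per-point jitter (magnitude assumed, in mm).
    jitter = rng.uniform(-0.5, 0.5, size=points_mm.shape)

    return scale * (points_mm + jitter), rotation_deg
```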

B Hyperparameters of CareLoss

In the paper, we introduced CareLoss, a new sample reweighting scheme. The scheme consists of two weights: a weight u, which is applied to each sample based on its ground-truth label, and a weight v, which is based on the predicted label. The weights u and v contain as hyperparameters the coefficients α_u and α_v, and are updated using a moving average with momentum γ every b batches, for a total of four hyperparameters.

In this appendix, we analyze the effect of changing the values of these parameters, and demonstrate the stability of training across a wide range of parameter values. All experiments were performed on OutlineNet with augmented input data, using the same datasets, learning rate, and optimizer as in the experiments section. The results for all experiments are reported after 200K batches.


B.1 The Hyperparameters α_u and α_v

Recalling the definitions from the main paper, the weights u and v are defined as follows:

$$u(f, y_i) := \exp\big(-\alpha_u\,\psi(f, y_i)\big) \tag{6}$$
$$v(f, y_i) := \exp\big(+\alpha_v\,\rho(f, y_i)\big) \tag{7}$$
$$\bar{u}(f, y_i) := \frac{u(f, y_i)}{\sum_j u(f, j)} \tag{8}$$
$$\bar{v}(f, \hat{y}_i, y_i) := \frac{1}{\eta}\left(1 + I_{\mathrm{miss}}(\hat{y}_i, y_i)\,\frac{v(f, \hat{y}_i)}{\sum_j v(f, j)}\right) \tag{9}$$

The weights u(f, j) can be viewed as the result of a softmax over the mean accuracy of the different classes, with −α_u as the exponent multiplier. The weights v(f, j, j′) have a slightly more elaborate formulation, with +α_v as the exponent multiplier. Setting both α parameters to values closer to zero reduces the effect of the class (either the ground-truth one or the predicted one) on the weights of the samples. Nevertheless, in v higher weights are assigned to misclassified samples regardless of their class, since Eq. 9 treats correctly classified and misclassified samples differently. α_v and α_u therefore behave differently near zero, and we test different ranges for the two.
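A minimal sketch of how Eqs. (6)–(9) could be evaluated (our reading of the formulas; the per-class statistics standing in for ψ and ρ, the constant η, and all names are assumptions rather than the released implementation):

```python
import numpy as np

def care_weights(class_acc, class_fpr, y_true, y_pred, alpha_u=6.0, alpha_v=5.0):
    """Per-sample CareLoss weights for a batch with labels y_true and predictions y_pred.
    class_acc[c] and class_fpr[c] are the running accuracy / false-positive rate of class c."""
    class_acc, class_fpr = np.asarray(class_acc), np.asarray(class_fpr)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    u = np.exp(-alpha_u * class_acc)          # Eq. (6): weaker classes get larger u
    v = np.exp(+alpha_v * class_fpr)          # Eq. (7): over-predicted classes get larger v
    u_bar = u / u.sum()                       # Eq. (8): softmax-like normalization over classes
    miss = (y_true != y_pred).astype(float)   # I_miss: 1 for misclassified samples
    eta = 2.0                                 # normalizer eta (value assumed here)
    v_bar = (1.0 + miss * v[y_pred] / v.sum()) / eta   # Eq. (9)
    return u_bar[y_true], v_bar
```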

We carry out the analysis over all 12 combinations of α_u ∈ {2, 4, 6, 8} and α_v ∈ {0, 5, 10}. We use the same values of b and γ as in all other CareLoss experiments in the paper (b = 50 and γ = 0.8). The results are presented in Table 4(a). As can be seen, most metrics report lower accuracy when both α_u and α_v are high, possibly due to their accumulated effect, which results in strong weight changes between samples in the same batch. When both are low, we also observe lower accuracy, due to the reduced effect of the weighting. Better performance is obtained when balancing both α_u and α_v in the middle of the suggested ranges.

B.2 Moving average parameters

Both u and v are weights that depend on empirical estimates of the model's performance (class accuracy and false-positive rates), thus requiring the continuous evaluation of performance metrics for updating the weights. To perform the updates, we keep track of the model's confusion matrix over the course of training, updating the weights every b batches with a momentum of γ. While lower values of γ and lower values of b may allow the loss function to respond faster to changes in the model's performance, they are also more likely to introduce sharper changes in the loss function, which could make the model harder to train.
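A sketch of this bookkeeping (illustrative; the class name, the use of per-class accuracy only, and the reset of the counting window are our assumptions):

```python
import numpy as np

class ConfusionTracker:
    """Running confusion matrix feeding the CareLoss weights."""

    def __init__(self, num_classes, b=50, gamma=0.8):
        self.counts = np.zeros((num_classes, num_classes))       # rows: ground truth, cols: prediction
        self.class_acc = np.full(num_classes, 1.0 / num_classes) # smoothed per-class accuracy
        self.b, self.gamma, self.batches = b, gamma, 0

    def update(self, y_true, y_pred):
        np.add.at(self.counts, (y_true, y_pred), 1)
        self.batches += 1
        if self.batches % self.b == 0:            # refresh the weights every b batches
            acc = np.diag(self.counts) / np.maximum(self.counts.sum(axis=1), 1)
            # Moving average with momentum gamma: higher gamma = smoother, slower shifts.
            self.class_acc = self.gamma * self.class_acc + (1 - self.gamma) * acc
            self.counts[:] = 0                    # start a fresh counting window (our choice)
```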

We carry out the analysis over all 12 combinations of b ∈ {10, 50, 100, 500} and γ ∈ {0.5, 0.8, 0.9}. We use the same values of α_u and α_v as in all other CareLoss experiments in the paper (α_u = 6 and α_v = 5). The results are presented in Table 4(b). As can be seen, for most values of b, setting a low value for γ (0.5) results in lower accuracy in all reported scenarios. This can potentially be explained by the more rapid shifts that are created in the loss function when the moving average has less momentum. Interestingly, we also observe peak performance with γ = 0.9 and b = 10, suggesting that rapid updates regulated by a strong momentum may perform better than accumulating results and performing less frequent updates. This may be explained by the ability to react more quickly to bad training behaviors, before they have time to “sink in” while the CareLoss weights are waiting for the next update.

B.3 Summary of parameter sensitivity analysis

The analysis shows that the model exhibits similar training results across a wide range of parameter values, not only those chosen in the comparisons of the paper. In the first part, we showed that it is better to balance α_u and α_v than to try to maximize both of them. In the second part, we showed that strong momentum on the CareLoss weights, which stabilizes the loss function, helps achieve better results. Additionally, when using strong momentum, we observe that it may be better to perform more frequent updates to the CareLoss weights rather than accumulating performance statistics over time and updating only periodically.

C Source code and reproducibility

For the sake of reproducibility and to promote research in AI for ceramicology, we are making the full code of OutlineNet available; training, classification, evaluation, and alternatives are all included in the attached supplementary material. The repository is accompanied by a README file with full details, and the code will be released as open source along with the publication.


Table 4: The accuracy of the OutlineNet model using different parameters for the CareLoss function. (a) Accuracy measures when changing the hyperparameters α_u and α_v. (b) Accuracy measures for different parameters of the moving average.


Figure 7: Propagation of artifacts as a function of sampling resolution. (a) A scan of a drawing from the catalog, depicting only the rim of a vessel and scanned at high resolution. Printing artifacts are clearly visible. (b) Accurate tracing of the drawing propagates some of the printing artifacts as rough edges. (c) Fixed-count sampling, matching the number of points required to achieve 2 mm resolution on some of the larger potsherds. Due to the sample density, the tracing artifacts are still present. (d) Resolution-limited sampling, sampling every 2 mm at the scale of the real pottery. Most artifacts are no longer visible.

Figure 8: Examples of archeological data acquisition processes. Left: in the field; right: in the lab. Images are taken from above, with both the scale-reference ruler and the potsherd positioned on a table.

