
Aquarium Family Fish Species Identification System Using Deep Neural Networks

Nour Eldeen M. Khalifa1,2(✉), Mohamed Hamed N. Taha1,2, and Aboul Ella Hassanien1,2

1 Information Technology Department, Faculty of Computers and Information, Cairo University, Giza, Egypt

{nourmahmoud,mnasrtaha,aboitcairo}@cu.edu.eg

2 Scientific Research Group in Egypt (SRGE), Giza, Egypt

http://www.egyptscience.net

Abstract. In this paper, a system for aquarium family fish species identification is proposed. It identifies eight family fish species along with 119 sub-species. The proposed system is built using deep convolutional neural networks (CNN). It consists of four layers: two convolutional and two fully connected layers. A comparison is presented against other CNN architectures such as AlexNet and VggNet according to four parameters (the number of convolution and fully connected layers, the number of epochs in the training phase needed to achieve 100% accuracy, validation accuracy, and testing accuracy). The paper shows that the proposed system has competitive results against the other architectures. It achieved 85.59% testing accuracy while AlexNet achieved 85.41% on an untrained benchmark dataset. Moreover, the proposed system uses fewer training images, less memory, and less computational complexity in the training, validation, and testing phases.

Keywords: Deep learning · Deep neural · Fish identification · Convolutional neural networks

1 Introduction

Fish species observation and identification in the aquarium are considered very informative for tourists. The aquarium is equipped with a camera, and when a fish passes in front of it, an identification system is triggered to classify the fish and display information on the screen, as illustrated in Fig. 1. This scenario is one of the main motivations of this research. The research area is also important for academic researchers such as ocean scientists and biologists. Commercial applications such as fish farming depend on fish species observation to achieve their benefits. Such observation involves time-consuming and destructive measures to obtain physical samples and visual census. However, these approaches are still common.

Fish species recognition is a challenging research problem. The main challenges arise from the special properties of underwater videos and images. Prior work on fish recognition was limited to constrained environments [1]. Most recognition research focuses on ground objects.

© Springer Nature Switzerland AG 2019
A. E. Hassanien et al. (Eds.): AISI 2018, AISC 845, pp. 347–356, 2019.
https://doi.org/10.1007/978-3-319-99010-1_32





However, there is a great demand for underwater object recognition. In the last two decades, many machine learning and image processing algorithms have been proposed for underwater species classification [2].

The convolution operation is well known in the computer vision and signal processing communities. It is frequently used in conventional computer vision, especially for noise reduction and edge detection [3].
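As an illustration of how convolution is used for edge detection, the following is a minimal sketch in pure Python. The Sobel kernel is a standard edge filter; the tiny image and the names `convolve2d`, `SOBEL_X`, and `edges` are assumptions made for this example and do not come from the paper.

```python
# Minimal 2D "valid" convolution in pure Python (illustrative sketch only).

def convolve2d(image, kernel):
    """Slide the flipped kernel over the image and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    # True convolution flips the kernel in both directions.
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * flipped[a][b]
            row.append(total)
        output.append(row)
    return output

# Sobel kernel that responds to horizontal intensity changes (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Hypothetical 4x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9]]

edges = convolve2d(image, SOBEL_X)
for row in edges:
    print(row)
```

The response is zero where the window covers a flat region and non-zero where the window spans the dark-to-bright boundary, which is what makes the operation useful as an edge detector.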

The idea of a Convolutional Neural Network (CNN) is not recent. In [4], a CNN achieved great results for handwritten digit recognition. However, CNNs slowly fell out of favor due to memory and hardware constraints, as well as the lack of large amounts of training data. They were unable to scale to much larger images. With the huge increase in processing power and memory size and the availability of powerful GPUs and large datasets, it became possible to train larger and more complex models. Machine learning researchers had been working on models that learn and extract features from images. This led to the first deep learning models. AlexNet [5], Vgg-16, and Vgg-19 [6] are among the best-known deep convolutional neural networks.

Deep learning has achieved significant results and a huge improvement in visual detection and recognition across many categories [7]. Raw images are used by deep learning as input, without the need for expert knowledge to optimize segmentation parameters or design features.

Prior research did not achieve satisfying results. Firstly, most of the fish images were captured under constrained conditions. Secondly, the datasets were probably small. Thirdly, the accuracy was unsatisfying under both constrained and unconstrained conditions.

Early methods for fish species classification were performed in controlled environments only. In [8], dead fish samples in the laboratory were used for classification based on color and shape. Storbeck and Daan [9] proposed the use of laser light for 3D modeling of fish to measure features such as the height, length, and thickness of some species. Unconstrained classification of underwater fish is a very difficult and challenging task. The similarity in color, shape, and texture of different fish is another challenge in the classification of species. The works in [9, 10] proposed two classical methods for fish species classification in unconstrained environments, based on texture and shape.

Fig. 1. The design of aquarium family fish species identification system

2 Deep Convolutional Neural Networks

Deep learning is a data-driven method. Both the distinctive features and the classifier are trained simultaneously. Deep neural networks can learn a hierarchical representation of data. In addition, the data representation improves as the number of layers increases [11].

A filter bank layer, a nonlinear transformation, and a feature pooling layer are the main stages of feature extraction. They are very common in object recognition systems [12].

A CNN typically consists of several layers that act as the stages mentioned above for object recognition systems. The convolutional filter bank is used for local pattern extraction. Each convolutional layer in the CNN is followed by a nonlinear processing layer [13], which forms a complex nonlinear model by capturing the nonlinear dynamics of the input data. The goal of the feature pooling layer is to decrease the resolution of the feature maps [14].
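The nonlinearity and pooling stages described above can be sketched in a few lines of pure Python. This is an illustration only: the ReLU nonlinearity and 2 × 2 max pooling are common choices assumed here, and the feature map values and the names `relu`, `max_pool`, and `pooled` are hypothetical.

```python
# Sketch of the nonlinearity and pooling stages that follow a convolution.

def relu(feature_map):
    """Replace every negative activation with zero."""
    return [[max(0, value) for value in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; assumes both dimensions are divisible by size."""
    pooled_rows = []
    for i in range(0, len(feature_map), size):
        row = []
        for j in range(0, len(feature_map[0]), size):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window))
        pooled_rows.append(row)
    return pooled_rows

# Hypothetical 4x4 feature map produced by a convolutional filter.
feature_map = [[-3, 5, 2, -1],
               [4, -2, 0, 7],
               [1, 1, -6, -2],
               [-4, 3, 2, -8]]

activated = relu(feature_map)
pooled = max_pool(activated)  # resolution drops from 4x4 to 2x2
print(pooled)
```

Pooling keeps only the strongest response in each window, so the feature map shrinks while the most salient activations are retained.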

3 Related Works

Training on datasets with large variations of background and objects in the images gives the CNN the ability to extract information about objects of interest based on their color, texture, and shape. Thus, any visual pattern can be captured and learned by a suitable network. As the number of examples of a specific object increases, the generalization capability of the network increases. This capability gives the trained network the ability to classify data that was not used for training [15]. AlexNet, Vgg-16, and Vgg-19 are examples of pre-trained deep CNNs. The knowledge inside each of those deep CNNs can be used by researchers for training and testing on other datasets. Knowledge transfer is one of the main advantages of deep CNNs and improves the usability and accuracy of deep neural networks [16].

3.1 AlexNet

AlexNet is considered one of the most famous high-performing deep convolutional neural networks. It was trained on 1.2 million images and can classify 1000 different objects. The network has nearly 60 million parameters and about 650,000 neurons [5]. The AlexNet structure consists of five convolutional layers and two fully connected hidden layers, followed by a 1000-way fully connected output layer.

The first layer is a convolutional layer. It filters the 224 × 224 × 3 input image with 96 kernels of size 11 × 11 × 3 and a stride of 4 pixels. After pooling and normalization, the output of the first layer becomes the input to the second convolutional layer, which filters it with 256 kernels of size 5 × 5 × 48 and then applies pooling and normalization to the output. The third layer has 384 kernels of size 3 × 3 × 256 and takes the pooled and normalized response of the second layer. There is no pooling or normalization between the third, fourth, and fifth layers; they are connected directly to one another. The fourth and fifth convolutional layers have 384 kernels of size 3 × 3 × 192 and 256 kernels of size 3 × 3 × 192, respectively. The output of the fifth layer is pooled and becomes the input to the sixth layer. The sixth and seventh layers each consist of 4096 fully connected neurons. The last layer has 1000 fully connected neurons.
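To make the kernel sizes above concrete, the following sketch counts the weights in each AlexNet convolutional layer as kernels × height × width × depth. It assumes 384 kernels in the fourth layer, as in the AlexNet paper [5], and excludes biases. The names `ALEXNET_CONV_LAYERS` and `conv_weight_count` are made up for this illustration.

```python
# Weight count per AlexNet convolutional layer (biases excluded).
# Each entry: (name, number of kernels, kernel height, kernel width, kernel depth).
ALEXNET_CONV_LAYERS = [
    ("conv1", 96, 11, 11, 3),
    ("conv2", 256, 5, 5, 48),
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),  # assumption: 384 kernels, as in [5]
    ("conv5", 256, 3, 3, 192),
]

def conv_weight_count(kernels, height, width, depth):
    """Every kernel holds height * width * depth weights."""
    return kernels * height * width * depth

weights = {name: conv_weight_count(k, h, w, d)
           for name, k, h, w, d in ALEXNET_CONV_LAYERS}
total_conv_weights = sum(weights.values())

for name, count in weights.items():
    print(f"{name}: {count:,} weights")
print(f"total: {total_conv_weights:,} weights")
```

Under these assumptions the convolutional layers account for only a few million of the nearly 60 million parameters, which suggests that most parameters sit in the fully connected layers.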

3.2 VGG-16 and VGG-19

VGG-19 [6] is another famous example of a deep CNN. Stacks of small convolutional filters are a notable feature of the VGG design. The use of very small, same-sized convolutional filters in all network layers is what makes its architecture distinctive. The depth of the VGG network was increased by adding more convolutional layers, applying the philosophy of deeper-is-better. One disadvantage of these very deep networks is that they become very difficult to train [17].

4 Fish Dataset

The dataset used in this research is taken from the QUT Robotics fish dataset [18]. This dataset consists of 3,960 images and contains real-world images of fish captured in conditions defined as “controlled”, “out-of-the-water” and “in situ.” The “controlled” images consist of several types of fish specimens, with their fins spread, taken against a constant background with controlled illumination. The “in situ” images are taken underwater in the natural habitat of the fish with no control over background or illumination. The “out-of-the-water” images consist of fish specimens taken out of the water, with a varying background and limited control over the illumination conditions.

In this research, eight family species of fish were selected according to the availability of the captured images. The size of the images varies in width and height. Table 1 describes the fish dataset, including the number of sub-species and the number of training, validation, and testing images per family. Testing images are taken from a different benchmark dataset, LifeClef2015 [19]. The LifeClef’15 dataset contains more than 20,000 images of fish divided into 15 classes of species, details of which are given in [19]. For each species, this dataset has a different number of available images. In this research, the same eight training classes were selected.

5 The Proposed Deep CNN System

The architecture of the deep network proposed for aquarium family fish species identification is introduced in detail in Figs. 2 and 3. Figure 2 gives an abstract graphical representation of the proposed system, while Fig. 3 provides the detailed architecture. The proposed system is a simplified version of AlexNet [5]. AlexNet was selected because it contains the smallest number of layers among the compared architectures and achieves acceptable training and validation accuracy of over 90%.


The proposed new version is adapted and reduced from AlexNet to limit the number of parameters, the computational complexity (in the training, validation, and testing phases), and the memory required. It consists of 10 layers in total, including two convolutional layers for feature extraction followed by two fully connected layers for classification.

The first layer is the input layer. The second layer is a convolution layer. The third layer is a Rectified Linear Unit (ReLU), which is used as the nonlinear activation function, followed by the fourth layer (a convolution layer). Moderate pooling is then performed, with subsampling applied after the preceding convolution. The first fully connected layer has 256 neurons with a ReLU activation function, while the last fully connected layer has 8 neurons and uses a soft-max layer to obtain class memberships, as illustrated in Fig. 3.
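The soft-max step on the final 8-neuron layer can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the raw scores are invented, and the ordering of the family names in `FAMILIES` is an assumption taken from Table 1.

```python
import math

# The eight fish families from Table 1 (the index order is an assumption).
FAMILIES = ["Bodianus", "Coris", "Epinephelus", "Halichoeres",
            "Lethrinus", "Lutjanus", "Pseudanthias", "Thalassoma"]

def softmax(scores):
    """Turn raw scores into class memberships that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical outputs of the last fully connected layer for one image.
scores = [0.2, -1.0, 3.1, 0.5, 0.0, 1.7, -0.3, 0.9]

probabilities = softmax(scores)
predicted_family = FAMILIES[probabilities.index(max(probabilities))]
print(predicted_family)
```

The class with the largest membership value is reported as the predicted family.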

Visualizing the feature extraction phase in the proposed deep neural architecture gives a better understanding of it. Fig. 4 shows the different images that result from applying the first convolution layer and ReLU to the input image. The second convolution layer and its ReLU produce more details (more features) from the output images of the first convolution and ReLU layer, as illustrated in Fig. 5.

Fig. 2. Abstract view of the proposed deep CNN architecture

Fig. 3. Detailed component architecture for the proposed deep CNN system

Table 1. Fish family species distribution for training, validation, and testing. The training and validation sets come from the QUT dataset; the testing set comes from LifeClef’15.

Fish family    Sub-species   Total images   Training (QUT)   Validation (QUT)   Testing (LifeClef’15)
Bodianus       9             111            64               18                 29
Coris          8             96             67               19                 10
Epinephelus    29            286            188              54                 44
Halichoeres    16            215            132              38                 45
Lethrinus      12            143            91               26                 26
Lutjanus       20            325            204              58                 63
Pseudanthias   16            201            133              38                 30
Thalassoma     9             144            89               25                 30
Total          119           1521           968              276                277
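The counts listed in Table 1 can be tallied with a short script, which also gives the share of QUT images used for training versus validation. The numbers are copied from Table 1; the variable names are chosen only for this sketch.

```python
# (family, sub-species, training, validation, testing) as listed in Table 1.
TABLE_1 = [
    ("Bodianus", 9, 64, 18, 29),
    ("Coris", 8, 67, 19, 10),
    ("Epinephelus", 29, 188, 54, 44),
    ("Halichoeres", 16, 132, 38, 45),
    ("Lethrinus", 12, 91, 26, 26),
    ("Lutjanus", 20, 204, 58, 63),
    ("Pseudanthias", 16, 133, 38, 30),
    ("Thalassoma", 9, 89, 25, 30),
]

total_subspecies = sum(row[1] for row in TABLE_1)
total_training = sum(row[2] for row in TABLE_1)
total_validation = sum(row[3] for row in TABLE_1)
total_testing = sum(row[4] for row in TABLE_1)
total_images = total_training + total_validation + total_testing

# Training and validation images both come from the QUT dataset.
qut_images = total_training + total_validation
training_share = total_training / qut_images

print(total_subspecies, total_images)
print(f"training share of QUT images: {training_share:.1%}")
```

The tallies agree with the totals row of Table 1 (119 sub-species and 1521 images).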


6 Experiment Environment

The proposed system was implemented using a commercial software package (MATLAB), and the implementation was GPU specific. All experiments were conducted on a server with an Intel Xeon E5-2620 processor (2 GHz), 96 GB of RAM, and a Titan X GPU.

7 Experimental Results

To evaluate the proposed system, a separate fish benchmark dataset that was not used in training (LifeClef’15) was used for testing. The system was compared against AlexNet, Vgg-16, and Vgg-19. The parameters used for comparison were the number of convolution and fully connected layers, the number of epochs in the training phase needed to achieve 100% accuracy, the validation accuracy (QUT training dataset), and the testing accuracy (LifeClef’15). Table 2 presents the comparative results for family fish species identification.

The first comparative parameter is the number of convolution and fully connected layers. The proposed system has the fewest layers among the compared architectures, which means less computational complexity (in the training, validation, and testing phases) and less memory.

Fig. 4. Typical first convolutional and ReLU layer features visualization

Fig. 5. Typical second convolutional and ReLU layer features visualization


The second comparative parameter is the number of epochs in the training phase needed to achieve 100% accuracy. The proposed system requires the largest number of epochs, which was expected: unlike the other architectures, the proposed network had never been trained before, so it takes more epochs to reach 100% accuracy in the training phase.
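The "epochs to reach 100% training accuracy" measure can be read off a per-epoch accuracy history as sketched below. The history values are invented for illustration and are not the paper's training curves; `first_epoch_reaching` is a hypothetical helper name.

```python
def first_epoch_reaching(history, target=100.0):
    """Return the 1-based epoch at which accuracy first reaches the target, or None."""
    for epoch, accuracy_value in enumerate(history, start=1):
        if accuracy_value >= target:
            return epoch
    return None

# Hypothetical training accuracy (%) recorded after each epoch.
training_history = [41.0, 63.5, 80.2, 92.7, 98.9, 100.0, 100.0]

epochs_needed = first_epoch_reaching(training_history)
print(epochs_needed)
```

A network that starts from pre-trained weights would typically reach the target in fewer epochs than one trained from scratch, which is the effect described above.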

The third comparative parameter is the validation accuracy (QUT training dataset), which was used in the training phase. The proposed system achieved 97.10%. This accuracy is competitive with AlexNet, which achieved 98.63%, although AlexNet had already been trained on about a million images, as had Vgg-16 and Vgg-19.

The fourth comparative parameter is the testing accuracy on the LifeClef’15 benchmark. The proposed system achieved 85.59% accuracy on the untrained testing data, which outperforms AlexNet at 85.41%. The margin is small, but all the other architectures were pre-trained, a process that took days of running in MATLAB to build their network weights.
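Testing accuracy here is taken to mean the percentage of test images whose predicted family matches the true family; that definition is an assumption, since the paper does not spell it out. The sketch below computes it for a handful of made-up labels, not for the LifeClef’15 results.

```python
def accuracy(true_labels, predicted_labels):
    """Percentage of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

# Hypothetical ground truth and predictions for five test images.
true_labels = ["Coris", "Coris", "Lutjanus", "Bodianus", "Lutjanus"]
predicted_labels = ["Coris", "Lutjanus", "Lutjanus", "Bodianus", "Lutjanus"]

test_accuracy = accuracy(true_labels, predicted_labels)
print(f"{test_accuracy:.2f}%")
```

Validation accuracy would be computed the same way on the held-out QUT validation images.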

The proposed system does not outperform the other architectures in validation accuracy, nor does it outperform Vgg-16 and Vgg-19 in testing accuracy (as those architectures come with pre-trained weights), but it achieves better testing accuracy than AlexNet. The proposed system has fewer layers, is tailored for fish family species identification, takes less time in the classification process, and can be used in real-time applications for fish family species identification in aquariums.

Table 2. Comparative results for fish family species identification

Model             Convolution and fully   Epochs in training phase   Validation accuracy      Testing accuracy
                  connected layers        to achieve 100% accuracy   (QUT training dataset)   (LifeClef’15)
AlexNet           7                       13                         98.63%                   85.41%
Vgg16             16                      10                         99.04%                   87.86%
Vgg19             19                      8                          99.64%                   89.89%
Proposed system   4                       21                         97.10%                   85.59%

8 Conclusions

A real-time aquarium fish identification system based on family species is an important topic. It will help tourists learn more about the fish that pass in front of them. In this research, a system based on a deep neural network is introduced. The proposed system is a simplified version of AlexNet. It consists of four layers: two convolution layers and two fully connected layers. It can identify and classify eight family fish species with 119 sub-species. A comparison is presented, and it shows that the proposed system could not outperform Vgg-16 and Vgg-19 in validation and testing accuracy, as they are large deep neural networks already trained on about a million images, while the proposed system was trained on 1521 images. The proposed system outperforms AlexNet by a small margin in testing accuracy. It achieves 85.59% while AlexNet achieves 85.41%, and it uses fewer training images, less memory, and less computational complexity in the training, validation, and testing phases.

Acknowledgements. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.

References

1. Fouad, M.M.M., Zawbaa, H.M., El-Bendary, N., Hassanien, A.E.: Automatic Nile Tilapia fish classification approach using machine learning techniques. In: 13th International Conference on Hybrid Intelligent Systems, HIS 2013, pp. 173–178. IEEE (2013)

2. Fouad, M.M., Zawbaa, H.M., Gaber, T., Snasel, V., Hassanien, A.E.: A fish detection approach based on BAT algorithm. In: The 1st International Conference on Advanced Intelligent System and Informatics, AISI 2015, pp. 273–283. Springer, Beni Suef (2016)

3. Dominguez, A.: A history of the convolution operation [Retrospectroscope]. IEEE Pulse 6, 38–49 (2015). https://doi.org/10.1109/MPUL.2014.2366903

4. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998). https://doi.org/10.1109/5.726791

5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, pp. 1097–1105. Curran Associates Inc. (2012)

6. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

7. Jiang, H., Learned-Miller, E.: Face detection with the faster R-CNN. In: 2017 12th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2017, pp. 650–657. IEEE (2017)

8. Strachan, N., Kell, L.: A potential method for the differentiation between haddock fish stocks by computer vision using canonical discriminant analysis. ICES J. Mar. Sci. 52, 145–149 (1995). https://doi.org/10.1016/1054-3139(95)80023-9

9. Rova, A., Mori, G., Dill, L.M.: One fish, two fish, butterfish, trumpeter: recognizing fish in underwater video. In: IAPR Conference on Machine Vision Applications, Tokyo, Japan, pp. 404–407 (2007)

10. Spampinato, C., Giordano, D., Di Salvo, R., Chen-Burger, Y.-H.J., Fisher, R.B., Nadarajan, G.: Automatic fish classification for underwater species behavior understanding. In: Proceedings of the First ACM International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams, ARTEMIS 2010, p. 45. ACM Press, New York (2010)

11. Khalifa, N.E.M., Taha, M.H.N., Hassanien, A.E., Selim, I.M.: Deep galaxy: classification of galaxies based on deep convolutional neural networks (2017). arXiv preprint arXiv:1709.02245

12. Sainath, T.N., Kingsbury, B., Mohamed, A., Ramabhadran, B.: Learning filter banks within a deep neural network framework. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 297–302. IEEE (2013)



13. Khalifa, N.E., Taha, M.H., Hassanien, A.E., Selim, I.: Deep Galaxy V2: robust deep convolutional neural networks for galaxy morphology classifications. In: 2018 IEEE International Conference on Computing Sciences and Engineering, ICCSE, pp. 122–127. IEEE (2018)

14. Bui, H.M., Lech, M., Cheng, E., Neville, K., Burnett, I.S.: Object recognition using deep convolutional features transformed by a recursive network structure. IEEE Access 4, 10059–10066 (2017). https://doi.org/10.1109/ACCESS.2016.2639543

15. Scott, G.J., England, M.R., Starms, W.A., Marcum, R.A., Davis, C.H.: Training deep convolutional neural networks for land-cover classification of high-resolution imagery. IEEE Geosci. Remote Sens. Lett. 14, 549–553 (2017). https://doi.org/10.1109/LGRS.2017.2657778

16. Lima, E., Sun, X., Dong, J., Wang, H., Yang, Y., Liu, L.: Learning and transferring convolutional neural network knowledge to ocean front recognition. IEEE Geosci. Remote Sens. Lett. 14, 354–358 (2017). https://doi.org/10.1109/LGRS.2016.2643000

17. Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, Canada, pp. 2377–2385 (2015)

18. Anantharajah, K., Ge, Z., McCool, C., Denman, S., Fookes, C., Corke, P., Tjondronegoro, D., Sridharan, S.: Local inter-session variability modelling for object classification. In: IEEE Winter Conference on Applications of Computer Vision, pp. 309–316. IEEE (2014)

19. Joly, A., Goëau, H., Glotin, H., Spampinato, C., Bonnet, P., Vellinga, W.-P., Planqué, R., Rauber, A., Palazzo, S., Fisher, B., Müller, H.: LifeCLEF 2015: multimedia life species identification challenges. In: Mothe, J., Savoy, J., Kamps, J., Pinel-Sauvagnat, K., Jones, G., San Juan, E., Capellato, L., Ferro, N. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. Lecture Notes in Computer Science. Springer, Cham (2015)




