
PRECISION AGRICULTURE SUPPORT USING SCALA/SPARK

Project Report

SRIRAM RV

SPRING SEMESTER

ADVISOR: PROFESSOR BRAD RUBIN


Table of Contents

1.0 PURPOSE OF PROJECT
2.0 PROJECT DESCRIPTION
2.1 Why Agriculture Data
3.0 DATASET
3.1 Data Source
3.2 Details about the Dataset
3.3 Sample Data
    Weather Data
    Moisture Data
    Image Data
3.4 Schema
    Weather Data
    Moisture Data
3.5 Data Description
    Weather Data
    Moisture Data
4.0 PROJECT IMPLEMENTATION
4.1 Data Ingestion using Kafka
4.2 Kafka producer
4.3 Kafka Broker
4.4 Kafka Consumer
5.0 ADDITIONAL TOOLS
5.1 Maven
5.2 Scala Build tool
5.3 Git
6.0 OUTPUT INTERPRETATION
7.0 IMPROVING THE KAFKA ARCHITECTURE
7.1 Making the Kafka architecture more robust
7.2 Using a dedicated Kafka broker to improve performance
8.0 FUTURE RESEARCH
9.0 CONCLUSION


1.0 PURPOSE OF PROJECT

Big data tools over the last few years have focused on both structured and unstructured data. However, image processing is one area that still needs more attention, and it has been an area of interest of mine as well. This project gives me an opportunity to experiment with streaming images and weather data captured in the UST greenhouse, and to get a feel for image processing with Scala/Spark on Hadoop more generally.

I will gain experience with technologies such as Scala, Spark, Spark Streaming, and image processing in the domain of food technology, skills that I could not otherwise obtain in the GPS curriculum.

2.0 PROJECT DESCRIPTION

The purpose of the project is to stream real-time weather data captured by direct sensors, along with RGB images captured by drones, and to perform image processing and weather data analytics leveraging the Scala/Spark ecosystem on a Hadoop computing cluster. Since image processing and streaming with Spark are new technologies to GPS, part of the project focuses on experimenting with different tools and finding a more reliable way of storing images and streamed data in HDFS.

The UST greenhouse will be growing plants for the Precision Agriculture project run by the UST School of Engineering. The greenhouse has a local weather station that broadcasts weather data such as temperature, humidity, light intensity, barometric pressure, position (latitude/longitude), wind speed and direction, and rainfall. The broadcast is continuous, at 10-second intervals, in CSV format. The equipment in the greenhouse is a prototype for field use and supports both analysis of plant health and creation of a model for each of the six plant species that will be grown. In addition, high-resolution images will be taken of the plants in the visible and near-IR regions of the light spectrum, roughly every couple of days.


2.1 Why Agriculture Data

With the help of agricultural data, I get an opportunity to experiment with streaming images and weather data captured in the UST greenhouse. The data captured in the greenhouse is highly detailed and gives me experience working with data from food technology.

3.0 DATASET

3.1 Data Source:

The data source for this project is live weather and moisture data captured by sensors attached to an Arduino board and streamed using a Kafka producer.

3.2 Details about the Dataset:

The sensor data were captured every second.

Weather data stored in HDFS: 90 days.

Moisture data stored in HDFS: 85 days.

Image data stored: 90 days.

3.3 Sample Data

Weather data

Fig 1: Sensor Weather data from Arduino

Moisture Data


Fig 2: Sensor moisture data from the Arduino

Image Data

Image data was captured every other day over a period of 90 days.

Fig 3: Images from the greenhouse

3.4 Schema

Weather data

Date | Time | Wind Direction | Wind Speed | Humidity | Temperature | Rain | Pressure | Battery | Light Level

Table 1: Weather Data Schema

Moisture Data

Date | Time | Moist 2 | Moist 6 | Moist 8 | Moist 11 | Moist 10 | Moist 1 | Moist 9 | Moist 7 | Moist 5 | Temp | Par

Table 2: Moisture Data Schema
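To make the two schemas concrete, the records could be modeled in Scala as case classes. This is only an illustrative sketch: the field names follow the table columns above, and the numeric types are assumptions, since the raw sensor feed arrives as CSV text.

    // Hypothetical case classes mirroring Tables 1 and 2.
    case class WeatherReading(
      date: String, time: String,
      windDirection: String, windSpeed: Double,
      humidity: Double, temperature: Double,
      rain: Double, pressure: Double,
      battery: Double, lightLevel: Double
    )

    case class MoistureReading(
      date: String, time: String,
      moist2: Double, moist6: Double, moist8: Double, moist11: Double,
      moist10: Double, moist1: Double, moist9: Double, moist7: Double, moist5: Double,
      temp: Double, par: Double
    )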


3.5 Data Description:

Weather Data:

Date & Time: Timestamp of the recording

Wind Direction: Direction of the wind

Wind Speed: Speed of the wind

Wind Gust: Gust of the wind

Humidity: Percentage of water in the air

Temperature: Temperature reading

Rain: Rain percentage

Pressure: Air pressure

Battery: Battery level of the Arduino

Light: Light exposure

Moisture Data

Moist 2: Moisture of plot 2

Moist 6: Moisture of plot 6

Moist 8: Moisture of plot 8

Moist 11: Moisture of plot 11

Moist 10: Moisture of plot 10

Moist 1: Moisture of plot 1

Moist 9: Moisture of plot 9

Moist 7: Moisture of plot 7

Moist 5: Moisture of plot 5

Temp: Soil temperature

PAR: Moisture metrics


4.0 PROJECT IMPLEMENTATION

4.1 Data Ingestion using Kafka

Kafka is a distributed messaging system that transmits the moisture and weather data from the Arduino to HDFS. The Kafka architecture depends mainly on three components: producer, broker, and consumer. ZooKeeper is used to monitor the data flowing in and out of the Kafka broker.

The diagram below shows the architecture of the precision agriculture project. The Kafka producer streams the data produced in the greenhouse and sends it to the Kafka broker; the producer obtains the broker addresses through ZooKeeper.

Fig 4: Kafka Architectural Diagram

4.2 Kafka producer

The Kafka producer is the sender side of the Kafka distributed messaging system. The producer splits messages by topic and sends them to the brokers accordingly. The producer also obtains the addresses of the Kafka brokers, which are attached to the packet header when the data is sent.

The weather, moisture, and image data are differentiated using separate topics: "weather-data", "moisture-data", and "image-data".

Below is the snippet that sets up the Kafka producer with the key and value serializers set to string serialization. The bootstrap servers property is the broker list of the Kafka cluster.


Fig 5: Configuring the Kafka producer
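Since the figure itself is not reproduced here, the following is a minimal sketch of such a producer configuration, assuming the standard Kafka Java client is used from Scala; the broker address is a placeholder, not the actual PA cluster address.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig}

    // String serialization for both key and value; bootstrap.servers holds the broker list.
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-host:9092") // placeholder address
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)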

Below is the snippet used to create the message object, which contains the topic and the message to be sent to the Kafka broker. The producer's send function binds the message to the configured producer instance and sends it to the broker.

Fig 6: Sending a message to the Kafka broker
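A minimal sketch of this step, building on the producer configured above; the topic name is one of the topics listed earlier, and the CSV line stands in for a sensor reading that is not reproduced here.

    import org.apache.kafka.clients.producer.ProducerRecord

    // Bind one sensor reading (a CSV line) to its topic and hand it to the producer.
    def publish(topic: String, csvLine: String): Unit = {
      val record = new ProducerRecord[String, String](topic, csvLine)
      producer.send(record) // asynchronous send to the broker
    }

    // e.g. publish("weather-data", line) for each line read from the weather station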

4.3 Kafka Broker

The Kafka broker is the server side of the Kafka distributed messaging system and is capable of handling hundreds of read and write operations per second. It can expand elastically without downtime. Data streams are partitioned and spread over a cluster of machines, allowing streams larger than a single machine could handle. The Kafka broker can be monitored through ZooKeeper on port 2181. By default, the Kafka broker has a retention period of 168 hours (the log.retention.hours broker setting).


Fig 7: Monitoring the messages using ZooKeeper

4.4 Kafka Consumer

The Kafka consumer is the receiver side of the Kafka distributed messaging system; it fetches data from the brokers topic by topic. The consumer runs on the cluster and stores the data in HDFS for further processing.

Below is sample consumer code that connects to the PA cluster. The topic set contains the list of topics we want to fetch from the broker.

Fig 8: Configuring the Kafka consumer
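As with the producer, the figure is not reproduced here; the following is a minimal sketch of a consumer subscribed to the three topics, again assuming the standard Kafka Java client. The broker address and group id are placeholders, not the actual PA cluster settings.

    import java.util.Properties
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}

    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-host:9092") // placeholder address
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "pa-ingestion")              // placeholder group id
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)

    // Topic set: the three topics named earlier in the report.
    consumer.subscribe(Seq("weather-data", "moisture-data", "image-data").asJava)

    while (true) {
      val records = consumer.poll(1000) // poll with a timeout in milliseconds
      for (record <- records.asScala)
        println(s"${record.topic}: ${record.value}")
    }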

5.0 ADDITIONAL TOOLS

5.1 Maven

Maven was used for dependency management, pulling all the required JARs from a remote repository into the local repository. This made it possible to develop the code from a Windows environment. Maven also made it easy to pin the versions of Spark and Kafka that were used, and all the JAR files for those versions were stored in the local repository.


Fig 9: Maven dependency configuration

5.2 Scala Build tool

The Scala Build Tool (SBT) was used to create the package and JAR files, which were transferred to the cluster and VM using WinSCP.

Fig 10: SBT build
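A minimal build.sbt along these lines is sketched below; the project name and the Scala, Spark, and Kafka versions shown are assumptions, not necessarily the exact versions used in the project. Running sbt package then produces the JAR that was copied over with WinSCP.

    // build.sbt -- builds the ingestion code into a JAR via `sbt package`
    name := "precision-agriculture-ingestion" // hypothetical project name

    version := "1.0"

    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"      % "1.6.1" % "provided",
      "org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided",
      "org.apache.kafka" %  "kafka-clients"   % "0.9.0.1"
    )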

5.3 Git

Git provides distributed revision control and source code management (SCM). A Git repository was used for the precision agriculture project to store the code online and share it with the team.

Below is the link to the precision agriculture repository:

https://github.com/sri303030/Data-Ingestion-using-Kafka


6.0 OUTPUT INTERPRETATION

The streamed data is sent to HDFS by the consumer and stored in two different folders to distinguish the weather data from the moisture data.
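A minimal sketch of how the consumer could write each record into a per-topic folder using the Hadoop FileSystem API; the namenode URI and folder paths are placeholders, not the actual layout on the PA cluster.

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration()) // placeholder URI

    // Write one record into the folder for its topic. One small file per record keeps
    // the sketch simple; a real ingester would batch records before writing.
    def store(topic: String, line: String): Unit = {
      val folder = if (topic == "weather-data") "/pa/weather-data" else "/pa/moisture-data"
      val out = fs.create(new Path(s"$folder/${System.currentTimeMillis}.csv"))
      out.write((line + "\n").getBytes("UTF-8"))
      out.close()
    }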

Below is the output from the weather data folder:

Fig 11: Weather data folder

Below is the output from the moisture data folder:

Fig 12: Moisture data folder

7.0 IMPROVING THE KAFKA ARCHITECTURE

The Kafka architecture can be improved in two ways:

1. Making the Kafka architecture more robust.

2. Using a dedicated Kafka broker to improve performance.

7.1 Making the Kafka architecture more robust

In the precision agriculture project, both the broker and the consumer were running on the same system, since the requirement for data ingestion was to store the data in HDFS. To make the architecture more robust, the consumer should run on a remote system or cluster that has access to the Kafka broker; that way, if the Kafka broker fails, the data can still be retrieved from the consumer.


7.2 Using a dedicated Kafka broker to improve performance

In the precision agriculture project, the Kafka broker runs as part of the cluster. To avoid noise in the cluster, the broker should run on a dedicated system or set of systems. This also removes the overhead that the Kafka broker places on the Hadoop environment and speeds up all the processes.

8.0 FUTURE RESEARCH

1. Implement the bridging between HDFS and Spark SQL and store the tables as persistent data in Hive.

2. Implement real-time machine learning using Spark MLlib.

3. Connect the live data to a reporting tool to analyze the live data and create useful reports.

9.0 CONCLUSION

Kafka is a rapidly growing distributed messaging system with a variety of applications across engineering. In this precision agriculture project, agricultural data from the greenhouse was captured and streamed into the Hadoop environment using Kafka and Spark. The project also gave me exposure to handling different big data problems in a real-time setting and helped me understand the Kafka architecture.



