Visualizing Moon Phases and Crime Occurrence in Austin, Texas

Kristin Sullivan University of Texas School of Information INF 385T Data Wrangling Fall 2015


Table of Contents

Summary
Project Flowchart
Datasets
    Moon Phase Data
    Austin Crime Data
Database Design
    ER Diagram
    Relational Vocab
    Sample Tables
Database Creation
Data Import
    Importing Moon Data
    Inserting City Data
    Importing Crime Data and adding City / Moon Foreign Keys
    Running Multiple Python Commands using a Shell Script
Data Export
    SQL Queries for Visualization
Visualization using Tableau
    By Moon phase Period
        Analysis
    By Date of Moon Phase’s Fullest Expression
        Analysis


Summary

This project attempts to demonstrate a connection between the occurrence of crimes in Austin, Texas between May 2014 and November 2015 and the moon phases in the same time period. My hypothesis is that more crimes occur when the moon is in its full period than in other moon phase periods. The analysis is visualized using graphs in Tableau, which illustrate interesting patterns; however, it does not confirm the original hypothesis. This document details my project workflow, which starts with downloading datasets and continues through creating a database, importing data using Python, exporting data for visualization, and finally creating visualizations in Tableau.

Project Flowchart

The above figure displays the workflow used for this project:

1. Data Sources: This project uses data sets from two sources. The APD Incident Report Data contains police crime incident reports in the City of Austin, Texas over the course of 18 months. This data set lists the incident number, crime type, date, time, and location within the city. The other data file is for moon phases; it contains phase information for New Moon, First Quarter, Full Moon, and Last Quarter and the dates on which the phases occurred over the course of 2014-2016.

2. Moon and crime comparison database: The above CSV files were imported into a MySQL database using Python.

3. Files extracted from SQL queries: I will make SQL queries to the database for the overall count of crimes for each moon phase, as well as for each individual moon phase period. I will export the query results from MySQL into a CSV file using a Python script. The data from this file will be used for later visualization.

4. Visualization: Using Tableau and Adobe Illustrator, I will visualize how moon phases may have affected crimes in Austin, Texas and display the results in a line chart with a moon phase visual. I will create the chart in Tableau and export an image of the chart into Adobe Illustrator to add moon phase imagery.

Datasets

Moon Phase Data

Link: http://www.somacon.com/p570.php

The moon phase data was generated from the site Somacon, which provides a list of moon phase dates and times for any year in CSV format. The data originates from the U.S. Naval Observatory Moon Phases Tool. The moon phase file included the following columns (ones in bold indicate the columns relevant to this project):

● date - in month/day/year format
● time - in hour:minute AM/PM format
● phase - the moon phase spelled out
● phaseid - an ID from 1 to 4 for the moon phase
● datetime - the date/time in MySQL format - YYYY-MM-DD HH:MM:SS
● timestamp - the Unix time stamp
● friendlydate - date in a format like "January 1, 2011"

The selected moon phase date range for this project was January 1st, 2014 to January 1st, 2016.

Austin Crime Data

Link: https://data.austintexas.gov/Public-Safety/APD-Incident-Extract-YTD/b4y9-5x39

The Austin, TX crime data was obtained through the City of Austin’s online data portal from the Austin Police Department Incident Extract. The data from this source represents calls for police service where a report was written. This data source is continuously updated and covers the 18 months preceding the present date. For this project, I downloaded the CSV on November 5th, 2015. The crime occurrence date range is from May 6th, 2014 to November 5th, 2015. This police incident extract included the following columns (ones in bold indicate the columns relevant to this project):

● Incident Report Number
● Crime Type
● Date
● Time
● LOCATION_TYPE
● ADDRESS
● LONGITUDE
● LATITUDE
● Location 1

Database Design

ER Diagram

The above figure represents the ER diagram used for this project’s database.

Crime Table: The Crime table is the connection between Moon and City, so that city’s id and moon’s id can be recorded as foreign keys in the Crime table.

Moon Table: In this project model, the Moon table records moon phases as having a start and end period, which includes a date range between the fullest expression of a phase (start) and the start of the subsequent phase (end). This acknowledges that the sun’s illumination of the moon is a continuous process and would allow for crimes not falling on the date of the phase’s fullest expression to still have a corresponding moon.id.

City Table: City is included as a distinct table in this project as there is potential to add in crime data sets from other cities and states besides Austin, TX.

Relational Vocab

City has many Crime
Crime belongs to City
Moon has many Crime
Crime belongs to Moon

Sample Tables

moon

ID   phasetype      phaseid  start_datetime  end_datetime
1    New Moon       1        1/1/14 6:14
2    First Quarter  2        1/7/14 22:39
3    Full Moon      3        1/15/14 23:52
4    Last Quarter   4        1/24/14 0:20
5    New Moon       1        1/30/14 16:38
6    First Quarter  2        2/6/14 14:22
7    Full Moon      3        2/14/14 18:53
8    Last Quarter   4        2/22/14 12:15
9    New Moon       1        3/1/14 3:00
10   First Quarter  2        3/8/14 8:27


city

ID name state

1 Austin Texas

crime

ID   type                                                     datetime          moon_id  city_id
1    AGG ASLT STRANGLE/SUFFOCATE                              2015/01/31 9:35   5        1
2    CRIMINAL MISCHIEF                                        2015/02/02 19:30  5        1
3    ASSAULT W/INJURY-FAM/DATE VIOL                           2015/02/27 19:26  8        1
4    DWI                                                      2015/03/05 20:00  9        1
100  BMV                                                      2015/01/31 23:30  5        1
101  UCR - THEFT OF BICYCLE                                   2015/01/31 23:30  5        1
102  CRIMINAL TRESPASS                                        2015/01/31 23:05  5        1
103  THEFT OF PROP > OR EQUAL $500 BUT <$1,500 (BY EMPLOYEE)  2015/01/31 23:00  5        1
104  ASSAULT -OFFENSIVE CONTACT                               2015/01/31 23:00  5        1
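As a rough sketch of how the three sample tables relate, the schema could be expressed as below. This uses Python's built-in sqlite3 module as a stand-in for the project's MySQL database (an assumption made only so the example is self-contained), and the column types are guesses, since the original phpMyAdmin definitions are not reproduced in this text.

```python
import sqlite3

# Column names follow the sample tables; the types are assumptions.
SCHEMA = """
CREATE TABLE moon (
    id INTEGER PRIMARY KEY,
    phasetype TEXT NOT NULL,
    phaseid INTEGER NOT NULL,
    start_datetime TEXT NOT NULL,
    end_datetime TEXT
);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    state TEXT NOT NULL
);
CREATE TABLE crime (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL,
    datetime TEXT NOT NULL,
    moon_id INTEGER REFERENCES moon(id),  -- Crime belongs to Moon
    city_id INTEGER REFERENCES city(id)   -- Crime belongs to City
);
"""


def create_schema(conn):
    """Create the moon, city, and crime tables on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
# The single city row from the sample table.
conn.execute("INSERT INTO city (id, name, state) VALUES (1, 'Austin', 'Texas')")
table_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
```

Keeping city in its own table costs one extra foreign key per crime row, but it means crime data from another city could be added later without changing the crime table.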


Database Creation

The project database was created using phpMyAdmin. Overall database structure:

City Table:

Crime Table:


Moon Table:


Data Import

The data from the CSV data files was loaded into the database using two Python scripts (moon_csv_to_mysql.py and fk_crime_csv_to_mysql.py). Here are excerpts from the files:

Importing Moon Data

Excerpts from moon_csv_to_mysql.py:


Challenges:

● Reformatting Moon Datetime
  ○ The date time from the moonphase.csv was originally in the format “3/1/14 3:00”. I was able to reformat the date into SQL’s date time format using .strptime and .strftime. This code can be viewed in the above figure on Lines 40-41.

● Calculating Moon Phase End Date
  ○ The Moon table’s end_date was not a column in the original moonphase.csv. End_date for a moon phase period was calculated by using Python to subtract 1 second from the subsequent moon phase’s start_date. This code can be viewed in the above figure on Lines 53-82.
    ■ Lines 53-60: First, a SQL query was used to select each moon phase’s id and start date in order. The results from this query were recorded in the variable “results” (a list of dicts).
    ■ Line 67: enumerate was used to step through the results sequentially, giving each entry an index.
    ■ Lines 73-75: By adding 1 to the index of results, end_date was first recorded as the next start_date in the list of dicts.
    ■ Line 77: After end_date was set from the correct next start_date, 1 second was subtracted from the newly defined end_date.
    ■ Lines 78-80: For the last start_date in the moon phase list of dicts, there was no subsequent start_date to use to calculate end_date. Therefore, an else statement was used to add 6.5 days (the approximate length of a moon phase period) to this final moon phase start_date.
    ■ Lines 87-94: Once end_date was correctly defined, a SQL UPDATE statement was used to update the project database with the moon phase end dates for the corresponding moon ids.
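Since the script excerpts are not reproduced in this text, the following is a minimal sketch of the two steps described above, not the original moon_csv_to_mysql.py. The function names, the format constants, and the in-memory list of dicts are assumptions; the SQL SELECT and UPDATE steps are left out so the example runs on its own.

```python
from datetime import datetime, timedelta

CSV_FORMAT = "%m/%d/%y %H:%M"     # format in moonphase.csv, e.g. "3/1/14 3:00"
SQL_FORMAT = "%Y-%m-%d %H:%M:%S"  # MySQL DATETIME format


def to_sql_datetime(raw):
    """Reformat a CSV date time string into SQL's date time format."""
    return datetime.strptime(raw, CSV_FORMAT).strftime(SQL_FORMAT)


def add_end_dates(results, last_phase_days=6.5):
    """Return copies of the rows with an end_date added.

    results is a list of dicts with 'id' and 'start_date' (datetime),
    already ordered by start_date.
    """
    updated = []
    for index, row in enumerate(results):
        if index + 1 < len(results):
            # End one second before the next phase starts.
            end_date = results[index + 1]["start_date"] - timedelta(seconds=1)
        else:
            # No subsequent phase: fall back to start + 6.5 days.
            end_date = row["start_date"] + timedelta(days=last_phase_days)
        updated.append({**row, "end_date": end_date})
    return updated


# The last three rows of the moon sample table.
sample = [
    {"id": 8, "start_date": datetime.strptime("2/22/14 12:15", CSV_FORMAT)},
    {"id": 9, "start_date": datetime.strptime("3/1/14 3:00", CSV_FORMAT)},
    {"id": 10, "start_date": datetime.strptime("3/8/14 8:27", CSV_FORMAT)},
]
phases = add_end_dates(sample)
```

With this input, the period for id 8 ends at 2014-03-01 02:59:59, one second before id 9 begins, and the final row falls back to its start plus 6.5 days.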

Inserting City Data

Since this project only includes data for one city (Austin, Texas), this information was entered manually using phpMyAdmin.


Importing Crime Data and adding City / Moon Foreign Keys

Excerpts from fk_crime_csv_to_mysql.py:


Challenges:

● Reformatting Crime Datetime
  ○ The datetime from the austincrime.csv was originally split into two columns, “Date” and “Time”, in the format Date: 4/14/15 and Time: 500. These two columns were merged into one column, “Occur_date”, for the project database, and the datetime was reformatted using .strptime and .strftime. This code is visible in the above figure on Lines 41-47.
  ○ Several of the times in the austincrime.csv were recorded as a single digit (e.g., “0” for midnight, “1” for 1AM or perhaps 1PM). This was problematic for formatting datetimes by Hour/Minute. To work around this, an if statement was used so that any time shorter than 3 digits is recorded as “1200”, or noon. This code is visible in the above figure on Lines 37-38.

● Inserting Moon Foreign Key
  ○ The moon_id in the Crime table is determined by which moon phase’s start and end date the crime’s occurrence date falls between. This code is visible on Lines 54-62 and 78-89.

● Inserting City Foreign Key
  ○ The city_id in the Crime table is selected from id in the database’s city table. This code is visible on Lines 66-89.
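The crime import steps above can be sketched as follows. This is an illustration under assumptions, not the original fk_crime_csv_to_mysql.py: the function names are hypothetical, the zero-padding of three-digit times is a choice made here so that a value like "130" parses as 1:30 rather than 13:00, and the moon lookup runs over an in-memory list instead of a SQL query.

```python
from datetime import datetime


def crime_datetime(date_str, time_str):
    """Merge the Date and Time columns into one datetime."""
    time_str = time_str.strip()
    if len(time_str) < 3:
        # Ambiguous one- or two-digit times are recorded as noon.
        time_str = "1200"
    # Pad "500" to "0500" so hour and minute are unambiguous.
    time_str = time_str.zfill(4)
    return datetime.strptime(f"{date_str} {time_str}", "%m/%d/%y %H%M")


def find_moon_id(occurred, moon_rows):
    """Return the id of the moon phase period containing the datetime."""
    for row in moon_rows:
        if row["start_date"] <= occurred <= row["end_date"]:
            return row["id"]
    return None  # no matching period


# Two moon periods built from the sample moon table; the end of the
# second one is the 6.5-day fallback.
MOON_ROWS = [
    {"id": 9, "start_date": datetime(2014, 3, 1, 3, 0),
     "end_date": datetime(2014, 3, 8, 8, 26, 59)},
    {"id": 10, "start_date": datetime(2014, 3, 8, 8, 27),
     "end_date": datetime(2014, 3, 14, 20, 27)},
]

# Hypothetical crime date, chosen only to fall inside the sample periods.
occurred = crime_datetime("3/5/14", "2000")
moon_id = find_moon_id(occurred, MOON_ROWS)
```

Because consecutive periods end one second before the next begins, each crime datetime matches at most one moon row.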

Running Multiple Python Commands using a Shell Script

A shell script (runMoonproject.sh) was used to run the Python scripts in order. Here is an excerpt from the file:


Data Export

The data for visualization was queried using a Python script and exported into a CSV file. Here is an excerpt of the script:
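The export script itself is not reproduced in this text, so the following is only a sketch of the general pattern: run a query, then write the column names and rows to CSV. It uses sqlite3 in memory as a stand-in for MySQL, and the query, the function name export_query_to_csv, and the three sample crime rows are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Assumed query: overall crime count for each moon phase type.
QUERY = """
SELECT moon.phasetype, COUNT(crime.id) AS crime_count
FROM crime
JOIN moon ON crime.moon_id = moon.id
GROUP BY moon.phasetype
ORDER BY moon.phasetype
"""


def export_query_to_csv(conn, query, out_file):
    """Run the query and write a header row plus all result rows."""
    cursor = conn.execute(query)
    writer = csv.writer(out_file)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Tiny made-up dataset so the sketch runs without a MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moon (id INTEGER PRIMARY KEY, phasetype TEXT);
CREATE TABLE crime (id INTEGER PRIMARY KEY, type TEXT, moon_id INTEGER);
INSERT INTO moon VALUES (1, 'New Moon'), (2, 'First Quarter');
INSERT INTO crime VALUES (1, 'DWI', 1), (2, 'BMV', 1), (3, 'CRIMINAL MISCHIEF', 2);
""")

buffer = io.StringIO()
export_query_to_csv(conn, QUERY, buffer)
csv_text = buffer.getvalue()
```

Passing a file opened with open(path, "w", newline="") in place of the StringIO buffer would write the same rows to disk for import into Tableau.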


SQL Queries for Visualization


Visualization using Tableau

For visualization, I imported the two CSV files created in my Data Export process into Tableau. The following six visualizations show Austin’s crime in relation to both the moon phase’s period (i.e., the date range between the fullest expression of a phase and the start of the subsequent phase, approximately 6.5 days) and the exact date of the moon phase’s fullest expression (e.g., the date of a full moon). To examine crime by both the moon phase period (Viz 1-3) and the exact expression date (Viz 4-6), I created three visualizations for each that display the same criteria for purposes of comparison. These visualizations illustrate crime in relation to moon phases from April 2014 to November 2015. Entries in the original dataset between April 2014 and December 2014 are limited, which explains the sharp increase in crime starting in January 2015.

I decided to skip exporting these images to Illustrator, as Tableau offers a dynamic and more coherent view of this data. To learn about Tableau, I consulted other iSchool students who had previously used the tool, watched online tutorials, and experimented with arranging my data through trial and error.

By Moon phase Period:

Viz 1: Crime Count in Austin, TX by Phasetype and Moon phase Period (start to end date)
https://public.tableau.com/profile/kristin1567#!/vizhome/AustinTXCrimeCountbyMoonphasePeriods/Sheet1

Viz 2: Crime Count in Austin, TX by Moon phase Period (start to end date)
https://public.tableau.com/profile/kristin1567#!/vizhome/AustinTXCrimeCountbyMoonphasePeriods/Sheet2

Viz 3: Overall Crime Count in Austin, TX by Moon phase Period
https://public.tableau.com/profile/kristin1567#!/vizhome/AustinTXCrimeCountbyMoonphasePeriods/Sheet3


Analysis:

In viewing Austin’s crime by moon phase period, the highest number of crime incidents occurs when the moon is in its First Quarter period (33,554). However, the Full Moon period has a similar number of crimes (33,524), as can be observed in Viz 3. A potentially interesting pattern can be observed in Viz 2: crimes in the Full Moon periods are higher at the beginning of 2015 and decrease halfway through the year, while crimes in the New Moon periods are low at the beginning of 2015 and increase halfway through the year. Extending the year range of the moon and crime data would be necessary in order to draw any clearer correlation.
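To put the gap between the two largest counts in perspective, the short calculation below works out the difference relative to the Full Moon total. The two counts are the ones reported above; the variable names are only illustrative.

```python
# Crime counts by moon phase period, as reported in the analysis.
first_quarter = 33554
full_moon = 33524

difference = first_quarter - full_moon
relative_percent = difference / full_moon * 100

summary = f"{difference} more crimes, a {relative_percent:.2f}% difference"
print(summary)  # 30 more crimes, a 0.09% difference
```

A gap this small is consistent with describing the two periods as having a similar number of crimes.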

By Date of Moon Phase’s Fullest Expression:

Viz 4: Crime Count in Austin, TX by Phasetype and Date of Moon phase Full Expression
https://public.tableau.com/profile/kristin1567#!/vizhome/crimecountonphasefullexpression/Sheet3

Viz 5: Crime Count in Austin, TX by Date of Moon phase Full Expression
https://public.tableau.com/profile/kristin1567#!/vizhome/crimecountonphasefullexpression/Sheet1

Viz 6: Overall Crime Count in Austin, TX by Date of Moon phase Full Expression
https://public.tableau.com/profile/kristin1567#!/vizhome/crimecountonphasefullexpression/Sheet2


Analysis:

In viewing Austin’s crime incidents by the moon phase’s fullest expression date, the Full Moon dates have noticeably fewer crimes (2,018) than the other moon phase expressions. This can be observed in Viz 6. This goes against my original hypothesis that more crimes occur when the moon is Full. Rather, at least in 2014-2015, it appears that more crimes occur on the dates when the moon is in its New or First Quarter expressions. Adding moon phase and Austin crime incident data from earlier years would make it possible to observe whether this is a pattern over time. Performing a similar examination of the relationship between moon phases and crime occurrence with crime datasets from other cities could also establish whether this is a pattern across multiple locations.

