Va Police Motor Stops
Load Libraries
library(tidyverse)
library(dplyr)
library(lubridate)
library(ggrepel)
covid <- read_csv(
  file = "https://data.virginia.gov/api/views/bre9-aqqr/rows.csv?accessType=DOWNLOAD"
)
- Extracting the COVID data for four cities (Norfolk, Virginia Beach, Chesapeake, and Portsmouth) into a dataset named four_city.
cities <- c("Norfolk", "Virginia Beach", "Chesapeake", "Portsmouth")
four_city <- covid %>%
  filter(Locality %in% cities)
dim(four_city)
## [1] 3824 7
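Because `filter()` silently drops names that match nothing, it can be worth confirming that every requested city actually appears in the result. The sketch below illustrates one way to do that on a small made-up tibble; `toy_covid`, `toy_four`, and `missing_cities` are hypothetical names, and the values are not from the VDH data.

```r
library(dplyr)

# Hypothetical stand-in for the VDH data; values are made up.
toy_covid <- tibble(
  Locality = c("Norfolk", "Chesapeake", "Richmond City", "Norfolk"),
  Deaths   = c(1, 2, 3, 4)
)

cities <- c("Norfolk", "Virginia Beach", "Chesapeake", "Portsmouth")

toy_four <- toy_covid %>%
  filter(Locality %in% cities)

# Requested names that matched no row (a misspelling would show up here)
missing_cities <- setdiff(cities, unique(toy_four$Locality))
missing_cities

# Rows kept per locality
count(toy_four, Locality)
```

On the real data, an empty `missing_cities` would indicate that all four names matched.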
- Plotting the total number of deaths by day for the four cities in the same graph.
The combined total deaths of the four cities are shown below:
totaldeath_byday <- four_city %>%
  mutate(`Report Date` = mdy(`Report Date`)) %>%
  group_by(`Report Date`) %>%
  summarise(total_deaths = as.integer(sum(Deaths))) %>%
  ggplot() +
  geom_line(mapping = aes(x = `Report Date`, y = total_deaths)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.2)) +
  labs(title = "Reported total deaths by day for four cities")
totaldeath_byday
We can further divide the dataset by city. Below are the total deaths by day for each of the four cities:
death_byday23 <- four_city %>%
  mutate(`Report Date` = mdy(`Report Date`)) %>%
  group_by(Locality, `Report Date`) %>%
  summarise(total_deaths = as.integer(sum(Deaths))) %>%
  ggplot() +
  geom_line(mapping = aes(x = `Report Date`, y = total_deaths, color = Locality)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.2)) +
  labs(title = "Reported total deaths by day for each city")
death_byday23
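If the `Deaths` column is a running (cumulative) total per locality, which is an assumption about the source data rather than something stated here, the number of newly reported deaths per day can be recovered by differencing within each city. A minimal sketch on made-up values (`toy` and `daily` are hypothetical names):

```r
library(dplyr)
library(lubridate)

# Made-up cumulative counts for two cities over three days.
toy <- tibble(
  Locality      = rep(c("Norfolk", "Portsmouth"), each = 3),
  `Report Date` = rep(c("03/01/2021", "03/02/2021", "03/03/2021"), times = 2),
  Deaths        = c(10, 12, 15, 4, 4, 5)
)

daily <- toy %>%
  mutate(`Report Date` = mdy(`Report Date`)) %>%
  group_by(Locality) %>%
  arrange(`Report Date`, .by_group = TRUE) %>%   # sort by date within each city
  mutate(new_deaths = Deaths - lag(Deaths)) %>%  # first day of each city is NA
  ungroup()

daily
```

Grouping before `lag()` keeps the first row of one city from being differenced against the last row of another.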
- Plotting the proportion of the total number of deaths over hospitalizations for the four cities in the same graph. The combined deaths (bars) and hospitalizations (line) of the four cities are shown below:
totaldeath_byday <- four_city %>%
  mutate(`Report Date` = mdy(`Report Date`)) %>%
  group_by(`Report Date`) %>%
  summarise(total_deaths = as.integer(sum(Deaths)),
            total_hospit = as.integer(sum(Hospitalizations))) %>%
  # day-over-day change in hospitalizations (computed but not plotted)
  mutate(lagHospital = total_hospit - lag(total_hospit)) %>%
  ggplot(mapping = aes(x = `Report Date`)) +
  geom_col(mapping = aes(y = total_deaths)) +
  geom_line(mapping = aes(y = total_hospit), show.legend = TRUE, color = "red") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.2)) +
  labs(title = "Reported total deaths (bar) over hospitalizations (line)", y = "N cases")
totaldeath_byday
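The chart above overlays the two totals rather than dividing one by the other. If an explicit proportion is wanted, one option is to compute deaths divided by hospitalizations per day. The sketch below does this on made-up daily totals; `toy_totals`, `death_ratio`, and `death_per_hospit` are hypothetical names, and the column names mirror the summary above.

```r
library(dplyr)
library(ggplot2)

# Made-up daily totals with the same column names as the summary above.
toy_totals <- tibble(
  `Report Date` = as.Date(c("2021-03-01", "2021-03-02", "2021-03-03")),
  total_deaths  = c(10, 12, 15),
  total_hospit  = c(40, 48, 50)
)

death_ratio <- toy_totals %>%
  # Guard against division by zero on days with no hospitalizations
  mutate(death_per_hospit = if_else(total_hospit > 0,
                                    total_deaths / total_hospit,
                                    NA_real_))

ratio_plot <- ggplot(death_ratio,
                     aes(x = `Report Date`, y = death_per_hospit)) +
  geom_line() +
  labs(title = "Deaths per hospitalization (toy data)", y = "Proportion")
```

Printing `ratio_plot` draws the line; the same `mutate()` step could be appended to the real summary before plotting.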
Community Policing Data
We will do some analysis based on a data collection consisting of all traffic and investigatory stops made in Virginia, as aggregated by the Virginia Department of State Police. You can download the data set from https://data.virginia.gov/Public-Safety/Community-Policing-Data-July-1-2020-to-May-31-2022/2c96-texw using this R code:
police <- read_csv(file = "https://data.virginia.gov/api/views/2c96-texw/rows.csv?accessType=DOWNLOAD")
The dimensions of the data set:
dim(police)
## [1] 1882854 20
Each variable (feature) of the data set is described below:
Column Name | Description |
---|---|
INCIDENT DATE | Indicates the date of the motor vehicle stop |
AGENCY NAME | Name of law enforcement agency |
JURISDICTION | Location of stop by city or county |
REASON FOR STOP | Indicates the initial reason for the motor vehicle/traffic stop |
PERSON TYPE | Indicates whether the person subject to the investigative detention stop is a Driver, Passenger, or Pedestrian/Individual |
RACE | Indicates the race of the Driver/Individual involved |
ETHNICITY | Indicates the ethnicity of the Driver/Individual involved |
AGE | Indicates the age of the Driver/Individual involved |
GENDER | Indicates the gender of the Driver/Individual involved |
ENGLISH SPEAKING | Indicates if the Driver/Individual speaks English |
ACTION TAKEN | Indicates the most serious action taken towards the Driver/Individual at the completion of the stop or as a result of the stop |
VIOLATION TYPE | Indicates if the violation was a local or commonwealth code (no longer collected as of July 1, 2021) |
SPECIFIC VIOLATION | Indicates the specific code section in connection with action taken |
VIRGINIA CRIME CODE | Indicates corresponding Virginia Crime Code (Optional) |
PERSON SEARCHED? | Indicates if the Driver/Individual was searched as a result of the stop |
VEHICLE SEARCHED? | Indicates if the vehicle was searched as a result of the stop |
ADDITIONAL ARREST? | Indicates a person OTHER THAN THE DRIVER was arrested as a result of the stop. (no longer collected as of July 1, 2021) |
PHYSICAL FORCE BY OFFICER | Indicates if the law-enforcement officer or State Police officer used physical force against the person |
PHYSICAL FORCE BY SUBJECT | Indicates if the subject used physical force against any officers |
RESIDENCY | Indicates the residency of the subject stopped |
- Using ggplot2 to make a bar chart of total stops for the top 20 locations, mapping ACTION TAKEN to the color of the bar chart.
Finding the top 20 counties/cities in VA with the most stops:
total_20_stops <- police %>%
  count(JURISDICTION) %>%
  arrange(desc(n)) %>%
  head(20)
cat("The top 20 jurisdictions by number of stops in VA are: \n")
## The top 20 jurisdictions by number of stops in VA are:
total_20_stops
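As an aside, dplyr's `slice_max()` can express the same "top N by count" step in one call. A minimal sketch on made-up data (`toy_police` and `top_2` are hypothetical names); note that `slice_max()` keeps ties by default, so `with_ties = FALSE` is set here to cap the result at exactly the requested number of rows.

```r
library(dplyr)

# Made-up stops: D has 4, A has 3, B has 2, C has 1.
toy_police <- tibble(
  JURISDICTION = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D")
)

top_2 <- toy_police %>%
  count(JURISDICTION) %>%
  # First `n` is the count column to rank by; `n = 2` is how many rows to keep
  slice_max(n, n = 2, with_ties = FALSE)

top_2
```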
We then filter the data set to only include the top 20 jurisdictions:
# Keep only the top 20 jurisdictions, then count stops by action taken
to_graph_stops <- police %>%
  filter(JURISDICTION %in% total_20_stops$JURISDICTION) %>%
  group_by(JURISDICTION, `ACTION TAKEN`) %>%
  summarise(n = n()) %>%
  ggplot(aes(fill = `ACTION TAKEN`)) +
  geom_bar(aes(x = JURISDICTION, y = n), stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.2)) +
  labs(title = "Top 20 jurisdictions by number of stops, colored by action taken")
to_graph_stops
- Using ggplot2 to make a bar chart with decreasing order of the number of stops for each reason.
The percentage of stops for each initial reason:
init_reasons <- police %>%
  count(`REASON FOR STOP`, sort = TRUE) %>%  # sort = TRUE already orders by descending n
  mutate(freq = n / sum(n) * 100)
init_reasons
The number of stops for each reason:
init_reasons %>% ggplot() +
  geom_col(mapping = aes(x = reorder(`REASON FOR STOP`, -n), y = n)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.2)) +
  labs(title = "Number of stops for each reason", x = "Reasons for Stop", y = "Number of Stops")
- Using ggplot2 to plot the number of stops by date across Virginia.
stops_by_date <- police %>%
  mutate(`INCIDENT DATE` = as.Date(mdy_hms(`INCIDENT DATE`))) %>%
  count(`INCIDENT DATE`, sort = TRUE) %>%
  ggplot() +
  geom_col(mapping = aes(x = `INCIDENT DATE`, y = n))
stops_by_date
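Daily counts can be noisy, so it may help to aggregate to a coarser unit such as months before plotting. The sketch below uses lubridate's `floor_date()` on made-up timestamps; `toy_stops` and `stops_by_month` are hypothetical names, and the timestamp strings are an assumed month/day/year layout, not copied from the real file.

```r
library(dplyr)
library(lubridate)

# Made-up timestamps in an assumed month/day/year layout.
toy_stops <- tibble(
  `INCIDENT DATE` = c("07/01/2020 00:00:00", "07/15/2020 00:00:00",
                      "08/02/2020 00:00:00", "08/02/2020 00:00:00",
                      "08/30/2020 00:00:00")
)

stops_by_month <- toy_stops %>%
  mutate(stop_date = as.Date(mdy_hms(`INCIDENT DATE`)),
         month     = floor_date(stop_date, unit = "month")) %>%  # first day of the month
  count(month)

stops_by_month
```

The resulting `month` column is still a `Date`, so it can be mapped to the x axis exactly as `INCIDENT DATE` is above.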
- Using ggplot2 to make a histogram of stops by age.
ggplot(data = police) +
  geom_bar(mapping = aes(x = AGE))
Some of the data points have AGE set to 0. Apart from these, the distribution of stops by age is right-skewed: a majority of the stops involve individuals between 20 and 30 years old, and the counts taper off toward older ages. This may be because most drivers fall within this age group.
stops <- police[police$AGE == 0,]
cat("There are", nrow(stops), "stops with AGE recorded as 0 in the data set")
## There are 38944 stops with AGE recorded as 0 in the data set
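If the zeros are placeholders for an unknown age, which is an assumption and not something the data dictionary above confirms, one way to keep them out of summaries is to recode them as `NA` with dplyr's `na_if()`. A minimal sketch on made-up ages (`toy_ages` and `ages_clean` are hypothetical names):

```r
library(dplyr)

# Made-up ages; the two zeros stand in for placeholder values.
toy_ages <- tibble(AGE = c(0, 25, 31, 0, 47, 22))

ages_clean <- toy_ages %>%
  mutate(AGE = na_if(AGE, 0))  # turn 0 into NA, leave other values unchanged

n_placeholder <- sum(is.na(ages_clean$AGE))          # how many ages were recoded
median_age    <- median(ages_clean$AGE, na.rm = TRUE) # summary without placeholders
```

ggplot2 drops `NA` values from the bar chart with a warning, so the spike at 0 would no longer appear.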
- Using ggplot2 to create a pie chart of the stops by RACE, and labeling each piece of the chart by the percentage of the stops for each RACE.
stops_by_Race <- police %>%
  count(RACE) %>%
  mutate(freq = round(n / sum(n) * 100, digits = 4)) %>%
  # pos is the midpoint of each slice, used to place its label
  mutate(cs  = rev(cumsum(rev(freq))),
         pos = freq / 2 + lead(cs, 1),
         pos = if_else(is.na(pos), freq / 2, pos)) %>%
  ggplot(mapping = aes(x = "", y = freq, fill = RACE)) +
  geom_col() +
  geom_label_repel(aes(label = paste0(freq, "%"), y = pos),
                   force = .5, nudge_x = .5) +
  coord_polar(theta = "y") +
  labs(title = "Percentage of stops by RACE")
stops_by_Race
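The `cs` and `pos` columns in the pie chart code place each label at the midpoint of its slice. `geom_col()` stacks the first group at the top, so a slice spans from the cumulative total of all later rows up to its own reverse cumulative total. A small worked example on made-up percentages (`toy_freq` and `label_pos` are hypothetical names) shows the arithmetic:

```r
library(dplyr)

# Made-up percentages for three groups.
toy_freq <- tibble(RACE = c("A", "B", "C"), freq = c(50, 30, 20))

label_pos <- toy_freq %>%
  mutate(cs  = rev(cumsum(rev(freq))),              # top edge of each slice: 100, 50, 20
         pos = freq / 2 + lead(cs, 1),              # bottom edge plus half the slice
         pos = if_else(is.na(pos), freq / 2, pos))  # last slice starts at 0

label_pos
```

Here group A spans 50 to 100 and gets its label at 75, B spans 20 to 50 with its label at 35, and C spans 0 to 20 with its label at 10.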
- Using ggplot2 to create a pie chart of the stops by GENDER, and labeling each piece of the chart by the percentage of the stops for each GENDER.
stops_by_Gender <- police %>%
  count(GENDER) %>%
  mutate(freq = round(n / sum(n) * 100, digits = 3)) %>%
  mutate(cs  = rev(cumsum(rev(freq))),
         pos = freq / 2 + lead(cs, 1),
         pos = if_else(is.na(pos), freq / 2, pos)) %>%
  ggplot(mapping = aes(x = "", y = freq, fill = GENDER)) +
  geom_col() +
  geom_label_repel(aes(label = paste0(freq, "%"), y = pos),
                   force = .5, nudge_x = .5) +
  coord_polar(theta = "y") +
  labs(title = "Percentage of stops by Gender")
stops_by_Gender
- Using ggplot2 to create a stacked bar chart for the number of stops by GENDER, stacked by ACTION TAKEN.
gender_stops <- police %>%
  ggplot() +
  geom_bar(mapping = aes(x = GENDER, fill = `ACTION TAKEN`)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.2)) +
  labs(title = "Most common action taken at the stops by GENDER")
gender_stops
Regardless of gender, the most common action taken is Citations/Summons. The second most common action taken is Warning.
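A stacked bar chart makes this ranking visible but not easy to quantify. One way to back the reading with numbers is to compute each action's share within each gender. The sketch below uses made-up rows; `toy_actions` and `action_share` are hypothetical names, and the category labels are illustrative, not copied from the data set.

```r
library(dplyr)

# Made-up stops; category labels are illustrative only.
toy_actions <- tibble(
  GENDER = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  `ACTION TAKEN` = c("CITATION/SUMMONS", "CITATION/SUMMONS", "WARNING",
                     "CITATION/SUMMONS", "CITATION/SUMMONS", "WARNING", "ARREST")
)

action_share <- toy_actions %>%
  count(GENDER, `ACTION TAKEN`) %>%
  group_by(GENDER) %>%
  mutate(share = n / sum(n)) %>%                  # share of each action within a gender
  arrange(desc(share), .by_group = TRUE) %>%      # most common action first per gender
  ungroup()

action_share
```

Applied to the real data, the first row for each gender would name its most common action and the share of stops it accounts for.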