Analysis of Criminal Statistics in North Carolina

UC Berkeley | Statistics for Data Science
With Kevin Kory and Joy First



Project Description

A deep-dive analysis of criminial statistics in North Carolina, circa 1987, and persuasive argumentation for policymakers on actions likely to result in the diminution of county-level and state-level criminal behavior.

Skills

Data Analysis, Data Cleaning, Linear Regression, Statistical Inference, Written Communication, Decision Support

Tools

R, LaTeX, ggplot, Plotly, Jupyter


Motivation

Crime and it's predictive signals are complicated. The final project for the Fall 2019 semester of U.C. Berkeley graduate data science course (W203) asked us to conduct a deep-dive analysis on criminal statistics from North Carolina as captured in 1987, to prepare and evaluate the effectiveness of linear models on said data, and to provide policymakers with a set of data-driven recommendations.

The agenda of the group assignment was to evaluate student effectiveness in dealing with a wide range of common patterns when applying a statistical treatment to a problem: critically applying the concepts of statitics to data analysis, identifying and treating errata, cleverly leveraging outside resources without guidance, constructing linear models, identify assumptions and recognizing when they are violated, dealing with overfitting, and providing a persuasive argument to decision makers about what actions to take in a well-informed manner.

The final subssion for our documentation and summary presentation is available here. The GitHub repo for our project contains all code artifacts, including LaTeX, R, and source data. This assignment received a near perfect score - omitting only one insight about the IID nature of the 10 missing counties, and was instrumental in my receiving an A+ grade for the semester.



See the complete project in my GitHub repository.