NYC Data Analysis


Results

We tried several approaches to predicting crime rates: Multiple Linear Regression, Support Vector Machine Regression, and Neural Networks.
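The first of these approaches, multiple linear regression, fits one weight per input feature plus an intercept by minimizing squared error. Below is a minimal plain-Python sketch that solves the normal equations. It assumes numeric community features and a numeric crime-rate target. The function names and toy data are hypothetical, not the project's own code. For real work, a library routine such as NumPy's `numpy.linalg.lstsq` or scikit-learn's `LinearRegression` would be the usual choice, since solving the normal equations directly is less numerically stable.

```python
# Minimal ordinary least squares for multiple linear regression (a sketch).
# Hypothetical helper names; toy data only, not the project's dataset.

def fit_linear_regression(X, y):
    """Return [intercept, w1, ..., wk] minimizing squared error via normal equations."""
    # Prepend a column of ones so the first coefficient is the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    n = len(A[0])
    # Build the normal equations (A^T A) b = A^T y as an augmented matrix.
    m = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution from the last row upward.
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - rest) / m[i][i]
    return coef


def predict(coef, row):
    """Predict the target for one feature row using fitted coefficients."""
    return coef[0] + sum(w * x for w, x in zip(coef[1:], row))


# Toy data generated from y = 1 + 2*x1 + 3*x2 (not project data).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coef = fit_linear_regression(X, y)
print(coef)  # approximately [1.0, 2.0, 3.0]
```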

Below are the result maps. For ease of viewing, we have grouped the results by state. For more detailed data, please see this project's GitHub repository.
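Grouping community-level results into one value per state needs some aggregation rule, and the page does not say which one the maps use. The sketch below assumes a plain unweighted mean of the community values in each state. The function name and sample values are hypothetical. A population-weighted mean would be closer to a true state-level rate, because an unweighted mean gives a small town the same influence as a large city.

```python
from collections import defaultdict


def aggregate_by_state(records):
    """Average community-level values per state (unweighted mean, an assumption).

    records: iterable of (state, value) pairs, e.g. predicted crime rates.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, value in records:
        totals[state] += value
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


# Hypothetical predicted crime rates for four communities (not project data).
predictions = [("NJ", 0.20), ("NJ", 0.40), ("NY", 0.30), ("CA", 0.10)]
print(aggregate_by_state(predictions))
```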

Actual Crime Rate Per State

Predicted Crime Rate Per State - Linear Regression

Predicted Crime Rate Per State - Support Vector Machine Regression

Predicted Crime Rate Per State - Neural Networks (2 layers: hidden and output)
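The last caption describes a network with two layers: one hidden layer and one output layer. Below is a minimal sketch of the forward pass for that shape. It assumes sigmoid hidden units and a single linear output unit. The page does not state the activation, the hidden-layer width, or the training method, and the function names are hypothetical. Training (for example by backpropagation) is omitted.

```python
import math


def sigmoid(z):
    """Logistic activation, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> sigmoid hidden layer -> one linear output.

    x: input features; w_hidden: one weight list per hidden unit;
    b_hidden: one bias per hidden unit; w_out: one weight per hidden unit.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    # Linear output unit, suitable for predicting a continuous crime rate.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical weights: 2 inputs, 3 hidden units, 1 output.
x = [0.4, 0.7]
w_hidden = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.6, -0.3, 0.9]
print(forward(x, w_hidden, b_hidden, w_out, 0.05))
```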

Conclusion

As shown above, we used Multiple Linear Regression, Support Vector Machine Regression, and Neural Networks to predict crime rates in U.S. communities.

Throughout the training process, we identified a number of dominant factors, listed below from most to least dominant:

percentage of kids in family housing with two parents
percentage of kids born to never-married parents
percentage of population that is Caucasian
percentage of males who are divorced
percentage of persons in dense housing
percentage of population that is African American
percentage of people living in areas classified as urban
percentage of housing occupied
percentage of moms of kids under 18 in the labor force
percentage of vacant housing that is boarded up
percentage of officers assigned to drug units
number of homeless people counted in the street
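The page does not say how "dominance" was measured. One common heuristic for a linear model is to rank features by the absolute standardized coefficient, |w_j * sd(x_j) / sd(y)|. This puts features on comparable scales even when they have very different units. The sketch below assumes that heuristic, and the function name and toy data are hypothetical. Rankings like this can be unstable when features are strongly correlated, as several of the family-structure factors above likely are.

```python
import statistics


def standardized_importance(coefs, columns, target):
    """Return feature names ordered by |coef * sd(feature) / sd(target)|, largest first.

    coefs: raw linear-regression coefficients keyed by feature name.
    columns: feature name -> list of observed values.
    target: list of observed target values (e.g. crime rates).
    """
    sd_y = statistics.pstdev(target)
    scores = {
        name: abs(coef * statistics.pstdev(columns[name]) / sd_y)
        for name, coef in coefs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: "b" has the larger raw coefficient but a much smaller spread,
# so its standardized effect is smaller than that of "a".
coefs = {"a": 1.0, "b": 10.0}
columns = {"a": [0, 10, 20], "b": [0, 0.1, 0.2]}
target = [0, 1, 2]
print(standardized_importance(coefs, columns, target))
```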

Based on the maps above and the detailed data, predicting the exact crime rate of each community is difficult, but a good share of our predictions were reasonable. Looking more closely, Support Vector Machine Regression was the most robust of the three approaches, while the Neural Network approach was less robust than we expected. For more detailed charts, please see our data chart page.
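Comparing "robustness" across models is easier to reason about with explicit error metrics on held-out communities. The page does not say which metric was used. The sketch below assumes mean absolute error (MAE) and root mean squared error (RMSE), where RMSE penalizes large misses more heavily. The function names and numbers are hypothetical illustrations, not the project's results.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; large errors weigh more than in MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def compare_models(actual, predictions_by_model):
    """Return (model, MAE, RMSE) tuples sorted by RMSE, lowest first."""
    rows = [(name, mae(actual, preds), rmse(actual, preds))
            for name, preds in predictions_by_model.items()]
    return sorted(rows, key=lambda row: row[2])


# Hypothetical held-out crime rates and model outputs (made-up numbers).
actual = [0.10, 0.25, 0.40, 0.15]
predictions = {
    "linear": [0.12, 0.20, 0.35, 0.18],
    "svr": [0.11, 0.24, 0.37, 0.16],
    "neural_net": [0.05, 0.30, 0.50, 0.10],
}
for name, m, r in compare_models(actual, predictions):
    print(f"{name}: MAE={m:.3f} RMSE={r:.3f}")
```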