Exploring Colombian Crime Data with plotly

In this notebook we explore some of plotly's graphing functions, using data on the homicides registered in Colombia by the National Police.

First we clean the data so that the department names match the labels in the .json polygon file. Then we use plotly to draw a map of Colombia showing the number of homicides in each department.

# Data Structure
import json
import pandas as pd
from unidecode import unidecode
from urllib.request import urlopen

# Plotting module (plotly)
import plotly.graph_objects as go

Modifying the data

The only change needed is to rename some Colombian departments so that they match the labels in the .json geometry file. For this we only need pandas, plus unidecode to remove the accents from some words.
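For intuition about what accent removal does to these names, here is a minimal standard-library sketch. The helper `strip_accents` is hypothetical and is not what the notebook uses: it decomposes each character (Unicode NFKD) and drops the combining marks, which is enough for Spanish names, whereas unidecode transliterates a much wider range of characters.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose characters (NFKD) and drop combining marks."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents('BOGOTÁ D.C. (CT)'))  # BOGOTA D.C. (CT)
print(strip_accents('NARIÑO'))            # NARINO
```

Note that the Ñ is stripped as well, which is why the cleaning step below has to put it back for NARIÑO.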

# Load the 2021 homicide records and normalize the column names
df = pd.read_excel('homicidios2021.xlsx')
df.columns = [col.lower().strip() for col in df.columns]
df.head(5)
   departamento     municipio       fecha     genero  *agrupa_edad_persona  codigo_dane  cantidad
0      AMAZONAS  LETICIA (CT)  2021-01-23  MASCULINO               ADULTOS     91001000         1
1      AMAZONAS  LETICIA (CT)  2021-03-25  MASCULINO               ADULTOS     91001000         1
2      AMAZONAS  LETICIA (CT)  2021-06-06  MASCULINO               ADULTOS     91001000         1
3      AMAZONAS  LETICIA (CT)  2021-07-04  MASCULINO               ADULTOS     91001000         1
4      AMAZONAS  LETICIA (CT)  2021-10-31  MASCULINO               ADULTOS     91001000         1
# Bogotá D.C. is recorded under CUNDINAMARCA in the police data, but the
# GeoJSON file has a separate polygon named 'SANTAFE DE BOGOTA D.C'
es_bogota = (df['departamento'] == 'CUNDINAMARCA') & (df['municipio'] == 'BOGOTÁ D.C. (CT)')
df.loc[es_bogota, 'departamento'] = 'SANTAFE DE BOGOTA D.C'

# Remove accents before renaming, so that accented spellings in the raw
# data also match the keys below
df['departamento'] = df['departamento'].map(unidecode)

# Rename departments to the labels used in the GeoJSON file
df['departamento'] = df['departamento'].replace({
    'GUAJIRA': 'LA GUAJIRA',
    'VALLE': 'VALLE DEL CAUCA',
    'SAN ANDRES': 'ARCHIPIELAGO DE SAN ANDRES PROVIDENCIA Y SANTA CATALINA',
    'NARINO': 'NARIÑO',  # the GeoJSON keeps the Ñ
})

contador = df['departamento'].value_counts()
contador
ANTIOQUIA                                                  1943
VALLE DEL CAUCA                                            1770
CAUCA                                                       732
SANTAFE DE BOGOTA D.C                                       679
ATLANTICO                                                   571
NORTE DE SANTANDER                                          499
NARIÑO                                                      475
CUNDINAMARCA                                                408
BOLIVAR                                                     371
TOLIMA                                                      361
CORDOBA                                                     338
MAGDALENA                                                   338
SANTANDER                                                   310
HUILA                                                       290
META                                                        289
CHOCO                                                       260
CESAR                                                       258
SUCRE                                                       220
RISARALDA                                                   210
QUINDIO                                                     176
CALDAS                                                      171
PUTUMAYO                                                    165
LA GUAJIRA                                                  159
ARAUCA                                                      157
CAQUETA                                                     115
BOYACA                                                       84
CASANARE                                                     81
ARCHIPIELAGO DE SAN ANDRES PROVIDENCIA Y SANTA CATALINA      30
GUAVIARE                                                     28
AMAZONAS                                                     23
VICHADA                                                      11
GUAINIA                                                       5
VAUPES                                                        1
Name: departamento, dtype: int64
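`value_counts()` counts rows, while the data preview above also has a `cantidad` column. If some records carry a `cantidad` greater than 1 (this notebook does not check whether the 2021 file contains any), counting rows undercounts homicides, and `df.groupby('departamento')['cantidad'].sum()` would give the total number of victims instead. The sketch below shows the difference in plain Python on made-up rows; the data is hypothetical and only mimics the two relevant columns.

```python
from collections import Counter

# Hypothetical rows: (departamento, cantidad), mimicking the columns above
rows = [
    ('AMAZONAS', 1),
    ('AMAZONAS', 1),
    ('CAUCA', 1),
    ('CAUCA', 3),  # assumed: one record covering several victims
]

# What value_counts() measures: the number of rows per department
records = Counter(dep for dep, _ in rows)

# Number of homicides per department: the sum of `cantidad`
victims = Counter()
for dep, cantidad in rows:
    victims[dep] += cantidad

print(dict(records))  # {'AMAZONAS': 2, 'CAUCA': 2}
print(dict(victims))  # {'AMAZONAS': 2, 'CAUCA': 4}
```

When every `cantidad` is 1 the two counts coincide, so checking `df['cantidad'].max()` first tells you whether the distinction matters for this file.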

Time to Use Plotly

With the number of homicides per department ready, we draw the map of Colombia with plotly, using the department polygons from a GeoJSON file published in john-guerra's GitHub gists.

# GeoJSON polygons of Colombia's departments (john-guerra's gist)
with urlopen('https://gist.githubusercontent.com/john-guerra/43c7656821069d00dcbc/raw/be6a6e239cd5b5b803c6e7c2ec405b793a9064dd/Colombia.geo.json') as response:
    counties = json.load(response)

locs = contador.index

# Use each department's name as the feature id, so plotly can match it
# against the names passed in `locations`
for loc in counties['features']:
    loc['id'] = loc['properties']['NOMBRE_DPT']

fig = go.Figure(go.Choroplethmapbox(
    geojson=counties,
    locations=locs,
    z=contador.values,
    colorscale='deep',
    colorbar_title="Number of homicides",
))
fig.update_layout(
    mapbox_style="carto-positron",
    mapbox_zoom=5,
    mapbox_center={"lat": 4.570868, "lon": -74.2973328},
    width=800, height=1000,
    margin=dict(l=0, r=0, b=100, t=40, pad=4),
)
fig.show()
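The map only colors a department when its name in `locations` equals the `id` of some GeoJSON feature; a name that matches no polygon is simply left off the map. A quick sanity check is to compare the two sets of names before plotting. The sketch below is a hypothetical helper (`check_names`) run on a tiny stand-in for the GeoJSON with the geometries left out; the deliberately unrenamed 'NARINO' shows what the check reports when a renaming step is missing.

```python
# A tiny stand-in for the GeoJSON: only the parts the check needs
geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'NOMBRE_DPT': 'ANTIOQUIA'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'NOMBRE_DPT': 'NARIÑO'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'NOMBRE_DPT': 'SANTAFE DE BOGOTA D.C'}, 'geometry': None},
    ],
}
# Counts keyed by department; 'NARINO' is left unrenamed on purpose
counts = {'ANTIOQUIA': 1943, 'NARINO': 475, 'SANTAFE DE BOGOTA D.C': 679}


def check_names(counts, geojson, key='NOMBRE_DPT'):
    """Hypothetical helper: return (names with data but no polygon, polygons with no data)."""
    polygon_names = {f['properties'][key] for f in geojson['features']}
    data_names = set(counts)
    return sorted(data_names - polygon_names), sorted(polygon_names - data_names)


no_polygon, no_data = check_names(counts, geojson)
print(no_polygon)  # ['NARINO']
print(no_data)     # ['NARIÑO']
```

With the real data the same call would be `check_names(contador.to_dict(), counties)`, and both lists should come back empty. As a design alternative, instead of copying `NOMBRE_DPT` into each feature's `id`, `go.Choroplethmapbox` accepts `featureidkey='properties.NOMBRE_DPT'`, which performs the same lookup without modifying the GeoJSON.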

An Interesting Ad

The same kind of statistics is available for other types of crime in Colombia. I invite you to visit the Colombian crime statistics portal I built with Streamlit and Python.