INSEE: Number of deaths by age group

So I made a new chart: the number of deaths per age group (10-year brackets), built from the INSEE data.

The chart is wrong for the end of 2020; it will only be accurate around mid-February 2021, once INSEE has published the data.

Here is my process:

Step 1: Download the data from INSEE: https://www.insee.fr/fr/information/4190491

Step 2: I put everything into a single file:

# cat deces-* Deces_2020_M* | grep -v "nomprenom" > Full.csv
# wc -l Full.csv 
  25528867 Full.csv
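
For readers without a Unix shell, the same merge can be sketched in Python. This is only a rough equivalent of the `cat | grep -v` line above: the function name `merge_files` is hypothetical, and it assumes the files are UTF-8 text and that every header line contains the column name `nomprenom`.

```python
import glob


def merge_files(patterns, output_path, header_marker="nomprenom"):
    """Concatenate every file matching the patterns, dropping header lines.

    Returns the number of lines written. Like `grep -v`, any line that
    contains header_marker is dropped, wherever it appears in the file.
    """
    kept = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for pattern in patterns:
            for path in sorted(glob.glob(pattern)):
                # Assumption: the input is UTF-8; undecodable bytes are replaced.
                with open(path, "r", encoding="utf-8", errors="replace") as src:
                    for line in src:
                        if header_marker in line:
                            continue
                        out.write(line)
                        kept += 1
    return kept
```

Calling `merge_files(["deces-*", "Deces_2020_M*"], "Full.csv")` from the download directory should produce the same kind of file as the shell command.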

Step 3: I run a first Python program:

# cat parse2.py
import csv
import datetime
from dateutil.relativedelta import relativedelta

with open('Full.csv', 'rt') as f:
    csv_reader = csv.reader(f, quotechar='"', delimiter=';', quoting=csv.QUOTE_ALL, skipinitialspace=True)

    for line in csv_reader:
        # line[2] is the date of birth (YYYYMMDD); dates ending in "00" have no known day
        start_date = None  # reset so a bad birth date never reuses the previous row's value
        if (len(line[2]) == 8) and (not line[2].endswith("00")):
            try:
                start_date = datetime.datetime.strptime(line[2], "%Y%m%d")
            except ValueError:
                print("error1", line[2])
        # line[6] is the date of death (YYYYMMDD)
        if start_date is not None and (len(line[6]) == 8) and (not line[6].endswith("00")):
            try:
                end_date = datetime.datetime.strptime(line[6], "%Y%m%d")
            except ValueError:
                print("error2", line[6])
                continue
            age = relativedelta(end_date, start_date).years
            year = end_date.year
            month = end_date.month
            # The "year , month , bracket" output format is what clear.bash expects
            if age <= 10:
                print(year, ",", month, ", 0to10")
            elif age <= 20:
                print(year, ",", month, ", 10to20")
            elif age <= 30:
                print(year, ",", month, ", 20to30")
            elif age <= 40:
                print(year, ",", month, ", 30to40")
            elif age <= 50:
                print(year, ",", month, ", 40to50")
            elif age <= 60:
                print(year, ",", month, ", 50to60")
            elif age <= 70:
                print(year, ",", month, ", 60to70")
            elif age <= 80:
                print(year, ",", month, ", 70to80")
            elif age <= 90:
                print(year, ",", month, ", 80to90")
            else:
                print(year, ",", month, ", more90")

# python3 parse2.py > age2.csv
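
The bracket logic can be isolated and checked on its own. The sketch below is not the program above: `age_in_years` and `age_bucket` are hypothetical helper names, and `age_in_years` is a plain standard-library substitute for `relativedelta`, so it may differ from it in edge cases such as a birthday on 29 February. The brackets follow the same boundaries as the script (an age of exactly 10 falls in `0to10`, exactly 20 in `10to20`, and so on).

```python
import datetime


def age_in_years(birth, death):
    """Whole years elapsed between two dates (standard-library approximation)."""
    # Subtract one year if the birthday has not yet been reached in the year of death.
    before_birthday = (death.month, death.day) < (birth.month, birth.day)
    return death.year - birth.year - int(before_birthday)


def age_bucket(age):
    """Return the 10-year bracket label used in age2.csv for an age in whole years."""
    if age > 90:
        return "more90"
    if age <= 10:
        return "0to10"
    lower = ((age - 1) // 10) * 10  # e.g. ages 11..20 -> 10, ages 21..30 -> 20
    return f"{lower}to{lower + 10}"


birth = datetime.date(1950, 6, 15)
print(age_bucket(age_in_years(birth, datetime.date(2020, 6, 14))))  # age 69
```

Keeping the bracket rule in one function makes the boundaries easy to test before running the script on 25 million rows.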

Step 4: I sort and clean up (keeping only the years 20xx):

# cat clear.bash 

grep -v "error" age2.csv | sort -n | uniq -c > sort-age2.csv
for bucket in 0to10 10to20 20to30 30to40 40to50 50to60 60to70 70to80 80to90 more90; do
    echo "Date,Number" > "$bucket.csv"
    # Fields after uniq -c: $1 = count, $2 = year, $4 = month; the month is zero-padded to get YYYY-MM
    grep "$bucket" sort-age2.csv | awk '{printf "%s-%02d,%s\n", $2, $4, $1}' | grep "^20" | sort -n >> "$bucket.csv"
done

# ./clear.bash

Alternatively, I keep only the years after 1975, with clear2.bash:

# cat clear2.bash

grep -v "error" age2.csv | sort -n | uniq -c > sort-age2.csv
for bucket in 0to10 10to20 20to30 30to40 40to50 50to60 60to70 70to80 80to90 more90; do
    echo "Date,Number" > "$bucket.csv"
    # Same as clear.bash, but the year filter ($2 > 1975) is done in awk
    grep "$bucket" sort-age2.csv | awk '$2 > 1975 {printf "%s-%02d,%s\n", $2, $4, $1}' | sort -n >> "$bucket.csv"
done
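
The counting done by `sort | uniq -c` and `awk` can also be sketched in Python. This is an alternative, not the scripts above: `monthly_counts` and `write_bucket_csv` are hypothetical names, and the code assumes each valid line of age2.csv has the form `year , month , bracket`. Passing `min_year=2000` roughly matches clear.bash, and `min_year=1976` matches the `$2 > 1975` test in clear2.bash.

```python
import csv
from collections import Counter


def monthly_counts(lines, min_year=2000):
    """Count deaths per (bracket, "YYYY-MM") from lines like "2020 , 3 , 0to10"."""
    counts = Counter()
    for raw in lines:
        parts = [p.strip() for p in raw.split(",")]
        # Skip "error..." lines and anything else that is not year, month, bracket.
        if len(parts) != 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        year, month, bucket = int(parts[0]), int(parts[1]), parts[2]
        if year >= min_year:
            counts[(bucket, f"{year}-{month:02d}")] += 1
    return counts


def write_bucket_csv(counts, bucket, path):
    """Write one "Date,Number" CSV for a single bracket, sorted by date."""
    rows = sorted((date, n) for (b, date), n in counts.items() if b == bucket)
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Date", "Number"])
        writer.writerows(rows)
```

A typical use would be to open age2.csv, pass the file object to `monthly_counts`, then call `write_bucket_csv(counts, "0to10", "0to10.csv")` once per bracket.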

Step 5: I draw the chart:

# cat draw2.py 
import plotly.graph_objects as go

import pandas as pd

fig = go.Figure()

df = pd.read_csv('./0to10.csv')
fig.add_trace(go.Scatter(x=df['Date'], y=df['Number'],name='0 to 10'))

df2 = pd.read_csv('./10to20.csv')
fig.add_trace(go.Scatter(x=df2['Date'], y=df2['Number'],name='10 to 20'))

df3 = pd.read_csv('./20to30.csv')
fig.add_trace(go.Scatter(x=df3['Date'], y=df3['Number'],name='20 to 30'))

df4 = pd.read_csv('./30to40.csv')
fig.add_trace(go.Scatter(x=df4['Date'], y=df4['Number'],name='30 to 40'))

df5 = pd.read_csv('./40to50.csv')
fig.add_trace(go.Scatter(x=df5['Date'], y=df5['Number'],name='40 to 50'))

df6 = pd.read_csv('./50to60.csv')
fig.add_trace(go.Scatter(x=df6['Date'], y=df6['Number'],name='50 to 60'))

df7 = pd.read_csv('./60to70.csv')
fig.add_trace(go.Scatter(x=df7['Date'], y=df7['Number'],name='60 to 70'))

df8 = pd.read_csv('./70to80.csv')
fig.add_trace(go.Scatter(x=df8['Date'], y=df8['Number'],name='70 to 80'))

df9 = pd.read_csv('./80to90.csv')
fig.add_trace(go.Scatter(x=df9['Date'], y=df9['Number'],name='80 to 90'))

df10 = pd.read_csv('./more90.csv')
fig.add_trace(go.Scatter(x=df10['Date'], y=df10['Number'],name='more 90'))


fig.show()

#  python3 draw2.py
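
Since the ten traces differ only by file name and legend label, the drawing step can be written as a loop, which avoids copy-paste slips in the labels. This is a sketch under the assumption that the ten CSV files from step 4 sit in one directory; `BUCKETS`, `label_for`, and `build_figure` are hypothetical names, while the pandas and plotly calls are the same ones used in draw2.py.

```python
BUCKETS = ["0to10", "10to20", "20to30", "30to40", "40to50",
           "50to60", "60to70", "70to80", "80to90", "more90"]


def label_for(bucket):
    """Turn a file stem such as "10to20" into a legend label such as "10 to 20"."""
    return bucket.replace("to", " to ").replace("more", "more ")


def build_figure(directory="."):
    """Build one figure with a line per age bracket, read from <directory>/<bucket>.csv."""
    # Imported here so label_for can be used and tested without pandas or plotly installed.
    import pandas as pd
    import plotly.graph_objects as go

    fig = go.Figure()
    for bucket in BUCKETS:
        df = pd.read_csv(f"{directory}/{bucket}.csv")
        fig.add_trace(go.Scatter(x=df["Date"], y=df["Number"], name=label_for(bucket)))
    return fig
```

Calling `build_figure().show()` should then display the same chart as draw2.py.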

The result: I have a problem in December 1989:

One comment on “INSEE: Number of deaths by age group”

  1. I wrote a program to study the same thing, and I ran into the same problem for that period.

    After analysis, there are a lot of duplicates in that period…
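
One way to check for such duplicates is to count identical rows before computing ages. The sketch below assumes that a duplicate means a byte-identical line in Full.csv, which may not be exactly what the commenter observed; `duplicate_lines` and `drop_duplicates` are hypothetical names. Note that keeping 25 million lines in memory is heavy, so in practice this may need to run on one year's file at a time.

```python
from collections import Counter


def duplicate_lines(lines):
    """Return a Counter of the lines that occur more than once, with their counts."""
    seen = Counter(line.rstrip("\n") for line in lines)
    return Counter({line: n for line, n in seen.items() if n > 1})


def drop_duplicates(lines):
    """Yield each distinct line once, keeping the first occurrence and the input order."""
    seen = set()
    for line in lines:
        key = line.rstrip("\n")
        if key not in seen:
            seen.add(key)
            yield line
```

Running `duplicate_lines` on the rows whose date of death falls in December 1989 would show whether identical records explain the spike.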
