Thunderbird mbox to ( Influxdb, Postgresql, mysql ) to Grafana in Python

J’ai amélorié ( voir : https://www.cyber-neurones.org/2020/03/thunderbird-mbox-to-influxdb-and-postgresql-to-grafana-in-python/ ) le programme afin d’injecter sur MySQL ( MariaDB en vérité ). Le plus facile a manipuler sur Grafana c’est MariaDB.

Pour se connecter de Grafana à MariaDB :

Les requêtes SQL pour Grafana :

Par jours :

SELECT
UNIX_TIMESTAMP(date) AS time_sec,
domain as ‘metric’,
count(domain) as value
FROM thunderbird
WHERE
$__timeFilter(date)
GROUP BY DAY(date),MONTH(date),YEAR(date)
ORDER BY date

Par mois :

SELECT
UNIX_TIMESTAMP(date) AS time_sec,
domain as ‘metric’,
count(domain) as value
FROM thunderbird
WHERE
$__timeFilter(date)
GROUP BY MONTH(date),YEAR(date)
ORDER BY date

Par années :

SELECT
UNIX_TIMESTAMP(date) AS time_sec,
domain as ‘metric’,
count(domain) as value
FROM thunderbird
WHERE
$__timeFilter(date)
GROUP BY YEAR(date)
ORDER BY date

Les sources du programme :

( Source sur : https://github.com/farias06/Python/blob/master/parse_email_v2.py )

#! /usr/bin/env python3
# ~*~ utf-8 ~*~

# Readme :
# ARIAS FREDERIC 
# 
# influx user create -n arias -p arias -o cyberneurones-org 
#

import mailbox
import bs4
import glob
import os
import time
import codecs
import sys

#
#
from influxdb import InfluxDBClient
import re
from datetime import datetime

#
#
from email.utils import parsedate_to_datetime

#
#
import logging

#
#  pip3 install psycopg2
import psycopg2

#  pip3 install mysql-connector-python-rf
#  pip3 install mysql

# MariaDB [mysql]> create database thunderbird;
# Query OK, 1 row affected (0.00 sec)

# MariaDB [(none)]> CREATE USER 'arias'@'localhost' IDENTIFIED BY 'arias';
# Query OK, 0 rows affected (0.00 sec)

# MariaDB [(none)]> GRANT USAGE ON *.* TO 'arias'@'localhost' IDENTIFIED BY 'arias';
# Query OK, 0 rows affected (0.00 sec)

# MariaDB [(none)]>  GRANT ALL privileges ON thunderbird.* TO 'arias'@'localhost';
# Query OK, 0 rows affected (0.00 sec)

# MariaDB [(none)]> FLUSH PRIVILEGES;
# Query OK, 0 rows affected (0.00 sec)

import mysql.connector as mariadb

#########################

logger = logging.Logger('catch_all')

#########################

global nb_folder
nb_folder = 0;
global nb_email
nb_email = 0;
global nb_error
nb_error = 0;
global id_email
id_email = 0;

global flag_influxdb
flag_influxdb = False

global flag_postgresql
flag_postgresql = False

global flag_mysql
flag_mysql = True

global name_Table
name_DB = 'thunderbird'
name_Table = 'thunderbird'
my_login = 'arias'
my_password = 'arias'
my_host = '127.0.0.1'
Login = 'arias';
Folder = 'zy3zk9ms.default';

global client
if (flag_influxdb == True):
   client = InfluxDBClient(host=my_host, port=8086, username=my_login, password=my_password)
   client.drop_database(name_Table)
   client.create_database(name_Table)
   client.switch_database(name_Table)

global client2
if (flag_postgresql == True):
   client2 = psycopg2.connect("dbname="+name_DB+" user="+my_login+" password='"+my_password+"'")
   cursor2 = client2.cursor()
   sqlCreateTable = "create table "+name_Table+" (id bigint UNIQUE, mail varchar(128), name varchar(128), domain varchar (128), date timestamp);"
   cursor2.execute(sqlCreateTable)
   sqlCreateTable = "delete from "+name_Table;
   cursor2.execute(sqlCreateTable)
   client2.commit()

global client3
if (flag_mysql == True):
   client3 = mariadb.connect(user=my_login, password=my_password, database=name_DB)
   cursor3 = client3.cursor()
   #sqlCreateTable = "create table "+name_Table+" (id bigint UNIQUE, mail varchar(128), name varchar(128), domain varchar (128), date datetime);"
   #cursor3.execute(sqlCreateTable)
   sqlCreateTable = "delete from "+name_Table;
   cursor3.execute(sqlCreateTable)
   client3.commit()

#########################

def get_html_text(html):
    try:
        return bs4.BeautifulSoup(html, 'lxml').body.get_text(' ', strip=True)
    except AttributeError: # message contents empty
        return None

class GmailMboxMessage():
    def __init__(self, email_data):
        if not isinstance(email_data, mailbox.mboxMessage):
            raise TypeError('Variable must be type mailbox.mboxMessage')
        self.email_data = email_data

    def parse_email(self):
        global client
        global client2
        global id_email
        global name_Table
        global cursor2
        global cursor3
        global flag_influxdb
        global flag_postgresql
        global flag_mysql
        email_date = self.email_data['Date']
        email_from = self.email_data['From']
        email_to = self.email_data['To']
        email_subject = self.email_data['Subject']
        if email_date is not None and email_from is not None:
            mail = re.search(r'[\w\.\-_]+@[\w\.\-_]+', email_from)
            if mail is not None:
                mailstr = mail.group(0)
            if mail is not None:    
                domain = re.search("@[\w\.\-_]+", email_from).group(0)
                domain = domain.replace('@', '')
                domain = domain.replace('>', '')
            if mail is not None:
                user = re.search("[\w\.i\-_]+@", email_from).group(0)
                user = user.replace('@', '')
                user = user.replace('<', '')
            local_time_str = datetime.fromtimestamp(parsedate_to_datetime(email_date).timestamp()).strftime('%Y-%m-%dT%H:%M:%S.%f%z')
            local_time_str2 = datetime.fromtimestamp(parsedate_to_datetime(email_date).timestamp()).strftime('%Y-%m-%d %H:%M:%S')
            timestamp = round(parsedate_to_datetime(email_date).timestamp() * 1000);
            if mail is not None:
                data = [{'measurement': 'thunderbirds', 'tags': { 'fullemail': 1, 'from': email_from, 'mail': mailstr, 'domain': domain, 'user': user}, 'id' : id_email, 'time': timestamp, 'date':local_time_str, 'fields': {"value": 1}}]
                sql = "INSERT INTO "+name_Table+" (mail, domain, name, id, date) VALUES ('" +mailstr+ "','" + domain+"','" +user+"','"+str(id_email)+"',TIMESTAMP '"+local_time_str2+"')";
                sql2 = "INSERT INTO "+name_Table+" (mail, domain, name, id, date) VALUES ('" +mailstr+ "','" + domain+"','" +user+"','"+str(id_email)+"','"+local_time_str2+"')";
                #print (sql2)
                if (flag_postgresql == True):
                   cursor2.execute(sql);
                   client2.commit();
                if (flag_mysql == True):
                   cursor3.execute(sql2);
                   client3.commit();
            else :
               data = [{'measurement': 'thunderbirds', 'tags': { 'fullemail': 0, 'from': email_from }, 'id' : id_email, 'time': timestamp, 'date':local_time_str, 'fields': {"value": 1}}] 
            #print (data);
            if (flag_influxdb == True):
               client.write_points(data, time_precision='ms')
            id_email = id_email+1

def mbox_reader(stream):
    data = stream.read()
    text = data.decode(encoding="utf-8")
    return mailbox.mboxMessage(text)

######################### End of library, example of use below

print("\nUsing glob.iglob()") 
for filename in glob.iglob('/home/'+Login+'/snap/thunderbird/common/.thunderbird/'+Folder+'/Mail/Local Folders/**/*', recursive=True): 
    print(filename);
    filename2, file_extension = os.path.splitext(filename);
    print(file_extension + " " + str(len(file_extension)));
    isFile = os.path.isfile(filename)
    if (file_extension != ".msf") and (file_extension != ".sbd") and isFile is True:
        mbox_obj = mailbox.mbox(filename);
        num_entries = len(mbox_obj)
        nb_folder = nb_folder + 1;
        try :
           for idx, email_obj in enumerate(mbox_obj):
               email_data = GmailMboxMessage(email_obj)
               email_data.parse_email()
               nb_email = nb_email + 1;
               print('Parsing email {0} of {1}'.format(idx, num_entries))
        except StopIteration:
           continue
        except Exception as e: 
           logger.error('Failed : '+ str(e))
           nb_error = nb_error+1;
           continue

print('The number of folder :'+str(nb_folder));    
print('The number of email :'+str(nb_email));
print('The number of error : '+str(nb_error));
print('The number in database : '+str(id_email));

Thunderbird mbox to Influxdb and Postgresql to Grafana in Python

J’ai donc fait un programme en python afin de faire un export des données de Thunderbird ( fichier mbox contenant les emails) vers Influxdb et Postgresql (dans un premier temps).

Le but du programme est de maitriser le python afin de faire ensuite des exports vers Grafana. Sur mes exemples j’ai mis en login/password arias/arias (je sais c’est pas secure mais c’est un exemple …).

Configuration GRAFANA:

Le programme :

( Source disponible ici : https://github.com/farias06/Python/blob/master/parse_email.py )

#! /usr/bin/env python3
# ~*~ utf-8 ~*~

# Readme :
# influx user create -n arias -p arias -o cyberneurones-org 

import mailbox
import bs4
import glob
import os
import time
import codecs
import sys
from influxdb import InfluxDBClient
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
import logging
import psycopg2

logger = logging.Logger('catch_all')

#########################

global nb_folder
nb_folder = 0;
global nb_email
nb_email = 0;
global nb_error
nb_error = 0;
global id_email
id_email = 0;

global client
client = InfluxDBClient(host='127.0.0.1', port=8086, username='arias', password='arias')
client.drop_database('thunderbird')
client.create_database('thunderbird')
client.switch_database('thunderbird')

global client2
client2   = psycopg2.connect("dbname=thunderbird user=arias password='arias'")
cursor                = client2.cursor()
global name_Table
name_Table            = "thunderbird"
#sqlCreateTable = "create table "+name_Table+" (id bigint UNIQUE, mail varchar(128), name varchar(128), domain varchar (128), date timestamp);"
sqlCreateTable = "delete from "+name_Table;
cursor.execute(sqlCreateTable)
client2.commit()

#########################

def get_html_text(html):
    try:
        return bs4.BeautifulSoup(html, 'lxml').body.get_text(' ', strip=True)
    except AttributeError: # message contents empty
        return None

class GmailMboxMessage():
    def __init__(self, email_data):
        if not isinstance(email_data, mailbox.mboxMessage):
            raise TypeError('Variable must be type mailbox.mboxMessage')
        self.email_data = email_data

    def parse_email(self):
        global client
        global client2
        global id_email
        global name_Table
        global cursor
        email_date = self.email_data['Date']
        email_from = self.email_data['From']
        email_to = self.email_data['To']
        email_subject = self.email_data['Subject']
        if email_date is not None and email_from is not None:
            mail = re.search(r'[\w\.\-_]+@[\w\.\-_]+', email_from)
            if mail is not None:
                mailstr = mail.group(0)
            if mail is not None:    
                domain = re.search("@[\w\.\-_]+", email_from).group(0)
                domain = domain.replace('@', '')
                domain = domain.replace('>', '')
            if mail is not None:
                user = re.search("[\w\.i\-_]+@", email_from).group(0)
                user = user.replace('@', '')
                user = user.replace('<', '')
            local_time_str = datetime.fromtimestamp(parsedate_to_datetime(email_date).timestamp()).strftime('%Y-%m-%dT%H:%M:%S.%f%z')
            local_time_str2 = datetime.fromtimestamp(parsedate_to_datetime(email_date).timestamp()).strftime('%Y-%m-%d %H:%M:%S')
            timestamp = round(parsedate_to_datetime(email_date).timestamp() * 1000);
            if mail is not None:
                data = [{'measurement': 'thunderbirds', 'tags': { 'fullemail': 1, 'from': email_from, 'mail': mailstr, 'domain': domain, 'user': user}, 'id' : id_email, 'time': timestamp, 'date':local_time_str, 'fields': {"value": 1}}]
                sql = "INSERT INTO "+name_Table+" (mail, domain, name, id, date) VALUES ('" +mailstr+ "','" + domain+"','" +user+"','"+str(id_email)+"',TIMESTAMP '"+local_time_str2+"')";
                #print (sql)
                cursor.execute(sql);
                client2.commit();
            else :
               data = [{'measurement': 'thunderbirds', 'tags': { 'fullemail': 0, 'from': email_from }, 'id' : id_email, 'time': timestamp, 'date':local_time_str, 'fields': {"value": 1}}] 
            #print (data);
            client.write_points(data, time_precision='ms')
            id_email = id_email+1

def mbox_reader(stream):
    data = stream.read()
    text = data.decode(encoding="utf-8")
    return mailbox.mboxMessage(text)

######################### End of library, example of use below

print("\nUsing glob.iglob()") 
for filename in glob.iglob('/home/ZZZZZZZ/snap/thunderbird/common/.thunderbird/ZZZZZZZZZ.default/Mail/Local Folders/**/*', recursive=True): 
    print(filename);
    filename2, file_extension = os.path.splitext(filename);
    print(file_extension + " " + str(len(file_extension)));
    isFile = os.path.isfile(filename)
    if (file_extension != ".msf") and (file_extension != ".sbd") and isFile is True:
        mbox_obj = mailbox.mbox(filename);
        num_entries = len(mbox_obj)
        nb_folder = nb_folder + 1;
        try :
           for idx, email_obj in enumerate(mbox_obj):
               email_data = GmailMboxMessage(email_obj)
               email_data.parse_email()
               nb_email = nb_email + 1;
               print('Parsing email {0} of {1}'.format(idx, num_entries))
        except StopIteration:
           continue
        except Exception as e: 
           logger.error('Failed : '+ str(e))
           nb_error = nb_error+1;
           continue

print('The number of folder :'+str(nb_folder));    
print('The number of email :'+str(nb_email));
print('The number of error : '+str(nb_error));
print('The number in database : '+str(id_email));

L’execution du programme:

...
The number of folder :1321
The number of email :120384
The number of error : 125
The number in database : 113523

Oui j’ai plus de 113523 emails ( environ 30 Go) … misère. Pour information seul les softs open source sont stables avec autant d’email, sous Outlook c’est le crash.

Le résultat :

Municipales 2020 à Biot : Les résultats par bureau de vote.

Bureau Description Inscrits Votants Particip. Nul Dermit Deschaintres Debras
1 Salle des Associations 836 462 55,26% 17 214 48,09% 105 23,60% 126 28,31%
2 Accueil 829 488 58,87% 9 180 37,58% 92 19,21% 207 43,22%
3 Ecole de Danse 870 418 48,05% 11 230 56,51% 69 16,95% 108 26,54%
4 Ecole de musique 849 484 57,01% 10 283 59,70% 61 12,87% 130 27,43%
5 Moulin neuf Primaire 808 368 45,54% 11 184 51,54% 57 15,97% 116 32,49%
6 Annexe St Philippe 821 323 39,34% 2 172 53,58% 37 11,53% 112 34,89%
7 Complexe Pierre Operto 819 455 55,56% 13 232 52,49% 72 16,29% 138 31,22%
8 Moulin Neuf Maternelle 882 488 55,33% 12 291 61,13% 78 16,39% 107 22,48%
9 Ecole Eugene Olivari 731 309 42,27% 6 173 57,10% 48 15,84% 82 27,06%
Total 7445 1959 619 1126
Suffrages exprimés 3704
Score 52,89% 16,71% 30,40%

Pour mieux comprendre la répartition des bureaux :

Suivre l’évolution du Coronavirus COVID-19 en France

Voici le meilleur site : https://mapthenews.maps.arcgis.com/apps/opsdashboard/index.html#/5e09dff7cb434fb194e22261689e2887 . A noter qu’il ne faut pas confondre « Cas » et « Cas confirmés ». Le nombre de « Cas » est bien supérieur aux nombres de « Cas confirmés ». Le nombre de « Cas confirmés » est souvant fonction du dépistage. Et il n’est pas le même suivant les pays. Il faut lire par exemple : https://www.ouest-france.fr/sante/virus/coronavirus/coronavirus-cas-confirmes-deces-letalite-pourquoi-de-telles-differences-d-un-pays-l-autre-6787080

En France aussi, la capacité de tests n’est pas illimitée : le processus complet prend de quatre à cinq heures et n’est réalisé que dans des centres spécialisés. Selon le ministère de la Santé, la capacité actuelle est de 2 500 tests par jour mais devrait augmenter significativement dans les prochains jours…

À l’inverse, en Allemagne, la capacité de tests est de 12 000 par jour, et la politique consiste à faire des tests précoces. Le nombre de cas détectés dans le pays est donc important, près de 14 000, mais il s’agit à 70 % de personnes d’entre 20 et 50 ans,

Une capture d’écran du site :

16/03/2020 :

18/03/2020 :

21/03/2020 :