Éducation Nationale: the sites used by the people around me.

I just ran a small survey to see which sites teachers are using, and I got this:

HOST="https://www.atrium-sud.fr/"
HOST="https://www.index-education.com/fr/"
HOST="https://eu.bbcollab.com"
HOST="https://www.youtube.com"
HOST="https://sites.google.com"
HOST="https://lycee.cned.fr/login/index.php"

So I am going to spin up a small Grafana instance to check whether these sites are still up.
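Before wiring anything into Grafana, a minimal probe can check the sites from the list above. This is a sketch using only the standard library; the `classify` and `probe` helpers are my own names, not part of any monitoring tool:

```python
from urllib.request import urlopen
from urllib.error import HTTPError

# The sites from the survey above (shortened list).
HOSTS = [
    "https://www.atrium-sud.fr/",
    "https://www.index-education.com/fr/",
    "https://eu.bbcollab.com",
]

def classify(status):
    """Map an HTTP status code (or None) to a simple UP/DOWN label."""
    return "UP" if status is not None and 200 <= status < 400 else "DOWN"

def probe(url, timeout=5):
    """Return the HTTP status code, or None if the site is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code          # server answered, but with an error status
    except OSError:
        return None              # DNS failure, timeout, refused connection...

if __name__ == "__main__":
    for host in HOSTS:
        print(host, classify(probe(host)))
```

Run from cron, the output (or the status codes pushed into a database) is enough for a simple Grafana UP/DOWN panel.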

Note that I have a hard time understanding the use of Google sites: it tempts students into distraction… Sigh.

For example, if I go to PeerTube (https://joinpeertube.org/fr/) instead of YouTube, I can see that the Lyon academy is there: https://tube.ac-lyon.fr/. And I see NO ads:

Same for websites: why not use https://frama.site/ instead of Google?

Twitter Analytics: migrating the data to MariaDB for use in Grafana (version 2)

I made a version 2 that shows the interactions with users; the sources are available here: https://github.com/farias06/Grafana/blob/master/Twitter_CSV_insert_v2.py

import csv
from datetime import datetime
import mysql.connector
import re
from mysql.connector import errorcode
from mysql.connector import (connection)

cnx = connection.MySQLConnection(user='twitter', password='twitter',
                                 host='127.0.0.1',
                                 database='TWITTERS')
cursor = cnx.cursor()
now = datetime.now().date()

#cursor.execute("DROP TABLE TWITTER;");
#cursor.execute("CREATE TABLE TWITTER (IDENTIFIANT varchar(30) UNIQUE,PERMALIEN varchar(200),TEXTE varchar(500),DATE datetime,IMPRESSION float,ENGAGEMENT float,TAUX_ENGAGEMENT float, RETWEET float,REPONSE float, JAIME float, CLIC_PROFIL float, CLIC_URL float, CLIC_HASTAG float, OUVERTURE_DETAIL float, CLIC_PERMALIEN float, OUVERTURE_APP int, INSTALL_APP int, ABONNEMENT int, EMAIL_TWEET int, COMPOSER_NUMERO int, VUE_MEDIA int, ENGAGEMENT_MEDIA int);");
#cursor.execute("CREATE TABLE TWITTER_USER (USER varchar(20),IDENTIFIANT varchar(30), DATE datetime, PRIMARY KEY (USER,IDENTIFIANT));");
cursor.execute("DELETE FROM TWITTER_USER")
cursor.execute("DELETE FROM TWITTER")
cnx.commit()

with open('input.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, quotechar='"')
    for row in reader:
        MyDate=row[3].replace(" +0000", ":00")
        MyTexte=row[2].replace("'", " ")
        MyTexte=MyTexte.replace(",", " ")
        MyC4=row[4].replace("Infinity", "0")
        MyC5=row[5].replace("Infinity", "0")
        MyC6=row[6].replace("Infinity", "0")
        MyC6=MyC6.replace("NaN", "0")
        MyC7=row[7].replace("Infinity", "0")
        User = re.findall(r'(?<=\W)[@]\S*', MyTexte)
        for MyUser in User:
            try:
                cursor.execute("INSERT INTO TWITTER_USER (IDENTIFIANT,USER,DATE) VALUES ('"+row[0]+"','"+MyUser+"','"+MyDate+"')")
            except mysql.connector.Error as err:
                print("Something went wrong: {}".format(err))
                if err.errno == errorcode.ER_BAD_TABLE_ERROR:
                    print("Table TWITTER_USER is missing")
        try:
            cursor.execute("INSERT INTO TWITTER (IDENTIFIANT,PERMALIEN,TEXTE,DATE,IMPRESSION,ENGAGEMENT,TAUX_ENGAGEMENT,RETWEET,REPONSE, JAIME, CLIC_PROFIL, CLIC_URL, CLIC_HASTAG, OUVERTURE_DETAIL, CLIC_PERMALIEN, OUVERTURE_APP, INSTALL_APP, ABONNEMENT, EMAIL_TWEET, COMPOSER_NUMERO, VUE_MEDIA, ENGAGEMENT_MEDIA) VALUES ('"+row[0]+"', '"+row[1]+"', '"+MyTexte+"','"+MyDate+"', "+MyC4+", "+MyC5+", "+MyC6+", "+MyC7+", "+row[8]+","+row[9]+", "+row[10]+", "+row[11]+","+row[12]+","+row[13]+","+row[14]+","+row[15]+","+row[16]+","+row[17]+","+row[18]+","+row[19]+","+row[20]+","+row[21]+")")
        except mysql.connector.Error as err:
            print("Something went wrong: {}".format(err))
            if err.errno == errorcode.ER_BAD_TABLE_ERROR:
                print("Table TWITTER is missing")

cnx.commit()
cursor.close()
cnx.close()
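The INSERT statements above build SQL by string concatenation, which is why the script first has to strip `'` and `,` from the tweet text. With parameterized queries the connector escapes values itself, so the text can be stored untouched. A sketch of that approach (`clean_number` and `row_to_params` are my own helper names, not part of the original script):

```python
def clean_number(value):
    """Twitter's CSV export writes 'Infinity' and 'NaN' for undefined rates."""
    return "0" if value in ("Infinity", "NaN") else value

def row_to_params(row):
    """Turn one 22-column CSV row into the parameter tuple for the INSERT."""
    date = row[3].replace(" +0000", ":00")
    numbers = [clean_number(v) for v in row[4:8]]
    # Note: row[2] (the tweet text) is kept as-is; the driver handles quoting.
    return (row[0], row[1], row[2], date, *numbers, *row[8:22])

# With a parameterized query, mysql.connector does the quoting itself:
# cursor.execute(
#     "INSERT INTO TWITTER VALUES (" + ",".join(["%s"] * 22) + ")",
#     row_to_params(row))
```

The commented-out `cursor.execute` shows the shape of the call; it assumes the same 22-column TWITTER table created earlier.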

Then run the following query:

SELECT
  UNIX_TIMESTAMP(date) as time_sec,
  COUNT(*) as value,
  USER as metric
FROM TWITTER_USER
WHERE $__timeFilter(date)
GROUP BY MONTH(date),YEAR(date), USER
ORDER BY date ASC

This gives me the following; it is then enough to click on a user.
To get the list of users and the number of interactions:

 

SELECT USER, count(*) as value FROM TWITTER_USER GROUP BY USER ORDER BY value DESC;
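The `@`-mention extraction that feeds TWITTER_USER can be checked in isolation. A sketch with the same regex as the script (the sample tweet is made up); note that the lookbehind means a mention at the very start of the text is not matched:

```python
import re

def extract_users(text):
    """Find @mentions preceded by a non-word character, as in the script."""
    return re.findall(r'(?<=\W)[@]\S*', text)

print(extract_users("RT @alice merci @bob !"))
# → ['@alice', '@bob']
```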

Twitter Analytics: migrating the data to MariaDB for use in Grafana

Prerequisites:

  • Python: I use Python 3.
  • MariaDB
  • Grafana: I use version 5.0.0.
  • Linux: I use Ubuntu 18

Here is what I did to get my data into Grafana.

Step 1:

I go to Twitter Analytics (https://analytics.twitter.com/about) and download the CSV files ("By Tweet", not "By Day"). Normally I get one CSV file per month.

Step 2:

I concatenate all the CSV files into a single file, dropping the header lines:

cat tweet_activity_metrics_CYBERNEURONES_20* | grep -v "Identifiant du" > input.csv
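The same concatenation can be done in Python when a shell is not handy. A sketch (`merge_csv` is my own helper; `header_marker` matches the French export header filtered out by the `grep -v` above):

```python
import glob

def merge_csv(pattern, output, header_marker="Identifiant du"):
    """Concatenate CSV exports, dropping every line containing the header marker."""
    with open(output, "w", encoding="utf-8") as out:
        for name in sorted(glob.glob(pattern)):
            with open(name, encoding="utf-8") as f:
                for line in f:
                    if header_marker not in line:
                        out.write(line)

# merge_csv("tweet_activity_metrics_CYBERNEURONES_20*", "input.csv")
```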

Step 3:

I run the Python program to load the data into MariaDB. Note that I created the database and a login for Twitter beforehand:

-> create database TWITTERS;
-> CREATE USER 'twitter'@'localhost' IDENTIFIED BY 'twitter';
-> GRANT ALL PRIVILEGES ON TWITTERS.* TO 'twitter'@'localhost';
-> FLUSH PRIVILEGES;

The Python program:

#! /usr/bin/env python3
# -*- coding: utf-8 -*-

import csv
from datetime import datetime
import mysql.connector
from mysql.connector import errorcode
from mysql.connector import (connection)

cnx = connection.MySQLConnection(user='twitter', password='twitter',
                                 host='127.0.0.1',
                                 database='TWITTERS')
cursor = cnx.cursor()
now = datetime.now().date()

#cursor.execute("DROP TABLE TWITTER;");
#cursor.execute("CREATE TABLE TWITTER (IDENTIFIANT varchar(30) UNIQUE,PERMALIEN varchar(200),TEXTE varchar(500),DATE datetime,IMPRESSION float,ENGAGEMENT float,TAUX_ENGAGEMENT float, RETWEET float,REPONSE float, JAIME float, CLIC_PROFIL float, CLIC_URL float, CLIC_HASTAG float, OUVERTURE_DETAIL float, CLIC_PERMALIEN float, OUVERTURE_APP int, INSTALL_APP int, ABONNEMENT int, EMAIL_TWEET int, COMPOSER_NUMERO int, VUE_MEDIA int, ENGAGEMENT_MEDIA int);");
cursor.execute("DELETE FROM TWITTER")

with open('input.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, quotechar='"')
    for row in reader:
        MyDate=row[3].replace(" +0000", ":00")
        MyTexte=row[2].replace("'", " ")
        MyTexte=MyTexte.replace(",", " ")
        MyC4=row[4].replace("Infinity", "0")
        MyC5=row[5].replace("Infinity", "0")
        MyC6=row[6].replace("Infinity", "0")
        MyC6=MyC6.replace("NaN", "0")
        MyC7=row[7].replace("Infinity", "0")
        try:
            cursor.execute("INSERT INTO TWITTER (IDENTIFIANT,PERMALIEN,TEXTE,DATE,IMPRESSION,ENGAGEMENT,TAUX_ENGAGEMENT,RETWEET,REPONSE, JAIME, CLIC_PROFIL, CLIC_URL, CLIC_HASTAG, OUVERTURE_DETAIL, CLIC_PERMALIEN, OUVERTURE_APP, INSTALL_APP, ABONNEMENT, EMAIL_TWEET, COMPOSER_NUMERO, VUE_MEDIA, ENGAGEMENT_MEDIA) VALUES ('"+row[0]+"', '"+row[1]+"', '"+MyTexte+"','"+MyDate+"', "+MyC4+", "+MyC5+", "+MyC6+", "+MyC7+", "+row[8]+","+row[9]+", "+row[10]+", "+row[11]+","+row[12]+","+row[13]+","+row[14]+","+row[15]+","+row[16]+","+row[17]+","+row[18]+","+row[19]+","+row[20]+","+row[21]+")")
        except mysql.connector.Error as err:
            print("Something went wrong: {}".format(err))
            if err.errno == errorcode.ER_BAD_TABLE_ERROR:
                print("Table TWITTER is missing")

cnx.commit()
cursor.close()
cnx.close()

# END 

Step 4:

I configure Grafana to access MariaDB:

Step 5:

Build the graphs:

– For the number of impressions per day:

SELECT
  UNIX_TIMESTAMP(date) as time_sec,
  SUM(IMPRESSION) as value
FROM TWITTER
WHERE $__timeFilter(date)
GROUP BY DAY(date),MONTH(date),YEAR(date)
ORDER BY date ASC


– For the number of likes per day:

SELECT
  UNIX_TIMESTAMP(date) as time_sec,
  SUM(JAIME) as value
FROM TWITTER
WHERE $__timeFilter(date)
GROUP BY DAY(date),MONTH(date),YEAR(date)
ORDER BY date ASC


– For the number of profile clicks per day:

SELECT
  UNIX_TIMESTAMP(date) as time_sec,
  SUM(CLIC_PROFIL) as value
FROM TWITTER
WHERE $__timeFilter(date)
GROUP BY DAY(date),MONTH(date),YEAR(date)
ORDER BY date ASC

– For the number of follows per day:

SELECT
  UNIX_TIMESTAMP(date) as time_sec,
  SUM(ABONNEMENT) as value
FROM TWITTER
WHERE $__timeFilter(date)
GROUP BY DAY(date),MONTH(date),YEAR(date)
ORDER BY date ASC

It is also possible to put several curves on the same graph:

Thunderbird mbox to (InfluxDB, PostgreSQL, MySQL) to Grafana in Python

I improved the program (see: https://www.cyber-neurones.org/2020/03/thunderbird-mbox-to-influxdb-and-postgresql-to-grafana-in-python/) so that it also injects into MySQL (MariaDB, actually). MariaDB is the easiest backend to work with in Grafana.

To connect Grafana to MariaDB:

The SQL queries for Grafana:

Per day:

SELECT
UNIX_TIMESTAMP(date) AS time_sec,
domain as 'metric',
count(domain) as value
FROM thunderbird
WHERE
$__timeFilter(date)
GROUP BY DAY(date),MONTH(date),YEAR(date)
ORDER BY date

Per month:

SELECT
UNIX_TIMESTAMP(date) AS time_sec,
domain as 'metric',
count(domain) as value
FROM thunderbird
WHERE
$__timeFilter(date)
GROUP BY MONTH(date),YEAR(date)
ORDER BY date

Per year:

SELECT
UNIX_TIMESTAMP(date) AS time_sec,
domain as 'metric',
count(domain) as value
FROM thunderbird
WHERE
$__timeFilter(date)
GROUP BY YEAR(date)
ORDER BY date

The program source:

(Source: https://github.com/farias06/Python/blob/master/parse_email_v2.py)

#! /usr/bin/env python3
# -*- coding: utf-8 -*-

# Readme :
# ARIAS FREDERIC 
# 
# influx user create -n arias -p arias -o cyberneurones-org 
#

import mailbox
import bs4
import glob
import os
import time
import codecs
import sys

#
#
from influxdb import InfluxDBClient
import re
from datetime import datetime

#
#
from email.utils import parsedate_to_datetime

#
#
import logging

#
#  pip3 install psycopg2
import psycopg2

#  pip3 install mysql-connector-python-rf
#  pip3 install mysql

# MariaDB [mysql]> create database thunderbird;
# Query OK, 1 row affected (0.00 sec)

# MariaDB [(none)]> CREATE USER 'arias'@'localhost' IDENTIFIED BY 'arias';
# Query OK, 0 rows affected (0.00 sec)

# MariaDB [(none)]> GRANT USAGE ON *.* TO 'arias'@'localhost' IDENTIFIED BY 'arias';
# Query OK, 0 rows affected (0.00 sec)

# MariaDB [(none)]>  GRANT ALL privileges ON thunderbird.* TO 'arias'@'localhost';
# Query OK, 0 rows affected (0.00 sec)

# MariaDB [(none)]> FLUSH PRIVILEGES;
# Query OK, 0 rows affected (0.00 sec)

import mysql.connector as mariadb

#########################

logger = logging.Logger('catch_all')

#########################

global nb_folder
nb_folder = 0
global nb_email
nb_email = 0
global nb_error
nb_error = 0
global id_email
id_email = 0

global flag_influxdb
flag_influxdb = False

global flag_postgresql
flag_postgresql = False

global flag_mysql
flag_mysql = True

global name_Table
name_DB = 'thunderbird'
name_Table = 'thunderbird'
my_login = 'arias'
my_password = 'arias'
my_host = '127.0.0.1'
Login = 'arias'
Folder = 'zy3zk9ms.default'

global client
if (flag_influxdb == True):
   client = InfluxDBClient(host=my_host, port=8086, username=my_login, password=my_password)
   client.drop_database(name_Table)
   client.create_database(name_Table)
   client.switch_database(name_Table)

global client2
if (flag_postgresql == True):
   client2 = psycopg2.connect("dbname="+name_DB+" user="+my_login+" password='"+my_password+"'")
   cursor2 = client2.cursor()
   sqlCreateTable = "create table "+name_Table+" (id bigint UNIQUE, mail varchar(128), name varchar(128), domain varchar (128), date timestamp);"
   cursor2.execute(sqlCreateTable)
   sqlCreateTable = "delete from "+name_Table
   cursor2.execute(sqlCreateTable)
   client2.commit()

global client3
if (flag_mysql == True):
   client3 = mariadb.connect(user=my_login, password=my_password, database=name_DB)
   cursor3 = client3.cursor()
   #sqlCreateTable = "create table "+name_Table+" (id bigint UNIQUE, mail varchar(128), name varchar(128), domain varchar (128), date datetime);"
   #cursor3.execute(sqlCreateTable)
   sqlCreateTable = "delete from "+name_Table
   cursor3.execute(sqlCreateTable)
   client3.commit()

#########################

def get_html_text(html):
    try:
        return bs4.BeautifulSoup(html, 'lxml').body.get_text(' ', strip=True)
    except AttributeError: # message contents empty
        return None

class GmailMboxMessage():
    def __init__(self, email_data):
        if not isinstance(email_data, mailbox.mboxMessage):
            raise TypeError('Variable must be type mailbox.mboxMessage')
        self.email_data = email_data

    def parse_email(self):
        global client
        global client2
        global id_email
        global name_Table
        global cursor2
        global cursor3
        global flag_influxdb
        global flag_postgresql
        global flag_mysql
        email_date = self.email_data['Date']
        email_from = self.email_data['From']
        email_to = self.email_data['To']
        email_subject = self.email_data['Subject']
        if email_date is not None and email_from is not None:
            mail = re.search(r'[\w\.\-_]+@[\w\.\-_]+', email_from)
            if mail is not None:
                mailstr = mail.group(0)
                domain = re.search(r"@[\w\.\-_]+", email_from).group(0)
                domain = domain.replace('@', '').replace('>', '')
                user = re.search(r"[\w\.\-_]+@", email_from).group(0)
                user = user.replace('@', '').replace('<', '')
            local_time_str = datetime.fromtimestamp(parsedate_to_datetime(email_date).timestamp()).strftime('%Y-%m-%dT%H:%M:%S.%f%z')
            local_time_str2 = datetime.fromtimestamp(parsedate_to_datetime(email_date).timestamp()).strftime('%Y-%m-%d %H:%M:%S')
            timestamp = round(parsedate_to_datetime(email_date).timestamp() * 1000)
            if mail is not None:
                data = [{'measurement': 'thunderbirds', 'tags': { 'fullemail': 1, 'from': email_from, 'mail': mailstr, 'domain': domain, 'user': user}, 'id' : id_email, 'time': timestamp, 'date':local_time_str, 'fields': {"value": 1}}]
                sql = "INSERT INTO "+name_Table+" (mail, domain, name, id, date) VALUES ('" +mailstr+ "','" + domain+"','" +user+"','"+str(id_email)+"',TIMESTAMP '"+local_time_str2+"')"
                sql2 = "INSERT INTO "+name_Table+" (mail, domain, name, id, date) VALUES ('" +mailstr+ "','" + domain+"','" +user+"','"+str(id_email)+"','"+local_time_str2+"')"
                #print (sql2)
                if (flag_postgresql == True):
                   cursor2.execute(sql)
                   client2.commit()
                if (flag_mysql == True):
                   cursor3.execute(sql2)
                   client3.commit()
            else :
               data = [{'measurement': 'thunderbirds', 'tags': { 'fullemail': 0, 'from': email_from }, 'id' : id_email, 'time': timestamp, 'date':local_time_str, 'fields': {"value": 1}}] 
            #print (data);
            if (flag_influxdb == True):
               client.write_points(data, time_precision='ms')
            id_email = id_email+1

def mbox_reader(stream):
    data = stream.read()
    text = data.decode(encoding="utf-8")
    return mailbox.mboxMessage(text)

######################### End of library, example of use below

print("\nUsing glob.iglob()") 
for filename in glob.iglob('/home/'+Login+'/snap/thunderbird/common/.thunderbird/'+Folder+'/Mail/Local Folders/**/*', recursive=True): 
    print(filename)
    filename2, file_extension = os.path.splitext(filename)
    print(file_extension + " " + str(len(file_extension)))
    isFile = os.path.isfile(filename)
    if (file_extension != ".msf") and (file_extension != ".sbd") and isFile is True:
        mbox_obj = mailbox.mbox(filename)
        num_entries = len(mbox_obj)
        nb_folder = nb_folder + 1
        try :
           for idx, email_obj in enumerate(mbox_obj):
               email_data = GmailMboxMessage(email_obj)
               email_data.parse_email()
               nb_email = nb_email + 1
               print('Parsing email {0} of {1}'.format(idx, num_entries))
        except StopIteration:
           continue
        except Exception as e: 
           logger.error('Failed : '+ str(e))
           nb_error = nb_error + 1
           continue

print('The number of folders: ' + str(nb_folder))
print('The number of emails: ' + str(nb_email))
print('The number of errors: ' + str(nb_error))
print('The number in database: ' + str(id_email))
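The From/Date handling buried in `parse_email()` can be exercised on its own. A sketch (the sample header values are invented; note that unlike the script, which converts to local time via `fromtimestamp`, this version keeps the timezone stated in the header):

```python
import re
from email.utils import parsedate_to_datetime

def split_address(from_header):
    """Extract (user, domain) from a From: header, as the script does."""
    mail = re.search(r'[\w\.\-_]+@[\w\.\-_]+', from_header)
    if mail is None:
        return None
    user, domain = mail.group(0).split('@', 1)
    return user, domain

def to_sql_datetime(date_header):
    """Convert an RFC 2822 Date: header to the 'YYYY-MM-DD HH:MM:SS' string
    inserted into MariaDB."""
    return parsedate_to_datetime(date_header).strftime('%Y-%m-%d %H:%M:%S')

print(split_address('Alice <alice@example.org>'))
print(to_sql_datetime('Tue, 10 Mar 2020 14:30:00 +0000'))
```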