macOS: Temporary files … huge! Thanks, com.apple.appstore.

I just ran an audit:

$ uname -v
Darwin Kernel Version 19.0.0: Thu Oct 17 16:17:15 PDT 2019; root:xnu-6153.41.3~29/RELEASE_X86_6

$ sudo du -sh /private/var/folders/gy/578qjv7j22j4pzty537lyjjc0000gn/C/com.apple.appstore/
124G	/private/var/folders/gy/578qjv7j22j4pzty537lyjjc0000gn/C/com.apple.appstore/

$ uptime 
 8:39  up 13 days, 17:50, 3 users, load averages: 13,08 18,79 20,84

$ ls -l /private/var/folders/
total 0
drwxr-xr-x@  3 root  wheel    96 26 sep  2018 f8
drwxr-xr-x   3 root  wheel    96 19 nov  2013 gy
drwxr-xr-x@  3 root  wheel    96 16 sep 12:56 gz
drwxr-xr-x@  3 root  wheel    96 26 sep  2018 v3
drwxr-xr-x@ 38 root  wheel  1216  2 jan  2019 zz

$ sudo du -sh /private/var/folders/*
 21M	/private/var/folders/f8
127G	/private/var/folders/gy
 17M	/private/var/folders/gz
  0B	/private/var/folders/v3
286M	/private/var/folders/zz

$ sudo ls -l /private/var/folders/gy/578qjv7j22j4pzty537lyjjc0000gn/
total 0
drwxr-xr-x    26 myuser  staff     832 15 oct 10:53 0
drwx------   257 myuser  staff    8224 29 nov 08:19 C
drwx------  3623 myuser  staff  115936 29 nov 08:44 T

That is 124 GB of temporary files … and the machine has only been up for 13 days. How do I clean this up? Normally a reboot should be enough, but apparently that does not work?!
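
To check whether the reboot really left old files behind, one option is to split the cache size by file age. The following is only a minimal sketch: the function name size_by_age is hypothetical, the 13-day cutoff is an approximation of the boot time taken from the uptime output, and it sums apparent file sizes (st_size), which can differ slightly from what du reports.

```python
import os
import time


def size_by_age(root, cutoff_ts):
    """Return (bytes_older, bytes_newer) for files under root, split at cutoff_ts."""
    older = newer = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime < cutoff_ts:
                older += st.st_size
            else:
                newer += st.st_size
    return older, newer


if __name__ == "__main__":
    # Path taken from the du output above; adjust it to your own machine.
    cache = "/private/var/folders/gy/578qjv7j22j4pzty537lyjjc0000gn/C/com.apple.appstore/"
    # Assumption: 13 days approximates the last boot, as reported by uptime.
    boot_ts = time.time() - 13 * 24 * 3600
    older, newer = size_by_age(cache, boot_ts)
    print(f"older than last boot: {older / 1e9:.1f} GB, newer: {newer / 1e9:.1f} GB")
```

If most of the bytes turn out to be older than the last boot, the cache is simply not purged at startup. The directory belongs to the user, but some entries may still need sudo to be read.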

macOS: ImageOptim: Test on my photo library.

Download site: https://imageoptim.com/command-line.html

A small example on my Mac:

$ du -sh Pictures/Bibliothèque\ Photos.photoslibrary/*
  0B	Pictures/Bibliothèque Photos.photoslibrary/Attachments
  0B	Pictures/Bibliothèque Photos.photoslibrary/Masks
 55G	Pictures/Bibliothèque Photos.photoslibrary/Masters
  0B	Pictures/Bibliothèque Photos.photoslibrary/Plugins
4,0K	Pictures/Bibliothèque Photos.photoslibrary/ProjectDBVersion.plist
3,7M	Pictures/Bibliothèque Photos.photoslibrary/Projects.db
805M	Pictures/Bibliothèque Photos.photoslibrary/database
4,0K	Pictures/Bibliothèque Photos.photoslibrary/iPhotoLock.data
3,5G	Pictures/Bibliothèque Photos.photoslibrary/private
 21G	Pictures/Bibliothèque Photos.photoslibrary/resources

So I have 55 GB of photos on my Mac in Masters …

Configuration: EXIF metadata is not stripped, and quality is set to 100%.

I ran the command:

/Applications/ImageOptim.app/Contents/MacOS/ImageOptim Pictures/Bibliothèque\ Photos.photoslibrary/Masters/

I then waited for the tool to finish compressing, and here is the result:

$ du -sh Pictures/Bibliothèque\ Photos.photoslibrary/*
  0B	Pictures/Bibliothèque Photos.photoslibrary/Attachments
  0B	Pictures/Bibliothèque Photos.photoslibrary/Masks
 40G	Pictures/Bibliothèque Photos.photoslibrary/Masters
  0B	Pictures/Bibliothèque Photos.photoslibrary/Plugins
4,0K	Pictures/Bibliothèque Photos.photoslibrary/ProjectDBVersion.plist
3,7M	Pictures/Bibliothèque Photos.photoslibrary/Projects.db
805M	Pictures/Bibliothèque Photos.photoslibrary/database
4,0K	Pictures/Bibliothèque Photos.photoslibrary/iPhotoLock.data
3,5G	Pictures/Bibliothèque Photos.photoslibrary/private
 21G	Pictures/Bibliothèque Photos.photoslibrary/resources

In total that is 15 GB saved (55 GB down to 40 GB), and that is on Masters alone.
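
As a quick check of the numbers, here is the arithmetic as a tiny sketch. The helper name savings is hypothetical; the two values are the Masters sizes from the du outputs before and after.

```python
def savings(before_gb, after_gb):
    """Return (saved_gb, saved_percent) for a before/after size pair."""
    saved = before_gb - after_gb
    return saved, 100.0 * saved / before_gb


saved, percent = savings(55, 40)  # Masters before and after, from the du output
print(f"{saved} GB saved ({percent:.0f}%)")  # prints "15 GB saved (27%)"
```

So roughly a quarter of the space used by Masters was recovered, with quality set to 100% and EXIF data kept.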

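macOS: A bit of cleanup …

I tried to find out where the disk space goes using du … (note that grep "G" also matches any path containing a capital G, which is why small entries such as Library/GameKit show up in the output)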

$ du -sh Library/* | grep "G"
 15G	Library/Application Support
2,2G	Library/Caches
 15G	Library/Containers
4,8G	Library/Developer
  0B	Library/GameKit
 28K	Library/Google
108M	Library/Group Containers
6,5G	Library/Mail
1,1G	Library/iTunes

$ du -sh Library/Containers/* | grep "G"
2,0G	Library/Containers/com.apple.BKAgentService
 36K	Library/Containers/com.apple.Grab
 24K	Library/Containers/com.apple.STMExtension.GarageBand
 32K	Library/Containers/com.apple.dt.GitHubEnterpriseHostBuiltInExtension
 28K	Library/Containers/com.apple.dt.GitHubHostBuiltInExtension
 28K	Library/Containers/com.apple.dt.GitLabHostBuiltInExtension
 28K	Library/Containers/com.apple.dt.GitLabSelfHostBuiltInExtension
2,8G	Library/Containers/com.apple.mail
4,5G	Library/Containers/com.dummyapp.FileRecorder
 36K	Library/Containers/com.google.GoogleDrive.FinderSyncAPIExtension
136K	Library/Containers/com.icvt.JPEGminiLite
2,5M	Library/Containers/com.jixipix.GrungetasticMac
608K	Library/Containers/com.macpaw.Gemini

$ du -sh Library/Containers/com.apple.mail/* | grep "G"
2,8G	Library/Containers/com.apple.mail/Data

$ du -sh Library/Containers/com.apple.mail/Data/* | grep "G"
2,8G	Library/Containers/com.apple.mail/Data/Library

$ du -sh Library/Containers/com.apple.mail/Data/Library/* | grep "G"
2,4G	Library/Containers/com.apple.mail/Data/Library/Mail Downloads

$ du -sh Library/Containers/com.apple.mail/Data/Library/Mail\ Downloads/ | grep "G"
2,4G	Library/Containers/com.apple.mail/Data/Library/Mail Downloads/

So I deleted all the files in Library/Containers/com.apple.mail/Data/Library/Mail Downloads/ and recovered 2.4 GB.
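
The same drill-down can be done by filtering on the measured size instead of on the text of each output line. This is a minimal sketch, not a du replacement: tree_size and big_children are hypothetical helper names, the 1 GB threshold is an arbitrary default, and sizes are apparent file sizes rather than disk blocks.

```python
import os


def tree_size(path):
    """Sum the sizes of all files below path (apparent size, in bytes)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # unreadable or vanished file
    return total


def big_children(parent, min_bytes=10**9):
    """Return [(size, name)] for direct children of parent of at least min_bytes."""
    result = []
    for entry in os.scandir(parent):
        if entry.is_dir(follow_symlinks=False):
            size = tree_size(entry.path)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        if size >= min_bytes:
            result.append((size, entry.name))
    return sorted(result, reverse=True)  # largest first


if __name__ == "__main__":
    library = os.path.expanduser("~/Library")
    if os.path.isdir(library):
        for size, name in big_children(library):
            print(f"{size / 1e9:6.1f}G  {name}")
```

Calling big_children again on the largest entry repeats the step-by-step descent shown in the transcripts above.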

Next:

$ du -sh Library/Containers/com.dummyapp.FileRecorder/Data/Library/Application\ Support/com.dummyapp.FileRecorder/Thumb/*
 87M	Library/Containers/com.dummyapp.FileRecorder/Data/Library/Application Support/com.dummyapp.FileRecorder/Thumb/6E393749C958543AB2718DEC924E84A3
192K	Library/Containers/com.dummyapp.FileRecorder/Data/Library/Application Support/com.dummyapp.FileRecorder/Thumb/A6AE8B9EA019A64D69AF7FAB46558618
263M	Library/Containers/com.dummyapp.FileRecorder/Data/Library/Application Support/com.dummyapp.FileRecorder/Thumb/B5B54A131779E1D2BBA3F4964CA89093
3,8G	Library/Containers/com.dummyapp.FileRecorder/Data/Library/Application Support/com.dummyapp.FileRecorder/Thumb/B80792E501352D7A363983A9C18896D7
171M	Library/Containers/com.dummyapp.FileRecorder/Data/Library/Application Support/com.dummyapp.FileRecorder/Thumb/E9A9CA3D23362CA6810F2030A6FD056E

$ ls -l Library/Containers/com.dummyapp.FileRecorder/Data/Library/Application\ Support/com.dummyapp.FileRecorder/Thumb/
total 0
drwxr-xr-x    439 User  staff    14048 17 mar  2013 6E393749C958543AB2718DEC924E84A3
drwxr-xr-x      3 User  staff       96 16 mar  2013 A6AE8B9EA019A64D69AF7FAB46558618
drwxr-xr-x   1391 User  staff    44512 16 mar  2013 B5B54A131779E1D2BBA3F4964CA89093
drwxr-xr-x  40690 User  staff  1302080 17 mar  2013 B80792E501352D7A363983A9C18896D7
drwxr-xr-x    896 User  staff    28672 17 mar  2013 E9A9CA3D23362CA6810F2030A6FD056E

Given the date (March 2013) and the space used … I deleted them! That recovered 4.3 GB.

Next:

$ du -sh Library/Containers/com.apple.BKAgentService/Data/Documents/iBooks/*
2,0G	Library/Containers/com.apple.BKAgentService/Data/Documents/iBooks/Books
  0B	Library/Containers/com.apple.BKAgentService/Data/Documents/iBooks/Downloads
  0B	Library/Containers/com.apple.BKAgentService/Data/Documents/iBooks/Temporary
  0B	Library/Containers/com.apple.BKAgentService/Data/Documents/iBooks/Updates

I left this alone because it is my book library.

Moving on, but nothing worth deleting here:

$ du -sh Library/Application\ Support/* | grep "G"
3,6G	Library/Application Support/Bento
1,8M	Library/Application Support/GarageBand
7,3G	Library/Application Support/Garmin
2,2M	Library/Application Support/Gimp
6,5M	Library/Application Support/GitHub Desktop
692K	Library/Application Support/GoPro
374M	Library/Application Support/Google
 72K	Library/Application Support/Google Earth
8,9M	Library/Application Support/Growl
 12K	Library/Application Support/JGoodies
  0B	Library/Application Support/com.GoPro.goproapp
 52K	Library/Application Support/com.GoPro.goproapp.GoProAlertService
160K	Library/Application Support/com.GoPro.goproapp.GoProAnalyticsService
 40K	Library/Application Support/com.GoPro.goproapp.GoProDeviceService
  0B	Library/Application Support/com.GoPro.goproapp.GoProExporterService
8,0K	Library/Application Support/com.GoPro.goproapp.GoProIDService
4,0K	Library/Application Support/com.GoPro.goproapp.GoProMediaFolderService
135M	Library/Application Support/com.GoPro.goproapp.GoProMediaService
  0B	Library/Application Support/com.GoPro.goproapp.GoProMsgBus
849M	Library/Application Support/com.GoPro.goproapp.GoProMusicService
 12K	Library/Application Support/com.GoPro.goproapp.GoProPushNotificationService
  0B	Library/Application Support/com.GoPro.goproapp.GoProShareService
242M	Library/Application Support/com.GoPro.goproapp.GoProUpdateService
  72M	Library/Application Support/com.gopro.GoPro-Studio

$ du -sh Library/Application\ Support/Garmin/*
 57M	Library/Application Support/Garmin/BaseCamp
  0B	Library/Application Support/Garmin/Bookmarks
105M	Library/Application Support/Garmin/Devices
7,2G	Library/Application Support/Garmin/Express
  0B	Library/Application Support/Garmin/Garmin ANT Agent
  0B	Library/Application Support/Garmin/Garmin WebUpdater
 64K	Library/Application Support/Garmin/GarminConnect
4,0K	Library/Application Support/Garmin/InstallationClient-Id
  0B	Library/Application Support/Garmin/Symboles de waypoint personnalisés
 19M	Library/Application Support/Garmin/Training Center
4,0K	Library/Application Support/Garmin/VIRB Edit

$ du -sh Library/Application\ Support/Garmin/Express/*
4,0K	Library/Application Support/Garmin/Express/AccountDictionaryDatastore.plist
 51M	Library/Application Support/Garmin/Express/AppUpdates
  0B	Library/Application Support/Garmin/Express/Events
102M	Library/Application Support/Garmin/Express/Firmware
4,0K	Library/Application Support/Garmin/Express/IgnoredDevices.plist
208K	Library/Application Support/Garmin/Express/LanguagePacks
648K	Library/Application Support/Garmin/Express/Logs
6,6G	Library/Application Support/Garmin/Express/Maps
436M	Library/Application Support/Garmin/Express/MediaCache
336K	Library/Application Support/Garmin/Express/RegisteredDevices

$ du -sh Library/Application\ Support/Garmin/Express/Maps/*
6,6G	Library/Application Support/Garmin/Express/Maps/ActiveEU.2019.20

Conclusion:

On a Mac it is very easy to free up 10 GB of useless files …

macOS: Python: Removing duplicate emails with the Python Elasticsearch/Kibana API (Version V3)

In the end, I suspect there are duplicates among my 200,000 emails … so I will take advantage of the export to Elasticsearch/Kibana to check. An email that has the same size and the same MD5 checksum as an earlier one is considered a duplicate.
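
The duplicate rule itself (same size and same MD5) can be sketched independently of Elasticsearch. This is only an illustration under assumptions: md5_of and find_duplicates are hypothetical names, the first file seen for a given key is treated as the original, and nothing is deleted. A dict keyed on (size, md5) keeps each lookup constant-time, which matters with 200,000 files.

```python
import hashlib
import os


def md5_of(path, chunk_size=4096):
    """MD5 hex digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return (duplicates, reclaimable_bytes); the first file seen per key is kept."""
    seen = {}          # (size, md5) -> first path with that key
    duplicates = []
    reclaimable = 0
    for path in paths:
        size = os.path.getsize(path)
        key = (size, md5_of(path))
        if key in seen:
            duplicates.append(path)
            reclaimable += size
        else:
            seen[key] = path
    return duplicates, reclaimable
```

The returned list is what a later version could pass to os.unlink, once the result has been checked by hand.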

Here is version V3 (file deletion, os.unlink(path), is left commented out):

#!/usr/bin/env python3

import email
import plistlib
import hashlib
import re
import glob, os
import string
from datetime import datetime
from email.utils import parsedate_to_datetime
from email.header import Header, decode_header, make_header
from elasticsearch import Elasticsearch 

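class Emlx(object):
    """Reader for Apple Mail .emlx files.

    Layout: a first line holding the message length in bytes, then the
    raw message itself, then a property list with Apple Mail metadata.
    """
    def __init__(self):
        super(Emlx, self).__init__()
        self.bytecount = 0
        self.msg_data = None
        self.msg_plist = None

    def parse(self, filename_path):
        with open(filename_path, "rb") as f:
            # First line: size in bytes of the message that follows.
            self.bytecount = int(f.readline().strip())
            self.msg_data = email.message_from_bytes(f.read(self.bytecount))
            # Whatever remains after the message is the plist.
            self.msg_plist = plistlib.loads(f.read())
        return self.msg_data, self.msg_plist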

def md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

if __name__ == '__main__':
   msg = Emlx()
   nb_parse = 0
   nb_error = 0
   save_space = 0
   list_email = []
   printable = set(string.printable)
   path_mail = "/Users/MonLogin/Library/Mail/V6/"
   es_keys = "mail"
   es=Elasticsearch([{'host':'localhost','port':9200}])
   for root, dirs, files in os.walk(path_mail):
      for file in files:
          if file.endswith(".emlx"):
             file_full = os.path.join(root, file)
             # A checksum already seen means this email is a duplicate.
             my_check = md5(file_full)
             my_count = list_email.count(my_check)
             list_email.append(my_check)
             message, plist = msg.parse(file_full)
             statinfo = os.stat(file_full)
             if my_count > 0:
                # Space that deleting this duplicate would free.
                save_space += int(statinfo.st_size)
                #os.unlink(file_full)
             my_date = message['Date']
             my_id = message['Message-ID']
             my_server = message['Received']
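             my_date_str = ""
             # The email module may return a Header object instead of a str
             # (e.g. for raw non-ASCII bytes); only parse plain strings.
             if my_date is not None and not isinstance(my_date, Header):
                 try:
                   my_date_str = datetime.fromtimestamp(parsedate_to_datetime(my_date).timestamp()).strftime('%Y-%m-%dT%H:%M:%S')
                 except Exception:
                   my_date_str = ""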
             # Decode the From header (it may be RFC 2047 encoded) to plain text.
             my_email = str(make_header(decode_header(str(message['From']))))
             # Reset per message so values from a previous email never leak through.
             my_name_str = ""
             my_domain_str = ""
             my_domain = re.search(r"@[\w.\-]+", my_email)
             if my_domain is not None:
                 my_domain_str = my_domain.group().lower()
             my_name = re.search(r"[\w.\-]+@", my_email)
             if my_name is not None:
                 my_name_str = my_name.group().lower()
             # Build the document as a dict; the client serializes it to JSON,
             # so quotes or backslashes in a header cannot break the payload.
             doc = {"checksum": my_check, "count": str(my_count), "size": statinfo.st_size}
             if my_domain is not None:
                 doc["name"] = my_name_str
                 doc["domain"] = my_domain_str
             else:
                 # No address found: keep a cleaned-up, ASCII-only From value.
                 my_email = my_email.replace(",", "").replace('"', '')
                 my_email = re.sub(r'[^\x00-\x7f]', r'', my_email).lower()
                 doc["name"] = my_email
                 doc["domain"] = "None"
             if my_date is not None and len(my_date_str) > 1:
                 doc["date"] = my_date_str
             doc["id"] = nb_parse
             if my_server is not None:
                 # Keep the first IPv4-looking token of the first Received header.
                 ip = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', str(my_server))
                 if ip is not None:
                     doc["ip"] = ip.group()
             if isinstance(my_id, str):
                 doc["Message-ID"] = my_id.strip()
             doc["file"] = file
             print(doc)
             try:
                 res = es.index(index=es_keys,doc_type='emlx',id=nb_parse,body=doc)
             except Exception:
                 nb_error += 1
             nb_parse += 1
             #print(plist)
   print(nb_parse)
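
To make the .emlx layout that Emlx.parse relies on easier to see, here is a minimal in-memory sketch. The function parse_emlx_bytes and the sample content (address, subject, the "flags" key) are made up for illustration; real files come from the Mail folder and carry more metadata.

```python
import email
import plistlib


def parse_emlx_bytes(raw):
    """Split raw .emlx content into (email message, plist dict)."""
    # First line: length in bytes of the message that follows.
    header, _, rest = raw.partition(b"\n")
    bytecount = int(header.strip())
    message = email.message_from_bytes(rest[:bytecount])
    # Everything after the message is the property list.
    plist = plistlib.loads(rest[bytecount:])
    return message, plist


# Hypothetical sample built in memory instead of reading a real mailbox.
body = b"From: Alice <alice@example.org>\nSubject: hello\n\nHi!\n"
sample = str(len(body)).encode() + b"\n" + body + plistlib.dumps({"flags": 0})
message, plist = parse_emlx_bytes(sample)
print(message["From"], plist)
```

Building the sample by hand also shows why the byte count matters: without it there is no way to tell where the message ends and the plist begins.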

To be continued in V4!