Forensics is the study of evidence to help the police solve crimes. Applied to computer science, the crimes in question are mainly network attacks, many of which arrive by email, now the most popular means of communication over the Internet. Our inboxes receive large batches of emails without our being aware of them. It is therefore necessary to build an automatic checking system that separates good emails from bad ones. In this paper, we propose a new email processing approach that uses the Singular Value Decomposition (SVD) to optimize email data before applying a data mining technique (clustering) to extract bad emails stored on the mail servers that host the users' inboxes. Our study filters emails (good and bad) by clustering the optimized data and compares the result with clustering of the unoptimized data.
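The pipeline summarized in the abstract (reduce the email data with SVD, then cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy emails, the raw term-count weighting, the rank `k = 2`, the farthest-point initialization, and every function name below are assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def term_document_matrix(emails):
    """Build a raw term-count matrix (rows = emails, columns = terms)."""
    vocab = sorted({w for e in emails for w in e.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(emails), len(vocab)))
    for i, e in enumerate(emails):
        for w in e.lower().split():
            X[i, index[w]] += 1.0
    return X, vocab

def reduce_with_svd(X, k):
    """Project each email onto the top-k singular directions (LSI-style)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical toy corpus: three spam-like and three work-like messages.
emails = [
    "win free money now claim your free prize",
    "free prize claim money now win cash",
    "cheap money offer win free cash now",
    "meeting agenda for project review tomorrow",
    "project review meeting notes and agenda",
    "please send the project report before the meeting",
]

X, vocab = term_document_matrix(emails)
reduced = reduce_with_svd(X, k=2)      # "optimized" data: 2 columns instead of len(vocab)
labels = kmeans(reduced, k=2)
print(X.shape, "->", reduced.shape)
print(labels)
```

Running `kmeans(X, k=2)` on the unreduced matrix gives the baseline for the comparison the abstract describes. On real mail, a weighting scheme such as TF-IDF and a larger rank would likely be needed.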
DOI : 10.1051/ro/2015057
Keywords: Email, forensics, spam, SVD, LSI, optimization, data mining, clustering
@article{RO_2016__50_4-5_951_0,
  author = {Salhi, Dhai Eddine and Tari, Abdelkamel and Kechadi, M-Tahar},
  title = {Clustering of optimized data for email forensics},
  journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
  pages = {951--963},
  publisher = {EDP-Sciences},
  volume = {50},
  number = {4-5},
  year = {2016},
  doi = {10.1051/ro/2015057},
  mrnumber = {3570541},
  language = {en},
  url = {http://www.numdam.org/articles/10.1051/ro/2015057/}
}
Salhi, Dhai Eddine; Tari, Abdelkamel; Kechadi, M-Tahar. Clustering of optimized data for email forensics. RAIRO - Operations Research - Recherche Opérationnelle, Special issue - Advanced Optimization Approaches and Modern OR-Applications, Volume 50 (2016) no. 4-5, pp. 951-963. doi: 10.1051/ro/2015057. http://www.numdam.org/articles/10.1051/ro/2015057/