Record Details

Weight Based Deduplication for Minimizing Data Replication in Public Cloud Storage

NOPR - NISCAIR Online Periodicals Repository

 
 
Field Value
 
Title Weight Based Deduplication for Minimizing Data Replication in Public Cloud Storage
 
Creator Pugazhendi, E
Sumalatha, M R
Harika, Lakshmi P
 
Subject Cloud Computing
Document Frequency
Document Retrieval
Document Weight
Dropbox Cloud
 
Description 260-269
Optimizing data replication in public cloud storage across multiple instances is a challenging problem when processing text data. The amount of digital data is increasing exponentially, so storage space must be reduced by storing data efficiently. In a cloud storage environment, data replication provides high availability and fault tolerance. An effective deduplication system using a weight based method is proposed at the target level to reduce unwanted storage space in the cloud. Storage space can be used efficiently by removing unpopular files from the secondary servers. Target level deduplication consumes less processing power than source level deduplication. Multiple input text documents are stored in the Dropbox cloud. The top text features are detected using Term Frequency (TF) and Named Entity Recognition (NER) and stored in a text database. Fresh text documents are then collected to identify the popular and unpopular files and so optimize the existing text corpus in cloud storage. The top text features of the freshly collected documents are detected using TF and NER; after duplicate features are removed, the remaining unique features are compared with the features stored in the database. Based on this comparison, the relevant text documents are listed. For the listed documents, the document frequency, document weight and threshold factor are computed. Depending on the average threshold value, files are classified as popular or unpopular. Popular files are retained on all storage nodes to achieve full availability of data, and unpopular files are removed from all secondary servers, remaining only on the primary server. Before deduplication, the storage space occupied in the Dropbox cloud is 8.09 MB. After deduplication, the unpopular files are removed from the secondary storage nodes and the storage space in the Dropbox cloud is reduced to 4.82 MB. Overall, data replication is minimized and 45.60% of the cloud storage space is saved by applying the weight based deduplication system.
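
The weighting step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the formulas, so the definitions used here are assumptions: document frequency is taken as the number of fresh documents that share at least one top feature with a stored file, document weight as that count divided by the number of fresh documents, and the threshold as the mean weight. The function names `top_terms` and `classify_files` are hypothetical, and plain term counts stand in for the TF and NER feature extraction.

```python
import re
from collections import Counter


def top_terms(text, k=5):
    """Return the k most frequent word tokens of a text.

    Assumption: simple term counts stand in for the TF + NER features.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return {term for term, _ in Counter(tokens).most_common(k)}


def classify_files(stored, fresh, k=5):
    """Split stored files into (popular, unpopular) sets of file names.

    stored: dict mapping file name -> text already in cloud storage
    fresh:  list of newly collected document texts
    """
    if not stored:
        return set(), set()

    stored_features = {name: top_terms(text, k) for name, text in stored.items()}
    fresh_features = [top_terms(text, k) for text in fresh]

    # Assumed document frequency: fresh documents sharing any top feature.
    doc_freq = {
        name: sum(1 for feats in fresh_features if feats & own)
        for name, own in stored_features.items()
    }

    # Assumed document weight: frequency normalised by the fresh corpus size.
    n_fresh = max(len(fresh_features), 1)
    weight = {name: df / n_fresh for name, df in doc_freq.items()}

    # Assumed threshold: the average weight over all stored files.
    threshold = sum(weight.values()) / len(weight)

    popular = {name for name, w in weight.items() if w >= threshold}
    unpopular = set(weight) - popular
    return popular, unpopular


if __name__ == "__main__":
    stored = {
        "a.txt": "cloud cloud storage storage dedup",
        "b.txt": "football football match match goal",
    }
    fresh = ["cloud storage cloud", "storage cloud backup", "match report"]
    popular, unpopular = classify_files(stored, fresh, k=3)
    print("keep on all nodes:", sorted(popular))
    print("keep on primary only:", sorted(unpopular))
```

Under these assumptions, files in the popular set would stay replicated on every storage node, while files in the unpopular set would be deleted from the secondary servers and kept only on the primary server.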
 
Date 2021-03-11T06:11:27Z
2021-03-11T06:11:27Z
2021-03
 
Type Article
 
Identifier 0975-1084 (Online); 0022-4456 (Print)
http://nopr.niscair.res.in/handle/123456789/56470
 
Language en_US
 
Rights CC Attribution-Noncommercial-No Derivative Works 2.5 India
 
Publisher NISCAIR-CSIR, India
 
Source JSIR Vol.80(03) [March 2021]