Record Details

Weight Based Deduplication for Minimizing Data Replication in Public Cloud Storage

NOPR - NISCAIR Online Periodicals Repository


Field	Value

Title	Weight Based Deduplication for Minimizing Data Replication in Public Cloud Storage

Creator	Pugazhendi, E; Sumalatha, M R; Harika, Lakshmi P

Subject	Cloud Computing; Document Frequency; Document Retrieval; Document Weight; Dropbox Cloud

Description	260-269 Optimizing data replication in public cloud storage across multiple instances is a challenging problem when processing text data. The amount of digital data has been increasing exponentially, so storage space must be reduced by storing data efficiently. In a cloud storage environment, data replication provides high availability and fault tolerance. A weight-based deduplication approach is proposed at the target level to reduce unwanted storage space in the cloud. Storage space can be used efficiently by removing unpopular files from the secondary servers. Target-level deduplication consumes less processing power than source-level deduplication. Multiple input text documents are stored in the Dropbox cloud. The top text features are detected using Term Frequency (TF) and Named Entity Recognition (NER) and stored in a text database. After the top features are stored, fresh text documents are collected to find the popular and unpopular files in order to optimize the existing text corpus in cloud storage. The top text features of the freshly collected documents are detected using TF and NER, and the unique features that remain after duplicate features are removed are compared with the existing features stored in the database. From the comparison, the relevant text documents are listed. For the listed documents, the document frequency, document weight and threshold factor are computed. Depending on the average threshold value, the popular and unpopular files are identified. The popular files are retained on all storage nodes to achieve full availability of data, and the unpopular files are removed from all secondary servers and kept only on the primary server. Before deduplication, the storage space occupied in the Dropbox cloud is 8.09 MB. After deduplication, the unpopular files are removed from the secondary storage nodes and the storage space in the Dropbox cloud is reduced to 4.82 MB. Finally, data replication is minimized and 45.60% of the cloud storage space is saved by applying the weight-based deduplication system.

Date	2021-03-11T06:11:27Z; 2021-03-11T06:11:27Z; 2021-03

Type	Article

Identifier	0975-1084 (Online); 0022-4456 (Print); http://nopr.niscair.res.in/handle/123456789/56470

Language	en_US

Rights	CC Attribution-Noncommercial-No Derivative Works 2.5 India

Publisher	NISCAIR-CSIR, India

Source	JSIR Vol.80(03) [March 2021]
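
The pipeline outlined in the Description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record gives no formulas, so the definitions of document frequency (number of fresh documents sharing at least one top feature with a stored document), document weight (that count divided by the number of fresh documents) and threshold (mean weight) are assumptions made here for illustration. NER is omitted and only term frequency is used; the function names, stopword list and node names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the record).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "on", "for"}


def top_features(text, k=5):
    """Return the k most frequent non-stopword terms (TF only; NER omitted)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return {term for term, _ in Counter(words).most_common(k)}


def classify(stored, fresh, k=5):
    """Split stored documents into (popular, unpopular) sets of names.

    stored: dict mapping document name -> text already in cloud storage.
    fresh:  list of newly collected document texts.
    """
    if not stored or not fresh:
        raise ValueError("both stored and fresh documents are required")
    stored_feats = {name: top_features(text, k) for name, text in stored.items()}
    fresh_feats = [top_features(text, k) for text in fresh]
    # Assumed document frequency: fresh documents sharing >= 1 top feature.
    doc_freq = {
        name: sum(1 for f in fresh_feats if feats & f)
        for name, feats in stored_feats.items()
    }
    # Assumed document weight: document frequency normalized by corpus size.
    weights = {name: df / len(fresh_feats) for name, df in doc_freq.items()}
    # Assumed threshold: the average weight over all stored documents.
    threshold = sum(weights.values()) / len(weights)
    popular = {name for name, w in weights.items() if w >= threshold}
    return popular, set(weights) - popular


def placement(stored, popular, nodes, primary):
    """Keep popular files on every node and unpopular files on the primary only."""
    return {name: list(nodes) if name in popular else [primary] for name in stored}


stored = {
    "a.txt": "cloud storage cloud replication",
    "b.txt": "wheat harvest wheat yield",
}
fresh = ["cloud storage costs", "cloud replication policy", "rainfall"]
popular, unpopular = classify(stored, fresh)
plan = placement(stored, popular, ["primary", "s1", "s2"], "primary")
```

In this toy run, `a.txt` shares features with two of the three fresh documents and stays replicated on all nodes, while `b.txt` matches none and is kept on the primary node only.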

ICAR Research Data Repository for Knowledge Management
