Abstract:
The unpredictable growth of data necessitates effective data management to make
optimal use of storage capacity. This research study proposes an innovative
strategy for data deduplication. A fixed-size deduplication algorithm splits the
data into blocks of a predefined size. The main drawback of this approach is that
if additional content is inserted at the beginning or middle of a file, all
subsequent sections shift from their original positions. As a result, the
regenerated chunks produce new hash values, lowering the deduplication ratio.
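As a concrete illustration of this boundary-shift problem, the following minimal
Python sketch (an illustrative example, not the paper's implementation; the chunk
size and sample data are arbitrary) splits data into fixed-size blocks and hashes
each block:

import hashlib

def fixed_size_chunks(data: bytes, chunk_size: int = 8):
    # Split the data into consecutive blocks of a predefined size.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_hashes(chunks):
    # Hash each chunk; duplicates are detected by identical digests.
    return [hashlib.sha1(c).hexdigest()[:8] for c in chunks]

original = b"aaaabbbbccccddddeeee"
edited = b"XX" + original  # insert two bytes at the front of the file

# Every boundary after the insertion point shifts, so every chunk
# (and therefore its hash) changes although the content is reused.
print(chunk_hashes(fixed_size_chunks(original)))
print(chunk_hashes(fixed_size_chunks(edited)))

Because the block boundaries are positional, none of the edited file's chunks
match the original's, and no deduplication occurs despite the overlap.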
To overcome this drawback, this study proposes multiple characters as
content-defined chunking breakpoints; the resulting chunk boundaries depend on the
file's internal representation and yield variable chunk sizes. Experimental results
show a significant improvement in the redundancy removal ratio on the Linux dataset.
A comparison between the fixed-size approach and the proposed dynamic deduplication
shows that double-character chunking produces a smaller average chunk size and
achieves a much higher deduplication ratio.
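To make the proposed idea concrete, the sketch below (a minimal illustration under
assumed parameters; the breakpoint pair b". ", the size limits, and the sample data
are hypothetical choices, not the paper's configuration) cuts a chunk whenever a
two-character breakpoint appears in the content:

import hashlib

def double_char_chunks(data: bytes, breakpoint_pair: bytes = b". ",
                       min_size: int = 4, max_size: int = 64):
    # Cut a chunk whenever the two-character breakpoint occurs in the
    # content (subject to the size limits), so boundaries follow the
    # data itself and survive insertions elsewhere in the file.
    chunks, start, i = [], 0, 0
    while i < len(data) - 1:
        size = i + 2 - start
        at_break = data[i:i + 2] == breakpoint_pair and size >= min_size
        if at_break or size >= max_size:
            chunks.append(data[start:i + 2])
            start = i + 2
            i = start
        else:
            i += 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk
    return chunks

text = b"one. two. three. four. five. six."
edited = b"zero. " + text  # insertion at the front of the file

# Only the newly inserted chunk differs; the boundaries of the
# remaining chunks follow the content, so their hashes still match.
for d in (text, edited):
    print([hashlib.sha1(c).hexdigest()[:8] for c in double_char_chunks(d)])

Because the breakpoints are content-defined rather than positional, the chunks
after the insertion point keep their hash values and still deduplicate.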