Abstract:
The unpredictable growth of data necessitates effective data management to make
optimal use of storage capacity. This research study proposes an innovative
strategy for data deduplication. A fixed-size deduplication algorithm splits the
data into blocks of a predefined size. The main drawback of this approach is that
if additional content is inserted at the beginning or middle of a file, all
subsequent sections shift from their original positions. As a result, the
regenerated chunks produce new hash values, lowering the deduplication ratio.
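As a concrete illustration of this boundary-shift problem, the following minimal
Python sketch (an illustrative example, not the paper's implementation; the chunk
size and sample data are arbitrary) splits data into fixed-size blocks and hashes
each block:

import hashlib

def fixed_size_chunks(data: bytes, chunk_size: int = 8):
    # Split the data into consecutive blocks of a predefined size.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_hashes(chunks):
    # Hash each chunk; duplicates are detected by identical digests.
    return [hashlib.sha1(c).hexdigest()[:8] for c in chunks]

original = b"aaaabbbbccccddddeeee"
edited = b"XX" + original  # insert two bytes at the front of the file

# Every boundary after the insertion point shifts, so every chunk
# (and therefore its hash) changes although the content is reused.
print(chunk_hashes(fixed_size_chunks(original)))
print(chunk_hashes(fixed_size_chunks(edited)))

Because the block boundaries are positional, none of the edited file's chunks
match the original's, and no deduplication occurs despite the overlap.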
To overcome this drawback, this study proposes multiple characters as
content-defined chunking breakpoints; the resulting chunk boundaries depend on the
file's internal representation and yield variable chunk sizes. Experimental results
show a significant improvement in the redundancy removal ratio on the Linux dataset.
A comparison between the fixed-size approach and the proposed dynamic deduplication
shows that double-character chunking produces a smaller average chunk size and
achieves a much higher deduplication ratio.
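To make the proposed idea concrete, the sketch below (a minimal illustration under
assumed parameters; the breakpoint pair b". ", the size limits, and the sample data
are hypothetical choices, not the paper's configuration) cuts a chunk whenever a
two-character breakpoint appears in the content:

import hashlib

def double_char_chunks(data: bytes, breakpoint_pair: bytes = b". ",
                       min_size: int = 4, max_size: int = 64):
    # Cut a chunk whenever the two-character breakpoint occurs in the
    # content (subject to the size limits), so boundaries follow the
    # data itself and survive insertions elsewhere in the file.
    chunks, start, i = [], 0, 0
    while i < len(data) - 1:
        size = i + 2 - start
        at_break = data[i:i + 2] == breakpoint_pair and size >= min_size
        if at_break or size >= max_size:
            chunks.append(data[start:i + 2])
            start = i + 2
            i = start
        else:
            i += 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk
    return chunks

text = b"one. two. three. four. five. six."
edited = b"zero. " + text  # insertion at the front of the file

# Only the newly inserted chunk differs; the boundaries of the
# remaining chunks follow the content, so their hashes still match.
for d in (text, edited):
    print([hashlib.sha1(c).hexdigest()[:8] for c in double_char_chunks(d)])

Because the breakpoints are content-defined rather than positional, the chunks
after the insertion point keep their hash values and still deduplicate.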