Abstract:
Text steganography is the art of hiding a secret message in a text. Conversely, text steganalysis is the art of detecting a hidden message in a text. In this work, we studied the detectability of a Markov chain (MC) based statistical text steganography technique. We started by analyzing Arabic texts of different types: economy, sports, and international news. Then, the MC-based encoder was used to hide Arabic messages of various lengths. Subsequently, we extracted specific features from the stego-texts and the natural texts and fed them to a support vector machine (SVM) classifier. We found that detectability depends on the cover text type, the length of the concealed message, the embedding rate, and the extracted features. In particular, the lower the embedding rate and the smaller the text size, the less accurate the classification. The accuracy of the SVM classifier was below 67% for 1 KB stego-texts generated from Arabic economy or sports cover texts at an embedding rate of 4 bits per word (bpw). In addition, more than 62% of the 1 KB stego-texts were classified as natural texts when only the word distribution feature was considered.
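To make the notion of an embedding rate in bits per word concrete, the following is a minimal sketch of one common way an MC-based encoder can work. It is an illustrative assumption, not the encoder evaluated in this work: a first-order word chain is built from a cover corpus, and each group of `bpw` secret bits selects the next word among the current word's ranked successors. The function names (`build_chain`, `embed`, `extract`), the ranking rule, and the toy English corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def build_chain(words):
    """Map each word to its successors, most frequent first (ties alphabetical)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return {w: sorted(c, key=lambda s: (-c[s], s)) for w, c in counts.items()}


def embed(chain, start, bits, bpw):
    """Generate stego words: every `bpw` bits pick one successor of the current word."""
    text, state, pos = [start], start, 0
    while pos < len(bits):
        successors = chain.get(state, [])
        if len(successors) < 2 ** bpw:
            raise ValueError(f"{state!r} has fewer than {2 ** bpw} successors")
        # Zero-pad the final chunk if the message length is not a multiple of bpw.
        chunk = bits[pos:pos + bpw].ljust(bpw, "0")
        state = successors[int(chunk, 2)]
        text.append(state)
        pos += bpw
    return text


def extract(chain, stego_words, bpw):
    """Recover the bits from the rank of each word among its predecessor's successors."""
    return "".join(
        format(chain[prev].index(nxt), f"0{bpw}b")
        for prev, nxt in zip(stego_words, stego_words[1:])
    )


# Toy cover corpus (assumption): every word has exactly two possible successors,
# so the chain supports an embedding rate of 1 bpw.
cover = "oil prices rise oil rise prices oil".split()
chain = build_chain(cover)
stego = embed(chain, "oil", "1011", bpw=1)
recovered = extract(chain, stego, bpw=1)
```

In this sketch a higher `bpw` forces the encoder to choose among more successors, including less likely ones, which is one intuition for why a higher embedding rate would leave stronger statistical traces for a classifier to pick up.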