Abstract:
Accurate identification of audio source recording devices is paramount in digital forensic
investigations, with applications in copyright protection, tamper detection, and audio source forensics.
This work presents a novel method for learning feature representations from temporal audio
characteristics, namely Mel Frequency Cepstral Coefficients (MFCC) and the Constant-Q Transform
(CQT), obtained from segmented acoustic features. A structured representation-learning model is then
built by combining Long Short-Term Memory (LSTM) networks with Recurrent Neural Networks
(RNN). By coupling temporal modelling with time-frequency representations, this model efficiently
condenses spatial information, resulting in accurate recognition. The performance of
the proposed method is evaluated on 10-second audio signals recorded with four different audio
recording devices. The experimental results show an accuracy of 96% in classifying the four types of
source recording devices. The performance of the CQT-RNN and MFCC-RNN models is compared,
both against each other and against state-of-the-art methods. A user interface has also been developed
to facilitate identification of the source device of a test audio signal using the proposed method.
Overall, this research marks a substantial advancement in audio forensic analysis, providing a robust,
accurate, and user-friendly solution for the identification of audio source recording devices, and
underscoring its potential for widespread forensic applications.
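The MFCC front end mentioned above can be sketched as follows. This is a minimal NumPy-only illustration of the standard MFCC pipeline (framing, power spectrum, mel filterbank, log, DCT), not the authors' implementation; the sample rate, FFT size, hop length, and coefficient counts are assumed, since the abstract does not specify them.

```python
import numpy as np

def mfcc_features(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Sketch of MFCC extraction: frame -> power FFT -> mel filterbank -> log -> DCT."""
    # Frame the signal with a Hann window (no padding; trailing samples dropped)
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel filterbank spanning 0 .. sr/2
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T  # shape: (n_frames, n_mfcc)

# Example: a 10-second test tone at 16 kHz, matching the segment length used in the paper
sig = np.sin(2 * np.pi * 440 * np.arange(16000 * 10) / 16000)
feats = mfcc_features(sig)
print(feats.shape)  # (624, 13): one 13-coefficient vector per frame
```

Each 10-second segment thus becomes a sequence of frame-level feature vectors, which is the kind of temporal input an LSTM-based classifier consumes.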