To detect data hidden with our zeros hidden method, we first analyzed the histogram of the DCT coefficients for three kinds of image: an uncompressed image, a compressed image without data, and a compressed image with hidden data. The histogram shows how many times each coefficient value appears within the DCT matrix. For an uncompressed image (Figure 1), the histogram is a smooth curve. For a compressed image (Figure 2), coefficients below the threshold are dropped, so the counts for those values fall to zero. The histogram of a compressed image with hidden data (Figure 3) has a shape similar to that of the uncompressed image, but the counts are much lower. This makes sense, because the coefficients that were originally going to be dropped are replaced with data, and it is statistically unlikely that a dropped coefficient is replaced with the same value it had before.
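The histogram comparison can be illustrated with a minimal sketch. It assumes the DCT coefficients are already available as a matrix of integers, and that "dropping" means zeroing every coefficient whose magnitude is at or below the threshold; the names `coefficient_histogram` and `drop_small` and the sample block are hypothetical, not taken from our program.

```python
from collections import Counter


def coefficient_histogram(coeffs):
    """Count how often each integer coefficient value occurs in a 2-D matrix."""
    return Counter(value for row in coeffs for value in row)


def drop_small(coeffs, threshold):
    """Zero every coefficient whose magnitude is at or below `threshold`.

    Assumption: this stands in for the compression step described above.
    """
    return [[0 if abs(v) <= threshold else v for v in row] for row in coeffs]


# Hypothetical 4x4 block of integer DCT coefficients, for illustration only.
block = [
    [52, -3, 1, 0],
    [-4, 2, -1, 1],
    [1, -1, 0, 0],
    [2, 0, 1, 0],
]

before = coefficient_histogram(block)
after = coefficient_histogram(drop_small(block, threshold=1))

# The counts for +1 and -1 fall to zero after dropping; the zero count grows.
for value in (-1, 0, 1):
    print(value, before[value], after[value])
```

A `Counter` is used here because it returns zero for values that never occur, which matches how an empty histogram bin behaves.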
After analyzing the histograms of the different types of images, we analyzed the l2 norm in the DCT matrix. If there is no power in the one-valued DCT coefficients, the image is a compressed image without hidden data, because one is the minimum value that can be dropped. If there is power in the ones, then the image is either uncompressed or contains hidden data. The key difference between these two cases is the magnitude of the power in the ones. Statistically, it is unlikely that every dropped coefficient gets replaced with a one, so the power in the ones is lower for an image with hidden data than for an uncompressed image. On average, an image with hidden data falls below a certain threshold, which depends on the image size. Figure 4 plots the power without data, the power with data, and the threshold; the power without data is clearly greater than the power with data. Our detection program had a 90% success rate, but it produced a false positive 12% of the time.
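The decision rule described above can be sketched as follows. This is an illustration under stated assumptions rather than our actual detection program: "power in the ones" is taken to be the squared l2 norm of the coefficients with magnitude exactly 1, and the size-dependent threshold is passed in by the caller because its formula is not given here. The names `power_in_ones` and `classify` are hypothetical.

```python
def power_in_ones(coeffs):
    """Squared l2 norm of the coefficients whose magnitude is exactly 1.

    Assumption: both +1 and -1 count as "one-valued" coefficients.
    """
    return sum(v * v for row in coeffs for v in row if abs(v) == 1)


def classify(coeffs, threshold):
    """Apply the three-way rule: no power, low power, or high power in the ones.

    `threshold` is a caller-supplied value; it depends on the image size.
    """
    power = power_in_ones(coeffs)
    if power == 0:
        return "compressed, no hidden data"
    if power < threshold:
        return "hidden data suspected"
    return "uncompressed"


# Tiny hypothetical matrices, for illustration only.
print(classify([[0, 2], [3, 0]], threshold=2))
print(classify([[1, 0], [0, 0]], threshold=2))
print(classify([[1, -1], [1, 0]], threshold=2))
```

Because every counted coefficient has magnitude 1, the power computed this way equals the number of one-valued coefficients, which is why the threshold has to scale with image size.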