The secret life of JPEG images

When a picture is taken with a camera, the camera's raw data undergoes several processing steps before the final image is obtained. In particular, the digital image is denoised and demosaiced by interpolating the missing colors of the Bayer pattern. Chromatic aberrations and optical distortions are corrected, and non-linear transformations are applied to enhance its contrast. The image is finally compressed so that it can be stored and transmitted in a reasonable time. The JPEG (Joint Photographic Experts Group) standard is the most commonly used compression format for digital photographs today. JPEG is a lossy compression method: it reduces the image size at the price of image degradation. The main degradation is the appearance of artifacts in the form of squares, forming what is called the JPEG grid. The stronger the compression, the more visible this grid becomes. Even when this signature is imperceptible to the naked eye in slightly compressed images, it remains statistically significant and therefore detectable.
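
To illustrate why the grid remains statistically detectable, the following Python sketch accumulates a simple blockiness measure for each pixel position modulo 8. It assumes numpy and Pillow are available and uses a hypothetical input file photo.jpg; it is only a minimal illustration, not one of the algorithms developed in this thesis.

    import numpy as np
    from PIL import Image

    def blockiness_per_offset(path):
        # Cross-difference |x[i,j] - x[i,j+1] - x[i+1,j] + x[i+1,j+1]|:
        # a second-order difference that is amplified when the four pixels
        # straddle the border between two independently encoded 8x8 blocks.
        x = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
        d = np.abs(x[:-1, :-1] - x[:-1, 1:] - x[1:, :-1] + x[1:, 1:])
        # Average the measure over each of the 64 positions modulo 8.
        return np.array([[d[i::8, j::8].mean() for j in range(8)]
                         for i in range(8)])

    scores = blockiness_per_offset("photo.jpg")  # hypothetical file name
    print(np.round(scores, 2))
    # In a JPEG image compressed on the default grid, row 7 and column 7
    # (differences straddling block borders) stand out clearly, even when
    # no blocking is visible to the naked eye.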

In addition to these classic, automatic operations that define the image processing chain, users can perform manual global and local modifications. These can be alterations of color and brightness, or local retouching. Such modifications can serve various purposes, including forgery, and are within everyone's reach thanks to easy-to-use image editing software. Local forgery generally involves erasing, masking, cloning or inserting objects in an image. These local operations break the homogeneity of the compression traces. The state of the art in forgery detection builds on this observation to propose digital filters, i.e. operators able to highlight areas that appear to have been forged. However, a review of these methods shows that they fall short in evaluating the confidence that can be attributed to their detections: in the absence of a probabilistic and statistical model, this confidence cannot be quantified.
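
The kind of inconsistency such filters look for can be sketched as follows: estimate, within each tile of the image, which of the 64 possible offsets of an 8x8 grid best explains the local blockiness, and flag the tiles that disagree with the globally dominant one. The toy Python sketch below (building on the blockiness measure above, with an assumed tile size of 64 pixels) is only meant to convey the idea; it does not reproduce any specific method from the literature or from this thesis.

    import numpy as np

    def grid_offset_map(x, tile=64):
        # Per-pixel cross-difference, as in the previous sketch.
        d = np.abs(x[:-1, :-1] - x[:-1, 1:] - x[1:, :-1] + x[1:, 1:])
        offsets = np.zeros((d.shape[0] // tile, d.shape[1] // tile), dtype=int)
        for bi in range(offsets.shape[0]):
            for bj in range(offsets.shape[1]):
                patch = d[bi * tile:(bi + 1) * tile,
                          bj * tile:(bj + 1) * tile]
                score = [[patch[i::8, j::8].mean() for j in range(8)]
                         for i in range(8)]
                # Position (modulo 8) where the blockiness peaks, encoded
                # as 8*row + col; it determines the grid origin up to a
                # fixed shift, and all tiles of an untouched image agree.
                offsets[bi, bj] = int(np.argmax(score))
        # Tiles voting for a different value than the dominant one are
        # candidates for local tampering (pasted content carrying a
        # misaligned grid, or no grid at all).
        dominant = np.bincount(offsets.ravel(), minlength=64).argmax()
        return offsets, dominant, offsets != dominant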

Through the algorithms presented in this thesis, we aim to recreate the complete compression history of the analyzed images. We provide a statistical validation of the inconsistencies detected in the image based on a contrario detection methods, using large deviation arguments. This process allows us to characterize the detected events as events that could hardly occur by chance, being highly improbable in a normal image.
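
For reference, the generic a contrario decision rule (the specific background models vary from one algorithm to another and are detailed in the corresponding chapters) declares a candidate event e meaningful when its number of false alarms is small enough:

\[
\mathrm{NFA}(e) \;=\; N_{\mathrm{tests}} \cdot \mathbb{P}_{\mathcal{H}_0}\!\left[ E \geq e \right] \;\leq\; \varepsilon,
\]

where \(\mathcal{H}_0\) is the stochastic model of a normal, unforged image, \(N_{\mathrm{tests}}\) is the number of candidates examined, and \(E\) is the tested statistic under \(\mathcal{H}_0\). This bound guarantees that, on average, at most \(\varepsilon\) detections occur by chance in an image following \(\mathcal{H}_0\), which is what gives a quantitative meaning to each detection.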

As a result, the proposed detection tools offer an automatic analysis of images and do not require any interpretation or expertise in the field. The algorithms developed are published and made available online so that they can be used by as many users as possible, in particular by fact-checking journalists via the InVID-WeVerify verification tool developed by the Agence France-Presse news agency. In order to make the research reproducible, our scientific publications are delivered together with their source code and can be executed online via the IPOL journal (Image Processing On Line, https://www.ipol.im).

In the final part of this thesis, we explore the evaluation of forgery detection methods themselves. We propose a methodology and a dataset to study the sensitivity of detection tools to specific traces, as well as their ability to perform detections without relying on semantic cues in the image. More than a simple benchmark, this methodology can be used to assess the strengths and weaknesses of each method.