Watermark Removal with Deep Image Priors

Rishik C. Mourya
8 min read · Sep 25, 2021
Sample outputs from the project

Note

Before I begin, I’d like to point out that this article is meant for educational purposes only. I do not encourage violating an original creator’s content and hard work.

Project Overview

CNNs are very common for image generation and restoration tasks, and it is widely believed that their great performance comes from their ability to learn realistic image priors by training on large datasets. The paper Deep Image Prior shows that the structure of a generator alone is sufficient to capture enough low-level image statistics without any learning. Thus most image restoration tasks (for example denoising, super-resolution, artefact removal, and watermark removal) can be completed with highly realistic results without any training.

Generally, CNN’s excellent performance is imputed to their ability to learn realistic image priors from a large number of example images. We show that, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. In order to do so, we show that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, super-resolution, and inpainting — Authors

Following this, I was able to create a model that can quite easily remove almost any trace of a watermark from an image.

Introducing The Paper

CNNs are one of the defining inventions of the decade, achieving state-of-the-art results across a variety of deep learning tasks such as classification, generation, and restoration. And this success is usually attributed to their ability to learn the statistics and features of the training domain.

However, the authors argue that the ability to learn alone is not sufficient to explain the good performance of CNNs, because learning (or generalization) requires the model and the data to resonate with each other.

But learning ability alone cannot be the whole story: even if you throw random inputs coupled with random labels at a model, it will still overfit quite easily, so its good performance on real images must come from somewhere else.

The authors direct this claim specifically at image generation tasks.

We show that, contrary to the belief that learning is necessary for building good image priors, a great deal of image statistics are captured by the structure of a convolutional image generator independent of learning. This is particularly true for the statistics required to solve various image restoration problems, where the image prior is required to integrate information lost in the degradation processes. — Authors

The outputs in (d) were obtained without any prior learning, yet the model produces much cleaner results. Source: https://arxiv.org/abs/1711.10925

The Nub

To prove their point, the authors applied untrained CNNs to solve many restoration tasks. However, instead of following the common practice of training the model on a large dataset, they optimized their generative model on a single degraded image. The model parameters are randomly initialized and then optimized to maximize the likelihood of the observed image for the given task.

We show that this very simple formulation is very competitive for standard image processing problems such as denoising, inpainting and super-resolution. — Authors

Following this, they found that an image reconstruction task can be solved by recasting it as conditional image generation, and they show that the only information required to solve such tasks is the single degraded image itself, combined with a model architecture suited to the task.

This is particularly remarkable because no aspect of the network is learned from data; instead, the weights of the network are always randomly initialized, so that the only prior information is in the structure of the network itself. To the best of our knowledge, this is the first study that directly investigates the prior captured by deep convolutional generative networks independently of learning the network parameters from images. — Authors

So essentially, here’s how the framework is structured:

  • Choose the task (denoising, super-resolution, inpainting, etc).
  • Create a model capable of completing the task, for instance by making sure that sufficient, well-normalized gradients flow through the whole model.
  • Form the task-specific objective function. For super-resolution, for instance, the objective is built so that the generator produces an image which, when downsampled (degraded), matches the low-resolution image we have.
  • And finally, optimize the model on that single degraded image.
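The steps above can be sketched in a few lines of PyTorch. This is a minimal toy version, not the paper’s implementation: the generator here is a tiny stand-in (the paper uses a deep skip-connected encoder-decoder), the degraded image is a synthetic ramp, and the mask is random. Only the shape of the loop matters: a fixed random input, a randomly initialized network, and a task-specific loss on one image.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in generator; the paper uses a much deeper encoder-decoder.
net = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)

z = torch.randn(1, 32, 64, 64)  # fixed random input, never changed

# Synthetic "degraded image": a smooth horizontal ramp stands in for x0.
x0 = torch.linspace(0, 1, 64).view(1, 1, 1, 64).expand(1, 3, 64, 64)

# 1 = trusted pixel, 0 = degraded region (inpainting-style mask).
mask = (torch.rand(1, 1, 64, 64) > 0.2).float()

opt = torch.optim.Adam(net.parameters(), lr=0.01)
losses = []
for step in range(100):
    opt.zero_grad()
    out = net(z)
    # Task-specific objective: match the image only where mask == 1.
    loss = ((out - x0).mul(mask) ** 2).mean()
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The key trick is early stopping: the network fits natural structure quickly and noise slowly, so you stop before it starts reproducing the degradation.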

The authors experimented with a variety of tasks, such as super-resolution, image denoising, and inpainting, and across all of them their framework produces surprisingly good results.

Blind image denoising. Source: https://arxiv.org/abs/1711.10925
Image Inpainting. Source: https://arxiv.org/abs/1711.10925

Continuing To The Project

So I wanted to remove the watermark from a watermarked image. Removing a watermark is not a separate task; it is usually treated as a special case of inpainting. Now, before proceeding, let’s first formulate our objective function.

In this task, we have the watermarked image, let’s call it Xw, and the watermark applied to it, let’s call it W. And we want to generate the original image X, which does not contain any watermark.

Any watermarked image Xw can be represented as X ⊙ W, where ⊙ is the Hadamard (element-wise) product. So essentially we want to generate an image Xo such that ||Xo ⊙ W − Xw||² is minimized.
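This objective can be written down directly. Below is a minimal NumPy sketch (the function name and toy arrays are mine, not from the repo): apply the known watermark to the candidate image and compare against the observed watermarked image.

```python
import numpy as np

def watermark_loss(x_out, x_w, w):
    """||x_out ⊙ w − x_w||²: apply the known watermark to the generated
    image and compare against the watermarked image we actually have."""
    return np.sum((x_out * w - x_w) ** 2)

rng = np.random.default_rng(0)
x = rng.random((4, 4, 3))   # the (unknown) clean image
w = rng.random((4, 4, 3))   # the watermark overlay
x_w = x * w                 # the observed watermarked image

# A perfect reconstruction drives the loss to zero.
print(watermark_loss(x, x_w, w))   # → 0.0
```

In the actual project, x_out is the generator’s output and this scalar is what gets backpropagated.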

When the watermark is available

So, as is clearly visible, when the watermark is available to you, completing this task is a piece of cake.

Source: https://arxiv.org/abs/1711.10925

Here’s a sample of the generator’s progress on the watermarked image above, when the watermark is available.

When the watermark is available, it takes no more than two minutes to get results like this.

Essentially, the model generates an image that is multiplied by the watermark we have, and the result is optimized to match the actual watermarked image.

Even though we have the watermark, it is still quite interesting that the generator was almost able to inpaint the pixels covered by the watermark. This supports the authors’ argument that a single image provides sufficient image statistics and priors to complete these restoration tasks.

When the watermark is not available

The problem becomes much more interesting when we don’t have the watermark :)

So can we remove the watermark when no information about it is available to us? This leaves many questions unanswered for the generator, like:

  • Where is the watermark?
  • What exactly is part of the watermark?
  • And what exactly is not part of the watermark?

Basically, in order to minimize the objective function I explained above, the watermark is a must-have. So either we modify the loss function so that it is independent of any watermark information, or we provide what the loss wants :)

Since watermark removal is essentially an inpainting task, instead of finding the watermark and then removing it, why not simply inpaint the region where the watermark is?

The idea that came to me is actually quite simple: what if I just roughly highlight the region where the watermark is, and use that mask in place of the watermark in the objective function?

For instance, for the watermarked image below (left), I will be providing the mask (middle) such that the watermark is completely covered by the mask overlay (right):

An example of a watermarked image which, when multiplied by the mask, has its watermark completely covered.

So now I can simply swap the unavailable watermark for my hand-drawn mask and continue as usual.
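Concretely, the mask turns the objective into the standard inpainting loss ||(Xo − Xw) ⊙ m||², where m is 1 on watermark-free pixels and 0 under the highlight: the generator is matched to the watermarked image only outside the masked region and is free to inpaint inside it. Here is a NumPy sketch under my own assumptions (function names are mine; I assume the highlight is drawn in a dark colour on a white canvas, which is just one possible MS Paint convention):

```python
import numpy as np

def binarize_mask(mask_img, thresh=128):
    """Convert a rough hand-drawn overlay (e.g. from MS Paint) into a
    binary mask: 1 = trusted pixel, 0 = covered by the highlight.
    Assumes a dark highlight on a white canvas."""
    return (mask_img.mean(axis=-1) >= thresh).astype(np.float32)

def inpainting_loss(x_out, x_w, m):
    """||(x_out − x_w) ⊙ m||²: match the watermarked image only on
    pixels the mask marks as watermark-free."""
    return np.sum(((x_out - x_w) * m[..., None]) ** 2)

# Toy 4x4 "hand-drawn" mask: a dark blob in the top-left corner.
mask_img = np.full((4, 4, 3), 255.0)
mask_img[:2, :2] = 0.0
m = binarize_mask(mask_img)

x_w = np.ones((4, 4, 3))
x_out = np.ones((4, 4, 3))
x_out[:2, :2] = 0.0   # the generator may output anything under the mask

print(inpainting_loss(x_out, x_w, m))   # → 0.0
```

Note the loss is zero even though the two images disagree under the mask; the pixels there are filled in purely by the network’s prior, which is the whole point.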

Let’s have a look at a few more mask examples.

It hardly takes a minute to draw each mask. And no need to be fancy; I’m simply using MS Paint.

Few thoughts

Okay, so you might be thinking that this is no longer a fully automated task, since you have to manually sit down and highlight the watermarked region. But I’m pretty sure that after seeing the final results, you’ll find that sitting worthwhile :)

Moreover, think about how many problems we solve simply by providing some manual supervision to the model:

  • No training is required, so no need for any kind of data collection.
  • No need to train a watermark detection model, which is harder than typical object detection.
  • Even if we could detect the watermark, it wouldn’t help much, because the watermark can be drawn over the whole image, not just a small region.
  • No need to train the generator on huge image datasets for learning image statistics.
  • No need to train the generator with an adversarial loss, which is already very difficult for producing high-resolution images (1024 px and above).
  • And all the other solutions I’ve seen so far, which try to automate the whole procedure of detecting and removing the watermark, produce very visible artefacts.

Final Results

The model I’m using is exactly the same as described in the paper. For exact implementation details, make sure to check out my GitHub repo linked at the end.

Okay then let’s have a look at some results.

Here’s a sample of the generator’s progress when trained without the actual watermark, using only the hand-drawn mask.

Below are some of the final results.

Drawbacks of this technique

Even though the results are really cool, there are still a few compromises:

  • When the watermark is spread across the whole image, drawing the mask becomes quite tedious. Not that it’s hard or anything, it just gets quite boring :[
  • The final results depend heavily on how well you draw the mask overlays. If you draw too roughly and the mask becomes too thick, the generated output will most likely contain many artefacts.
  • This technique works especially well on images where the watermark sits over a rough background, like ground, sky, or sea. But when the watermark sits directly on top of a highly detailed region like a human face, the artefacts will most likely be very visible.

But yeah, other than these cons, this stuff is really awesome.

Conclusion

  • The claim that deep neural networks perform well on a variety of tasks simply because they are good at learning underlying image priors from training data is not entirely accurate.
  • The model’s structure has to “resonate” with the data for the learning and generalization that come with training.
  • Contrary to this belief, the authors of this paper show that a single image is enough to provide a sufficient amount of image statistics to complete most image restoration tasks, like denoising, super-resolution, and image inpainting.

Here’s the GitHub link to my project.

Thanks for reading.
