Dropouts and Convolutions
Does dropout make sense in convolutional layers?
If you have no idea what dropout is, I suggest you read this first.

The Question
Some time back I came across this blog, which caught my attention, so a couple of weeks ago I drafted a poll asking whether dropout is effective when placed between the convolutional layers of a CNN.
Here are the results for that:
A lot of people think that yes, dropout is an effective technique for regularizing the convolutional layers.
I suspect a lot of people confused this with using dropout in a CNN, but in the fully connected layers.
Those who were not in favor mostly thought that it simply doesn't make sense to use dropout in convolutional layers.
"I think the dropout is ineffective for conv layers because convolution holds average information. So it makes more sense to apply dropout to FC layer."
"Dropout is a "good" regularization technique if it is used at the right place!
For example, Dropout after Fully-Connected layer makes sense. But dropout just after Convolutional doesn't make sense."
"using dropout in CNN layers is like ignoring edges, colors, meta-features of the image which is a NO NO.
This is what my intuition says,"
So does dropout make sense in convolutional layers?
I think it does not, at least not standard dropout.
But this doesn't mean that we don't need to regularize the convolutional layers.
Batch Normalization has also been promoted over dropout in convolutional layers by many people on the internet, and most of them think it does a better job of regularizing those layers. I could not find a clear answer or experiment showing which one is more effective.
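For reference, here is a minimal sketch of the kind of block people usually mean when they recommend Batch Normalization between convolutional layers; the channel sizes and kernel size are placeholders, not taken from any particular model.

```python
import torch.nn as nn

# A common Conv -> BatchNorm -> ReLU pattern; normalization, not dropout,
# acts as the regularizer between the convolutional layers here.
# The channel sizes (3 -> 32 -> 64) are arbitrary placeholders.
conv_block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
```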
A lot of people also think that convolutional layers have too few parameters to need regularization in the first place.
But as quoted in Section 3.2 of this paper:
"Since our network is fully convolutional and natural images exhibit strong spatial correlation, the feature map activations are also strongly correlated, and in this setting standard dropout fails."
We can infer that they want to break the strong correlation between feature map activations, which is why they need something like dropout to regularize the convolutional layers.
But they also say that standard dropout fails here: when neighboring activations carry nearly the same information, dropping individual activations removes very little of it. So they propose something called
Spatial Dropout
Spatial Dropout is a special kind of dropout that promotes independence between feature maps and is the variant suggested for convolutional layers.
Instead of zeroing individual activations, it drops entire feature maps by setting all the values in a channel to zero.
Implementations are already available in both major frameworks: as SpatialDropout2D in Keras and as Dropout2d in PyTorch.
We can see from the image above that entire feature maps have been zeroed out.
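To make the difference concrete, here is a small sketch, assuming PyTorch's nn.Dropout and nn.Dropout2d with an arbitrary tensor shape and dropout probability, showing that standard dropout zeroes scattered individual activations while spatial dropout zeroes whole channels.

```python
import torch
import torch.nn as nn

# A toy input of ones: (batch, channels, height, width); the shape is arbitrary.
x = torch.ones(1, 4, 5, 5)

standard = nn.Dropout(p=0.5)    # element-wise dropout
spatial = nn.Dropout2d(p=0.5)   # channel-wise (spatial) dropout

standard.train()
spatial.train()

# Count zeros per channel after each kind of dropout.
# Standard dropout: roughly half the values in every channel are zeroed.
print((standard(x) == 0).sum(dim=(2, 3)))
# Spatial dropout: each channel is either fully kept (0 zeros) or fully dropped (25 zeros).
print((spatial(x) == 0).sum(dim=(2, 3)))
```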
If you want a wider perspective on this topic, I suggest looking at the arguments in the references linked above. For me, this was an interesting problem to think about, and I learned quite a few things while trying to find an answer.