TensorFlow implementation of Recurrent Models of Visual Attention (Mnih et al., 2014), with additional research. Code based on https://github.com/zhongwen/RAM.
| Model | Error |
|---|---|
| FC, 2 layers (64 hiddens each) | 6.78% |
| FC, 2 layers (256 hiddens each) | 2.65% |
| Convolutional, 2 layers | 1.57% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 1.54% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 1.08% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 0.94% |

| Model | Error |
|---|---|
| FC, 2 layers (64 hiddens each) | 29.13% |
| FC, 2 layers (256 hiddens each) | 11.36% |
| Convolutional, 2 layers | 8.37% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 5.15% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 3.33% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 2.63% |

| Model | Error |
|---|---|
| Convolutional, 2 layers | 16.22% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 14.86% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 8.3% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 5.9% |
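
The "12 x 12, 3 scales" configuration in the tables above refers to RAM's foveated glimpse sensor: several concentric windows are cropped around the attended location, each resized back to the base 12 x 12 patch before being fed to the glimpse network. The snippet below is a minimal sketch of that extraction, assuming TensorFlow 2.x; the helper name `extract_retina` and its exact cropping/resizing choices are illustrative, not copied from this repository's code.

```python
import tensorflow as tf

def extract_retina(images, locs, win_size=12, num_scales=3):
    """Illustrative multi-scale glimpse ("retina") extraction.

    images: [batch, H, W, C] float tensor
    locs:   [batch, 2] glimpse centres in (y, x), normalized to [-1, 1]
    """
    patches = []
    size = win_size
    for _ in range(num_scales):
        # Crop a size x size window centred at `locs` (normalized, centered coords).
        patch = tf.image.extract_glimpse(
            images, size=[size, size], offsets=locs,
            centered=True, normalized=True)
        # Resize every scale back to the base resolution.
        patch = tf.image.resize(patch, [win_size, win_size])
        patches.append(patch)
        size *= 2  # each successive scale covers a larger area at lower resolution
    # Stack scales along the channel axis and flatten for the glimpse network.
    retina = tf.concat(patches, axis=-1)
    return tf.reshape(retina, [tf.shape(images)[0], -1])

# Example: a batch of 60 x 60 single-channel images with random glimpse centres.
imgs = tf.random.uniform([4, 60, 60, 1])
locs = tf.random.uniform([4, 2], minval=-1.0, maxval=1.0)
print(extract_retina(imgs, locs).shape)  # (4, 432) = (batch, 12 * 12 * 3 scales)
```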
Mean output vs. sampled output (visualization images omitted).