Overview of the Topographic VAE with shifting temporal coherence. The combined color/rotation transformation in input space \(\tau_g\) becomes encoded as a \(\mathrm{Roll}\) within the equivariant capsule dimension. The model is thus able to decode unseen sequence elements by encoding a partial sequence and rolling activations within the capsules. Encoding and then rolling gives the same result as transforming the input and then encoding, so the diagram commutes.
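
The commutative diagram can be illustrated with a minimal NumPy sketch. It is not the Topographic VAE itself: the learned encoder is replaced by a hypothetical circulant linear map `E` (chosen only because it commutes exactly with cyclic shifts), the input transformation \(\tau_g\) is assumed to be a cyclic shift of the input vector, and `roll_capsules` is an illustrative helper name.

```python
import numpy as np

def roll_capsules(z, n_caps, cap_dim, shift=1):
    """Cyclically shift activations within each capsule (the Roll operation)."""
    caps = np.asarray(z).reshape(n_caps, cap_dim)
    return np.roll(caps, shift, axis=1).reshape(-1)

rng = np.random.default_rng(0)
cap_dim = 6

# Assumption: a toy circulant "encoder" standing in for the learned one.
# Row i is the kernel cyclically shifted by i positions.
kernel = rng.normal(size=cap_dim)
E = np.stack([np.roll(kernel, i) for i in range(cap_dim)])

def tau(x):
    """Assumed input-space transformation: a cyclic shift by one step."""
    return np.roll(x, 1)

x = rng.normal(size=cap_dim)

# Path 1: transform the input, then encode.
transform_then_encode = E @ tau(x)
# Path 2: encode, then roll activations within the (single) capsule.
encode_then_roll = roll_capsules(E @ x, n_caps=1, cap_dim=cap_dim, shift=1)

commutes = np.allclose(transform_then_encode, encode_then_roll)
```

In this toy setting both paths agree exactly. In the learned model the equivariance is only approximate, and decoding the rolled activations is what yields the unseen sequence elements.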

A matrix \(\mathbf{W}\) transforms data from \(\mathbf{X}\) to \(\mathbf{Z}\) space. The matrix \(\mathbf{R}\) is constrained to approximate the inverse of \(\mathbf{W}\) with a reconstruction loss \(||\mathbf{x} - \mathbf{\hat{x}}||^2\). The likelihood of the data is efficiently optimized with respect to both \(\mathbf{W}\) and \(\mathbf{R}\) by approximating the gradient of the log Jacobian determinant with the learned inverse.
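
The gradient approximation can be sketched numerically. The exact gradient of \(\log|\det \mathbf{W}|\) with respect to \(\mathbf{W}\) is \(\mathbf{W}^{-T}\), which requires a matrix inverse; if \(\mathbf{R} \approx \mathbf{W}^{-1}\), then \(\mathbf{R}^{T}\) can be used in its place. The snippet below is a hedged illustration, not the authors' training code: `R` is simulated as the true inverse plus small noise rather than learned, and the dimensions and noise scale are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Assumption: a well-conditioned W and a simulated "learned" inverse R.
W = rng.normal(size=(d, d)) + 3.0 * np.eye(d)
R = np.linalg.inv(W) + 1e-3 * rng.normal(size=(d, d))

# Reconstruction loss ||x - x_hat||^2 that ties R to the inverse of W.
x = rng.normal(size=d)
z = W @ x
x_hat = R @ z
recon_loss = np.sum((x - x_hat) ** 2)

# Exact gradient of log|det W| w.r.t. W (needs an inverse) ...
exact_grad = np.linalg.inv(W).T
# ... versus the approximation that reuses R (no inverse needed).
approx_grad = R.T
grad_error = np.max(np.abs(exact_grad - approx_grad))

# Finite-difference check of one entry of the exact gradient.
eps = 1e-6
W_pert = W.copy()
W_pert[0, 1] += eps
fd_entry = (np.linalg.slogdet(W_pert)[1] - np.linalg.slogdet(W)[1]) / eps
```

The smaller the reconstruction loss, the closer `approx_grad` is to `exact_grad`, which is why the reconstruction constraint and the likelihood objective are optimized together.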