# Self Normalizing Flows

A matrix $$\mathbf{W}$$ transforms data from $$\mathbf{X}$$ to $$\mathbf{Z}$$ space. The matrix $$\mathbf{R}$$ is constrained to approximate the inverse of $$\mathbf{W}$$ with a reconstruction loss $$||\mathbf{x} - \mathbf{\hat{x}}||^2$$. The likelihood of the data is efficiently optimized with respect to both $$\mathbf{W}$$ and $$\mathbf{R}$$ by approximating the gradient of the log Jacobian determinant with the learned inverse.