Thursday, August 23, 2018
PDM Unchained
[Research log: breaking the PDM dimension barrier.]
The last term of equation 1 from my prior PDM post can be written in terms of a normal matrix product and the trace operator (using the shorthand $[\cdot]_\omega$ for $\partial_\omega$, and recalling that dx/dz means the Jacobian matrix of the function mapping z to x):
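$$\left\langle \partial_\omega \frac{dz}{dx},\ \frac{dx}{dz}^{T} \right\rangle_F = \operatorname{tr}\left(\frac{dx}{dz}\left[\frac{dz}{dx}\right]_\omega\right) \tag{1}$$
In that form it's easier to explore what happens if we break our generative function z→x into two steps, z→y→x, with a hidden layer y: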
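$$\begin{aligned}
\operatorname{tr}\left(\frac{dx}{dz}\left[\frac{dz}{dx}\right]_\omega\right)
&= \operatorname{tr}\left(\frac{dx}{dy}\frac{dy}{dz}\left[\frac{dz}{dy}\frac{dy}{dx}\right]_\omega\right) \\
&= \operatorname{tr}\left(\frac{dx}{dy}\frac{dy}{dz}\left\{\left[\frac{dz}{dy}\right]_\omega\frac{dy}{dx} + \frac{dz}{dy}\left[\frac{dy}{dx}\right]_\omega\right\}\right) \\
&= \operatorname{tr}\left(\frac{dx}{dy}\frac{dy}{dz}\left[\frac{dz}{dy}\right]_\omega\frac{dy}{dx}\right) + \operatorname{tr}\left(\frac{dx}{dy}\frac{dy}{dz}\frac{dz}{dy}\left[\frac{dy}{dx}\right]_\omega\right) \\
&= \operatorname{tr}\left(\left[\frac{dy}{dx}\frac{dx}{dy}\right]\frac{dy}{dz}\left[\frac{dz}{dy}\right]_\omega\right) + \operatorname{tr}\left(\frac{dx}{dy}\left[\frac{dy}{dz}\frac{dz}{dy}\right]\left[\frac{dy}{dx}\right]_\omega\right)
\end{aligned} \tag{2}$$
Now if y is the same dimensionality as x and z, and both of our mapping functions are fully invertible, then the Jacobians of the inverse functions are the inverses of the Jacobians, and so they cancel out and we get: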
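$$\operatorname{tr}\left(\frac{dx}{dz}\left[\frac{dz}{dx}\right]_\omega\right) = \operatorname{tr}\left(\frac{dy}{dz}\left[\frac{dz}{dy}\right]_\omega\right) + \operatorname{tr}\left(\frac{dx}{dy}\left[\frac{dy}{dx}\right]_\omega\right) \tag{3}$$
Thus reaffirming that we can chain functions through simple summation of their independent gradients.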
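A quick numerical sanity check of equation (3) for the square, invertible case can be sketched as follows. The affine parameterisation of the Jacobians in a scalar `w` (standing in for ω) and all the names here are illustrative assumptions, not anything from the PDM code itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # shared dimensionality of x, y and z

# Hypothetical inverse-direction Jacobians, affine in a scalar parameter w.
# The 3*I offset makes a singular matrix unlikely.
C0, C1 = rng.normal(size=(n, n)) + 3 * np.eye(n), rng.normal(size=(n, n))
D0, D1 = rng.normal(size=(n, n)) + 3 * np.eye(n), rng.normal(size=(n, n))

def dz_dy(w):
    return C0 + w * C1

def dy_dx(w):
    return D0 + w * D1

def dz_dx(w):
    # Chain rule for the composite inverse mapping x -> y -> z.
    return dz_dy(w) @ dy_dx(w)

def trace_term(jac, w, eps=1e-6):
    """tr(J^{-1} dJ/dw), with dJ/dw taken by central differences."""
    dJ = (jac(w + eps) - jac(w - eps)) / (2 * eps)
    return np.trace(np.linalg.inv(jac(w)) @ dJ)

w = 0.3
lhs = trace_term(dz_dx, w)                         # tr(dx/dz [dz/dx]_w)
rhs = trace_term(dz_dy, w) + trace_term(dy_dx, w)  # right side of (3)
print(lhs, rhs)  # the two values should agree up to finite-difference error
```

Each trace term has the form tr(J⁻¹ ∂J), which is the derivative of log|det J|, so the sum in (3) is the familiar statement that log-determinants add under composition.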
However, nothing in equation (2) actually relies on z→y or y→x being invertible, only that the composite function z→x is. For instance, y can be of a higher dimensionality than x and z, in which case all of our Jacobians become rectangular. Equation (2) still holds (but not equation (3)), with the would-be canceling terms acting instead to project the gradient into the subspace of y reachable by the auxiliary variable. I.e., the gradient for the y→x mapping is constrained to lie within the subspace of y reached from z, and likewise the gradient for the z→y mapping within the subspace of y reached from x.
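The rectangular case can be probed numerically along the same lines. This is a minimal sketch under assumed, hand-built Jacobians: `C` and `D` play the roles of dz/dy and dy/dx, and the generative-side `A` (dy/dz) and `B` (dx/dy) are one hypothetical choice that satisfies dx/dy · dy/dz = dx/dz while leaving `A` partly outside the column space of `D`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5  # dim(x) = dim(z) = n, dim(y) = m > n

C0, C1 = rng.normal(size=(n, m)), rng.normal(size=(n, m))  # dz/dy is n x m
D0, D1 = rng.normal(size=(m, n)), rng.normal(size=(m, n))  # dy/dx is m x n

w = 0.3
C, dC = C0 + w * C1, C1  # dz/dy and [dz/dy]_w
D, dD = D0 + w * D1, D1  # dy/dx and [dy/dx]_w

def dz_dx(w):
    return (C0 + w * C1) @ (D0 + w * D1)

eps = 1e-6
dM = (dz_dx(w + eps) - dz_dx(w - eps)) / (2 * eps)  # [dz/dx]_w by central differences
dx_dz = np.linalg.inv(dz_dx(w))

# Generative-side Jacobians chosen so that B @ A == dx/dz.
B = dx_dz @ C                                                      # dx/dy, n x m
N = (np.eye(m) - np.linalg.pinv(C) @ C) @ rng.normal(size=(m, n))  # C @ N == 0
A = D @ dx_dz + N                                                  # dy/dz, m x n

P_x = D @ B  # dy/dx dx/dy: an m x m projector instead of the identity
P_z = A @ C  # dy/dz dz/dy: likewise

lhs = np.trace(dx_dz @ dM)
eq2 = np.trace(P_x @ A @ dC) + np.trace(B @ P_z @ dD)  # last line of (2)
eq3 = np.trace(A @ dC) + np.trace(B @ dD)              # (3), projectors dropped
print(lhs, eq2, eq3)  # lhs and eq2 should agree; eq3 generally should not
```

In this construction both `P_x` and `P_z` are idempotent, which is the sense in which the would-be canceling terms act as projections in y.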
Various implications to ponder.
Update 2018-09-04:
A significant constraint in the above is that the joint inverse mapping needs to be valid. For example, for a wider y layer, the z→y mapping needs to project to the same subspace as the x→y mapping, otherwise even though z→y→z and x→y→x may both be identity, x→y→z→y→x may not be! Specifically, be sure that:
$$\frac{dx}{dz} = \frac{dx}{dy}\frac{dy}{dz} \tag{4}$$
A consequence of this, somewhat counter-intuitively, is that dy/dz needs to track the x→y mapping. Or, alternatively, we can create a distinct y′ for the generative direction, such that equation (2) becomes:
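$$\begin{aligned}
\operatorname{tr}\left(\frac{dx}{dz}\left[\frac{dz}{dx}\right]_\omega\right)
&= \operatorname{tr}\left(\frac{dx}{dy'}\frac{dy'}{dz}\left[\frac{dz}{dy}\frac{dy}{dx}\right]_\omega\right) \\
&= \operatorname{tr}\left(\frac{dx}{dy'}\frac{dy'}{dz}\left[\frac{dz}{dy}\right]_\omega\frac{dy}{dx}\right) + \operatorname{tr}\left(\frac{dx}{dy'}\frac{dy'}{dz}\frac{dz}{dy}\left[\frac{dy}{dx}\right]_\omega\right)
\end{aligned} \tag{5}$$
Again, taking care to ensure that the round-trip mapping x→x is identity.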
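The constraint in equation (4) can be illustrated with a small linear example. This is a sketch under assumptions: the Jacobians are random constant matrices, and using pseudo-inverses for the generative side is a hypothetical stand-in for mappings that were each fit to invert only their own counterpart.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 5  # dim(x) = dim(z) = n, dim(y) = m > n
I = np.eye(n)

C = rng.normal(size=(n, m))   # dz/dy
D = rng.normal(size=(m, n))   # dy/dx
dx_dz = np.linalg.inv(C @ D)  # what dx/dy @ dy/dz has to equal, per (4)

# Each generative step inverts only its own counterpart.
A_naive = np.linalg.pinv(C)   # dy/dz, so z -> y -> z is identity (C @ A == I)
B = np.linalg.pinv(D)         # dx/dy, so x -> y -> x is identity (B @ D == I)

# x -> y -> z -> y -> x: generally not identity, since A_naive need not
# land in the column space of D.
round_trip_naive = B @ A_naive @ C @ D

# Making dy/dz track the x -> y mapping: its columns lie in span(D).
A_track = D @ dx_dz
round_trip_track = B @ A_track @ C @ D

print(np.abs(round_trip_naive - I).max(), np.abs(round_trip_track - I).max())
```

With `A_track`, both the z→y→z round trip and equation (4) hold at once, which is one concrete reading of dy/dz needing to track the x→y mapping.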