Thursday, August 23, 2018

PDM Unchained



[Research log: breaking the PDM dimension barrier.]

The last term of equation 1 from my prior PDM post can be written in terms of a normal matrix product and the trace operator (using the shorthand `[]_omega` for `del_omega`, and recalling that `dx//dz` means the Jacobian matrix of the function mapping `z` to `x`):

` (: del_omega dz/dx , {: dx/dz :}^T :)_F = "tr"(dx/dz [dz/dx]_omega)` (1)
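Equation (1) is a purely algebraic identity: `(: A, B :)_F = "tr"(A^T B)`. It therefore holds for any pair of matrices, not only for a Jacobian and its inverse. The sketch below is a quick sanity check that uses NumPy with random stand-ins for `dx/dz` and `[dz/dx]_omega`. The variable names and the dimensionality are my own illustrative choices and do not come from the PDM setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # dimensionality of x and z (arbitrary for this check)

# Random stand-ins: equation (1) is pure algebra, so any matrices will do.
dz_dx_w = rng.standard_normal((n, n))  # [dz/dx]_omega
dx_dz = rng.standard_normal((n, n))    # dx/dz

lhs = np.sum(dz_dx_w * dx_dz.T)  # Frobenius inner product <[dz/dx]_omega, (dx/dz)^T>_F
rhs = np.trace(dx_dz @ dz_dx_w)  # tr(dx/dz [dz/dx]_omega)
print(np.isclose(lhs, rhs))  # True
```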

In that form it's easier to explore what happens if we break our generative function `z->x` into two steps, `z->y->x`, with a hidden layer `y`:

` {: ("tr"(dx/dz [dz/dx]_omega),= "tr"(dx/dy dy/dz [dz/dy dy/dx]_omega)), (,="tr"(dx/dy dy/dz {[dz/dy]_omega dy/dx + dz/dy [dy/dx]_omega})), (,="tr"(dx/dy dy/dz [dz/dy]_omega dy/dx) + "tr"(dx/dy dy/dz dz/dy [dy/dx]_omega)), (,="tr"([dy/dx dx/dy] dy/dz [dz/dy]_omega) + "tr"(dx/dy [dy/dz dz/dy] [dy/dx]_omega)) :}` (2)
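The sketch below numerically checks each line of equation (2). It makes one hypothetical assumption: the inverse-direction Jacobians `dz/dy` and `dy/dx` depend affinely on a single scalar parameter `omega`, so each `[.]_omega` is just a slope matrix. The left side uses a central finite difference of the composite Jacobian. The expanded lines rely only on the product rule and on the cyclic property of the trace. All matrices are random NumPy stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3  # x, y and z share this dimensionality here (arbitrary)
tr = np.trace

# Hypothetical parameterization: each inverse-direction Jacobian is affine in a
# scalar parameter omega, so its omega-derivative is exactly the slope matrix.
dz_dy0, dz_dy_w = rng.standard_normal((2, n, n))  # dz/dy at omega = 0, and [dz/dy]_omega
dy_dx0, dy_dx_w = rng.standard_normal((2, n, n))  # dy/dx at omega = 0, and [dy/dx]_omega

def dz_dx(omega):
    # Chain rule for the inverse direction x -> y -> z.
    return (dz_dy0 + omega * dz_dy_w) @ (dy_dx0 + omega * dy_dx_w)

# Generative-direction Jacobians; arbitrary here, since equation (2) is pure algebra.
dx_dy, dy_dz = rng.standard_normal((2, n, n))
dx_dz = dx_dy @ dy_dz

# [dz/dx]_omega by central finite difference (exact up to rounding, as dz_dx is quadratic in omega).
eps = 1e-4
dz_dx_w_fd = (dz_dx(eps) - dz_dx(-eps)) / (2 * eps)

# The four lines of equation (2); lines 3 and 4 use the product rule and the cyclic trace property.
line1 = tr(dx_dz @ dz_dx_w_fd)
line2 = tr(dx_dy @ dy_dz @ dz_dx_w_fd)
line3 = tr(dx_dy @ dy_dz @ dz_dy_w @ dy_dx0) + tr(dx_dy @ dy_dz @ dz_dy0 @ dy_dx_w)
line4 = tr((dy_dx0 @ dx_dy) @ dy_dz @ dz_dy_w) + tr(dx_dy @ (dy_dz @ dz_dy0) @ dy_dx_w)
print(np.allclose([line2, line3, line4], line1))  # True
```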

Now if `y` has the same dimensionality as `x` and `z`, and both of our mapping functions are fully invertible, then the Jacobians of the inverse functions are the inverses of the Jacobians. The bracketed products in the last line of (2) therefore reduce to identity matrices, and we get:

` "tr"(dx/dz [dz/dx]_omega) = "tr"(dy/dz [dz/dy]_omega) + "tr"(dx/dy [dy/dx]_omega)` (3)

This reaffirms that we can chain functions by simply summing their independent gradients.
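The sketch below covers the square, fully invertible case and uses random NumPy stand-ins as an illustrative assumption. It also cross-checks the result against Jacobi's formula, `del_omega log|det J| = "tr"(J^(-1) [J]_omega)`. That is a standard identity, not something stated in this post. With `J = dz/dx`, the left side of (3) is the `omega`-derivative of `log|det(dz/dx)|`. The log-determinant of a product is the sum of the log-determinants, which is one way to see why the gradients simply add.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
tr, inv = np.trace, np.linalg.inv

# Square Jacobians for the inverse direction x -> y -> z (random stand-ins,
# shifted by n*I to keep them comfortably invertible).
dz_dy = rng.standard_normal((n, n)) + n * np.eye(n)
dy_dx = rng.standard_normal((n, n)) + n * np.eye(n)
dz_dy_w, dy_dx_w = rng.standard_normal((2, n, n))  # [dz/dy]_omega, [dy/dx]_omega

# Fully invertible steps: the generative Jacobians are the matrix inverses.
dy_dz, dx_dy = inv(dz_dy), inv(dy_dx)
dx_dz = dx_dy @ dy_dz                        # equals inv(dz_dy @ dy_dx)
dz_dx_w = dz_dy_w @ dy_dx + dz_dy @ dy_dx_w  # product rule

lhs = tr(dx_dz @ dz_dx_w)
rhs = tr(dy_dz @ dz_dy_w) + tr(dx_dy @ dy_dx_w)  # right side of equation (3)
print(np.isclose(lhs, rhs))  # True

# Cross-check with Jacobi's formula: lhs = d/domega log|det(dz/dx)|,
# with dz/dx treated as affine-in-omega factors (an assumption of this sketch).
def logdet_dz_dx(omega):
    J = (dz_dy + omega * dz_dy_w) @ (dy_dx + omega * dy_dx_w)
    return np.linalg.slogdet(J)[1]

eps = 1e-5
fd = (logdet_dz_dx(eps) - logdet_dz_dx(-eps)) / (2 * eps)
print(np.isclose(lhs, fd))  # True
```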

However, nothing in equation (2) actually relies on `z->y` or `y->x` being invertible; it relies only on the composite function `z->x` being invertible. For instance, `y` can have a higher dimensionality than `x` and `z`, in which case all of our Jacobians become rectangular. Equation (2) still holds (but not equation (3)), and the would-be canceling terms instead act to project the gradient into the subspace of `y` reachable from the auxiliary variable. That is, the gradient for the `y->x` mapping is constrained to lie within the subspace of `y` spanned by `z`, and likewise the gradient for the `z->y` mapping lies within the subspace spanned by `x`.
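Here is a sketch of the wider-`y` case, with `x` and `z` 2-dimensional and `y` 4-dimensional, using random NumPy stand-ins. To keep the composite `z->x` invertible, it assumes that `z->y` lands in the same subspace of `y` as `x->y`. This anticipates the 2018-09-04 update below, and the sketch implements it through a hypothetical invertible matrix `K`. `dx/dy` is a left inverse of `dy/dx` that has been perturbed by a hypothetical term `N`, which vanishes on that subspace. Both `dy/dx dx/dy` and `dy/dz dz/dy` then come out idempotent with trace 2, i.e. rank-2 projections. The last line of (2) matches the true value, while the naive sum from (3) need not. If you set `N = 0`, both left inverses are Moore-Penrose pseudo-inverses over a shared subspace, the two projections become the same orthogonal projection, and the gap happens to vanish. That is why the sketch includes `N`.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 2, 4  # x and z are n-dimensional; the hidden layer y is wider (m > n)
tr, pinv = np.trace, np.linalg.pinv

# x -> y Jacobian, and a left inverse dx/dy so that x -> y -> x is identity.
# The hypothetical term N vanishes on the column space of dy/dx, so dx/dy stays
# a left inverse while differing from the pseudo-inverse.
dy_dx = rng.standard_normal((m, n))
N = rng.standard_normal((n, m)) @ (np.eye(m) - dy_dx @ pinv(dy_dx))
dx_dy = pinv(dy_dx) + N

# z -> y lands in the same subspace of y as x -> y (K: hypothetical invertible n x n).
K = rng.standard_normal((n, n)) + n * np.eye(n)
dy_dz = dy_dx @ K
dz_dy = pinv(dy_dz)  # so z -> y -> z is identity

dz_dy_w = rng.standard_normal((n, m))  # [dz/dy]_omega
dy_dx_w = rng.standard_normal((m, n))  # [dy/dx]_omega

dx_dz, dz_dx = dx_dy @ dy_dz, dz_dy @ dy_dx
print(np.allclose(dx_dz @ dz_dx, np.eye(n)))  # True: composite z <-> x is invertible

# The would-be canceling products are now m x m, rank-n projections.
P_x, P_z = dy_dx @ dx_dy, dy_dz @ dz_dy
print(np.allclose(P_x @ P_x, P_x), np.allclose(P_z @ P_z, P_z),
      np.isclose(tr(P_x), n), np.isclose(tr(P_z), n))  # True True True True

lhs = tr(dx_dz @ (dz_dy_w @ dy_dx + dz_dy @ dy_dx_w))
eq2 = tr(P_x @ dy_dz @ dz_dy_w) + tr(dx_dy @ P_z @ dy_dx_w)  # last line of (2)
eq3 = tr(dy_dz @ dz_dy_w) + tr(dx_dy @ dy_dx_w)              # naive sum, as in (3)
print(np.isclose(lhs, eq2))  # True
print(eq3 - lhs)             # nonzero for generic draws: (3) no longer holds
```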

Various implications to ponder.


Update 2018-09-04:

A significant constraint in the above is that the joint inverse mapping needs to be valid. For example, with a wider `y` layer, the `z->y` mapping needs to project into the same subspace as the `x->y` mapping. Otherwise, even though `z->y->z` and `x->y->x` may both be the identity, `x->y->z->y->x` may not be! Specifically, make sure that:

` dx/dz = dx/dy dy/dz` (4)
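The sketch below illustrates this failure mode, with `x` and `z` 2-dimensional and `y` 4-dimensional. As an assumption for illustration, each step out of `y` is taken as the pseudo-inverse of the corresponding Jacobian into `y`. Both short round trips are therefore the identity by construction. The helper `round_trip` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 2, 4  # x and z are n-dimensional; the hidden layer y is m-dimensional
pinv = np.linalg.pinv

def round_trip(dy_dx, dy_dz):
    """Hypothetical helper: linearized x -> y -> z -> y -> x map, taking each step
    out of y as the pseudo-inverse of the corresponding Jacobian into y."""
    dx_dy, dz_dy = pinv(dy_dx), pinv(dy_dz)
    assert np.allclose(dx_dy @ dy_dx, np.eye(n))  # x -> y -> x is identity
    assert np.allclose(dz_dy @ dy_dz, np.eye(n))  # z -> y -> z is identity
    # (dx/dy dy/dz)(dz/dy dy/dx) is identity exactly when equation (4) holds,
    # with dx/dz taken as the inverse of dz/dx = dz/dy dy/dx.
    return (dx_dy @ dy_dz) @ (dz_dy @ dy_dx)

dy_dx = rng.standard_normal((m, n))

# z -> y spans a different n-dimensional subspace of y than x -> y does.
mismatched = round_trip(dy_dx, rng.standard_normal((m, n)))
print(np.allclose(mismatched, np.eye(n)))  # False

# z -> y spans the same subspace (dy/dz = dy/dx K for an invertible K).
K = rng.standard_normal((n, n)) + n * np.eye(n)
matched = round_trip(dy_dx, dy_dx @ K)
print(np.allclose(matched, np.eye(n)))  # True
```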

A consequence of this, somewhat counter-intuitively, is that `dy//dz` needs to track the `x->y` mapping. Alternatively, we can create a distinct `y'` for the generative direction, so that equation (2) becomes:

` {: ("tr"(dx/dz [dz/dx]_omega),= "tr"(dx/(dy') (dy')/dz [dz/dy dy/dx]_omega)), (,="tr"(dx/(dy') (dy')/dz [dz/dy]_omega dy/dx) + "tr"(dx/(dy') (dy')/dz dz/dy [dy/dx]_omega)) :}` (5)

Again, take care to ensure that the round-trip mapping `x->x` is the identity.
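The sketch below checks equation (5) with a separate generative hidden layer `y'` (written `dyp` in the code). As before, the matrices are random NumPy stand-ins and the Jacobians are assumed to be affine in `omega`. `dy'/dz` is unrelated to the `x->y` mapping. `dx/dy'` is solved for so that the round trip `x->y->z->y'->x` is the identity, which is a hypothetical choice for illustration. The trace from (5) then still matches the finite-difference `omega`-derivative of `log|det(dz/dx)|`, as Jacobi's formula predicts.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 2, 4  # x and z are n-dimensional; y and y' are m-dimensional
tr, inv, pinv = np.trace, np.linalg.inv, np.linalg.pinv

# Inverse direction x -> y -> z, affine in a scalar parameter omega.
dy_dx0, dy_dx_w = rng.standard_normal((2, m, n))  # dy/dx and [dy/dx]_omega
dz_dy0, dz_dy_w = rng.standard_normal((2, n, m))  # dz/dy and [dz/dy]_omega

def dz_dx(omega):
    return (dz_dy0 + omega * dz_dy_w) @ (dy_dx0 + omega * dy_dx_w)

# Generative direction z -> y' -> x through its own hidden layer y' (dyp = y').
# dy'/dz is unrelated to x -> y; dx/dy' is solved for so x -> y -> z -> y' -> x is identity.
dyp_dz = rng.standard_normal((m, n))
dx_dyp = inv(dz_dx(0.0)) @ pinv(dyp_dz)
dx_dz = dx_dyp @ dyp_dz
print(np.allclose(dx_dz @ dz_dx(0.0), np.eye(n)))  # True

# Both sides of equation (5), evaluated at omega = 0.
lhs = tr(dx_dz @ (dz_dy_w @ dy_dx0 + dz_dy0 @ dy_dx_w))
rhs = tr(dx_dz @ dz_dy_w @ dy_dx0) + tr(dx_dz @ dz_dy0 @ dy_dx_w)
print(np.isclose(lhs, rhs))  # True

# Cross-check via Jacobi's formula: lhs = d/domega log|det(dz/dx)|.
eps = 1e-5
fd = (np.linalg.slogdet(dz_dx(eps))[1] - np.linalg.slogdet(dz_dx(-eps))[1]) / (2 * eps)
print(np.isclose(lhs, fd))  # True
```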




Simon Funk / simonfunk@gmail.com