MoCoDA combines a model of past experience with causal knowledge to produce useful OOD samples.
Factorizing the joint distribution over transitions as P(s,a,s') = P(s,a) P(s'|s,a), we create an augmented distribution Q(s,a) whose support is expanded to new (s,a) pairs where causal knowledge suggests that P(s'|s,a) generalizes.
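The factorization can be illustrated with a toy sketch. This is not the authors' implementation: the two-coordinate environment, the additive dynamics in `true_step`, the per-coordinate linear model, and all function names are hypothetical, chosen only so that the causal structure (each coordinate depends only on its own state and action component) is known exactly. Under those assumptions, Q(s,a) is built by resampling each component independently, and a dynamics model that respects the same structure stands in for P(s'|s,a).

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(s, a):
    # Hypothetical toy dynamics: each coordinate evolves independently.
    return s + a

# Empirical P(s,a): L-shaped support, never both coordinates above 0.5.
s = rng.uniform(0.0, 1.0, size=(500, 2))
s = s[~((s[:, 0] > 0.5) & (s[:, 1] > 0.5))]
a = rng.uniform(-0.1, 0.1, size=s.shape)
s_next = true_step(s, a)

def features(s, a, i):
    # Causal parents of coordinate i: its own state and action, plus a bias.
    return np.column_stack([s[:, i], a[:, i], np.ones(len(s))])

def fit_factored_model(s, a, s_next):
    # One least-squares model per coordinate, restricted to its parents.
    return [
        np.linalg.lstsq(features(s, a, i), s_next[:, i], rcond=None)[0]
        for i in range(s.shape[1])
    ]

def predict(weights, s, a):
    return np.column_stack(
        [features(s, a, i) @ w for i, w in enumerate(weights)]
    )

def sample_augmented(s, a, m, rng):
    # Q(s,a): draw each (s_i, a_i) component from its own empirical marginal,
    # which recombines components into pairs never seen together.
    idx0 = rng.integers(0, len(s), size=m)
    idx1 = rng.integers(0, len(s), size=m)
    s_aug = np.column_stack([s[idx0, 0], s[idx1, 1]])
    a_aug = np.column_stack([a[idx0, 0], a[idx1, 1]])
    return s_aug, a_aug

weights = fit_factored_model(s, a, s_next)
s_aug, a_aug = sample_augmented(s, a, 1000, rng)
s_next_aug = predict(weights, s_aug, a_aug)  # labels from the model of P(s'|s,a)

# Augmented samples that fall outside the empirical support.
ood = (s_aug[:, 0] > 0.5) & (s_aug[:, 1] > 0.5)
```

In this sketch the model's predictions stay accurate on the out-of-support samples only because the model uses the same independence structure as the true dynamics. If the coordinates interacted in the unseen region, the recombined samples would be mislabeled.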