TL;DR: BayesVLA decomposes the policy into a vision-action prior and a language-conditioned likelihood. The vision-action prior leverages visual information for action generation (seeing to act), ...
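One common way to read a "prior times likelihood" policy is Bayes' rule: π(a | v, ℓ) ∝ p(a | v) · p(ℓ | v, a), where v is the visual observation, ℓ the language instruction, and a the action. The snippet does not spell out BayesVLA's exact factorization, so this reading is an assumption. The toy sketch below shows only the combination step over a small discrete action set. The function name `posterior_over_actions` and all numbers are hypothetical, and it makes no claim about BayesVLA's actual models or action space.

```python
import math
from typing import Sequence


def posterior_over_actions(log_prior: Sequence[float],
                           log_likelihood: Sequence[float]) -> list[float]:
    """Hypothetical helper: combine log p(a | v) (vision-action prior) with
    log p(l | v, a) (language likelihood) into p(a | v, l) over discrete actions.

    Assumes both sequences are indexed by the same set of candidate actions.
    """
    if len(log_prior) != len(log_likelihood):
        raise ValueError("prior and likelihood must cover the same actions")
    # Bayes' rule in log space: log posterior = log prior + log likelihood + const.
    log_joint = [p + l for p, l in zip(log_prior, log_likelihood)]
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_joint)
    weights = [math.exp(x - m) for x in log_joint]
    z = sum(weights)  # normalizing constant (the "const" above)
    return [w / z for w in weights]


# Toy example with three candidate actions (made-up numbers).
prior = [0.5, 0.3, 0.2]       # what vision alone suggests ("seeing to act")
likelihood = [0.1, 0.8, 0.1]  # how well each action explains the instruction
post = posterior_over_actions([math.log(p) for p in prior],
                              [math.log(q) for q in likelihood])
print([round(p, 4) for p in post])
```

Here the language term shifts probability mass toward the second action even though vision alone slightly prefers the first. Working in log space avoids underflow when the prior and likelihood come from models that output log-probabilities.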