First, you must be sure that the dog's face can be treated as a rigid object; otherwise you'll have to use motion capture with multiple cameras.
Then, is the depth inaccurate because of a lack of parallax?
Or, to put it another way, do the tracks on the dog's face have a good residual, or are they considered bad by the solver?
Do you really need to track both the nodal background and the moving face at the same time?
If you can send your data to support@realviz.com for a deeper analysis, it would help.
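
To illustrate the parallax question above, here is a rough sketch of how depth uncertainty grows when the baseline (the amount of camera or object motion that produces parallax) gets small. It assumes a simplified two-view, stereo-like model where depth is Z = f * b / d; this is not how the solver works internally, and the function name, parameters and numbers are made up for the example.

```python
def depth_sigma(depth, focal_px, baseline, track_noise_px):
    """Approximate depth uncertainty for a two-view triangulation.

    Assumes the simplified model Z = f * b / d (d = disparity in pixels),
    so a tracking error of track_noise_px pixels maps to a depth error of
    about Z**2 / (f * b) * track_noise_px. All names here are hypothetical.
    """
    return depth ** 2 / (focal_px * baseline) * track_noise_px


# Same point, same tracking noise, two different amounts of parallax.
wide = depth_sigma(depth=2.0, focal_px=1000.0, baseline=0.10, track_noise_px=0.5)
narrow = depth_sigma(depth=2.0, focal_px=1000.0, baseline=0.01, track_noise_px=0.5)

print(f"baseline 0.10: depth error ~ {wide:.3f}")
print(f"baseline 0.01: depth error ~ {narrow:.3f}")
```

In this model, ten times less baseline gives roughly ten times more depth error for the same 2D tracking quality, which is why tracks can have a low residual and still give a poor depth.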