Hello,
The residual value, as you mentioned, is only a pixel value: it gives a bound on the error in pixels. It is obtained by computing the distance between the placed marker (a 2D point) and the computed 2D point (the re-projection into the image plane of the computed 3D locator).
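To make the definition concrete, here is a minimal sketch of that residual, assuming an ideal pinhole camera with no lens distortion and a locator already expressed in camera coordinates. The function names, the focal length and the principal point are hypothetical values chosen for illustration, not anything taken from the software itself.

```python
import math

def project(point_3d, focal_px, principal_point):
    """Project a 3D point (camera coordinates) with an ideal pinhole model.

    Assumption: no lens distortion, focal length given in pixels.
    """
    x, y, z = point_3d
    cx, cy = principal_point
    return (focal_px * x / z + cx, focal_px * y / z + cy)

def residual_px(marker_2d, point_3d, focal_px, principal_point):
    """Pixel distance between the placed marker and the re-projected locator."""
    u, v = project(point_3d, focal_px, principal_point)
    return math.hypot(marker_2d[0] - u, marker_2d[1] - v)

# Hypothetical example values.
locator = (0.5, -0.25, 4.0)   # computed 3D locator, camera coordinates
marker = (1088.0, 473.5)      # marker placed by the user, in pixels
print(residual_px(marker, locator, 1000.0, (960.0, 540.0)))
```

The result is a length in the image plane only; nothing in it says how far the locator is from the camera.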
Converting this into real-world distances is a difficult problem that cannot be solved globally; i.e. you cannot give a single error estimate that holds for every object in your scene.
Your calculation is correct, but it has to be done for every measurement. The correspondence changes everywhere in the image, because of the depth and perspective changes of the elements being measured.
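As a rough illustration of why the same pixel residual means different things at different depths, here is a small sketch. It assumes an ideal pinhole camera and a surface roughly parallel to the image plane, so a pixel error e at depth Z corresponds to about e * Z / f in world units, with f the focal length in pixels. The function name and the numbers are hypothetical, and this first-order estimate is not necessarily the calculation you did.

```python
def pixel_error_to_world(residual_px, depth, focal_px):
    """Approximate world-space size of a pixel error at a given depth.

    Assumptions: ideal pinhole camera, focal length in pixels, and a
    surface roughly parallel to the image plane. The result is in the
    same unit as `depth`.
    """
    return residual_px * depth / focal_px

# The same 1-pixel residual, evaluated at three hypothetical depths.
for depth in (2.0, 10.0, 50.0):
    print(depth, pixel_error_to_world(1.0, depth, 1000.0))
```

The estimate grows linearly with depth, which is why one residual cannot be turned into one error figure for the whole scene.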
Best,
Stef