You can download this notebook here.
For a student project, I have been playing around with stereo camera calibration. We have a set of cameras. The challenge is to obtain 3D kinematics by combining data from multiple views (i.e. simultaneously captured images from different perspectives).
Below, I document my attempts to relate the image space of the cameras (i.e. what is seen on each frame) to object space (i.e. the true 3D position of something).
Here is the usual extensive set of libraries I use below.
import numpy as NP # numeric operations
import pandas as PD # data organization
import cv2 as CV # open computer vision, i.e. image manipulation library
import sympy as SYM # symbolic python
import sympy.physics.mechanics as MECH # symbolic mechanics
import matplotlib as MP # plotting
import matplotlib.pyplot as MPP # plotting
from mpl_toolkits import mplot3d # 3D plots
# equal aspect for matplotlib 3D plots
# from https://stackoverflow.com/a/19248731
def AxisEqual3D(ax):
extents = NP.array([getattr(ax, f'get_{dim}lim')() for dim in 'xyz'])
sz = extents[:,1]  extents[:,0]
centers = NP.mean(extents, axis=1)
maxsize = max(abs(sz))
r = maxsize/2
for ctr, dim in zip(centers, 'xyz'):
getattr(ax, f'set_{dim}lim')(ctr  r, ctr + r)
Ingredients
We have a set of consumer level GoPro “Hero 5” in the lab, three of which were attached to a rack of metal tubes. All were set to narrow field of view (to minimize fisheye lens distortion). Videos of resolution \(1280\times 720\) @ 240 fps were triggered with a remote control.
The same video settings were used to film a regular grid for undistortion (procedure as described here), which was applied to all the material used below.
The remote control sync was not accurate, which I found out only after calibration. I will use an extra audio clue in the future.
The setup is lightweight, mobile, and thus suited for field work.
Calibration Data
As a test calibration object, I grabbed a monitor box that was lying around in the office.
To get true 3D points, I prepared a 3D model of the box by photogrammetry.
Then, I used the “PickPoints” function in meshlab
to get relative positions of the corner points and two extra landmarks on the box.
Taking simultaneous photos from the fixed cameras, I could also retrieve image points. Prior to calibration, those images were contrast enhanced and undistorted. All relevant images have to be prepared with the same procedure.
data = PD.read_csv('cam_calibration_data.csv', sep = ';').set_index('pt', inplace = False)
data
x  y  z  u_1  v_1  u_2  v_2  u_4  v_4  

pt  
0  0.235521  0.308646  0.101321  321.426491  111.782226  786.011187  252.220410  664.551136  65.475581 
1  0.000698  0.311595  0.094717  180.229182  143.665489  710.098655  287.899300  537.777207  91.285842 
2  0.004713  0.328370  0.089341  365.455760  315.986937  1073.719682  405.563724  814.857949  221.855397 
3  0.236395  0.329500  0.092263  522.594701  272.716794  1129.894956  363.811832  946.186629  186.935632 
4  0.005713  0.315304  0.349343  203.762067  394.935970  685.047519  527.023775  535.499831  356.979704 
5  0.001885  0.331354  0.345600  379.879141  596.104179  1025.894787  662.148082  792.084189  524.746399 
6  0.229327  0.330225  0.348522  524.872077  546.001908  1082.829186  606.731934  915.062491  479.198880 
7  0.000000  0.000000  0.000000  266.010343  279.548921  868.755846  397.213346  656.200757  211.227643 
8  0.132960  0.025250  0.097379  338.886374  191.490384  899.120859  313.709561  720.726409  125.446482 
Note that object points are in meters, and should have a meaningful origin. Here are the images for download: cam1, cam2, cam4.
cams = [1,2,4]
images = {cam: CV.imread(f'images/cam{cam}.png', 1) for cam in cams}
# drawing its lines on left image
fig = MPP.figure(dpi = 300)
for nr, (cam, img) in enumerate(images.items()):
ax = fig.add_subplot(1, len(images), nr+1, aspect = 'equal')
ax.imshow(img[:, :, ::1], origin = 'upper')
pts = data.loc[:, [f'u_{cam}', f'v_{cam}']].values
ax.scatter(pts[:, 0], pts[:, 1], s = 20 \
, marker = '+', facecolor = 'r', edgecolor = '0' \
, linewidth = 1, alpha = 0.6 \
)
ax.get_xaxis().set_visible(False)
ax.set_yticks([])
ax.set_xlabel(nr)
MPP.tight_layout()
MPP.show()
Of course, the points must always be labeled in the same order.
It would be better to use point labels as pandas
data frame indices to make sure only matching points are associated.
In addition to the points on the images, the true 3D relation of the point is known:
data.loc[:, ['x', 'y', 'z']].T
pt  0  1  2  3  4  5  6  7  8 

x  0.235521  0.000698  0.004713  0.236395  0.005713  0.001885  0.229327  0.0  0.132960 
y  0.308646  0.311595  0.328370  0.329500  0.315304  0.331354  0.330225  0.0  0.025250 
z  0.101321  0.094717  0.089341  0.092263  0.349343  0.345600  0.348522  0.0  0.097379 
This can be used for calibration.
The Successful Attempt: DLT Calibration à la Argus/Kwon
My initial temptation was to go to openCV. For several reasons, these attempts failed (see below).
On the search for alternatives, I stumbled upon the argus toolbox from Ty Hedrick’s lab (Jackson et al., 2016). The toolbox had previously been recommended to me by a colleague, François Druelle.
I went through to the procedures referenced in the argus “theory” section. The authors implement algorithms collected in the Motion Analysis library of YoungHoo Kwon. The Kwon3D website (Kwon, 2000) is a phenomenal resource for my research. I have already referenced the website in my force plate blog posts. I have now realized that my recent quest on solving inverse dynamics could have been facitilated by theory assembled on Kwon’s archives. And now, I also use the section on camera calibration via DLT for my purpose.
Theory
I start here with the naked maths. If that is of less interest to you, feel free to skip to the “numeric application” section below.
The website by Kwon excellently explains the DLT theory.
I will repeat it here in a sympy
framework, using the same symbols.
# object space coordinates
x, y, z = SYM.symbols(f'x:z', real = True)
x0, y0, z0 = SYM.symbols(f'x_0, y_0, z_0', real = True)
# image space coordinates
u, v = SYM.symbols(f'u, v', real = True)
u0, v0, d = SYM.symbols(f'u_0, v_0, d', real = True)
# reference frame DCM
r = SYM.symbols('r_{1:4}{1:4}', real = True)
T_io = SYM.Matrix([r[:3],r[3:6],r[6:]]).T
# colinearity
c = SYM.symbols(f'c', real = True)
We require two reference frames. One is the object space, the other is the image plane (extended to 3D towards the focal point).
# object space
object_space = MECH.ReferenceFrame('O')
# image plane
image_plane = MECH.ReferenceFrame('I')
image_plane.orient(object_space, 'DCM', T_io)
# static
image_plane.set_ang_vel(object_space, 0)
image_plane.set_ang_acc(object_space, 0)
# transform between spaces
object_to_image = SYM.simplify(image_plane.dcm(object_space)) # frame_to.dcm(frame_from)
image_to_object = SYM.simplify(object_space.dcm(image_plane))
SYM.pprint(object_to_image)
⎡r_{1}{1} r_{1}{2} r_{1}{3}⎤
⎢ ⎥
⎢r_{2}{1} r_{2}{2} r_{2}{3}⎥
⎢ ⎥
⎣r_{3}{1} r_{3}{2} r_{3}{3}⎦
There are several points that can be located in either of the two reference frames. Most notably, the “projection center” (i.e. focal point) of the image can be located in both frames and connects the two.
### define the origins
# ... of the object space
origin = MECH.Point('Ω')
# of the image plane
origin_image = MECH.Point('ρ')
### points in the world
# object point in the world
obj_point = MECH.Point('ο')
obj_point.set_pos(origin, x * object_space.x + y * object_space.y + z * object_space.z)
### points in both worlds
# projection center: where all object point/projections meet
proj_center = MECH.Point('ν')
proj_center.set_pos(origin_image, u0 * image_plane.x + v0 * image_plane.y + d * image_plane.z)
proj_center.set_pos(origin, x0 * object_space.x + y0 * object_space.y + z0 * object_space.z)
### points in the image
# principal point (center of image plane; perpendicular axis through image plane and projection center)
principal_point = MECH.Point('π')
principal_point.set_pos(origin_image, u0 * image_plane.x + v0 * image_plane.y + 0 * image_plane.z)
# projection of the object point to the image plane
img_point = MECH.Point('ι')
img_point.set_pos(origin_image, u * image_plane.x + v * image_plane.y + 0 * image_plane.z)
### static: no movement of no point
for point in [origin, origin_image, obj_point, proj_center, principal_point, img_point]:
point.set_vel(object_space, 0)
point.set_acc(object_space, 0)
point.set_vel(image_plane, 0)
point.set_acc(image_plane, 0)
Take any vector which goes from the projection center to a true 3D point. Any such vector intersects the image plane at an image point. The vectors to the true point and to the plane are colinear, i.e. related by scalar multiplication.
This is the collinearity condition. It gives linear equations that relate the two spaces.
### vectors
A_o = obj_point.pos_from(proj_center).express(object_space)
B_i = img_point.pos_from(proj_center).express(image_plane)
A_i = A_o.express(image_plane)
equations = { coord: c*A_i.dot(coord)  B_i.dot(coord) \
for coord in [image_plane.x, image_plane.y, image_plane.z] \
}
equations
{I.x: c*(r_{1}{1}*(x  x_0) + r_{1}{2}*(y  y_0) + r_{1}{3}*(z  z_0))  u + u_0,
I.y: c*(r_{2}{1}*(x  x_0) + r_{2}{2}*(y  y_0) + r_{2}{3}*(z  z_0))  v + v_0,
I.z: c*(r_{3}{1}*(x  x_0) + r_{3}{2}*(y  y_0) + r_{3}{3}*(z  z_0)) + d}
Some values need to be substituted in, (i) to reduce dimensionality, (ii) to correct for an arbitrary scaling factor.
c_substitute = {c: SYM.solve(equations[image_plane.z], c)[0]}
λ_u, λ_v = SYM.symbols(f'λ_u, λ_v', real = True)
uv_substitutes = { \
(uu0): λ_u * (uu0) \
, (vv0): λ_v * (vv0) \
}
equations = { coord: equations[coord].subs(c_substitute).subs(uv_substitutes) \
for coord in [image_plane.x, image_plane.y] \
}
equations
{I.x: d*(r_{1}{1}*(x  x_0) + r_{1}{2}*(y  y_0) + r_{1}{3}*(z  z_0))/(r_{3}{1}*x  r_{3}{1}*x_0 + r_{3}{2}*y  r_{3}{2}*y_0 + r_{3}{3}*z  r_{3}{3}*z_0)  λ_u*(u  u_0),
I.y: d*(r_{2}{1}*(x  x_0) + r_{2}{2}*(y  y_0) + r_{2}{3}*(z  z_0))/(r_{3}{1}*x  r_{3}{1}*x_0 + r_{3}{2}*y  r_{3}{2}*y_0 + r_{3}{3}*z  r_{3}{3}*z_0)  λ_v*(v  v_0)}
Finally, the resulting equations can be solved to get formulas for u
and v
(image coordinates) as a function of x
, y
and z
(object coordinates).
u_eqn = SYM.Eq( u, SYM.simplify(SYM.solve(equations[image_plane.x], u)[0]).factor(x, y, z) )
v_eqn = SYM.Eq( v, SYM.simplify(SYM.solve(equations[image_plane.y], v)[0]).factor(x, y, z) )
solutions = [u_eqn, v_eqn]
for sol in solutions:
SYM.pprint(sol)
(d⋅r_{1}{1}⋅x₀  d⋅r_{1}{2}⋅y₀  d⋅r_{1}{3}⋅z₀ + r_{3}{1}⋅u₀⋅x₀⋅λᵤ + r_{
u = ──────────────────────────────────────────────────────────────────────────
λᵤ⋅(r_{3}{
3}{2}⋅u₀⋅y₀⋅λᵤ + r_{3}{3}⋅u₀⋅z₀⋅λᵤ + x⋅(d⋅r_{1}{1}  r_{3}{1}⋅u₀⋅λᵤ) + y⋅(d⋅r_
──────────────────────────────────────────────────────────────────────────────
1}⋅x  r_{3}{1}⋅x₀ + r_{3}{2}⋅y  r_{3}{2}⋅y₀ + r_{3}{3}⋅z  r_{3}{3}⋅z₀)
{1}{2}  r_{3}{2}⋅u₀⋅λᵤ) + z⋅(d⋅r_{1}{3}  r_{3}{3}⋅u₀⋅λᵤ))
────────────────────────────────────────────────────────────
(d⋅r_{2}{1}⋅x₀  d⋅r_{2}{2}⋅y₀  d⋅r_{2}{3}⋅z₀ + r_{3}{1}⋅v₀⋅x₀⋅λᵥ + r_{
v = ──────────────────────────────────────────────────────────────────────────
λᵥ⋅(r_{3}{
3}{2}⋅v₀⋅y₀⋅λᵥ + r_{3}{3}⋅v₀⋅z₀⋅λᵥ + x⋅(d⋅r_{2}{1}  r_{3}{1}⋅v₀⋅λᵥ) + y⋅(d⋅r_
──────────────────────────────────────────────────────────────────────────────
1}⋅x  r_{3}{1}⋅x₀ + r_{3}{2}⋅y  r_{3}{2}⋅y₀ + r_{3}{3}⋅z  r_{3}{3}⋅z₀)
{2}{2}  r_{3}{2}⋅v₀⋅λᵥ) + z⋅(d⋅r_{2}{3}  r_{3}{3}⋅v₀⋅λᵥ))
────────────────────────────────────────────────────────────
You can convince yourself that this result is identical to the common DLT expression.
Sympy
can perform what is called common subexpression elimination:
cse_replacements, dlt_terms = SYM.cse(solutions, symbols = SYM.utilities.iterables.numbered_symbols('L'))
SYM.pprint(dlt_terms)
⎡ L₃⋅(L₀⋅L₇ + L₁⋅L₇ + L₂⋅L₇  L₄⋅x₀  L₅⋅y₀  L₆⋅z₀ + x⋅(L₄  L₇⋅r_{3}{1})
⎢u = ─────────────────────────────────────────────────────────────────────────
⎣ λᵤ
+ y⋅(L₅  L₇⋅r_{3}{2}) + z⋅(L₆  L₇⋅r_{3}{3})) L₃⋅(L₀⋅L₁₁ + L₁⋅L₁₁  L
────────────────────────────────────────────────, v = ────────────────────────
₁₀⋅z₀ + L₁₁⋅L₂  L₈⋅x₀  L₉⋅y₀ + x⋅(L₁₁⋅r_{3}{1} + L₈) + y⋅(L₁₁⋅r_{3}{2} + L
──────────────────────────────────────────────────────────────────────────────
λᵥ
₉) + z⋅(L₁₀  L₁₁⋅r_{3}{3})) ⎤
─────────────────────────────⎥
⎦
However, this yields different parameters than the “guided” DLT parameters presented by Kwon.
The significant achievement of such DLT parameters is that one can build a system of linear equations to solve for the \(L_i\)’s.
Numeric Application
Multiplying by the RHS denominator, moving mixed terms to the right side and thereby isolating u, v
results in such a system of linear equations.
A system of linear equations can be solved by linear algebra.
And if thinking of linear algebra raises your neck hair, rest assured that python
will do the work for you.
Argus (Jackson et al., 2016) uses numpy.linalg.lstsq
for this.
The function requires two matrices \(A\) and \(B\), for which we get \(L\) (the DLT parameters) from \(A L = B\).
To solve the systems, \(N\) points have to be known in both image and object space.
\(A\) is the coefficient matrix, shape \((2N \times 11)\). The number \(11\) comes from \(11\) \(L\) coefficients in the problem.
\( A = \begin{array}{c  ccccccccccc} i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline 2k & x_k & y_k & z_k & 1 & 0 & 0 & 0 & 0 & u_k x_k & u_k y_k & u_k z_k \\ 2k+1 & 0 & 0 & 0 & 0 & x_k & y_k & z_k & 1 & v_k x_k & v_k y_k & v_k z_k \end{array} \quad\quad \forall k \in \{0, N1\} \)
And \(B\) is the dependent matrix:
\( B = \begin{array}{c  c} & \\ \hline 2k & u_k \\ 2k+1 & v_k \end{array} \quad\quad \forall k \in {0, N1} \)
Then, computationally, \(\mid\mid B  A L \mid\mid\) is minimized by the least squares method.
Sounds fun? Here is the function, adapted from the argus code.
def SolveDLT(data, cam):
# solve DLT for a camera
# from N known points
# adapted from https://github.com/kilmoretrout/argus_gui/blob/master/argus_gui/tools.py `solve_dlt`
# input: a data frame with [x, y, z, u_i, v_i] columns where i is the camera index
# coefficient matrix
A = NP.zeros((data.shape[0] * 2, 11))
# dependent variable
B = NP.zeros((data.shape[0] * 2, 1))
# fill the matrices
for k, row in data.iterrows():
A[2 * k, :3] = row[['x', 'y', 'z']].values
A[2 * k, 3] = 1
A[2 * k, 8:] = row[['x', 'y', 'z']].values * row[f'u_{cam}']
A[2 * k + 1, 4:7] = row[['x', 'y', 'z']].values
A[2 * k + 1, 7] = 1
A[2 * k + 1, 8:] = row[['x', 'y', 'z']].values * row[f'v_{cam}']
B[2 * k] = row[f'u_{cam}']
B[2 * k + 1] = row[f'v_{cam}']
# solve system of linear equations
L = NP.linalg.lstsq(A, B, rcond=None)[0]
return L.ravel()
Taking our data from above, the function can yield DLT parameters for the three cameras:
L = {cam: SolveDLT(data, cam = cam) for cam in cams}
L
{1: array([7.03913782e+02, 2.04857500e+02, 1.03511822e+02, 2.65781236e+02,
1.10530429e+02, 1.96650553e+02, 6.71888993e+02, 2.79123077e+02,
1.88877266e01, 2.75455293e01, 1.71605937e01]),
2: array([5.55001628e+02, 3.98060092e+02, 5.05956178e+01, 8.68303335e+02,
6.55700125e+01, 1.17488996e+02, 6.36343639e+02, 3.96662992e+02,
2.90184236e01, 1.79093742e01, 1.44750418e01]),
4: array([7.29359405e+02, 2.20829579e+02, 9.97698632e+01, 6.56184624e+02,
8.31303936e+01, 1.49535713e+02, 7.05594977e+02, 2.11502025e+02,
2.37017457e01, 2.95065936e01, 1.74821150e01])}
To confirm the accuracy, we can project the 3D points to the image plane and check whether points match:
for cam in cams:
for coord in ['u', 'v']:
data[f'{coord}_{cam}_reco'] = 0
for k, row in data.iterrows():
data.loc[k, f'u_{cam}_reco'] = (NP.dot(L[cam][:3].T, row[['x', 'y', 'z']].values) + L[cam][3]) \
/ (NP.dot(L[cam][3:].T, row[['x', 'y', 'z']].values) + 1.)
data.loc[k, f'v_{cam}_reco'] = (NP.dot(L[cam][4:7].T, row[['x', 'y', 'z']].values) + L[cam][7]) \
/ (NP.dot(L[cam][3:].T, row[['x', 'y', 'z']].values) + 1.)
data.loc[:, ['x', 'y', 'z', 'u_1', 'v_1', 'u_1_reco', 'v_1_reco']]
x  y  z  u_1  v_1  u_1_reco  v_1_reco  

pt  
0  0.235521  0.308646  0.101321  321.426491  111.782226  321.775150  111.785891 
1  0.000698  0.311595  0.094717  180.229182  143.665489  180.082729  144.087333 
2  0.004713  0.328370  0.089341  365.455760  315.986937  365.453553  316.329381 
3  0.236395  0.329500  0.092263  522.594701  272.716794  522.499640  272.692276 
4  0.005713  0.315304  0.349343  203.762067  394.935970  203.651747  394.920945 
5  0.001885  0.331354  0.345600  379.879141  596.104179  380.403750  595.960378 
6  0.229327  0.330225  0.348522  524.872077  546.001908  524.553708  546.239586 
7  0.000000  0.000000  0.000000  266.010343  279.548921  265.781236  279.123077 
8  0.132960  0.025250  0.097379  338.886374  191.490384  338.916072  191.099186 
fig = MPP.figure(dpi = 300)
for nr, cam in enumerate(cams):
ax = fig.add_subplot(3, 1, nr+1, aspect = 'equal')
ax.imshow(NP.mean(images[cam][:, :], axis = 2), cmap = 'gray', origin = 'upper')
ax.scatter( data[f'u_{cam}'].values \
, data[f'v_{cam}'].values \
, s = 40 \
, marker = 'x' \
, color = (0.2, 0.6, 0.2) \
, alpha = 0.8 \
, label = 'measured' \
)
ax.scatter( data[f'u_{cam}_reco'].values \
, data[f'v_{cam}_reco'].values \
, s = 40 \
, marker = '+' \
, color = (0.6, 0.2, 0.2) \
, alpha = 0.8 \
, label = 'reconstructed' \
)
ax.legend(loc = 'best', fontsize = 4)
ax.set_xticks([])
ax.set_yticks([])
ax.set_ylabel(cam)
fig.tight_layout()
MPP.show();
This check is trivial, but it looks like a match.
We’ll store the DLT parameters as a matrix.
# dlt: (number of cameras)x11 array of DLT coefficients
dlt = NP.stack([L[cam] for cam in cams], axis = 0)
dlt
array([[7.03913782e+02, 2.04857500e+02, 1.03511822e+02,
2.65781236e+02, 1.10530429e+02, 1.96650553e+02,
6.71888993e+02, 2.79123077e+02, 1.88877266e01,
2.75455293e01, 1.71605937e01],
[5.55001628e+02, 3.98060092e+02, 5.05956178e+01,
8.68303335e+02, 6.55700125e+01, 1.17488996e+02,
6.36343639e+02, 3.96662992e+02, 2.90184236e01,
1.79093742e01, 1.44750418e01],
[7.29359405e+02, 2.20829579e+02, 9.97698632e+01,
6.56184624e+02, 8.31303936e+01, 1.49535713e+02,
7.05594977e+02, 2.11502025e+02, 2.37017457e01,
2.95065936e01, 1.74821150e01]])
3D Point Reconstruction
When filming a calibration object, we know the true positions of points. Afterwards, when filming a scene, the goal is to take the observed image points and get their relative position in 3D, i.e. in “world coordinates”.
The argus code (Jackson et al., 2016) also contains a function for that.
def UVtoXYZ(pts, dlt):
# retrieve Object Points (3D) from multiple perspective Image Points
# adapted from https://github.com/kilmoretrout/argus_gui/blob/master/argus_gui/tools.py `uv_to_xyz`
# pts: (N x 2K) array of N 2D points for K cameras
# can be a single point over time, or multiple points in one scene
# dlt: (11 x K) array of DLT parameters
# Adjusted because data points herein are undistorted prior to calculation
# initialiye empty data array
xyzs = NP.empty((len(pts), 3))
xyzs[:] = NP.nan
# for each point
for i in range(len(pts)):
uvs = list()
# for each uv pair
for j in range(len(pts[i]) // 2):
# do we have a NaN pair?
if not True in NP.isnan(pts[i, 2 * j:2 * (j + 1)]):
# if not append the undistorted point and its camera number to the list
uvs.append([pts[i, 2 * j:2 * (j + 1)], j])
if len(uvs) > 1:
# if we have at least 2 uv coordinates, setup the linear system
A = NP.zeros((2 * len(uvs), 3))
# assemble coefficient matrix of the linear system
for k in range(len(uvs)):
A[k] = NP.asarray([uvs[k][0][0] * dlt[uvs[k][1]][8]  dlt[uvs[k][1]][0],
uvs[k][0][0] * dlt[uvs[k][1]][9]  dlt[uvs[k][1]][1],
uvs[k][0][0] * dlt[uvs[k][1]][10]  dlt[uvs[k][1]][2]])
A[k + 1] = NP.asarray([uvs[k][0][1] * dlt[uvs[k][1]][8]  dlt[uvs[k][1]][4],
uvs[k][0][1] * dlt[uvs[k][1]][9]  dlt[uvs[k][1]][5],
uvs[k][0][1] * dlt[uvs[k][1]][10]  dlt[uvs[k][1]][6]])
# the dependent variables
B = NP.zeros((2 * len(uvs), 1))
for k in range(len(uvs)):
B[k] = dlt[uvs[k][1]][3]  uvs[k][0][0]
B[k + 1] = dlt[uvs[k][1]][7]  uvs[k][0][1]
# solve it
xyz = NP.linalg.lstsq(A, B, rcond=None)[0]
# place in the proper frame
xyzs[i] = xyz[:, 0]
return xyzs
Here is how data can be assembled for this modified version:
reco = UVtoXYZ(data.loc[:, [f'{coord}_{cam}' for cam in cams for coord in ['u', 'v'] ]].values, dlt)
print (reco  data.loc[:, ['x', 'y', 'z']])
x y z
pt
0 0.000953 0.001926 0.001119
1 0.000111 0.000483 0.000105
2 0.000696 0.002089 0.000875
3 0.000479 0.000710 0.000603
4 0.000237 0.000237 0.000447
5 0.000047 0.001261 0.000382
6 0.000397 0.001256 0.000676
7 0.000072 0.000616 0.000609
8 0.001467 0.004175 0.001740
And because those are just the calibration points, we can check whether the positions match.
pts = data.loc[:, ['x', 'y', 'z']].values
fig = MPP.figure(dpi = 300)
ax = fig.add_subplot(1,1,1,projection='3d', aspect = 'auto')
edge_kwargs = dict( \
lw = 0.5 \
, alpha = 0.8 \
, zorder = 0 \
, label = None \
)
for edge in [[0,1], [1,2], [2,3], [3,0], [4,5], [5,6], [4,1], [5,2], [6,3]]:
ax.plot( reco[edge, 0] \
, reco[edge, 1] \
, reco[edge, 2] \
, color = 'k' \
, ls = '' \
, **edge_kwargs
)
ax.plot( pts[edge, 0] \
, pts[edge, 1] \
, pts[edge, 2] \
, color = '0.9' \
, ls = ':' \
, **edge_kwargs
)
ax.scatter( pts[:, 0] \
, pts[:, 1] \
, pts[:, 2] \
, s = 16 \
, marker = 'o' \
, color = (0.2, 0.6, 0.2) \
, alpha = 0.8 \
, zorder = 10 \
, label = 'true points' \
)
ax.scatter( reco[:, 0] \
, reco[:, 1] \
, reco[:, 2] \
, s = 32 \
, marker = '+' \
, linewidth = 0.5 \
, color = (0.6, 0.2, 0.6) \
, alpha = 0.8 \
, zorder = 20 \
, label = 'reconstructed points' \
)
ax.legend(loc = 1, fontsize = 4)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.view_init(30, 20)
AxisEqual3D(ax)
MPP.show();
… and they do, it is a box!
It remains too be seen whether a space far from the calibration object can be well reconstructed.
Applied Example: Dog Video
When testing our setup, two kind shepards with their dogs passed by. (Our university is keeping sheep on the green areas of our campus.)
They allowed us to follow to the meadow and get a few shots of the animals.
Here is one, in real time, from multiple perspectives:
I tracked one stride cycle of the accelerating dog.
Because I have three cameras, I can reconstruct 3D data with two of them, and project that into the third camera (using a modified dlt_inverse
from argus).
Purple are the tracked points, red are the reprojections (slowed down 10x):
I will have to find out where the offset comes from; one reason might be that the calibration videos were not in sync.
Apart from that, the data looks reasonable. But…
…let’s look at the 3D data from all cameras:
The lateral perspective seems okay. However, the depth is not well recovered. This is only in part due to the lacking contrast on some landmarks (e.g. the knee), as can be seen from the welldefined snout point. The other problem is the way I arranged my cameras (flat, next to each other).
I’ll have to improve on that. One way is more accurate calibration. The other is optimization of the camera distance and angles. Noteworthy, the problem is general: if the view directions of cameras pointing at the same scene are at a low relative angle, i.e. if the focal planes are relatively parallel, then the depth is less accurate than the other spatial dimensions. This inaccuracy is not captured by calculating the reprojection error.
Still, getting accurate 3D data this way would be appealing: units are in meters and 3D, so accurate angles and speeds can be determined. There is certainly room for improvement.
Probabilistic DLT
Does the depth inaccuracy come from inaccurate calibration, or from inaccurate tracking?
To tackle this question, one can use probabilistic modeling of the DLT parameters.
Luckily, the formula above can easily be brought to a probabilistic context. Let’s bring in some additional libraries.
import pymc3 as PM
import theano as TH
import theano.tensor as TT
Prepare the matrices as before:
def PrepareDLTData(cam_data):
# prepare linear equations in matrix form from data of one camera
# adapted from https://github.com/kilmoretrout/argus_gui/blob/master/argus_gui/tools.py `solve_dlt`
# input: a data frame with [x, y, z, u, v] columns
# coefficient matrix
A = NP.zeros((cam_data.shape[0] * 2, 11))
# dependent variable
B = NP.zeros((cam_data.shape[0] * 2, 1))
# fill the matrices
for k, row in cam_data.iterrows():
A[2 * k, :3] = row[['x', 'y', 'z']].values
A[2 * k, 3] = 1
A[2 * k, 8:] = row[['x', 'y', 'z']].values * row['u']
A[2 * k + 1, 4:7] = row[['x', 'y', 'z']].values
A[2 * k + 1, 7] = 1
A[2 * k + 1, 8:] = row[['x', 'y', 'z']].values * row['v']
B[2 * k] = row['u']
B[2 * k + 1] = row['v']
return A, B
Here comes the actual data:
cams = [1, 2, 4]
n_cams = len(cams)
A = {}
B = {}
Λ = {}
for cam in cams:
A[cam], B[cam] = PrepareDLTData(data.copy() \
.loc[:, ['x', 'y', 'z', f'u_{cam}', f'v_{cam}']] \
.rename(columns = {f'u_{cam}': 'u', f'v_{cam}': 'v'} \
, inplace = False) \
)
Λ[cam] = NP.linalg.lstsq(A[cam], B[cam], rcond=None)[0] # to get accurate priors
print (A[1].shape, Λ[1].shape)
print (NP.sum(NP.abs((A[1] @ Λ[1]).ravel()  B[1].T.ravel())))
(A[1] @ Λ[1]).T
(18, 11) (11, 1)
3.855946132496541
array([[321.8142397 , 111.78630141, 180.07251953, 144.11673908,
365.45378411, 316.2934613 , 522.50552844, 272.69379468,
203.63567058, 394.91875521, 380.38679315, 595.96502565,
524.54983622, 546.24247655, 265.78123573, 279.12307693,
338.91652818, 191.09317816]])
The model is equivaltent to what was fed to lstsq
above.
However, we have the flexibility to use robust regression by making the posterior distribution StudentT
.
Hence, the result will be more robust to outliers, without the need for a RANSAC (Random sample consensus).
def ProbabilisticDLTModel(A, B, L):
with PM.Model() as model:
# the DLT parameters L
λ = PM.Normal('λ', mu = L.reshape([1,1]), sd = 2*NP.std(L), shape = (11,1))
# matrix multiplication
# A must be a theano tensor
estimator = TH.shared(A) @ λ
# model residual
residual = PM.HalfCauchy('ε', 10.)
# Students T degrees of freedom
dof = PM.HalfNormal('η', 10.)
# posterior distribution/"likelihood"
posterior = PM.StudentT( 'post' \
, nu = dof \
, mu = estimator \
, sd = residual \
, observed = B \
)
# sampling:
trace = PM.sample(draws = 2**10, chains = 2**3, cores = 2**2, target_accept = 0.906)
return model, trace
Once the model is prepared, the distributions can be fit to the data.
models = {}
traces = {}
for cam in cams:
print ('#'*4, f' Creating DLT model for cam {cam} ', '#'*4)
models[cam], traces[cam] = ProbabilisticDLTModel(A[cam], B[cam], Λ[cam])
print()
#### Creating DLT model for cam 1 ####
Autoassigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (8 chains in 4 jobs)
NUTS: [η, ε, λ]
Sampling 8 chains, 0 divergences: 100%██████████ 12192/12192 [00:26<00:00, 468.71draws/s]
The number of effective samples is smaller than 10% for some parameters.
#### Creating DLT model for cam 2 ####
Autoassigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (8 chains in 4 jobs)
NUTS: [η, ε, λ]
Sampling 8 chains, 0 divergences: 100%██████████ 12192/12192 [00:37<00:00, 327.34draws/s]
The acceptance probability does not match the target. It is 0.8449641766015696, but should be close to 0.906. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.7517996349827041, but should be close to 0.906. Try to increase the number of tuning steps.
The number of effective samples is smaller than 25% for some parameters.
#### Creating DLT model for cam 4 ####
Autoassigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (8 chains in 4 jobs)
NUTS: [η, ε, λ]
Sampling 8 chains, 0 divergences: 100%██████████ 12192/12192 [00:37<00:00, 326.80draws/s]
The number of effective samples is smaller than 25% for some parameters.
And here are the results, example model 1:
PM.traceplot(traces[1], combined = True, var_names = ['ε', 'η']);
Hmmm… that Student does not look very normal.
We can get all the \(L_i\)s for each camera (\(\lambda\) here):
summary = PM.summary(traces[1])
summary
mean  sd  hpd_3%  hpd_97%  mcse_mean  mcse_sd  ess_mean  ess_sd  ess_bulk  ess_tail  r_hat  

λ[0,0]  703.478  5.148  713.327  693.761  0.109  0.077  2241.0  2241.0  2191.0  1496.0  1.00 
λ[1,0]  204.950  1.743  208.108  201.676  0.030  0.021  3456.0  3449.0  3603.0  3559.0  1.00 
λ[2,0]  103.390  2.881  108.717  97.805  0.051  0.036  3138.0  3138.0  3135.0  3348.0  1.00 
λ[3,0]  265.808  0.282  265.249  266.306  0.004  0.003  4498.0  4498.0  4432.0  4508.0  1.00 
λ[4,0]  110.696  3.743  103.891  118.065  0.073  0.052  2599.0  2599.0  2548.0  2141.0  1.00 
λ[5,0]  196.683  1.656  199.680  193.472  0.028  0.020  3612.0  3602.0  3793.0  3767.0  1.00 
λ[6,0]  671.693  3.635  678.613  665.013  0.067  0.047  2930.0  2930.0  2887.0  3407.0  1.00 
λ[7,0]  279.143  0.299  278.571  279.699  0.004  0.003  5299.0  5298.0  5166.0  4335.0  1.00 
λ[8,0]  0.188  0.011  0.206  0.165  0.000  0.000  2267.0  2267.0  2222.0  1241.0  1.00 
λ[9,0]  0.275  0.005  0.265  0.285  0.000  0.000  2812.0  2812.0  2913.0  3178.0  1.00 
λ[10,0]  0.171  0.008  0.186  0.157  0.000  0.000  2994.0  2994.0  3037.0  3299.0  1.00 
ε  0.473  0.183  0.200  0.852  0.006  0.004  1080.0  1080.0  778.0  358.0  1.01 
η  9.496  5.839  0.842  19.973  0.100  0.071  3411.0  3411.0  1344.0  364.0  1.01 
Results match what we found with least squares:
print ('\n'.join(map(str, list(zip(map(lambda x: round(x[0], 3), Λ[1][:]), summary['mean'].values[:11])))))
(703.914, 703.478)
(204.858, 204.95)
(103.512, 103.39)
(265.781, 265.808)
(110.53, 110.696)
(196.651, 196.683)
(671.889, 671.693)
(279.123, 279.143)
(0.189, 0.188)
(0.275, 0.275)
(0.172, 0.171)
Finally, and most importantly, instead of using only one fixed value for each \(L_i\), we can sample the posterior distribution to get some values and thereby find out how our point calculations vary.
n_samples = 1000
post_samples = {cam: NP.squeeze(PM.sample_posterior_predictive( \
traces[cam] \
, var_names = ['λ'] \
, samples = n_samples \
, model = models[cam] \
, progressbar = True \
)['λ']) \
for cam in cams }
post_samples[1].shape
/usr/lib/python3.8/sitepackages/pymc3/sampling.py:1246: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample
warnings.warn(
100%██████████ 1000/1000 [00:00<00:00, 29193.81it/s]
100%██████████ 1000/1000 [00:00<00:00, 27780.16it/s]
100%██████████ 1000/1000 [00:00<00:00, 31130.95it/s]
(1000, 11)
There we have many many samples for \(L\)values of each camera. Let’s take a single measurement of \((u_1, v_1, u_2, v_2, u_4, v_4)\) and see how the variation in \(L\) affects the \((x, y, z)\) reconstruction.
We’ll take a regular grid of points in object space.
grid_pts = [NP.linspace(1.0, 1.0, 5, endpoint = True).ravel()]*3
xyz = NP.stack(list(map(NP.ravel, NP.meshgrid(*grid_pts))), axis = 1)
xyz.shape
(125, 3)
We’ll need another function from the argus_gui
code:
def DLTInverse(L, xyz):
uv = NP.zeros((len(xyz), 2))
for k in range(uv.shape[0]):
u = (NP.dot(L[:3].T, xyz[k]) + L[3]) / (NP.dot(L[3:].T, xyz[k]) + 1.)
v = (NP.dot(L[4:7].T, xyz[k]) + L[7]) / (NP.dot(L[3:].T, xyz[k]) + 1.)
uv[k] = [u, v]
return uv
uv = NP.concatenate([DLTInverse(L[cam], xyz) for cam in cams], axis = 1)
uv.shape
(125, 6)
And then we can convert the uv
s back to xyz
s with variable L
s, thereby measuring the inaccuracy that results from our DLT parameters.
reco = NP.stack( \
[ \
UVtoXYZ(uv, NP.stack([post_samples[cam][i, :] for cam in cams], axis = 0)) \
for i in range(n_samples) \
] \
, axis = 2) \
reco_centered = NP.concatenate([reco[:, :, r]  xyz for r in range(reco.shape[2])], axis = 0)
reco_stacked = NP.concatenate([reco[:, :, r] for r in range(reco.shape[2])], axis = 0)
span = NP.array([NP.diff(NP.percentile(reco[:, i], [5, 95])) for i in range(reco.shape[1])]).ravel()
print ('space spanned by the test points:', span)
print (' standard deviation about each point:', reco_centered.std(axis = 0))
space spanned by the test points: [2.0214302 2.0318401 2.02179291]
standard deviation about each point: [0.01706024 0.03142076 0.01727276]
The inaccuracy from calibration in a \(2\ m^3\) volume is in the order of \(0.05\ m\). It drastically increases when moving away from the calibration object (\(0.5\ m\), i.e. factor \(10\), for \(5\ m^3\) cube). Hence, at least part of the inaccuracy problem in our case stems from the fact that the calibration object did not fill the calibration space. Most notably, in was at a different depth than the animal recording.
I conclude that target area (especially distance) has to be taken into account, and calibration should happen at approximately the same position from the cameras as recording, filling the field of view as good as possible.
The utility of probabilistic modeling and posterior sampling cannot be emphasized enough. It is well known for statistics, but it is equally useful to quantify the uncertainty in experimental measurements. Do not hesitate to try it yourself!
As a start, you might want to check this recent youtube tutorial. Or, of course, follow the book and lecture by Richard McElreath which I’ve repeatedly linked on my blog.
OpenCV Attempts: Epipolar Geometry
The DLT reconstruction exemplified above is just scratching on the surface of stereo vision theory, or epipolar geometry. There are a lot of resources that I found relevant for diving deeper:

Lecture by Rich Radke https://www.youtube.com/watch?v=QzYn0OPO0Yw

Lecture by Cyrill Stachniss https://www.youtube.com/watch?v=cLeFKNHgwU

OpenCV documentation https://docs.opencv.org/master/d9/d0c/group__calib3d.html

Argus documentation http://argus.web.unc.edu/formalmethodsanddefinitions
This might help you to build a basic set of vocabulary, containing for example terms as
 camera matrix ←
CV.calibrateCamera
 distortion coefficients ←
CV.calibrateCamera
 fundamental matrix ←
CV.findFundamentalMat
 essential matrix ←
CV.findEssentialMat
 epilines = epipolar lines ←
CV.computeCorrespondEpilines
The openCV
functions next to it are some of many functions related to that topic.
I experienced some trouble with openCV
because my workflow deviates from the supposed calibration.
The trouble arose because (i) my images were undistorted prior to processing, and (ii) I used a box instead of multiple views of a chessboard.
But I also noted that the openCV
error messages are not very instructive. For example, when trying to calibrate from a single scene, one might encounter the error
src\calibration.cpp:3110: error: (210) objectPoints should contain vector of vectors of points of type Point3f in function cv::collectCalibrationData
.
In my case, this could be solved by adding a fake dimension to the object_points
by wrapping it in a list [object_points]
.
Another example are data types, which should always be NP.float32
to avoid openCV
chaos.
Nevertheless, openCV
is probably the most comprehensive image processing library out there, and when you understand the logic behind it and are willing to scan through the raw code, it will doubtlessly be useful.
Epiline Calculation
Here is one example of how to use openCV
to get epilines in corresponding images of different perspective.
One can use the fundamental matrix to relate the images, without even knowing the absolute positions of the calibration object.
When displayed during tracking, epilines can result in more accurate results on lowcontrast landmarks.
The steps to get epilines are as follows:
 digitize points on calibration images of both cameras
 calculate the fundamental matrix with
CV.findFundamentalMat
 compute “epilines” for arbitrary points of interest with
CV.computeCorrespondEpilines
# get two arrays of calibration point coordinates
points = {cam: data.loc[:, [f'{coord}_{cam}' for coord in ['u', 'v'] ]].values \
for cam in cams }
# calculate fundamental matrix
fundamental_matrix = {(cam1, cam2): CV.findFundamentalMat(points[cam1], points[cam2], CV.FM_LMEDS)[0] \
for cam1 in cams for cam2 in cams \
}
The epilines can be displayed on the images:
colors =[tuple(NP.random.randint(150,255,3).tolist()) for _ in range(points[1].shape[0])]
cross_images = { cam: images[cam].copy()//2 for cam in cams } # darkened images
# loop cameras
for cam in cams:
pts = points[cam]
img = cross_images[cam]
if len(img.shape) == 3:
r,c,ch = img.shape
img = img // 2
else:
r,c = img.shape
img = CV.cvtColor(img,CV.COLOR_GRAY2BGR) // 2
# loop the other cameras
for crosscam in cams:
if cam == crosscam:
continue
funmat = fundamental_matrix[(cam, crosscam)]
# add a line for each other image
lines = CV.computeCorrespondEpilines(points[crosscam].reshape(1,1,2), 2, funmat).reshape(1,3)
#print (lines)
for count, r in enumerate(lines):
x0,y0 = [0, r[2]/r[1] ]
x1,y1 = [c, (r[2]+r[0]*c)/r[1] ]
# superimpose line on the image
img = CV.line(img, (int(x0),int(y0)), (int(x1),int(y1)), colors[count],1)
# add a point for the current image
for count, pt1 in enumerate(pts):
img = CV.circle(img,tuple(NP.array(pt1, dtype = int)),5,colors[count],1)
cross_images[cam] = img
Here is the result:
# drawing its lines on left image
fig = MPP.figure(dpi = 300)
for nr, cam in enumerate(cams):
ax = fig.add_subplot(len(cams), 1, nr+1)
ax.imshow(cross_images[cam], cmap = 'gray')
ax.get_xaxis().set_visible(False)
ax.set_yticks([])
ax.set_ylabel(cam)
MPP.tight_layout()
MPP.show()
Looking closely, one can see that some lines are off  again the calibration settings might not be ideal.
OpenCV Camera Calibration
As mentioned above, openCV
is versatile, but also bloated, which is the typical fate of complex code.
I ultimately failed to established my desired workflow in that toolbox due to trouble with CV.calibrateCamera
.
I used it as follows:
GetCMat = lambda foc, cx, cy: NP.array([[foc, 0, cx], [0, foc, cy], [0, 0, 1]], dtype = NP.float32)
GetDistorts = lambda k_vec: NP.array(k_vec, dtype = NP.float32)
CV.calibrateCamera(object_points \
, image_points \
, (1280, 720) # resolution \
, cameraMatrix = GetCMat(*start_values[:3]) \
, distCoeffs = GetDistorts(start_values[3:]) \
, flags = CV.CALIB_USE_INTRINSIC_GUESS \
)
The presence of start_values
therein indicates that I used a manual layer of optimization (scipy.optimize
) to get the correct camera matrix and distortion parameters.
This is highly confusing to me: in my opinion, these parameters should be deterministic.
However, in openCV
, they depend on the start values I give, and how they depend could not even be reproducibly inferred with NelderMead algorithm.
At that point, I researched more and turned to DLT as documented by Argus and Kwon.
Digression: Photogrammetry
Fundamental matrix extraction and epiline generation worked with a fundamental matrix that was calculated from known, manually digitized points.
In contrast, openCV
also offers functions that find keypoints automatically, turns them into descriptors, and matches. Some code to give you a direction:
# Initiate ORB detector
orb = cv.ORB_create(32)
# find the keypoints and descriptors with ORB
kp1, des1 = orb.detectAndCompute(img1,None)
kp2, des2 = orb.detectAndCompute(img2,None)
# brute force matching
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck = True)
matches = bf.match(des1, des2)
# Alternative: FLANN/K nearest neighbors (knn)
index_params = dict(algorithm = 1, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary
flann = cv.FlannBasedMatcher(index_params, search_params)
# match:
matches = flann.knnMatch(np.float32(des1),np.float32(des2),k=2)
This is the heart of Photogrammetry, so feel free to create your own photogrammetry workflow with it.
update 2021/02: More details on Photogrammetry in this blog post.
Summary
Above, I demonstrated how to calibrate an array of cameras in fixed relative position to get 3D data. Accuracy was limited with the simple calibration object I used. But I demonstrated how probabilistic programming and posterior sampling can give you a feel for the accuracy of your personal calibration procedure. Besides that, temporal sync and camera arrangement left room for improvement.
I also pointed out some directions for interested readers to instead use the much more potent openCV
library, with all the pitfalls it contains.
I like the Argus slogan, 3D for the people, and hope that I could contribute my part with this little tutorial. It turns out that it is not that trivial to capture our 3D world in 2D images. But thanks to web documentation and great open source projects, there is hope!
References

Jackson, Brandon E and Evangelista, Dennis J and Ray, Dylan D and Hedrick, Tyson L (2016). 3D for the people: multicamera motion capture in the field with consumergrade cameras and open source software. Biology Open 5: 13341342; https://doi.org/10.1242/bio.018713. http://argus.web.unc.edu, accessed 2020/08/01

Kwon, YoungHoo (2000). Direct Linear Transformation (DLT). Kwon3D Theoretical Foundation. http://kwon3d.com/theories.html, accessed 2020/08/01
Further reading:

AbdelAziz, Y.I., & Karara, H.M. (1971). Direct linear transformation from comparator coordinates into object space coordinates in closerange photogrammetry. Proceedings of the Symposium on CloseRange Photogrammetry (pp. 118). Falls Church, VA: American Society of Photogrammetry.

Miller, N.R., Shapiro, R., & McLaughlin, T.M. (1980). A technique for obtaining spatial kinematic parameters of segments of biomechanical systems from cinematographic data. J. Biomech 13, 535547.
UPDATE 2020/08/04: added probabilistic calibration section