# First principles exploration 👋#

The goal of this article is to explore the topic of the camera response function experimentally. We’ll build up from first principles and the main focus will be on building intuition with code examples. The only pre-requisite is base familiarity with Python + NumPy and, of course, curiosity about the topic. 🧑‍🚀

If you’re looking for a quick way to find out the response function of your camera, you can check out the implementation of CalibrateDebevec in OpenCV, or stick around to get a deeper intuition.

I’ve made an effort to link to interesting resources that provide a good starting point to dig deeper into specific directions that could not be covered in this article. You’re encouraged to follow them and come back to this at your own pace.

👉
You can follow along and play with the code examples. The Jupyter notebook is available on Github. 📔
👉
Subscribe to the RoboAlgorithms YT channel to be notified about the complementary video on the camera response function (coming out soon). 🎥

## What the CRF? 🤔#

A camera measures the light intensity hitting the imaging sensor and outputs an image with pixel values that in some way correspond to that light intensity. This correspondence is modeled by the camera response function (CRF).

For a computer vision engineer, the pixel values are the only observation of the real-world. Thus, it is helpful to understand how they relate to actual light intensities to correctly model that world.

Most cameras have some non-linearity in their response to light intensities. Images taken by consumer-grade cameras are particularly affected by this because they are optimized for viewing experience and not for measurement accuracy.

👉
If we know the response function, we can resolve that ambiguity and turn any camera into a scientific measurement device! 🔬

## Applications of the CRF 🔨#

Many computer vision algorithms that rely on the brightness constancy assumption require accurate estimation of the CRF. For example, optical flow or any direct method for visual SLAM that works with pixel intensities such as DSO.

Direct methods for visual SLAM are currently one of the most robust approaches for accurate robot localization in challenging environments where GNSS coverage is unreliable. An autonomous last-mile robot delivery service is an example that’s increasingly gaining commercial traction in recent years and may use such technology.

💡
One could argue that due to the shift towards deep learning approaches and more readily available cameras that are optimized for machine vision, it will be unnecessary to manually calibrate for the camera response function in the future. Nevertheless, I still see value in exploring the fundamentals for learning purposes.

In this article, we’ll look at the application of generating HDR images which also relies on knowledge about the camera response function because we need to “fuse” multiple images taken at different exposure times.

# How cameras measure light ☀️#

A camera is an array of photodiodes that converts incoming photons into electrical current or voltage. This phenomenon is known as the photoelectric effect and is the same phenomenon utilized in solar panels.

👉
We measure the voltage to determine the amount of incident light.

For each pixel, there’s a corresponding photodiode in the camera sensor. We take an image by exposing the sensor to light for a given period (exposure time). Since the photodiodes produce an electrical current proportional to the amount of light they receive, we can use that measurement to reconstruct the image.

## Converting voltages to pixel values 📺#

When we measure the voltage generated by the photodiode we get an analog quantity. But we need a digital value that fits into our pixel representation of 8 bits. A device that can perform this conversion is called an Analog-Digital (A/D) converter. Most A/D converters used in cameras today will output an 8, 12, or 16-bit digital value.

### Analog-to-Digital Converter 📈#

We will start with an 8-bit A/D converter. Note that regardless of the bit depth, the range of possible values will be limited in the digital domain. Meaning that with 8-bit we only have $$2^{8} = 256$$ , unique values to assign to all possible voltage measurements.

Let’s simulate a naive A/D converter:

def max_value(bit_depth: int) -> int:
"""Return max integer value for the specified bit depth"""
return 2**bit_depth - 1

def adc(milivolts: np.ndarray, bit_depth: int = 8) -> int:
"""Naive A/D converter

milivolts  -- accumulated voltage from a photodiode
bit_depth  -- target bit depth of the digitized output
"""
return np.clip(np.around(milivolts), 0, max_value(bit_depth))

"""Plot a few interesting examples"""
f, axes = plt.subplots(1, 3, figsize=(20, 5))
x_label = 'Input voltage (mV)'
y_label = 'Output pixel value'

voltages = np.arange(0, 10, 0.1)
"In supported range", x_label, y_label)

voltages = np.arange(0, 0.5, 0.01)
"Below supported range", x_label, y_label)

voltages = np.arange(255, 500, 1)
"Above supported range", x_label, y_label)



#### Quantization noise#

From the staircase-like plot above we can see the discrete jumps caused by the quantization effects. Because we’re losing some precision at this step, this effect also becomes a component of the total image noise known as the quantization noise.

#### Under- and overexposure#

The plot also demonstrates that any values below or above the supported range are irreversibly lost. This phenomenon is known as under- or overexposure. Our goal will always be to choose the exposure time such that this effect is minimized.

#### Dynamic Range#

The range of light intensities that the camera can capture without under- or overexposure is called the camera dynamic range. It is a very important metric for outdoor robot operation due to the large variations in light intensities.

💡
Curiously, extending the dynamic range of the camera isn’t as simple as increasing the bit-depth of the A/D converter. The primary limiting factor is usually linked to the physical full-well capacity and noise level of the photodiodes. The bit-depth of the A/D converter is then selected to maximize that.

So our goal is to maximize the information we can get from every image. We have two ways to optimize for that:

1. Choose a camera with a higher dynamic range
2. Adjust the exposure time to minimize over- and underexposure
👉
For very dimly lit scenarios, it is not always an option to adjust the exposure time because a very long exposure time can cause motion blur.

### Programmable Gain Amplifier ⬛#

Underexposure happens for dimly lit scenes or shorter exposure times when the photodiodes don’t have a chance to convert a sufficient amount of photons into electrical current. This is how a fully underexposed image looks:

photodiode_array_milivolts = np.array([0.028, 0.213, 0.234, 0.459])

util.display_images([pixel_values], titles=["Underexposed image"])


Cameras have a programmable-gain amplifier (PGA) that can amplify the voltages from the photodiodes and bring them in the range supported by the A/D converter. The level of amplification can be configured on most cameras through the ISO setting.

👉
We can capture images in the dark without increasing the exposure time as much.
def pga(milivolts: np.ndarray, gain: float = 1.0) -> float:
"""Naive programmable-gain amplifier

milivolts  -- input voltage from photodiode
gain       -- ratio output / input
"""
return gain * milivolts

# Add an amplifier to the same example values from above
util.display_images([pixel_values], titles=["Amplified image"])


There’s a caveat: we did not model image noise. The weaker signal in dimply lit scenes results in a lower signal-to-noise ratio (SNR). In other words, amplifying the signal will also amplify the noise and that’s why using a higher gain usually results in grainier and lower quality images. In extreme cases when the strength of the signal is below the noise floor it can be difficult to extract any useful information.

Reducing the image noise can be performed in the digital image processing pipeline which is the final step before saving the image. This part significantly differs between camera models and may include white balance, tone mapping, gamma encoding, image compression and more.

### Image Acquisition Pipeline 📸#

The figure below provides an overview of the discussed components.

💡
The camera response function is a single function that approximates the response of the whole system with all components inside. More on this later.

I’m aware that I’ve touched upon a lot of topics very quickly. If you don’t have a good intuition already, I highly recommend reading this amazing interactive introduction on Cameras and Lenses. Seriously, please go check this article out.

The Science of Camera Sensors is another great resource in video form that provides an excellent introduction to the physics of how photodiodes and camera sensors work.

Advanced: For a more in-depth look into the theory and physics behind CMOS photodiodes, chapter 3 from this dissertation provides a good overview.

# Simulating a simple camera 📷#

## Disclaimer ℹ️#

For the sake of brevity, many of the physical aspects of cameras are oversimplified in the presented examples. They are only meant to aid the intuition and shouldn’t be relied upon for accuracy or completeness. However, if you spot any mistakes, I would love to hear about them at 📨 or on 🐦 twitter.

## Assumptions 📝#

👉
The following assumptions provide the framework for the rest of the article. Feel free to skip this section and refer back to it later if you want to jump to the fun part already. Some parts may only become clear after completing the entire article.

### Camera#

💡
Despite these simplifying assumptions, the concepts learned here are applicable to real-world RGB cameras with any kind of optical lens. Try to understand why.

### Scene#

• receives constant light intensity from the light source (brightness constancy)
• static — no moving objects in the scene during or between image acquisitions

In a practical sense, it means that we cannot accurately determine the camera response function on a partly cloudy day while relying on sunlight as our light source because the clouds may result in significant changes in the radiant flux received by the scene.

💡
Beware that some artificial light sources might also flicker.
👉
Given the above assumptions, if we take multiple images with the same exposure time t we must always get the same pixel values (not accounting for image noise).

## Simulating a 2x2 pixel sensor 🧱#

We will use the term pixel irradiance to refer to the number of photons hitting the physical area of a pixel per second ( $$m^{−2}⋅s^{−1}$$ ). You can also refer to SI photon units definition on the wiki page of photon counting.

import numpy as np

We have an imaging sensor of 2x2 pixels (stored as 1D array)
The sensor irradiance encodes how many photons
are hitting the image sensor every second.
The values are arbitrarily chosen.
"""
pix11_irradiance = 1122 # photons / second
pix12_irradiance = 3242 # photons / second
pix21_irradiance = 1452 # photons / second
pix22_irradiance = 25031 # photons / second

return np.array([


Let’s start with the simple case of a linear camera response function. This intuitively means that doubling the amount of light (pixel irradiance) would result in doubling the observed pixel value. We define the CRF as:

$$CRF = G(x) = a\cdot x + b$$

We’re going to set $$a = 1$$ and $$b = 0$$ and assume that each photon increases the voltage by 1mV. This is easy to simulate but unfortunately, 100% detached from reality. Accurate simulation of that process is not in the scope of the article but feel free to enter that rabbit hole starting from quantum efficiency ;-). We also assume white Gaussian noise which approximates thermal noise. All other sources of noise will be ignored.

$$CRF = G(x) = x$$

To get an estimate of the voltage accumulated in each pixel, we need to integrate the irradiance over the period of the exposure time t. We define taking an image as:

👉
$$I(k) = G(adc(pga(t\cdot E(k))))$$

where E is the sensor irradiance (i.e. an array of all pixel irradiances), G is the camera response function as already defined above, t is the exposure time in seconds, I is the image intensity or observed pixel value and k is the index of the pixel.

💡

In the literature, you may find the equation defined in its simpler form:

$$I(k) = G^{*}(t\cdot E(k))$$

where the camera response function is:

$$G^{*}(x) = G(adc(pga(x))$$

I’ve chosen to keep the functions separate here so that the logic for amplification and quantization is not intermingled with the CRF in the Python code.

Let’s code this:

from typing import Callable

ResponseFunction = Callable[[np.ndarray, int], np.ndarray]

def capture_image(E: np.ndarray, t: float, G: ResponseFunction,
bit_depth: int = 8, std_noise: float = 1.0) -> np.ndarray:
"""Simulate capturing an image with exposure time t.

You can specify higher bit_depth for internal processing prior to
finally converting the image to 8-bit depth before output.

t         -- exposure time (seconds)
G         -- camera response function (CRF)
bit_depth -- bit depth for A/D converter output and response function input
std_noise -- standard deviation for the Gaussian noise distribution
"""
# Non-realistic but simple calculation
accumulated_milivolts = E * t

# Add the white Gaussian noise
noise = np.random.normal(0, std_noise, size=len(accumulated_milivolts))
accumulated_milivolts += noise

image = G(
)

# Always convert to 8-bit depth prior to saving / outputting
to_8bit_depth = max_value(8) / max_value(bit_depth)
return np.around(image * to_8bit_depth)

def linear_crf(X: np.ndarray, bit_depth: int) -> np.ndarray:
"""Linear camera response function G(x) = x"""
return X



Now take a couple of images with different exposures:

img1 = capture_image(E=get_sensor_irradiance(), t=1/10, G=linear_crf)
util.display_images([img1, img2],
titles=["Exposure: t = 1/10 second",
"Exposure: t = 1/100 second"])


We can correctly expose all pixels with 1/100 second exposure time (none of the pixels are 0 or 255). It becomes more interesting if we have a scene with more extreme values which cannot be captured by the dynamic range of the camera sensor.

Think of an image from inside a cave looking out.

We can simulate this by choosing more extreme values for the sensor irradiance:

def get_extreme_sensor_irradiance():
# Note that the range spans multiple orders of magnitude
return np.array([182, 5432, 15329, 252531])

t1, t2 = 1/5, 1/500
util.display_images([img1, img2],
titles=["Exposure: t1 = 1/5 second",
"Exposure: t2 = 1/500 second"])


In this example, it is not possible to capture the scene without loss of information regardless of the chosen exposure time. Correctly exposing the brightest pixel(s) will result in underexposing the darkest pixel(s) and vice versa. Note that in the second image we have one underexposed and one overexposed pixel.

# HDR Images 🪂#

The pixel values provide us only partial information due to the limited dynamic range of the camera. To reconstruct an HDR image, we need to recover the full sensor irradiance.

💡
Remember that outside our simulation, we can only observe the pixel values from the image and not the actual sensor irradiance.
👉
We can do that by combining information from multiple images at different exposure times that allow us to expose each pixel correctly.

## Inverse CRF 🔙#

The inverse camera response function allows us to do precisely that. It defines the mapping from pixel values back to sensor irradiance or light intensity.

We define $$U := G^{-1}$$ as the inverse of the CRF. Here we assume that G is monotonically increasing and thus invertible. That’s a fair assumption because cameras are mapping brighter scene regions to brighter image intensities.

Remember that we defined taking an image as $$I(k) = G(t\cdot E(k))$$ . To go the other direction, we can apply the inverse function $$U(I(k)) = t\cdot E(k)$$ and we can get the sensor irradiance E from the image I if we know the exposure time t:

👉
$$E(k) = \frac{U(I(k))}{t}$$

The inverse CRF is also identity function $$f(x) = x$$ because:

$$U(G(x)) = x$$

💡
I am using image irradiance to refer to the (partial) sensor irradiance recovered from a single image. This may differ from usage in the literature.
InverseResponseFunction = Callable[[np.ndarray], np.ndarray]

t: float, U: InverseResponseFunction) -> np.ndarray:
"""Recover the irradiance from an image that was
taken with known exposure time t.

We cannot recover any useful information from
oversaturated (255) or undersaturated (0) image
intensities. Return 0 in both cases as this helps
eliminate the need for separate case handling.

I  -- image (array of image intensities)
t  -- exposure time (seconds)
U  -- inverse of the response function
"""
I[I == 255] = 0 # oversaturated pixels
return U(I) / t

def linear_inverse_crf(I: np.ndarray) -> np.ndarray:
"""Linear inverse camera response function U(x) = x"""
return I

print(f"\nImage irradiance (t = {t1}):")


Image irradiance (t = 0.2): [185. 0. 0. 0.]

Image irradiance (t = 0.002): [ 0. 6000. 15000. 0.]

Notice that we can recover the irradiance only for pixels that are not under- or overexposed. To recover the full sensor irradiance, we need to capture a series of images with different exposure times such that each pixel was correctly exposed at least once.

Let’s take another image with a shorter exposure time:

t3 = 1/2000


Image irradiance (t = 0.0005): [ 0. 6000. 16000. 254000.]

We can now estimate the full sensor irradiance by combining the individual image irradiances. Notice that the second and the third image have a conflicting estimate of the third pixel value (15000 vs 16000). We’re going to take the average.

from typing import List

exposures: List[float],
U: InverseResponseFunction) -> np.ndarray:

images    -- list of images to use for recovery
exposures -- corresponding list of exposures
U         -- inverse response function
"""
assert len(images) == len(exposures), \
"The number of images and exposures don't match"
assert len(images) > 0, "Expected non-empty list of images"

image_size = images[0].shape[0]

for I, t in zip(images, exposures):
# Only non-zero values are observed and contribute to average

out=np.zeros(image_size),


Let’s compare that to the known ground-truth.

estimated_sensor_irradiance = estimate_sensor_irradiance_average(
[img1, img2, img3], [t1, t2, t3], U=linear_inverse_crf)

util.print_error_to_ground_truth(


Ground-truth: [ 182 5432 15329 252531]

Estimated: [ 180. 6750. 13750. 248000.]

Error (diff): [ 2. -1318. 1579. 4531.]

It seems we’re close but have lost some precision due to quantization effects and image noise. Can we do better? Maybe average over more images?

### Improving the sensor irradiance estimate#

In this section, we run several experiments to better understand how the noise affects the accuracy. First, we need a more representative dataset with more images.

from typing import Tuple, List

ImagesExposuresTuple = Tuple[List[np.ndarray], List[float]]

def generate_sliding_exposure_dataset(
E: np.ndarray, G: ResponseFunction, bit_depth: int = 8,
start_us: int = 40, step_perc: int = 5, number_exposures: int = 220,
images_per_exposure: int = 1) -> ImagesExposuresTuple:
"""Generate dataset with sliding exposure values

G                    -- camera response function
bit_depth            -- bit depth for the camera's internal processing
start_us             -- microseconds for the starting value of the sequence
step_perc            -- percentage step between subsequent exposure values
number_exposures     -- number of unique exposure times to generate
images_per_exposure  -- number of images per exposure time
"""
def get_exposure_seconds(idx: int) -> float:
multiplier = (1 + (step_perc / 100)) ** idx
return round(start_us * multiplier) * 1e-6

images, exposures = [], []
for exposure_idx in range(number_exposures):
for _ in range(images_per_exposure):
t = get_exposure_seconds(exposure_idx)
images.append(capture_image(E=E, t=t, G=G))
exposures.append(t)

return images, exposures


Feel free to experiment with the parameters for the dataset generation and see what the impact on the final sensor irradiance estimates is.

👉
We can verify that the dataset is statistically well distributed by plotting the number of well-exposed observations for each of the pixels in the image.
images, exposures = generate_sliding_exposure_dataset(

util.display_valid_observations(np.array(images))


Following, I define a helper function that will help us quickly run a few experiments.

EstimateSensorIrradianceFunction = Callable[[List[np.ndarray], List[float],
InverseResponseFunction], np.ndarray]

images_per_exposure: int,
number_experiments: int) -> List[np.ndarray]:
"""Get sensor irradiance estimates from a number of experiments"""

estimates = []

for _ in range(number_experiments):
images, exposures = generate_sliding_exposure_dataset(
images_per_exposure=images_per_exposure)
images, exposures, U=linear_inverse_crf)

return estimates


Let’s run 10 experiments with a single image per exposure time.

irradiance_estimates = get_irradiance_estimates(
images_per_exposure=1, number_experiments=10)
util.plot_errors_to_ground_truth(


RMSE: 1425.4632661240885

Now we have a baseline of root-mean-square error (RMSE) that we want to minimize. I’ve also plotted the per-pixel RMSE and the per-experiment absolute error for the estimate.

👉
We can smoothen the noise variations between the experiments by averaging over multiple images with the same exposure.
rradiance_estimates = get_irradiance_estimates(
images_per_exposure=10,
number_experiments=10)
util.plot_errors_to_ground_truth(
irradiance_estimates, ground_truth=get_extreme_sensor_irradiance())

RMSE: 1215.9117041699783

The vertical oscillations have now subsided and the RMSE is slightly improved.

Notice that the errors for most pixels, except the fourth (red in the figure), are biased toward negative values. This is not caused by the image noise because we would otherwise expect a mean error of zero.

Let’s plot the distribution of the observations that we have for each pixel.

util.display_observations_distribution(np.array(images))


Now we see that the observations in our dataset are similarly skewed. The fourth pixel which had the smallest bias in the estimate also has the best observation distribution.

The skewed observations are the result of using a geometric sequence for the exposure times. In our case it looks like this:

util.plot_exposures(exposures)


The geometric progression is convenient here due to the large range of values we need to cover. We are able to do so with just over 200 exposures. If we use an arithmetic progression, on the other hand, we would need significantly more pictures in the dataset which is not always practical or possible.

Instead, we can improve the algorithm by adding a weighting function that discounts the contribution from observations with lower and higher pixel values.

I’m going to use the weighting function proposed by Debevek & Malik:

$w(z) = \begin{cases} z - Z_{min} & \text{for } z\leq \frac{1}{2}(Z_{min} + Z_{max})\\ Z_{max} - z & \text{for } z\gt \frac{1}{2}(Z_{min} + Z_{max}) \end{cases}$

where $$Z_{min}$$ and $$Z_{max}$$ are the least and greatest pixel values — 0 and 255 respectively.

$w(z) = \begin{cases} z & \text{for } z\leq 127\\ 255 - z & \text{for } z\gt 127 \end{cases}$

This function discounts both lower and higher pixel values. Besides the higher robustness towards different noise sources (see also CCD Saturation and Blooming), it also reflects the fact that values in the mid-range are most accurately encoded by non-linear camera response functions (more on this later).

def weighting_debevek_malik(pix_value: int) -> int:
if pix_value <= 127:
return pix_value
if pix_value > 127:
return 255 - pix_value


Let’s rewrite the function for sensor irradiance estimation to use the weighting function:

def estimate_sensor_irradiance_weighted(
images: List[np.ndarray], exposures: List[float],
U: Callable[[np.ndarray], np.ndarray], w: Callable[[int], int]) -> np.ndarray:

The recovered image irradiances are weighted according to the provided
weighting function. This works best with higher number of images
and well-distributed exposure times.

images    -- list of images to use for recovery
exposures -- corresponding list of exposures
U         -- inverse response function
w         -- weighting function
"""
assert len(images) == len(exposures), \
"The number of images and exposures don't match"
assert len(images) > 0, "Expected non-empty list of images"

image_size = images[0].shape[0]
w_vec = np.vectorize(w)

for I, t in zip(images, exposures):
weights = w_vec(I)
# Only non-zero values are observed and contribute to average

out=np.zeros(image_size),


RMSE: 165.4355269685378

This change alone provides an almost 10x improvement in the estimate errors. I found out that using the square of the weighting function provides even better results.

def weighting_squared(pix_value: int) -> int:
return weighting_debevek_malik(pix_value) ** 2


RMSE: 72.49150623466912

And here’s the final estimate:

images, exposures = generate_sliding_exposure_dataset(

images, exposures, U=linear_inverse_crf, w=weighting_squared)
util.print_error_to_ground_truth(


Ground-truth: [ 182 5432 15329 252531]

Estimated: [ 184.362 5436.691 15341.962 252564.875]

Error (diff): [ -2.362 -4.691 -12.962 -33.875]

## Absolute vs Relative Irradiance ⚠️#

👉
So far we’ve successfully recovered the relative sensor irradiance values.

Recovering the absolute sensor irradiance isn’t quite as straightforward for real cameras. We’ve used a very simplified model of capturing images and haven’t taken the photodiode’s quantum efficiency and other physical aspects into consideration.

Although it may seem that we fully recovered the irradiance values, this method can only recover the relative irradiance values up to an unknown constant factor $$X$$ .

$$E_{absolute} = X \cdot E_{relative}$$

To determine $$X$$ , an absolute radiometric calibration can be performed. The process involves taking an image of a surface with known reflectance values in a controlled light environment.

Conveniently, the relative irradiance values that we obtained are sufficient for most applications, including HDR images.

👉
The irradiance values we recovered are our HDR image!

## Displaying HDR Images 🖥️#

We could display the HDR image by normalizing the values within the 0-255 range.

def normalize_255(image: np.ndarray) -> np.ndarray:
"""Normalize values withing range [0, 255]"""
min_i, max_i = min(image), max(image)
assert min_i < max_i, "Range cannot be normalized (min == max)"
return np.around((image - min_i) * (255 / (max_i - min_i)))

util.display_images([hdr_linear], titles=["HDR (linear)"])


After normalization, the darkest and brightest pixels are pinned to the values of 0 and 255 respectively. All other values are linearly adjusted. The problem with this method is that usually the scene will have very few but extremely bright spots and this would cause the majority of the rest of the scene to appear underexposed.

The image above is a good example of this effect.

This is a limitation of the display technology today. Most devices still have a non-HDR display which is only capable of displaying 24-bit color which is exactly 8-bit per RGB channel. This sets a hard limit on the range of values [0-255].

## Gamma encoding 📈#

To go around this problem cameras perform gamma encoding when saving images in 8-bit format. In doing so, the limited range can be utilized more efficiently.

def gamma(X: np.ndarray) -> np.ndarray:
assert all(X >= 0) and all(X <= 1)
return X**(1/2.2)

def plot_gamma_function():
plt.title("Linear vs Gamma")
plt.xlabel("Input")
plt.ylabel("Output")

x = np.arange(0, 1, 0.001)
plt.plot(x, gamma(x))
plt.plot(x, x)

plot_gamma_function()


The gamma function encodes lower values (darker tones) with significantly higher resolution. This mapping exploits the fact that our eyes perceive brightness proportionally to the logarithm of the light intensity. See Weber-Fechner law.

Think of a light bulb with adjustable brightness. If we increase the light intensity by N two consecutive times, the perceived brightness change will be larger the first time.

The gamma encoding is reversed by a gamma decoding step which applies the inverse of the gamma function. Your monitor likely expects gamma encoded images and automatically does the decoding. All of this is formally defined in the sRGB color space.

Note that the gamma function is just one of many possible tone mapping techniques. We are not going to cover more, but I highly recommend this great article on color spaces.

For very high dynamic range images you can also take the logarithm and normalize the resulting values over the [0-255] range. Following is an example of both approaches.

def apply_gamma(E: np.ndarray, bit_depth: int = 8):
return gamma(E / max_value(bit_depth)) * max_value(bit_depth)

util.display_images([hdr_linear, hdr_gamma, hdr_log], titles=["HDR (linear)", "HDR (gamma)", "HDR (log)"])


👉
The darker values are now much better represented and we can easily identify the nuances.

# Non-linear CRF 🎛️#

Up to now, we’ve assumed that the CRF is linear $$G(x) = x$$ . However, we’ve seen that it makes sense for cameras to encode images using a non-linear curve to optimize for dynamic range and appearance.

👉
This can result in different pixel values for the same light intensity.

Additional non-linearities might be introduced by other parts of the system — e.g. the A/D converter or other digital image processing steps such as white balance correction. The goal of the CRF calibration is to find the overall response of the camera that combines all these factors into a single function.

# CRF Calibration 🔬#

In this section, we’re going to implement a simple but effective algorithm for determining the camera response function from a set of monochrome images. The algorithm is inspired by the approach presented in the paper: A Photometrically Calibrated Benchmark For Monocular Visual Odometry.

## Calibrate from a single pixel 💡#

Let’s iteratively discover the approach by first considering an image consisting of 1 pixel.

Given our assumptions about the scene, we have full control over the amount of light that a pixel receives by controlling the exposure time. We don’t know the absolute value but we know that if we double the exposure time, we receive double the amount of light.

👉
If we correlate that known light change with the resulting pixel value, we can determine the function that relates them.

The inverse CRF takes a pixel value as an argument and we know that pixel values are integers in the range [0-255]. So we can define it as a discrete function. In Python, this translates to an array where the index of the array is the input to the function.

U = np.array([0.] * 256)


We need to have observations for each possible pixel value to fully determine the inverse CRF. So our calibration dataset needs to start from a fully underexposed image and iteratively increase exposure time until the image is overexposed.

Here it makes more sense to use arithmetic progression for the exposures so that we have dense coverage over the entire range.

def generate_crf_calibration_dataset(
E: np.ndarray, G: Callable[[np.ndarray, int], np.ndarray],
bit_depth: int = 8, start_ms: int = 1, end_ms: int = 201,
step_ms: int = 1, images_per_exposure: int = 1) -> ImagesExposuresTuple:
"""Generate dataset with sliding exposure values in the specified range

G                    -- camera response function
bit_depth            -- bit depth for the A/D converter of the camera
start_ms             -- start value (ms) for the range of exposure values
end_ms               -- end value (ms) for the range of exposure values
step_ms              -- step size (ms) for the range of exposure values
images_per_exposure  -- number of images per exposure time
"""

exposures = [x / 1e3 for x in range(start_ms, end_ms, step_ms)]
exposures *= images_per_exposure

images = []
for t in exposures:
images.append(capture_image(E=E, t=t, G=G, bit_depth=bit_depth))

return images, exposures


### Recovering linear response#

As a first example, we’re going to generate a dataset using the linear response function and attempt to recover it. We use an arbitrarily chosen pixel irradiance:

single_pixel_irradiance = np.array([1381])


Generate the dataset and verify that it provides a good distribution of pixel values.

images, exposures = generate_crf_calibration_dataset(
util.display_images(images[0::10], shape=(1,1), annot=False)


util.display_observations_distribution(np.array(images))


We don’t have sufficient images to cover every value from the range (200 images vs 256 values) but we have good enough coverage. Reality isn’t perfect either so let’s try to work around that and recover the reverse CRF function U.

$$U(I(x)) = t\cdot E(x)$$

def recover_U_single_pixel(exposures, images):
# Define an arbitrary (relative) sensor irradiance
# Expected to be wrong by a constant factor X

# Initialize the inverse function U to 0s
U = np.array([0.] * 256)

for t, image in zip(exposures, images):
pixel_value = int(image[0])
if pixel_value == 255 or pixel_value == 0:
# Skip under- or overexposed pixels
continue
U[pixel_value] = accumulated_photons

return U

util.plot_inverse_crf(recover_U_single_pixel(exposures, images))


The zero values are caused by unobserved pixel values. One way we can resolve that is by taking more images with finer exposure steps to cover the entire range of values.

👉
We’re going to interpolate the missing values from their nearest neighbors.

Then we get a very good approximation without stricter requirements on the dataset.

def interpolate_missing_values(U):
observed_intensity = np.zeros(len(U), dtype=bool)
observed_intensity[U != 0.0] = True
U[~observed_intensity] = np.interp(
(~observed_intensity).nonzero()[0],
observed_intensity.nonzero()[0],
U[observed_intensity])
return U

util.plot_inverse_crf(
interpolate_missing_values(
recover_U_single_pixel(exposures, images)))


The recovered function looks a bit noisy. Debevek & Malik propose adding a smoothness term based on the sum of squared second derivatives.

👉
We’re going to follow a simpler approach of averaging the values over multiple images from the same exposure.
# Create a dataset with multiple images for each exposure time
images, exposures = generate_crf_calibration_dataset(

def recover_U_single_pixel_average(exposures, images):

U_sum = np.array([0.] * 256)
U_count = np.array([0] * 256)

for t, image in zip(exposures, images):
pixel_value = int(image[0])
if pixel_value == 255 or pixel_value == 0:
# Skip under- or overexposed pixels
continue

# Sum all observations for specific pixel value
U_sum[pixel_value] += accumulated_photons
U_count[pixel_value] += 1

# Calculate the average
observed_intensity = np.zeros(len(U_sum), dtype=bool)
observed_intensity[U_count != 0] = True

return np.divide(U_sum, U_count, out=np.zeros_like(U_sum),
where=observed_intensity)

recovered_inverse_crf_linear = interpolate_missing_values(
recover_U_single_pixel_average(exposures, images))
util.plot_inverse_crf(recovered_inverse_crf_linear)


Nice and smooth 😊 I call this a success. Let’s move onto more interesting examples!

### Recovering gamma response#

To have meaningful gamma encoding, we need to quantize the images with a 12-bit A/D converter which provides a higher dynamic range that we can then effectively compress down to 8-bit with the gamma encoding. Note that we use 12-bit depth for the camera internal processing but the result is still an 8-bit (gamma encoded) image.

images, exposures = generate_crf_calibration_dataset(
start_ms=1, end_ms=3001, step_ms=15)


Note that I also had to increase the step size for the exposure sequence to cover the extended dynamic range of the 12-bit A/D converter while keeping the total number of exposures constant between the examples.

util.display_images(images[0:200:10], shape=(1,1), annot=False)


💡

The gamma dataset (bottom) appears to follow a more linear progression compared to the linear dataset (top). That’s because your monitor does gamma decoding and wrongly displays non-gamma encoded images.

util.display_observations_distribution(np.array(images))


recovered_inverse_crf_gamma = interpolate_missing_values(
recover_U_single_pixel_average(exposures, images))
util.plot_inverse_crf(recovered_inverse_crf_gamma)


💡
Direct comparison of the relative irradiance values on the Y-axis between the gamma and linear (inverse) response functions reveals the wider range of relative irradiances that the gamma function is capable of encoding (range 0-300 vs 0-18).

## Calibrate from multiple pixels 🐣#

As demonstrated in the previous section, it is possible to calibrate the camera response function by using a single pixel. The downside of this approach is that it requires a lot of images to densely cover the entire range of values and average out the noise — we took 20 images @ 200 unique exposures resulting in a total dataset size of 4000 images.

Let’s hypothetically consider the other extreme case:

👉
What if we have a dataset consisting of a single image with dimensions 64x64?

The image would have 4096 pixels in total. That’s approximately the same number of pixels as in our dataset with single pixel images. Assuming there’s a good variation in the pixel irradiances, would this provide us with the same amount of information?

The answer is no because we wouldn’t know the relative irradiances between the pixels.

💡
In the case of a single pixel, we had a single irradiance value and we could calculate the change in the accumulated light based on the exposure time alone. In order to relate pixel values in the same image, we need to recover the relative sensor irradiance. We already solved this problem in the HDR image section.

The catch is that we already need to know the CRF to recover the irradiance values, but the irradiance values are needed to recover the CRF. The chicken and the egg puzzle?

In the following equation, both $$U(x)$$ and $$E(x)$$ are unknown.

$$U(I(x)) = t\cdot E(x)$$

But we know they’re both constant so if fix one and solve for the other, we can iteratively converge to a solution. In other words, we’re going to attempt to evolve the chicken and the egg simultaneously! 🐣

def recover_U(E: np.ndarray, exposures: List[float], images: List[np.ndarray]):
"""Attempt to recover the inverse CRF (U)

Calculation based on the formula U(I(x)) = t * E(x)
where I(x), t and E(x) are given as arguments.

All observations for a given I(x) are summed up and
the average is calculated. Non-observed values are interpolated.

exposures  -- list of exposure times t (seconds)
images     -- images
"""
U_sum = np.array([0.] * 256)
U_count = np.array([0] * 256)

for t, image in zip(exposures, images):
for i in range(len(image)):
pixel_value = int(image[i])
if pixel_value == 255 or pixel_value == 0:
continue # Skip under- or overexposed pixels
accumulated_photons = t * E[i]

# Sum all observations for specific pixel value
U_sum[pixel_value] += accumulated_photons
U_count[pixel_value] += 1

# Calculate U as the average of all observations
observed_intensity = np.zeros(len(U_sum), dtype=bool)
observed_intensity[U_count != 0] = True
U = np.divide(U_sum, U_count, out=np.zeros_like(U_sum),
where=observed_intensity)

return interpolate_missing_values(U)

def recover_U_and_E(exposures, images, iterations=5):
"""Attempt to solve for both the inverse CRF and the sensor irradiance

Given a dataset, perform a number of iterations by alternatingly solving
for U(x) and E(x) to check if they can converge to a solution.
"""
E = np.array([100] * len(images[0]))
U = np.array([0.] * 256)

response_function = lambda x: U[int(x)]

U_list = []
E_list = []
for _ in range(iterations):
U = recover_U(E, exposures, images)
U_list.append(U)
images, exposures,
U=np.vectorize(response_function),
w=weighting_debevek_malik)
E_list.append(E)

return U_list, E_list


Let’s generate the dataset:

images, exposures = generate_crf_calibration_dataset(
G=apply_gamma,
images_per_exposure=5,
bit_depth=12,
start_ms=1,
end_ms=3001,
step_ms=15)


I’ve experimented with the parameters until I got good coverage of pixel values for each pixel in the image. Here are the first ten darkest images from the sequence:

util.display_images(images[0:10], annot=False)


And here’s a representation of the full (subsampled) dataset:

util.display_images(images[0:200:10], annot=False)


Now we optimize for both the irradiance image and the inverse response:

U_list, E_list = recover_U_and_E(exposures, images, iterations = 5);


Here are the inverse response functions after each iteration of the optimization:

util.plot_multiple_inverse_crf(U_list)


And the irradiance images after each iteration:

util.display_images([normalize_255(np.log(E + 1)) for E in E_list])


print("Normalized Ground-truth: ", normalize_255(get_sensor_irradiance()))
print("Normalized Estimate: ", normalize_255(E_list[-1]))
print("Error: ", normalize_255(


Normalized Ground-truth: [ 0. 23. 4. 255.]

Normalized Estimate: [ 0. 23. 4. 255.]

Error: [0. 0. 0. 0.]

This approach quickly converges to the correct solution for both U and E. It starts falling apart if we try it with the more extreme sensor irradiance values.

images, exposures = generate_crf_calibration_dataset(
images_per_exposure=5, bit_depth=12,
start_ms=1, end_ms=3001, step_ms=15)

util.display_images(images[0:10], annot=False)
util.display_images(images[0:200:10], annot=False)


U_list, E_list = recover_U_and_E(exposures, images, iterations=5);
util.plot_multiple_inverse_crf(U_list)
util.display_images([normalize_255(np.log(E + 1)) for E in E_list])


Nothing we can’t resolve by throwing 50 iterations at it 😉

U_list, E_list = recover_U_and_E(exposures, images, iterations=50);
util.plot_multiple_inverse_crf(U_list[::10])
util.display_images([normalize_255(np.log(E + 1)) for E in E_list[::10]])


print("Normalized Ground-truth: ", normalize_255(get_extreme_sensor_irradiance()))
print("Normalized Estimate: ", normalize_255(E_list[-1]))
print("Error: ", normalize_255(


Normalized Ground-truth: [ 0. 5. 15. 255.]

Normalized Estimate: [ 0. 5. 15. 255.]

Error: [0. 0. 0. 0.]

As we’ll see in the next section, real datasets aren’t quite as extreme and converge within a reasonable number of iterations. In this example, we have only 2x2 images so any extreme values constitute a large percentage of the total pixels. Real-world images might have larger extremes but are constrained to a smaller percentage of the image.

## Calibrate from a real dataset 🎉#

And finally, we get to test our algorithms on a non-simulated dataset acquired with an actual camera! Yay!

You can download the dataset that I’ve used from here: Monocular Visual Odometry Dataset. We’re going to be using only two sequences from the dataset: narrow_sweep3 and narrowGamma_sweep3.

If you want to create a calibration dataset for your camera, you need to be able to programmatically control the exposure time. The camera needs to be static and point towards a static scene with good variability in surface reflectances. Ideally, you want to have close to a flat histogram. The light source has to be constant — beware that LED lights usually are not! You can use sunlight on a non-cloudy day.

Once you have that, you can fix the gain of your sensor to the lowest value to minimize noise and swipe the exposure time ( $$t$$ ) in small $$\Delta t$$ steps from the minimum value until the image becomes fully or mostly overexposed — the same way we’ve done in our simulated datasets.

import os
from skimage import io
from skimage.measure import block_reduce

def load_dataset(path: str, with_downsample: bool = True,
block_size: Tuple[int, int] = (10, 10)):
"""Load dataset from the specified path

Subsambling the image size in order reduce the execution time
of the (unoptimized) example algorithms

path              -- path to the dataset
with_downsample   -- enable/disable downsampling
block_size        -- block size that gets sampled down to 1 pixel
"""
image_folder = os.path.join(path, "images")
exposure_file = os.path.join(path, "times.txt")
gt_file = os.path.join(path, "pcalib.txt")

exposures, images = [], []
shape_original, shape_output = None, None

def downsample(image):
"""Do max sampling in order to avoid bias from overexposed values"""
return block_reduce(image, block_size=block_size, func=np.max, cval=0)

with open(exposure_file, "r") as f:
# Parse meta information and append to output lists
[idx, ts, exposure_ms] = line.strip().split(" ")
exposures.append(float(exposure_ms) * 1e-3)

image_folder, idx + ".jpg"), as_gray=True)
image_output = downsample(image_original) \
if with_downsample else image_original
images.append(image_output.flatten())

if shape_output is None:
shape_original = image_original.shape
shape_output = image_output.shape

print("Original size: " + str(shape_original) +
" Output size: " + str(shape_output))

ground_truth = None
with open(gt_file, 'r') as f:
ground_truth = [float(i) for i in f.readline().strip().split()]

return exposures, images, shape_output, ground_truth


Let’s load the (downsampled) linear response dataset:

exposures, images, shape, ground_truth = load_dataset(
path="data/calib_narrow_sweep3")


Original size: (1024, 1280) Output size: (103, 128)

util.display_images(images[::200], shape=shape, annot=False)


I’m going to use the calibration results provided with the Monocular Visual Odometry Dataset as a reference ground-truth.

Let’s attempt to calibrate:

U_list, E_list = recover_U_and_E(exposures, images, iterations=3);
util.plot_multiple_inverse_crf(U_list)
util.display_images([normalize_255(np.log(E + 1)) for E in E_list],
shape=shape, annot=False)


👉
Great, in just 3 iterations our algorithm converges to the correct solution!

Let’s try it on the gamma encoded dataset as well.

exposures, images, shape, ground_truth = load_dataset(
path="data/calib_narrowGamma_sweep3")


Original size: (1024, 1280) Output size: (103, 128)

util.display_images(images[::200], shape=shape, annot=False)


U_list, E_list = recover_U_and_E(exposures, images, iterations=3);
util.plot_multiple_inverse_crf(U_list)
util.display_images([normalize_255(np.log(E + 1)) for E in E_list],
shape=shape, annot=False)


It is amazing to see how well it works even with 1/10 of the available information after subsampling the images from their original size of (1024, 1280) to (103, 128).

💡
Since we haven’t enforced any constraints on the resulting (inverse) response function, we can calibrate for any non-linearity.
💡
An analogous approach can be used to calibrate RGB images. The process can be repeated for each of the R, G, B channels separately.

# Final Words 🏁#

Although simple in its concept, the camera response function lies at the intersection of sensor physics and digital image processing. This makes it a fascinating topic to explore in-depth as it can lead our curiosity to a variety of unexplored paths.

We’ve barely scratched the surface of camera sensor physics, simulation, noise models and applications of the CRF. This article should hopefully provide a convenient starting point for self-learning and further research.

💡
Have a robotics topic in mind that is inadequately covered by existing learning resources? Feel free to reach out to me on 🐦 twitter and it might be next. ;-)