This is an introductory task where we fit a Multilayer Perceptron (MLP) network to a 2D image.
I created a network according to the following architecture.
The most notable feature of this network is the Positional Encoding layer, which
expands the dimensionality of the input coordinates. Among other benefits, it helps the model capture the high-frequency details in the original image.
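As a rough illustration of the idea, here is a minimal NumPy sketch of a sinusoidal positional encoding. It assumes coordinates normalized to [0, 1], keeps the raw coordinates alongside the sinusoids, and uses frequencies 2^k * pi for k < L. The function name and these conventions are assumptions for illustration and may differ from the layer actually used in this network.

```python
import numpy as np

def positional_encoding(x, max_freq_L=10):
    """Map coordinates to [x, sin(2^k pi x), cos(2^k pi x)] for k = 0..L-1."""
    x = np.asarray(x, dtype=np.float64)
    features = [x]  # assumption: keep the raw coordinates (some variants drop them)
    for k in range(max_freq_L):
        freq = (2.0 ** k) * np.pi
        features.append(np.sin(freq * x))
        features.append(np.cos(freq * x))
    # Concatenate along the last axis: D inputs become D * (2L + 1) features.
    return np.concatenate(features, axis=-1)

coords = np.array([[0.0, 0.0], [0.5, 0.25]])  # two normalized (x, y) pixel coordinates
encoded = positional_encoding(coords, max_freq_L=10)
print(encoded.shape)  # (2, 42)
```

With 2D inputs and max_freq_L=10, each coordinate pair expands from 2 to 2 * (2 * 10 + 1) = 42 features.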

Here's the result of training the network on the provided demo image.

By 500 iterations, the model has already learned the input image well.

Here we demonstrate the result on another image, a photo of the Painted Ladies in San Francisco.

For this picture, the hyperparameters chosen were num_iterations=3000, batch_size=10000, max_freq_L=10.

2. Fit a Neural Radiance Field from Multi-view Images

In this part, we reconstruct the Lego scene from the original NeRF paper. I first implemented functions that help us recover the relationships between the different coordinate spaces.
1. x_w = transform(c2w, x_c)
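This function converts camera-space coordinates to world space using the camera-to-world (c2w) transformation matrix. c2w is the inverse of the world-to-camera (w2c) matrix, which in homogeneous coordinates is represented as
\[
\text{w2c} = \begin{bmatrix} \mathbf{R}_{3 \times 3} & \mathbf{t} \\ \mathbf{0}_{1 \times 3} & 1 \end{bmatrix}
\]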

2. pixel_to_camera(K, uv, s)
This function transforms a point from the pixel coordinate system back to the camera coordinate system, using the intrinsic matrix K and the depth s of the point along the optical axis.
3. pixel_to_ray(K, uv, s)
This function converts a pixel coordinate to a ray with origin and normalized direction.
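The three functions above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not necessarily the implementation used here: points are stored row-wise with shape (N, 3), c2w is a 4x4 homogeneous matrix, and pixel_to_ray takes c2w as an extra argument (which the signature in the text does not list) so that the ray can be expressed in world space, with the depth s defaulting to 1.

```python
import numpy as np

def transform(c2w, x_c):
    """Camera-space points (N, 3) -> world-space points (N, 3) via a 4x4 c2w matrix."""
    x_c = np.asarray(x_c, dtype=np.float64)
    x_h = np.concatenate([x_c, np.ones((x_c.shape[0], 1))], axis=1)  # homogeneous (N, 4)
    return (x_h @ c2w.T)[:, :3]

def pixel_to_camera(K, uv, s):
    """Pixels (N, 2) at depth s -> camera-space points: x_c = s * K^-1 [u, v, 1]^T."""
    uv = np.asarray(uv, dtype=np.float64)
    uv_h = np.concatenate([uv, np.ones((uv.shape[0], 1))], axis=1)
    return s * (uv_h @ np.linalg.inv(K).T)

def pixel_to_ray(K, c2w, uv, s=1.0):
    """Pixels (N, 2) -> world-space ray origins (N, 3) and unit directions (N, 3)."""
    origin = c2w[:3, 3]  # camera center in world space
    x_w = transform(c2w, pixel_to_camera(K, uv, s))
    d = x_w - origin
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # normalize each direction
    rays_o = np.broadcast_to(origin, d.shape).copy()
    return rays_o, d

# Hypothetical camera: focal length 100 px, principal point (50, 50), translated to (1, 2, 3).
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
c2w = np.eye(4)
c2w[:3, 3] = [1.0, 2.0, 3.0]
rays_o, rays_d = pixel_to_ray(K, c2w, np.array([[50.0, 50.0], [0.0, 0.0]]))
```

In this example the pixel at the principal point yields a ray along the camera's z-axis, and every ray starts at the camera center (1, 2, 3).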
In addition, I implemented a DataLoader class that supports sampling rays from images, as well as sampling points along rays.
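For the point-sampling step, one common approach is stratified sampling: split the interval [near, far] into equal bins and draw one depth per bin. The sketch below assumes that approach; the function name, the near=2.0 and far=6.0 bounds, and n_samples=64 are illustrative assumptions rather than values confirmed by this write-up.

```python
import numpy as np

def sample_along_rays(rays_o, rays_d, near=2.0, far=6.0, n_samples=64,
                      perturb=True, rng=None):
    """Return points (N, n_samples, 3) and their depths t (N, n_samples)."""
    rng = np.random.default_rng() if rng is None else rng
    n_rays = rays_o.shape[0]
    edges = np.linspace(near, far, n_samples + 1)  # bin edges along the ray
    lower, upper = edges[:-1], edges[1:]
    if perturb:
        u = rng.random((n_rays, n_samples))        # random offset inside each bin
    else:
        u = np.full((n_rays, n_samples), 0.5)      # bin midpoints (e.g. at test time)
    t = lower + (upper - lower) * u
    # r(t) = o + t * d, broadcast over the sample dimension
    points = rays_o[:, None, :] + rays_d[:, None, :] * t[..., None]
    return points, t

rays_o = np.zeros((4, 3))
rays_d = np.tile(np.array([0.0, 0.0, 1.0]), (4, 1))
points, t = sample_along_rays(rays_o, rays_d, rng=np.random.default_rng(0))
print(points.shape, t.shape)  # (4, 64, 3) (4, 64)
```

Perturbing the depths during training means the network sees different locations along each ray across iterations, instead of always the same fixed set.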
We can visualize the samples:
Next, I implemented the neural architecture as described in the following diagram.
I also coded the discrete approximation of the volume rendering equation:
\[
\begin{align} C(\mathbf{r})=\int_{t_n}^{t_f} T(t) \sigma(\mathbf{r}(t))
\mathbf{c}(\mathbf{r}(t), \mathbf{d}) d t, \text { where } T(t)=\exp \left(-\int_{t_n}^t \sigma(\mathbf{r}(s)) d s\right)
\end{align}
\]
\[
\begin{align}
\hat{C}(\mathbf{r})=\sum_{i=1}^N T_i\left(1-\exp \left(-\sigma_i \delta_i\right)\right) \mathbf{c}_i, \text { where } T_i=\exp
\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right) \end{align}
\]
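The discrete sum can be sketched in a few lines of NumPy. This is a minimal sketch rather than the exact code used here: it assumes densities of shape (N_rays, N_samples), colors of shape (N_rays, N_samples, 3), and step sizes that are either a scalar or an array matching the densities. The transmittance T_i is computed with an exclusive cumulative sum, so that the sum runs over j < i only.

```python
import numpy as np

def volume_render(sigmas, rgbs, deltas):
    """Composite per-sample densities and colors into one color per ray."""
    sigma_delta = sigmas * deltas
    alpha = 1.0 - np.exp(-sigma_delta)  # 1 - exp(-sigma_i * delta_i)
    # Exclusive cumulative sum: entry i holds sum_{j<i} sigma_j * delta_j.
    accum = np.cumsum(sigma_delta, axis=-1)
    accum = np.concatenate([np.zeros_like(accum[..., :1]), accum[..., :-1]], axis=-1)
    T = np.exp(-accum)                  # transmittance T_i
    weights = T * alpha                 # contribution of each sample
    return np.sum(weights[..., None] * rgbs, axis=-2)

# One ray with two samples: a red sample followed by a green one.
sigmas = np.array([[1.0, 2.0]])
rgbs = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
color = volume_render(sigmas, rgbs, deltas=0.5)
```

For this ray the red channel equals 1 - exp(-0.5) and the green channel equals exp(-0.5) * (1 - exp(-1)), which matches the sum term by term.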
Here are the PSNR results from the trained model:
Because the model was trained for a relatively short time (the longer the training time, the better the PSNR), the result is somewhat blurry, but we are still able to see the Lego Technic tractor.
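For reference, PSNR is derived from the mean squared error between the rendered and ground-truth images. The sketch below assumes pixel values normalized to [0, 1], in which case PSNR = -10 * log10(MSE); the function name is an assumption for illustration.

```python
import numpy as np

def psnr(pred, target):
    """Peak signal-to-noise ratio in dB for images with values in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    return -10.0 * np.log10(mse)  # equals 10 * log10(1 / MSE)

# A uniform error of 0.1 per pixel gives MSE = 0.01, i.e. 20 dB.
value = psnr(np.zeros((4, 4, 3)), np.full((4, 4, 3), 0.1))
```

Because the score is logarithmic in the error, each additional 10 dB corresponds to a tenfold reduction in MSE.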