Glimpse PoW

NeRFs

PyTorch implementation from scratch

Introduction

Neural Radiance Fields are a way of storing a 3D scene within a neural network. This way of storing and representing a scene is often called an implicit representation, since the scene parameters are fully represented by the underlying Multi-Layer Perceptron (MLP), as opposed to an explicit representation that stores scene parameters such as color or density directly in voxel grids. This novel way of representing a scene showed very impressive results on novel view synthesis: rendering the scene from camera perspectives that are not in the training set. Furthermore, it allows us to store large scenes with a smaller memory footprint than explicit representations, since we merely need to store the weights of the neural network, whereas voxel grids grow cubically in memory with resolution.

To render a view from this representation, the method proceeds in three steps:

1) March camera rays through the scene to generate a sampled set of 3D points (a minimal sketch of this step follows the list)
2) Use those points and their corresponding 2D viewing directions as input to the neural network to produce an output set of colors and densities
3) Use classical volume rendering techniques to accumulate those colors and densities into a 2D image
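A minimal PyTorch sketch of step 1, assuming a pinhole camera where `focal` is the focal length in pixels and `c2w` is a 3x4 camera-to-world matrix (the names and axis conventions here are illustrative, not taken from the paper's code):

```python
import torch

def get_rays(H, W, focal, c2w):
    """Generate one ray (origin, direction) per pixel of an H x W image.

    Assumes a pinhole camera: `focal` is the focal length in pixels and
    `c2w` is a 3x4 camera-to-world matrix (hypothetical conventions).
    """
    i, j = torch.meshgrid(torch.arange(W, dtype=torch.float32),
                          torch.arange(H, dtype=torch.float32), indexing="xy")
    # Ray directions in camera space (y and z flipped for OpenGL-style axes).
    dirs = torch.stack([(i - W * 0.5) / focal,
                        -(j - H * 0.5) / focal,
                        -torch.ones_like(i)], dim=-1)
    # Rotate into world space; all rays share the camera position as origin.
    rays_d = (dirs[..., None, :] * c2w[:3, :3]).sum(-1)
    rays_o = c2w[:3, -1].expand(rays_d.shape)
    return rays_o, rays_d

def stratified_samples(rays_o, rays_d, near, far, n_samples):
    """Sample n_samples 3D points per ray at stratified (jittered) depths."""
    t = torch.linspace(near, far, n_samples)
    # Jitter each sample within its bin so the network sees a continuous space.
    mids = 0.5 * (t[1:] + t[:-1])
    upper = torch.cat([mids, t[-1:]])
    lower = torch.cat([t[:1], mids])
    t = lower + (upper - lower) * torch.rand(*rays_o.shape[:-1], n_samples)
    pts = rays_o[..., None, :] + rays_d[..., None, :] * t[..., :, None]
    return pts, t
```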

Because this process is naturally differentiable, we can use gradient descent to optimize this model by minimizing the error between each observed image and the corresponding views rendered from our representation. Minimizing this error across multiple views encourages the network to predict a coherent model of the scene by assigning high volume densities and accurate colors to the locations that contain the true underlying scene content.
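A minimal sketch of this optimization loop. `render_rays` and `sample_ray_batch` are hypothetical helpers standing in for steps 1-3 above and a ray/pixel data loader; only the learning rate (5e-4) is from the paper:

```python
import torch

model = NeRFMLP()  # hypothetical: the MLP sketched in the next section
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

for step in range(200_000):  # iteration count on the order the paper reports
    rays_o, rays_d, target_rgb = sample_ray_batch()      # hypothetical loader
    pred_rgb = render_rays(model, rays_o, rays_d)        # hypothetical: steps 1-3
    # Photometric reconstruction error between rendered and observed pixels.
    loss = torch.mean((pred_rgb - target_rgb) ** 2)
    optimizer.zero_grad()
    loss.backward()  # gradients flow through the volume rendering step
    optimizer.step()
```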

Neural Radiance Field Scene Representation

– An approach for representing continuous scenes with complex geometry and materials as 5D neural radiance fields, parameterized as basic MLP networks (a minimal sketch of such an MLP follows this list).
– A differentiable rendering procedure based on classical volume rendering techniques, which we use to optimize these representations from standard RGB images. This includes a hierarchical sampling strategy to allocate the MLP’s capacity towards space with visible scene content.
– A positional encoding to map each input 5D coordinate into a higher dimensional space, which enables us to successfully optimize neural radiance fields to represent high-frequency scene content.
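The paper parameterizes the field as an MLP with eight 256-wide layers, a skip connection that re-injects the encoded position, and a small view-dependent color head. A minimal PyTorch sketch; the layer sizes follow the paper, everything else (names, structure of the modules) is illustrative:

```python
import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    """Simplified NeRF MLP: encoded position + direction -> (RGB, density)."""

    def __init__(self, pos_dim=60, dir_dim=24, hidden=256):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Skip connection: the encoded position is concatenated back in.
        self.stage2 = nn.Sequential(
            nn.Linear(hidden + pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)  # density sees position only
        self.feature = nn.Linear(hidden, hidden)
        self.rgb_head = nn.Sequential(          # color also sees the view direction
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, x_enc, d_enc):
        h = self.stage1(x_enc)
        h = self.stage2(torch.cat([h, x_enc], dim=-1))
        sigma = torch.relu(self.sigma_head(h)).squeeze(-1)
        rgb = self.rgb_head(torch.cat([self.feature(h), d_enc], dim=-1))
        return rgb, sigma
```

Note that the density is produced before the viewing direction is injected, so σ cannot become view-dependent; only the emitted color can, which is what allows the model to capture specular effects while keeping geometry consistent across views.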

Volume rendering requires the volume density σ(x), which can be interpreted as the differential probability of a ray terminating at an infinitesimal particle at location x.
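Concretely, the expected color of a camera ray r(t) = o + t d between its near and far bounds t_n and t_f is given by the volume rendering integral from the paper:

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)
```

where T(t) is the accumulated transmittance along the ray: the probability that the ray travels from t_n to t without hitting any other particle.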

Loss Formulation

As the discretized volumetric rendering equation is fully differentiable, the weights of the underlying neural network can then be trained using a reconstruction loss on the rendered pixels.
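In the paper this is a total squared error over the rays in each batch, applied to both the coarse and the fine renderings:

```latex
\mathcal{L} = \sum_{\mathbf{r} \in \mathcal{R}} \left[ \left\lVert \hat{C}_c(\mathbf{r}) - C(\mathbf{r}) \right\rVert_2^2 + \left\lVert \hat{C}_f(\mathbf{r}) - C(\mathbf{r}) \right\rVert_2^2 \right]
```

where R is the set of rays in a batch, C(r) the ground-truth pixel color, and Ĉ_c, Ĉ_f the coarse and fine predictions (the coarse/fine split is described under Hierarchical Volume Sampling below).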

Quadrature

(To be studied)
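For reference, the paper numerically estimates the rendering integral above with a quadrature rule over the stratified samples t_i:

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big),
\quad
\delta_i = t_{i+1} - t_i
```

A minimal PyTorch sketch of this alpha compositing (tensor shapes are illustrative):

```python
import torch

def volume_render(rgb, sigma, t_vals):
    """Alpha-composite per-sample colors/densities into a pixel color.

    rgb:    (..., n_samples, 3) colors from the MLP
    sigma:  (..., n_samples)    densities from the MLP
    t_vals: (..., n_samples)    sample depths along each ray
    """
    # Distances between adjacent samples; the last interval is effectively infinite.
    deltas = t_vals[..., 1:] - t_vals[..., :-1]
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[..., :1])], dim=-1)
    # Opacity of each segment: alpha_i = 1 - exp(-sigma_i * delta_i).
    alpha = 1.0 - torch.exp(-sigma * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), via an exclusive cumprod.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = alpha * trans                       # contribution of each sample
    color = (weights[..., None] * rgb).sum(dim=-2)
    return color, weights                         # weights are reused for fine sampling
```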

Optimizing a Neural Radiance Field

1) Positional Encoding

Mapping the inputs to a higher dimensional space using high-frequency functions before passing them to the network enables better fitting of data that contains high-frequency variation. A high-frequency sinusoidal function is used to encode the 3D coordinates (and, with fewer frequency bands, the viewing direction). This reformulates FΘ as a composition of two functions, FΘ = F′Θ ∘ γ, one learned (F′Θ) and one fixed (γ).
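A minimal sketch of this encoding, following the paper's definition γ(p) = (sin(2⁰πp), cos(2⁰πp), …, sin(2^(L−1)πp), cos(2^(L−1)πp)), applied with L = 10 to the position and L = 4 to the viewing direction:

```python
import torch

def positional_encoding(p, n_freqs):
    """Encode each coordinate with sin/cos at L geometrically spaced frequencies.

    p: (..., 3) tensor of coordinates (assumed normalized to roughly [-1, 1]).
    Returns a (..., 3 * 2 * n_freqs) tensor.
    """
    freqs = 2.0 ** torch.arange(n_freqs) * torch.pi  # 2^0 pi, ..., 2^(L-1) pi
    angles = p[..., None] * freqs                    # (..., 3, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)                 # (..., 3 * 2L)

# Usage: gamma(x) with L = 10 gives a 60-dim input; gamma(d) with L = 4 gives 24.
x_enc = positional_encoding(torch.rand(1024, 3), n_freqs=10)  # (1024, 60)
```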

2) Hierarchical Volume Sampling

Instead of just using a single network to represent the scene, they simultaneously optimize two networks: one “coarse” and one “fine”. The coarse network is evaluated at stratified samples along each ray, and its per-sample weights from volume rendering (normalized into a piecewise-constant PDF along the ray) are used to draw a second set of samples, via inverse transform sampling, concentrated where the coarse pass found visible content; the fine network is then evaluated at the union of both sample sets. A sketch of the inverse transform sampling step follows.
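A minimal sketch of that inverse transform sampling step, assuming `bins` are the depth values bounding each coarse interval and `weights` are the per-interval weights from the volume_render sketch above (one fewer entry than the number of bin edges):

```python
import torch

def sample_pdf(bins, weights, n_fine):
    """Draw n_fine depths per ray from the piecewise-constant PDF
    defined by the coarse pass's per-bin weights.

    bins:    (..., n_bins + 1) bin edges along each ray (sample depths)
    weights: (..., n_bins)     coarse weights for each bin
    """
    pdf = weights / (weights.sum(dim=-1, keepdim=True) + 1e-10)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)  # (..., n_bins + 1)

    # Uniform samples in [0, 1), inverted through the CDF.
    u = torch.rand(*cdf.shape[:-1], n_fine)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)

    # Linearly interpolate within the selected bin.
    cdf_lo = torch.gather(cdf, -1, idx - 1)
    cdf_hi = torch.gather(cdf, -1, idx)
    bin_lo = torch.gather(bins, -1, idx - 1)
    bin_hi = torch.gather(bins, -1, idx)
    frac = (u - cdf_lo) / (cdf_hi - cdf_lo + 1e-10)
    return bin_lo + frac * (bin_hi - bin_lo)
```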

References

https://huggingface.co/learn/computer-vision-course/en/unit8/nerf
https://arxiv.org/pdf/2003.08934

Other Interesting Papers