On metrics -- Advanced Notebook¤
This will get mathematical, be warned!
⚠️ ⚠️ ⚠️ ⚠️ ⚠️ This notebook is a WIP; it will come with a future release of Exponax ⚠️ ⚠️ ⚠️ ⚠️ ⚠️
At the moment, it is a dump of ideas on metric consistency with functional norms, the connection to Parseval's theorem, and the relation to the Fourier transform.
```python
import jax
import jax.numpy as jnp

import exponax as ex
```
Consistency of the metrics computation¤
The discretized states in Exponax, \(u_h \in \mathbb{R}^{C \times N}\), represent continuous functions sampled at an equidistant interval \(\Delta x = L/N\), where \(L\) is the length of the domain and \(N\) is the number of discretization points. Since we only work with periodic boundary conditions, we employ the convention that the left point of the domain is a degree of freedom while the right point is not. Hence, \(u_0\) refers to the value of the continuous function at \(x = 0\), i.e., \(u(0)\), and \(u_{N-1}\) refers to its value at \(x = \frac{L}{N} (N-1)\), i.e., \(u(\frac{L}{N} (N-1))\).
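As a quick illustration of this convention, here is a minimal sketch using plain `jax.numpy` (the values of \(L\) and \(N\) are arbitrary choices; Exponax's own grid utilities may differ in detail):

```python
# Sketch of the periodic grid convention: N equidistant points on [0, L);
# the left endpoint is a degree of freedom, the right endpoint is not.
import jax.numpy as jnp

L = 3.0  # domain extent (hypothetical value)
N = 8    # number of degrees of freedom (hypothetical value)

dx = L / N
x = jnp.arange(N) * dx  # = [0, dx, 2*dx, ..., L - dx]
print(x)
```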
Now assume we wanted to compute the squared \(L^2\) norm of the function \(u(x)\) over the domain \(\Omega = (0, L)\):

\[
\| u \|_{L^2(\Omega)}^2 = \int_0^L |u(x)|^2 \, dx.
\]
A way to numerically approximate an integral from values given at equidistant sample points is the trapezoidal rule. Assume we wanted to evaluate the following integral

\[
I = \int_0^L f(x) \, dx.
\]
The trapezoidal rule states that

\[
I \approx \Delta x \left( \frac{1}{2} f(x_0) + \sum_{i=1}^{M-2} f(x_i) + \frac{1}{2} f(x_{M-1}) \right),
\]
where \(\Delta x = L/(M-1)\) is the distance between two consecutive points and \(x_i = i \, \Delta x\). In contrast to our discretization on periodic grids, the trapezoidal rule also accounts for the point on the right end of the domain. However, since the value at the right end of the domain must equal the value at the left end, we have \(f(0) = f(L)\), and the trapezoidal rule simplifies to

\[
I \approx \Delta x \sum_{i=0}^{M-2} f(x_i).
\]
Or, if we had \(f(x)\) discretized as \(f_h \in \mathbb{R}^N\) with \((f_h)_i = f(i \, \Delta x)\) following the periodic convention (i.e., \(N = M - 1\)), we get

\[
I \approx \Delta x \sum_{i=0}^{N-1} (f_h)_i,
\]
or, expressed in terms of \(L\) and \(N\),

\[
I \approx \frac{L}{N} \sum_{i=0}^{N-1} (f_h)_i.
\]
This is exactly a scaled mean

\[
I \approx L \cdot \text{mean}(f_h).
\]
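The following sketch (plain `jax.numpy`; the test function and resolution are arbitrary choices) verifies that, for a periodic function, the full trapezoidal rule including the right endpoint collapses to the scaled mean on the periodic grid:

```python
# Sketch: for a periodic function, the trapezoidal rule over M = N + 1 points
# (including the right endpoint) collapses to L * mean over the N periodic
# points. Implemented manually to avoid assuming any quadrature helper.
import jax.numpy as jnp

L = 3.0
N = 32
f = lambda x: jnp.exp(jnp.sin(2 * jnp.pi * x / L))  # periodic test function

# Full trapezoidal rule on M = N + 1 points, including x = L
x_full = jnp.linspace(0.0, L, N + 1)
f_full = f(x_full)
dx = L / N
trapezoid = dx * (0.5 * f_full[0] + jnp.sum(f_full[1:-1]) + 0.5 * f_full[-1])

# Scaled mean on the N periodic points (right endpoint dropped)
x_per = jnp.arange(N) * (L / N)
scaled_mean = L * jnp.mean(f(x_per))

print(trapezoid, scaled_mean)  # identical up to floating-point error
```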
Since we actually wanted to evaluate the integral over the squared absolute function, we have that

\[
\| u \|_{L^2(\Omega)}^2 = \int_0^L |u(x)|^2 \, dx \approx \frac{L}{N} \sum_{i=0}^{N-1} |u_i|^2,
\]
or, again in terms of the mean,

\[
\| u \|_{L^2(\Omega)}^2 \approx L \cdot \text{mean}(|u_h|^2).
\]
Taking the mean over the element-wise square is nothing else than the MSE (mean squared error) against zero

\[
\| u \|_{L^2(\Omega)}^2 \approx L \cdot \text{MSE}(u_h, 0).
\]
Hence, the consistent counterpart to the squared (functional) \(L^2\) norm is the scaled MSE.
For the regular \(L^2\) norm, we have that

\[
\| u \|_{L^2(\Omega)} = \sqrt{\int_0^L |u(x)|^2 \, dx}.
\]
As such, we get a consistent counterpart

\[
\| u \|_{L^2(\Omega)} \approx \sqrt{L \cdot \text{MSE}(u_h, 0)} = \sqrt{L} \cdot \sqrt{\text{MSE}(u_h, 0)}
\]

(TODO: check this). Roughly, we can say that

\[
\| u \|_{L^2(\Omega)} \approx \sqrt{L} \cdot \text{RMSE}(u_h, 0),
\]

and we can identify the RMSE as the consistent counterpart to the \(L^2\) norm.
It is scaled by the square root of the length of the domain.
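A minimal numerical check of the two identifications above, assuming nothing beyond `jax.numpy` (i.e., not using Exponax's metric functions) and an arbitrarily chosen sine on a domain of length \(L = 3\):

```python
# Scaled MSE ~ squared L2 norm, and sqrt(L) * RMSE ~ L2 norm,
# checked against analytically known values for a sine.
import jax.numpy as jnp

L = 3.0  # domain length (arbitrary choice)
N = 64   # number of degrees of freedom

# Periodic grid: left endpoint included, right endpoint excluded
x = jnp.arange(N) * (L / N)
u_h = jnp.sin(2 * jnp.pi * x / L)

# Scaled MSE against zero ~ squared L2 norm; analytically this is L/2
squared_l2_approx = L * jnp.mean(u_h**2)

# sqrt(L) * RMSE ~ L2 norm; analytically this is sqrt(L/2)
l2_approx = jnp.sqrt(L) * jnp.sqrt(jnp.mean(u_h**2))

print(squared_l2_approx, L / 2)        # both ~1.5
print(l2_approx, jnp.sqrt(L / 2))      # both ~1.2247
```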
Requirements¤
The quadratic convergence of the MSE-based estimate is only valid if the function is at least twice continuously differentiable. For this to hold on the whole periodic domain, the function must also be periodic. In such a case, the estimate might even converge exponentially fast (https://en.wikipedia.org/wiki/Trapezoidal_rule#Periodic_and_peak_functions)!
As a consequence, a bandlimited discrete function representation might not even have a discretization error at all!
On the other hand, if the function is not periodic, the estimate likely does not converge quadratically. It then only converges linearly (https://en.wikipedia.org/wiki/Riemann_sum#Left_rule), like a left Riemann sum, provided the function is at least continuous.
Conclusion¤
Assuming we are on the periodic domain, we have:
- A bandlimited function is exactly integrated
- A non-bandlimited but smooth, periodic function converges exponentially (similar to how the spectral derivative converges)
- A discontinuous function converges linearly
Due to the special way periodic grids are laid out, we will never encounter the case of quadratic convergence.
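The sketch below (plain `jax.numpy`; the test functions and resolutions are arbitrary choices) illustrates the three regimes: exact integration of a bandlimited function, exponential convergence for a smooth periodic one, and linear convergence for a function whose periodic extension is discontinuous.

```python
# Convergence sketch for the "L * mean" quadrature on periodic grids.
# Double precision is enabled so the fast-converging cases are not masked
# by float32 round-off.
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

L = 1.0

def scaled_mean_integral(f, n):
    x = jnp.arange(n) * (L / n)  # periodic grid, right endpoint excluded
    return L * jnp.mean(f(x))

cases = {
    # Bandlimited: integrated exactly (up to round-off)
    "bandlimited": (lambda x: 1.0 + jnp.sin(2 * jnp.pi * x), 1.0),
    # Smooth and periodic, but not bandlimited: exponential convergence;
    # the exact integral is the modified Bessel function I_0(1)
    "smooth periodic": (lambda x: jnp.exp(jnp.sin(2 * jnp.pi * x)), 1.2660658777520084),
    # f(x) = x has a jump in its periodic extension: linear convergence,
    # the error is exactly 1 / (2 n)
    "discontinuous extension": (lambda x: x, 0.5),
}

for name, (f, exact) in cases.items():
    errors = [float(abs(scaled_mean_integral(f, n) - exact)) for n in (4, 8, 16, 32)]
    print(name, errors)
```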
Mean Absolute Error (MAE)¤
Is it consistent with the \(L^1\) norm? By the same argument as above, the scaled mean of the absolute values, \(L \cdot \text{mean}(|u_h|) = L \cdot \text{MAE}(u_h, 0)\), should be the consistent counterpart to \(\| u \|_{L^1(\Omega)} = \int_0^L |u(x)| \, dx\).
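A small sanity check of this claim, again a sketch with plain `jax.numpy` and an arbitrarily chosen sine:

```python
# Scaled MAE against zero ~ L1 norm, checked against the analytic value.
import jax.numpy as jnp

L = 3.0
N = 256
x = jnp.arange(N) * (L / N)
u_h = jnp.sin(2 * jnp.pi * x / L)

# Analytically, int_0^L |sin(2 pi x / L)| dx = 2 L / pi
l1_approx = L * jnp.mean(jnp.abs(u_h))
print(l1_approx, 2 * L / jnp.pi)  # both ~1.91
```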
Higher dimensions¤
In higher dimensions with a domain \(\Omega = (0, L)^D\), where \(D\) is the number of spatial dimensions, and the same convention for periodic boundary conditions, we have that

\[
\| u \|_{L^2(\Omega)}^2 = \int_\Omega |u(x)|^2 \, dx \approx \left( \frac{L}{N} \right)^D \sum_{i \in \{0, \dots, N-1\}^D} |u_i|^2.
\]
Assuming the \(\text{mean}\) function takes the mean over the flattened spatial axes with \(N^D\) elements, we have that

\[
\| u \|_{L^2(\Omega)}^2 \approx L^D \cdot \text{mean}(|u_h|^2),
\]
or in terms of the MSE

\[
\| u \|_{L^2(\Omega)}^2 \approx L^D \cdot \text{MSE}(u_h, 0).
\]
Correspondingly, the RMSE is the consistent counterpart to the \(L^2\) norm in higher dimensions

\[
\| u \|_{L^2(\Omega)} \approx \sqrt{L^D} \cdot \text{RMSE}(u_h, 0).
\]
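A sketch of the two-dimensional case (\(D = 2\)) with a hypothetical product-of-sines state, again using plain `jax.numpy`:

```python
# 2D check: scaled MSE over all N**D grid points ~ squared L2 norm.
import jax.numpy as jnp

L = 3.0
N = 64
D = 2

x_1d = jnp.arange(N) * (L / N)
X, Y = jnp.meshgrid(x_1d, x_1d, indexing="ij")
u_h = jnp.sin(2 * jnp.pi * X / L) * jnp.sin(2 * jnp.pi * Y / L)

# Analytically, the squared L2 norm of this product of sines is (L/2)**2
squared_l2_approx = L**D * jnp.mean(u_h**2)
print(squared_l2_approx, (L / 2) ** 2)  # both ~2.25
```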
Multiple Channels¤
If the underlying function is a vector-valued function \(u(x) \in \mathbb{R}^C\), we can compute the (squared) \(L^2\) norm of the function as

\[
\| u \|_{L^2(\Omega)}^2 = \int_\Omega \| u(x) \|_2^2 \, dx = \int_\Omega \sum_{c=1}^{C} |u_c(x)|^2 \, dx.
\]
Hence, the consistent MSE reads

\[
\| u \|_{L^2(\Omega)}^2 \approx L^D \cdot \text{mean}\left( \| u_h \|_2^2 \right),
\]

with the inner product (i.e., the squared Euclidean norm) being understood only over the leading channel axis and the mean taken over the \(N^D\) spatial degrees of freedom.
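A sketch with two channels chosen such that the analytic result is known exactly (\(\sin^2 + \cos^2 = 1\)), using plain `jax.numpy` arrays in the Exponax state layout:

```python
# Multi-channel (C = 2) state on a 1D periodic grid, shape (C, N).
import jax.numpy as jnp

L = 3.0
N = 64
x = jnp.arange(N) * (L / N)

u_h = jnp.stack([
    jnp.sin(2 * jnp.pi * x / L),
    jnp.cos(2 * jnp.pi * x / L),
])

# Sum the squares over the channel axis, then take the spatial mean.
# Here |u(x)|^2 = sin^2 + cos^2 = 1, so the integral is exactly L.
squared_l2_approx = L * jnp.mean(jnp.sum(u_h**2, axis=0))
print(squared_l2_approx, L)  # both ~3.0
```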
Differences between \(p_2\), \(l_2\), and \(L_2\) norms and their relation to commonly used metrics¤
https://mathworld.wolfram.com/L2-Norm.html
Parseval's Identity: Spatial and Fourier aggregator¤
Conceptually, both do the same, but the Fourier aggregator can do more in that it also allows filtering and taking derivatives. The latter gives rise to Sobolev-based losses.
However, they are only identical if the function is bandlimited.
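Below is a minimal sketch of this identity on the discrete level, using `jnp.fft` directly rather than Exponax's aggregator utilities (whose exact API is not assumed here):

```python
# Parseval's identity on the discrete level: aggregating the squared state
# in space or over its orthonormally scaled Fourier coefficients gives the
# same value.
import jax.numpy as jnp

L = 3.0
N = 64
x = jnp.arange(N) * (L / N)
u_h = jnp.exp(jnp.sin(2 * jnp.pi * x / L))  # some periodic, real-valued state

spatial_sum = jnp.sum(u_h**2)
fourier_sum = jnp.sum(jnp.abs(jnp.fft.fft(u_h, norm="ortho")) ** 2)

print(spatial_sum, fourier_sum)  # identical up to floating-point error

# A Sobolev-like variant would additionally scale each Fourier coefficient
# by a function of its wavenumber before aggregating (e.g., for derivatives).
```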