Stochastic calculus
Itô and friends
September 19, 2019 — August 31, 2021
Calculus that works, in a certain sense, for random objects, of certain types. Instrumental in stochastic differential equations. This is a popular and well-explored tool and my notes are not supposed to be tutorials, although they might resemble that sometimes. I simply want to maintain a list of useful definitions because the literature gets messy and ambiguous sometimes.
1 Simple integrals of noise
Some of the complexity of stochastic integrals derives from needing to preserve causality for processes happening in time, via various flavours of filtration and so on. For simple stochastic integrals of deterministic functions we can avoid all that. Robert J. Adler, Taylor, and Worsley (2016), for example, construct a standard one:
We define Gaussian noise \(W\) based on spectral density \(\nu\) as a random process defined on the Borel subsets of \(\mathbb{R}^{N}\) such that, for all \(A, B \in \mathcal{B}^{N}\) with \(\nu(A)\) and \(\nu(B)\) finite, \[ \begin{aligned} &W(A) \sim \mathcal{N}(0, \nu(A)) \\ &A \cap B=\emptyset \Rightarrow W(A \cup B)=W(A)+W(B) \text { a.s. }\\ &A \cap B=\emptyset \Rightarrow W(A) \perp W(B). \end{aligned} \] We can think of \(\nu\) as the measure which allocates signal power to spectrum.
Having defined Gaussian noise, we then define the integral \[ \int_{\mathbb{R}^{N}} \varphi(t) W(d t) \] for deterministic \(\varphi\) with \(\int \varphi^{2}(x) \nu(d x)<\infty\). We do this with the “standard machinery”, i.e. we start with simple functions \[ \varphi(t)=\sum_{1}^{n} a_{i} \mathbb{1}_{A_{i}}(t) \] where \(A_{1}, \ldots, A_{n} \subset \mathbb{R}^{N}\) are disjoint, and the \(a_{i}\) are real, and define \[ W(\varphi) \equiv \int_{\mathbb{R}^{N}} \varphi(t) W(d t)=\sum_{1}^{n} a_{i} W\left(A_{i}\right) \] Sums of independent Gaussian RVs are Gaussian, so \(W(\varphi)\) is Gaussian with zero mean and variance \(\sum a_{i}^{2} \nu\left(A_{i}\right)\). Now think of \(W\) as a mapping from simple functions to random variables. We extend it to all functions square integrable with respect to \(\nu\) by taking limits of approximating simple functions: the map \(\varphi \mapsto W(\varphi)\) is an isometry from the simple functions, which are dense in \(L^{2}(\nu)\), into \(L^{2}(\Omega)\), so it extends uniquely to all of \(L^{2}(\nu)\). Further extension to general Lévy noise or processes with non-zero mean is not too complicated.
Defining two simple functions on the same sets, \[ \varphi(t)=\sum_{1}^{n} a_{i} \mathbb{1}_{A_{i}}(t), \quad \psi(t)=\sum_{1}^{n} b_{i} \mathbb{1}_{A_{i}}(t) \] we see that \[ \begin{aligned} \mathbb{E}\{W(\varphi) W(\psi)\} &=\mathbb{E}\left\{\sum_{1}^{n} a_{i} W\left(A_{i}\right) \cdot \sum_{1}^{n} b_{i} W\left(A_{i}\right)\right\} \\ &=\sum_{1}^{n} a_{i} b_{i} \mathbb{E}\left\{\left[W\left(A_{i}\right)\right]^{2}\right\} \\ &=\sum_{1}^{n} a_{i} b_{i} \nu\left(A_{i}\right) \\ &=\int_{\mathbb{R}^{N}} \varphi(t) \psi(t) \nu(d t) \end{aligned} \] Taking the limit, we see that also \[ \mathbb{E}\left\{\int_{\mathbb{R}^{N}} \varphi(t) W(d t) \int_{\mathbb{R}^{N}} \psi(t) W(d t)\right\}=\int_{\mathbb{R}^{N}} \varphi(t) \psi(t) \nu(d t).\]
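As a sanity check, here is a quick Monte Carlo verification of the variance and covariance formulas for simple functions. Everything here is an arbitrary illustrative choice: four disjoint sets represented only by their measures \(\nu(A_i)\), and two coefficient vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four disjoint sets A_i, abstracted to their measures nu(A_i).
nu = np.array([1.0, 0.5, 2.0, 1.5])

# Coefficients of two simple functions defined on the same sets.
a = np.array([1.0, -2.0, 0.5, 3.0])
b = np.array([0.5, 1.0, -1.0, 2.0])

# W(A_i) ~ N(0, nu(A_i)), independent because the A_i are disjoint.
n_mc = 500_000
W_A = rng.normal(scale=np.sqrt(nu), size=(n_mc, nu.size))

# The integrals of the simple functions are weighted sums of the W(A_i).
W_phi = W_A @ a
W_psi = W_A @ b

print("E[W(phi)]       ~", W_phi.mean())            # should be ~ 0
print("E[W(phi)W(psi)] ~", np.mean(W_phi * W_psi))  # ~ sum_i a_i b_i nu(A_i)
print("closed form      =", np.sum(a * b * nu))     # = 7.5 here
```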
Grand! Except we want to extend this construction to complex random functions also, in which case the following is natural for a centred one.
\[ \begin{aligned} &\mathbb{E}\{W(A)\}=0\\ &\mathbb{E}\{W(A) \overline{W(A)}\}=\nu(A) \\ &A \cap B=\emptyset \Rightarrow W(A \cup B)=W(A)+W(B) \text { a.s. } \\ &A \cap B=\emptyset \Rightarrow \mathbb{E}\{W(A) \overline{W(B)}\}=0. \end{aligned} \]
If we want to make it Gaussian in particular, say, we need to specify the distributions precisely. We think of a complex Gaussian RV as simply a 2-dimensional real Gaussian, and all the usual rules apply. Note that because these are complex numbers, some of the rules are subtly weird: the variance \(\nu(A)\) alone does not pin down the law. To fully specify the process we need to decompose that covariance into the joint distribution of real and imaginary parts,
\[ \begin{aligned} &\begin{bmatrix} \mathcal{R}(W(A))\\ \mathcal{I}(W(A))\\ \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} \nu_{\mathcal{R}^2}(A) & \nu_{\mathcal{IR}}(A)\\ \nu_{\mathcal{IR}}(A) & \nu_{\mathcal{I}^2}(A)\\ \end{bmatrix}\right)\\ &\text{ where }\nu_{\mathcal{R}^2}(A) + \nu_{\mathcal{I}^2}(A) = \nu(A)\\ &\text{ and }\nu_{\mathcal{R}^2}(A)\,\nu_{\mathcal{I}^2}(A) > \nu_{\mathcal{IR}}^2(A) \text{ (positive-definiteness).} \end{aligned} \]
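For a single set \(A\), here is a sketch of what such a decomposition looks like numerically; the particular numbers are arbitrary, chosen only to satisfy the constraints above:

```python
import numpy as np

rng = np.random.default_rng(1)

# An admissible decomposition for one set A with nu(A) = 2:
nu_A = 2.0
nu_RR, nu_II, nu_IR = 1.2, 0.8, 0.3  # nu_RR + nu_II = nu_A, nu_RR*nu_II > nu_IR^2
cov = np.array([[nu_RR, nu_IR],
                [nu_IR, nu_II]])

# Sample real and imaginary parts jointly, then assemble W(A).
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
W_A = xy[:, 0] + 1j * xy[:, 1]

# E[W(A) conj(W(A))] = nu_RR + nu_II = nu(A), whatever nu_IR is.
print("E[W conj(W)] ~", np.mean(W_A * np.conj(W_A)).real, " nu(A) =", nu_A)
```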
Once again from Robert J. Adler, Taylor, and Worsley (2016), we get the suggestion that we could think of this stochastic integral as an infinitesimal version of the Karhunen-Loève expansion. Suppose the (eigenvalue, eigenfunction) pairs are \(\{(\lambda_{\omega}, e^{i\langle t, \omega\rangle}) : \omega \in \mathbb{R}^{N}\}\) and that \(\lambda_{\omega} \neq 0\) for only a countable number of \(\omega \in \mathbb{R}^{N}\). Then the stationary, complex version of the Mercer expansion tells us that \[ K(t)=\sum_{\omega} \lambda_{\omega} e^{i\langle t, \omega\rangle} \] while the Karhunen-Loève expansion becomes \[ f(t)=\sum_{\omega} \lambda_{\omega}^{1 / 2} \xi_{\omega} e^{i\langle t, \omega\rangle}. \] Fine so far. But if the basis is not countable it gets weird: \[ K(t)=\int_{\mathbb{R}^{N}} \lambda_{\omega} e^{i\langle t, \omega\rangle} d \omega \] and \[ f(t)=\int_{\mathbb{R}^{N}} \lambda_{\omega}^{1 / 2} \xi_{\omega} e^{i\langle t, \omega\rangle} d \omega. \] Everything is well defined in the first of these integrals, but the second is ill-defined because \(\omega\) now parameterises an uncountable basis; how can we have the \(\xi_{\omega}\) independent for each \(\omega\)? We cannot as such, but we can get an analogue of that by interpreting the spectral representation as a stochastic integral, writing \[ f(t)=\int_{\mathbb{R}^{N}} e^{i\langle t, \omega\rangle} W(d \omega) \] where \(W\) is Gaussian \(\nu\)-noise with spectral measure defined by \(\nu(d \omega)=\lambda_{\omega} d \omega .\) This now evokes classic signal processing textbook exercises, at least to me.
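The countable case is easy to simulate directly, and doing so is essentially one of those signal processing exercises: draw independent complex Gaussian coefficients at finitely many frequencies and sum. A sketch in one dimension, with arbitrary frequencies and weights:

```python
import numpy as np

rng = np.random.default_rng(2)

# Finitely many frequencies omega with spectral weights lambda_omega.
omega = np.array([1.0, 2.0, 5.0, 7.0])
lam = np.array([1.0, 0.5, 0.25, 0.125])

t = np.linspace(0.0, 10.0, 200)

# Independent standard complex Gaussians xi_omega with E|xi|^2 = 1.
n_mc = 5000
xi = (rng.standard_normal((n_mc, omega.size))
      + 1j * rng.standard_normal((n_mc, omega.size))) / np.sqrt(2)

# Karhunen-Loeve / spectral synthesis: f(t) = sum_w lam_w^{1/2} xi_w e^{i t w}.
f = (np.sqrt(lam) * xi) @ np.exp(1j * np.outer(omega, t))  # shape (n_mc, len(t))

# Stationarity check: E|f(t)|^2 should equal K(0) = sum_w lam_w for every t.
print("MC E|f(t)|^2 at t=0, t=5 ~", np.mean(np.abs(f[:, 0])**2),
      np.mean(np.abs(f[:, 100])**2))
print("K(0) = sum of weights     =", lam.sum())
```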
2 Itô Integral
TBD.
3 Itô’s lemma
Specifically, let \(X=(X^{1}, \ldots, X^{n})\) be a tuple of semimartingales and let \(f: \mathbb{R}^{n} \to \mathbb{R}\) have continuous second-order partial derivatives. Then \(f(X)\) is also a semimartingale and the following formula holds:
\[\begin{aligned} f\left(X_{t}\right) - f\left(X_{0}\right) &= \sum_{i=1}^{n} \int_{0+}^{t} \frac{\partial f}{\partial x_{i}}\left(X_{s-}\right) \mathrm{d} X_{s}^{i} \\ &\quad +\frac{1}{2} \sum_{1 \leq i, j \leq n} \int_{0+}^{t} \frac{\partial^{2} f}{\partial x_{i} \partial x_{j}}\left(X_{s-}\right) \mathrm{d}\left[X^{i}, X^{j}\right]_{s}^{c} \\ &\quad +\sum_{0<s \leq t}\left(f\left(X_{s}\right)-f\left(X_{s-}\right)-\sum_{i=1}^{n} \frac{\partial f}{\partial x_{i}}\left(X_{s-}\right) \Delta X_{s}^{i}\right) \end{aligned}\]
Here the bracket term is the quadratic covariation (the quadratic variation when \(X=Y\)), defined via integration by parts: \[ [X,Y] := XY-\int X_{s-} \mathrm{d}Y(s)-\int Y_{s-} \mathrm{d}X(s) \]
For a continuous semimartingale, the jump terms vanish and the left limits equal the function values, so \[\begin{aligned} f\left(X_{t}\right) - f\left(X_{0}\right) &= \sum_{i=1}^{n} \int_{0+}^{t} \frac{\partial f}{\partial x_{i}}\left(X_{s}\right) \mathrm{d} X_{s}^{i} \\ &\quad +\frac{1}{2} \sum_{1 \leq i, j \leq n} \int_{0+}^{t} \frac{\partial^{2} f}{\partial x_{i} \partial x_{j}}\left(X_{s}\right) \mathrm{d}\left[X^{i}, X^{j}\right]_{s}^{c} \end{aligned}\]
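A quick numerical check of the continuous case, with \(f(x)=x^{2}\) and \(X=W\) a standard Brownian motion, so the formula reduces to \(W_{T}^{2} = 2\int_{0}^{T} W_{s}\,\mathrm{d}W_{s} + [W,W]_{T}\) and \([W,W]_{T}=T\). The discretisation is the obvious left-endpoint one:

```python
import numpy as np

rng = np.random.default_rng(3)

T, n = 1.0, 100_000
dt = T / n
dW = rng.normal(scale=np.sqrt(dt), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])

# Ito integral 2 * int_0^T W_s dW_s as a left-endpoint Riemann sum.
ito_term = 2.0 * np.sum(W[:-1] * dW)

# Quadratic variation [W, W]_T as the sum of squared increments (~ T).
qv = np.sum(dW**2)

# Ito's lemma for f(x) = x^2: W_T^2 - W_0^2 = 2 int W dW + [W, W]_T.
print("lhs =", W[-1]**2, " rhs =", ito_term + qv)
```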
Some authors assume that in Itô calculus the driving noise is not a general semimartingale but a Brownian motion, which is in particular continuous. Others use the term Itô calculus for the more general version. Sometimes the distinction is made between an Itô diffusion, which has a Brownian driving term, and a Lévy SDE, whose driving term need not be Brownian. This usage is generally messy, and particularly in tutorials by the applied finance people it can be hard to work out which set of definitions they are using.
4 Itô isometry
TBD. For now, see Quadratic Variations and the Ito Isometry – Almost Sure.
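In the meantime, a minimal Monte Carlo illustration of the isometry \(\mathbb{E}\big[(\int_{0}^{T} H_{s}\,\mathrm{d}W_{s})^{2}\big] = \mathbb{E}\big[\int_{0}^{T} H_{s}^{2}\,\mathrm{d}s\big]\), taking the adapted integrand \(H_{s}=W_{s}\) so that both sides equal \(T^{2}/2\):

```python
import numpy as np

rng = np.random.default_rng(4)

T, n, n_mc = 1.0, 1000, 20_000
dt = T / n
dW = rng.normal(scale=np.sqrt(dt), size=(n_mc, n))
W = np.hstack([np.zeros((n_mc, 1)), np.cumsum(dW, axis=1)])

# Ito integral int_0^T W_s dW_s, one left-endpoint sum per sample path.
I = np.sum(W[:, :-1] * dW, axis=1)

# Isometry: E[I^2] = E[int_0^T W_s^2 ds] = int_0^T s ds = T^2 / 2.
print("E[I^2] ~", np.mean(I**2), " T^2/2 =", T**2 / 2)
```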
5 Stratonovich Integral
A variant of the Itô integral which can “look forward” infinitesimally in time: the integrand is evaluated at the midpoint of each increment rather than the left endpoint, which restores the classical chain rule. It has an alternative justification in terms of rough paths.
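The difference is visible in discretisation. For \(\int_{0}^{T} W\,\mathrm{d}W\) the Itô sum converges to \((W_{T}^{2}-T)/2\) while the Stratonovich sum converges to \(W_{T}^{2}/2\). A sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

T, n = 1.0, 100_000
dt = T / n
dW = rng.normal(scale=np.sqrt(dt), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])

# Ito: evaluate the integrand at the left endpoint of each increment.
ito = np.sum(W[:-1] * dW)

# Stratonovich: evaluate at the midpoint (average of the endpoints).
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)

print("Ito  :", ito, " vs (W_T^2 - T)/2 =", (W[-1]**2 - T) / 2)
print("Strat:", strat, " vs  W_T^2 / 2   =", W[-1]**2 / 2)
```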
6 Doss-Sussmann transform
Reducing a stochastic integral to a deterministic one, i.e. replacing the Itô integral with a Lebesgue one. (Sussmann 1978; Karatzas and Ruf 2016).
Assumptions: \(\sigma \in C^{1,1}(\mathbb{R})\), \(\sigma, \sigma^{\prime} \in L^{\infty}\), \(b \in C^{0,1}\). Then \[ \mathrm{d} X(t)=b(X(t)) \mathrm{d} t+\frac{1}{2} \sigma(X(t)) \sigma^{\prime}(X(t)) \mathrm{d} t+\sigma(X(t)) \mathrm{d} W(t) \] has a unique (strong) solution \(X=u(W, Y)\) for some \(u \in C^{2}(\mathbb{R}^{2})\) and \[ \mathrm{d} Y(t)=f(W(t), Y(t)) \mathrm{d} t \] for some \(f \in C^{0,1}\).
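Here is a sketch in the one example where everything has a closed form: take \(\sigma(x)=x\) and \(b(x)=\mu x\). Then \(u(w, y)=y e^{w}\) solves \(\partial_{w} u=\sigma(u)\), the random ODE collapses to \(\dot{Y}=\mu Y\), and the pathwise solution is \(X_{t}=x_{0} e^{\mu t+W_{t}}\). The parameter choices are mine; compare against Euler-Maruyama on the same Brownian path:

```python
import numpy as np

rng = np.random.default_rng(6)

# sigma(x) = x, b(x) = mu * x, so the Ito SDE above is
#   dX = (mu + 1/2) X dt + X dW.
mu, x0, T, n = 0.3, 1.0, 1.0, 100_000
dt = T / n
dW = rng.normal(scale=np.sqrt(dt), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])
t = np.linspace(0.0, T, n + 1)

# Doss-Sussmann: X_t = u(W_t, Y_t) with u(w, y) = y e^w and Y_t = x0 e^{mu t},
# i.e. a deterministic (pathwise) function of the Brownian path.
X_ds = x0 * np.exp(mu * t + W)

# Euler-Maruyama on the Ito SDE, driven by the same increments.
X_em = x0 * np.cumprod(1.0 + (mu + 0.5) * dt + dW)

print("terminal values:", X_ds[-1], X_em[-1])  # should roughly agree
```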
See also Rogers and Williams (1987) section V.28.
Question: does this connect with rough path approaches?
7 Rough path integrals
See rough path.
8 Paley-Wiener integral
There is a narrower, but lazier, version of the Itô integral. Jonathan Mattingly introduces it in Paley-Wiener-Zygmund Integral. Assume \(f\) is continuous with a continuous first derivative and that \(f(1)=0\).
We define the stochastic integral \(\int_{0}^{1} f(t) * \mathrm{d} W(t)\) for these functions by the standard Riemann integral, \[ \int_{0}^{1} f(t) * \mathrm{d} W(t)=-\int_{0}^{1} f^{\prime}(t) W(t) \mathrm{d} t \] Then \[ \mathbf{E}\left[\left(\int_{0}^{1} f(t) * \mathrm{d} W(t)\right)^{2}\right]=\int_{0}^{1} f^{2}(t) \mathrm{d} t. \] Paley, Wiener, and Zygmund then used this isometry to extend the integral to \(f\in L^{2}[0,1]\) as the limit of approximating continuous functions.
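For example, \(f(t)=1-t\) satisfies the assumptions, with \(-f^{\prime} \equiv 1\), so the PWZ integral is just \(\int_{0}^{1} W_{t}\,\mathrm{d}t\), and the isometry says its variance is \(\int_{0}^{1}(1-t)^{2}\,\mathrm{d}t = 1/3\). A quick check:

```python
import numpy as np

rng = np.random.default_rng(7)

n, n_mc = 1000, 50_000
dt = 1.0 / n

dW = rng.normal(scale=np.sqrt(dt), size=(n_mc, n))
W = np.hstack([np.zeros((n_mc, 1)), np.cumsum(dW, axis=1)])

# PWZ integral for f(t) = 1 - t: an ordinary Riemann sum of -f'(t) W(t) = W(t).
I = np.sum(W[:, :-1], axis=1) * dt

# Isometry: E[I^2] = int_0^1 f(t)^2 dt = 1/3.
print("E[I^2] ~", np.mean(I**2), " int f^2 =", 1.0 / 3.0)
```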
What does this get us in terms of SDEs?