`kernel_control_fwd`

Forward in time stochastic optimal control.

The policy is specified as a sequence of stochastic kernels \(\pi = \lbrace \pi_{0}, \pi_{1}, \ldots, \pi_{N-1} \rbrace\). At each time step, the algorithm solves a constrained optimization problem:

\[\begin{split}\min_{\pi_{t}} \quad & \int_{\mathcal{U}} \int_{\mathcal{X}} f_{0}(y, u) Q(\mathrm{d} y \mid x, u) \pi_{t}(\mathrm{d} u \mid x) \\ \textnormal{s.t.} \quad & \int_{\mathcal{U}} \int_{\mathcal{X}} f_{i}(y, u) Q(\mathrm{d} y \mid x, u) \pi_{t}(\mathrm{d} u \mid x) \leq 0, \quad i = 1, \ldots, m\end{split} \tag{1}\]

Using kernel embeddings of distributions, and assuming the cost and constraint functions \(f_{0}, \ldots, f_{m}\) are in an RKHS, the integrals with respect to the stochastic kernel \(Q\) and the policy \(\pi_{t}\) can be approximated by inner products, e.g. \(\int_{\mathcal{X}} f_{0}(y) Q(\mathrm{d} y \mid x, u) \approx \langle f_{0}, \hat{m}(x, u) \rangle\), where \(\hat{m}(x, u)\) is an empirical estimate of the embedding of \(Q(\cdot \mid x, u)\). We use this to construct an approximation of problem (1) and solve for a policy represented as an element of an RKHS:

\[p_{t}(x) = \sum_{i=1}^{P} \gamma_{i}(x) k(\tilde{u}_{i}, \cdot)\]
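As a rough illustration of the inner-product approximation above, the following sketch estimates \(\int_{\mathcal{X}} f_{0}(y) Q(\mathrm{d} y \mid x, u)\) from samples using a regularized least-squares estimate of the embedding. The Gaussian kernel, the toy dynamics, the helper names `rbf_kernel` and `embedding_weights`, and all parameter values are assumptions made for illustration; they are not taken from gym_socks.

```python
import numpy as np


def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def embedding_weights(XU, query, lam=1e-4, sigma=1.0):
    """Weights beta = (G + lam * M * I)^{-1} k(XU, query) of the empirical embedding."""
    M = XU.shape[0]
    G = rbf_kernel(XU, XU, sigma)
    k = rbf_kernel(XU, query, sigma)
    return np.linalg.solve(G + lam * M * np.eye(M), k)


# Hypothetical sample of (x, u, y) triples from toy dynamics y = x + u + noise.
rng = np.random.default_rng(0)
M = 200
X = rng.uniform(-1.0, 1.0, size=(M, 1))
U = rng.uniform(-1.0, 1.0, size=(M, 1))
Y = X + U + 0.01 * rng.standard_normal((M, 1))


def f0(y):
    """Illustrative cost: squared norm of the next state."""
    return np.sum(y**2, axis=1)


XU = np.hstack([X, U])
query = np.array([[0.2, 0.3]])  # the (x, u) pair at which to evaluate

beta = embedding_weights(XU, query)

# <f0, m_hat(x, u)> reduces to a weighted sum of the cost at the sampled next states.
estimate = float(f0(Y) @ beta[:, 0])
print(estimate)
```

For this toy system the exact expected cost at \((x, u) = (0.2, 0.3)\) is close to \(0.25\), so the printed estimate can be compared against that value.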

The approximate problem is a linear program (LP), and can be solved efficiently using standard optimization solvers.
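The LP can be sketched with `scipy.optimize.linprog`. This is a minimal sketch under stated assumptions, not the library's implementation: it assumes the weights \(\gamma\) form a probability vector over the \(P\) admissible actions, that `c[j]` and `D[i, j]` hold the estimated expected cost and constraint values of action \(\tilde{u}_{j}\) at the current state, and that constraints are feasible when they are at most zero. The function name `solve_policy_lp` and the numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def solve_policy_lp(c, D):
    """Minimize c @ gamma subject to D @ gamma <= 0, sum(gamma) == 1, 0 <= gamma <= 1."""
    c = np.asarray(c, dtype=float)
    D = np.atleast_2d(np.asarray(D, dtype=float))
    P = c.size
    res = linprog(
        c,
        A_ub=D,
        b_ub=np.zeros(D.shape[0]),
        A_eq=np.ones((1, P)),
        b_eq=[1.0],
        bounds=[(0.0, 1.0)] * P,
    )
    if not res.success:
        raise ValueError(res.message)
    return res.x


# Three candidate actions: the cheapest one (index 1) violates the constraint
# on its own, so the solution has to mix it with a feasible action.
c = [1.0, 0.2, 0.5]
D = [[-1.0, 0.5, -0.5]]
gamma = solve_policy_lp(c, D)
print(gamma)
```

In this example the solver splits the probability mass evenly between the second and third actions, which is the cheapest mixture that keeps the expected constraint value at or below zero.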

Note

See examples.benchmark_tracking_problem for a complete example.

class gym_socks.algorithms.control.kernel_control_fwd.KernelControlFwd(cost_fn=None, constraint_fn=None, heuristic=False, regularization_param=None, kernel_fn=None, verbose=True, *args, **kwargs)

Stochastic optimal control policy forward in time.

Computes the optimal control action at each time step in a greedy fashion. In other words, at each time step, the policy optimizes the cost function from the current state. It does not “look ahead” in time.
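The greedy behaviour described above can be sketched as a loop that, at each time step, scores every admissible action by its one-step cost from the current state and applies the best one. The deterministic toy dynamics stand in for the estimated expected cost, and the names `greedy_rollout`, `step`, and `one_step_cost` are hypothetical and not part of gym_socks.

```python
import numpy as np


def greedy_rollout(x0, actions, step, one_step_cost, horizon):
    """At each time step, apply the admissible action with the lowest one-step cost."""
    x = float(x0)
    chosen = []
    for _ in range(horizon):
        # Score each action using only the next state; there is no look-ahead.
        costs = [one_step_cost(step(x, u), u) for u in actions]
        u = actions[int(np.argmin(costs))]
        chosen.append(u)
        x = step(x, u)
    return chosen, x


actions = [-0.5, 0.0, 0.5]
chosen, x_final = greedy_rollout(
    1.2,
    actions,
    step=lambda x, u: x + u,           # toy dynamics (assumption)
    one_step_cost=lambda y, u: y**2,   # drive the state toward the origin
    horizon=3,
)
print(chosen, x_final)
```

Because each decision only considers the next state, the loop stops improving once no single admissible action lowers the one-step cost, even if a longer sequence of actions would.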

Parameters

cost_fn – The cost function. Should return a real value.
constraint_fn – The constraint function. Should return a real value.
heuristic (bool) – Whether to use the heuristic solution instead of solving the LP.
regularization_param (float) – Regularization parameter for the regularized least-squares problem used to construct the approximation.
kernel_fn – The kernel function used by the algorithm.
verbose (bool) – Whether the algorithm should print verbose output.

__call__(time=0, state=None, *args, **kwargs)

Evaluate the policy.

Returns: An action in the action space.

train(S, A)

Train the algorithm.

Parameters

S (numpy.ndarray) – Sample taken i.i.d. from the system evolution.
A (numpy.ndarray) – Collection of admissible control actions.

Returns

An instance of the KernelControlFwd algorithm class.
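A minimal sketch of how the documented pieces might fit together is shown below. It relies only on the constructor arguments, `train(S, A)`, and `__call__(time, state)` listed on this page. The wrapper name `build_policy`, the regularization value, and the exact formats expected for `S`, `A`, `cost_fn`, and `constraint_fn` are assumptions, so check them against the example referenced in the note above.

```python
def build_policy(S, A, cost_fn, constraint_fn=None, regularization_param=1e-3):
    """Construct and train a KernelControlFwd policy (hypothetical wrapper)."""
    # Imported lazily so this sketch can be read without gym_socks installed.
    from gym_socks.algorithms.control.kernel_control_fwd import KernelControlFwd

    algorithm = KernelControlFwd(
        cost_fn=cost_fn,
        constraint_fn=constraint_fn,
        heuristic=False,
        regularization_param=regularization_param,  # illustrative value (assumption)
        verbose=False,
    )
    # train() is documented to return an instance of the algorithm class.
    return algorithm.train(S, A)


# Intended use (not executed here; requires gym_socks and prepared data):
#   policy = build_policy(S, A, cost_fn)
#   action = policy(time=0, state=current_state)
```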

gym_socks.algorithms.control.kernel_control_fwd.kernel_control_fwd(S, A, cost_fn=None, constraint_fn=None, heuristic=False, regularization_param=None, kernel_fn=None, verbose=True)

Stochastic optimal control policy forward in time.

Parameters

S (numpy.ndarray) – Sample taken i.i.d. from the system evolution.
A (numpy.ndarray) – Collection of admissible control actions.
cost_fn – The cost function. Should return a real value.
constraint_fn – The constraint function. Should return a real value.
heuristic (bool) – Whether to use the heuristic solution instead of solving the LP.
regularization_param (Optional[float]) – Regularization parameter for the regularized least-squares problem used to construct the approximation.
kernel_fn – The kernel function used by the algorithm.
verbose (bool) – Whether the algorithm should print verbose output.

Returns

The policy.

TO21: Adam J. Thorpe and Meeko M. K. Oishi. Stochastic optimal control via Hilbert space embeddings of distributions. In 2021 60th IEEE Conference on Decision and Control (CDC), 904–911. 2021. doi:10.1109/CDC45484.2021.9682801.

@inproceedings{thorpe2021stochastic,
    author    = {Thorpe, Adam J. and Oishi, Meeko M. K.},
    booktitle = {2021 60th IEEE Conference on Decision and Control (CDC)},
    title     = {Stochastic Optimal Control via Hilbert Space Embeddings of Distributions},
    year      = {2021},
    pages     = {904--911},
    doi       = {10.1109/CDC45484.2021.9682801}
}
