For OpenAI Gym Users#
SOCKS and OpenAI gym have a lot in common, but SOCKS departs from gym in a few key ways.
In this guide, we go over some of these differences and help get you started with SOCKS.
1. gym.Env vs. DynamicalSystem#
An environment in gym (gym.Env) is a class with step(), reset(),
render(), and close() methods. A
DynamicalSystem inherits from
gym.Env, which means it has all of the same features.
However, gym environments are designed to encapsulate every part of the runtime loop, including computing the cost, determining simulation termination, and tracking and simulating external “world” objects such as obstacles. This can reduce transparency for the designer and limit modularity in the simulation design process.
In SOCKS, this functionality is handled outside the environment class. Calculating cost, simulating obstacles, determining the conditions for ending simulations, etc. are left to the designer.
In addition, there are a few changes to the class interface and the available attributes.
dynamics()#
The step() function for
dynamical systems generally does not need to be overridden when defining custom
environments. The critical component of a dynamical system is the system dynamics. We
define a new method,
dynamics(), that is used
internally by an initial value problem (IVP) solver in the
step() function. This means we
only need to define a custom
dynamics() method when
subclassing a DynamicalSystem.
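The split between step() and dynamics() can be sketched without SOCKS itself. The `ToySystem` base class below is a hypothetical stand-in, not the SOCKS DynamicalSystem; its constructor arguments and the `dynamics(time, state, action)` signature are assumptions made for illustration, and a fixed-step forward Euler update stands in for the IVP solver.

```python
class ToySystem:
    """Hypothetical stand-in for a dynamical system base class (not the SOCKS API)."""

    def __init__(self, state, sampling_time=0.1, substeps=100):
        self.state = list(state)
        self.sampling_time = sampling_time
        self.substeps = substeps
        self.time = 0.0

    def dynamics(self, time, state, action):
        """Return the state derivative dx/dt. Subclasses override only this."""
        raise NotImplementedError

    def step(self, action):
        """Advance one sampling period by integrating dynamics().

        Forward Euler is used here as a simple placeholder for an IVP solver.
        """
        h = self.sampling_time / self.substeps
        for _ in range(self.substeps):
            dx = self.dynamics(self.time, self.state, action)
            self.state = [x + h * d for x, d in zip(self.state, dx)]
            self.time += h
        return list(self.state)


class DoubleIntegrator(ToySystem):
    """Position/velocity system driven by an acceleration input."""

    def dynamics(self, time, state, action):
        position, velocity = state
        return [velocity, action[0]]


system = DoubleIntegrator(state=[0.0, 0.0])
next_state = system.step([1.0])  # one sampling period under unit acceleration
```

The subclass never touches step(); it only states how the state changes over time, which is the division of labor described above.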
state_space#
In gym, an environment nominally has only an
action_space and an
observation_space. The
reason for omitting a state space is unclear, since this is an essential component of a
POMDP definition, which is generally what the gym.Env model is based on.
In SOCKS, we correct this by adding a
state_space attribute to the
class. The action_space
remains unchanged, but the actual underlying state of the system is an element of the
state_space, and the output
of the step() function is an
element of the
observation_space.
generate_observation()#
In the underlying mathematical theory, an observation is generated by an observation
function that takes the current state, action, and a random noise variable
representing measurement noise (which is distinct from process noise), and outputs an
observation. In the DynamicalSystem class,
we separate this out into its own method.
Classes inheriting from DynamicalSystem
should override this method in order to include measurement noise or limit the state
variables that are observable. By default, this function simply returns the true
underlying state of the system (meaning the system is fully observable).
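As a rough illustration of overriding the observation function, the sketch below uses a hypothetical `ToySystem` stand-in rather than the SOCKS DynamicalSystem. The `generate_observation(time, state, action)` signature, the `noise_std` parameter, and the seeded generator are assumptions chosen for the example.

```python
import random


class ToySystem:
    """Hypothetical stand-in base class (not the SOCKS API)."""

    def generate_observation(self, time, state, action):
        """Default: the system is fully observable, so return the true state."""
        return list(state)


class NoisyPositionSensor(ToySystem):
    """Observes only the first state variable, with Gaussian measurement noise."""

    def __init__(self, noise_std=0.05, seed=0):
        self.noise_std = noise_std
        self._rng = random.Random(seed)  # seeded for repeatable examples

    def generate_observation(self, time, state, action):
        position = state[0]  # the velocity in state[1] is not observable
        return [position + self._rng.gauss(0.0, self.noise_std)]


true_state = [1.0, -0.5]
full = ToySystem().generate_observation(0.0, true_state, [0.0])
partial = NoisyPositionSensor().generate_observation(0.0, true_state, [0.0])
```

Here `full` has the same dimension as the state, while `partial` has a single noisy entry, so the observation lives in a smaller space than the state.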
generate_disturbance()#
We also separate the disturbance from the
step() and
dynamics() functions. In
SOCKS, we generate the disturbance independently in its own method.
Classes inheriting from DynamicalSystem
should override this method to specify the type of disturbance which affects the
dynamical evolution of the system, for instance to increase or decrease the process
noise that affects the state or change the noise distribution.
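A minimal sketch of the same idea for process noise follows. It again uses a hypothetical `ToySystem` stand-in, not the SOCKS class; the method signatures, the discrete-time update, and the `half_width` parameter are assumptions for illustration only.

```python
import random


class ToySystem:
    """Hypothetical stand-in base class (not the SOCKS API)."""

    def __init__(self, state, seed=0):
        self.state = list(state)
        self._rng = random.Random(seed)

    def generate_disturbance(self, time, state, action):
        """Default in this sketch: no process noise."""
        return [0.0] * len(state)

    def dynamics(self, time, state, action, disturbance):
        """Toy discrete-time update: x_next = x + u + w."""
        return [x + u + w for x, u, w in zip(state, action, disturbance)]

    def step(self, action, time=0.0):
        # The disturbance is drawn in its own method, then passed to dynamics().
        w = self.generate_disturbance(time, self.state, action)
        self.state = self.dynamics(time, self.state, action, w)
        return list(self.state)


class UniformNoiseSystem(ToySystem):
    """Same dynamics, but with bounded uniform process noise."""

    def __init__(self, state, half_width=0.1, seed=0):
        super().__init__(state, seed)
        self.half_width = half_width

    def generate_disturbance(self, time, state, action):
        return [self._rng.uniform(-self.half_width, self.half_width) for _ in state]


nominal = ToySystem([0.0, 0.0]).step([1.0, 2.0])
noisy = UniformNoiseSystem([0.0, 0.0]).step([1.0, 2.0])
```

Changing the noise distribution or its magnitude only requires a different generate_disturbance(); the dynamics and the stepping logic stay untouched.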
2. Policies#
In SOCKS, we typically represent the policy controlling a system as a separate object or function. During simulation, the policy is a function that returns a control action (and may depend on the simulation time and/or the system state).
SOCKS defines a class, BasePolicy, that has a
simple Callable interface, meaning it requires that the policy implement a
__call__() method.
Some of the algorithms use this class, but it serves mainly to organize code and provide a consistent interface rather than to enforce a strict simulation scheme.
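A callable policy of this kind might look like the sketch below. The classes are hypothetical and do not subclass the SOCKS BasePolicy; the `time` and `state` keyword arguments are assumed for the example, since the exact call signature is not given here.

```python
class ConstantPolicy:
    """Hypothetical policy that ignores time and state."""

    def __init__(self, action):
        self.action = list(action)

    def __call__(self, time=None, state=None):
        return list(self.action)


class ProportionalPolicy:
    """Hypothetical state-feedback policy: u = gain * (target - position)."""

    def __init__(self, gain, target):
        self.gain = gain
        self.target = target

    def __call__(self, time=None, state=None):
        return [self.gain * (self.target - state[0])]


hold = ConstantPolicy([0.0])
track = ProportionalPolicy(gain=2.0, target=1.0)

# Both objects are used the same way inside a simulation loop.
actions = [policy(time=0.0, state=[0.25]) for policy in (hold, track)]
```

Because both policies expose the same `__call__` interface, the simulation code does not need to know which one it is driving.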
3. Sampling#
A core component of SOCKS is the ability to generate a finite sample (a collection of
observations) from a system. In gym, the simulation is typically handled via a for
loop, which also includes the code used to collect observations and train the policy.
In SOCKS, we separate this process into a sampling phase and a separate training phase. The sampling phase consists of collecting a set of observations of the system transitions, and is handled by the various sampling functions implemented in SOCKS. The training phase is then typically handled via the algorithm, which uses the sample to compute a policy.
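The two phases can be sketched in plain Python. Everything below is hypothetical: `transition`, `generate_sample`, and `fit_linear_model` are illustrative names, not SOCKS sampling functions, and an ordinary least squares fit stands in for a SOCKS algorithm purely to show that the second phase consumes a sample collected beforehand.

```python
import random


def transition(state, action, rng):
    """Hypothetical scalar system: x_next = 0.9 x + 0.5 u + small noise."""
    return 0.9 * state + 0.5 * action + rng.gauss(0.0, 0.01)


def generate_sample(sample_size, seed=0):
    """Sampling phase: collect (state, action, next_state) transitions."""
    rng = random.Random(seed)
    sample = []
    for _ in range(sample_size):
        x = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        sample.append((x, u, transition(x, u, rng)))
    return sample


def fit_linear_model(sample):
    """Second phase (placeholder): least squares fit of x_next = a x + b u."""
    sxx = sum(x * x for x, _, _ in sample)
    sxu = sum(x * u for x, u, _ in sample)
    suu = sum(u * u for _, u, _ in sample)
    sxy = sum(x * y for x, _, y in sample)
    suy = sum(u * y for _, u, y in sample)
    det = sxx * suu - sxu * sxu
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


sample = generate_sample(200)       # phase 1: no learning happens here
a_hat, b_hat = fit_linear_model(sample)  # phase 2: uses only the stored sample
```

Keeping the sample as an explicit object means the same data can be reused with different algorithms without re-running the simulation.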
Stable Baselines#
The algorithms in stable-baselines3 are designed to be used specifically in reinforcement learning training loops.
SOCKS, on the other hand, is geared toward solving stochastic optimal control problems
(it can be argued that RL is a special case of stochastic optimal control). With this in
mind, we have emphasized modularity in our design, and have abstracted many components
of the training and simulation loop away from the gym.Env class.
If you are already familiar with stable-baselines3, then it may be helpful to think
of SOCKS as a set of algorithms based on kernel methods for solving similar problems.