For OpenAI Gym Users#

SOCKS and OpenAI gym have a lot in common, but SOCKS departs from gym in a few key ways.

In this guide, we go over some of these differences and help get you started with SOCKS.

1. `gym.Env` vs. `DynamicalSystem`#

An environment in gym (gym.Env) is a class with step(), reset(), render(), and close() methods. A DynamicalSystem inherits from gym.Env, which means it has all of the same features.

However, gym environments are designed to encapsulate every part of the runtime loop, including computing the cost, determining simulation termination, and tracking and simulating external “world” objects such as obstacles. This can reduce transparency for the designer and limit modularity in the simulation design process.

In SOCKS, this functionality is handled externally from the environment class. Calculating cost, simulating obstacles, determining the conditions for ending simulations, etc. are left to the designer.
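As a rough illustration of this separation, the sketch below keeps the cost and the termination test outside the environment. Everything in it is hypothetical: `ToyIntegrator`, `quadratic_cost`, and `reached_target` are stand-ins written for this guide and are not part of SOCKS or gym.

```python
class ToyIntegrator:
    """Hypothetical stand-in for an environment: it only advances the state."""

    def __init__(self, dt=0.1):
        self.dt = dt
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        # Single-integrator dynamics x' = u, advanced by one Euler step.
        self.state = self.state + self.dt * action
        return self.state


def quadratic_cost(state, action, target=1.0):
    """Designer-supplied cost, kept outside the environment."""
    return (state - target) ** 2 + 0.01 * action ** 2


def reached_target(state, target=1.0, tol=1e-2):
    """Designer-supplied termination test, also kept outside the environment."""
    return abs(state - target) < tol


env = ToyIntegrator()
state = env.reset()
total_cost = 0.0
steps = 0
for _ in range(100):
    action = 2.0 * (1.0 - state)  # simple proportional controller
    total_cost += quadratic_cost(state, action)
    state = env.step(action)
    steps += 1
    if reached_target(state):
        break
```

Because the environment returns only the next state, swapping in a different cost or stopping rule does not require touching the environment class.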

In addition, there are a few changes to the class interface and the available attributes.

`dynamics()`#

The step() function for dynamical systems generally does not need to be overloaded when defining custom environments. The critical component of a dynamical system is the system dynamics. We define a new method, dynamics(), that is used internally by an initial value problem (IVP) solver in the step() function. This means we only need to define a custom dynamics() method when subclassing a DynamicalSystem.
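The idea can be sketched with a small, self-contained stand-in. The base class below is hypothetical and is not the SOCKS implementation: it uses a fixed-step RK4 integrator in place of a general IVP solver, and its `dynamics(time, state, action)` signature is an assumption that omits the disturbance for brevity. The point is only that the subclass defines `dynamics()` and inherits `step()` unchanged.

```python
def _axpy(a, x, y):
    """Return a * x + y element-wise for two equal-length lists."""
    return [a * xi + yi for xi, yi in zip(x, y)]


class ToyDynamicalSystem:
    """Hypothetical base class: step() integrates dynamics() over one sampling period."""

    def __init__(self, sampling_time=0.1, substeps=10):
        self.sampling_time = sampling_time
        self.substeps = substeps
        self.time = 0.0
        self.state = None

    def dynamics(self, time, state, action):
        """Return dx/dt as a list. Subclasses override this."""
        raise NotImplementedError

    def reset(self, state):
        self.time = 0.0
        self.state = list(state)
        return self.state

    def step(self, action):
        # Fixed-step RK4 stands in for a general-purpose IVP solver.
        h = self.sampling_time / self.substeps
        t, x = self.time, self.state
        for _ in range(self.substeps):
            k1 = self.dynamics(t, x, action)
            k2 = self.dynamics(t + h / 2, _axpy(h / 2, k1, x), action)
            k3 = self.dynamics(t + h / 2, _axpy(h / 2, k2, x), action)
            k4 = self.dynamics(t + h, _axpy(h, k3, x), action)
            x = [
                xi + h / 6 * (a + 2 * b + 2 * c + d)
                for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
            ]
            t += h
        self.time, self.state = t, x
        return self.state


class DoubleIntegrator(ToyDynamicalSystem):
    """Only the dynamics are defined; step() is inherited as-is."""

    def dynamics(self, time, state, action):
        position, velocity = state
        return [velocity, action]


system = DoubleIntegrator(sampling_time=0.1)
system.reset([0.0, 0.0])
next_state = system.step(1.0)  # constant unit acceleration for 0.1 s
```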

`state_space`#

In gym, an environment nominally has only an action_space and an observation_space. The reason for omitting a state space is unclear, since the state space is an essential component of a POMDP definition, and the gym.Env model is generally based on a POMDP.

In SOCKS, we correct this by adding a state_space attribute to the class. The action_space remains unchanged, but the actual underlying state of the system is an element of the state_space, and the output of the step() function is an element of the observation_space.

`generate_observation()`#

In the underlying mathematical theory, an observation is generated by an observation function that takes the current state, the action, and a random variable representing measurement noise (which is different from process noise), and outputs an observation. In the DynamicalSystem class, we separate this out into its own method.

Classes inheriting from DynamicalSystem should override this method in order to include measurement noise or limit the state variables that are observable. By default, this function simply returns the true underlying state of the system (meaning the system is fully observable).
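A minimal sketch of such an override follows. The classes are hypothetical stand-ins rather than the SOCKS classes, and the `generate_observation(time, state, action)` signature is an assumption made for illustration. The default returns the full state, while the subclass exposes only the first state variable and adds Gaussian measurement noise.

```python
import random


class FullyObservedSystem:
    """Hypothetical stand-in: by default the observation is the true state."""

    def __init__(self, state):
        self.state = list(state)

    def generate_observation(self, time, state, action):
        return list(state)


class NoisyPositionSensor(FullyObservedSystem):
    """Observes only the first state variable, corrupted by measurement noise."""

    def __init__(self, state, noise_std=0.05, seed=0):
        super().__init__(state)
        self.noise_std = noise_std
        self._rng = random.Random(seed)  # seeded for reproducibility

    def generate_observation(self, time, state, action):
        # Partial observation: position only, plus zero-mean Gaussian noise.
        return [state[0] + self._rng.gauss(0.0, self.noise_std)]


full = FullyObservedSystem([1.0, 2.0])
full_obs = full.generate_observation(0.0, full.state, 0.0)

noisy = NoisyPositionSensor([1.0, 2.0])
noisy_obs = noisy.generate_observation(0.0, noisy.state, 0.0)
```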

`generate_disturbance()`#

We also separate the disturbance from the step() and dynamics() functions. In SOCKS, we generate the disturbance independently in its own method.

Classes inheriting from DynamicalSystem should override this method to specify the type of disturbance which affects the dynamical evolution of the system, for instance to increase or decrease the process noise that affects the state or change the noise distribution.
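The following sketch shows one way such an override might look. The classes and the `generate_disturbance(time, state, action)` signature are assumptions made for illustration, not the SOCKS API. The subclass swaps the default Gaussian process noise for bounded uniform noise without touching the rest of the system.

```python
import random


class ToyStochasticSystem:
    """Hypothetical scalar system x[k+1] = x[k] + u[k] + w[k]."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.state = 0.0

    def generate_disturbance(self, time, state, action):
        # Default: small zero-mean Gaussian process noise.
        return self._rng.gauss(0.0, 0.01)

    def step(self, time, action):
        # The disturbance is drawn in its own method, then applied here.
        w = self.generate_disturbance(time, self.state, action)
        self.state = self.state + action + w
        return self.state


class UniformNoiseSystem(ToyStochasticSystem):
    """Same update rule, but with bounded uniform process noise."""

    def generate_disturbance(self, time, state, action):
        return self._rng.uniform(-0.1, 0.1)


system = UniformNoiseSystem(seed=1)
next_state = system.step(0, 1.0)
```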

2. Policies#

In SOCKS, we typically represent the policy controlling a system as a separate object or function. During simulation, the policy is a function that returns a control action (and may depend on the simulation time and/or the system state).

SOCKS defines a class, BasePolicy, that has a simple Callable interface, meaning it requires that the policy implement a __call__() method.

Some of the algorithms use this class, but its main purpose is to organize code and provide a consistent interface rather than to enforce a strict simulation scheme.
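To make the callable interface concrete, here is a small sketch. `CallablePolicy` is a hypothetical stand-in for a policy base class, and the keyword arguments `time` and `state` are assumptions; consult the SOCKS source for the actual BasePolicy signature.

```python
from abc import ABC, abstractmethod


class CallablePolicy(ABC):
    """Hypothetical stand-in for a policy base class with a callable interface."""

    @abstractmethod
    def __call__(self, time=None, state=None):
        """Return a control action."""


class ConstantPolicy(CallablePolicy):
    """Open-loop policy: ignores time and state."""

    def __init__(self, action):
        self.action = action

    def __call__(self, time=None, state=None):
        return self.action


class ProportionalPolicy(CallablePolicy):
    """State-feedback policy: the action is proportional to the tracking error."""

    def __init__(self, gain, target):
        self.gain = gain
        self.target = target

    def __call__(self, time=None, state=None):
        return self.gain * (self.target - state)


constant_action = ConstantPolicy(0.5)(time=0, state=3.0)
feedback_action = ProportionalPolicy(gain=2.0, target=1.0)(state=0.25)
```

Since both policies are plain callables, a simulation loop can call `policy(time=t, state=x)` without knowing which kind it was given.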

3. Sampling#

A core component of SOCKS is the ability to generate a finite sample (a collection of observations) from a system. In gym, the simulation is typically handled via a for loop, which also includes the code used to collect observations and train the policy.

In SOCKS, we separate this process into a sampling phase and a separate training phase. The sampling phase consists of collecting a set of observations of the system transitions, and is handled by the various sampling functions implemented in SOCKS. The training phase is then typically handled via the algorithm, which uses the sample to compute a policy.
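The two phases can be sketched as follows. This is an illustration only: the function names are hypothetical, the system is a made-up scalar example, and an ordinary least-squares fit stands in for the training algorithm (it is not one of the SOCKS algorithms and it fits a model rather than a policy). What matters is that the training step sees only the collected sample, never the simulator.

```python
import random


def transition(state, action, rng):
    """Hypothetical scalar system x' = 0.9 x + u + w, used only for illustration."""
    return 0.9 * state + action + rng.gauss(0.0, 0.01)


def collect_sample(n, rng):
    """Sampling phase: draw states and actions, record (x, u, x') transitions."""
    sample = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        sample.append((x, u, transition(x, u, rng)))
    return sample


def fit_linear_model(sample):
    """Training phase: least-squares fit of x' = a x + b u using the sample alone."""
    sxx = sum(x * x for x, u, y in sample)
    sxu = sum(x * u for x, u, y in sample)
    suu = sum(u * u for x, u, y in sample)
    sxy = sum(x * y for x, u, y in sample)
    suy = sum(u * y for x, u, y in sample)
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


rng = random.Random(0)
sample = collect_sample(500, rng)  # sampling phase
a, b = fit_linear_model(sample)    # training phase, no simulator access
```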

Stable Baselines#

The algorithms in stable-baselines3 are designed to be used specifically in reinforcement learning training loops.

SOCKS, on the other hand, is geared toward solving stochastic optimal control problems (it can be argued that RL is a special case of stochastic optimal control). With this in mind, we have emphasized modularity in our design, and have abstracted many components of the training and simulation loop away from the gym.Env class.

If you are already familiar with stable-baselines3, then it may be helpful to think of SOCKS as a set of algorithms based on kernel methods for solving similar problems.
