Pre-implemented Rewards

This gym comes with a series of pre-implemented reward functions, which are detailed in the documentation here. See this page for implementing your own custom rewards, either inside the gym or with an external function as in the tutorial doc.

Normalized Reward

Tuned 1D Custom Reward

This implements the reward used in the benchmark paper, which is defined as:

\begin{eqnarray}
Reward(t) = \begin{cases}
\text{truncate_penalty}*(T-t) & \text{truncate}=\text{True} \\
\text{terminate_reward} - \sum_{i=0}^{nt}|u(-1, i)| / 1000 - \|u(x, T)\|_{L_2} & \text{terminate}=\text{True} \\
& \text{and} \\
& \|u(x, T)\|_{L_2} < 20 \\
\|u(x, t-dt*1/\text{control_sample_rate})\|_{L_2} - \|u(x, t)\|_{L_2} & \text{Otherwise}
\end{cases}
\end{eqnarray}

where \(u(x, t)\) is the solution vector, \(T\) is the final simulation time, control_sample_rate is the rate at which the controller is resampled (default 0.01), and \(\|u(x, t)\|_{L_2}\) represents the \(L_2\) norm at time \(t\) over \(x\).
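The three branches of this piecewise reward can be sketched in NumPy as follows. This is a minimal illustration, not the gym's own implementation. The function name `tuned_reward_1d` and its argument names are hypothetical. The sketch also assumes that the \(L_2\) norm is the plain Euclidean norm of the discretized solution vector, and that \(u(-1, i)\) refers to the value at the last spatial point at time step \(i\), passed in here as `boundary_history`.

```python
import numpy as np

def tuned_reward_1d(u_curr, u_prev, boundary_history, t, T,
                    truncate, terminate,
                    truncate_penalty, terminate_reward,
                    norm_threshold=20.0):
    """Hypothetical sketch of the piecewise tuned 1D reward.

    u_curr           -- solution vector at the current time t
    u_prev           -- solution vector at the previous control step
    boundary_history -- assumed history of u(-1, i) over all time steps
    """
    # Case 1: the episode was truncated early; scale the penalty by the time left.
    if truncate:
        return truncate_penalty * (T - t)

    # Assumption: the L2 norm is the Euclidean norm of the discretized vector.
    norm_curr = np.linalg.norm(u_curr)

    # Case 2: the episode reached the final time with a small enough final state.
    if terminate and norm_curr < norm_threshold:
        control_cost = np.sum(np.abs(boundary_history)) / 1000.0
        return terminate_reward - control_cost - norm_curr

    # Case 3: otherwise, reward the decrease in norm since the last control step.
    return np.linalg.norm(u_prev) - norm_curr
```

With this shape, a step that shrinks the solution norm earns a positive reward, while a step that lets it grow is penalized by the same amount.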

Glioblastoma 1D PDE (Brain Tumor) Reward

This implements the reward used in the Glioblastoma 1D PDE environment.

NS Reward

This implements a reward that tracks the reference trajectory while minimizing the control action cost. It is defined as:

\begin{eqnarray} Reward(t) = -\frac{1}{2} \|s' - s_{ref}\|^2 - \frac{\gamma}{2} \| a - a_{ref}\|^2 \end{eqnarray}

where \(\gamma\) is the coefficient for the control cost.
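As a rough illustration, the tracking reward can be computed in NumPy as shown here. The function name `ns_reward` is hypothetical and this is not the gym's own code. The sketch assumes that \(s'\) is the next state, \(s_{ref}\) the reference state, \(a\) the applied action and \(a_{ref}\) the reference action, and that each norm is the Euclidean norm taken over all entries of the discretized arrays.

```python
import numpy as np

def ns_reward(s_next, s_ref, a, a_ref, gamma):
    """Hypothetical sketch: negative weighted tracking and control cost."""
    # Squared Euclidean distance between the next state and the reference state.
    state_err = np.sum((np.asarray(s_next, dtype=float) - np.asarray(s_ref, dtype=float)) ** 2)
    # Squared Euclidean distance between the action and the reference action.
    action_err = np.sum((np.asarray(a, dtype=float) - np.asarray(a_ref, dtype=float)) ** 2)
    # Both terms enter with a factor of 1/2; gamma weights the control cost.
    return -0.5 * state_err - 0.5 * gamma * action_err
```

The reward is zero only when both the state and the action match their references, and it becomes more negative as either deviation grows.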