Can Reinforcement Learning generalize beyond its training?
This paper explores the ability of a model trained with reinforcement learning (RL) to generalize, i.e., to produce acceptable results when presented with data it was not exposed to during training. The application in this study is an industrial process with multiple controls that determine the effect on a product as it moves through the process. Determining optimal control settings in such an environment can be challenging. For example, when the controls interact, adjusting one setting can require readjusting others. A complex relationship between a control and its effect further complicates the search for an optimal solution. The results presented here show that a model trained through RL performs well in this environment. Further, with appropriate definitions of the state and reward functions, the trained model generalizes to conditions different from those used in training.
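To make the setup concrete, below is a minimal sketch of what such an environment might look like in Python. The two-control process, the `ProcessEnv` name, the interaction term, and the quadratic reward are all hypothetical illustrations, not the paper's actual process or method; the sketch only shows the design point the abstract raises, namely that the operating target is part of the state and the reward is measured against it, so a policy can be trained across a range of conditions rather than toward one fixed operating point.

```python
import random


class ProcessEnv:
    """Hypothetical two-control industrial process (illustrative only).

    The effect on the product depends on both controls and on an
    interaction term, so adjusting one control can require
    readjusting the other.
    """

    def __init__(self, target_range=(0.0, 1.0)):
        self.target_range = target_range  # range of operating targets
        self.target = 0.0
        self.controls = [0.0, 0.0]

    def _effect(self, c1, c2):
        # Assumed nonlinear response with a control-control
        # interaction term (0.3 * c1 * c2).
        return 0.6 * c1 + 0.4 * c2 ** 2 + 0.3 * c1 * c2

    def _state(self):
        # State: current control settings plus the commanded target.
        # Including the target in the state is what lets a trained
        # policy respond to targets outside the training set.
        return (*self.controls, self.target)

    def reset(self):
        # Sample a fresh operating target for each episode.
        self.target = random.uniform(*self.target_range)
        self.controls = [0.0, 0.0]
        return self._state()

    def step(self, action):
        # Action: incremental adjustments to the two controls.
        self.controls = [c + a for c, a in zip(self.controls, action)]
        effect = self._effect(*self.controls)
        # Reward: negative squared error from the target in the state,
        # rather than distance from any single fixed operating point.
        reward = -(effect - self.target) ** 2
        done = abs(effect - self.target) < 1e-2
        return self._state(), reward, done


# Example rollout with an arbitrary adjustment:
env = ProcessEnv()
state = env.reset()
state, reward, done = env.step((0.1, -0.05))
```

Under this framing, any standard RL algorithm trained over many sampled targets would, in principle, be encouraged to learn the mapping from (controls, target) to good adjustments, which is one plausible mechanism for the generalization behavior the abstract describes.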