In a lot of ways, the field has already come to this conclusion. At NeurIPS this year some of the biggest topics in Deep RL were model-based RL and meta-learning for RL, both of which aim to learn a generalized representation of an environment that can be used in a variety of downstream tasks.
Some systems fail to even implement the concept of reward (and punishment): the agent has no 'awareness' of what a reward is, so it doesn't even know it is being rewarded (or 'punished') in the first place. The whole system then has to be redesigned before the optimization can work at all.
Sometimes AI is the least straightforward solution, the most expensive, and the least efficient in terms of results.