We understand reinforcement learning but not how to properly implement it.
In the real world, when you reward a pet for doing a trick on command, there's a time gap between the command you gave, the action it performed, and the reward you delivered. That gap introduces the potential for false associations. This isn't so bad when the gap is merely a few seconds, but as the gap grows, the number of potential false associations grows exponentially. After a few minutes, the processing power needed to sort the metaphorical wheat from the chaff becomes impractical, at least without resorting to tricks that get you to the answer quicker at the expense of some mental flexibility. If the AI dog is specifically programmed to log commands and performed actions (or sequences of actions) and to compare them against when it receives rewards, it will come to the right association (for a given degree of "right"; that's another problem), but it's now limited to that paradigm of thought and will have a harder time finding the right association in a situation that doesn't fit the paradigm.
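That credit-assignment gap is roughly what eligibility traces in RL try to bridge: keep a decaying memory of recent events so a late reward can still credit the action that plausibly earned it. A minimal toy sketch (the event names and the decay constants here are made up for illustration, not from any real system):

```python
# Eligibility-trace sketch: each observed event keeps a "trace" that decays
# over time; when a reward finally arrives, every still-traced event gets
# credit in proportion to how recently it happened.

GAMMA, LAMBDA, ALPHA = 0.9, 0.8, 0.5  # decay and learning rates (arbitrary)

values = {}  # learned value estimate per event
trace = {}   # eligibility: how "recently responsible" each event is

def observe(event):
    """An event (command, action, distraction...) occurs now."""
    values.setdefault(event, 0.0)
    for e in trace:              # decay all existing traces...
        trace[e] *= GAMMA * LAMBDA
    trace[event] = 1.0           # ...then mark the current event as fresh

def reward(r):
    """A reward arrives; credit every traced event by its recency."""
    for e, elig in trace.items():
        values[e] += ALPHA * r * elig

# The dog hears "sit", sits, gets distracted by a squirrel, then is rewarded.
for ev in ["command:sit", "action:sit", "see:squirrel"]:
    observe(ev)
reward(1.0)

print(sorted(values.items(), key=lambda kv: -kv[1]))
```

Note that the squirrel, being most recent, gets the most credit: the trace narrows the search for the true cause, but it doesn't eliminate false associations, which is exactly the problem described above.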
I think we know all the high-level and foundational stuff we need, but there's this middle layer of metacognition that just seems to get bigger the more you get into it, like some vast abyss between an ocean's floor and its surface.
To an AI researcher, the human brain is like a modern passenger jet to the Wright brothers: they understood the basic concepts of combustion engines and aerodynamics, but the jet applies that knowledge on a whole different level.
Vision splits out of the primary visual cortex into other areas, like spatial, object, and color processing: at least 20 pathways that lead to 40 more. Then the frontal lobes send signals back, plus pathways internal to the frontal lobes themselves. A 5-year-old doesn't learn college chemistry; that comes closer to age 23. Reinforcement learning should build up layers of pathways in memory (as part of the main pathways) that allow rotation of the possibility space to connect cause and effect. It's like how all my knowledge of language lets me type out my understanding of intelligence bit by bit. I have branches that fold back onto themselves, trying to connect what is in front of me to what I predict will happen. If I keep in mind that what happens now leads to future events, I must have many pathways that loop back, the same way language does, or vision does for physics. The simulating brain may actively be creating future events while maintaining past events. The frontal lobes must be actively sending signals back to the visual lobes, holding multiple ideas of what happens next and confirming or denying them.
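That predict-and-confirm loop can be caricatured in a few lines. Everything below (the class, the guesses, the 1.5/0.7 multipliers) is a made-up illustration of the idea, not a model of any actual brain circuitry: several competing predictions run forward, incoming observations confirm or deny each one, and the confirmed pathway gets strengthened.

```python
# Toy predict-and-confirm loop: competing "frontal" predictions are checked
# against "visual" input, and each pathway's weight is adjusted accordingly.

class Predictor:
    def __init__(self, guess):
        self.guess = guess   # what this pathway expects to happen next
        self.weight = 1.0    # trust accumulated from past confirmations

def step(predictors, actual):
    """Compare every prediction against what actually arrived."""
    for p in predictors:
        if p.guess == actual:
            p.weight *= 1.5  # confirmed: strengthen this pathway
        else:
            p.weight *= 0.7  # denied: weaken it

# Three competing ideas of "what happens next" after a ball is thrown.
predictors = [Predictor("ball falls"),
              Predictor("ball floats"),
              Predictor("ball vanishes")]
for _ in range(5):
    step(predictors, "ball falls")  # the world keeps confirming physics

best = max(predictors, key=lambda p: p.weight)
print(best.guess)
```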
Long-term learning, connecting with reward and maintaining a goal, should really be about extending the learning in increments, not all-at-once leaps. Design a goal-achievement system that, as it achieves more goals, can take on longer goals with more delayed reward, because it learns to build long-term chunks out of learning segments, like an expert at whatever it is learning.
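One minimal way to sketch that incremental extension idea (the mastery threshold and horizon step below are arbitrary assumptions, not a real curriculum algorithm): the learner only attempts goals slightly longer than what it has already mastered, and the horizon grows only after repeated success.

```python
# Curriculum-horizon sketch: goal length extends by one step at a time,
# and only after the current length has been achieved reliably.

class CurriculumAgent:
    def __init__(self):
        self.horizon = 1          # longest goal length reliably achieved
        self.streak = 0           # consecutive successes at this horizon

    def attempt(self, succeeded):
        if succeeded:
            self.streak += 1
            if self.streak >= 3:  # mastery criterion (assumed)
                self.horizon += 1 # unlock slightly longer goals
                self.streak = 0
        else:
            self.streak = 0       # not reliable yet; stay at this horizon

agent = CurriculumAgent()
for ok in [True, True, True, True, False, True, True, True]:
    agent.attempt(ok)
print(agent.horizon)
```

The failure mid-run resets the streak but not the horizon, so mastered chunks are kept while the next, longer chunk is still being earned, which is the "increments, not leaps" point above.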