Attention mechanisms
Attention Is All You Need ... but it is the scarcity of attention that drives the Internet world.
The basic idea is to read the input sequence twice: once to encode its gist, and again, at each decoding step, to "pay attention" to the details relevant to that step.
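A minimal sketch of one such decoding step, using scaled dot-product scoring (one common choice; additive scoring is another). The names `attend`, `decoder_state`, and `encoder_states` are illustrative, not from any particular library:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(decoder_state, encoder_states):
    """One decoding step of scaled dot-product attention (a sketch).

    decoder_state:  (d,)   current decoder hidden state (the "query")
    encoder_states: (T, d) encoder outputs, one per input position
    Returns a context vector (d,) and the attention weights (T,).
    """
    d = decoder_state.shape[-1]
    # Similarity of the query to every encoded input position.
    scores = encoder_states @ decoder_state / np.sqrt(d)   # (T,)
    weights = softmax(scores)                              # (T,), sums to 1
    # Weighted average of encoder states: the details being "paid attention" to.
    context = weights @ encoder_states                     # (d,)
    return context, weights

# Toy usage: 5 input positions, 4-dimensional states.
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 4))    # first pass: encode the whole input once
query = rng.normal(size=(4,))    # second pass: one decoder step's query
context, weights = attend(query, enc)
print(weights)                   # where the decoder is "looking" at this step
```

The first pass (encoding) happens once; the `attend` call is repeated at every decoding step with a fresh query, which is what lets the decoder focus on different input positions as it generates each output.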