Can someone help me with insights about large context models? Are there relationships between the beginning and end of a long context window that don't follow transitively from intermediate points? And is there value in training over these longer windows, versus relying on the more basic, closer-range weight distributions learned over different sliding windows?
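One way to make the "transitively follow from intermediate points" part concrete is to compare which positions can influence the last token under full causal attention and under a sliding window. The sketch below is in plain Python. It assumes a window of `w` tokens and a stack of identical attention layers; both are hypothetical parameters chosen for illustration. With full causal attention, the last token attends to the first token directly in a single layer. With a sliding window, the first token's information can only arrive indirectly, relayed through intermediate positions, and that takes at least `ceil((n - 1) / (w - 1))` layers. This only shows which paths exist. It says nothing about whether a model actually learns or preserves a given relationship along those paths.

```python
def causal_mask(n):
    # mask[i][j] is True when query position i may attend to key position j.
    return [[j <= i for j in range(n)] for i in range(n)]


def sliding_window_mask(n, w):
    # Each position attends only to itself and the previous w - 1 positions.
    return [[i - w < j <= i for j in range(n)] for i in range(n)]


def compose(a, b):
    # Boolean matrix product: i reaches j if i attends to some k that reaches j.
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reachable_after(mask, layers):
    # reach[i][j] is True if position j can influence position i after
    # `layers` stacked attention layers that all use the same mask.
    reach = mask
    for _ in range(layers - 1):
        reach = compose(mask, reach)
    return reach


if __name__ == "__main__":
    n, w = 16, 4
    full = causal_mask(n)
    swa = sliding_window_mask(n, w)
    print(full[n - 1][0])                     # direct path in one layer
    print(reachable_after(swa, 4)[n - 1][0])  # window has not spanned the sequence yet
    print(reachable_after(swa, 5)[n - 1][0])  # relayed through intermediate tokens
```

With `n = 16` and `w = 4`, each sliding-window layer extends the reach by 3 positions. The last token therefore cannot see position 0 after 4 layers, but it can after 5. Whether a relayed path carries the same signal as a direct one is a separate question.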