KV cache

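The KV cache trades space for time: it keeps the attention keys and values of previous tokens in memory so they are not recomputed at every generation step.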
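To make the space side of the trade concrete, here is a rough sketch of how the cache size can be estimated. It assumes standard attention in which every layer stores one key vector and one value vector per KV head per token. The function name and the model shape in the example are hypothetical and not taken from any particular model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes (hypothetical helper).

    The leading factor of 2 counts keys and values separately.
    bytes_per_elem=2 assumes 16-bit storage such as fp16 or bf16.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size


# Hypothetical shape: 32 layers, 32 KV heads, head dimension 128.
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
print(size / 2**30, "GiB")  # 2.0 GiB
```

Under these assumptions the cache grows linearly with sequence length and batch size, which is the memory paid for skipping recomputation.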

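Without KV cache: when generating token N, we need to recompute the keys and values of all previous tokens before attending over them, so the work per step grows with the length of the sequence.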

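With KV cache: we compute the key and value only for the newest token, append them to the cache, and attend over the cached keys and values of all earlier tokens.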
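The difference between the two cases can be sketched with a toy single-head attention in plain Python. This is only an illustration under simplifying assumptions (one head, one layer, no batching, token embeddings given directly as input); the function names and the counter kv_projections are hypothetical and exist only to show how much key/value work each variant does.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def decode_without_cache(xs, Wq, Wk, Wv):
    """At every step, re-project keys and values for the whole prefix."""
    outputs, kv_projections = [], 0
    for n in range(1, len(xs) + 1):
        keys = [matvec(Wk, x) for x in xs[:n]]
        values = [matvec(Wv, x) for x in xs[:n]]
        kv_projections += n  # n tokens projected again at step n
        q = matvec(Wq, xs[n - 1])
        outputs.append(attend(q, keys, values))
    return outputs, kv_projections


def decode_with_cache(xs, Wq, Wk, Wv):
    """Project each token's key and value once and keep them in a cache."""
    outputs, kv_projections = [], 0
    k_cache, v_cache = [], []
    for x in xs:
        k_cache.append(matvec(Wk, x))
        v_cache.append(matvec(Wv, x))
        kv_projections += 1  # only the newest token is projected
        q = matvec(Wq, x)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs, kv_projections
```

For a sequence of T tokens, the uncached version projects keys and values T(T+1)/2 times in total, while the cached version does it T times and holds T key/value pairs in memory. Both return the same attention outputs.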

Optimization techniques: