帖文详情
@_akhaliq@x.good.news
Beyond Scaling Laws
Understanding Transformer Performance with Associative Memory
Increasing the size of a Transformer model does not always lead to enhanced performance. This phenomenon cannot be explained by the empirical scaling laws. Furthermore, improved generalization