What makes a language model large?