
- Licensed for commercial use (unlike LLaMA).
- Trained on a large amount of data (1T tokens like LLaMA vs. 300B for Pythia, 300B for OpenLLaMA, and 800B for StableLM).
- Prepared to handle extremely long inputs thanks to ALiBi (we trained on up to 65k inputs and can handle up to 84k vs. 2k-4k for other open source models); see the sketch after this list.
- Optimized for fast training and inference (via FlashAttention and FasterTransformer).
- Equipped with highly efficient open-source training code.
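To make the long-context point concrete, here is a minimal sketch of how ALiBi works: instead of positional embeddings, each attention head adds a linear, distance-based penalty to its attention logits. The function names `alibi_slopes` and `alibi_bias` and the PyTorch implementation below are illustrative assumptions, not MPT's actual training code.

```python
import torch


def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Geometric sequence of per-head slopes from the ALiBi paper (assumes num_heads is a power of 2)."""
    start = 2 ** (-8.0 / num_heads)
    return torch.tensor([start ** (h + 1) for h in range(num_heads)])


def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Bias added to attention logits: 0 for the current token, increasingly negative for distant past keys."""
    positions = torch.arange(seq_len)
    # distance[i, j] = j - i: zero on the diagonal, negative for keys further in the past.
    distance = positions[None, :] - positions[:, None]   # (seq_len, seq_len)
    slopes = alibi_slopes(num_heads)                      # (num_heads,)
    # Future positions (j > i) come out positive here but are removed by the causal mask.
    return slopes[:, None, None] * distance[None, :, :]  # (num_heads, seq_len, seq_len)


# Usage: add the bias to the raw attention scores before the softmax, e.g.
# scores = q @ k.transpose(-2, -1) / head_dim ** 0.5 + alibi_bias(num_heads, seq_len)
```

Because the bias depends only on the query-key distance rather than on learned position embeddings, the same weights can be applied to sequences longer than those seen during training, which is what allows extrapolating from 65k training inputs to 84k at inference.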
