My personal collection of interesting models I've quantized from the past week (yes, just week)
itsme2417/PolyMind: A multimodal, function calling powered LLM webui.
Introducing Nomic Embed: A Truly Open Embedding Model
InternLM2 models llama-fied
WizardLM/WizardCoder-33B-V1.1 released!
Microsoft announces WaveCoder
Mixture of Experts Explained (Huggingface blog)
Mistral releases version 0.2 of their 7B model
Mistral drops a new magnet download
Orca 2: Teaching Small Language Models How to Reason
Hundreds of OpenAI employees threaten to resign and join Microsoft
Catch me if you can! How to beat GPT-4 with a 13B model | LMSYS Org
TensorRT-LLM evaluation of the new H200 GPU achieves 11,819 tokens/s on Llama2-13B
ExUI - a lightweight web UI for ExLlamaV2 by turboderp
Phind V7 subjectively performing at GPT4 levels for coding
Min P sampler (an alternative to Top K/Top P) has been merged into llama.cpp
HUGE dataset released for open source use
I've started uploading quants of exllama v2 models, taking requests
Text Generation Web-UI has been updated to CUDA 12.1, and with it new docker images are needed
Single Digit tokenization improves LLM math abilities by up to 70x
You can get the resulting PPL, but that's only a sanity check at best. Ideally there would be something like LMSYS's Chatbot Arena that could compare unquantized vs. quantized models head to head, but that doesn't exist yet.
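For context, the PPL sanity check above boils down to perplexity: the exponential of the mean negative log-likelihood the model assigns to held-out tokens (lower is better, and a quant should stay close to the unquantized baseline). A minimal sketch of the arithmetic, with made-up log-probabilities rather than a real model:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood
    over a sequence of per-token natural-log probabilities."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy example: four tokens, each assigned probability 0.25.
# Mean NLL is ln(4), so perplexity comes out to exactly 4.
logps = [math.log(0.25)] * 4
print(perplexity(logps))  # 4.0 (up to float rounding)
```

In practice tools like llama.cpp's `perplexity` binary run this over a corpus such as wikitext, and you compare the quantized model's number against the fp16 one; a small gap is the "sanity check" being described.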