llama.cpp: don't sleep on --split-mode tensor
Gemma 4 is here
Smaller Qwen3.5 models released
Qwen3-Coder-Next
Relevance of GPU driver version for inference performance
Magistral-Small-2509 by Mistral has been released
Qwen3-Next with 80b-a3b parameters is out
ExLlamaV3 adds tensor parallelism support
New, promising MoE model "Hunyuan" by Tencent
Do you quantize models yourself?
Well, that's offending
Any experience with Pangolin?
More than 140 Kenya Facebook moderators diagnosed with severe PTSD
Don't forget to ...
Chaining routers and GUA IPv6 addresses
USA to be renamed to XXX
Modern online banking
Any of you have a self-hosted AI "hub"? (e.g. for LLM, stable-diffusion, ...)
I wonder how much storage comes with this driving school
Migrated my self-hosted Nextcloud to AIO and I absolutely love it
A lot has been said, but to add to the list I'd say it gives them access to quite a large pool of free testers.
LLM architectures and optimization techniques change rapidly, and when a lab releases open-weight models, a lot of enthusiasts will evaluate them for free, help implement support in inference engines, catch bugs, etc. (and in turn, of course, get a new model to run for free, so it's at least somewhat symbiotic).
We saw this quite clearly when Alibaba released Qwen3-Next: a somewhat undertrained but still useful model that introduced the architecture their latest models now use "in production" (including their paid "Max" models).