My personal collection of interesting models I've quantized from the past week (yes, just week)
itsme2417/PolyMind: A multimodal, function calling powered LLM webui.
Introducing Nomic Embed: A Truly Open Embedding Model
InternLM2 models llama-fied
WizardLM/WizardCoder-33B-V1.1 released!
Microsoft announces WaveCoder
Mixture of Experts Explained (Huggingface blog)
Mistral releases version 0.2 of their 7B model
Mistral drops a new magnet download
Orca 2: Teaching Small Language Models How to Reason
Hundreds of OpenAI employees threaten to resign and join Microsoft
Catch me if you can! How to beat GPT-4 with a 13B model | LMSYS Org
TensorRT-LLM evaluation of the new H200 GPU achieves 11,819 tokens/s on Llama2-13B
ExUI - a lightweight web UI for ExLlamaV2 by turboderp
Phind V7 subjectively performing at GPT4 levels for coding
Min P sampler (an alternative to Top K/Top P) has been merged into llama.cpp
HUGE dataset released for open source use
I've started uploading quants of exllama v2 models, taking requests
Text Generation Web-UI has been updated to CUDA 12.1, and with it new docker images are needed
Single Digit tokenization improves LLM math abilities by up to 70x
You can get the resulting PPL, but that's only a sanity check at best. Ideally there would be something like LMSYS's Chatbot Arena that could compare unquantized vs. quantized models head to head, but that doesn't exist yet.
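For context, the PPL sanity check above boils down to perplexity: the exponential of the mean negative log-likelihood the model assigns to held-out tokens (lower is better, and a quant should stay close to the unquantized baseline). A minimal sketch of the arithmetic, with made-up log-probabilities rather than a real model:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood
    over a sequence of per-token natural-log probabilities."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy example: four tokens, each assigned probability 0.25.
# Mean NLL is ln(4), so perplexity comes out to exactly 4.
logps = [math.log(0.25)] * 4
print(perplexity(logps))  # 4.0 (up to float rounding)
```

In practice tools like llama.cpp's `perplexity` binary run this over a corpus such as wikitext, and you compare the quantized model's number against the fp16 one; a small gap is the "sanity check" being described.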