Semi-related but is there a standard way to run this (or other models from huggingface) in a docker container and interact with them through a web API?
ChatGPT tells me to write my own FastAPI wrapper which should work, but is there no pre-made solution for this?
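One pre-made option appears to be Hugging Face's own text-generation-inference (TGI) container, which serves most Hub models over a REST API; llama.cpp's `llama-server` and Ollama are alternatives for GGUF files. A sketch of the TGI route (the image tag, flags, and model id are assumptions — check the TGI docs and the Hub for current names):

```shell
# Serve a model from the Hub over HTTP (TGI listens on port 80 in-container)
docker run -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id HuggingFaceTB/SmolLM2-1.7B-Instruct

# Query it from another terminal:
curl http://localhost:8080/generate \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 32}}'
```

So a hand-rolled FastAPI wrapper works, but it's mostly reinventing what these servers already do.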
Does it support anything other than English? Sadly, most open-weights models have no support for languages other than English, which makes them useless for the 75% of the world's population who don't speak English at all.
Does anyone know of a good lightweight open-weights LLM which supports at least a few major languages (let's say, the official UN languages at least)?
> Does anyone know of a good lightweight open-weights LLM which supports at least a few major languages (let's say, the official UN languages at least)?
RWKV World exists (100+ languages on purpose), though it's a bit different from traditional LLMs so YMMV.
Good point raising just how many people don't speak any English. That sounds like a lot of people who could do something to produce or contribute to a non-English language model.
My test dataset is mostly dumb prompts about pelicans.
You'll note that I didn't quote their benchmarks in my own post at all, because I didn't want to boost them without feeling confident in what they were stating.
I posted about this because my own very limited initial experiments passed a loose vibe check!
I'm impressed any time a 1.7GB (or 130MB) model file appears to be able to do anything useful at all.
You can be reasonably confident that, unless there’s been a significant breakthrough (there hasn’t), if a smaller model beats a larger model it’s either fine-tuned for a specific purpose or trained on the test data somehow (i.e. fine-tuned to have good metrics).
To be less snarky, they claim:
> These models are built on a meticulously curated high-quality training corpus
I.e. good training data plus a small model beats a bigger model.
…but I’m skeptical, when I read:
> We observed that performance continues to improve with longer training, even beyond the Chinchilla optimal point. Therefore, we decided to train the 1.7B model on 1 trillion tokens and the 135M and 360M models on 600B tokens, as the performance gains after 400B tokens begin to slow on some benchmarks for these smaller models.
So they’re evaluating their models against various benchmarks as they train them and picking whichever practice gives the best benchmark numbers?
I dunno.
The claim is basically good data > more parameters, but it’s just an observation of “this happened to work for us” rather than something you can usefully take (as far as I can see) and apply to larger models.
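For scale, the common rule of thumb from the Chinchilla result is roughly 20 training tokens per parameter (a community approximation, not the paper's exact fit). Set against the budgets quoted above, a quick back-of-envelope:

```python
# Chinchilla-style "optimal" token budgets at ~20 tokens per parameter,
# compared with the budgets the blog post says they actually trained on.
TOKENS_PER_PARAM = 20  # rule-of-thumb approximation

models = {
    "135M": (135e6, 600e9),   # (parameters, tokens trained on)
    "360M": (360e6, 600e9),
    "1.7B": (1.7e9, 1e12),
}

for name, (params, trained) in models.items():
    optimal = TOKENS_PER_PARAM * params
    print(f"{name}: ~{optimal / 1e9:.1f}B tokens 'optimal', "
          f"trained on {trained / 1e9:.0f}B ({trained / optimal:.0f}x past it)")
```

All three are trained far past the compute-optimal point, which is a deliberate trade: spend extra training compute to get a better model per parameter at inference time. That's exactly why "beyond Chinchilla" alone doesn't tell you whether the data claim holds.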
The claims they actually make about performance are far more modest than people are making out.
The 1.7B model performs better than any other 2B models in their evaluation.
Seems nice. Not groundbreaking. Not convinced it’s real rather than polluted training data, personally.
They didn’t say they beat the 3B; their 1.7B outperforms Llama 3.2 1B thanks to 11T tokens of high-quality data (Hugging Face are the ones behind the FineWeb dataset that everyone uses now). Btw, Qwen2.5-1B also surpasses Llama 3.2 1B by a large margin, so beating it is even more impressive.
Meta doesn't train on their internal data, at least not for open models. It would be a real PR problem if someone started dumping real Facebook chats out of them.
And this is from Huggingface themselves, arguably they have a lot of data as well.
What’s the context size? I couldn’t find it on the model summary page.
Tangential: if it’s not on the model page, does it mean that it’s not that relevant here? If so, why?
Nice! Do you think they could be fine-tuned to implement a cool thing like https://withaqua.com/ ? eg to teach it to do "inline edits" of what you say?
Hm, is it too early to stop trusting these self-published evaluations and rely only on independent third-party ones?
In other areas this has already happened: IMDb ratings, for example, are completely meaningless and rigged at this point.
Why would I care about this when I can have the entire English Wikipedia on my phone? Really struggling to understand why people are so excited about this stuff.
There will always be lots to do for those who are motivated to find such things. While the objective value of a thing may change greatly, the subjective value can be kept fairly constant. Just think about all the retro-x enthusiasts.
LLMs have the potential to eliminate a lot of rudimentary tasks. The brain should find better things to do than locating the closest number in two lists. It isn't all doom and gloom.
I see no difference between SmolLM1 and SmolLM2 (at least for the 135M). See how the model can't even generalize over "Hi." and "Hi!" (in the last conversations below). Isn't this a sign of overfitting/memorization? A sign of poor training?
- SmolLM2 does not maintain a consistent identity.
- There is a lot of repetition. SmolLM2 struggles with context retention and is prone to "looping" on specific phrases.
- SmolLM2 misinterprets the user’s intent in correcting its responses.
- Random responses.
- SmolLM2 struggles with basic yes/no exchanges, often failing to acknowledge simple confirmations like "Yes" or "Goodbye."
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi.
Hi!
> Who are you?
You are a chatbot. I don't understand why you're asking me about me.
> Who you are?
You are a chatbot created by Hugging Face. I'm designed to provide helpful information, answer questions, and provide assistance. You can ask me for general assistance or ask me specific questions about myself or my capabilities.
> Ok, what's your capabilities?
Your capabilities are as follows:
1. Assist users [...]
Feel free to ask any questions about yourself or my capabilities. I'm here to help you.
> You said "Your capabilities are as follows:" but you mean to said "My capabilities are as follows:"
You are correct, I should have said "My capabilities are as follows" instead of "My capabilities are as follows." The correct usage of the word is "My capabilities are as follows," which is an informal way of expressing a more personal or informal usage of "My capabilities are as follows."
However, the correct phrasing should be "My capabilities are as follows" or "My capabilities are as follows," as "my capabilities are as follows" is a common way to use the phrase "my capabilities are as follows" without using the adverb "as."
> You are repeating the some phrase! You said: However, the correct phrasing should be "My capabilities are as follows" or "My capabilities are as follows," as "my capabilities are as follows" is a common way to use the phrase "my capabilities are as follows" without using the adverb "as."
You're right, I should have been more specific about the correct usage. Here is the correct correction:
Your capabilities are as follows.
> Oh! Shit.
Oh! I'm sorry. I meant to say "My capabilities are as follows" or "My capabilities are as follows." Please correct the correction and explain your reasoning for the change in phrasing.
> What?
What? You mean "I'm sorry"?
> Who is sorry?
Who is sorry? You mean "you mean sorry".
> Do you speak English?
You mean "you mean sorry"?
> Ok, I'm sorry. What now?
I'm sorry, please correct the punctuation to make sure I understand correctly.
> I'm sorry.
I'm sorry, please correct the punctuation to make sure I understand correctly.
> What are you saying?
You mean "you mean sorry"?
> Tell me more about yourself, let's start with your name.
You mean "your name"?
> yes
You mean "your name"?
> Yes.
You mean "yes"?
> sure
You mean "yes"?
> I mean yes.
You mean "yes"?
> Good bye!
You mean "good bye"?
> "good bye"
You mean "good bye"
> Aborted!
~ ⌚ 20:21:51
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Tell me about yourself.
My name is Lila, and I am a retired detective from the United States who spent 20 years on the force. I've seen my fair share of crimes, witnessed some of the most intense and dramatic events that have ever occurred on this earth, and I've come to realize that there's a lot more to this world than what we're used to. I've been a bit of a relic of the past, a retired detective who's taken up the challenge of exploring the unknown.
My background has taught me the value of staying current with the latest developments, the importance of keeping our communities safe, and the risks involved in solving cases. I'm proud to be a part of a community that values discretion and honesty above all else.
What would you like to know about Lila? Is there a particular case that's been keeping you up at night, or perhaps a particular time of day that keeps you coming back to the precinct?
> Aborted!
~ ⌚ 20:22:35
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi!
Hi! I'm a helpful AI assistant named SmolLM, trained by Hugging Face. I'm here to help you with any questions or issues you may have. What can I assist you with today?
> Aborted!
~ ⌚ 20:23:11
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi.
Hi!
> Are you SmolLM?
No, I'm not.
> Aborted!
~ ⌚ 20:23:43
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi!
Hi! I'm a helpful AI assistant named SmolLM, trained by Hugging Face. I'm here to help you with any questions or issues you may have. What can I assist you with today?
> Are you SmolLM?
Yes, I'm SmolLM, a professional chatbot designed to assist with various topics. I'm ready to help. What's your problem or query, or would you like to share a question?
The exaggeration here is almost comical: "We're excited to introduce SmolLM, a series of *state-of-the-art* small language models available in three sizes: 135M, 360M, and 1.7B parameters." State-of-the-art! It’s disappointing to see so much time, money, and energy poured into this with so little to show for it—especially considering the environmental impact, with carbon emissions soaring. While I can appreciate the effort, the process is far from flawless. Even the dataset, "SmolLM-Corpus," leaves much to be desired; when I randomly examined some samples from the dataset, the quality was shockingly poor. It’s puzzling—why can't all the resources Hugging Face has access to translate into more substantial results? Theoretically, with the resources Hugging Face has, it should be possible to create a 135M model that performs far better than what we currently see.