Adventures in Ollama (Part 2)
Today, let’s compare two more models, and implicitly compare them against Llama3.3:70b, which we reviewed in Part 1.
GPT-OSS:20b
First, let’s look at a newer model, GPT-OSS:20b. Again, I ran the exact same prompt, in a new chat (so no previous context).
When I entered the prompt, it started “thinking” for about 15 seconds. I could show or hide what it was thinking about (I think) by expanding or collapsing the thinking button. After the thinking, it took an additional 20 seconds to write 161 lines of code. This surprised me, because the earlier Llama3.3:70b model had produced files under 100 SLOC. But it was very fast, and the explanation it gave of the code was quite different too. Llama3.3 would often give useless commentary such as “This script uses regular expressions to split the file contents into individual words/tokens and then creates a list of objects with the specified properties.”, and would sometimes pop a second window open with an example running in it, or give me some ideas on how to run it from the command line. I can see how that might be useful in some cases, but for what I am looking for, if I need that I will ask for it. GPT-OSS:20b, on the other hand, gave me a short numbered overview of how the program worked and some point-form tips on the major decisions it made, such as:
Word cleaning uses preg_replace("/[^A-Za-z0-9'-]/", "", $token) so apostrophes (') and hyphens (-) stay, while all other punctuation is removed. Adjust the regex if you want a different rule.
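To make that rule concrete, here’s a tiny standalone example of my own (not from the generated file) showing what it keeps and what it drops:

```php
<?php
// The cleaning rule described above: strip everything except letters,
// digits, apostrophes and hyphens from each token.
$tokens = ["Hello,", "it's", "a", "well-known", "fact!"];

$clean = array_map(
    fn(string $token): string => preg_replace("/[^A-Za-z0-9'-]/", "", $token),
    $tokens
);

print_r($clean); // Hello / it's / a / well-known / fact
```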
I found this way of explaining what it did much more useful to a programmer than Llama3.3:70b’s way of commenting on it. It even gave me a preview of how the JSON file would look. This is really great! I find I can quickly scan the important points, and now I’m excited to look at the code (and I’m only on the first run!).
So, why is the code 60% bigger than before? Oh no, is the code… bad? No! Wait for it…
Beautiful comments!
I have to show you this. Let’s compare the front of the file and the first few sections with what I got from Llama3.3:70b:
[Code comparison: gpt-oss:20b (left) vs. Llama3.3:70b (right)]
The above two snippets are each just about 10.5 SLOC, but the visual impact and ease of reading the source code is very different. Please refer to the numbered sections; they correspond to the overview I mentioned earlier. This way of commenting and organizing code is superior to what Llama3.3:70b produced.
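To give you an idea of what I mean by numbered sections, here’s a rough sketch in that style (my own reconstruction of the flavour, not the model’s actual file; the file names and section titles are placeholders):

```php
<?php
/* ------------------------------------------------------------------
 * 1. Configuration: input text file and output JSON file
 * ------------------------------------------------------------------ */
$inputFile  = "story.txt";
$outputFile = "story.json";

/* ------------------------------------------------------------------
 * 2. Read the source file
 * ------------------------------------------------------------------ */
$text = file_get_contents($inputFile);
if ($text === false) {
    die("Unable to read {$inputFile}\n");
}

/* ------------------------------------------------------------------
 * 3. Tokenise the text into words
 * ------------------------------------------------------------------ */
$tokens = preg_split('/\s+/', trim($text));
```

Each numbered banner maps back to a line in the overview, so you can jump straight to the part you care about.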
I much prefer the code that gpt-oss:20b produced. However, I have not tested it yet.
Testing
In testing, the other three runs would sometimes think for 20 or 30 seconds, but would make up for it in code-generation speed later on. Overall, a very quick model compared to Llama3.3:70b. What about the results, then? First, all the runs produced beautiful code and commentary, as before. Second, the four files were 161, 151, 195 and 178 lines of code, respectively.
JSON quality
This time, the filename and pathing requirements seemed better understood, although run #2 produced a file named story.json.json. Shame! And run #3 counted 4650 words (while every other run counted 4651, as did the Llama3.3:70b runs). However, all the JSON was full, complete, and human-readable, and I appreciate that I can rely on it for something simple like that.
Reconstruction quality
All 4 runs were able to reproduce the text file; however, the second run again made an oopsie in the filename and called it story.json-2.txt. To tell you the truth, I modified the prompt after the Llama3.3:70b tests AND the first batch of tests for this model, because I thought my instructions might have been bad (despite a 50% or better success rate with what I wanted). Well, it still managed to screw it up, so there. But overall, I think the quality and comprehension here are a bit better than Llama3.3:70b’s. Still gotta watch it, occasionally. “Write the test first,” they say.
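For context, the reconstruction step boils down to something like this (a minimal sketch of my own, not either model’s code; the "word" property and the output file name are assumptions):

```php
<?php
// Rebuild the text file from the JSON word objects by joining the words
// back together. The real schema and whitespace handling depend on the
// prompt and on what each model actually generated.
$objects = json_decode(file_get_contents("story.json"), true);

$words = array_map(fn(array $obj): string => $obj["word"], $objects);

file_put_contents("story-2.txt", implode(" ", $words) . "\n");
```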
GPT-OSS:20b in retrospect
It’s better than Llama3.3:70b for PHP. And, I suspect, for other light code-generation tasks.
What about GPT-OSS:120b?
Yeah, yeah. Okay, I’ll do a couple of runs!
Well, it was a bit slower, finishing in about 45 seconds. What did I get? The code seems a little clearer and more complete. It’s hard to describe beyond little things like better comments, a better breakdown of what’s going on inside a loop, and little touches like normalizing an apostrophe to the ASCII version. It’s just a lot of little things, but they add up. Here’s an example.
[Code comparison: gpt-oss:20b (left) vs. gpt-oss:120b (right)]
Can you tell which one is better? Yeah, that’s the 120b model. Wait, you can’t tell? It’s the one on the right: it has a better error message and more concise code. I get the idea behind spreading things out, but the 120b model seems to have a better feel for what I prefer in terms of organization and verbosity.
Another example? The comment for step 4 was “Tokenise the text – keep words and whitespace separately” in the 20b model and “Walk through the token list and build the objects” in the 120b model. I like the second comment better. If that wasn’t enough, the 120b model also included a subsection comment, 4a, which was very useful: “Build the dictionary-ready version (obj->w)”. The 20b model didn’t think of that.
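Put together, those two touches look roughly like this (my own sketch; only the obj->w property name comes from the 120b output, the rest is guessed):

```php
<?php
// Normalise the curly apostrophe to its ASCII form, then build a
// dictionary-ready version of the word alongside the original.
$token = "It’s";

$word = str_replace("\u{2019}", "'", $token); // curly apostrophe -> ASCII '

$obj = new stdClass();
$obj->word = $word;             // the word as it appears in the text
$obj->w    = strtolower($word); // dictionary-ready version

echo json_encode($obj), PHP_EOL; // {"word":"It's","w":"it's"}
```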
All those little things led me to believe that if it’s just me doing the coding, I’d rather use the 120b model, because it’s going to be that much more complete and I can wait another 15 seconds for a better file, especially if it saves me an hour of coding and debugging otherwise.
But I also see the use case for the 20b model if you just need really, really fast, simple code completion: something like where you’re coding, you type a detailed function name, and it suggests an implementation for you. Still, I kind of prefer the code from the 120b model.
And there it is. GPT-OSS is better at PHP code generation than Llama3.3:70b. Not that Llama3.3:70b can’t get the job done, though! To be completely honest, looking at the code from both models taught me a little about PHP. Both models have something to offer, and I’m not deleting Llama3.3:70b yet. I’m just replacing it with GPT-OSS in my IDE!