pansa2 · 2026-03-17T23:51:33 1773791493

> Wouldn't this get the funding back?

The funding was Microsoft employing most of the team. They were laid off (or at least, moved onto different projects), apparently because they weren't working on AI.

kelvinjps · 2026-03-18T01:31:30 1773797490

With Python being the main language for AI, isn't it more important for it to be performant? I kinda don't get Microsoft's reasoning; maybe they're just tight on money.

brianwawok · 2026-03-18T02:05:07 1773799507

I don’t think Python is the main language of AI.

eru · 2026-03-18T02:36:19 1773801379

Python is pretty big as glue in the AI ecosystem as far as I can tell. It also seems to be most agents' 'preferred' language to write code in, when you don't specify anything.

(The latter is probably more to do with the preferences they give it in the reinforcement learning phase than anything technical, though.)

pansa2 · 2026-03-17T23:19:38 1773789578

The Python devs didn’t want to make huge changes because they were worried Python 3 would end up taking forever like Perl 6. Instead they went to the other extreme and broke everyone’s code for trivial reasons and minimal benefit, which meant no-one wanted to upgrade.

Even the main driver for Python 3, the bytes-Unicode split, has unfortunately turned out to be sub-optimal. Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.

diziet_sma · 2026-03-18T00:32:49 1773793969

> Python essentially bet on UTF-32 (with space-saving optimisations)

How so? Python3 strings are unicode and all the encoding/decoding functions default to utf-8. In practice this means all the python I write is utf-8 compatible unicode and I don't ever have to think about it.

sheept · 2026-03-18T00:54:00 1773795240

UTF-32 allows for constant time character accesses, which means that mystr[i] isn't O(n). Most other languages can only provide constant time access for code units.

msl · 2026-03-18T08:06:49 1773821209

UTF-32 allows for constant time access to code points. Neither UTF-8 nor UTF-16 can do the same (there are 1,114,112 code points, slightly more than 2 to the power of 20, though not all are in use).

While most characters might be encodable as a single code point, Python does not normalize strings, so there is no guarantee that even relatively normal characters are actually stored as single code points.

Try this in Python:

  s = "a\u0308"
  print(s)
  print(s[0])

You will see:

  ä
  a
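
To make the normalization point concrete, here is a small sketch using the standard `unicodedata` module. The variable names are only for illustration; it shows that the same visible character can be one or two code points depending on the form it is stored in.

```python
import unicodedata

decomposed = "a\u0308"          # 'a' followed by COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))          # 2 code points
print(len(composed))            # 1 code point (U+00E4)
print(decomposed == composed)   # False: same rendering, different code points
```

So constant-time indexing gives you code points, not what a user would call characters, unless you normalize first (and even then not every combination has a precomposed form).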

cloudbonsai · 2026-03-18T03:43:51 1773805431

Internally Python holds a string as an array of uint32. A utf-8 representation is created on demand from it (and cached). So pansa2 is basically correct [1].

IMO, while this may not be optimal, it's far better than the more arcane choice made by other systems. For example, due to reasons only Microsoft can understand, Windows is stuck with UTF-16.

[1] Actually it's more intelligent. For example, Python automatically uses uint8 instead of uint32 for ASCII strings.

zahlman · 2026-03-18T07:19:43 1773818383

There is no caching of a "utf-8 representation". You may check for example:

  >>> x = '日本語'*100000000
  >>> import time
  >>> t = time.time(); y = x.encode(); time.time() - t # takes nontrivial time
  >>> t = time.time(); y = x.encode(); time.time() - t # not cached; not any faster

Generally, the only reason this would happen implicitly is for I/O; actual operations on the string operate directly on the internal representation.

Python uses either 8, 16 or 32 bits per character according to the maximum code point found in the string; uint8 is thus used for all strings representable in Latin-1, not just "ASCII". (It does have other optimizations for ASCII strings.)
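
A rough way to observe the per-string width from Python, assuming CPython 3.3 or later (other implementations may store strings differently). The helper `bytes_per_char` is a name made up for this sketch; it measures how much `sys.getsizeof` grows when a string of one repeated character gets longer.

```python
import sys

def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate storage per character by growing a string of `ch` by n characters."""
    small = sys.getsizeof(ch * n)
    large = sys.getsizeof(ch * (2 * n))
    return (large - small) // n

latin1_width = bytes_per_char("é")   # U+00E9, fits in Latin-1
bmp_width = bytes_per_char("日")     # U+65E5, needs 16 bits
astral_width = bytes_per_char("😀")  # U+1F600, outside the BMP

print(latin1_width, bmp_width, astral_width)   # 1 2 4 on CPython
```

Taking the difference between two sizes cancels out the fixed object header, so only the per-character cost is left.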

The reason for Windows being stuck with UTF-16 is quite easy to understand: backwards compatibility. Those APIs were introduced before there were supplementary Unicode planes, such that "UTF-16" could be equated with UCS-2; then the surrogate-pair logic was bolted on top of that. Basically the same thing that happened in Java.

cloudbonsai · 2026-03-18T09:56:17 1773827777

> There is no caching of a "utf-8 representation".

No there certainly is. This is documented in the official API documentation:

    UTF-8 representation is created on demand and cached in the Unicode object.

    https://docs.python.org/3/c-api/unicode.html#unicode-objects

In particular, Python's Unicode object (PyUnicodeObject) contains a field named utf8. This field is populated when PyUnicode_AsUTF8AndSize() is first called and reused thereafter. You can check the exact code I'm talking about here:

https://github.com/python/cpython/blob/main/Objects/unicodeo...

Is it clear enough?

zahlman · 2026-03-18T19:26:35 1773861995

The C API may provide for it, but I'm not seeing a way to access that from Python. This sort of thing is provided for people writing C extensions who need to interface to other C code.

(And the code search seems to be broken; it can't find me the definition of `unicode_fill_utf8` although I'm sure it's obvious enough.)

nslsm · 2026-03-18T04:41:03 1773808863

Read first paragraph here https://devblogs.microsoft.com/oldnewthing/20190830-00/?p=10...

pansa2 · 2026-03-18T01:07:26 1773796046

> all the encoding/decoding functions default to utf-8

Languages that use UTF-8 natively don't need those functions at all. And the ones in Python aren't trivial - see, for example, `surrogateescape`.
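
For anyone who hasn't met `surrogateescape`, a minimal sketch of what it does. The file name bytes are invented for the example: the undecodable byte is smuggled through as a lone surrogate so that the original bytes can be recovered on encode.

```python
raw = b"caf\xe9.txt"   # Latin-1 bytes; not valid UTF-8

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The bad byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")
roundtrip = text.encode("utf-8", errors="surrogateescape")

print(strict_ok)          # False
print(repr(text))         # 'caf\udce9.txt'
print(roundtrip == raw)   # True
```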

As the sibling comment says, the only benefit of all this encoding/decoding is that it allows strings to support constant-time indexing of code points, which isn't something that's commonly needed.

laurencerowe · 2026-03-18T01:34:55 1773797695

They absolutely do, because random byte strings are not valid utf8. Safe Rust requires validating bytes when converting to strings because of this.

zahlman · 2026-03-18T07:11:32 1773817892

> Python essentially bet on UTF-32 (with space-saving optimisations), while everyone else has chosen UTF-8.

It did nothing of the sort. UTF-8 is the default source file encoding and has been the target for many APIs. It likely would have been the default for all I/O stuff if we lived in a world where Windows had functioning Unicode in the terminal the whole time and didn't base all its internal APIs on UTF-16.

I assume you're referring to the internal representation of strings. Describing it as "UTF-32 with space-saving optimizations" is missing the point, and also a contradiction in terms. Yes, it is a system that uses the same number of bytes per character within a given string (and chooses that width according to the string contents). This makes random access possible. Doing anything else would have broken historical expectations about string slicing. There are good arguments that one shouldn't write code like that anyway, but it's hard to identify anything "sub-optimal" about the result except that strings like "I'm learning 日本語" use more memory than they might be able to get away with. (But there are other strings, like "ℍℯℓ℗", that can use a 2-byte width while the UTF-8 encoding would need 3 bytes per character.)
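
The two examples can be checked with a back-of-the-envelope sketch. `fixed_width` and `payload_sizes` are hypothetical helpers that model the "one width per string" rule from the largest code point; they count character payload only and ignore object headers.

```python
def fixed_width(s: str) -> int:
    """Bytes per character for a fixed-width layout chosen from the largest code point."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4

def payload_sizes(s: str) -> tuple[int, int]:
    """Return (fixed-width bytes, UTF-8 bytes) for the characters of s."""
    return len(s) * fixed_width(s), len(s.encode("utf-8"))

for s in ("I'm learning 日本語", "ℍℯℓ℗"):
    fixed, utf8 = payload_sizes(s)
    print(f"{s!r}: fixed-width {fixed} bytes, UTF-8 {utf8} bytes")
```

The first string comes out at 32 bytes fixed-width against 22 in UTF-8, the second at 8 against 12.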

rjh29 · 2026-03-18T00:16:58 1773793018

Ironically Perl 5 managed to do the bytes-Unicode split with a feature gate, no giant major version change.

pansa2 · 2026-03-17T22:38:38 1773787118

Maybe they could have two versions of the interpreter, one that’s thread-safe and one that’s optimised for single-threading?

Microsoft used to do this for their C runtime library.

veber-alex · 2026-03-17T23:05:15 1773788715

That's exactly what we have now and it looks like the python devs want a single unified build at some point
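
If you want to see which of the two builds you are running, something like this should work. It is a sketch: `Py_GIL_DISABLED` is the build-time config variable for the free-threaded build, and `sys._is_gil_enabled()` only exists on 3.13+, so the code falls back to assuming the GIL is on.

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 for free-threaded builds; it is 0 or None otherwise.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() exists on 3.13+; older versions always have the GIL.
gil_check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = gil_check() if gil_check is not None else True

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```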

chuckadams · 2026-03-18T00:05:28 1773792328

PHP does this as well. Most distributions ship PHP without thread safety, but it's seeing more use now that FrankenPHP uses it. Speaking of which, it would be nice if PHP's JIT got a little love: it's never eked out more than marginal gains in heavily-numeric code.

pansa2 · 2026-03-17T22:29:17 1773786557

There isn’t a dev mailing list any more, is there? Do you mean the Discourse forum?

pansa2 · 2026-03-17T22:21:52 1773786112

>> Python 2->3 transition

> taking backwards compatibility so seriously

Python’s backward compatibility story still isn’t great compared to things like the Go 1.x compatibility promise, and languages with formal specs like JS and C.

The Python devs still make breaking changes, they’ve just learned not to update the major version number when they do so.

BarryMilo · 2026-03-17T22:55:59 1773788159

Indeed, Python's version format is semver but it's just aesthetics, they remove stuff in most (every?) minor version. Just yesterday I wasted hours trying to figure out a bug before realizing my colleague hadn't read the patch notes.

pansa2 · 2026-03-09T08:22:38 1773044558

Crystal’s syntax is similar to Ruby’s, but AFAIK the similarity more-or-less ends there.

pansa2 · 2026-03-08T06:57:49 1772953069

PyPy is a fantastic achievement and deserves far more support than it gets. Microsoft’s “Faster CPython” team tried to make Python 5x faster but only achieved ~1.5x in four years - meanwhile PyPy has been running at over 5x faster for decades.

On the other hand, I always got the impression that the main goal of PyPy is to be a research project (on meta-tracing, STM etc) rather than a replacement for CPython in production.

Maybe that, plus the core Python team’s indifference towards non-CPython implementations, is why it doesn’t get the recognition it deserves.

mattip · 2026-03-08T08:01:10 1772956870

Third party libraries like SciPy, scikit-learn, pandas, tensorflow and pytorch have been critical to python’s success. Since CPython is written in C and exposes a nice C API, those libraries can leverage it to quickly move from (slow) python to (fast) C/C++, hitting an optimum between speed of development and speed of runtime.

PyPy’s alternative, CFFI, was not attractive enough for the big players to adopt. And HPy, another alternative that would have played better with Cython and friends, came too late in the game; by that time PyPy development had lost momentum.

toxik · 2026-03-08T08:09:14 1772957354

PyPy on numpy heavy code is often a lot slower than CPython

mattip · 2026-03-08T09:29:05 1772962145

Yes. The C API those libraries use is a good fit to CPython, a bad fit to PyPy. Hence CFFI and HPy. Actually, many of the lessons from HPy are making their way into CPython since their JIT and speedups face the same problems as PyPy. See https://github.com/py-ni

jjgreen · 2026-03-08T18:52:46 1772995966

I rather like Python and have used the C API extensively, "nice" is not the word I'd choose ...

glkindlmann · 2026-03-08T13:25:47 1772976347

Sorry, can you explain more about the connection between PyPy and CFFI (which generates compiled extension modules to wrap an existing C library)? I have never used PyPy, but I use CFFI all the time (to wrap C libraries unrelated to Python so that I can use them from Python).

mattip · 2026-03-08T14:30:02 1772980202

CFFI is fast on PyPy. The JIT still cannot peer into the compiled C/C++ code, but it can generate efficient interface code since there is a dedicated _cffi_backend module built into PyPy. Originally that was the motivation for the PyPy developers to create CFFI.

glkindlmann · 2026-03-08T20:42:24 1773002544

Thank you for the background info, and sorry for me explaining CFFI (I just wanted to be sure we were talking about the same thing). Being ignorant about PyPy, I honestly had no idea until now that there was a personnel or purpose overlap between CFFI and PyPy. I am very grateful for CFFI (though I only use it in API mode).

pjmlp · 2026-03-08T18:58:16 1772996296

Python was already widely deployed before them, thanks to Zope, and being a saner alternative to Perl.

EdNutting · 2026-03-08T10:34:58 1772966098

The Faster Python project would’ve got further if Microsoft hadn’t let the entire team go when they made large numbers of their programming languages teams redundant last year. All in the name of “AI”. Microsoft basically gave up on core computer science to go chase the hype wave.

pansa2 · 2026-03-08T11:18:39 1772968719

You’re right, of course: even Guido seems to have been moved off working on CPython and onto some tangentially-related AI technology.

However, Faster CPython was supposed to be a 4-year project, delivering a 1.5x speedup each year. AFAIK they had the full 4 years at Microsoft, and only achieved what they originally planned to do in 1 year.

Qem · 2026-03-08T14:41:37 1772980897

To be fair, they suffered a bit from scope creep, as mid-project a second major effort was started to remove the GIL. So the codebase was undergoing two major surgeries at the same time. Hard to believe they could stick to the original schedule under those conditions. Also, GIL removal decreases the performance of sequential execution. I imagine some gains from Faster CPython were/will be spent compensating for this hit on GIL-less single-thread performance.

EdNutting · 2026-03-08T10:35:33 1772966133

(This affected TypeScript, .NET and other folk too)

pjmlp · 2026-03-08T10:42:41 1772966561

See also VC++ now lagging behind ISO, after being the first to achieve C++20.

grzaks · 2026-03-08T10:01:25 1772964085

We have been using PyPy for a core system component in production for like 10 years.

ajross · 2026-03-08T13:32:36 1772976756

> PyPy is a fantastic achievement and deserves far more support than it gets

PyPy is a toy for getting great numbers in benchmarks and demos, is incompatible in a zillion critical ways, and is basically useless for large-scale development for anything that has to interoperate with "real" Python.

Literally everyone who's ever tried it has the experience that you mock up a trial for your performance code, drop your jaw in amazement, and then run your whole app and it fails. Until there's a serious attempt at real 100% compatibility, none of this is going to change.

Also none of the deltas are well-documented. My personal journey with PyPy hit a wall when I realized that its GC is lazy instead of greedy. So a loop that relies on the interpreter to free stuff up (e.g. file descriptors needing to be closed) rapidly runs into resource exhaustion in PyPy. This is huge, easy to trip over, extremely hard to audit, and... it's like it's hidden lore or something. No one tells you this, when it needs to be at the top of their front page before you start the port.

networked · 2026-03-08T15:00:57 1772982057

"Ask HN: Is anyone using PyPy for real work?" from 2023 contradicts you about PyPy being a toy. The replies are noticeably biased towards batch jobs (data analysis, ETL, CI), where GC and any other issues affecting long-running processes are less likely to bite, but a few replies talk about sped-up servers as well.

https://news.ycombinator.com/item?id=36940871 (573 points, 181 comments)

cfbolztereick · 2026-03-08T15:01:20 1772982080

Timely management of external resources is what the `with` statement has been for since 2006, added in python 2.5 or so. To debug these problems Python has `ResourceWarning`.

Additionally, CPython's gc is also only eager in a best effort kind of way. If cycles are involved it can take long to release memory. This will become even more the case in future versions of CPython, in the free threading variants.
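
A small sketch of the cycle case, assuming CPython's reference counting plus cyclic collector. The `Node` class is made up for the example, and the collector is disabled only to make the delay deterministic; normally it runs whenever its thresholds are hit.

```python
import gc
import weakref

class Node:
    """Plain object used to build a two-element reference cycle."""

gc.disable()                  # keep the cyclic collector from running on its own
a, b = Node(), Node()
a.other, b.other = b, a       # a <-> b reference cycle
probe = weakref.ref(a)        # lets us observe whether `a` is still alive

del a, b                      # refcounts never reach zero because of the cycle
alive_before = probe() is not None

gc.collect()                  # an explicit pass finds and frees the cycle
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)   # True False on CPython
```

If `Node` held a file descriptor, it would stay open for the whole window between the `del` and the next collection.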

ajross · 2026-03-08T15:46:29 1772984789

Sorry, the with statement is non-responsive. The question isn't whether you "can" write PyPy-friendly code. Obviously you can.

The question isn't even whether or not you "should" write PyPy-friendly code, it's whether YOU DID, or your predecessors did. And the answer is "No, they didn't". I mean, duh, as it were.

PyPy isn't compatible. In this way and a thousand tiny others. It's not really "Python" in a measurable and important way. And projects that are making new decisions for what to pick as an implementation language for the evolution of their Python code have, let's be blunt, much better options than PyPy anyway.

aftbit · 2026-03-09T16:43:38 1773074618

Strongly disagree. If you're relying on Python garbage collection to free file descriptors in a loop, you have a subtle bug that will rear its head in unexpected and painful ways (and by some unwritten law of software, most notably either at 3 AM or when you have an important demo scheduled). This is true whether you're running in CPython or PyPy. It's not hard to avoid - use `with` or `try...finally`. It's not some newfangled language feature. It's not a surprise - you can't write good RAII code in Python. It's a sign of someone with a poor grasp of the language they're using. If you find things like this, you should fix them, even if you never intend to use PyPy.
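
For reference, the pattern in question looks roughly like this. The file names and the `read_all` helper are invented for the sketch; the point is only that each descriptor is released when its `with` block exits, regardless of when any garbage collector runs.

```python
import os
import tempfile

def read_all(paths):
    """Read each file, closing its descriptor as soon as the block exits."""
    contents = []
    for path in paths:
        with open(path, "rb") as f:   # closed on exit, even if read() raises
            contents.append(f.read())
    return contents

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, f"part{i}.bin")
        with open(path, "wb") as f:
            f.write(bytes([i]))
        paths.append(path)
    data = read_all(paths)

print(data)   # [b'\x00', b'\x01', b'\x02']
```

The version that breaks on PyPy is the same loop with a bare `open(path, "rb").read()`, which leaves the close to whenever the file object happens to be collected.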

ajross · 2026-03-10T15:28:56 1773156536

> If you're relying on Python garbage collection to free file descriptors in a loop

Again, that's a prescription for how to write python code for future execution. It's emphatically not a statement about the behavior expected by python code already in production, which tends to rely on this behavior (along with many other such warts and subtleties) implicitly.

And the fact that PyPy doesn't feel the need to clone it (and all the others) explains why PyPy basically doesn't work for existing python code.

I mean, me being an idiot python developer in your eyes does nothing to make the ancient code I received run. It just makes you feel smarter. That's a bad trade.

PyPy needs to be compatible before anyone is going to use it. And it isn't. And so people didn't. And so now it's basically dying as no one wants to work on a project no one uses.

aftbit · 2026-03-12T04:12:30 1773288750

It's not about feeling smart or dumb. I'm not doubting that lack of perfect compatibility is holding pypy back, though I suspect it's more related to C extensions and libraries than it is to bug-for-bug compatibility with the garbage collector. I just think that code which does this specific wrong thing that you've mentioned is already doomed to not work reliably even under CPython.

cozzyd · 2026-03-08T19:02:33 1772996553

I've run into similar resource limit exhaustion issues due to the GC not keeping up with CPython as well.

pansa2 · 2026-02-21T14:20:52 1771683652

Fundamentally, CPUs use 0-based addresses. That's unavoidable.

We can't choose to switch to 1-based indexing - either we use 0-based everywhere, or a mixture of 0-based and 1-based. Given the prevalence of off-by-one errors, I think the most important thing is to be consistent.

pansa2 · 2026-02-21T14:05:47 1771682747

The reason many languages prefer `length` to `count`, I think, is that the former is clearly a noun and the latter could be a verb. `length` feels like a simple property of a container whereas `count` could be an algorithm.

`countof` removes the verb possibility - but that means that a preference for `countof` over `lengthof` isn't necessarily a preference for `count` over `length`.

ncruces · 2026-02-21T14:16:41 1771683401

But count is more clearly a dimensionless number of elements, and not a size measured in some unit (e.g. bytes).

layer8 · 2026-02-21T18:42:29 1771699349

I tend to use numFoos (short for “number of foos”), and only use fooCount when the variable is used for actual counting (like an errorCount variable that is incremented for each error).

Countof is strange, because one doesn’t talk about the “count of something” in English, other than uses like “on the count of three” (or the “count of Monte Cristo” ;)).

pansa2 · 2026-02-16T09:30:55 1771234255

At least one of them is wrong