Then you should take a look at https://github.com/flanglet/kanzi-cpp: it is optimized for fast roundtrips, multi-threaded by design and produces a seekable bitstream.
In the high-compression regime, where LZ can compete in terms of ratio, BWT compressors are faster to compress and slower to decompress than LZ codecs. BWT compressors are also more amenable to parallelization (check bsc and kanzi for modern implementations besides bzip3).
Notice that bzip3 has close to nothing to do with bzip2. It is a different BWT implementation with a different entropy codec, from a different author (as noted in the GitHub description "better and stronger spiritual successor to BZip2").
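For anyone unfamiliar with what these compressors share: the BWT itself does not compress anything. It reorders a block so that equal characters tend to cluster, and a later entropy-coding stage does the actual squeezing. Below is a naive Go sketch of the forward and inverse transform (`bwt` and `ibwt` are my own illustrative names). It sorts every rotation, which costs O(n^2 log n) time; real implementations use far faster sorting algorithms, so treat this purely as an illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// bwt returns the Burrows-Wheeler transform of s (the last column of the
// sorted rotation matrix) plus the row where s itself ended up.
// Naive: it sorts all n rotations, O(n^2 log n) time.
func bwt(s string) (string, int) {
	n := len(s)
	doubled := s + s // rotation i is doubled[i : i+n]
	rot := make([]int, n)
	for i := range rot {
		rot[i] = i
	}
	sort.Slice(rot, func(a, b int) bool {
		return doubled[rot[a]:rot[a]+n] < doubled[rot[b]:rot[b]+n]
	})
	last := make([]byte, n)
	primary := 0
	for row, start := range rot {
		last[row] = doubled[start+n-1] // final byte of this rotation
		if start == 0 {
			primary = row
		}
	}
	return string(last), primary
}

// ibwt undoes bwt using the LF mapping: lf[row] is the row of the
// rotation that starts one character earlier in the original string.
func ibwt(last string, primary int) string {
	n := len(last)
	var count [256]int
	for i := 0; i < n; i++ {
		count[last[i]]++
	}
	var first [256]int // first[c]: first row whose rotation starts with c
	sum := 0
	for c := range count {
		first[c] = sum
		sum += count[c]
	}
	lf := make([]int, n)
	var seen [256]int
	for i := 0; i < n; i++ {
		c := last[i]
		lf[i] = first[c] + seen[c]
		seen[c]++
	}
	out := make([]byte, n)
	row := primary
	for k := n - 1; k >= 0; k-- {
		out[k] = last[row] // last char of a row precedes its first char
		row = lf[row]
	}
	return string(out)
}

func main() {
	out, primary := bwt("banana")
	fmt.Println(out, primary)       // nnbaaa 3
	fmt.Println(ibwt(out, primary)) // banana
}
```

Note how the output "nnbaaa" groups the repeated letters; that clustering is what the entropy coder exploits.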
Was the name used with permission? Even if it's not trademarked (because open source freedom woohoo), it's a bit weird to release, say, Windows 12 without permission from the authors of Windows 11.
I tried looking it up myself, but neither the readme nor the doc/ folder says; there is no mention of any of the bzip2 authors, and there is no website listed, so I presume this GitHub page is canonical.
Free software doesn't have a lot of trademarks, with the notable exception of Linux.
Also, the name is the algorithm. Bzip2 has versions and bzip3 is something else which has its own updated versions. Programs that implement a single algorithm often follow this pattern.
Hence my saying "not trademarked (because open source freedom woohoo)". I understand bzip3 is legal, but my question is whether it's a dick move.
> the name is the algorithm. Bzip2 has versions and bzip3 is something else
I don't understand this. If bzip3 is "something else" compared to bzip2, then how can both be named after the algorithms they use? Nothing in bzip2's algorithm is related to the digit 2. If it used, say, powers of 2 centrally and bzip3 used pi for something, then the naming could make sense, since it's indeed not a version number, but I don't see anything remotely like that in these algorithms.
Looking at Wikipedia and bzip2's website, there is no explanation of the name, and yesterday I read somewhere (either in the OP or in another comment) that it supposedly stands for "better zip". It has nothing to do with PKZIP, though. If they had both implemented PKZIP's format, then a "spiritual successor" to bzip2 would be something like uzip, uzip2 (replacing "better" with "ultimate"), bbzip2, or any of a million other options, but that's not the pattern they chose to follow.
How is this not just taking their established name for your own project with the version number +1?
I read a while back that bzip2 is named that way because it succeeded an earlier bzip, which used arithmetic coding. The person who made bzip then made bzip2 to use Huffman coding instead, due to patent problems with arithmetic coding.
If people are claiming that a country is threatening nuclear war, they damn well better be able to back it up with something more than narrative-shaping sound bites.
Go does not have threads but something like "tasks". The fact that no thread handle is exposed allows for transparently moving these tasks across threads if the scheduler decides so.
"go makes concurrency a first-class concept"
I think it usually refers to goroutines being built in the language.
"Go is abnormally dangerous when it comes to concurrency IMO". Personally, that has not been my experience with Go concurrency. However, I have hit some issues when trying to orchestrate tasks via channels and ended up resorting to atomics to do the job.
> Go does not have threads but something like "tasks". The fact that no thread handle is exposed allows for transparently moving these tasks across threads if the scheduler decides so.
This doesn't stop there being "task handles" then, though? I think the point GP was making is that something that in most languages would be simple methods on a handle like "wait for this task to finish" or "stop this task" instead need to be done manually in Go with channels (or potentially `Context` in the latter case, although that was a later addition to the standard library). It doesn't really matter whether you call it a thread or a task; either way, it would be nice to get some return value from spawning some background operation and being able to use it to directly interact with it. I agree with GP that it does seem like an odd omission, since I haven't really heard any actual practical explanation for it.
Context for cancellation and replacing thread-local variables (or indeed any way to observe your "current" thread) is one of the things I like tbh. Though Context has abysmal performance implications.
But yeah, I want a goroutine handle with a "Wait()" method. Ideally also returning the results. Like most languages. It'd eliminate a ton of manual mutex and channel use that doesn't need to exist.
---
Re thread vs tasks: that's an implementation detail. You write threaded code and it runs in multiple threads with thread-like memory behavior. In all in-Go observable ways it's identical to threads, and it could be changed to use real hardware threads tomorrow and none of the semantics would change at all. Even cgo would stay the same.
Go has (green) threads. Being more specific is relevant for runtime implementation spelunking and performance details, but not otherwise.
Yeah, I generally think of the word "thread" as referring to OS threads and/or "green" threads depending on context (and in this case I thought it was clear what you were referring to!), but since the person who responded to you made the distinction, I figured I'd use their terminology when explaining what I thought you were saying.
I was just leveraging your already-top reply to reply to both of you, sorry about that :) I should've just done two comments. I think you and I are on the same page here.
I think the main reason it doesn't exist is that Go had no generics. It'd need to be another custom-generic type (Future[T] basically), and it would make it harder to pass around, just like channels. But since channels are generally intrusively-added, they aren't part of the return signature, so they avoid that generic-return issue. E.g. every "worker pool" accepts a `func()` and callers need to coordinate return values via channels, instead of needing to return a `func() T` reference, which they have been unable to do until recently (to some degree at least).
Though they probably could've just said "use a Future[interface{}]", like they did for every other generic collection type.
Plus it'd take some of the emphasis off channels, and they seem to really not want to do that. If they were focused on usability instead of channels and select, they'd let us park on multiple mutexes just like channels, just like the runtime does internally a lot to implement all this... but no. Imagine a world where you could `select { case mut.Lock(): ...}`...
> I think the main reason it doesn't exist is that go had no generics. It'd need to be another custom-generic type (Future[T] basically), and it would make it harder to pass around, just like channels. But since channels are generally intrusively-added, they aren't part of the return signature, so they avoid that generic-return issue.
That's a good point I hadn't thought of! Naively, I want to say they could just "implicitly" make anything returned from a `go func` be passed to a channel and then have `go func` return a channel, but that would require doing a bit of type inference as well as deciding semantics for whether it's possible to get multiple values out of that return channel. It honestly seems like there are some interesting ideas here (e.g. having multiple yields out of a goroutine that then get sent to an "output" channel, making a sort of generator-like thing), but I guess I'm not super surprised that Go didn't choose to go this route.
My (rather horrid) pattern to address this problem is to wrap the goroutine in a function that returns a channel receiver. When the goroutine ends it sends something to the channel and whatever called it can await the result or completion using the receiver.
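That pattern, written generically (Go 1.18+), also comes close to the "have `go func` return a channel" idea earlier in the thread. `Async` is a hypothetical helper name. The buffer of one means the goroutine can exit even if nobody ever reads the result; the downside is that only one receiver gets the value, and a second receive blocks forever.

```go
package main

import (
	"fmt"
	"strings"
)

// Async runs f in a new goroutine and returns a receive-only channel that
// delivers f's result once. The one-slot buffer lets the goroutine exit
// even if the caller never reads the result.
func Async[T any](f func() T) <-chan T {
	out := make(chan T, 1)
	go func() { out <- f() }()
	return out
}

func main() {
	r := Async(func() string { return strings.ToUpper("hello") })
	// ... the caller is free to do other work here ...
	fmt.Println(<-r) // HELLO
}
```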
I have, on occasion, used a similar pattern, but instead of sending something, I simply close the channel (usually with a "defer close(c)" at the beginning of the function/closure that encompasses the main code of the goroutine's work).
That way, if I end up having multiple waiters, they will all be able to proceed.
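The close-instead-of-send variant looks like this, as a sketch where `startJob` is a hypothetical name. A receive on a closed channel returns immediately, every time, so any number of goroutines can wait on the same `done` channel. Since nothing is sent, any result has to be stored somewhere the waiters can read once `<-done` returns; the close happens before those receives complete, so that read is race-free.

```go
package main

import (
	"fmt"
	"sync"
)

// startJob is a hypothetical helper: it runs work in a goroutine and
// returns a channel that is closed (not sent on) when work returns.
func startJob(work func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done) // signal completion when work returns
		work()
	}()
	return done
}

func main() {
	var result int
	done := startJob(func() { result = 42 })

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-done // all three waiters unblock once done is closed
			fmt.Println("waiter", id, "sees", result)
		}(i)
	}
	wg.Wait()
}
```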
I've always thought it would be nice if the go statement returned an ID. Doing so would also be completely backwards compatible, of course. Then add a library or a few builtins to do things with that ID: at minimum kill it, perhaps get its status, etc. Maybe not a full-blown actor model, but having nothing feels powerless.
Go has OS threads and “green threads” (called goroutines). You create goroutines via the go keyword, and the Go runtime schedules them onto OS threads. Many goroutines can share a single OS thread, and typically at most one OS thread per CPU core is running Go code at a time (that limit is configurable).
The GP is correct that you cannot manage a goroutine from outside of it. With (for example) POSIX threads, which still leave a lot to be desired, you can at least manage a thread from other threads.
Go definitely has some rough edges around threading. The idea is that you’re supposed to use channels for everything, but in my experience channels have so many subtle edge cases that can completely lock up your application that it’s often easier to fall back to the classic mutex-style idioms.
I do really like the go keyword, it’s handy. But I have a background in POSIX threads, so I probably find concurrency in Go easier than most, and yet even I have to concede that Go under-delivered on its concurrency promises.