
Okay, this one has me laughing out loud. Of COURSE Microsoft doesn't like fork()... Windows pretty much can't do it. I'll admit, there have been a lot of times I wish there was a more streamlined way to spawn processes on Linux (particularly daemons) but when I don't have fork() I always end up missing it. I'd take this paper a lot more seriously if it came from someone with a less obvious bias.


The article pointed out legitimate drawbacks related to the intersection of fork() and other features like posix threads.

The paper mentions the benefit of posix_spawn for the fork+exec use case.

I might've seen posix_spawn while skimming a manpage or browsing a change log but this is the first time that I'd actually learned about its purpose.
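For anyone else who had not used it: a minimal sketch of the fork+exec use case done with posix_spawn instead, here through Python's `os.posix_spawn` wrapper. The helper name `spawn_and_wait` is made up for illustration, and `os.waitstatus_to_exitcode` assumes Python 3.9 or later.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] as a new process via posix_spawn and return its exit code."""
    # No fork() in our own code: the child starts directly in the new program.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    # Run a tiny Python child that exits with status 7.
    print(spawn_and_wait([sys.executable, "-c", "raise SystemExit(7)"]))  # prints 7
```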

The article's conclusion isn't "and therefore Linux is bad" btw.


As a Linux developer and Windows hater, I agree with Microsoft. fork() is a hack.

Of course, all Windows APIs are terrible, but that doesn't make complaints about fork() any less legitimate. The concept of establishing empty processes, instead of cloning yourself, is much more sane.

After all, the use of fork() is 99% of the time just to call execve(), and anything done in between is just to clean up the mess from fork(). Having a dedicated way to just create processes in a controlled fashion would have been better there. And, the other 1% is usually cases where pthread should have been used instead.


Cleaning up your own process between fork and exec is hard. Several programs resort to terrible hacks like force-closing everything except file IDs 0, 1, 2 in a loop. Or they look into their /proc directory to discover which file IDs exist, which is only marginally better. But when your process is a house of cards built on third-party libraries with their own minds, there are not a lot of other options.
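A rough sketch of the /proc variant of that hack, assuming Linux (it relies on `/proc/self/fd`). The function names `open_fds` and `close_extra_fds` are hypothetical, and the second one is only meant to be called in a child between fork and exec.

```python
import os


def open_fds():
    """Return the fds currently open in this process (Linux /proc only)."""
    fds = []
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        try:
            os.fstat(fd)
        except OSError:
            # The fd used to read the directory itself is already closed again.
            continue
        fds.append(fd)
    return sorted(fds)


def close_extra_fds(keep=(0, 1, 2)):
    """Close every open fd not listed in keep (intended for a forked child)."""
    for fd in open_fds():
        if fd not in keep:
            os.close(fd)
```

This still cannot tell which descriptors a library opened on purpose, which is the "house of cards" problem described above.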


Use O_CLOEXEC everywhere (even third party libs). It's really annoying, but necessary. Means you need to use accept4(), dup3(), popen with an additional "e" (of course all of that needs to be feature tested, during compilation/runtime).
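A small sketch of what the flag does, using Python's `os` and `fcntl` modules on a Unix system. The helper `is_cloexec` is made up for illustration. Note that Python itself has created descriptors as non-inheritable by default since 3.4 (PEP 446), so the explicit `O_CLOEXEC` here is only to mirror what a C program would pass.

```python
import fcntl
import os


def is_cloexec(fd):
    """Return True if fd has the close-on-exec flag set."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


fd = os.open(os.devnull, os.O_RDONLY | os.O_CLOEXEC)
print(is_cloexec(fd))  # True: this fd is closed automatically on exec

# Opting in to inheritance clears the flag again.
os.set_inheritable(fd, True)
print(is_cloexec(fd))  # False: this fd would now leak into an exec'd program
os.close(fd)
```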


The catch is that you may not be able to control 3rd party libraries enough to be able to do all that. Thus all these annoying hacks. To me, the complexity of using fork() and the race conditions around pid reuse are the worst design problems of POSIX systems.


Win32 has the opposite semantics: the equivalent of O_CLOEXEC is the default, and the app has to request inheritance if it wants it. This causes problems too. There should have been two flags and the application should have to specify one on every handle-/fd-creating system call. Hindsight is 20/20.


> Of course, all Windows APIs are terrible, but that doesn't make complaints about fork() any less legitimate. The concept of establishing empty processes, instead of cloning yourself, is much more sane.

I like the ease with which you can pass resources and data to the forked child from the parent, though. Otherwise I'd have to do a lot of serialization and deserialization, or use shared memory, or unix sockets to pass fds, all of which also have their gotchas and are way more complicated and error-prone.


But if you pass resources and data from forked child to parent, you are already using shared memory.

And, in this case, it sounds like a thread would do exactly what you want, but without the oddities of fork().


The memory is not shared, but copied, so you don't have to care about concurrent memory access.
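A quick way to see the copy semantics, as a sketch on a Unix system. The function name `child_sees_private_copy` is made up for illustration. The child mutates a list it inherited and reports its view through a pipe, and the parent's list stays untouched.

```python
import os


def child_sees_private_copy():
    """Fork, let the child mutate inherited data, return (parent_data, child_view)."""
    data = ["parent"]
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this append lands in the child's own copy of the address space.
        os.close(r)
        data.append("child")
        os.write(w, ",".join(data).encode())
        os._exit(0)
    # Parent: read what the child saw, then reap it.
    os.close(w)
    with os.fdopen(r) as f:
        child_view = f.read()
    os.waitpid(pid, 0)
    return data, child_view


if __name__ == "__main__":
    print(child_sees_private_copy())  # (['parent'], 'parent,child')
```

Getting anything back out of the child still needs an explicit channel, here a pipe.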


> And, the other 1% is usually cases where pthread should have been used instead.

Ummmm. No. Threads are a much harder API to get right. They can work in this area, but that's not the same as saying they're right for all/most cases in this area.

I think a sizable part of that remaining 1% (if it is that low) are programs that leverage fork as the very powerful right tool for the job. Many of those also happen to be widely-used programs crucial for the operation of web services and large-data-set processing.


> Ummmm. No. Threads are a much harder API to get right.

Ummmm. No. Threading is not a hard API to get right. It's very simple: You get a new executing thread in the same memory space. You can create them whenever you like without any side-effects. Now, don't trample on your memory. Read all you want from anywhere. If you want to write to shared memory, ensure both reads and writes are behind a mutex, or learn about atomics.
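The "writes behind a mutex" rule, as a minimal sketch with Python's `threading` module. The function name and the thread and iteration counts are arbitrary choices for illustration.

```python
import threading


def count_with_lock(n_threads=4, n_increments=10_000):
    """Increment one shared counter from several threads, guarded by a mutex."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(n_increments):
            # The read-modify-write on the shared counter happens under the lock.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(count_with_lock())  # 40000
```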

Fork(), on the other hand, is much trickier. Sure, you get a cloned memory space so you can trample all you want, but now you have to establish some form of IPC (which might itself end up requiring threading), and if you didn't fork() as the first thing in your process, you end up inheriting all sorts of state that you do not want. Threads and locks, for example, are now in limbo (depending on your unix flavor of choice), and you likely have a bunch of fd's that you did not want.
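To illustrate the "threads in limbo" part, a sketch assuming a Unix system: only the thread that calls fork() exists in the child. The function name `thread_counts_across_fork` is made up for illustration. Python 3.12 and later emit a DeprecationWarning when forking a multi-threaded process, for much the same reason.

```python
import os
import threading


def thread_counts_across_fork():
    """Return (threads alive in parent, threads alive in child right after fork)."""
    stop = threading.Event()
    background = threading.Thread(target=stop.wait)
    background.start()
    parent_count = threading.active_count()

    pid = os.fork()
    if pid == 0:
        # Child: the background thread was not carried over by fork().
        os._exit(threading.active_count())

    _, status = os.waitpid(pid, 0)
    stop.set()
    background.join()
    return parent_count, os.waitstatus_to_exitcode(status)
```

Any lock the vanished thread held at the moment of fork() stays locked in the child, which is where the deadlocks come from.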

I cannot really think of any legitimate use-cases for fork() without exec(). There are legitimate use-cases for multi-process designs, but such designs are severely inconvenienced by fork(), as all they wanted to do was to start processes without inheriting state.

I also certainly cannot see any sensible argument for threading being harder than fork(), especially if you're just using it as a drop-in replacement where there will be no shared state after invocation outside of explicitly created communication channels.


> but now you have to establish some form of IPC

Shared memory for threads is a form of IPC too, except one where it's very easy to make a mistake and introduce concurrency bugs.

> I also certainly cannot see any sensible argument for threading being harder than fork()

You should read a paper or two on concurrency bugs, including ones about systems that use explicit but shared communication channels, as CSP does.


Concurrency bugs can be eliminated with state of the art static analysis (see Rust, Pony) - with the exception of deadlocks, which you can easily introduce with multiple processes as well.

http://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.ht...


> I also certainly cannot see any sensible argument for threading being harder than fork(), especially if you're just using it as a drop-in replacement where there will be no shared state after invocation outside of explicitly created communication channels.

Very well said. It gets no simpler than this. I think all too often, people try to complicate things where they don't need to be. Always do the SIMPLEST thing that works well.


ONE of the authors is from Microsoft. The other THREE are at Boston University and ETH.


As the article points out, the NT kernel actually natively supports fork. It's just not exposed.


Well, couldn't. Whatever they're doing with LXSS and picoprocesses seems to be good enough.

I don't run Windows so I'm far from the most biased person but frankly, on the surface the fork/exec thing really does seem unnecessary and weird in the modern world, where we've come up with better ways to do concurrency than just raw threads and processes anyways.


> Windows pretty much can't do it.

Win32 API cannot do it. The underlying NT kernel can.


I thought Linux had clone, which glibc calls for its implementation of fork.


Yes, the underlying syscall for fork() is clone [1], and the underlying syscall for exec*() is execve [2].

[1]: http://man7.org/linux/man-pages/man2/fork.2.html#NOTES

[2]: http://man7.org/linux/man-pages/man2/execve.2.html


Section 6: REPLACING FORK

> Alternative: clone().

> This syscall underlies all process and thread creation on Linux. Like Plan 9’s rfork() which preceded it, it takes separate flags controlling the child’s kernel state: address space, file descriptor table, namespaces, etc. This avoids one problem of fork: that its behaviour is implicit or undefined for many abstractions. However, for each resource there are two options: either share the resource between parent and child, or else copy it. As a result, clone suffers most of the same problems as fork (§4–5).


Which part of the paper made you laugh out loud?

Their arguments of why fork() is not a good fit these days seemed pretty reasonable to me.


I've been working with Unix systems for a long time. I too dislike fork(), even though I used to think it was the greatest thing. Here's a write-up of mine as to fork() being "evil": https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c...



