Need Help With Pull Request

I’ve recently been working on a pull request for Rake and I’ve run into some problems I don’t know how to solve, so I’m asking for some help.

Here’s the pull request on GitHub.

At my previous job, we used Rake to build our C++ projects. Its multitask feature let us compile multiple C++ files at the same time, which speeds up builds nicely. However, the output from the various tasks often gets jumbled. That isn't surprising for tasks running in parallel, but it is a major pain when an error occurs: the error message needs to be on its own line so that the editor or IDE can find and parse it when navigating through errors.

Here’s a simplified example that illustrates the interleaved output:

Simple Rakefile

25.times do |n|
  child_task_name = "child#{n}"
  task child_task_name do
    print "Run ", "child", n, "\n"
  end
  multitask default: child_task_name
end

Running rake with this Rakefile produces output like this:

Rake output

Run child0
Run child1
Run child2
Run child3
Run childRun Run child6Run child8
Run child11Run child15
Run child19Run child23Run child7
Run childRun child13Run child16Run child20
4
 
 
Run child10Run childRun child17
Run childRun child
Run child9
 
 
Run child18
 
child5
14
Run child24
22
12
21

My initial idea for fixing this problem was to wrap the $stdout and $stderr streams with an object that would acquire and hold a lock while forwarding any output message to the original stream. That would essentially treat any single output operation (like a call to print or puts) as an atomic operation. As of this writing, the code in the pull request implements this solution.
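For concreteness, here is a minimal Ruby sketch of that idea. It is not the code from the pull request; the class name SynchronizedIO and the single Monitor shared by both streams are assumptions made for illustration.

```ruby
require "monitor"

# Hypothetical wrapper illustrating the lock-and-forward approach; not the
# pull request's actual code. $stdout and $stderr share one reentrant lock.
class SynchronizedIO
  LOCK = Monitor.new

  def initialize(io)
    @io = io
  end

  # Defined explicitly because assigning to $stdout requires a #write method,
  # and because Kernel#print calls #write on $stdout directly.
  def write(*args)
    LOCK.synchronize { @io.write(*args) }
  end

  # Forward every other stream method (puts, print, flush, ...) under the lock.
  def method_missing(name, *args, &block)
    LOCK.synchronize { @io.public_send(name, *args, &block) }
  end

  def respond_to_missing?(name, include_private = false)
    @io.respond_to?(name, include_private) || super
  end
end

$stdout = SynchronizedIO.new($stdout)
$stderr = SynchronizedIO.new($stderr)
```

Each individual method call on the wrapper is serialized, but a single high-level output call that the interpreter splits into several #write calls is not.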

After testing this approach, I realized that it doesn’t work. There are a couple of problems:

Using puts or other output methods directly (as opposed to something like $stdout.puts) calls methods on Kernel. Logically, these methods simply forward to $stdout as if we'd written $stdout.puts. However, in MRI these methods are implemented in C, and the forwarding happens at that level. These output methods ultimately make one or more calls to the low-level write method of $stdout, and only those calls go through the wrapper and get synchronized. In the example above, where I call print with four arguments, the high-level call to print doesn't go through the wrapper, so it is not treated as an atomic operation. Instead, there are four separate calls to write that go through the wrapper. JRuby seems to do something similar.
Many Rake tasks use sh or similar to run external programs. These programs inherit the standard output and error file descriptors and write to them directly, so their output never goes through the wrapper either.
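Both problems can be reproduced without Rake. The sketch below temporarily replaces $stdout with a hypothetical RecordingIO object that logs each write call it receives instead of printing. It assumes MRI behavior, where Kernel#print calls write on $stdout once per argument; other implementations may differ.

```ruby
require "rbconfig"

# Hypothetical stand-in for the wrapper: it records every write call it
# receives instead of printing anything.
class RecordingIO
  attr_reader :writes

  def initialize
    @writes = []
  end

  def write(*strings)
    @writes << strings.join
    strings.sum { |s| s.to_s.bytesize }
  end

  # Provided because callers may invoke flush on $stdout.
  def flush
    self
  end
end

recorder = RecordingIO.new
original_stdout = $stdout
begin
  $stdout = recorder

  # Problem 1: in MRI, Kernel#print reaches $stdout as one write call per
  # argument, so a lock taken inside #write cannot make the print atomic.
  print "Run ", "child", 5, "\n"

  # Problem 2: the child process inherits file descriptor 1 and writes to it
  # directly; its output never passes through $stdout.
  system(RbConfig.ruby, "-e", "puts 'hello from a child process'")
ensure
  $stdout = original_stdout
end

p recorder.writes # prints ["Run ", "child", "5", "\n"]
```

The child's line appears on the terminal, but the recorder never sees it.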

I'm not sure where to go next. Does anyone have any ideas on how to synchronize output to the standard streams in a way that still allows the various tasks to run in parallel? I don't mind if the output from the various tasks is interleaved; I just want each line of output to stand on its own. I'll settle for having each independent write operation be treated as atomic.
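One possible direction, offered as an untested sketch rather than a known-good fix: buffer partial output per thread and write only complete lines while holding a shared lock, and run external commands with their output captured so it goes through the same path. LineBufferedIO and run_captured are hypothetical names; Open3.capture2e is from Ruby's standard library.

```ruby
require "open3"
require "stringio"

# Hypothetical sketch, not part of Rake: each thread (strictly, each fiber,
# since Thread.current[] is fiber-local) keeps its own buffer of partial
# output, and only complete lines are written to the real stream while
# holding a lock shared by every LineBufferedIO instance.
class LineBufferedIO
  LOCK = Mutex.new

  def initialize(io)
    @io = io
    @buffer_key = :"line_buffer_#{object_id}" # separate buffer per wrapper
  end

  # Kernel#print calls this once per argument; pieces accumulate until a
  # newline arrives, so a line built from several calls stays together.
  def write(*pieces)
    buffer = (Thread.current[@buffer_key] ||= +"")
    pieces.each { |piece| buffer << piece.to_s }
    emit_complete_lines(buffer)
    pieces.sum { |piece| piece.to_s.bytesize }
  end

  # Kernel#puts forwards to $stdout.puts; reuse StringIO#puts for formatting.
  def puts(*args)
    formatted = StringIO.new
    formatted.puts(*args)
    write(formatted.string)
    nil
  end

  # Provided because callers may invoke flush on $stdout; partial lines
  # deliberately stay buffered.
  def flush
    self
  end

  private

  # Move everything up to and including the last newline out of the buffer
  # and write it with a single call while holding the shared lock.
  def emit_complete_lines(buffer)
    last_newline = buffer.rindex("\n")
    return unless last_newline

    complete = buffer.slice!(0..last_newline)
    LOCK.synchronize { @io.write(complete) }
  end
end

# Hypothetical helper for external commands: capture the child's combined
# stdout and stderr, then send it through $stdout in one write so the
# command's output stays together. Returns true if the command succeeded.
def run_captured(*cmd)
  output, status = Open3.capture2e(*cmd)
  output += "\n" unless output.empty? || output.end_with?("\n")
  $stdout.write(output)
  status.success?
end

# Installation, e.g. near the top of a Rakefile:
#   $stdout = LineBufferedIO.new($stdout)
#   $stderr = LineBufferedIO.new($stderr)
```

The trade-offs: a command's output only appears after it finishes, and programs that notice they are writing to a pipe rather than a terminal may behave differently (for example, by turning off colored diagnostics). A trailing partial line also stays in its thread's buffer until a newline arrives.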

I’d appreciate any ideas or advice on how to proceed with this, or pointers to other solutions that people have come up with in similar contexts.
