.NET Concurrency Model

I tried to replicate the Go concurrency model as demonstrated by my previous post, Software Design and Go. However, the outcome is not as pretty. I say Go concurrency model as it was made most obvious to me from a talk I watched given by Rob Pike, one of the designers of Go. Obviously as a language design decision, Go has channels for communicating between concurrent processes. .NET, as far as I'm aware, does not  have this exactly. What it does have is a set of specialized concurrent collections that you can use for passing data between threads. This seems to work.

The one thing you might be saying is ".NET has async and await now! There's no use for threads anymore!"  I can't begin to describe how wrong that is.  Async and await are more to act like Javascript and Node.js, non-blocking threads of execution that don't necessarily have to be coordinated (concurrent). Threads and Concurrent collections best simulate what's going on with Go's goroutines and channels. Particularly the ConcurrentQueue class.

But, as you might imagine, the syntax is much more verbose. 208 lines vs. 171 lines. 6,946 bytes vs. 3,652 bytes (C# vs Go respectively).

Here is a taste of what the C# program looks like. Obviously the bulk of it has been trimmed, but the meat is here:

Software Design and Go

Go has really flipped the concept of software design upside down for me. It's made it more interesting and fun. I will take you through some iterations of a recursive "merge verifier" that I wrote which was a necessity due to subversion's bad merge outcomes. Where after merging 3-4 branches into a master, the result would be broken. So alternatively, it can just be used to find different files among two identical folder structures.  It's not going to tell you missing files, but that's just a shortcoming of it right now, it will be an added feature later.

So to design this application, there's the linear way. Read in a file from the "left directory", and find the same file in the "right directory". Get an MD5 hash of their contents. I know MD5 is "broked" but in what case in my use would they ever collide on two files that are diffent?  Anyway, compare the MD5s. If they're not equal report them, otherwise, repeat, recurse, and recurse, and repeat. Repeatedly. Until the folder structures have been fully recursed.

Then there's the Go way. Although I'm still trying to determine what that is, evidenced by the application has gone through 2 redesigns at this point. One when learning Go initially, and now, after learning Go and putting it down for a good few months. However, I have a much better idea of how they're supposed to be designed, and the first time I didn't try to do it with the Go concurrency model.

So this new way...  Get a file path and the fact that it exists, and send this along a channel. On a select statement, listen for these newly discovered files. In a go subroutine, get an MD5 of their contents ignoring white space.   This is the blocking call, so we want to do this as separately as possible. Once the file contents are hashed, hang onto it until its twin (the same file in the other directory structure) gets its contents hashed. Once you have both, create a FilePair object and set its left content to the left file, and right content to the right file, and send it along another channel. Receive these on another select statement, and just do reporting. Compare the MD5, if they're different, spit out a message with the relative path to the file.

Going with the new way, I've cut the program down from 15 seconds to 7 seconds, on a 1.48 GB directory structure. So that's two directories so nearly 3GB of data.

The program does include a way to exclude directories (like .svn or .git) and specify only files that we care about (.cs, .css, .js, .ascx, .cshtml, etc), so it's not reading the entire 3 GB of data. But to cut it down in HALF is pretty good, just by reordering the steps and grouping IO together separate from other things, allowing it to read the files and still do work on the side.

Here's the meat of the code: