The Mind of a 20+ Year Developer

I'm not sure how long I've been coding. Let me see... I started with GWBasic in my senior year of high school, around Fall of 1996, so 22 years fully, but I didn't know jack then. I don't know jack still, so by that measure, I've not been developing at all :)

But computer science started in college in Spring of 1998, so it's been over 20 years since I actually started to learn some things. Those were exciting times! Anyway...

After 20 years as a developer, really into learning everything I could, and having dealt with many different languages, coding paradigms, frameworks, ideas, projects, protocols, etc., at some point a lot of solutions started to seem obvious to me. For a recent example, I've been working on my Go Sitecore API. Sitecore is a hierarchical system for storing content, and content is a broad term in the Sitecore world. As they say, "Sitecore is built on Sitecore". It stores all of your data structures (templates), content, ways to render the content (layouts and renderings, for all intents and purposes), system data, etc. Everything is a node. A node is an instance of a "Template". The "Template" node is itself a node. Etc. And it's all tree based.

So in writing the Sitecore API, you don't necessarily want to deal with the entire tree. Being a tree, the nodes each have a "Path" property, like /sitecore/templates/User Defined/My Template. I wrote a simple way to build the tree from the database, then filter the tree by a set of paths (paths []string). This goes through the tree, and the result is a map of Guid -> Item (a node), where the returned nodes all reside within the specified paths. You could provide a path like "-/sitecore/system" (beginning with a hyphen) to say that you want to exclude those items. That code is here. Then I found myself needing the opposite.
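The filtering idea can be sketched in Go roughly like this (illustrative names and types, not the actual API code):

```go
package main

import (
	"fmt"
	"strings"
)

// Item is a simplified stand-in for a Sitecore node; the real API has
// many more fields (template, parent, children, etc.).
type Item struct {
	ID   string
	Path string
}

// FilterByPaths returns a map of ID -> Item holding only the items that
// fall under one of the given paths. A path starting with "-" excludes
// its subtree, matching the convention described above.
func FilterByPaths(items []Item, paths []string) map[string]Item {
	var includes, excludes []string
	for _, p := range paths {
		if strings.HasPrefix(p, "-") {
			excludes = append(excludes, strings.TrimPrefix(p, "-"))
		} else {
			includes = append(includes, p)
		}
	}

	result := map[string]Item{}
	for _, it := range items {
		keep := false
		for _, p := range includes {
			if strings.HasPrefix(it.Path, p) {
				keep = true
				break
			}
		}
		for _, p := range excludes {
			if strings.HasPrefix(it.Path, p) {
				keep = false
				break
			}
		}
		if keep {
			result[it.ID] = it
		}
	}
	return result
}

func main() {
	items := []Item{
		{ID: "1", Path: "/sitecore/templates/User Defined/My Template"},
		{ID: "2", Path: "/sitecore/system/Settings"},
	}
	filtered := FilterByPaths(items, []string{"/sitecore", "-/sitecore/system"})
	fmt.Println(len(filtered)) // → 1, the /sitecore/system item is excluded
}
```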

Give me all nodes in these paths; then give me all nodes NOT in these paths. You could write a set operation, an XOR or something like that. But I needed to do it by path. Knowing I had path operations like "-/sitecore" (starting with a hyphen) to exclude items, I quickly said to myself, "why not use the same paths, and where one starts with -, remove it, and if it doesn't start with -, prepend it, and use those paths?" So that's what I did. You can see that code here.
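The trick of reusing the same path filters with the hyphens toggled amounts to something like this (a sketch; the function name is mine, not the repo's):

```go
package main

import (
	"fmt"
	"strings"
)

// InvertPaths flips each path filter: excluded paths ("-" prefix)
// become included, and included paths become excluded.
func InvertPaths(paths []string) []string {
	inverted := make([]string, 0, len(paths))
	for _, p := range paths {
		if strings.HasPrefix(p, "-") {
			inverted = append(inverted, strings.TrimPrefix(p, "-"))
		} else {
			inverted = append(inverted, "-"+p)
		}
	}
	return inverted
}

func main() {
	paths := []string{"/sitecore/content", "-/sitecore/system"}
	fmt.Println(InvertPaths(paths)) // → [-/sitecore/content /sitecore/system]
}
```

Running the inverted paths through the same filter then yields exactly the complementary set of nodes.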

Of course, now I'm thinking the XOR operation might be a better idea! Give me all of the nodes in those paths, then loop through the base tree and add any nodes whose ID is not in the resulting filtered tree... that might be a little better, I think... although since it results in two loops through the entire tree, my original idea may actually be the better one.
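That set-difference version would look roughly like this, assuming the filtered tree is already a map of ID -> Item (hypothetical types, mirroring the filter described above):

```go
package main

import "fmt"

// Item is a simplified stand-in for a Sitecore node.
type Item struct {
	ID   string
	Path string
}

// Complement returns every item from the base tree whose ID is NOT in
// the filtered map. One pass builds filtered, one pass runs here --
// that's the "two loops" trade-off mentioned above.
func Complement(base, filtered map[string]Item) map[string]Item {
	out := map[string]Item{}
	for id, it := range base {
		if _, ok := filtered[id]; !ok {
			out[id] = it
		}
	}
	return out
}

func main() {
	base := map[string]Item{
		"1": {ID: "1", Path: "/sitecore/templates/A"},
		"2": {ID: "2", Path: "/sitecore/system/B"},
	}
	filtered := map[string]Item{"1": base["1"]}
	fmt.Println(len(Complement(base, filtered))) // → 1
}
```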

So you can see how the mind of a 20-year developer works. Also, I'm not afraid to say, "Oh yeah, that can be done much better. I'm going to rewrite it."

An Uncanny Memory

Another thing I've noticed is that I know how pretty much everything I wrote works. For every project, if someone asked me, "Hey, there's a component in this project you worked on 5 years ago, I have to make an update to it, can you tell me how you wrote it?" I could say, "Sure! It works like this," and then spout off all the things it does, and anything to watch out for. Sometimes I'm amazed at how I have retained that information through everything I've worked on in the past 20 years.

I might benchmark those two methods and see which one is faster. Happy Coding!

SQL Server, A Million Updates, Multithreading and Queues

In the past month or two at work, I've had two projects that involved massive updates of data. Both coincidentally meant pulling data from a source, processing it, and updating SQL Server. I've learned a lot.

First, SQL Server does not respond well to multiple threads doing thousands of updates each. I did not know this. I've seen the error message "Transaction (Process ID XX) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction." more times than I'd like to admit. I've run multiple threads doing SQL updates many times before, but I guess never with tens of thousands of rows.

I wrote two apps that process hundreds of thousands of rows of data each. One was written in C#, the other was in Python. I'm not quite as adept in Python but I've learned some tricks.

The approach I took in each language was similar. Both involved creating a shared queue that holds all of the SQL statements that need to run. The SQL statements are just stored procedure calls. One process goes through the queue, batches the statements into 15-30 statement chunks, and executes each batch.

The two solutions, Python and C#, were slightly different though. In Python, the multiple threads would add to the queue, then after all the threads were done processing, the main thread would process the whole queue. The C# solution involved creating a singleton object (per connection string) that held the queue and contained its own thread, which would constantly process the queue. But with just one thread, it wasn't overwhelming SQL Server in any way. Here's a little bit of code. In each language, I used the built-in Queue provided by the standard library, although in C# I used the ConcurrentQueue.

C# pseudo code



        multiple threads collecting data
        {
              call service
              add all data to be updated to the service
        }

        service data collection
        {
              get sql for object (orm)
              add to the shared queue
        }

        sql execution thread - run
        {
              while (true)
              {
                    open connection
                    while queue has stuff
                    {
                          create a string builder, add 15 or so commands to the batch, separated by ;
                          build a database command and execute it
                    }
                    close connection
                    sleep for a second
              }
        }

Python pseudo code


        multiple threads collecting data
        {
             download data (in this case, just downloading CSVs)
             process data
             get sql statement
             add sql statement to the shared queue
        }

        main thread
        {
             collect all data (fill the queue) across multiple threads
             process the queue, calling each batch of 35 in this case, in a single thread
        }

So as you can see, the C# version processes the queue as the data is being collected, while Python waits until the end and then processes the queue. I think the C# approach is better; as I said, I'm a bit more adept with C#, so I'm more comfortable doing things like that there. Hopefully this helps someone out there with processing loads of data.

Of course, in Go I would have just used a channel! Just kidding, there'd be the same amount of complexity in Go, but the end result would definitely be a lot better looking!
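For what it's worth, the channel version really is pleasant. Here's a toy sketch of the C# design in Go, with multiple producers and one consumer batching statements (the "execute" step is a stand-in, not a real database call):

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// batchWorker drains the channel, grouping statements into batches of
// batchSize and "executing" each batch (here, just joining it with ";" --
// real code would run it against the database). When the channel closes,
// it flushes the remainder and reports the total count on done.
func batchWorker(stmts <-chan string, batchSize int, done chan<- int) {
	batch := make([]string, 0, batchSize)
	executed := 0
	flush := func() {
		if len(batch) == 0 {
			return
		}
		_ = strings.Join(batch, ";") // stand-in for db.Exec(...)
		executed += len(batch)
		batch = batch[:0]
	}
	for s := range stmts {
		batch = append(batch, s)
		if len(batch) == batchSize {
			flush()
		}
	}
	flush() // run whatever is left
	done <- executed
}

func main() {
	stmts := make(chan string, 100)
	done := make(chan int)
	go batchWorker(stmts, 15, done)

	// multiple producer goroutines, like the data-collection threads
	var wg sync.WaitGroup
	for t := 0; t < 4; t++ {
		wg.Add(1)
		go func(t int) {
			defer wg.Done()
			for i := 0; i < 10; i++ {
				stmts <- fmt.Sprintf("exec UpdateThing %d,%d", t, i)
			}
		}(t)
	}
	wg.Wait()
	close(stmts) // producers finished; worker flushes and exits
	fmt.Println(<-done) // → 40
}
```

One consumer goroutine means SQL Server only ever sees one connection doing batched work, which is the same deadlock-avoidance property as the C# singleton.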

Happy Coding!

CRUDGEON

I thought that was a word when I first wrote it. But I was thinking of other words, like curmudgeon or something. Anyway...

CRUDGEON

A set of big projects came along at work that consisted of some of the same pieces, at a high level: get data from a service, store it in a database, and generate HTML based on the data. I guess it doesn't matter that it's HTML. Generate formatted output based on the data. That's better.

The services are not consistent in their details. One was a WSDL web service, one was a JSON service, and two were just schemaless XML. This part was pretty annoying. Schemaless XML and JSON need to go away. We are in 2018; the dynamic typing experiment is over ;)  (that's sure to ruffle some feathers).

Looking over the data coming back, 2 responses returned types with 130+ fields. Each would have to be represented in a SQL table, in stored procedures, and in a C# class and interface. Looking over 130+ fields, I immediately thought there's no way I'm typing all of that by hand.

A really lazy person (like me) would probably use a text editor with regexp find/replace: copy the list of fields in and run a find/replace to format it however I needed it at that given moment. Like as a list of parameters to a stored procedure, or a list of parameters to a constructor, or properties on a POCO (plain old C# object). I am definitely lazy, but I'm also too lazy to do that each time.

A championship-caliber lazy person (like me) would probably write CRUDGEON. I also don't know why I keep writing it in all caps. "crudgeon" is equally acceptable for those whose Caps Lock and Shift keys are just too far away.

So what is it?

Basically, you give it a fake C# list of properties, and it'll generate whatever you need it to. Right now it'll generate:

  1. Database Table
  2. Stored procedures for Get, Update, and Delete
  3. C# objects with appropriate attributes for pulling data from services, like XML attributes, JSON "DataMember" and "DataContract" attributes, and so on.
  4. A convenience "map" script which copies from a service object to your POCO, for the case of WSDL objects where you don't want the WSDL-generated type to be available to, and hence depended on by, anything except what you control (I always do this, by the way... never expose WSDL types; they should only be internal. But I digress).
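As a rough illustration of the generation idea (this is not crudgeon's actual input format or templates, see its README for those), the table-generation piece boils down to something like:

```go
package main

import (
	"fmt"
	"strings"
)

// Field is a hypothetical parsed property: a name plus the SQL type it
// maps to.
type Field struct {
	Name    string
	SQLType string
}

// GenerateTable emits a create-table script from a field list. The real
// tool drives tables, stored procedures, and attributed C# classes all
// from one property list.
func GenerateTable(table string, fields []Field) string {
	var b strings.Builder
	fmt.Fprintf(&b, "create table %s (\n", table)
	for i, f := range fields {
		sep := ","
		if i == len(fields)-1 {
			sep = ""
		}
		fmt.Fprintf(&b, "    %s %s%s\n", f.Name, f.SQLType, sep)
	}
	b.WriteString(")\n")
	return b.String()
}

func main() {
	fields := []Field{
		{Name: "Id", SQLType: "uniqueidentifier"},
		{Name: "Name", SQLType: "nvarchar(100)"},
	}
	fmt.Print(GenerateTable("Person", fields))
}
```

The win is that a 130+ field type costs the same keystrokes as a 5-field one: you only ever maintain the field list.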

The README.md has a lot of info on using crudgeon. It also has example inputs and outputs within the git repository itself. I wrote it with these specific projects in mind, and the generated C# code references things I wrote specifically for them, but if I come across any other project that needs to ferry data from a service and store it locally, I will definitely be breaking open the ol' VS Code again. I wrote sqlrun in conjunction with crudgeon because I needed a quick way to run all the SQL files it was generating. I've used it hundreds of times in the week since. After testing, I'd find that I needed a new attribute, or a column needed to allow null, or something. And I'd regenerate all of the code, point sqlrun at the SQL that was generated, and begin importing the code again, all within like 10 seconds.

Maybe you'll find some use for it. I know I definitely will. Like I said, it was written with this set of projects in mind, but with little modification, maybe it can be used more broadly. Or maybe with no modification! I'll know later when I find an excuse to use it again :)

Happy Coding!

Grunt Work Principle

One word that I've used to describe my work style, but not really my programming style, is "lazy". It's a word that usually describes behavior considered detrimental. Non-flattering.

However, in the programming world, it is a very good trait if paired with other proper attributes. Being a bad programmer and lazy isn't good. But being a decent programmer with a good work ethic and lazy is actually pretty good!

Lazy definitely doesn't describe my work ethic. Lazy describes how I am when confronted with grunt work. I have been trying to describe it with a principle or some other short definition. Out of pure laziness and lack of creativity in general, I'll call it the "Grunt Work Principle". You can feel free to put my name in front of that. I couldn't be bothered to.

Grunt Work Principle

If the amount of grunt work presented exceeds 1 minute, and the grunt work can be automated, no matter how long the process to automate, it will be automated.

In practice, this takes forms ranging from my last post on Sitecore Doc to something simple like taking 100 lines of short text and compacting them to 100 characters per line. For the compacttext project in particular, each individual occurrence probably didn't exceed 1 minute, but add up all the times I've had to do it and it's easily in the 3-4 minute range :)  That project took me maybe 1 hour total to create, but it can be used indefinitely.
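The compacting itself is just a word wrap. A sketch of the idea (illustrative, not the actual compacttext code):

```go
package main

import (
	"fmt"
	"strings"
)

// CompactLines joins short lines and re-wraps the words so each output
// line is at most width characters, which is the whole job described
// above. Same input, same output, every time.
func CompactLines(lines []string, width int) []string {
	var words []string
	for _, l := range lines {
		words = append(words, strings.Fields(l)...)
	}
	var out []string
	cur := ""
	for _, w := range words {
		switch {
		case cur == "":
			cur = w
		case len(cur)+1+len(w) <= width:
			cur += " " + w
		default:
			out = append(out, cur)
			cur = w
		}
	}
	if cur != "" {
		out = append(out, cur)
	}
	return out
}

func main() {
	lines := []string{"a few", "short lines", "of text"}
	fmt.Println(CompactLines(lines, 12))
}
```

Being deterministic is the point: unlike a human eyeballing line lengths, the same input and width always produce the same output.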

There is no upper limit on the amount of time automating may take. There is only the restriction of whether it can be automated at all, for instance when it requires human interaction or judgment that a computer alone can't easily make. Even then I'd probably find a way to automate as much as I can. For instance, with the Sitecore Doc project, I could automate getting items and renderings from Sitecore and generating output, but at the time (I plan to integrate source-d into my workflow) I could not easily figure out a way to map method calls to renderings. So that part I had to do manually, which was a few hours' worth of grunt work. Oh believe me, tracking calls to methods is grunt work when all you are doing is producing documentation!!

Benefits of Automation

Future re-use: The top reason to always automate the task at hand is future re-use. You may initially automate very specifically for the task at hand, but later find a way to re-use it with small modifications. Or with a complete rewrite. Or completely as is. This is all fine.

Consistency: Automating produces consistent results. In my compacttext example, the output is predictable: if I specify a line length of 100, the same input will produce the same output 100% of the time. If a human were doing it, there's no guarantee, as eyeballing the line length can be skewed by things like screen size, font size, caffeine consumed, etc.

It is usually too soon to optimize, but wtf are you waiting for with automation?!  Get to it!!

Sitecore Doc

Sitecore solutions can become large and unwieldy. Recently I was tasked with the following: find out on which pages each of these service methods (40-50) is called. With how .NET and Sitecore applications (generally, all good applications) are written, a service method call is written within a component, but that component could be placed on any page!

Luckily these components manifest themselves in Sitecore as "renderings". They can have data sources and parameters. And code, which is what leads us to this mess in the first place ;)

First, we'd need a way to map these renderings to service calls. I came up with a generic "info" data field in a JSON file which defines all of the renderings we're interested in. On a side note, I only include the renderings we're interested in; for this project, including everything would yield a 4-5 MB result file, which would be ridiculous. That JSON looks like this:


{
    "includeUndefined": false,
    "renderings": [
        {
            "id": "6A8BB729-E186-45E7-A72E-E752FDEC2F48",
            "name": "AccountBalancePurgeSublayout",
            "info": [
                "CustomerInformationService.GetCardStatusV4(authHeader, accountId)",
                "CustomerInformationService.GetPurgeInfo(authHeader, accountID)"
            ]
        }
    ]
}

Using my (recently updated to accommodate this request) Go Sitecore API, I was able to take that information, map it against every page's renderings (or standard values renderings), and produce a file listing every page and its (eventual) calls into service methods. These aren't usually called directly within the page code, and there's heavy caching going on as well. Here's what the output looks like:


    Name:     booking-calendar
    ID:       f1837270-6aca-4115-94bc-08d1a4ed43ad
    Path:     /sitecore/content/REDACTED/booking-calendar
    Url:      https://www.REDACTED.com/booking-calendar
    Renderings:
            Default
                CalendarBooking   
                    Path:         /layouts/REDACTED2013/REDACTED/SubLayouts/Booking/Calendar/CalendarBooking.ascx
                    Placeholder:  content 
                    Info:
                                  ReservationService.GetAllRoomTypesV2()
                                  ReservationService.GetCashCalendarV3(GetAuthHeader(),promoCode,startDate,endDate,isHearingAccess,isMobilityAccess, isWeb)
                                  ReservationService.GetCashCalendarWithArrivalV3(GetAuthHeader(), promoCode, roomType, arrivalDt, numNights, isWeb)
            Mobile
                CalendarBookingMobile   
                    Path:         /layouts/REDACTED2013/REDACTEDMobile/SubLayouts/Booking/Calendar/CalendarBookingMobile.ascx
                    Placeholder:  content 
                    Info:
                                  ReservationService.GetAllRoomTypesV2()
                                  ReservationService.GetCashCalendarV3(GetAuthHeader(),promoCode,startDate,endDate,isHearingAccess,isMobilityAccess, isWeb)
                                  ReservationService.GetCashCalendarWithArrivalV3(GetAuthHeader(), promoCode, roomType, arrivalDt, numNights, isWeb)

This was very useful for this specific task, however it's written in a way that will be very useful going forward, to provide insights into our Sitecore implementations and how the content is structured.

This app will see updates (sorry, the code isn't available for now) so that it will show usages across different renderings, plus unused or broken ones (a rendering that exists in a renderings field but not as an actual item in Sitecore, because it was deleted or never imported), and other stuff as I think of it. This binary is named "scdoc" since I like to keep my names short :)  The Sitecore code generation tool I wrote is simply "scgen".

Check out that Go Sitecore API though if you want to easily query your Sitecore database!  Happy Coding :)

Goals for the Summer

Goals for the summer... I've written a few of these in my life, it'd be nice to not have to do them again.

#1 - Code generator that generates code generators.
#2 - ORM which, based on inputs, will map the appropriate choice of ORM to my current needs. An ORM Mapper.

https://twitter.com/jasontconnell/status/989965266141569025

More null than null

// no value which is more null than null in this case.

A comment in my code just now. I highly enjoy implying degrees of things which have no degrees. Binaries. It is now more perfect code with this comment in there.

https://twitter.com/jasontconnell/status/989926649465638912

Statistically Hilarious

"Trying to determine a good base average for what I spend every month. But every month is an outlier"

Statistically speaking, that joke is hilarious.

https://twitter.com/jasontconnell/status/989237488651767808

My nail maintenance by necessity

my nails would not be considered "long" by any stretch, but once they get to a certain point, like millimeters, i basically can't type anymore

https://twitter.com/jasontconnell/status/988480972722200576

Royal Family News

Is there a way to turn off Royal family news? I managed to go years without knowing anything and I'd like to keep it that way

https://twitter.com/jasontconnell/status/988394646823895040