Advent of Code 2017 - Day 13

Day 13 reveals itself as a sort of lock picking exercise. Part one is a simple tumble (get it) through the series of inputs they give you, to figure out if you would be caught on a certain layer, and if so, do some multiplication to get your answer. Simple.

The puzzle could also be thought of as the scene in The Rock (the movie about Alcatraz with Nicolas Cage and Sean Connery) where, to get into the prison, Sean Connery has memorized the timing of the fires that blaze, and rolls through them unscathed.

Except, the timings way down the line don't match up since they themselves are on their own timers. And there's like 40+ of them.

The sample input looks like this:

0: 3
1: 2
4: 4
6: 4

So, layer 0 has a range of 3. On "second 0" the scanner is at position 0, on second 1 it's at 1, on second 2 it's at 2, on second 3 it goes back to 1, and on second 4 it's back to the start, blocking any would-be attackers.

Layer 1 only has a range of 2, so it's fluctuating back and forth between 0 and 1.

Since the puzzle input may include gaps, and it's easier (probably) to complete the puzzle with no gaps, the first step is to fill them in! As usual, I'm writing my answers in Go.

func fill(scanners []*Scanner) []*Scanner {
	max := scanners[len(scanners)-1].Depth
	for i := 0; i < max; i++ {
		if scanners[i].Depth != i {
			s := &Scanner{Depth: i, Range: 0, Nil: true}
			scanners = append(scanners[:i], append([]*Scanner{s}, scanners[i:]...)...)
		}
	}
	return scanners
}

That line, append(scanners[:i], append([]*Scanner{s}, scanners[i:]...)...), with all the dots!! What it's doing is pretty simple, though.

If we don't have a scanner at depth i, insert a scanner with depth i at the current position. "scanners[:i]" is all scanners up to (but not including) i, and "scanners[i:]" is all scanners from i onward (the colon syntax is subtle). So we want to insert the new scanner between those two. That's all it's doing. The ellipsis confusion is just because append takes a variadic list of parameters, and you can expand a slice into variadic parameters with the ellipsis. Done!
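If the idiom is new, the same insert trick with a plain int slice might make it clearer:

	// Insert 3 at index 2. The inner append builds a fresh []int{3, 4, 5} first,
	// so overwriting the tail of nums in the outer append doesn't lose anything.
	nums := []int{1, 2, 4, 5}
	nums = append(nums[:2], append([]int{3}, nums[2:]...)...)
	// nums is now [1 2 3 4 5]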

We'll need a method to move all of the scanners every step. That's pretty straightforward. I called this method "tick". The Scanner is just a struct with Depth, Range, Current, and Dir for telling which direction the thing is moving.

func tick(scanners []*Scanner) {
	for _, s := range scanners {
		if s.Nil {
			continue
		}

		if s.Current == s.Range-1 {
			s.Dir = -1
		} else if s.Current == 0 {
			s.Dir = 1
		}

		s.Current += s.Dir
	}
}

Part 1 was to just send a "packet" (or Sean Connery) through, and every time you are caught by a security scanner (explosion, going back to the movie), multiply the depth times the range at that scanner and add it to a running total. That part was fine, and you could do it by simulating the physical motion of passing the Sean Connery through :)
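Roughly, part 1 looks like this, using the Scanner, fill and tick pieces above (a sketch, not the exact code from the repo):

	// one layer per tick; add depth * range whenever we land on a scanner
	// that's sitting at the top of its range
	severity := 0
	for depth := 0; depth < len(scanners); depth++ {
		s := scanners[depth]
		if !s.Nil && s.Current == 0 {
			severity += s.Depth * s.Range
		}
		tick(scanners)
	}
	fmt.Println(severity)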

So, on to part 2, which is "you need to get through without getting caught this time". Getting caught simply means being at depth "d" when the scanner at depth "d" is at position 0. You could brute force this.

For brute force, you'd start at delay 0, then try to figure out if you can make it all the way through. If not, move to delay 1 and try again. For each delay, you have to run the tick method. For a delay of 100 seconds, tick would have to be run 100 times just to get the puzzle into the correct starting state. So this becomes computationally intense!

This is a fine solution in most instances. In this instance, though, I let it run over lunch and checked in with it 44 minutes later, and it wasn't complete yet! So, back to the drawing board.

But wait!!  Math is a thing. And it's very useful!  I'm actually pretty certain that I don't even need to check the final answer by actually traversing the sequence of layers, it's just the answer. Due to math!

So, to get through safely, the scanner at a particular depth has to not be at position 0 when we're passing through that depth. I wrote a method to figure this out, called "possible". It's pretty much the key to the puzzle, and to solving it insanely fast.

func possible(scanners []*Scanner, delay int) bool {
	p := true
	for _, s := range scanners {
		blocking := (s.Range*2 - 2)
		p = p && (s.Nil || ((delay+s.Depth)%blocking != 0))
	}
	return p
}

A "Nil" scanner is a scanner that was filled in due to gaps. This one has 0 range and can't block anything. So if it's one of these, it can pass through this depth at this time.

The (s.Range * 2) - 2. Call this "blocking" or whatever. I called it blocking since the scanner is back at position 0 of its range every "blocking" steps. A scanner with range 4 is back at 0 every 6 steps (4 * 2 - 2). To determine whether a delay is possible, a layer 7 steps in cannot be blocking at time delay + 7, otherwise we get caught: hence (delay + depth) % blocking. (After the delay, the scanner at depth "depth" must not be at the start, which happens when (delay + depth) mod blocking == 0.) "p" accumulates, layer by layer, whether we can pass through everything up to and including the current one. You could shave off some iterations here by checking p and breaking out of the loop if it's false, as in the sketch below. I might actually update it to do that and report back! It takes about 1 second to run currently. (CONFIRMED: it runs in about half a second now.)
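For reference, the early-exit version would look something like this (a sketch of the change, not necessarily what ended up in the repo):

func possible(scanners []*Scanner, delay int) bool {
	for _, s := range scanners {
		if s.Nil {
			continue // gap-filler scanner, can't catch anything
		}
		blocking := s.Range*2 - 2
		if (delay+s.Depth)%blocking == 0 {
			return false // caught at this layer, no need to check the rest
		}
	}
	return true
}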

So, all that's left is to brute force the delays, checking whether it's possible to get through the sequence at each delay without getting caught, but you never have to actually simulate the traversal, which speeds it up somewhere on the order of a million percent :)
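The outer loop then just walks delays until one works (a sketch; the full solution is linked below):

	delay := 0
	for !possible(scanners, delay) {
		delay++
	}
	fmt.Println(delay)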

Check out the full solution here - https://github.com/jasontconnell/advent/blob/master/2017/13.go

Happy coding, and happy holidays!!

Some Plays on Words and Phrases

Sometimes hilarious things pop in my head. Actually, I'd wager that sometimes that doesn't happen. Implying it happens most of the time. Here are some new ones (for me) for you potential comedy writers out there. I don't need any public credit :P  Maybe just a comment or email.

"Do I look like somebody who " ... You know this. "Do I look like someone who checks the toilet seat before they sit down?"  Or "Who cares what color their shirt is?"  etc. Here are a few fun ones I came up with.

"Do I look like somebody who looks like somebody?"   Yeah, pretty bad.

"Do I look like somebody who asks to be compared to anybody by their looks?"  A little better.

"Do I look like somebody who looks at things?"  Might be said by a blind person, so after my original inspiration for this, the ingenuity has been lost and it has been discounted back to lacklustre.

That's all I got though. Any more would be forcing it. You might argue the first 3 are as well ;)

Go Dep

As of this moment I've updated all of my Go code to use "dep". It's useful, and it works. It's slow but that'll be fixed. Phew! Slight pain updating all of my repos to it. But man, it'll make life easy going forward.

First of all, I had to get all of my dependencies, code libraries that I've written to be shared, into my public github account. After that was complete, I had to update all of the imports. For instance, instead of importing "conf", which was a folder directly inside my Go path, the import is now the full github path. Which makes things interesting, for a few reasons.

  1. I had to create a repo (in my private server) for each dependency.
  2. If I didn't clone that repo on another machine, the build would fail.
  3. If I forgot to commit the dependency repo but committed the repo that was depending on it, the build would fail on my other computer.

These are problems. Here are a few solutions...

  1. For dependency repos, I may only have them in github going forward, until it's easy to go get private server repositories. All of my repos are literally on the same server this site is running on.
  2. Doing dep ensure will get all of my dependencies, at the version it was locked at.
  3. Using dep and the dep paths (github.com/jasontconnell/____) will ensure that the project will only build if it's in github.

You'll see all of my dependency repos within github. There are a few full-fledged projects out there as well (including scgen :). It's just a matter of updating the code to use the github url instead of the local file url, running dep init, and waiting :)

One tricky thing is I have a bash script to automatically deploy websites on that server. It looked like this:

#!/bin/bash

cd src/$1
git fetch

deploybranch=`git branch -a | grep deploy`

if [ -z "$deploybranch" ]; then
   echo no branch named deploy. exiting.
   exit
fi

git checkout deploy
git pull
cd ..

echo building $1

go build -o bin/$1 $1

outdir=/var/www/$1

echo $outdir

PID=`pgrep $1`

echo found pid $PID
if [ -n "$PID" ]; then
    echo killing process $PID
    sudo kill  $PID
fi

sudo cp bin/$1 $outdir/

if [ -d "$PWD/$1/content" ]; then
    echo copying content
    sudo cp -R $PWD/$1/content/ $outdir/content
fi

if [ -d "$PWD/$1/site" ]; then
   echo copying site
   sudo cp -R $PWD/$1/site/ $outdir/site
fi

cd $outdir

sudo nohup ./$1 > $1.out 2> $1.err < /dev/null & > /dev/null

echo $1 started with pid $!

exit

I'm very noobish when it comes to shell scripting. Anyway, this will check out a deploy branch if it exists, pull the latest, run go build, kill the current process, copy the binary and site contents over to the website folder, and then start the process again. Simple build and deploy script.

It is named "deploy.sh". It lives in /home/jason/go and is run like this: "./deploy.sh jtccom". It finds the folder "jtccom" inside of src and does all of the operations there. However, since I'm now using "dep", and none of the dependency files exist within the "vendor" folder (you really shouldn't commit that... dep creates reproducible builds), I had to modify it to run dep ensure first. This has to happen after the pull. I've included the entire contents of the new deploy.sh here.

#!/bin/bash

cd src/$1
git fetch

deploybranch=`git branch -a | grep deploy`

if [ -z "$deploybranch" ]; then
   echo no branch named deploy. exiting.
   exit
fi

git checkout deploy
git pull

if [ -f Gopkg.toml ]; then
   echo Running dep ensure
   dep=`which dep`
   $dep ensure
fi

cd ..

echo building $1

GOCMD=`which go`
$GOCMD build -o bin/$1 $1

outdir=/var/www/$1

echo $outdir

PID=`pgrep $1`

echo found pid $PID
if [ -n "$PID" ]; then
    echo killing process $PID
    sudo kill  $PID
fi

sudo cp bin/$1 $outdir/

if [ -d "$PWD/$1/content" ]; then
    echo copying content
    sudo cp -R $PWD/$1/content/ $outdir/content
fi

if [ -d "$PWD/$1/site" ]; then
   echo copying site
   sudo cp -R $PWD/$1/site/ $outdir/site
fi

cd $outdir

sudo nohup ./$1 > $1.out 2> $1.err < /dev/null & > /dev/null

echo $1 started with pid $!

exit

I've updated how it calls go and dep, since calling just "go" didn't work anymore for some reason (command not found). Anyway, here's the output.

[jason@Setzer go]$ ./deploy.sh jtccom
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ssh://myserver/~git/jtccom
   03999eb..7c49dc6  deploy     -> origin/deploy
   03999eb..7c49dc6  develop    -> origin/develop
Already on 'deploy'
Your branch is behind 'origin/deploy' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
Updating 03999eb..7c49dc6
Fast-forward
 Gopkg.lock | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Running dep ensure
building jtccom
/var/www/jtccom
found pid 29767
killing process 29767
copying content
jtccom started with pid 30260

That Gopkg.lock was updated because I had to update a dependency to also use the github version of the dependency it was importing, since I deleted all of the dependencies on this server. So that was it. It's very easy to use and will make my life a lot easier, albeit a little bit more tedious. BUT! I really can't complain, because the old way of doing things was painful: forget to commit a dependency and my code doesn't build on my work computer, so I have to wait until I get home :P  Plus everyone can look at the little dumb Go code I use across multiple projects!  Enjoy.

Go and Sitecore Interlude

This is part of the same program I'm developing to generate, serialize and deserialize items, but it's a general helper method that I found very useful, and can be used in any Go program. It can be expanded to be more complete, I pretty much did it for things that I am currently working with. You'll see what I mean.

The application is written as one application that does it all (generation, serialization, deserialization). So, it's looking for configuration settings for all aspects of those pieces of functionality. You generally don't want to serialize the same Sitecore paths that you want to generate code against. However, having the configuration in one file is not what I wanted. Here are the drawbacks.

If the configuration is in one file, you would have to update your configuration file in consecutive runs if you wanted to serialize, git fetch and merge, then deserialize. Your configuration would be committed and would be set for the next person that wants to run the program. You couldn't write bat files to run updates.

You could use the flag package to control the program pieces. Of course. But I set out to have multiple configs. For instance, if you wanted to serialize from a shared database then serialize to your local database. You could also make the config file a flag and set it to different huge files that each only differ by the connection string.

You could.

But then I wouldn't have this cool piece of code :)

Basically, when you run the program, you call it with a "-c" flag which takes a csv list of config files. The program reads them in order and merges them, having configuration values later in the chain overwrite values in the previous versions. I do this using Go's reflect package. As follows:

func Join(destination interface{}, source interface{}) interface{} {
	if source == destination {
		return destination
	}

	td := reflect.TypeOf(destination)
	ts := reflect.TypeOf(source)

	if td != ts || td.Kind() != reflect.Ptr {
		panic("Can't join different types OR non pointers")
	}

	tdValue := reflect.ValueOf(destination)
	tsValue := reflect.ValueOf(source)

	for i := 0; i < td.Elem().NumField(); i++ {
		fSource := tsValue.Elem().Field(i)
		fDest := tdValue.Elem().Field(i)

		if fDest.CanSet() {
			switch fSource.Kind() {
			case reflect.Int:
				if fDest.Int() == 0 {
					fDest.SetInt(fSource.Int())
				}
			case reflect.Bool:
				if fDest.Bool() == false {
					fDest.SetBool(fSource.Bool())
				}
			case reflect.String:
				if fDest.String() == "" && fSource.String() != "" {
					fDest.SetString(fSource.String())
				}
			case reflect.Slice:
				fDest.Set(reflect.AppendSlice(fDest, fSource))
			case reflect.Map:
				if fDest.IsNil() {
					fDest.Set(reflect.MakeMap(fDest.Type()))
				}
				for _, key := range fSource.MapKeys() {
					fDest.SetMapIndex(key, fSource.MapIndex(key))
				}
			default:
				fmt.Println(fSource.Kind())
			}
		} else {
			// print the field name; calling Field directly on the pointer value would panic
			fmt.Println("Can't set", td.Elem().Field(i).Name)
		}
	}

	return destination
}

So, you can see what I mean when I said it can be expanded. I'm only doing strings, bools, ints, slices and maps. The slice handling is different in that it adds values to the current slice. Map handling will add entries or overwrite if the key exists. Strings will only overwrite if the existing string is blank and the source isn't blank. So that's probably different from how I described the code in the beginning :)
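To tie it together, reading and merging the chain of config files might look something like this (a sketch with assumed names like loadConfigs and a Config struct, using encoding/json and io/ioutil; it isn't the actual scgen code):

func loadConfigs(paths []string) *Config {
	merged := &Config{}
	for _, p := range paths {
		c := &Config{}
		b, err := ioutil.ReadFile(p)
		if err != nil {
			panic(err)
		}
		if err := json.Unmarshal(b, c); err != nil {
			panic(err)
		}
		// with Join as written, scalar values already set by earlier files win,
		// while slices and maps accumulate across the chain
		Join(merged, c)
	}
	return merged
}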

Go is very useful. There's like, nothing you can't do :)

So the program is called like this:

scgen -c scgen.json,project.json,serialize.json

scgen.json has the template IDs for "template" and "template field", stuff that's pretty OK to hard code. If Sitecore were to change those template IDs, I'm fairly positive a lot of existing code out there would break.

project.json has the connection string, the field type map, serialization path (since it's used for serialization and deserialization), and base paths for serialization.

serialize.json, in this instance, only has { "serialize" : true }  as its entire contents. Files like "generate.json" have "generate": true  as well as the file mode, output path, the Go text template, and template paths to generate.

So these files can be combined to build up an entire configuration. The bools like "serialize" and "generate" are used to control program execution. The settings can live in separate files, and different files can be used depending on the environment, like a continuous integration server, or in a project pre-build step. I foresee this being used with bat files: create a "generate.bat" file which calls the program with generate.json in the config paths, and so on for each program mode. Or a bat file to serialize, git commit, git pull, and deserialize. Enjoy!

Go and Sitecore, Part 3

In parts 1 and 2, so far, we've covered code generation with Go against a Sitecore template tree, and serializing items from the database to disk. Part 3 takes that serialized form and updates the database with items and fields that are missing or different, then clears out any items or fields that were orphaned in the process.

It probably doesn't clear orphaned fields completely correctly, as I only clear fields where the item doesn't exist anymore. It won't clear fields that no longer belong to the new template if an item's template changed. That'll probably be an easy change though, as it could probably be done with a single (albeit advanced) query.

Deserializing involves the following steps.

  1. Load all items (already done at the beginning every time the program runs)
  2. Load all field values. This happens if you are serializing or deserializing.
  3. Read the contents from disk, mapping serialized items with items in the database.
  4. Compare items and fields.
    1. If an item exists on disk but not in the database, it needs an insert
    2. If an item exists on the database but not on disk, it needs a delete (and all fields and children and children's fields, all the way down its lineage)
    3. #2 works if an item was moved because delete happens after moves.
    4. Do the same thing for fields... update if it changed, delete if in the db but not on disk, insert if on disk but not in the db.
  5. This can, in some cases, cause thousands of inserts or updates, so we'll do batch updates concurrently.

Deserialization code just involves 2 regular expressions, and filepath.Walk to get all serialized files. Read the files, build the list, map them to items where applicable, decide whether to insert / update / delete / ignore, and pass the whole list of updates to the data access layer to run the updates.

I love the path and filepath packages. Here's my filepath.Walk method.

func getItemsForDeserialization(cfg conf.Configuration) []data.DeserializedItem {
	list := []data.DeserializedItem{}
	filepath.Walk(cfg.SerializationPath, func(path string, info os.FileInfo, err error) error {
		if strings.HasSuffix(path, "."+cfg.SerializationExtension) {
			bytes, _ := ioutil.ReadFile(path)
			contents := string(bytes)
			if itemmatches := itemregex.FindAllStringSubmatch(contents, -1); len(itemmatches) == 1 {
				m := itemmatches[0]
				id := m[1]
				name := m[2]
				template := m[3]
				parent := m[4]
				master := m[5]

				item := data.DeserializedItem{ID: id, TemplateID: template, ParentID: parent, Name: name, MasterID: master, Fields: []data.DeserializedField{}}

				if fieldmatches := fieldregex.FindAllStringSubmatch(contents, -1); len(fieldmatches) > 0 {
					for _, m := range fieldmatches {
						id := m[1]
						name := m[2]
						version, _ := strconv.ParseInt(m[3], 10, 64)
						language := m[4]
						source := m[5]
						value := m[6]

						item.Fields = append(item.Fields, data.DeserializedField{ID: id, Name: name, Version: version, Language: language, Source: source, Value: value})
					}
				}
				list = append(list, item)
			}
		}

		return nil
	})

	return list
}
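The itemregex and fieldregex variables aren't shown here. Based on the serialized file format from Part 2, they might look something like this (an assumption for illustration, not the patterns from the repo):

var itemregex = regexp.MustCompile(
	`ID: (\S+)\r?\nName: ([^\r\n]*)\r?\nTemplateID: (\S+)\r?\nParentID: (\S+)\r?\nMasterID: (\S+)`)

var fieldregex = regexp.MustCompile(
	`(?s)__FIELD__\r?\nID: (\S+)\r?\nName: ([^\r\n]*)\r?\nVersion: (\d+)\r?\nLanguage: (\S+)\r?\nSource: (\S+)\r?\n__VALUESTART__\r?\n(.*?)\r?\n___VALUEEND___`)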

I did a quick and crude "kick off a bunch of update processes to cut the time down" method.

func update(cfg conf.Configuration, items []data.UpdateItem, fields []data.UpdateField) int64 {
	var updated int64 = 0
	var wg sync.WaitGroup
	wg.Add(6)
	itemGroupSize := len(items)/2 + 1
	fieldGroupSize := len(fields)/4 + 1

	// items - 2 processes
	for i := 0; i < 2; i++ {
		// clamp the bounds so the last group doesn't run past the end of the slice
		start, end := i*itemGroupSize, (i+1)*itemGroupSize
		if start > len(items) {
			start = len(items)
		}
		if end > len(items) {
			end = len(items)
		}
		grp := items[start:end]
		go func() {
			// atomic add because multiple goroutines update the total (sync/atomic)
			atomic.AddInt64(&updated, updateItems(cfg, grp))
			wg.Done()
		}()
	}

	// fields - 4 processes
	for i := 0; i < 4; i++ {
		start, end := i*fieldGroupSize, (i+1)*fieldGroupSize
		if start > len(fields) {
			start = len(fields)
		}
		if end > len(fields) {
			end = len(fields)
		}
		grp := fields[start:end]
		go func() {
			atomic.AddInt64(&updated, updateFields(cfg, grp))
			wg.Done()
		}()
	}

	wg.Wait()

	return updated
}

Very unclever. Take all of the update items and fields, break them into a set number of chunks, and kick off six goroutines, allocating twice as many for fields as for items. Each call to the respective update methods opens its own connection to SQL Server. This could be done much better, but it accomplishes what I set out to accomplish: utilize Go's goroutines and, where something can be done concurrently, do it concurrently to cut down the time required. This is the only part of the program that uses Go's concurrency constructs.

That's it for part 3!  Part 4 will come more quickly than part 3 did. I had some things going on, a year anniversary with my girlfriend, lots of stuff :)

Series:
Part 1 - Generation
Part 2 - Serialization
Part 3 - Deserialization

Go and Sitecore, Part 2

In part 1, I covered how I'm now generating code from Sitecore templates, to a limited degree. I won't share the whole process and the whole program until the end of the series, just the touch points until then.

For part 2, we'll cover Sitecore serialization. As for terminology, I'm not sure what TDS or other similar tools call these operations, but I will refer to them as serialization (writing Sitecore contents to disk) and deserialization (reading Sitecore contents from disk and writing them to the database).

For Sitecore serialization, I would say step 1 is to decide which fields you DON'T want to bring over. In the past, I've had loads of issues with serializing things like Workflow state. And locks. So my approach is to ignore the existence of certain fields. Essentially, find out all of the fields on "Standard template", decide which ones are essential or useful, and remove those from a global "ignored fields" list. Then get your data.

For the data, we use the same tree of items from part 1. When we build the tree, we get a root node and an item map (map[string]*data.Item). For serialization we need the item map. The root is only useful for building paths; after that we could most likely toss it. With the item map in hand, and a list of ignored fields, we can get the data.


        with FieldValues (ValueID, ItemID, FieldID, Value, Version, Language, Source)
        as
        (
            select
                ID, ItemId, FieldId, Value, 1, 'en', 'SharedFields'
            from SharedFields
            union
            select
                ID, ItemId, FieldId, Value, Version, Language, 'VersionedFields'
            from VersionedFields
            union
            select
                ID, ItemId, FieldId, Value, 1, Language, 'UnversionedFields'
            from UnversionedFields
        )

        select cast(fv.ValueID as varchar(100)) as ValueID, cast(fv.ItemID as varchar(100)) as ItemID, f.Name as FieldName, cast(fv.FieldID as varchar(100)) as FieldID, fv.Value, fv.Version, fv.Language, fv.Source
                from
                    FieldValues fv
                        join Items f
                            on fv.FieldID = f.ID
                where
                    f.Name not in (%[1]v)
                order by f.Name;
    

With SQL Server, we're able to use common table expressions (CTEs), which makes this a single query that's pretty easy to read. We're getting all field values except for those ignored. We get the version and language no matter what, and we get the source, i.e. which table the value comes from. ValueID is just the ID from the fields tables, which could be useful as a unique identifier, but it's not actually used right now. We simply pull all of these values into another list of serialize items, matching their ItemID against the item map to produce a new "serialized item" type, which will be serialized. SerializedItem only has a pointer to the Item and a list of field values. Field values have the Field ID and Name, the Value, the version, the language, and the source (VersionedFields, UnversionedFields, SharedFields).
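The %[1]v in that query is a placeholder for the ignored-field filter. Building it might look something like this (a sketch; fieldValuesQuery and the ignored names here are just illustrative, not the exact scgen code):

	ignored := []string{"__Lock", "__Workflow state"}
	quoted := make([]string, len(ignored))
	for i, name := range ignored {
		quoted[i] = "'" + strings.Replace(name, "'", "''", -1) + "'" // escape single quotes
	}
	// fieldValuesQuery is assumed to hold the query above as a string
	query := fmt.Sprintf(fieldValuesQuery, strings.Join(quoted, ","))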

The item map is also trimmed down to items in paths that you specify, so you're not writing the entire tree. In SQL Server with the current database (12K items), the field value query with no field name filter takes 3 seconds and returns 190K values. That's a bit high for my liking, but when you're dealing with loads of data you have to accept some longer load times.

The serialized file format is hard coded, versus being a text template. I feel I could now do the text template, since I've found out how to remove surrounding whitespace (e.g. {{- end }}; that left hyphen says remove whitespace to the left). However, putting it in a text template, as with code generation, implies that the format can be configured. But this format needs to be read back in through deserialization, so it should be less configurable and 100% predictable.

func serializeItems(cfg conf.Configuration, list []*data.SerializedItem) error {
	os.RemoveAll(cfg.SerializationPath)
	sepstart := "__VALUESTART__"
	sepend := "___VALUEEND___"

	for _, item := range list {
		path := item.Item.Path
		path = strings.Replace(path, "/", "\\", -1)
		dir := filepath.Join(cfg.SerializationPath, path)

		if err := os.MkdirAll(dir, os.ModePerm); err == nil {
			d := fmt.Sprintf("ID: %v\r\nName: %v\r\nTemplateID: %v\r\nParentID: %v\r\nMasterID: %v\r\n\r\n", item.Item.ID, item.Item.Name, item.Item.TemplateID, item.Item.ParentID, item.Item.MasterID)
			for _, f := range item.Fields {
				d += fmt.Sprintf("__FIELD__\r\nID: %v\r\nName: %v\r\nVersion: %v\r\nLanguage: %v\r\nSource: %v\r\n%v\r\n%v\r\n%v\r\n\r\n", f.FieldID, f.Name, f.Version, f.Language, f.Source, sepstart, f.Value, sepend)
			}

			filename := filepath.Join(dir, item.Item.ID+"."+cfg.SerializationExtension)
			ioutil.WriteFile(filename, []byte(d), os.ModePerm)
		}
	}

	return nil
}

If you've looked into the TDS file format, you've noticed it adds the length of the value so that parsing the field value is "easier???" or something. However, it makes for git conflicts on occasion. Additionally, you can't just go in there and update the text and deserialize it. For instance, if you had to bulk update a path that ends up in the value for many items, like a domain name or url in an external link field, with the TDS method you can't just do a find and replace (unless the length of the value doesn't change!). Without the length you could find/replace across the whole tree of serialized objects. There are other future benefits to this. Imagine you need to generate a tree of items but you don't want to use the Sitecore API. You could generate this file structure and have it deserialize into Sitecore. The length doesn't help that scenario either; leaving it out just makes generating the files a tiny bit less painful.

The idea for this was first, "common sense", but second, it's been working for HTTP form posts for YEARS!! HTTP multipart forms just use a boundary. My boundary isn't dynamic, it's just a marker. If that text were to show up in a Sitecore field value, this program wouldn't work. Most likely I'd replace the underscores with some other value. I could generate a boundary at the start of serialization and put it in a file in the root of the serialization folder, like ".sersettings" with "boundary: __FIELDVALUE90210__", determined at the start of serialization to be unique and have no occurrences in Sitecore field values. Anyway, I've gone on too long about this :)

Also, the path and path/filepath packages in Go are the best. So helpful.

In this format, here is what the "sitecore" root node looks like serialized.

ID: 11111111-1111-1111-1111-111111111111
Name: sitecore
TemplateID: C6576836-910C-4A3D-BA03-C277DBD3B827
ParentID: 00000000-0000-0000-0000-000000000000
MasterID: 00000000-0000-0000-0000-000000000000

__FIELD__
ID: 56776EDF-261C-4ABC-9FE7-70C618795239
Name: __Help link
Version: 1
Language: en
Source: SharedFields
__VALUESTART__

___VALUEEND___

__FIELD__
ID: 577F1689-7DE4-4AD2-A15F-7FDC1759285F
Name: __Long description
Version: 1
Language: en
Source: UnversionedFields
__VALUESTART__
This is the root of the Sitecore content tree.
___VALUEEND___

__FIELD__
ID: 9541E67D-CE8C-4225-803D-33F7F29F09EF
Name: __Short description
Version: 1
Language: en
Source: UnversionedFields
__VALUESTART__
This is the root of the Sitecore content tree.
___VALUEEND___

In part 3, we'll be looking into deserializing these items.

Series:
Part 1 - Generation
Part 2 - Serialization
Part 3 - Deserialization

Go and Sitecore, Part 1

I will use this post to sell you on the merits of Go, since I am in love with it. :)  In our dev shop, a C# MVC.NET etc shop, we've been using Hedgehog's Team Development for Sitecore. While this product does what it says it does, it's a few things: slow, expensive, difficult to learn, does way too much, requires a website to be running, "clunky" (official term), and a few other things that make it undesirable. Fidgetty, if that's a word. Sometimes unreliable. My boss has decided to try to head away from that product.

However, there are a few good things that it does provide, and they are useful even if the rest of the product has its flaws. (Of course there are other products out there as well, like Unicorn.) The features that I find most useful as a developer, and a developer on a team (hence the T in TDS), are

  1. Code generation from templates
  2. Serializing Sitecore content (templates, layouts, and other items) and committing those to git.
  3. Deserializing Sitecore content which others have committed to git  (or which you've serialized earlier and messed up)

Those features, if they could be separated out, are desirable to keep around.

Over the past month or two, I've had to do a lot of work dealing directly with the Sitecore database. You have Items, VersionedFields, UnversionedFields, and SharedFields. That's your data. Unless you are worried about the Media Library, in which case you might need to deal with the Blobs table. I haven't researched that fully, so I'm not sure which tables are required for media. So I felt comfortable getting in there and giving code generation a shot with my new-found knowledge of the Sitecore tables. Now I know them even more intimately.

Go and Sitecore

Go and Sitecore are a natural fit. The first thing you need is the SQL Server driver from "github.com/denisenkom/go-mssqldb". That thing works great. You just have to change your "data source" parameter to "server" in your connection string. Very mild annoyance :) In building this thing, it's probably best to just select all items. The Items table is a small set of data. The database I'm working on is 12,000+ items, which are all returned in milliseconds.
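For reference, opening the connection looks roughly like this (a minimal sketch; the connection string values are placeholders, not real ones):

// Needs database/sql plus the blank import _ "github.com/denisenkom/go-mssqldb",
// which registers the "mssql" driver.
func openDB() (*sql.DB, error) {
	connstr := "server=localhost;database=Sitecore_Master;user id=sa;password=secret"
	return sql.Open("mssql", connstr)
}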

Select * from Items. In this case, though, I did a left join to the SharedFields table two times: once to get the base templates (__Base templates) and once to get the field type (Type). If an item is a template, it will sometimes have base templates; if it's a field, it will have a field type. I just hard coded those field ids in there for now.

select 
            cast(Items.ID as varchar(100)) ID, Name, replace(replace(Name, ' ', ''), '-', '') as NameNoSpaces, cast(TemplateID as varchar(100)) TemplateID, cast(ParentID as varchar(100)) ParentID, cast(MasterID as varchar(100)) as MasterID, Items.Created, Items.Updated, isnull(sf.Value, '') as Type, isnull(Replace(Replace(b.Value, '}',''), '{', ''), '') as BaseTemplates
        from
            Items
                left join SharedFields sf
                    on Items.ID = sf.ItemId
                        and sf.FieldId = 'AB162CC0-DC80-4ABF-8871-998EE5D7BA32'
                left join SharedFields b
                    on Items.ID = b.ItemID
                        and b.FieldId = '12C33F3F-86C5-43A5-AEB4-5598CEC45116'

The next part is to use that item data to rebuild the sitecore tree in memory.


func buildTree(items []*data.Item) (root *data.Item, itemMap map[string]*data.Item, err error) {
	itemMap = make(map[string]*data.Item)
	for _, item := range items {
		itemMap[item.ID] = item
	}

	root = nil
	for _, item := range itemMap {
		if p, ok := itemMap[item.ParentID]; ok {
			p.Children = append(p.Children, item)
			item.Parent = p
		} else if item.ParentID == "00000000-0000-0000-0000-000000000000" {
			root = item
		}
	}

	if root != nil {
		root.Path = "/" + root.Name
		assignPaths(root)
		assignBaseTemplates(itemMap)
	}
	return root, itemMap, nil
}

Since I select all Items, I know the "sitecore" item will be in there, with ParentID equal to all 0s. This is the root. After the tree is built, I assign paths based on the tree. And then assign all of the base templates. Base templates are of course crucial if you're generating code. You will want to provide implemented interfaces to each interface when you generate Glass Mapper interfaces, for instance.

assignPaths of course couldn't be any easier. For each item in the children, set the path equal to the parent's path + / + the item name. Then recursively do the same for each of that item's children. That process takes no time in Go, even across 12,000 items. Now your tree is built.
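That description maps almost one to one to code. A sketch of what assignPaths might look like (the real version is in the repo):

func assignPaths(item *data.Item) {
	for _, c := range item.Children {
		c.Path = item.Path + "/" + c.Name
		assignPaths(c)
	}
}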

The path is important for determining a namespace for your interfaces. If your base path is /sitecore/templates/User Defined/MySite and there's a template under MySite with the relative path of "Components/CalendarData", you'd want the namespace to be some configured base namespace (Site.Data) plus the relative path to that template, to get something like Site.Data.Components, and then your class or interface would be e.g. ICalendarData.
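As a sketch (the function name and parameters here are assumptions, not the actual scgen code), the derivation is basically: strip the configured base path, drop the template's own name, remove spaces, and append the rest to the base namespace.

func namespaceFor(basePath, baseNamespace, itemPath string) string {
	rel := strings.Trim(strings.TrimPrefix(itemPath, basePath), "/") // "Components/CalendarData"
	parts := strings.Split(rel, "/")
	parts = parts[:len(parts)-1] // drop the template name itself
	ns := baseNamespace
	for _, p := range parts {
		ns += "." + strings.Replace(p, " ", "", -1)
	}
	return ns // e.g. "Site.Data.Components"
}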

So after you determine all of that for every item, you just have to filter the itemMap down to templates and their children (template section, template field), create a list of templates (the Template struct is another type in my Go code, to better map templates, namespaces, fields, field types, etc), and run that list of templates against a "text/template" Template in Go. It's only a flat list of templates at this point, versus keeping the hierarchy, because you don't need the hierarchy once you've determined the path. And templates just have fields, once you break it down; they aren't template sections and then fields. They are just Template, Fields, and Base Templates.

My text template is this:

{{ range $index, $template := .Templates }}
namespace {{ .Namespace }} {
    public class {{ $template.Item.CleanName }}Constants {
        public const string TemplateIdString = "{{ $template.Item.ID }}";
        public static readonly ID TemplateId = new ID(TemplateIdString);
    }

    [SitecoreType(TemplateId={{ $template.Item.CleanName }}Constants.TemplateIdString)]
    public partial interface I{{ $template.Item.CleanName }} : ISitecoreItem{{ if gt (len $template.BaseTemplates) 0 }}{{ range $bindex, $baseTemplate := $template.BaseTemplates }}, global::{{ $baseTemplate.Namespace}}.I{{ $baseTemplate.Item.CleanName}}{{end}}{{end}} {
        {{ if gt (len $template.Fields) 0 }}
        {{ range $findex, $field := $template.Fields }}
            //{{$field.Item.ID}}
            [SitecoreField("{{$field.Name}}")]
            {{ $field.CodeType}} {{ $field.CleanName }}{{ $field.Suffix }} { get; set; }{{end}}{{end}}
    }
}{{end}}

A few other notes. In configuration, I provide a CodeType mapping. For each field type, you can specify what the type of that property should be. This is of course useful if you have a custom type for "Treelist" that you want to use, and also if you're generating code that's not in C#. I didn't want to hard code those values. Also useful if you provide custom field types in Sitecore.

Field suffix is for something like this: if you have a Droptree, you can select one item, and that value would be a uniqueidentifier in the database. This is represented in C# as a "Guid" type. For these fields, I like to add "ID" to the end of the property name to signify this. Then in the accompanying partial class, which isn't overwritten every time you generate, provide a property for the actual item that it references, like IReferencedItem ReferencedItem {get;set;}, where the generated code would yield Guid ReferencedItemID {get;set;}. You get the picture ;)

For each field, template, and base template, you have a pointer to the Item from which it was created, so you have access to all of the item data. Like CleanName, ID, Created and Updated times if you need them for some reason. The path. Everything.

The best part of this tool is that it does all of those things that I require. Code generation, serialization, and deserialization. And it does it in a neat way. Currently I'm not doing any concurrency so it might take 20 seconds to do all of it, but that's still way faster than TDS!

This is Part 1 of this small series. In the future I'll go over how I did, and improved, item serialization and deserialization, as well as some neat things I did so the tool doesn't have to run all of those processes every time you run it. You can run it to just generate, or just deserialize, or generate and serialize. Not many permutations on that, but the important thing is you don't have to wait 8 seconds for it to also serialize if all you want to do is generate, which takes less than a second. 800 ms to be exact.

800 ms!!

Tune in for Part 2 soon! Serialization.

Series:
Part 1 - Generation
Part 2 - Serialization
Part 3 - Deserialization

Advent of Code 2016

I've been doing the Advent of Code 2016 this year. I've been a little late to the game due to a week off from doing anything with computers and generally just not being very attentive to it.

There have been a few favorites. Days 11, 13 and 17, with their tree searching, have been a good refresher. Day 15 literally took me longer to understand the problem than to code it up.

Day 19 took me through a few different paths. Day 19 is the White Elephant with 3 million elves. An elf takes the presents from the elf to their left, and the next elf that still has presents goes next. This repeats until one elf has all the presents. Part 2, which you find out later, changes it so the elf takes the presents from the elf across from them.

The first attempt at this solution was an array of elves. 3 million odd elves. Looping through would be fast, but finding the next elf who has presents would involve a ton of extra looping. Solving the 5 elf example worked, but moving to 3 million+ pointed out how terribly inefficient this would be.

So. Linked list to the rescue  (I have always said "to the rescue" when picking a tool to solve the job, but ever since seeing "The Martian", now my internal voice is replaced by Matt Damon's, when he determines that hexadecimal is the answer to his problems).

For part one, just create 3 million elves and have a pointer to the next elf. The last elf's next elf is the first elf. Done.

for elf.Next != elf {
	elf.Presents += elf.Next.Presents
	elf.Next = elf.Next.Next
	elf = elf.Next // move to the next elf
}

This solves it very quickly. But then the bombshell of part 2 comes in. Find the elf across from the current elf in a Linked List?! What?

First attempt was, for each elf, to use a counter to determine the one across. Every time an elf is removed from the circle, decrement the counter. To determine the across elf, loop to across.Next (count / 2) times. This works, but it proved very slow as well; each time, you're looping roughly 1.5 million times (half the circle). It was correct though, per my test with the five elves.

Then I remembered the ability to just have as many pointers as you want! This is computers, you can do whatever you want. There are no rules :)   So instead, I created an "across" pointer and initialized it with my count/2 loop. This would be the only time it would need to loop: just to determine its starting location.

I would still have to keep a count though. The count is useful for the case where, for instance in the 5 elf circle...  1 takes presents from 3, but there are 2 elves across from 1, so you choose the left one. After removing 3 and moving to 2, 2 is across from 5, but if we were to just move the across pointer to across.Next, it would point to 4. So if there is an even number of elves left, move across forward one extra time after setting it to across.Next (move it two forward in the list). This proved to work and was very fast :)
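Roughly, the part 2 loop with that extra pointer looks like this. It's a sketch of the idea rather than the code from the repo linked below; since removing a node from a singly linked list needs the previous node, it tracks the elf just before the across elf.

type Elf struct {
	Number   int
	Presents int
	Next     *Elf
}

func lastElfAcross(n int) int {
	// build the circle of n elves
	first := &Elf{Number: 1, Presents: 1}
	prev := first
	for i := 2; i <= n; i++ {
		e := &Elf{Number: i, Presents: 1}
		prev.Next = e
		prev = e
	}
	prev.Next = first // close the circle

	count := n
	current := first
	beforeAcross := first
	for i := 0; i < n/2-1; i++ { // the only loop needed to locate "across"
		beforeAcross = beforeAcross.Next
	}

	for count > 1 {
		victim := beforeAcross.Next // the elf across the circle
		current.Presents += victim.Presents
		beforeAcross.Next = victim.Next // remove it
		count--
		if count%2 == 0 { // even number left: across moves two forward, not one
			beforeAcross = beforeAcross.Next
		}
		current = current.Next
	}
	return current.Number
}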

Check the solution here - https://github.com/jasontconnell/advent/blob/develop/2016/19.go

Go CrawlFarm Code

I have added the code to Github

Please check it out. I will be making bug fixes and updating code to be more consistent and better. It's a pretty awesome project that I'm pretty proud of :)

Key Golang features, if you want to view it for example code:

  1. Gob
  2. Concurrent For / Select on Channels
  3. Logging using log package
  4. net.Conn, Server and Client, Dial and Accept
  5. sync.Mutex
  6. encoding/json
  7. time.NewTicker
  8. Synchronous and Asynchronous channels
  9. net/http.Client, CheckRedirect
  10. Regex to find href/src attributes

Feel free to fork and create pull requests :)

New Go Project: Distributed Crawler

At work, we created a site for a client. While the development work was pretty straightforward (although I did do a few neat things), the site was very, extremely content heavy. We're talking 5000 or more pages. Development was ongoing until the end, and doing that while content entry is going on proves to be a bit of a challenge. If the way something works changes and content is not present, things break.

There are tools like that yelling amphibian (you know the one) to help crawl sites and report on errors, like 404s, 500s, etc. However, it takes a long time to run, costs money, doesn't do specifically what I need, and did I mention it takes forever?

One of the reasons it takes a long time to run is that it's probably doing more in there than it needs to do. I don't know, since I didn't write it. So, the benefit of being a software developer, besides getting to do awesome things all the time, is that I can write code to replace software when I don't know how it works or what makes it slow, and it's free* to replace (* it takes my time, but I learn stuff, so it pays for itself :)

So I wrote a crawler. A parallel crawler with go routines. Go and get the main site, parse out any URLs (image sources and anchor hrefs), add any that are unique to a map, crawl the next one. It was a fun exercise in getting familiar with Go and concurrent processing again.

But it takes 24 minutes!  Unacceptable across 5000 pages. So my idea was to create a distributed crawler, using the same core code, modified for a distributed workload across multiple machines. However many you want; just start up the client on a new computer and it will dial the server and begin getting URLs that haven't been processed yet. It's done. Pretty awesome thing.  I'm not going to post the code here now, because it's on a different computer, and I haven't figured out a good project structure and put it in git yet.

But here's the general idea. You can just imagine the code :)

Server has a "job" config with the site that it wants to crawl. It starts up and immediately goes into listen mode, just waiting for clients. It does no crawling of its own (although the way the code is shared, that code is compiled into the server executable as well).

Client connects. To the first client that connects, the server sends the first URL. The client does the parsing, and sends back a status code as well as the original URL, and the list of URLs that it found.

The server records the status along with the URL, and adds any unique URLs found by the client. These URLs go into the "UnprocessedLinks" channel of type Link (Url string, Referrer string). Referrer is kept so we know which page led to the 404 or whatever.

The server then continues to send any links on the UnprocessedLinks channel to any client that connects (the first link is sent to the first client; it is just added to the UnprocessedLinks channel when the server starts).
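In gob terms, the server side of that exchange might look roughly like this (a sketch with assumed names; the Link fields are from above, the rest is illustrative and not the actual CrawlFarm code):

type Link struct {
	Url      string
	Referrer string
}

// hypothetical per-connection loop: wrap the connection in a gob encoder once,
// then push each unprocessed link to this client as it comes off the channel
func serveClient(conn net.Conn, unprocessed chan Link) {
	enc := gob.NewEncoder(conn)
	for link := range unprocessed {
		if err := enc.Encode(link); err != nil {
			return // client went away; re-queuing its outstanding links is discussed below
		}
	}
}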

There were a few challenges that I overcame while writing this. First, learning gobs. Next, ensuring that packets that should be sent and received first are, indeed, sent and received first. The other thing was determining, in Go, when a client disconnects or the server shuts down. That was really all; the rest of the logic was written when I wrote the parallel crawler, so that was all figured out.

There's another thing I have to figure out. At any one time, a client can have a few hundred URLs in its process queue, because the server just sends those across all clients when it gets them (uniquely giving 1 URL to 1 client and not duplicating any work). However, if a client disconnects and has loads of URLs left in its queue, those URLs are currently lost forever. So on the server, I have to keep track of which URLs were sent, and which ones were received back as processed. Then if it disconnects, I can pump the remaining URLs that were sent and not processed back into the UnprocessedLinks channel, for future processing by future or current clients.

That's it. I haven't run it yet across multiple computers, but it does a pretty good job of doing what it's supposed to do. I can't wait to test it soon :)