Go and Sitecore, Part 2

I've integrated Go heavily into tools that make Sitecore development much easier. Replacing Team Development for Sitecore (TDS) has been hugely beneficial. This is part 2, which covers serialization of Sitecore data.

In part 1, I covered how I'm now generating code from Sitecore templates, to a limited degree. I won't share the whole process and the whole program until the end; until then I'm just going over the touch points.

For part 2, we'll cover Sitecore serialization. On terminology: I'm not sure what TDS or similar tools call these operations, but I will call them serialization (writing Sitecore contents to disk) and deserialization (reading Sitecore contents from disk and writing them to the database).

For Sitecore serialization, I would say step 1 is to decide which fields you DON'T want to bring over. In the past, I've had loads of issues with serializing things like workflow state and locks, so my approach is to ignore the existence of certain fields. Essentially, start with a global "ignored fields" list containing every field on "Standard template", decide which ones are essential or useful, and remove those from the list.

Then get your data. We use the same tree of items from part 1. Building the tree produces a root node and an item map (map[string]*data.Item). For serialization we need the item map; the root is only useful for building paths, and after that we could most likely toss it. With the item map in hand, and a list of ignored fields, we can get the data.


        with FieldValues (ValueID, ItemID, FieldID, Value, Version, Language, Source)
        as
        (
            select
                ID, ItemId, FieldId, Value, 1, 'en', 'SharedFields'
            from SharedFields
            union
            select
                ID, ItemId, FieldId, Value, Version, Language, 'VersionedFields'
            from VersionedFields
            union
            select
                ID, ItemId, FieldId, Value, 1, Language, 'UnversionedFields'
            from UnversionedFields
        )

        select cast(fv.ValueID as varchar(100)) as ValueID, cast(fv.ItemID as varchar(100)) as ItemID, f.Name as FieldName, cast(fv.FieldID as varchar(100)) as FieldID, fv.Value, fv.Version, fv.Language, fv.Source
                from
                    FieldValues fv
                        join Items f
                            on fv.FieldID = f.ID
                where
                    f.Name not in (%[1]v)
                order by f.Name;
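
The %[1]v in the query looks like a Go fmt verb with an explicit argument index, so the ignored field names presumably get formatted into a quoted, comma-separated list before the query runs. Here is a minimal sketch of that step. The inList helper, the sample field names, and the shortened query string are all assumptions for illustration, not the tool's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// Shortened stand-in for the full query; only the placeholder matters here.
const query = "select ... where f.Name not in (%[1]v) order by f.Name;"

// inList (hypothetical helper) turns field names into a comma-separated list
// of SQL string literals, doubling single quotes so a name cannot end its
// literal early.
func inList(names []string) string {
	quoted := make([]string, len(names))
	for i, n := range names {
		quoted[i] = "'" + strings.ReplaceAll(n, "'", "''") + "'"
	}
	return strings.Join(quoted, ", ")
}

func main() {
	// Sample ignored fields; the real list is whatever you decided to skip.
	ignored := []string{"__Lock", "__Workflow state"}
	fmt.Printf(query+"\n", inList(ignored))
}
```

An empty list would produce "not in ()", which SQL Server rejects, so the where clause would have to be dropped in that case.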
    

With SQL Server, we can use common table expressions (CTEs), which make this a single query that is pretty easy to read. We're getting all field values except the ignored ones. We get version and language no matter what (shared fields default to version 1 and language 'en', unversioned fields default to version 1), and we get the source, which is the table the value comes from. ValueID is just the row ID from that fields table, which could be useful as a unique identifier, but it's not actually used right now. We simply pull all of these values into a list, matching each ItemID against the item map to produce a new "serialized item" type, which is what gets written to disk. SerializedItem only has a pointer to the Item and a list of field values. Field values have the field ID and name, the value, the version, the language, and the source (VersionedFields, UnversionedFields, SharedFields).
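
To make that matching step concrete, here is a rough sketch of the types and the grouping. The struct layouts and the groupFields function are my guesses based on the description above; the real data.Item and data.SerializedItem have more to them.

```go
package main

import "fmt"

// Item is a cut-down stand-in for the data.Item type from part 1.
type Item struct {
	ID, Name, Path string
}

// FieldValue mirrors one row returned by the field value query.
type FieldValue struct {
	ItemID, FieldID, Name, Value string
	Version                      int64
	Language, Source             string
}

// SerializedItem pairs an item with the field values that belong to it.
type SerializedItem struct {
	Item   *Item
	Fields []FieldValue
}

// groupFields attaches each row to its item and skips rows whose item is not
// in the map. Items come back in first-seen order.
func groupFields(items map[string]*Item, rows []FieldValue) []*SerializedItem {
	byID := make(map[string]*SerializedItem)
	var list []*SerializedItem
	for _, row := range rows {
		item, ok := items[row.ItemID]
		if !ok {
			continue
		}
		si, seen := byID[row.ItemID]
		if !seen {
			si = &SerializedItem{Item: item}
			byID[row.ItemID] = si
			list = append(list, si)
		}
		si.Fields = append(si.Fields, row)
	}
	return list
}

func main() {
	items := map[string]*Item{"1": {ID: "1", Name: "sitecore", Path: "/sitecore"}}
	rows := []FieldValue{
		{ItemID: "1", Name: "__Long description", Value: "root", Version: 1, Language: "en", Source: "UnversionedFields"},
		{ItemID: "2", Name: "not in the item map"},
	}
	for _, si := range groupFields(items, rows) {
		fmt.Println(si.Item.Name, len(si.Fields))
	}
}
```

Skipping rows with no matching item is what lets a trimmed-down item map double as a filter on the field values.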

The item map is also trimmed down to items in paths that you specify, so you're not writing the entire tree. In SQL Server with the current database (12K items), the field value query with no field name filter takes 3 seconds and returns 190K values. That's a bit high for my liking, but when you're dealing with loads of data you have to be accepting of some longer load times.
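
Trimming the item map to configured paths could look something like the sketch below. The Item type is a stand-in for data.Item, and filterItems, underAny, and the sample paths are hypothetical names I made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Item is a cut-down stand-in for data.Item.
type Item struct {
	ID, Path string
}

// underAny reports whether path equals one of the roots or sits below it.
// Comparing against root+"/" keeps "/content/HomeOld" from matching "/content/Home".
func underAny(path string, roots []string) bool {
	for _, root := range roots {
		root = strings.TrimSuffix(root, "/")
		if path == root || strings.HasPrefix(path, root+"/") {
			return true
		}
	}
	return false
}

// filterItems returns a new map holding only the items under the given roots.
func filterItems(items map[string]*Item, roots []string) map[string]*Item {
	out := make(map[string]*Item)
	for id, item := range items {
		if underAny(item.Path, roots) {
			out[id] = item
		}
	}
	return out
}

func main() {
	items := map[string]*Item{
		"1": {ID: "1", Path: "/sitecore/content/Home"},
		"2": {ID: "2", Path: "/sitecore/content/HomeOld"},
		"3": {ID: "3", Path: "/sitecore/media library/Images"},
	}
	kept := filterItems(items, []string{"/sitecore/content/Home"})
	fmt.Println(len(kept)) // only item 1 is under the configured root
}
```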

The serialized file format is hard coded rather than driven by a text template. I feel I could use a text template now that I've found out how to remove surrounding whitespace (e.g. {{- end }}, where the left hyphen trims the whitespace to the left of the action). However, putting it in a text template, as with code generation, implies that the format can be configured. This format has to be read back in during deserialization, so it should be less configurable and 100% predictable.
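
For anyone who hasn't run into the whitespace trimming, here is a small text/template sketch of it. The template text and the render function are made up for this example and are not the serialization format.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// The "{{-" on range and end trims the newline to its left, so the actions
// themselves do not leave blank lines in the output.
const itemTmpl = `Name: {{ .Name }}
{{- range .Fields }}
Field: {{ . }}
{{- end }}
`

type templateData struct {
	Name   string
	Fields []string
}

// render executes the template and returns the result as a string.
func render(name string, fields []string) (string, error) {
	t, err := template.New("item").Parse(itemTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, templateData{Name: name, Fields: fields}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render("sitecore", []string{"__Help link", "__Long description"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

The hard-coded version the tool actually uses follows.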

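import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// serializeItems wipes cfg.SerializationPath and writes one file per item at
// <SerializationPath>/<item path>/<item ID>.<SerializationExtension>.
// conf and data are this project's own packages (import paths omitted).
func serializeItems(cfg conf.Configuration, list []*data.SerializedItem) error {
	// Remove the previous output so stale files don't linger.
	if err := os.RemoveAll(cfg.SerializationPath); err != nil {
		return err
	}
	const sepstart = "__VALUESTART__"
	const sepend = "___VALUEEND___"

	for _, item := range list {
		// Sitecore paths use forward slashes; convert them to the OS separator.
		dir := filepath.Join(cfg.SerializationPath, filepath.FromSlash(item.Item.Path))
		if err := os.MkdirAll(dir, os.ModePerm); err != nil {
			return err
		}

		// Item header first, then one __FIELD__ block per field value.
		var d strings.Builder
		fmt.Fprintf(&d, "ID: %v\r\nName: %v\r\nTemplateID: %v\r\nParentID: %v\r\nMasterID: %v\r\n\r\n", item.Item.ID, item.Item.Name, item.Item.TemplateID, item.Item.ParentID, item.Item.MasterID)
		for _, f := range item.Fields {
			fmt.Fprintf(&d, "__FIELD__\r\nID: %v\r\nName: %v\r\nVersion: %v\r\nLanguage: %v\r\nSource: %v\r\n%v\r\n%v\r\n%v\r\n\r\n", f.FieldID, f.Name, f.Version, f.Language, f.Source, sepstart, f.Value, sepend)
		}

		filename := filepath.Join(dir, item.Item.ID+"."+cfg.SerializationExtension)
		if err := os.WriteFile(filename, []byte(d.String()), 0644); err != nil {
			return err
		}
	}

	return nil
}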

If you've looked into the TDS file format, you've noticed it stores the length of each value so that parsing the field value is "easier" or something. However, it makes for git conflicts on occasion. Additionally, you can't just go in there, update the text, and deserialize it. For instance, say you had to bulk update something that ends up in the value for many items, like a domain name or URL in an external link field. With the TDS method you can't just do a find/replace (unless the length of the value doesn't change!). Without the length, you can find/replace across the whole tree of serialized objects. There are other future benefits to this. Imagine you need to generate a tree but you don't want to use the Sitecore API. You could generate this file structure and have it deserialize into Sitecore. Dropping the length isn't what makes that scenario possible, though; it just makes it a tiny bit less painful.

The idea for this was first, "common sense", but second, it's been working for HTTP and form posts for YEARS!! HTTP multipart forms just use a boundary parameter. My boundary isn't dynamic, it's just a fixed marker. If that text were to show up in a Sitecore field, this program wouldn't work. Most likely I'd replace underscores with some other value. I could also generate a boundary at the start of serialization and put it in a file in the root of the serialization folder, like ".sersettings" with "boundary: __FIELDVALUE90210__", chosen at the start of serialization to be unique and to have no occurrences in Sitecore field values. Anyway, I've gone on too long about this :)

Also, the path and path/filepath packages in Go are the best. So helpful.

In this format, here is what the "sitecore" root node looks like serialized.

ID: 11111111-1111-1111-1111-111111111111
Name: sitecore
TemplateID: C6576836-910C-4A3D-BA03-C277DBD3B827
ParentID: 00000000-0000-0000-0000-000000000000
MasterID: 00000000-0000-0000-0000-000000000000

__FIELD__
ID: 56776EDF-261C-4ABC-9FE7-70C618795239
Name: __Help link
Version: 1
Language: en
Source: SharedFields
__VALUESTART__

___VALUEEND___

__FIELD__
ID: 577F1689-7DE4-4AD2-A15F-7FDC1759285F
Name: __Long description
Version: 1
Language: en
Source: UnversionedFields
__VALUESTART__
This is the root of the Sitecore content tree.
___VALUEEND___

__FIELD__
ID: 9541E67D-CE8C-4225-803D-33F7F29F09EF
Name: __Short description
Version: 1
Language: en
Source: UnversionedFields
__VALUESTART__
This is the root of the Sitecore content tree.
___VALUEEND___

In part 3, we'll be looking into deserializing these items.

Series:
Part 1 - Generation
Part 2 - Serialization
Part 3 - Deserialization
