Image Processing with Go

Refer to my Node.js post on the same topic.

package main

import (
    "github.com/disintegration/imaging"
    "image"
)

func main() {
    if img, err := imaging.Open("test.jpg"); err == nil {
        // resize to 300x200 using Lanczos resampling
        newimg := imaging.Resize(img, 300, 200, imaging.Lanczos)
        imaging.Save(newimg, "test2.jpg")

        // crop a 600x600 square from the top-left of the original
        cropped := imaging.Crop(img, image.Rect(0, 0, 600, 600))

        imaging.Save(cropped, "test3.jpg")
        imaging.Save(cropped, "test3.png")

        // then shrink the crop down to a 200x200 thumbnail
        crsz := imaging.Resize(cropped, 200, 200, imaging.Lanczos)
        imaging.Save(crsz, "test4.jpg")
    }
}
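One nicety here: imaging.Save picks the encoder from the file extension, which is why writing the same cropped image to test3.jpg and test3.png takes no extra code. Each Save call also returns an error; I'm only checking the one from Open above, which is fine for a scratch script but worth handling in real code.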

I was going about it all wrong

The MongoDB Aggregation Framework, which, if it didn't work right, I would call the Aggravation Framework. Well, it works wonderfully, so that name shall stay in my back pocket for a special kind of software.

For this blog, there is a list on the right which has the years, months, and post counts for each month. It's pretty easy to produce in SQL, like so:

select Month(Post.Date), Year(Post.Date), count(*) from Post group by Month(Post.Date), Year(Post.Date)

Well, in my current version of this site, which is written in Node.js using MongoDB, I wrote something which, after looking at it next to the new solution, is not something I want to post here. :)

I think it was written before I knew about the Aggregation Framework. It was like this:

Loop from 2005 to now creating Date objects for each month along the way, loop over these Date objects, grab post counts for each Date object, update object with count.

Anyway, it was correct, but it could be done way better. Instead of multiple calls to the database, how about just one? Seems like an improvement.

Here it is in the straight-up MongoDB shell:

db.posts.aggregate([
    { $match: {} },
    { $project: {
        "postMonth": { "$month": "$date" },
        "postYear": { "$year": "$date" },
        _id: 0
    } },
    { $group: {
        _id: { "postMonth": "$postMonth", "postYear": "$postYear" },
        "count": { "$sum": 1 }
    } }
])

Here it is in mgo (apparently pronounced mango). I also tacked a $sort stage onto this one so the years and months come back newest first:

pipe := posts.Pipe([]bson.M{
    { "$match": bson.M{} },
    { "$project": bson.M{ "postMonth": bson.M{ "$month": "$date" }, "postYear": bson.M{ "$year": "$date" } } },
    { "$group": bson.M{ "_id": bson.M{ "postMonth": "$postMonth", "postYear": "$postYear" }, "count": bson.M{ "$sum": 1 } } },
    { "$sort": bson.M{ "_id.postYear": -1, "_id.postMonth": -1 } },
})

Here's the whole GetPostDateCounts method:

func (postRepo PostRepository) GetPostDateCounts() []YearCount {
    posts := postRepo.OpenCollection(postRepo.collection)
    pipe := posts.Pipe([]bson.M{
        { "$match": bson.M{} },
        { "$project": bson.M{ "postMonth": bson.M{ "$month": "$date" }, "postYear": bson.M{ "$year": "$date" } } },
        { "$group": bson.M{ "_id": bson.M{ "postMonth": "$postMonth", "postYear": "$postYear" }, "count": bson.M{ "$sum": 1 } } },
        { "$sort": bson.M{ "_id.postYear": -1, "_id.postMonth": -1 } },
    })

    itr := pipe.Iter()
    p := YearMonthCount{}
    mapped := make(map[int][]MonthCount, 30) //this will work for 30 years. this code should be re-architected to accommodate advances in medicine

    for itr.Next(&p) {
        mapped[p.Date.Year] = append(mapped[p.Date.Year], MonthCount{ Month: p.Date.Month, Count: p.Count, MonthNum: int(p.Date.Month) })
    }

    yc := []YearCount{}
    for year, months := range mapped {
        yc = append(yc, YearCount{ Year: year, Months: months })
    }

    sorter := &YearCountSorter{ Entries: yc }
    sort.Sort(sort.Reverse(sorter))
    return sorter.Entries
}

And here are some structs used within that code:

type YearMonth struct {
    Year  int        `bson:"postYear"`
    Month time.Month `bson:"postMonth"`
}

type YearMonthCount struct {
    Count int       `bson:"count"`
    Date  YearMonth `bson:"_id"`
}

type MonthCount struct {
    Count    int
    Month    time.Month
    MonthNum int
}

type YearCount struct {
    Year   int
    Months []MonthCount
}
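One piece not shown above is YearCountSorter, which GetPostDateCounts hands to sort.Sort. A minimal sketch, assuming it just implements sort.Interface over the years (sort.Reverse then yields newest-first):

// YearCountSorter: sort.Interface over []YearCount, ordered by year
type YearCountSorter struct {
    Entries []YearCount
}

func (s *YearCountSorter) Len() int           { return len(s.Entries) }
func (s *YearCountSorter) Swap(i, j int)      { s.Entries[i], s.Entries[j] = s.Entries[j], s.Entries[i] }
func (s *YearCountSorter) Less(i, j int) bool { return s.Entries[i].Year < s.Entries[j].Year }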

Go Concurrency à la Rob Pike

I watched Rob Pike's talk, "Concurrency is not Parallelism", so I wanted to take what he was saying with his gopher example, and make a program that tightly followed his model.

My example program, part of rewriting this website in Go, is the bit that gets photos from my Flickr account that I have tagged with "jtccom" in order to make them into header images. It uses the Flickr JSON API, which is pretty easy to use.

Using the Flickr API here takes three separate web calls, which makes it ideal for Go-style concurrent programming. The first step is to get the photos from my account tagged with "jtccom"; this returns an array of photo IDs and photo Secrets. To get the URL for a photo, you have to get its sizes first. This is a separate call to the Flickr API, which returns an array of, among other things, Label and Source: Source is the URL and Label is the size name. In this case I'm only interested in the Original size, which has the "Original" label. The last step is to download the content pointed to by Source in the "Original" size.

So the idea was to have a goroutine that gets photos (step 1), another that gets sizes (step 2), and a third that downloads content. Conceptually, it looks like this:

getPhotos(photos chan Photo)                   // pumps photos into the photos channel
getSizes(photos chan Photo, sizes chan []Size) // calls the API for each photo in the photos channel and pumps its sizes into the sizes channel
downloadPhoto(label string, sizes chan []Size, photoContent chan PhotoContent) // downloads the file of size 'label' for each entry in the sizes channel and pumps it into the photoContent channel

Realistically, it runs pretty much in order, because getPhotos and getSizes finish way before the downloads do (each file is around 9-12 MB), but at least getPhotos and getSizes can pretty much run in parallel.

Code-wise, it looks very similar to the sketch, just with goroutines, some object-style things, JSON parsing, etc.

For clarity I broke the Flickr-specific calls out into a separate file, though not a separate package. Here's the flickrsvc.go file, with hidden things like the API key obfuscated.

package main

import (
    "fmt"
    "time"
    "sync"
)

// saveFiles drains the content channel. Actually writing files to tmp/dest is
// still a TODO; for now it just logs each download.
func saveFiles(tmp, dest string, content chan PhotoContent) {
    for photoContent := range content {
        fmt.Println("Downloaded", photoContent.Photo.Id, "of size", len(photoContent.Content))
    }
}

func process() {
    var apiKey = "blahblah"
    var userId = "28240873@N07"
    var tag = "jtccom"

    var tmp = "../jtccom/content/tmp_download/"
    var destination = "../jtccom/static/images/backgrounds/"

    procWG := sync.WaitGroup{}

    photos := make(chan Photo)
    sizes := make(chan PhotoWithSizes)
    content := make(chan PhotoContent)

    // each stage closes its output channel when its input runs dry,
    // so shutdown cascades down the pipeline
    procWG.Add(3)
    go func() {
        getPhotosByTag(tag, apiKey, userId, photos)
        close(photos)
        procWG.Done()
    }()

    go func() {
        getPhotoSizes(apiKey, photos, sizes)
        close(sizes)
        procWG.Done()
    }()

    go func() {
        downloadPhotos("Original", sizes, content)
        close(content)
        procWG.Done()
    }()

    // drain the final channel on this goroutine; saveFiles only returns after
    // the whole pipeline has shut down, so the Wait is nearly a formality
    saveFiles(tmp, destination, content)
    procWG.Wait()
}

func main() {
    for {
        fmt.Println("going")
        process()

        fmt.Println("Sleeping")
        time.Sleep(3 * time.Second)
    }
}

And here is the output:

C:\Users\jconnell\Documents\go\src\jtccom.flickrsvc>jtccom.flickrsvc.exe
going
Downloaded 14685510038 of size 9867146
Downloaded 14465862480 of size 11279714
Downloaded 14649298391 of size 9423168
Downloaded 14076004795 of size 8925512
Downloaded 13936652032 of size 14851399
Downloaded 12076007194 of size 14099167
Downloaded 11678436824 of size 9671802
Downloaded 11507180674 of size 13510941
Downloaded 11507190024 of size 11963353
Downloaded 11412952753 of size 13030709

Here is flickr.go (although it doesn't matter what it's called).

package main

import (
    "strings"
    "net/http"
    "net/url"
    "encoding/json"
    "io/ioutil"
)

type Response struct {
    Wrap Photos `json:"photos"`
}

type Photos struct {
    Photo []Photo     `json:"photo"`
}

type Photo struct {
    Id string     `json:"id"`
    Secret string `json:"secret"`
}

type SizeArray []Size

func (sizeArray SizeArray) GetSize(label string) Size {
    var size Size
    for _,sz := range sizeArray {
        if strings.EqualFold(sz.Label, label) {
            size = sz
            break
        }
    }
    return size
}

type SizesResponse struct {
    Wrap Sizes `json:"sizes"`
}

type Sizes struct {
    Sizes SizeArray `json:"size"`
}

type Size struct {
    Label string `json:"label"`
    Source string `json:"source"`
}

type PhotoWithSizes struct {
    Photo *Photo
    Sizes SizeArray
}

type PhotoContent struct {
    Photo *Photo
    Content []byte
}

func getPhotosByTag(tag, apiKey, userId string, pchan chan Photo) {
    qs := url.Values{}
    qs.Add("method", "flickr.photos.search")
    qs.Add("api_key", apiKey)
    qs.Add("user_id", userId)
    qs.Add("tags", tag)
    qs.Add("format", "json")
    qs.Add("nojsoncallback", "1")

    flickrUrl, _ := url.Parse("https://api.flickr.com/services/rest/?" + qs.Encode())

    if resp, err := http.Get(flickrUrl.String()); err == nil {
        defer resp.Body.Close()
        decoder := json.NewDecoder(resp.Body)

        photos := Response{}
        if err := decoder.Decode(&photos); err != nil {
            panic(err)
        }

        // pump each photo into the channel for the sizes stage
        for _, p := range photos.Wrap.Photo {
            pchan <- p
        }
    } else {
        panic(err)
    }
}

func downloadPhotos(sizeLabel string, download chan PhotoWithSizes, downloaded chan PhotoContent) {
    for p := range download {
        url := p.Sizes.GetSize(sizeLabel).Source
        if resp, err := http.Get(url); err == nil {
            content, err := ioutil.ReadAll(resp.Body)
            resp.Body.Close()

            if err != nil {
                panic(err)
            }

            downloaded <- PhotoContent{Photo: p.Photo, Content: content}
        } else {
            panic(err)
        }
    }
}

func getPhotoSizes(apiKey string, photos chan Photo, photoSizes chan PhotoWithSizes) {
    for p := range photos {
        p := p // copy the loop variable: &p below must not point at the reused iteration variable
        qs := url.Values{}
        qs.Add("method", "flickr.photos.getSizes")
        qs.Add("api_key", apiKey)
        qs.Add("photo_id", p.Id)
        qs.Add("format", "json")
        qs.Add("nojsoncallback", "1")

        if sizesUrl, err := url.Parse("https://api.flickr.com/services/rest/?" + qs.Encode()); err == nil {
            if resp, err := http.Get(sizesUrl.String()); err == nil {
                decoder := json.NewDecoder(resp.Body)
                sizeResp := SizesResponse{}
                if err := decoder.Decode(&sizeResp); err != nil {
                    panic(err)
                }
                resp.Body.Close()

                photoWithSizes := PhotoWithSizes{Photo: &p, Sizes: sizeResp.Wrap.Sizes}
                photoSizes <- photoWithSizes
            } else {
                panic(err)
            }
        }
    }
}

I had some problems early on where the Flickr functions returned channels and things just weren't working. I also had to experiment with buffered vs. unbuffered channels, internal sync.WaitGroups, and other things that didn't work out so well. I will play around with this more, since apparently you can use a WaitGroup without using channels at all; I definitely want a better understanding of why the stuff I tried initially wasn't working. But it's working now. I just have to finish it by saving the files to the destination folder and checking whether an image was already downloaded. For future me, that would be a good fit for a func that takes a channel and outputs, on another channel, only the files that haven't been downloaded yet, keeping with the channel-passing style I've used so far.
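Here's a sketch of that filter func, assuming files get saved as Id + ".jpg" in the destination folder (that naming scheme is made up, since saveFiles doesn't actually save yet); it needs "os" and "path/filepath" in the imports:

func filterNew(dest string, in chan PhotoWithSizes) chan PhotoWithSizes {
    out := make(chan PhotoWithSizes)
    go func() {
        for p := range in {
            // only pass along photos whose file doesn't exist yet
            if _, err := os.Stat(filepath.Join(dest, p.Photo.Id+".jpg")); os.IsNotExist(err) {
                out <- p
            }
        }
        close(out)
    }()
    return out
}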


Hosting Multiple Go Websites Using Nginx

So I've obviously been playing around with Go for a little while now. I'd say I've put in 20-30 hours of good productive learning and coding in Go. I basically rewrote the blog display of this website in Go: connecting to MongoDB and stuff, using Go's html/template, pulling in the Gorilla web toolkit (AWESOME, BTW), and trying to write very modular code that can be reused for other websites. However, there was that question of "other websites". Each Go program compiles into its own binary and calls http.ListenAndServe() on the port specified, which, when hosted on www.jasontconnell.com, would have to be 80.

I was playing with ideas in my head, like having a server.exe (obviously not .exe when I run it on Linux) which runs on port 80 and listens on another port for websites to register with it.

Server.exe starts up; jtccom.exe starts up and sends a message through RPC or some other network protocol that says, "Yo, sup. If you would be so kind as to send requests for jasontconnell.com to me, that'd be mighty generous of you." Server.exe would make note of the domain name and the port it's running on, and forward requests to it. This could also be done through a config file. But that would mean writing another full-featured web server. I've already done one in Node.js; in Go it would be a bit easier, since the standard library is more fully featured as a web server (including a template engine) and seems a bit faster. It wouldn't be as much work as it was in Node because templates are included (if you want to see some interesting code, ask me for my Node.js template engine code). But as John Carmack once said, "I don't think I have another engine in me."
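For what it's worth, Go's standard library gets you most of the way to that barebones proxy. Here's a minimal sketch of the idea using net/http/httputil, with a hard-coded host table standing in for the RPC registration step (all names and ports here are made up):

package main

import (
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    // host -> backend; the registration protocol described above would populate this
    backends := map[string]string{
        "jasontconnell.com": "http://localhost:8080",
        "stringed.example":  "http://localhost:8081",
    }

    http.ListenAndServe(":80", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        target, ok := backends[r.Host] // note: r.Host may include a port
        if !ok {
            http.NotFound(w, r)
            return
        }
        u, _ := url.Parse(target)
        httputil.NewSingleHostReverseProxy(u).ServeHTTP(w, r)
    }))
}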

Wanting to avoid writing another web server, I googled "host multiple Golang websites" (you have to say golang instead of go, since go is such a generic term). I found this article, which covers hosting Go websites with Nginx, along with a lot of other things I won't be doing. Using that article, I was able to download Nginx, set it up with some minimal configuration, and have two Go websites up and running successfully. I would have commented on the article to thank the author, but it required a login.

Here is the configuration from my nginx.conf file. These server blocks sit inside the http block (I also like how the config file is structured):

server {
    listen 80;
    server_name jtccom;
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        proxy_pass http://jtccom:8080;
    }
}

server {
    listen 80;
    server_name stringed;
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        proxy_pass http://stringed:8081;
    }
}

Then I was able to hit http://jtccom and http://stringed with both of those programs running.
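One detail worth noting: jtccom and stringed aren't real domains, so they need entries in the hosts file to resolve (C:\Windows\System32\drivers\etc\hosts on Windows), along the lines of:

127.0.0.1    jtccom
127.0.0.1    stringed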

This has a lot of implications. With Node, I was running everything as root. Since I can put these sites on a port like 44444 if I want to, I can run them as a non-root user, increasing security in the process. (My other server, where this site was hosted, was hit with a virus or something that crashed the site for a few days.) My dev machine is on Windows, so it's not immediate; I wasn't getting response times in the teens of milliseconds even when hitting the Go process directly, but that should speed up on Linux when the sites are ready to launch. Another implication is that I can continue down my path of fully committing to Go for future development, since the hosting issue is solved. I could still go down the path of writing my barebones proxying server in Go if Nginx doesn't work out completely, speed-wise.

I'm intrigued by the possibilities in Go. Writing small code that is mind-blowing, compiled, fast, runs on multiple systems, has huge corporate backing, and is just fun to write. My first foray into it a couple of months ago was just "Hmm... I have no idea what I'm looking at", which is my brain's way of saying "you should learn that". Fun stuff.

Go Recursive Diff to Verify SVN Merge

For some reason, SVN merges over large trees, where work was simultaneously going on in both branches, are very unreliable. And you can't tell quickly if something is wrong, especially if what differs isn't compiled. This could be very bad if you merge your current work into the trunk and deploy the trunk version live; it would require a full regression test.

Fortunately, Go exists, and it is fast. However, for all my tweaking of goroutines and channels to process each subdirectory in a separate goroutine, my efforts proved futile: I couldn't get it under 9 seconds. There's a lot of content, and I wrote it to ignore whitespace changes, which slows it down immensely.

package utility

import (
    "fmt"
    "crypto/md5"
)

// MD5 returns the hex-encoded MD5 sum of content.
func MD5(content []byte) string {
    sum := md5.Sum(content)
    return fmt.Sprintf("%x", sum)
}

That's the utility package I use to hash large swaths of content. Then here's the big chunk of code that is my recursive file diff, approximately 170 lines of code.

package main

import (
    "io/ioutil"
    "os"
    "fmt"
    "strings"
    "utility"
    "regexp"
    "time"
)

type Dir struct {
    Name string
    FullPath string
    BaseDir *Dir
    Subdirs []*Dir
    Files []string
    Level int
}

type FileResult struct {
    FullPath string
    Result bool
}

// matches runs of non-word characters (including all whitespace); these get stripped before hashing
var reg = regexp.MustCompile(`[\W]+`)

func readDirRecursive(base *Dir, types, ignore []string) {
    content, err := ioutil.ReadDir(base.FullPath)

    if err != nil {
        return
    }
    for _, f := range content {
        name := f.Name()
        if f.IsDir() {
            addDir := true
            for _, ign := range ignore {
                addDir = addDir && !strings.EqualFold(name, ign)
            }

            if addDir {
                sub := &Dir{ Name: name, BaseDir: base, FullPath: base.FullPath + `\` + name, Level: base.Level + 1}
                readDirRecursive(sub, types, ignore)
                base.Subdirs = append(base.Subdirs, sub)
            }
        } else {
            addFile := false
            for _, t := range types {
                addFile = addFile || strings.HasSuffix(name, t)
            }
            if addFile {
                base.Files = append(base.Files, name)
            }
        }
    }
}

func spaces(times int) string{
    return strings.Repeat(" ", times)
}

func printDir (level int, dir *Dir){
    fmt.Print(spaces(level) + dir.Name + "\n")
    for _, sd := range dir.Subdirs {
        printDir(level +1, sd)
    }

    for _, f := range dir.Files {
        fmt.Println(spaces(level) + "- " + f)
    }
}

func getContentMD5(file string) string {
    b, err := ioutil.ReadFile(file)
    if err != nil {
        fmt.Println(err)
        return "" // can't return nil for a string; empty means "unreadable"
    }

    // strip all non-word characters so whitespace-only changes hash the same
    s := reg.ReplaceAllString(string(b), "")
    return utility.MD5([]byte(s))
}

func compareFiles(file1, file2 string) bool {
    m1 := getContentMD5(file1)
    m2 := getContentMD5(file2)    
    return m1 == m2
}

func compinternal(dir1 *Dir, dir2 *Dir, results chan FileResult) {
    // compare files that exist (case-insensitively) on both sides
    for _, f := range dir1.Files {
        for _, f2 := range dir2.Files {
            if strings.EqualFold(f, f2) {
                result := compareFiles(dir1.FullPath + `\` + f, dir2.FullPath + `\` + f2)
                results <- FileResult{ FullPath: dir1.FullPath + `\` + f, Result: result }
                break
            }
        }
    }

    // recurse into subdirectories that exist on both sides
    for _, sd1 := range dir1.Subdirs {
        for _, sd2 := range dir2.Subdirs {
            if strings.EqualFold(sd1.Name, sd2.Name) {
                sdchan := make(chan FileResult)
                go func(){
                    compinternal(sd1, sd2, sdchan)
                    close(sdchan)
                }()

                // drain this subdirectory's results before moving to the next pair
                for sdresult := range sdchan {
                    results <- sdresult
                }

                break
            }
        }
    }
}
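As an aside, the goroutine-per-subdirectory above doesn't buy much: each sdchan is fully drained before moving on to the next pair, so the walk is effectively sequential, which probably has something to do with that 9-second floor. Here's a sketch of a variant that actually fans out, one goroutine per matched subdirectory pair, all feeding the shared results channel (this would need "sync" added to the imports, and I haven't verified it's actually faster):

func compParallel(dir1, dir2 *Dir, results chan FileResult) {
    var wg sync.WaitGroup
    var walk func(d1, d2 *Dir)
    walk = func(d1, d2 *Dir) {
        defer wg.Done()
        for _, f := range d1.Files {
            for _, f2 := range d2.Files {
                if strings.EqualFold(f, f2) {
                    result := compareFiles(d1.FullPath+`\`+f, d2.FullPath+`\`+f2)
                    results <- FileResult{FullPath: d1.FullPath + `\` + f, Result: result}
                    break
                }
            }
        }
        for _, s1 := range d1.Subdirs {
            for _, s2 := range d2.Subdirs {
                if strings.EqualFold(s1.Name, s2.Name) {
                    wg.Add(1)
                    go walk(s1, s2) // fan out; all goroutines send on the shared channel
                    break
                }
            }
        }
    }
    wg.Add(1)
    walk(dir1, dir2)
    wg.Wait() // the caller still closes the results channel after this returns
}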

func rcomp(dir1, dir2 string, filetypes []string, ignore []string) []string {
    diffs := []string{}

    left := &Dir{ Name: "Root Left", FullPath: dir1, Level: 0, BaseDir: nil }
    right := &Dir{ Name: "Root Right", FullPath: dir2, Level: 0, BaseDir: nil }

    readDirRecursive(left, filetypes, ignore)
    readDirRecursive(right, filetypes, ignore)

    resultChannel := make(chan FileResult)
    go func (){ 
        compinternal(left, right, resultChannel)
        close(resultChannel)
    }()

    for result := range resultChannel {
        if !result.Result {
            diffs = append(diffs, result.FullPath)
        }
    }

    return diffs
}

func main() {
    args := os.Args[1:]

    if len(args) < 4 {
        fmt.Println("need left and right directories, file types to include, folder names to ignore")
        return
    }

    start := time.Now().Unix()

    types := strings.Split(args[2], ";")
    ignore := strings.Split(args[3], ";")

    fmt.Println("Grabbing files " + strings.Join(types, ", "))
    fmt.Println("Ignoring folders " + strings.Join(ignore, ", "))

    diffs := rcomp(args[0], args[1], types, ignore)
    for _, diff := range diffs {
        fmt.Println(diff)
    }

    end := time.Now().Unix()

    total := end - start

    fmt.Println(total, "seconds taken")
}
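A nit for future me: time.Now().Unix() only has whole-second resolution, so "9 seconds" is fuzzy. time.Since gives sub-second precision with less code; a minimal sketch:

package main

import (
    "fmt"
    "time"
)

func main() {
    start := time.Now()
    time.Sleep(1500 * time.Millisecond) // stand-in for the comparison work
    fmt.Println(time.Since(start), "taken") // e.g. "1.5s taken"
}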

[Screenshot: running rdiff]

I later added a counter to see how many files it's actually comparing. In the project I wrote this app for, the count of included files (.js, .cs, etc.) was 2,794.

[Screenshot: compared file count]

I could stand to clean up the output a bit, but it helped me identify a few files that were out of date with the branch. Thanks, svn. And thanks, Go!


More Go MongoDB Testing Code

This is generally moving toward how I would structure a final product written with MongoDB as the backend. It's big, so I'm going to play with new ways of putting code on my site. Here it is on GitHub.

Seriously, Kindle Reader for PC

This is an example of how Kindle for PC renders code snippets in the Go book I'm reading. It's seriously detrimental to my learning:

func (count *Count) Increment() { *count++ } ???

Luckily it works fine on my Nexus 9.

Go and MongoDB Initial Test

This was so easy. Going against the database that runs this site:

package main

import (
    "fmt"
    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

type Post struct {
    Body  string
    Title string `bson:"title"`
    Tags  []string
}

func main() {
    // errors ignored here since this is just a quick connectivity test
    session, _ := mgo.Dial("localhost")
    defer session.Close()

    db := session.DB("jtccom")
    posts := db.C("posts")

    var first Post
    posts.Find(bson.M{}).One(&first)

    fmt.Println(first.Title)
}

Here we go.
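For a taste of the rest of the query API, filters and sorting chain right off Find. Continuing with the posts collection above (and assuming the documents have date and tags fields, which is a guess on the schema):

// find the five most recent posts tagged "go"
var recent []Post
if err := posts.Find(bson.M{"tags": "go"}).Sort("-date").Limit(5).All(&recent); err == nil {
    for _, p := range recent {
        fmt.Println(p.Title)
    }
}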

Go Programming Language and Kindle Unlimited

Always learning. I look at Google's Go programming language, and at first, with its new syntax, there are a few things foreign to me; I take this as a challenge that I want to overcome. I will know this language. Eventually. I have aspirations of doing everything I've done in Node.js in it. As Perl started it all, Classic ASP replaced Perl, Java replaced Classic ASP, and Node.js replaced Java (in my "side language" progression: languages I've learned that haven't earned me a dime, other than by growing and stretching my brain to think differently), possibly Go will eventually replace Node.js. It seems more "grown up" to me. Back when I was in college I learned C++ in those early days, and I haven't had a difficult-to-comprehend language to deal with since. Ideally that's because languages have been getting simpler: 4GLs, more abstraction around references and memory management, multi-threading is cake (a cake with a shotgun hidden inside).

Go seems neat. There are these things that arise when starting to learn a new language, though, which you forget about because they're initial growing pains, like setting up the environment to work best with the new language. I now have everything set up for Node development so well that I don't have to think about it; I just get in there and start hacking. And you immediately forget what it took to get to that point, so when you get to a new language you think, man, this is much worse than what Node was, is it worth it? The answer is that it's not worse; I just forgot about the pain and don't have to worry about it anymore.

Node was less of a leap, as well, since the language wasn't new: no new syntax, and I've been doing Javascript for a decade... (although that doesn't necessarily hold true when learning Javascript APIs like Angular). Go, however, is a bit of a leap: new syntax, a new way of thinking, a different environment setup. These are my favorite challenges. So I started looking on the web for how to do Go. In all honesty, I'm still lost :) I have my Go workspace set up with the bin, pkg, and src directories, and the GOPATH environment variable to tell Go where it is. I just need to inject as much knowledge as I can when it comes to programming Go. Which brings me to Kindle Unlimited.

Kindle Unlimited. So much promise. Being bright-eyed and wanting to learn Go, I wanted to see if any of the books out there were available on Amazon Kindle. To my delight, they were. And I had received emails from Amazon announcing Kindle Unlimited over the course of the past few weeks. So I wanted to buy the ticket and take the ride. They offered a free trial month too, which made this a zero-pain investment. There are plenty of good Go books out there available on Kindle; I could read them for six months and still get more value out of Kindle Unlimited than what I was paying.

So I signed up. Then I searched for the book I was looking to read, "Programming in Go: Creating Applications for the 21st Century", which seemed like a good start. I didn't set out to create apps for the 20th or 19th centuries, and there are 85 years left in the 21st, so this seemed on par with where I wanted to go. Strangely, there was no "send to my Kindle" button, even though I was now signed up for "Unlimited", meaning not having a limit, and I would like to read this book. So I opened the Kindle app on my phone and noticed there's a category for Kindle Unlimited: certain books are picked to be in the Unlimited category, so only some books (700,000) are available as unlimited, and the book I wanted to read is not one of them. So I looked at what Go books were available on Kindle Unlimited. There were some, but I really need the beefy 400+ pages offered by the book I mentioned earlier in this paragraph.

I cancelled my Kindle Unlimited subscription after 8 minutes.