Habits

February 24, 2012
My variable/class/filename/etc. naming habits have changed recently; I'd say in the past year. Previously, I'd name CSS classes, HTML element IDs, etc. in camel case, but I've moved towards hyphen-delimited names.

Old way: someObjectNameOrID

New way: some-object-name-or-id

This goes for file names as well. It's just one of those transitions where I'm not sure how I feel about it, and I definitely haven't taken the time to think through the possible repercussions. I'm just going with the flow. If I could name C# classes that way, I probably would.
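For what it's worth, the old style converts to the new one mechanically. A small sketch with a hypothetical helper name; it treats a run of capitals such as "ID" as a single word.

```javascript
// Convert a camelCase name to the hyphen-delimited form.
// A run of capitals (like "ID" or "HTML") stays together as one word.
function toHyphenDelimited(name) {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2")   // "someObject" -> "some-Object"
    .replace(/([A-Z])([A-Z][a-z])/g, "$1-$2") // "HTMLParser" -> "HTML-Parser"
    .toLowerCase();
}

console.log(toHyphenDelimited("someObjectNameOrID")); // some-object-name-or-id
```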

The Power of Runtime File Combining

February 23, 2012
This can apply to any language and architecture, but I've been applying it in my Node.js programming. The idea of combining is to keep all of your resources in separate files for development, but combine them at runtime: the browser doesn't care about your structure, and fewer files means fewer requests, so the page loads faster. I did this with JavaScript at first, but was later faced with a huge CSS file and decided to take the same approach.

So you have a tag or control, or whatever your dev environment provides, to which you pass all of the files you would like to combine. On the server, if the combined file doesn't exist, you create it by writing the contents of all of the files into that single file. Then you write out the HTML tag that points to the combined file on the server, using the right tag for the content type (a script tag for JS, a link tag for CSS, etc.).

The "do it at runtime" version of this method takes the same list of files, but checks the modified date of each one and compares it to the modified date of the combined file. If any of the files is newer, you overwrite the combined file. Then you write out the tag with the combined file's modified date, in milliseconds, appended to the query string! Because the URL changes whenever the file does, browsers that cached an older copy will fetch the new one. It might look like this:

<link rel="stylesheet" type="text/css" href="/css/combine/style.css?t=347483929" />

I'm typing on my iPad; otherwise I'd have code samples and maybe correct HTML syntax, if that's not correct... I don't even know :)
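A minimal sketch of the runtime version in Node.js. The `combine(files, outFile, urlPath)` helper is hypothetical (it stands in for whatever tag or control your environment provides), and it uses synchronous fs calls for simplicity even though the check runs on every render. It emits a link tag for CSS and a script tag for anything else.

```javascript
const fs = require("fs");
const path = require("path");

// Rebuild `outFile` from `files` if it is missing or older than any source,
// then return a tag whose URL carries the combined file's mtime in ms.
function combine(files, outFile, urlPath) {
  const newestSource = Math.max(...files.map((f) => fs.statSync(f).mtimeMs));
  let stale = true;
  if (fs.existsSync(outFile)) {
    stale = fs.statSync(outFile).mtimeMs < newestSource;
  }
  if (stale) {
    // Newline between files so one file's last line can't run into the next.
    const contents = files.map((f) => fs.readFileSync(f, "utf8")).join("\n");
    fs.writeFileSync(outFile, contents, "utf8");
  }
  const url = urlPath + "?t=" + Math.floor(fs.statSync(outFile).mtimeMs);
  if (path.extname(outFile) === ".css") {
    return '<link rel="stylesheet" type="text/css" href="' + url + '" />';
  }
  return '<script type="text/javascript" src="' + url + '"></script>';
}
```

The query-string timestamp is what makes long cache lifetimes safe: the URL only changes when the combined file is rewritten.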

JavaScript parseInt gotcha...

August 4, 2011

I was parsing an ISO 8601 date string like this one: 2011-08-04T11:23:21.345Z. After getting back December as the month for August, I was mildly perplexed. Here was the issue:

From the Mozilla docs for parseInt: If the input string begins with "0", radix is eight (octal). This feature is non-standard, and some implementations deliberately do not support it (instead using the radix 10). For this reason always specify a radix when using parseInt.

Always specify a radix!!
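A minimal sketch of parsing the timestamp above with radix 10 everywhere (the function name is hypothetical, and it only accepts the exact YYYY-MM-DDTHH:MM:SS(.mmm)Z shape). As for December: if the old octal rule turned "08" into 0 and the code then subtracted 1 for JavaScript's zero-based months, the result is -1, which Date rolls back to December of the previous year. ES5 engines no longer treat a leading zero as octal, but passing the radix costs nothing.

```javascript
// Split an ISO 8601 UTC timestamp into parts, always using radix 10
// so "08" can never be read as octal.
function parseIsoUtc(str) {
  const m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{3}))?Z$/.exec(str);
  if (!m) throw new Error("Unrecognized date: " + str);
  const n = (s) => parseInt(s || "0", 10);
  // JavaScript months are zero-based, so subtract 1 from the parsed month.
  return new Date(Date.UTC(n(m[1]), n(m[2]) - 1, n(m[3]), n(m[4]), n(m[5]), n(m[6]), n(m[7])));
}

console.log(parseIsoUtc("2011-08-04T11:23:21.345Z").getUTCMonth()); // 7 (August)
```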


Node.js Gzipping

July 28, 2011

Yesterday I lied when I said it would be the last Node.js post for a while! Oh well.

So today I was looking to make my project site a little faster, particularly on the mobile side. Actually, this was the last three days' worth of trying to figure stuff out. Node.js has plenty of compression library add-ons (modules), but the most standard compression tool out there is gzip (and gunzip). In the Accept-Encoding request header, the browser tells you whether or not it can handle it. Most can...

This seemed like an obvious mechanism for decreasing page load times... not that the site is so slow, but when the traffic gets up there and the site starts bogging down, at least the network will be less of a bottleneck. Some browsers do not support it, so you always have to send uncompressed content in those cases.

So I found a good Node.js compression module that supported BZ2 as well as gzip. The problem was, it was only meant to work with npm (Node's package manager), which, for whatever reason, I've stayed away from. I like to keep my modules organized myself, I guess! So I pull the source from GitHub, build the package if it requires it, and then make sure I can use it by just calling require("package-name"). That has worked in every case except the first gzip library I found... doh! Luckily, GitHub is a very social place, and lots of developers will just fork a project and fix it. That was where the magic started. I found a fork of node-compress that fixed these issues, built it by just calling ./build.sh (which calls node-waf, which I'm fine with using!), and copied the binary to the correct location within the module directory. So all I had to do was change my code to require("node-compress/lib/compress"). I'm fine with that too.

Code - gzip.js

// gzip.js: gzip response bodies for clients that accept it
var compress = require("node-compress/lib/compress");

// Extensions worth compressing; "" covers extensionless page URLs.
var encodeTypes = {"js":1, "css":1, "html":1};

function acceptsGzip(req){
    var url = req.url.split("?")[0]; // ignore the query string when looking for an extension
    var ext = url.indexOf(".") == -1 ? "" : url.substring(url.lastIndexOf(".") + 1);
    var accept = req.headers["accept-encoding"];
    return (ext in encodeTypes || ext == "") && accept != null && accept.indexOf("gzip") != -1;
}

// Encode a string as a Buffer in the given encoding.
function createBuffer(str, enc){
    return Buffer.from(str, enc || "utf8");
}

this.gzipData = function(req, res, data, callback){
    if (data != null && acceptsGzip(req)){
        var gzip = new compress.Gzip();
        var buf = Buffer.isBuffer(data) ? data : createBuffer(data, "utf8");
        gzip.write(buf, function(err, data1){
            if (err) return callback(data); // fall back to the uncompressed body
            var encoded = data1.toString("binary");
            gzip.close(function(err, data2){
                if (err) return callback(data);
                encoded = encoded + data2.toString("binary");
                callback(encoded, "binary", { "Content-Encoding": "gzip" });
            });
        });
    }
    else callback(data);
};

So it's awesome. Here's a picture of the code working on this page:
[Image: gzipped]

Quite the improvement, at less than 30% of the size! Soon I'm going to work in a static file handler, so that it doesn't have to re-gzip JS and CSS files on every request. I use caching extensively, so it won't re-gzip a file for you 10 times in a row, but it will gzip it once for each of 10 different users... I can see that being a problem in the long run, although it's still fast as a mofo!
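The "gzip each static file once" idea could look roughly like this. It is a sketch, not the node-compress setup from this post: it swaps in the zlib module that later Node releases ship built in (`zlib.gzip`), and the helper names (`acceptsGzip`, `gzipCached`, `send`) and cache key are hypothetical. The Accept-Encoding check is deliberately simple and ignores q-values such as `gzip;q=0`, and the cache is an unbounded Map, so keys should include something like the file's modification time.

```javascript
const zlib = require("zlib");

// Compressed bodies keyed by a caller-chosen string (e.g. file path + mtime),
// so a static file is gzipped once rather than once per user.
const gzipCache = new Map();

// True if the request's Accept-Encoding header mentions gzip.
function acceptsGzip(req) {
  const header = req.headers["accept-encoding"] || "";
  return /\bgzip\b/i.test(header);
}

// Gzip `body` (string or Buffer), reusing the cached result for `key` if present.
function gzipCached(key, body, callback) {
  if (gzipCache.has(key)) {
    return process.nextTick(callback, null, gzipCache.get(key));
  }
  zlib.gzip(body, (err, compressed) => {
    if (err) return callback(err);
    gzipCache.set(key, compressed);
    callback(null, compressed);
  });
}

// Send gzip when the client accepts it, plain content otherwise.
function send(req, res, key, body, contentType) {
  // Vary tells intermediate caches to keep gzipped and plain copies apart.
  const headers = { "Content-Type": contentType, "Vary": "Accept-Encoding" };
  if (!acceptsGzip(req)) {
    res.writeHead(200, headers);
    return res.end(body);
  }
  gzipCached(key, body, (err, compressed) => {
    if (err) {
      res.writeHead(200, headers);
      return res.end(body);
    }
    headers["Content-Encoding"] = "gzip";
    res.writeHead(200, headers);
    res.end(compressed);
  });
}
```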


Thread Safety in Node.js

July 26, 2011

Probably my last post about Node.js for a while. My original implementation of the webserver used page objects in the following way:

index.html -> /site/pages/index.js

Meaning that when index.html was requested, index.js was located and executed. Because I loaded it with Node.js's require, a side effect was that each page module was only loaded once: require caches the module, so every request shares the same object. That was bad, because I had my code structured like this:

// index.js
this.title = "My page title";
this.load = function(..., finishedCallback){
    this.title = figureOutDynamicTitle(parameters);
    finishedCallback();
}

Granted, in this example there was only a very small window in which someone going to index.html might get the wrong title. But for things like setting an array on the page to a value loaded from the database, or cases where another branch of the load function never sets the array at all, there's a very good chance that someone will see something only someone else was supposed to see.

How I fixed this was to pass the page object back to the finishedCallback. The page object is declared, built and passed back all within a single call to load, so it never has a chance to be wrong! This is how it looks now:

// index.js
this.load = function(..., finishedCallback){
    var page = { title: figureOutDynamicTitle(parameters) };
    finishedCallback({ page: page });
}

This works. And it's super fast still.
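A small simulation of the difference, with hypothetical user names and a setTimeout standing in for the database call. Both "requests" start before either finishes; with the shared module-level object, alice's callback reads the title that bob's request just set, while building the page object inside load keeps them apart.

```javascript
// Shared state (the original pattern): one object serves every request.
const sharedPage = {
  title: "My page title",
  load(user, delayMs, done) {
    this.title = "Welcome, " + user;
    // Simulated async work before the page renders.
    setTimeout(() => done(this.title), delayMs);
  },
};

// Per-request state (the fix): build the page object inside load and pass it back.
const isolatedPage = {
  load(user, delayMs, done) {
    const page = { title: "Welcome, " + user };
    setTimeout(() => done({ page }), delayMs);
  },
};
```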


Node.js Process

July 25, 2011

The Node.js process object is very helpful for cleanup purposes. As you can imagine, when writing a rudimentary web server, you might also have a mechanism for tracking sessions. This was definitely the case for me, so we can easily keep track of who's logged in without exposing much in the way of security.

Since I haven't worked out a way to update my code without restarting the server, I have to restart it whenever the code changes, which in turn deletes all of the sessions. I was thinking of a way to handle this, knowing about Node.js's process object, but where to put the code wasn't immediately obvious to me. Once I started coding for this exact purpose, it was a shame I hadn't thought of it right away, if only because I cannot lie, and I'd love to be able to say "Yeah, I thought of it right away" :)

Here's the code:

// Save sessions to disk before exiting, so a restart doesn't log everyone out.
function closeServer(signal){
    console.log("Server closing, serializing sessions");
    SessionManager.serialize(config.web.global.sessionStore, function(){
        server.close();
        process.exit(0);
    });
}

process.on("SIGTERM", closeServer); // kill / killproc
process.on("SIGINT", closeServer);  // Ctrl+C in the terminal

Luckily, killproc sends SIGTERM, and pressing Ctrl+C sends SIGINT. I've noticed kill also sends SIGTERM by default, with SIGKILL being a very explicit option (and one a process can't catch anyway), so we only have to listen for SIGTERM and SIGINT.

sessionStore is the folder to store them in. It gets cleared out each time before saving what's in memory, so we don't end up with months-old sessions in there.

I just serialize them using JSON.stringify and deserialize with its counterpart, JSON.parse. It works beautifully.

The session ID is just a combination of the time plus some user variables, like the user agent, hashed and hex-encoded. Here's the code that serializes and deserializes:

SessionManager.prototype.serialize = function(path, callback){
    var self = this;
    fshelp.ensureFolder(path, function(err, exists){
        // Start from an empty folder so expired sessions don't pile up.
        fshelp.clearFolder(path, function(err, exists){
            for (var id in self.sessions){
                if (!self.isExpired(self.sessions[id]))
                    fs.writeFileSync(path + id, JSON.stringify(self.sessions[id]), "utf8");
            }
            delete self.sessions;
            callback();
        });
    });
}

SessionManager.prototype.deserialize = function(path, callback){
    var self = this;
    fshelp.ensureFolder(path, function(err, exists){
        fs.readdir(path, function(err, files){
            // Each file name is a session ID; its contents are the JSON session.
            files.forEach(function(d){
                var id = d.substring(d.lastIndexOf("/") + 1);
                self.sessions[id] = JSON.parse(fs.readFileSync(path + d, "utf8"));
            });
            callback();
        });
    });
}
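The ID scheme described above (the time plus user variables such as the user agent, hashed and hex-encoded) might look something like this with Node's crypto module. The function name and the hash algorithm (SHA-256) are assumptions, and the random bytes are an addition not described in the post, included because the time and user agent alone are fairly easy to guess.

```javascript
const crypto = require("crypto");

// Hypothetical session ID: hash the current time, the user agent and some
// random bytes, then hex-encode the digest (64 hex characters for SHA-256).
function createSessionId(userAgent) {
  return crypto
    .createHash("sha256")
    .update(String(Date.now()))
    .update(userAgent || "")
    .update(crypto.randomBytes(16))
    .digest("hex");
}
```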

Now no one gets logged out when I have to restart my server for any reason!

Since all of this happens at shutdown or startup, I figured I'd save some brain power and just use the synchronous operations for writing and reading the files.
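fshelp (with ensureFolder and clearFolder) is a helper module that isn't shown here, so this is a self-contained sketch of the same clear-then-write idea using only Node's fs and path modules. The function names are hypothetical, it assumes the store folder holds only session files, and `fs.mkdirSync(dir, { recursive: true })` requires a reasonably recent Node.

```javascript
const fs = require("fs");
const path = require("path");

// Clear `dir`, then write each unexpired session to a file named after its ID.
function serializeSessions(sessions, dir, isExpired) {
  fs.mkdirSync(dir, { recursive: true });
  for (const name of fs.readdirSync(dir)) {
    fs.unlinkSync(path.join(dir, name));
  }
  for (const id of Object.keys(sessions)) {
    if (!isExpired(sessions[id])) {
      fs.writeFileSync(path.join(dir, id), JSON.stringify(sessions[id]), "utf8");
    }
  }
}

// Read every file in `dir` back into a { id: session } object.
function deserializeSessions(dir) {
  fs.mkdirSync(dir, { recursive: true });
  const sessions = {};
  for (const name of fs.readdirSync(dir)) {
    sessions[name] = JSON.parse(fs.readFileSync(path.join(dir, name), "utf8"));
  }
  return sessions;
}
```

Using path.join also removes the need for the folder path to end in a slash.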


Can you break it?

June 23, 2011

<script type="text/javascript">
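// Keep <b>, <i>, <u> and <em> (opening or closing, any case) and strip every other tag:
// 1) turn allowed tags into _tag_ placeholders, 2) remove anything between < and >,
// 3) turn the placeholders back into tags. The "/" inside each pattern must be escaped.
function cleanseField(str){
    str = str.replace(/<(\/?)(b|i|u|em)>/ig,"_$1$2_").replace(/<.*?>/g,"").replace(/_(\/?)(b|i|u|em)_/ig,"<$1$2>");
    return str;
}

function doit(){
    var str = "<h3>Header</h3>Hello, my name is <eM>jason connell</Em>. <b>You</B> are awesome, like <i>italics</i><<scr"+"ipt>script type='text/javascript'>alert('sup');</sc"+"ript </scr"+"ipt>><u>underline</u>";

    var spn = document.getElementById("test");
    spn.innerHTML += "<h1>Original</h1>" + str;
    spn.innerHTML += "<h1>Cleansed</h1>" + cleanseField(str);
}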
</script>

<a href="javascript:void(0);" onclick="doit();">Doit</a>
<span id="test"></span>


I'm trying to cleanse HTML but keep certain elements for now. I've tested in Chrome and IE 8, but is there a string that will still produce a valid script tag? It's very important.

If only there were a way in HTML to specify "scripts are only valid inside this segment of the page". It's a shame that a script can execute anywhere.
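Here's one input that appears to slip through: a tag with no closing > is never matched by <.*?>, and a literal _b_ typed by the user is then turned into <b>, which supplies the missing >. A browser would parse the result as an img tag whose onerror handler runs. The sketch below (plain Node, no DOM) shows that, plus an escape-first alternative. This is a different technique from the one in the post, and the function name is hypothetical: escape every special character, then restore only the exact whitelisted tags, so no attributes can come back.

```javascript
// cleanseField from the post, with the "/" inside the patterns escaped.
function cleanseField(str) {
  return str
    .replace(/<(\/?)(b|i|u|em)>/ig, "_$1$2_")
    .replace(/<.*?>/g, "")
    .replace(/_(\/?)(b|i|u|em)_/ig, "<$1$2>");
}

// Escape-first alternative: encode everything, then turn back only the
// exact, attribute-free forms of the allowed tags.
function escapeThenAllow(str) {
  return str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/&lt;(\/?)(b|i|u|em)&gt;/ig, "<$1$2>");
}

const tricky = "<img src=x onerror=alert(1) _b_";
console.log(cleanseField(tricky));    // <img src=x onerror=alert(1) <b>
console.log(escapeThenAllow(tricky)); // &lt;img src=x onerror=alert(1) _b_
```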


Borrow the good, discard the bad

June 22, 2011

In my many years of web development, I've come across a lot of good ways that platform authors did things, and a lot of bad ways. So I'm writing my own web platform on Node.js, and I decided to keep the good stuff and get rid of what I didn't like. It wasn't easy, but I'm pretty much finished by now.

As with most things I develop, I'll decide on an architecture that allows for changes to be made in a way that makes sense, but I'll start with what I want the code to look like. Yes. When I wrote my ORM, I started with the simple line db.save(obj);. It turns out that's how you do it in MongoDB, so I didn't have to write an ORM for Mongo :) When starting a web platform, I started out the same way.

I wanted to write:

<list value="${page.someListVariable}" var="item">
Details for ${item.name}
<include value="/template/item-template.html" item="item" />
</list>


Obvious features here are separation of code and presentation, SSIs (server-side includes), and simple variable replacement with ${} syntax.

There aren't a lot of tags in my platform. There's an if, which you can use to decide whether to output something. There's an include, to which you can pass variables from the main page, so you can reuse it on many pages. This one takes an "item" object, which it will refer to in its own code as ${item}.
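The ${} replacement and the list tag could be sketched roughly like this. The engine itself isn't shown in the post, so `interpolate` and `renderList` are hypothetical names: values are looked up by dotted path, missing values render as empty strings, and nothing is HTML-escaped.

```javascript
// Replace each ${a.b.c} in `template` with the value at that dotted path in `scope`.
function interpolate(template, scope) {
  return template.replace(/\$\{([\w.]+)\}/g, (match, expr) => {
    let value = scope;
    for (const key of expr.split(".")) {
      if (value == null) break; // a missing link in the path renders as ""
      value = value[key];
    }
    return value == null ? "" : String(value);
  });
}

// A tiny version of <list value="..." var="item">: render `body` once per
// element, with the element bound to `varName`.
function renderList(items, varName, body, scope) {
  return items
    .map((item) => interpolate(body, Object.assign({}, scope, { [varName]: item })))
    .join("");
}
```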

Recently I added a layout concept, so you can keep your layout HTML in another file and just fill in its sections from the page's own HTML. For instance, you might reach the file index.html, which would look like this:

<layout name="main">
<content name="left-column">
<include value="/template/navigation.html" />
</content>
<content name="main-column">
<include value="/template/home-content.html" />
</content>
</layout>
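One way the layout step might work, assuming (hypothetically, since the layout file's syntax isn't shown) that the layout marks each slot with a `<placeholder name="..." />` tag: pull each `<content name="...">` block out of the page and drop it into the matching slot. The function names are hypothetical, and this regex approach doesn't handle nested content blocks.

```javascript
// Collect each <content name="x">...</content> block from a page, keyed by name.
function extractContent(pageHtml) {
  const blocks = {};
  const re = /<content name="([^"]+)">([\s\S]*?)<\/content>/g;
  let m;
  while ((m = re.exec(pageHtml)) !== null) {
    blocks[m[1]] = m[2].trim();
  }
  return blocks;
}

// Fill each hypothetical <placeholder name="x" /> in the layout with block x.
function applyLayout(layoutHtml, pageHtml) {
  const blocks = extractContent(pageHtml);
  return layoutHtml.replace(/<placeholder name="([^"]+)"\s*\/>/g, (match, name) => blocks[name] || "");
}
```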


JavaServer Faces used a two-way data binding mechanism, which was really helpful. But then you need controls, like input[type=text] or whatever. My pages will not have two-way data binding, but you can use plain HTML, which I like better. (However, those controls were very simple to swap thanks to Java's generous use of interfaces, and the documentation pretty much mandating their use. For example, your Java code would work with a ValueHolder rather than a concrete text box, so if you were to make it a "select" or an input[type=hidden], your Java code would not have to change. Having to change it is one thing I absolutely hate about ASP.NET.)

I borrow nothing from PHP.

ASP.NET does pretty much nothing that I like, other than making it easy to keep track of what code gets run when you go to /default.aspx: the code in /default.aspx.cs, plus whatever Page class it inherits from or master page it sits on. In JavaServer Faces you're scrounging through XML files to see which session bean got named "mybean".

My platform is similar to ASP.NET in that for /index.html there's a /site/pages/index.js (have I mentioned that it's built on Node.js?) that can optionally exist, and can implement one or two functions, "load" and "handlePost", if your page is so inclined to handle posts. Another option is to have this file exist, implement neither load nor handlePost, and just have properties in it. It's up to you.
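The URL-to-module convention and the load/handlePost choice could be wired up roughly like this. `pageModulePath` and `runPage` are hypothetical names, the handlePost signature is assumed to match load's `(site, query, finishedCallback)`, and the `..` check is an extra safeguard so a URL can't reach files outside the pages folder. A page with neither function just hands back an empty options object, and its properties are used as-is.

```javascript
const path = require("path");

// Map "/index.html" (or "/") to <rootDir>/site/pages/index.js.
function pageModulePath(rootDir, urlPath) {
  const clean = urlPath.split("?")[0].replace(/\.html$/, "");
  const name = clean === "/" || clean === "" ? "/index" : clean;
  const pagesDir = path.join(rootDir, "site", "pages");
  const file = path.join(pagesDir, name + ".js");
  // Refuse URLs like "/../secret.html" that would escape the pages folder.
  return file.startsWith(pagesDir + path.sep) ? file : null;
}

// Run whichever hook the page module implements.
function runPage(page, method, site, query, done) {
  if (method === "POST" && typeof page.handlePost === "function") {
    page.handlePost(site, query, done);
  } else if (typeof page.load === "function") {
    page.load(site, query, done);
  } else {
    done({}); // properties-only page
  }
}
```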

Here's a sample sitemap page for generating a Google Sitemap xml file:

Html:

<?xml version="1.0" encoding="UTF-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://${config.hostUrl}/index</loc>
<lastmod>2011-06-16</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>
<jsn:foreach value="${page.entries}" var="entry">
<url>
<loc>${entry.loc}</loc>
<lastmod>${entry.lastmod}</lastmod>
<changefreq>${entry.changefreq}</changefreq>
<priority>${entry.priority}</priority>
</url>
</jsn:foreach>
</urlset>


I use the jsn prefix, which just stands for (now, anyway) JavaScript Node. I wasn't creative. I guess I could call it "Jason's Site N..." I can't think of an N.

And the javascript:

var date = require("dates"), common = require("../common");

this.entries = [];

this.load = function(site, query, finishedCallback){
    var self = this;
    // The page module is loaded once and shared, so start each request with a
    // fresh list rather than appending to the previous request's entries.
    self.entries = [];
    var now = new Date();
    // Midnight at the start of yesterday; Date rolls day 0 back into the previous month.
    var yesterday = new Date(now.getFullYear(), now.getMonth(), now.getDate() - 1);
    var yesterdayFormat = date.formatDate("YYYY-MM-dd", yesterday);
    common.populateCities(site.db, function(states){
        for (var i = 0; i < states.length; i++){
            states[i].cities.forEach(function(city){
                var entry = {
                    loc: "http://" + site.hostUrl + "/metro/" + city.state.toLowerCase() + "/" + city.key,
                    lastmod: yesterdayFormat,
                    changefreq: "daily",
                    priority: "1"
                };
                self.entries.push(entry);
            });
        }
        finishedCallback({contentType: "text/xml"});
    });
}
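Without the dates module, yesterday's date in the YYYY-MM-DD form used for lastmod can be built from plain Date methods. A sketch with hypothetical helper names; like the sitemap code, it works in local time.

```javascript
// Format a Date as YYYY-MM-DD using local time.
function formatYmd(d) {
  const pad = (n) => String(n).padStart(2, "0");
  return d.getFullYear() + "-" + pad(d.getMonth() + 1) + "-" + pad(d.getDate());
}

// Midnight of the day before `now`; Date handles month and year rollover.
function yesterdayOf(now) {
  return new Date(now.getFullYear(), now.getMonth(), now.getDate() - 1);
}

console.log(formatYmd(yesterdayOf(new Date(2011, 5, 16)))); // 2011-06-15
```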


My finishedCallback function can take more options. For handling a JSON request, say, I could pass {contentType: "application/json", content: JSON.stringify(obj)}.

That's about all there is to it! It's pretty easy to work with so far :) My site will launch soon!


The Non-Blocking Nature of Node.js

June 9, 2011

This can lead to some pretty sweet code. For one thing, always add a callback as a parameter to the functions you create, to stay consistent with the non-blocking style. The next thing you need to know is that you will back yourself into a corner!

Take the following code:
collection.find(search, {sort: sort}, function(err, cursor){
    cursor.toArray(function(err, messages){
        for (var i = 0; i < messages.length; i++){
            db.dereference(messages[i].from, function(err, result){
                messages[i].from_deref = result;
            });
        }
        callback(messages);
    });
});


Backstory: I'm using MongoDB as the backend. I have a message collection and a user collection, and messages have a "from" property that is a DBRef to a user.

You would run this code and find that if you had any messages at all, you would probably get a missing (undefined) "from_deref" property, which means the callback at the end was called before processing finished. That is, if you're lucky enough not to get an error saying the code "can't set the property from_deref of undefined". That one happens because the loop has already finished by the time the db.dereference callbacks run, so i equals messages.length and messages[i] is undefined. If it's not obvious, I'm dereferencing the user's DBRef and storing it in the message's from_deref property.

This is because of the non-blocking nature of Node.js. It's interesting because it makes me think in new ways. Anything that makes you think differently is good in my opinion. So how do we accomplish this and not break anything? Consider the following code as a solution:

collection.find(search, {sort: sort}, function(err, cursor){
    cursor.toArray(function(err, messages){
        // Dereferences can finish in any order, so count how many are still pending.
        var remaining = messages.length;
        if (remaining == 0) return callback(messages);
        for (var i = 0; i < messages.length; i++){
            (function(messages, index){
                db.dereference(messages[index].from, function(err, result){
                    messages[index].from_deref = result;

                    // Only the last callback to finish sees zero.
                    if (--remaining == 0)
                        callback(messages);
                });
            })(messages, i);
        }
    });
});


JavaScript is awesome. This is basically an anonymous function that I define and call in the same block. The definition is everything inside (function(x,y){}) and the call is in the parentheses that follow: (messages, i); So this calls the inner block with the value of i that I'm hoping it will (or rather than hoping, I'm confident it will!). As for knowing when everything is done: the dereferences can come back in any order, so waiting for the last index isn't enough. Instead, remaining starts at the number of messages and each callback counts it down, so only the callback that brings it to zero calls callback(messages), once every message has its from_deref.

Of course, this doesn't take advantage of the nextObject function on the cursor object in the node-mongodb-native library. Walking the cursor one message at a time would solve this without any JavaScript magic:

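var messages = [];
(function next(){
    cursor.nextObject(function(err, message){
        if (message == null) return callback(messages); // no more documents
        db.dereference(message.from, function(err, result){
            message.from_deref = result;
            messages.push(message);
            next(); // fetch the next message only after this one is dereferenced
        });
    });
})();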


However, I like the Array...

So there you have it.
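Later versions of Node have Promises built in, and with them the same fan-out and fan-in can be written with Promise.all, which also keeps the results in the original order. In this sketch `dereference` is a hypothetical stand-in that fakes db.dereference with timers (so callbacks finish out of order); it is not the node-mongodb-native API, and `withSenders` is likewise a hypothetical name.

```javascript
// Hypothetical stand-in for db.dereference: resolves a fake DBRef after ref.delay ms.
function dereference(ref, callback) {
  setTimeout(() => callback(null, { _id: ref.id, name: "user" + ref.id }), ref.delay);
}

// Wrap the callback-style call in a Promise.
function dereferenceAsync(ref) {
  return new Promise((resolve, reject) => {
    dereference(ref, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// Dereference every message's "from" in parallel, then call back exactly once.
function withSenders(messages, callback) {
  Promise.all(messages.map((m) => dereferenceAsync(m.from)))
    .then((users) => {
      users.forEach((user, i) => { messages[i].from_deref = user; });
      return messages;
    })
    .then((result) => callback(null, result), (err) => callback(err));
}
```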


Task Notification with Google Talk via XMPP

May 25, 2011

I wrote a post for work about using XMPP to send task notifications through Google Talk.
