Felix Crux

Technology & Miscellanea



Doug Crockford Talk on JavaScript

Doug Crockford gave a talk at the University of Waterloo last night as part of the Yahoo! Hack-U (University Hack Day), on the same topic as his new book, JavaScript: The Good Parts. I was lucky enough to get a seat, and have tried to condense my nine pages of notes into an overview of the highlights of his talk. Though I've rewritten parts, all of the content and wisdom is Doug's -- I'm just the humble scribe. Of course, any errors of transcription are mine.

Introduction

Believe it or not, JavaScript has good parts! It's an odd language, because it contains some of the best and some of the worst ideas in programming language design, and has managed to become both the most popular and most reviled programming language out there.

Of all languages, JavaScript probably has the broadest range of skills among its users. It appeals to both computer scientists and cut'n'paste beginners with no clear idea of what they are doing. It's pretty much the only language that people use without ever learning! That's both the cause of a lot of the awful code out there, and an astonishing testament to the fact that it's actually possible to do that.

JavaScript is not what makes in-browser programming awful -- it's the DOM. Any language would be painful if you had to use it to interact with the DOM. It's also what makes things slow and inefficient.

The history of the language is incredibly diverse: it's been influenced by Scheme (lambdas, loose typing), Self (prototypal inheritance, dynamic objects), Java (syntax), and Perl (regular expressions).

The Bad Parts

Global variables: This is bad for all the same reasons as in other languages, but in addition, JavaScript will implicitly declare your typo-ed identifiers, and silently carry on.

+ both adds and concatenates: You can get away with this in Java because of the type restrictions, but in JavaScript you'll blithely try to add a number to, say, a form input which looks like a number but is actually a string.

Semicolon insertion: This seems like a nice, beginner-friendly feature, at first. It's implemented by the parser running along until it hits a syntax error, at which point it rewinds a little, inserts a semicolon in a likely place, and tries again. This should scare you.

typeof: What's the typeof an object? 'object'. Of an array? 'object'. Of null? 'object'.

with and eval: Security implications are bad. eval is probably the single worst misused feature. If you find that you want to use it, step away from the computer.

Fake arrays: They're actually hash tables, which is OK if that's what you want, and if that's what you call them.

== and != do type coercion. Unfortunately nobody can figure out exactly when or how. For example, 0 == '0' is true, but false == 'false' is false, and '' == '0' is false, but 0 == '' is true. Thankfully, you can just always use ===.

false, null, undefined, NaN: These are all almost the same, but not quite.
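Several of the coercion pitfalls in the list above can be reproduced in a few lines. This is only an illustrative sketch, not code from the talk: fieldValue is a hypothetical stand-in for the text read from a form input, and Array.isArray is assumed to be available (it was standardized in ECMAScript 5).

```javascript
// Hypothetical form input: looks like a number, but is a string.
var fieldValue = '5';

// + concatenates as soon as either operand is a string.
var concatenated = fieldValue + 3;
// Converting explicitly first gives numeric addition.
var added = Number(fieldValue) + 3;

// typeof reports 'object' for plain objects, arrays, and null alike.
var kinds = [typeof {}, typeof [], typeof null];
// Array.isArray distinguishes arrays from other objects.
var arrayCheck = [Array.isArray([]), Array.isArray({})];

// == coerces its operands; these are the four comparisons from the list.
var loose = [0 == '0', false == 'false', '' == '0', 0 == ''];
// === never coerces, so values of different types are never equal.
var strict = [0 === '0', 0 === ''];

// NaN is not equal to anything, including itself.
var nanEqualsItself = (NaN === NaN);

console.log(concatenated, added, kinds, arrayCheck, loose, strict, nanEqualsItself);
```

Converting input explicitly and comparing with === sidesteps most of these surprises.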

Bad Genetics

There is also a good deal of bad behaviour that's inherited and shared with other languages: block-less statements (e.g. one-line if and so on); expression statements (lone expressions on a line that will be evaluated and discarded); IEEE floating point (0.1 + 0.2 != 0.3); ++ and -- (leads developers into clever behaviour); and fall-through of switch blocks.
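The floating point complaint is easy to check for yourself. The sketch below is my own illustration rather than anything shown in the talk; nearlyEqual and its tolerance argument are hypothetical names, and the tolerance of 1e-9 is an arbitrary choice.

```javascript
// Binary floating point cannot represent 0.1 or 0.2 exactly,
// so their sum is not exactly 0.3.
var sum = 0.1 + 0.2;
var exactlyEqual = (sum === 0.3);

// One common workaround: compare within a tolerance.
function nearlyEqual(a, b, tolerance) {
  return Math.abs(a - b) <= tolerance;
}
var closeEnough = nearlyEqual(sum, 0.3, 1e-9);

// Another: do money-style arithmetic in whole units (cents here),
// which small integers represent exactly.
var cents = 10 + 20;
var centsMatch = (cents === 30);

console.log(exactlyEqual, closeEnough, centsMatch);
```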

Doug had an amusing anecdote about an episode in the development of JSLint: A user contacted him suggesting that fall-through of cases be flagged as bad behaviour. Doug replied with an explanation of the elegance of nicely structured, intentional fall-through, which convinced the user to retract his feature request. In the user's response, in addition to withdrawing the feature request, he reported another bug. When Doug investigated it, it turned out to be... yup, unwanted switch fall-through. At that moment, Doug says, I was enlightened.

Good Parts

JavaScript was the first really mainstream language with lambda and first-class functions, which other languages are now adopting. This makes JS an influential language!

Dynamic objects are simple containers that can grow or shrink, and since they are based on prototypal inheritance, they aren't limited to just being instances of a class. This is a strictly more powerful object model, but it takes some getting used to for most people.
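To make that concrete, here is a minimal sketch of objects inheriting directly from other objects. The animal and dog objects are made up for illustration, and I'm assuming Object.create, which ECMAScript 5 standardized, is available.

```javascript
// A plain object serves as the prototype; no class is involved.
var animal = {
  describe: function () {
    return this.name + ' says ' + this.sound;
  }
};

// Object.create makes a new object that delegates to animal.
var dog = Object.create(animal);
dog.name = 'Rex';
dog.sound = 'woof';

// Objects can grow and shrink at run time.
dog.tricks = ['sit'];
delete dog.tricks;

// describe is found on the prototype, but this refers to dog.
var description = dog.describe();
console.log(description);
```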

Loose typing is one of the controversial parts, which some people would consider one of the bad things. However, Doug's conclusion is that the added expressiveness and ease of use is well worth it, since the kind of bugs avoided by strict typing are usually easy to fix anyway.

Gotchas

Globals

Consider the code:


var names = ['zero', 'one', 'two', ...];
var digit_name = function (n) {
  return names[n];
};

Though it works, it makes use of the nasty, global, names, which could lead to all kinds of nonsense. We could move the variable into the scope of the function instead, but that would be rather inefficient. Instead, try this:


var digit_name = (function() {
  var names = ['zero', 'one', 'two', ...];
  return function (n) {
    return names[n];
  };
})();

This is an example of one of the good parts: closures. We can define the variable just once, and then have our function close over it, preserving state. The trailing () at the very end causes the anonymous function to be executed right away, binding the returned function to our variable. This is awesome.
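The same pattern works for state that changes between calls. The following is a small sketch of my own along those lines; next_id and count are hypothetical names, not something from the talk.

```javascript
var next_id = (function () {
  var count = 0;          // private: only the inner function can see it
  return function () {
    count += 1;           // the state survives between calls
    return count;
  };
})();

var first = next_id();
var second = next_id();

// No global named count was created along the way.
var leaked = (typeof count !== 'undefined');

console.log(first, second, leaked);
```

Each call sees the value left behind by the previous one, yet nothing outside the closure can read or reset it.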

Style Isn't Subjective

Brace positioning is more or less a holy war without any right answer -- except in JavaScript, where same-line braces are right and you should always use them. Here's why:


return
{
  ok: false
};


return {
  ok: true
};

What's the difference between these two snippets? Well, in the first one, you silently get something completely different than what you wanted. The lone return gets mangled by the semicolon insertion process (remember that from the list of Bad Parts?), becomes return; and returns nothing. The rest of the code becomes a plain old block statement, with ok: becoming a label (of all things)! Having a label there might make sense in C, where you can goto, but in JavaScript, it makes no sense in this context. And what happens to false? It becomes one of those expression statements mentioned in the Bad Parts: it gets evaluated and completely ignored. Finally, the trailing semicolon -- what about that? Do we at least get a syntax error there? Nope: empty statement, like in C.
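If you'd like to see this for yourself, the two styles can be wrapped in functions and run side by side. The function names here are hypothetical, chosen only for this sketch.

```javascript
function brace_on_next_line() {
  // A semicolon is inserted right after return, so nothing is returned.
  return
  {
    ok: false
  };
}

function brace_on_same_line() {
  // The brace follows return on the same line, so this is an object literal.
  return {
    ok: true
  };
}

var broken = brace_on_next_line();
var working = brace_on_same_line();

console.log(broken, working);
```

Neither function produces a syntax error, which is exactly what makes the first one dangerous.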

Use same-line braces, folks.

JSLint

JSLint defines a professional subset of JavaScript, and imposes programming discipline. You should do everything it tells you, even if it hurts your feelings. Doug says JSLint is smarter about JavaScript than I am, and probably smarter than you are too.

History and Future

AJAX and the resurgence in popularity of JavaScript could have happened way earlier, but Netscape 4 and the other browsers of the time were so awful. Netscape 4 was a crime against humanity. IE 6 was the best browser in the world -- and think of just how bad it is.

However, all that may have been good for JavaScript: had anything happened, it would have been thrown out and replaced with something much better! JavaScript would have died with Netscape if not for Microsoft diligently duplicating it, bugs and all.

Perhaps the very best part of JavaScript: stability! No new design errors since 1999! Also, no new versions.

Thankfully, ECMAScript Fifth Edition is in the works (and is actually readable), with nice features like support for object hardening and a strict mode (invoked with "use strict";, which is an expression statement under older versions).
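As a rough sketch of what those two features look like in practice, here is an example written against the names the final Fifth Edition standard settled on (Object.freeze for hardening, and the "use strict" directive). The strict_demo function, the config object and the misspelled totl identifier are all made up for illustration.

```javascript
function strict_demo() {
  'use strict';
  var outcome = { typo: null, frozen: null, retries: null };

  // In strict mode, assigning to an undeclared (typo-ed) identifier
  // throws instead of silently creating a global.
  try {
    totl = 1;
  } catch (e) {
    outcome.typo = e.name;
  }

  // A frozen object cannot be modified; in strict mode the attempt throws.
  var config = Object.freeze({ retries: 3 });
  try {
    config.retries = 10;
  } catch (e) {
    outcome.frozen = e.name;
  }
  outcome.retries = config.retries;

  return outcome;
}

var result = strict_demo();
console.log(result);
```

On an engine that predates strict mode, the directive is just a string expression that gets evaluated and discarded, so the same file still parses.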

Unfortunately, we're still waiting on implementations. Microsoft will likely have the first working version, but they won't ship it until whenever IE 9 comes out. Mozilla seem to just be waiting to see what Microsoft does, and they'll react to that. Apple can't comment on future products. Google will just do whatever Apple does (UPDATE: Now that I am an intern at Google, I should probably clarify that that's Doug Crockford's statement; I know nothing about any Google plans on the matter).

The Really Good Parts

If you use JavaScript, you have a potential audience of billions. It's the most widespread -- and despite the bugs, the most cross-platform -- system you can use.

It is possible to write really good code. In fact, it is mandatory if you want to maintain sanity.

If you avoid the bad parts, it works really well. It's not just usable and pleasant; there is brilliance in it.

Misc. Q&A

At the Q&A afterwards, there were a few interesting gems:

  • The people in charge of the language (ECMA), and the people in charge of the DOM (the W3C), have never had a joint session or meeting, but he's trying to change that.
  • He thinks the DOM is awful, and HTML 5 is taking it in exactly the wrong direction.

The Book

I'm going to wrap this up the same way he did: with a plug for his new book, JavaScript: The Good Parts (Amazon.com, Amazon.ca). If you do any JavaScript development, get a copy! It contains all of the above wisdom, and much more.

Now excuse me; I'm off to do some JavaScript.




Creating a Basic Makefile

Makefiles are the granddaddy of build systems. Though falling out of favour relative to more modern systems like SCons and ant, make is still the lingua franca of software builds, particularly in the C and C++ parts of the open source world. Because of this, it is imperative to have at least a basic understanding of makefiles and their use.

There are plenty of tutorials introducing the fundamentals of makefile syntax, and a handful that show off some advanced features. There are very few, however, that actually show how to write a useful makefile, or that introduce makefile conventions and patterns. For me, this meant that writing makefiles became an arduous process of stringing together snippets from various places, and hoping they interoperated harmoniously. Frustratingly, I'd often learn of a new feature months later and rip out half of the file and replace it with a single line. Worst of all, I had no idea if what I was doing was conventional or even passable as a serious makefile.

I therefore want to put out this guide to basic makefile usage and conventions, and in the process, develop a basic makefile template that can be used for most small projects or as a starting point for more elaborate build systems. The resulting makefile will also roughly adhere to the GNU makefile conventions, but only where it makes sense for a small project and where support is not too onerous. For the purposes of the guide, we'll be writing a makefile for a C program, but the ideas are easily applicable to other languages. So if you'll oblige me by firing up your text editors, I'll get started.

Build Variables

At the top of our makefile, we will want to declare the variables used in the build process. Keeping everything in variables allows for easy modification of multiple build rules at once, as well as exporting variables from higher-level makefiles in the case of recursive builds (ones where this makefile is just building a particular component). For portability, we can start out by declaring our SHELL and compiler. These two variables are among Make's many special names, and are used implicitly in certain situations, so it is good practice to specify them. We can do so with the following snippet:


SHELL = /bin/sh
CC    = gcc

Next we'll define variables for the actual compiler flags used for building. My personal system is to break these up into four parts: FLAGS, used for mandatory flags without which the project will not build, RELEASEFLAGS and DEBUGFLAGS, for public release and debugging flags, respectively, and CFLAGS for user-defined C compiler settings. This last one is a standard that some users like to define for themselves, and so it should always use that name. I use it for things that are not essential but that I would always like to have in place when building. For this makefile, I've defined these variables as follows:


FLAGS        = -std=gnu99 -Iinclude
CFLAGS       = -pedantic -Wall -Wextra -march=native -ggdb3
DEBUGFLAGS   = -O0 -D _DEBUG
RELEASEFLAGS = -O2 -D NDEBUG -combine -fwhole-program

You can see above the usage of the different variables. Without the FLAGS settings (which specify to use the GNU variant of the C99 standard, and to look for #included files in the include directory, respectively) the hypothetical code would likely not compile correctly (logically, we assume that this hypothetical code does in fact keep header files in that location, and does make use of C99 extensions). The debug and release flags, on the other hand, contain various optimization directives and declarations: the NDEBUG definition causes assert()s to be taken out (among other things); the combine and fwhole-program flags instruct GCC to assume that the files it is working on comprise the whole program, and to optimize accordingly (this only works for C at present); and the O number specifies the level of optimization to apply. Finally, CFLAGS holds user-optional choices, as promised. In this example, I have chosen to make the compiler very strict about errors and warnings (pedantic, Wall, Wextra), instructed it to tune the output program for my specific machine architecture, and finally asked for the inclusion of copious amounts of GDB-specific debugger information. For maximum portability you should not assume GDB, but in practice it is fine for me.

Now let's define some variables to hold important files related to the build. We'll need the name of the program we're building, which I've called foomatic-widget, a list of source files, header files, common headers on which all files depend, and the object files that our sources will compile to. The application name and common headers we can just specify, but keeping track of all our source and header files could be a pain. I've therefore used a Make feature where we can call out to the shell, in this case to get a list of all files ending in .c and .h. Likewise, the list of object files is built by taking all the source files, and replacing their extensions with .o. This all looks like this:


TARGET  = foomatic-widget
SOURCES = $(shell echo src/*.c)
COMMON  = include/definitions.h include/debug.h
HEADERS = $(shell echo include/*.h)
OBJECTS = $(SOURCES:.c=.o)

Finally, we define some paths used for installing our program in a more permanent fashion. By convention, the DESTDIR variable is used, even though we don't declare it, as this allows the user to test installation to any directory by specifying a DESTDIR on the command-line. These variables are defined this way:


PREFIX = $(DESTDIR)/usr/local
BINDIR = $(PREFIX)/bin

Build Targets

Now we get on to the main business of Make: building things. Make uses the concept of targets to represent sets of instructions that you want it to run. The first target listed is the default one used if Make is invoked without specifying a target. Otherwise, you can run a different one with make targetname. By convention, the all target builds the project fully, and is the default. Targets can also have prerequisites: targets that will be processed prior to the current one, or files that, if changed, will cause the target to be rerun. If the target itself is a file, Make intelligently determines whether it needs to be rerun based on its prerequisites' times of last modification.

I typically just make all depend on the actual name of the executable, defined in $(TARGET) above. This ensures that the executable is built if you simply run Make. Optionally, you can also define other targets as prerequisites; for example, I often include a run of the indent or cppcheck utilities, depending on the nature of the project.

Now we have to let Make know how to build the $(TARGET) that all depends on. We do this by defining it as a new target, which depends on the $(OBJECTS) files of each component, as well as the common headers. This target, however, actually contains a rule on how to build it, which will be run when all the prerequisites have been satisfied. This rule is on a new line, and must be indented with tabs. This is a common pitfall, though many editors will make sure that you don't accidentally use spaces here unless you really want to. The rule simply consists of a call to our compiler, defined above, with all the flags that we also defined, and a list of the object files to link. The first two rules then look like this:


all: $(TARGET)
 
$(TARGET): $(OBJECTS) $(COMMON)
	$(CC) $(FLAGS) $(CFLAGS) $(DEBUGFLAGS) -o $(TARGET) $(OBJECTS)

Now, you may have noticed that we're building with the debug settings. How then, do you produce something for day-to-day usage? Why, with another target, invoked with make release and looking like this:

 
release: $(SOURCES) $(HEADERS) $(COMMON)
	$(CC) $(FLAGS) $(CFLAGS) $(RELEASEFLAGS) -o $(TARGET) $(SOURCES)

You may also, later in the development cycle, wish to compile your program with profiling information. The way I've implemented this functionality is with another feature of Make, namely target-specific variable values. The first line below adds a profiling option to the CFLAGS variable whenever the profile target is being built, and the second line then causes the application to be built with the new set of flags.

 
profile: CFLAGS += -pg
profile: $(TARGET)

Administrative Targets

We should also define some administrative targets, which will let us move files around or remove them as needed. A subset of the ones suggested by the GNU Makefile conventions are below:


install: release
	install -D $(TARGET) $(BINDIR)/$(TARGET)

install-strip: release
	install -D -s $(TARGET) $(BINDIR)/$(TARGET)

uninstall:
	-rm $(BINDIR)/$(TARGET)

clean:
	-rm -f $(OBJECTS)
	-rm -f gmon.out

distclean: clean
	-rm -f $(TARGET)

The install and install-strip targets provide us with a mechanism to put our final built binary in some appropriate path, as defined in the path variables above, and using the standard install utility (the naming is a bit confusing: we have both an install Make target and a system utility). The latter option strips debugging symbols from the binary in the process. Both targets depend on the release target, so we can expect that to be built as per the process described above. Uninstall provides the reverse functionality.

The two cleaning-related options are also standard; they differ only in that distclean restores the directory to the pristine state it would be distributed in, i.e. the compiled binary is also removed. The commands in these targets are preceded by a minus sign, telling Make to continue even if the command yields an error (like if the files don't exist).

With these targets in place, we should also take a moment to consider what would happen if we were to actually create a file named, for example, release or install. Make would start deciding whether to run these targets based on the freshness of those files -- clearly not the behaviour we want. We can work around this by defining these targets as PHONY, which tells make to always execute them (solving our problem) and to not bother searching for prerequisites (slightly improving performance). We do this as follows:


.PHONY : all profile release \
  install install-strip uninstall clean distclean

Objects

Our application target above depends on a whole bunch of object files. We could list them all individually, or we could allow Make to build them implicitly (it's pretty smart and can mostly figure it out), but we can do even better. We can define a wildcard rule that will match all object files, and build them just the way we want. We could also define one or two object files individually, if they were special cases for some reason.

This wildcard rule makes use of a few special variables. The first one you'll see is %.o. That is the actual wildcard that matches object files. We can use a similar syntax to make it depend on the right source file as a prerequisite. We also need to know about the $@ and $< variables, which refer to the current target and the first prerequisite, respectively. The rule can then be built like this:

 
%.o: %.c $(HEADERS) $(COMMON)
	$(CC) $(FLAGS) $(CFLAGS) $(DEBUGFLAGS) -c -o $@ $<

You may have noticed that the above rule has all header files as a prerequisite. This is to be on the safe side, in case other parts of the program that are relevant to that file were changed. Depending on the size of your project, that may represent a significant amount of time wasted needlessly. If you're not averse to some really gruesome syntax, and want to rectify the problem, and if you're using GNU Make only, you can do better.

Using a feature of GNU Make known as second expansion, you can dynamically determine the specific headers to care about by calling out to GCC with the -MM option, which makes it list the headers included by a particular file. Second expansion allows us to evaluate variables a second time, later on in their lifecycle, where the surrounding context may have changed. For details on the deep magic going on here, consult the actual manual, but you should be able to get a rough idea of what's going on from the following implementation:


.SECONDEXPANSION:
 
$(foreach OBJ,$(OBJECTS),$(eval $(OBJ)_DEPS = $(shell gcc -MM $(OBJ:.o=.c) | sed s/.*://)))
%.o: %.c $$($$@_DEPS)
	$(CC) $(FLAGS) $(CFLAGS) $(DEBUGFLAGS) -c -o $@ $<

The final product

Our shiny new makefile is reproduced below in its entirety:


SHELL = /bin/sh
CC    = gcc
 
FLAGS        = -std=gnu99 -Iinclude
CFLAGS       = -pedantic -Wall -Wextra -march=native -ggdb3
DEBUGFLAGS   = -O0 -D _DEBUG
RELEASEFLAGS = -O2 -D NDEBUG -combine -fwhole-program
 
TARGET  = foomatic-widget
SOURCES = $(shell echo src/*.c)
COMMON  = include/definitions.h include/debug.h
HEADERS = $(shell echo include/*.h)
OBJECTS = $(SOURCES:.c=.o)
 
PREFIX = $(DESTDIR)/usr/local
BINDIR = $(PREFIX)/bin
 
 
all: $(TARGET)
 
$(TARGET): $(OBJECTS) $(COMMON)
	$(CC) $(FLAGS) $(CFLAGS) $(DEBUGFLAGS) -o $(TARGET) $(OBJECTS)

release: $(SOURCES) $(HEADERS) $(COMMON)
	$(CC) $(FLAGS) $(CFLAGS) $(RELEASEFLAGS) -o $(TARGET) $(SOURCES)

profile: CFLAGS += -pg
profile: $(TARGET)


install: release
	install -D $(TARGET) $(BINDIR)/$(TARGET)

install-strip: release
	install -D -s $(TARGET) $(BINDIR)/$(TARGET)

uninstall:
	-rm $(BINDIR)/$(TARGET)


clean:
	-rm -f $(OBJECTS)
	-rm -f gmon.out

distclean: clean
	-rm -f $(TARGET)


.SECONDEXPANSION:

$(foreach OBJ,$(OBJECTS),$(eval $(OBJ)_DEPS = $(shell gcc -MM $(OBJ:.o=.c) | sed s/.*://)))
%.o: %.c $$($$@_DEPS)
	$(CC) $(FLAGS) $(CFLAGS) $(DEBUGFLAGS) -c -o $@ $<
 
# %.o: %.c $(HEADERS) $(COMMON)
#    $(CC) $(FLAGS) $(CFLAGS) $(DEBUGFLAGS) -c -o $@ $<
 
 
.PHONY : all profile release \
  install install-strip uninstall clean distclean

For more detailed documentation, consult the GNU Make Manual or the GNU Makefile Conventions document.




Mind the Gap

A few days ago there were reports that Korea, already a leader in telecommunications infrastructure, would be pursuing plans to provide 1 Gbps Internet connectivity across the country by 2012. An excerpt from the Slashdot summary:

The entire country is gearing up to have 1 Gbps service by 2012, or at least that is what the Korea Communications Commission (KCC) is claiming. 'Currently, Koreans can get speeds up to 100 Mbps, which is still nearly double the speed of Charter's new 60 Mbps service. The new plan by the KCC will cost 34.1 trillion won ($24.6 billion USD) over the next five years. The central government will put up 1.3 trillion won, with the remainder coming from private telecom operators.'

Now, whenever facts like this are mentioned, people ask why we in Canada and the US are stuck with paltry two to ten Mbps connections that also suffer from ISP bandwidth throttling and traffic shaping policies. Usually at least one response points out that the US and Canada are vastly larger countries, and it is therefore not economically feasible to cover the entire country in high-speed fibre-optic links. An unusually mild example is this comment to the Slashdot story:

Korea is roughly 1/100th the size of the US. If we estimate a similar plan in the US based on size only, it would cost $2.46 trillion USD. The Korean government is paying 1.3 trillion of the 34.1 total (or roughly 4%). If the US government did something similar, it would be about $100 billion USD.

Population, not area

[Image: Man urinating into a pool at the bottom of a large waterfall]

Although the above argument is technically correct, it confuses coverage of landmass with coverage of people. The fact is, there is no need to provide high-speed Internet to vast tracts of US and Canadian wilderness, or even to rural regions. There are inhabited areas in both countries that have no broadband connectivity whatsoever, and likely more than a few villages that lack even dial-up. The point of expanding the capabilities of North American Internet infrastructure is not to provide everywhere with high-speed connections, but to provide them to as many people as possible. Focussing on the densely populated metropolitan centres of both countries reveals what a specious argument comparing areas is.

First, some background statistics to frame the discussion: The area of South Korea is almost exactly 100,000 square km. The US and Canada cover approximately 9,826,600 and 9,984,700 square km, respectively. The estimated population of the US is a shade under 306 million, while Canada is home to 33 and a half million souls. The GDP of Korea is just under one trillion US$; the US's a bit more than 14 trillion, and Canada's is almost exactly one tenth of that, at 1.4 trillion.

If the US government and telecoms would invest in providing a similar level of coverage to just the five most populated cities and surrounding areas (New York, Los Angeles, Chicago, Dallas-Fort Worth, Philadelphia), it would represent an area of 85,966 square kilometres (so, well under the area of Korea), and would provide coverage to 53,189,247 people. Furthermore, there are a number of areas where I suspect state governments and even local corporations would be willing to help finance the buildout; San Diego, Irvine, and San Francisco come to mind, as do Washington D.C. and Seattle. On top of that, if we use GDP as a very rough measure of the relative investment potential of the two nations, it seems clear that the US should be able to afford an investment around 15 times as large in the first place. Adding up all these factors, it's clear that the US could easily afford to extend coverage well beyond those five areas, and provide coverage to many millions more, as well as most of the country's technology hubs.

In Canada, the situation is even more extreme. The top five metropolitan areas (Toronto, Montreal, Vancouver, Ottawa, and Calgary), cover just 24,687 square kilometres, and contain just over 13 million of Canada's 33 and a half million inhabitants. In other words, almost 40% of the population in less than a quarter of South Korea's area. Extending coverage to the top ten municipalities would likely produce quickly diminishing returns, but would probably still encompass less territory than the Korean plan, while providing coverage to over half the population. Given that Canada's GDP is roughly 1.5 times that of South Korea, the proportional size of the investment would be even smaller.
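For anyone who wants to check the arithmetic, here is a quick sketch that recomputes the ratios from the figures quoted above. The rounded inputs are assumptions on my part: 306 million for the US population, 13 million for the "just over 13 million" in Canada's top five areas, and 100,000 square km for South Korea. The share helper is a hypothetical name.

```javascript
var KOREA_AREA_KM2 = 100000;

// Top-five metropolitan areas in each country, using the figures from the text.
var us = { areaKm2: 85966, covered: 53189247, population: 306e6 };
var canada = { areaKm2: 24687, covered: 13e6, population: 33.5e6 };

// Express part/whole as a percentage, rounded to one decimal place.
function share(part, whole) {
  return Math.round((part / whole) * 1000) / 10;
}

var summary = {
  usAreaVsKorea: share(us.areaKm2, KOREA_AREA_KM2),
  usPopulationCovered: share(us.covered, us.population),
  canadaAreaVsKorea: share(canada.areaKm2, KOREA_AREA_KM2),
  canadaPopulationCovered: share(canada.covered, canada.population)
};

console.log(summary);
```

With these inputs, the five US areas come to about 86% of Korea's area and roughly 17% of the US population, while the five Canadian areas come to just under 25% of Korea's area and almost 39% of Canada's population.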

No need to go overboard

Now, 1 Gbps may be an investment in the future, but in this context one must certainly mean the distant future; for the fact is that 1 Gbps is not just extremely fast, it is gratuitously fast. To put it in perspective, a network connection of that speed would be able to simultaneously carry between 50 and 200 HDTV channels (depending on quality and compression). An investment in Canada or the US to provide connectivity at 100 Mbps (the current Korean high-end class of connectivity) would require a much lower cost, while still providing connections 10 to 50 times faster than the current residential standard of 2 to 10 Mbps. I'd settle for that. So why doesn't it happen?