Tuesday, December 28, 2010

The Joy of Programming

It's ridiculous how good I can feel after accomplishing a decent programming task smoothly. I almost feel a bit foolish getting pleasure from something so ... geeky. But it's a bit like a good run. It feels like you've done something worthwhile, even if it only benefits you.

Last evening, with the new MacBook Air on my lap, I cleaned up one of the last parts of the jSuneido compiler that I was uncomfortable with. I had already figured out what I wanted to do, and for once I didn't run into any unexpected problems, finishing tidily in the amount of time I had.

Previously, I'd been building a list of constants (other than strings and ints, which Java handles directly) and then stuffing this list into the instance of the compiled class. Then, to speed things up in the generated code, I loaded an array of the constants into a local variable. It worked ok, but a lot of functions didn't need any constants, and unfortunately I had to generate the load before I knew whether any were needed - which would be tricky to determine ahead of time. Plus, ideally, constants should be static final so the optimizer would know they were constant.

With the new changes, I create a static final field for each constant, and then generate a class initializer that initializes them. Accesses to the constants are simple GETSTATIC instructions. The only trick was how to get the list of constants from the compiler to the class initializer. I ended up using a static field (basically a global variable). A bit ugly, but I couldn't come up with a better way. To handle multi-threading I used a ThreadLocal. (My first thought was a ConcurrentHashMap keyed by ClassLoader, but ThreadLocal was simpler.)
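
For illustration, the hand-off could look something like the sketch below (my own simplified naming, not the actual jSuneido classes): the compiler stores the constants in a ThreadLocal just before loading the generated class, and the generated class initializer fetches and clears them.

public class ConstantsHandoff {
    private static final ThreadLocal<Object[]> toLoad = new ThreadLocal<Object[]>();

    // called by the compiler just before the generated class is loaded
    public static void set(Object[] constants) {
        toLoad.set(constants);
    }

    // called from the generated class initializer, which then stores the
    // values into the static final fields with PUTSTATIC
    public static Object[] take() {
        Object[] constants = toLoad.get();
        toLoad.remove();
        return constants;
    }
}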

I don't know how much difference this will make to performance, but I really wanted to generate code that was clean and similar to compiled Java. I think I'm pretty much there. (Apart from boxed integers and the extra layer of method/call dispatch.)

Damn! Reviewing the code as I write this I realize I missed something. (I use that list of constants in the instance for a couple of other minor things.) Oh well, can't take away how good I felt when I thought it had all gone smoothly! Good thing I'm not superstitious or I'd think I'd jinxed myself talking about how well it had gone :-) Hopefully it won't be too hard to handle.

Monday, December 27, 2010

jSuneido Catching Up with cSuneido

It's been a while since I paid attention to the relative speed of jSuneido versus cSuneido. I just realized that with my recent optimizations, jSuneido now runs the standard library test suite in pretty much the same time as cSuneido.

In some ways that's not surprising: the JVM is a sophisticated platform - the garbage collector is very fast and multi-threaded, the JIT compiles to optimized machine code, etc.

On the other hand, Java forces a lot more overhead than C++. Integers have to be boxed, classes can only be nested by reference not embedding, you can't use pointers, you can't allocate objects on the stack, and so on. It's impressive that the JVM can overcome these "drawbacks".

It's nice to reach this point. And a nice confirmation that I wasn't totally out to lunch deciding to use Java and the JVM to re-implement Suneido.

Of course, the real advantage of jSuneido over cSuneido is that the server is multi-threaded and will scale to take advantage of multiple cores.

Sunday, December 26, 2010

informIT eBooks

As I've written before (here and here), I prefer to buy ebooks without DRM, not because I want to pirate them, but because dealing with DRM is a pain in the you-know-where, especially when you have multiple devices you want to access them from.

This is (as far as I know) almost impossible for mainstream books. But for computer books it's not too bad. Both Pragmatic Programmers and O'Reilly sell DRM free ebooks. And I've recently started buying ebooks from informIT (mostly Addison-Wesley ones).

The other issue with ebooks is format. The Kindle currently only handles mobi and pdf. And pdf's are generally painful to read on ebook readers because they have a fixed, usually large (e.g. 8.5 x 11) page size, with big margins. Because of this I bought one of the bigger 9.7" Kindle DX's, and it helps, but it's still not great.

The Kindle helpfully trims white margins on pdf's, but the problem is that the ebooks informIT provides are images of the paper books, so they have stuff like page numbers and titles in the top and bottom margins, which prevents trimming from working properly. And worse, the stuff in the margins isn't symmetrical, so odd pages end up trimmed differently than even pages, which means the zoom level keeps changing. Not the end of the world, but annoying.

Both Pragmatic and O'Reilly provide ebooks in epub, mobi, and pdf. But informIT only provides epub and pdf. At first I made do with the pdf's but eventually I got fed up and started looking for alternatives.

I thought about looking for some tool to process the pdf's and crop the margins. But that's not the only problem. Kindle doesn't let you highlight sections of text in pdf's, only in mobi. And the pdf's I've had from informIT haven't had any hyperlinks (e.g. from the table of contents) although pdf's are able to do that.

I took a look at the epub's from informIT and they seemed better, at least they were hyperlinked. So I looked for tools to convert epub to mobi. Stanza was one that was suggested and I already had it installed so I gave it a try. I like Stanza but it didn't do a very good job of converting these epubs to mobi.

The other tool that was suggested was Calibre. A plus for Calibre is that it's an open source project. The user interface is a little confusing but it did a great job of converting epub to mobi, including the cover image and hyperlinks. And it even recognized when I plugged in my Kindle and let me copy the books over. I downloaded the epub versions of all my informIT books and converted them - a big improvement.
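
As an aside, Calibre also includes a command-line converter, so (assuming a standard install) the same conversion can be scripted rather than done one book at a time through the GUI, something along the lines of:

ebook-convert MyBook.epub MyBook.mobi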

I wish I'd got fed up sooner! I could have avoided a lot of painful pdf reading.

Although I'm happy to be able to get Addison-Wesley ebooks from informIT, I do have a minor complaint - their sleazy marketing. I got an email offering me a limited time 50% off deal. Sounded good to me so I found a few books I'd been planning to get. They came to something like $80. When I entered the 50% off code the price only dropped to around $70. That's a very strange 50%. Looking closer I found that you normally get 30% off, and over $55 it goes up to 35%. So the 50% deal was only 15% better than usual. I realize this is a common marketing scam, but it still annoys me. O'Reilly plays some similar games but not quite as bad.

Friday, December 24, 2010

jSuneido Compiler Optimizations

For my own reference, and to justify my month of work, I thought I'd go over some of the optimizations I've done recently. The changes involve:
  • how functions/methods/blocks are compiled
  • how calls are compiled
  • the run-time support code e.g. in Ops, SuValue, SuClass, SuInstance, etc.
My original scheme was that arguments were passed in an array i.e. in the style of Java's variable arguments. For simplicity additional local variables were also stored in this array. This had a nice side benefit that the array could be used to capture the local variables for blocks (aka closures).

The downside is that every call had to allocate an argument array, and often re-allocate it to accommodate default arguments or local variables. And access to variables requires larger and presumably slower code. (ALOAD, ICONST, AALOAD/STORE instead of just ALOAD/STORE)

To keep the calling code smaller I was generating calls via helper functions (with variations for 1 to 9 arguments) that built the argument array. One drawback of this approach is that it added another level to the call stack. This can impede optimization since the Hotspot JVM only in-lines a limited depth.

The first step was to compile with Java arguments and locals for functions (and methods) that did not use blocks.

Then I realized that if a block didn't share any variables with its containing scope, it could be compiled as a standalone function. Which is nice because blocks require allocating a new instance of the block for each instantiation, whereas standalone functions don't since they are "static". And it meant that the outer function could then be compiled without the argument array. Determining if a function shares variables with its blocks is a little tricky, e.g. you have to ignore block parameters, except when you have nested blocks and an inner block references an outer block's parameter. See AstSharesVars.

Where I still needed the argument array I changed the calling code to build it in-line, without a helper function. This is what the Java compiler does with varargs calls. Generating code that is similar to what javac produces is a good idea because the JVM JIT compiler is tuned for that kind of code.
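
For example (my own illustration, nothing jSuneido specific), javac compiles a varargs call by constructing the array right at the call site:

public class VarargsDemo {
    static int count(Object... args) {
        return args.length;
    }

    public static void main(String[] args) {
        count("a", "b");                      // javac compiles this call as:
        count(new Object[] { "a", "b" });     // an in-line array construction
    }
}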

One of the complications of a dynamic language is that when you're compiling a call you don't know anything about what you'll end up calling.

On top of this, Suneido allows "fancier" argument passing and receiving than Java. This means there can be mismatches where a call is "simple" and the function is "fancy", or where the call is "fancy" and the function is simple. So there need to be adapter functions to handle this. But you still want a "simple" call to a "simple" function to be direct.

Each Suneido function/method is compiled into a Java class that has Java methods for simple calls and fancy calls. If the Suneido function/method is simple, then it implements the simple method and the fancy method is an adapter that pulls arguments out of a varargs array. (For example, see SuFunction2, the base class for two argument functions.) If the Suneido function/method is fancy, then it implements the fancy method and the simple method is an adapter that builds an argument array.
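
Roughly, the shape of it is something like this sketch (made-up names, and the real adapters also deal with things like default and named arguments):

public abstract class TwoArgFunctionSketch {
    // the "simple" entry point - implemented directly by a simple Suneido function
    public abstract Object call2(Object a, Object b);

    // the "fancy" entry point - here just an adapter that pulls the
    // arguments out of the argument array
    public Object call(Object... args) {
        return call2(args[0], args[1]);
    }
}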

In cSuneido, all values derive from SuValue and you can simply call virtual methods. But jSuneido uses native types e.g. for strings and numbers. (To avoid the overhead of wrappers.) So you need something between calls and implementations that uses "companion" objects for methods on Java native types. For example, Ops.eval2 (simple method calls with two arguments) looks like:

Object eval2(Object x, Object method, Object arg1, Object arg2) {
    return target(x).lookup(method).eval2(x, arg1, arg2);
}

For SuValue's (which have a lookup method) target simply returns x. For Java native types it returns a companion object (which has a lookup method). lookup then returns the method object.

One issue I ran into is that now that I'm using actual Java locals, the code has to pass the Java verifier's checking for possibly uninitialized variables. I ran into a bunch of our Suneido code that wouldn't compile because of this. In some cases the code was correct - the variable would always be initialized, but the logic was too complex for the verifier to confirm it. In other cases the code was just plain wrong, but only in obscure cases that probably never got used. I could have generated code to initialize every variable to satisfy the verifier, but it was easier just to "fix" our Suneido code.

The "fast path" is now quite direct, and doesn't involve too deep a call stack so the JIT compiler should be able to in-line it.

The remaining slow down over Java comes from method lookup and from using "boxed" integers (i.e. Integer instead of int). The usual optimization for method lookup is call site caching. I could roll my own, but it probably makes more sense to wait for JSR 292 in Java 7. Currently, there's not much you can do about having to use boxed integers in a dynamic language. Within functions it would be possible to recognize that a variable was only ever an int and generate code based on that. I'm not sure it's worth it - Suneido code doesn't usually do a lot of integer calculations. It's much more likely to be limited by database speed.

This Developer's Life

This Developer's Life - Stories About Developers and Their Lives

I just came across this podcast and started listening to it. So far I'm really enjoying it. It's nice to hear more of the human, personal side to "the software life". And I enjoy the music interspersed. Check it out.

I also listen to Scott's Hanselminutes but it's a totally different podcast, much more technical and Microsoft focused. (I use it to keep up on Microsoft technology, which I feel I should be aware of even if I tend to avoid it.)

Wednesday, December 22, 2010

Optimizing jSuneido Argument Passing - part 2

I think I've finished optimizing jSuneido's argument passing and function calling. The "few days" I optimistically estimated in my last post turned into almost a month. I can't believe it's taken me that long! But I can't argue with the calendar.

I gained about 10% in speed. (running the stdlib tests) That might not seem like much, but that means function call overhead was previously more than 10%, presumably quite a bit more since I didn't improve it that much. Considering that's a percentage of total time, including database access, that's quite an improvement.

When I was "finished", I wanted to check that it was actually doing what I thought it was, mostly in terms of which call paths were being used the most.

I could insert some counters, but I figured a profiler would be the best approach. The obvious choice with Eclipse is their Test and Performance Tools Platform. It was easy to install, but when I tried to use it I discovered it's not supported on OS X. It's funny - the better open source tools get, the higher our expectations are. I'm quite annoyed when what I want isn't available!

NetBeans has a profiler built in so I thought I'd try that. I installed the latest copy and imported my Eclipse project. I had some annoying errors because NetBeans assumes that your application code won't reference your test code. But I include the tests as part of the application so they can be run outside the development environment. I assume there's some way to re-configure this, but I took the shortcut of simply commenting out the offending code.

I finally got the profiler working, but it was incredibly slow, and crashed with "out of memory". I messed around a bit but didn't want to waste too much time on it.

I went back to manually inserting counters, the profiling equivalent of debugging with print statements. I got mostly the results I expected, except that one of the slow call paths was being executed way more often than I thought it should be.

So I was back to needing a profiler to track down the problem. I'd previously had a trial version of YourKit, so this time I downloaded a trial version of JProfiler. It ran much faster than the NetBeans profiler and gave good results. But unfortunately, it didn't help me find my problem. (Probably just due to my lack of familiarity.)

I resorted to using a breakpoint in the debugger and hitting continue over and over again, checking the call stack each time. I soon spotted the problem. I hadn't bothered optimizing one case in the compiler because I assumed it was rare. And indeed, when I added counters, it was very rare, only occurring a handful of times. The problem was, those handful of occurrences were executed a huge number of times. I needed to be optimizing the cases that occurred a lot dynamically, not statically.

Although I only optimize common, simple cases, they account for about 98% of the calls, which explains why my optimizations made a significant difference.

One of my other questions was how many arguments I needed to optimize for. I started out with handling up to 9 arguments, but based on what I saw, the number of calls dropped off rapidly over 3 or 4 arguments, so I went with optimizing up to 4.

I can't decide whether it's worth buying YourKit or JProfiler. It doesn't seem like I want a profiler often enough to justify buying one. Of course, if I had one, and learnt how it worked, maybe I'd use it more often.

Monday, December 20, 2010

Append Only Databases

The more I think about the redirection idea from RethinkDB the more I like it.

The usual way to implement a persistent immutable tree data structure is "copy on write". i.e. when you need to update a node, you copy it and update the copy. The trick is that other nodes that point to this one then also need to be updated (to point to the new node) so they have to be copied as well, and this continues up to the root of the tree. The new root becomes the way to access the new version of the tree. Old roots provide access to old versions of the tree. In memory, this works well since copying nodes is fast and old nodes that are no longer referenced will be garbage collected. But when you're writing to disk, instead of just updating one node, now you have to write every node up the tree to the root. Even if the tree is shallow, as btrees usually are, it's still a lot of extra writing. And on top of the speed issues, it also means your database grows much faster.

The redirection idea replaces creating new copies of all the parent nodes with just adding a "redirection" specifying that accesses to the old version of the leaf node should be redirected to the new leaf node. A new version of the tree now consists of a set of redirections, rather than a new root. And you only need a single redirection for the updated leaf node, regardless of the depth of the tree. There is added overhead checking for redirection as you access the tree, but this is minimal, assuming the redirections are in memory (they're small).
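
A bare-bones sketch of the bookkeeping (my own illustration, not RethinkDB's or Suneido's actual code - in a real implementation the redirections would also be persisted as part of each commit):

import java.util.HashMap;
import java.util.Map;

public class Redirections {
    // in-memory map: file offset of the old node -> offset of its replacement
    private final Map<Long, Long> redirs = new HashMap<Long, Long>();

    // record that an updated copy of the node at oldAdr was appended at newAdr
    public void redirect(long oldAdr, long newAdr) {
        redirs.put(oldAdr, newAdr);
    }

    // every node access resolves through the map instead of requiring
    // new copies of all the parent nodes up to the root
    public long resolve(long adr) {
        Long newAdr = redirs.get(adr);
        return newAdr == null ? adr : newAdr;
    }
}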

One catch is that redirections will accumulate over time. Although, if you update the same node multiple times (which is fairly common) you will just be "replacing" the same redirection. (Both will exist on disk, but in memory you only need the most recent.)

At first I didn't see the big advantage of redirection. I could get similar performance improvements by lazily writing index nodes.

But the weakness of delayed writes is that if you crash you can't count on the indexes being intact. Any delayed writes that hadn't happened yet would be lost.

The current Suneido database has a similar weakness. Although it doesn't delay writes, it updates index nodes, and if the computer or OS crashes, you don't know if those writes succeeded.

The current solution is that crash recovery simply rebuilds the indexes. This is simple and works well for small databases. But for big databases it can take a significant amount of time, during which the customer's system is down. Crashes are supposed to be rare, but it's amazing how often they happen. (bad hardware, bad power, human factors)

Of course, you don't need the redirection trick to make an append only index. But without it you do a lot more writing to disk and performance suffers.

Even with an append only database, you still don't know for sure that all your data got physically written to disk, especially with memory mapped access, and writes being re-ordered. But the advantage is that you only have to worry about the end of the file, and you can easily use checksums to verify what was written.
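
For example (a sketch of the general technique, not Suneido's actual file format), each commit could be appended along with a CRC of its contents; on startup you verify from the last known good position and treat a mismatch as an incomplete write, truncating the file there:

import java.util.zip.CRC32;

public class CommitChecksum {
    // checksum stored alongside each appended commit
    public static long checksum(byte[] commitData) {
        CRC32 crc = new CRC32();
        crc.update(commitData);
        return crc.getValue();
    }

    // during crash recovery, a commit whose stored checksum doesn't match
    // is treated as a partial write and everything from there on is discarded
    public static boolean valid(byte[] commitData, long storedChecksum) {
        return checksum(commitData) == storedChecksum;
    }
}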

On top of the crash recovery benefits, there are a number of other interesting benefits from an append only database. Backups become trivial, even while the database is running - you just copy the file, ignoring any partial writes at the end. Replication is similarly easy - you just copy the new parts as they're added to the database.

Concurrency also benefits, as it usually does with immutable data structures. Read transactions do not require any locking, and so should scale well. Commits need to be appended to the end of the database one at a time, obviously, but writing to a memory mapped file is fast.

Another nice side benefit is that the redirections can also be viewed as a list of the changed nodes. That makes comparing and merging changes a lot easier than tree compares and merges. (When a transaction commits, it needs to merge its changes, which it has been making on top of the state of the database when it started, with the current state of the database, which may have been modified by commits that happened since it started.)

Ironically, I was already using a similar sort of redirection internally to handle updated nodes that hadn't been committed (written) yet. But I never made the connection to tree updating.

I love having a tricky design problem to chew on. Other people might like to do puzzles or play computer games, I like to work on software design problems. I like to get absorbed enough that when I half wake up in the night and stagger to the bathroom, my brain picks up where it left off and I start puzzling over some thorny issue, even though I'm half asleep.

Sunday, December 05, 2010

The Deep Threat

"It was an illuminating moment: the deep threat isn’t losing my job, it’s working on something for which I lack passion."

- John Nack

Thursday, November 25, 2010

Social Search?

"search is getting more social every day and tomorrow's recommendations from people you know via Facebook are infinitely more valuable than search results from yesterday's algorithm"
- Publishing needs a social strategy - O'Reilly Radar

Really? What kinds of searches are we talking about? When I search for some technical question or someone searches for e.g. a solution to an aquarium problem, is Facebook really going to help? Personally, I think I'd rather have "yesterday's algorithm".

Sure, if I'm looking for something like a restaurant recommendation then I'd be interested in what my friends have to say. But unless you have a huge, well travelled circle of friends, how likely is it that they'll have recommendations for some random city you're in? And if the recommendations aren't coming from friends, then we're back to regular search.

This "everything is social" craze drives me crazy. Believe it or not, Facebook is not the ultimate answer to every problem.

Tuesday, November 23, 2010

Goodbye Google App Engine

Goodbye Google App Engine

Definitely a different picture than Google paints.

I would think twice about using Google App Engine after reading this.

Maybe we'll just stick with Amazon EC2

Sunday, November 21, 2010

Optimizing jSuneido Argument Passing

Up till now jSuneido has always passed arguments as an Object[] array, similar to a Java function defined as (Object... args).

Suneido has features (e.g. named arguments) that sometimes require this flexible argument passing. But most of the time it could use Java's standard argument passing.

I've wanted to optimize this for a while, and it was one of the motivations for the compiler overhaul (see part 1 and part 2).

Having finished adding client mode, I sat down to start working on optimizing argument passing. When I started to think about what was required, it began to seem like a fair bit of work.

I decided that first I should determine if the change was worthwhile. (In the back of my mind I was thinking it probably wouldn't be that much better and I wouldn't have to implement it.)

First I measured what percentage of calls were simple enough to optimize. I was surprised to see it was about 90%. But it makes sense, most calls are simple.

Next I measured what kind of improvement the optimization would give. For a function that didn't do much work, I got about a 30% speedup. (Of course, for functions that did a lot of work the argument passing overhead would be negligible and there would be little or no speedup.)

Those figures were just from quick and dirty tests but I was only looking for rough orders of magnitude results.

The changes avoid allocating and filling an argument array and also avoid the argument "massaging" required for more complex cases. The calls are simpler and use less byte code. This means they have a better chance of being inlined by the JVM JIT byte code compiler. The calls are also more similar to those produced by compiling Java code and therefore have a better chance of being recognized and optimized by the JIT compiler.

To me, these numbers justified spending a few days making the changes. Thankfully, I was able to implement it in a way that allowed me to transition gradually. So far I have optimized the argument passing for calling built-in functions. I still have some work to do to handle other kinds of calls, but the general approach seems sound. I don't think it'll be quite as bad as I thought.

You can see the code on SourceForge

One Bump in an Otherwise Smooth iMac Upgrade

I recently upgraded my 24" Core 2 Duo iMac to a 27" i7 iMac.

My usual rule of thumb is not to upgrade till the new machine will be twice as fast as the old one. But that rule is getting harder to judge. For a single thread, the i7 is not twice as fast. But with 8 threads (4 cores plus hyper-threading) it definitely has the potential to run much more than twice as fast. The iPhone MacTracker app shows the new machine with a benchmark of 9292 versus the old machine at 3995. I'm not sure what benchmark that's using.

I've been thinking about upgrading for a while but as winter arrives and I spend more time on the computer, I figured it was time. Another motivation for the upgrade was to be able to test the multi-threading in jSuneido better.

I really wanted to get the SSD drive to see how that would affect performance. But that option is still very expensive. Especially since I'd still need a big hard drive as well for all my photographs. So I didn't go for it this time. I'm not sure how big a difference it would have made for me. My understanding is that it mostly speeds up boot and application start times. But I generally boot only once a day and tend to start apps and then use them for a long time. It would be interesting to see how Suneido's database performed on an SSD.

I did upgrade from 4gb of memory to 8gb. Most of the time 4gb is fine, but when I run two copies of Eclipse (Java and C++), Windows under Parallels, Chrome with a bunch of tabs, etc. then things started to get noticeably sluggish. With the new machine things don't seem to slow down at all. And I can allocate more memory and cores to my Windows virtual machine. I also upgraded from a 1tb hard disk to 2tb. I hadn't filled up the 1tb, but I figure you can never have too much hard disk space and the cost difference wasn't that big.

Migrating from one machine to the other went amazingly smoothly and quickly - the easiest migration I've done. With Windows machines I never try to migrate any settings or applications since you need to clean everything out periodically anyway. I'm sure even with OS X there is a certain amount of "junk" accumulating (e.g. old settings) but it doesn't seem to cause any problems.

I used OS X's migration tool but I wasn't sure which method to use to connect the machines - Firewire, direct network connection, network via LAN, or via Time Machine backup. In the end I went with migrating from the Time Machine backup, partly because it didn't tie up my old machine and I could keep working.

Some estimates from the web made me think it might take 10 or 20 hours to migrate roughly 600gb, but it was closer to 2 hours - nice.

The one speed bump in this process was my working copy of jSuneido. I keep this Eclipse workspace in my DropBox so I can access it from work (or wherever). Because I migrated from a Time Machine backup, my new workspace was a few hours out of date. DropBox on the new machine copied these old files over the newer ones. Then DropBox on the old machine copied the old files over its newer ones. So now both copies were messed up. No big deal - DropBox keeps old versions, I'd just recover those. Except I couldn't figure out any easy way to recover the correct set of files without a lot of manual work. (I could only see how to restore one manually selected file at a time, with no way to easily locate the correct set of files that needed to be restored.) No problem - I'd get the files back from version control. Except for some reason I couldn't connect to version control anymore. Somehow the unfortunate DropBox syncing had messed up something to do with the SSL keys. Except the keys were still there since I could check out a new copy from version control. Eventually, after a certain amount of thrashing and flailing I got a functional workspace. I still ended up losing about 2 hours of work, but thankfully it was debugging work driven by failing tests and it didn't take long to figure out / remember the changes.

Although the new 27" display is only about 10% bigger than the old 24", the resolution has increased from 1920 x 1200 to 2560 x 1440 - almost a third bigger, and quite a noticeable difference. But because of the higher DPI resolution, everything got smaller. As my eyes get older, smaller text is not a good thing!

After all these years, with all the changes in display sizes and resolutions, you'd think we'd have better ways to adjust font sizes. Most people simply resort to overriding the display resolution to make things bigger, but that's a really ugly solution. But I can see why they do it. There's no easy way in OS X to globally adjust font sizes. You can only tweak them in a few specific places. Windows is actually a little better in this regard, but still not great. And even if you manage to change the OS, you still run into applications that disregard global settings.

And history continues to repeat itself. iPhone apps were all designed for a specific fixed pixel size and resolution. Then the iPad comes along and the "solution" is an ugly pixel doubling. Then the higher resolution retina display arrives and causes more confusion. When will we learn!

Even Eclipse, that lets you tweak every setting under the sun, has no way to adjust the font size in secondary views like the Navigator or Outline. This has been a well known problem in Eclipse since at least 2002 (e.g. this bug or this one) but they still haven't done anything about it. I'm sure it's a tricky problem but how hard can it be? Is it really harder than all the other tricky problems they solve? Surely there's something they could do. Instead they seem to be more interested in either denying there's a problem, arguing about which group is responsible for it, or reiterating all the reasons why it's awkward to fix.

Of course, it's open source, so the final defense is always - fix it yourself. Sure, I'll dive into millions of lines of code and find the exact spot and way to tweak it. I think it might be just a little easier for the developers that spend all their time in there to do it. I won't ask them to fix the bugs in my open source code, if they don't ask me to fix the bugs in their open source code.

On the positive side, Eclipse's flexibility with arranging all the panes virtually any way you want lets me take advantage of the extra space. It's a little funny though, because even with a big font, the actual source code pane is only about 1/3 of the screen. It's nice to have so many other panes visible all the time, but there are times when it would be nice to hide (or "dim") all the other stuff so I could focus on the code.

All in all, I'm pretty happy with the upgrade.

Note: No computers were killed in this story :-)  Shelley is taking over my old iMac and her nephew is taking over her old Windows machine.

Friday, November 19, 2010

Giving Up on Amanda

For the last few months we've been trying to implement Amanda for backups in our office.

The two main choices for open source backups seem to be Amanda and Bacula. Amanda was supposed to be easier to set up than Bacula so that's what we chose. (There are also commercial options but they tend to be expensive and even less flexible.)

Unfortunately, it hasn't gone smoothly. Every time we think we have it working it starts failing semi-randomly. Certain workstations will fail some nights with cryptic errors, then work other nights without us changing anything.

We've had some support from the Amanda community. At one point they suggested running a new beta version which appeared to solve some problems, but not all of them.

To add to the problems, when we upgraded Linux, Amanda broke. That's certainly not unique to Amanda, but it's yet another hassle.

I'm sure Amanda works well for a lot of people. Presumably there's something different in our server or network or workstations that leads to the problems. But that doesn't help us. We have a medium size network of about 60 machines - not small, but not especially big either. We're not Linux experts, but we're not totally newbies either.

I'm sure we could solve the current problems eventually, but I've lost confidence. It just seems like we'd have to expect more problems in the future. And it's complex enough that if the person that set it up wasn't here, we'd be lost.

For some things this might be acceptable, but for backups I want something that "just works", that I can count on to be reliable and trouble free. For us, Amanda just doesn't appear to be the answer.

For the last ten years or so we've been using a home-brew backup system. It's simple, but that's a good property. It has reliably done the job. And when we needed to adjust it we could. And it was simple enough that even unfamiliar people could dive in and grasp what it was doing and figure out how to change it or fix it.

The reason we tried to move to Amanda is that we wanted to improve our system. Currently we rely on the server "pulling" backups via open shares. But for security we want to get rid of the open shares which means the workstations have to "push" backups. At the same time, we wanted to start encrypting backups on the workstations. In theory Amanda will do what we want.

I finally decided to pull the plug on trying to use Amanda. And as crazy as it might sound, we're going to try building a new home-brew system. I don't think it'll take us any more time than what we spent on trying to use Amanda.

You might think backups are too critical to trust to a home-brew system. But I'm more willing to trust a simple transparent solution that I understand, rather than someone else's complex black box. (Technically Amanda's not a black box since it's open source, but practically speaking we're not likely to spend the time to figure out how it works.)

And of course, we'll use Suneido to implement it. Suneido actually fits the requirements quite well - we can use it for a central server database, and run a client on the workstations. It's small and easy to deploy, and of course, we're very familiar with it. We'll see how it goes.

RethinkDB

I just listened to a podcast from the MySQL conference from RethinkDB about better database storage engines. Apart from being an interesting talk, a lot of what they were talking about parallels my own ideas in Suneido.

For example, they talk about log structured append-only data storage. Suneido's database has always worked this way.

Next they talk about append-only indexes. Suneido does not have this, but it is something I've been thinking about. (see my post  A Faster, Cleaner Database Design for Suneido). They have a different idea for reducing index writes. It's an interesting solution, but more complex. It won't be as fast as delaying writes, but it would allow crash recovery without rebuilding indexes (as Suneido requires).

I mostly arrived at these design ideas from basic principles e.g. immutable structures are better for concurrency. But it sounds like the file system folks have been working on a lot of the same ideas. It's hard to keep up with everything that might possibly be relevant.

The other interesting part of this talk was the idea that there are a lot of pieces that have to work together and performance depends on the combination. This means there is a huge possibility space that is hard to explore.

Wednesday, November 17, 2010

Java, Oracle, GUI, and jSuneido

A Suneido developer had a few questions that he thought I should blog about, so here goes. (Disclaimer: I'm not an expert on Oracle or Java and don't have any inside knowledge.)

What will happen with Java with Oracle buying Sun? Do I regret choosing Java to rewrite Suneido?

Oracle buying Sun does make me a little nervous but I don't think Java is going to go away, there is too much of it out there.

I don't regret choosing Java to rewrite Suneido. There aren't a lot of mainstream alternatives - .Net would be a possibility, with Mono on Linux. I think .Net is a good platform, but I like being tied to Microsoft even less than I like being tied to Oracle. And I think Java is less tied to Oracle than .Net is tied to Microsoft.

There are, of course, other alternatives like LLVM or Parrot, but they don't have the same kind of support behind them.

I've heard persuasive arguments (albeit mostly from Oracle) that it is to Oracle's benefit to keep Java alive and "open" since much of Oracle's software is written in Java. They might try to charge for stuff but probably at the enterprise end, which doesn't bother me too much.

I do wish Java moved a little faster. Java 7 is taking forever, and now a lot of it has been postponed to Java 8. Meanwhile, Microsoft has moved surprisingly quickly with advancing .Net. On the positive side, the new features in Java 7 for dynamic languages (JSR 292) will be very nice. And I do want stability, so I can't complain too much.

I don't think the Oracle buyout has any effect on jSuneido in the short term. I'm using Java 6 which is readily available.

What platform are you using to develop jSuneido?

I do most of my development on Mac OS X using Eclipse. I run the Windows cSuneido client using Parallels. I'm pretty happy with this. The only minor hassle was that Apple was really slow to release new versions of Java. And Sun/Oracle don't directly release OS X versions. You could get new OS X versions of Java from other places but it was an extra hassle. Now Apple has announced that they won't be distributing Java any more. This isn't a big deal - Microsoft doesn't distribute Java either. It would be nice if Oracle would add OS X as one of their supported platforms. (Not just on Open JDK.)

But this is only the development environment. In terms of deploying the jSuneido server I expect it will be mostly Windows and Linux. There aren't many people using OS X for servers. And Apple just discontinued their rack mount server.

I also do some development on my Windows machine at work. There are slight differences, but mostly I can use the identical Eclipse setup.

I haven't done any testing on Linux. I would hope it would be fine but there could be minor issues. If we were only working on Windows then there might be more, but Linux shouldn't be too different from OS X.

We are getting close to switching our in-house accounting/crm system over to jSuneido. This will be a good test since we have about 40 users. Currently we have a Windows server to run this system and a Linux server for other things. Once we are running on jSuneido we are hoping to get rid of the Windows server and just use the Linux one. So Linux support is definitely coming.

Where can I get a copy of jSuneido?

Currently, the only way to get it is as source code from Mercurial on SourceForge:

http://suneido.hg.sourceforge.net:8000/hgroot/suneido/jsuneido

I haven't started posting pre-built jar file releases yet, but probably soon. If anyone is interested in getting a copy to experiment with, just let me know and I can send you one.

What about a GUI for jSuneido?

Currently, jSuneido does not have any GUI. cSuneido's GUI is Windows based so it's not portable.

In the long run it would be nice to add a GUI to jSuneido. Then, eventually, I could stop supporting cSuneido.  Maintaining two parallel versions is a lot of extra work.

The conventional Java approaches would be Swing or SWT.

Another idea would be to try to use the newer GUI system from Java FX, but there's not much support for using it outside of FX yet.

Another possibility would be to switch to a web based GUI, even for local use. That's an intriguing idea that would be fun to investigate. The downside is that instead of just Suneido code, you'd have HTML and CSS and JavaScript and AJAX. Not exactly the self-contained model that Suneido has had up till now.

The bigger issue for us is that we have a lot of code based on the old GUI system. So a priority for us would be to minimize the porting effort.

Sunday, November 14, 2010

Upgrading Eclipse to Helios (3.6)

I recently upgraded my development environment for jSuneido to Helios (version 3.6) from Galileo (3.5).

Helios has been out since June but I needed to wait for the plugins I use to be updated. Actually, this was one of the things that nudged me to update since one of my plugins started giving errors on Galileo after they updated it for Helios.

It went quite smoothly. The plugins I use are:
  • Mercurial Eclipse
  • Bytecode Outline
  • EclEmma Java Code Coverage
  • FindBugs
  • Metrics (State of Flow)

A new version of Eclipse used to mean a bunch of great new stuff. But like most software products, it's matured and development has slowed down, at least in terms of major new features. In normal usage I didn't notice much difference.

One welcome addition is the Eclipse Marketplace (on the Help menu with the other update functions). EclEmma, Bytecode Outline, Mercurial Eclipse, and FindBugs can all be installed through the marketplace, which is a lot nicer since you don't have to go to their web site, find the url of the update site, copy it, and then paste it into Eclipse. The other plugins show up in the marketplace, but don't have an install button. I'm not sure why, but it's a new feature so you have to expect some hiccups.

A minor complaint is that the marketplace is implemented as a wizard, even though it isn't really a multi-step process. Wizards can be a reasonable approach, but I think they're overused sometimes.

Tuesday, November 09, 2010

Email Overload

And you thought you had too much email :-)


This was on the latest Thunderbird. Not sure what I did to trigger it - looks like some kind of overflow.

Friday, October 29, 2010

jSuneido Compiler Overhaul Part 2

After my last post I was thinking more and more about switching from single pass compiling to compiling via an Abstract Syntax Tree (AST).

I wasn't really intending to work on this, but I had a bit of time and I thought I'd just investigate it. Of course, then I got sucked into it and 10 days later I have the changes done. I'm pretty happy with the results. A big part of getting sucked into it was that it became even more obvious that there were a lot of advantages.

Previously I had lexer, parser, and code generator, with the parser directly driving the code generator. Now I have lexer, parser, AST generator, and code generator.

I'm very glad that from the beginning I had separated the parser from the actions. The parser calls methods on an object that implements an interface. This seems obvious, but too many systems like YACC/Bison and ANTLR encourage you to mix the actions in with the grammar rules. And doing this does make it easier at first. But it is not very flexible. Because I'd kept them separate, I could implement AST generation without touching the parser. Although I haven't used it, I understand Tatoo splits the parser and the actions. (Note: Although I initially tried using ANTLR for jSuneido, the current parser, like the cSuneido one, is a hand written recursive descent parser.)
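
To give a flavor of what I mean (a much-simplified made-up interface, not the real one), the parser only ever talks to something like this, so one implementation can generate byte code directly while another builds AST nodes:

import java.util.List;

// the type parameter is whatever the actions produce - e.g. nothing useful
// for direct code generation, or AST nodes for the tree-building implementation
interface ParseActions<T> {
    T constant(Object value);
    T identifier(String name);
    T binaryOp(String op, T left, T right);
    T call(T target, List<T> args);
}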

While I was at it, I also split off the parts of the code generator that interface with ASM to generate the actual byte code. The code generator is still JVM byte code specific, but at a little higher level. And now it would be easy to use an alternative way to generate byte code, instead of ASM.

When the parser was directly driving the code generation, I needed quite a few intermediate actions that were triggered part way through parsing a grammar rule. These were no longer needed and removing them cleaned up the parser quite a bit.

Having the entire AST available for code generation is a lot simpler than trying to generate code as you parse. It removed a lot of complexity from the code generation (including removing all the clever deferred processing with a simulated stack). And it let me implement a number of improvements that weren't feasible with the single pass approach.

Despite adding a new stage to the compile process, because it was so much simpler I'm pretty sure I actually ended up with less code than before. That's always a nice result!

The only downside is that compiling may be slightly slower, although I suspect the difference will be minimal. Code is compiled on the fly, but the compiled code is kept in memory, so it only really affects startup, which is not that critical for a long running server.

In retrospect, I should have taken the AST approach with jSuneido right from the start. But at the time, I was trying to focus on porting the C++ code, and fighting the constant urge to redesign everything.

Monday, October 25, 2010

Continuous Deployment

I listened to a podcast recently about a company doing continuous deployment. I was interested because my company does continuous deployment, and not just of a web application, but of a desktop app. From what I can tell, this is still quite rare.

They were asked how frequently they deploy, and they said every two weeks. I was a little surprised.  Obviously it's a long way from yearly releases, but to me, every two weeks is not exactly "continuous".

Part of the issue is that "continuous" often gets mixed up with "automated". For example, continuous build or integration systems are as much about the automation as they are about continuous. But the primary goal is the "continuous" part. The automation is a secondary requirement needed to satisfy the primary goal of continuous.

Of course, "continuous" seldom actually means "continuous". It usually means "frequent". Continuous builds might happen nightly, or might be triggered by commits to version control.

Our continuous deployment means daily, Monday to Thursday nights. We don't deploy Friday in case there are problems and we don't have anyone working on the weekends.

Our big nightmare is that something goes wrong with an update and all of our roughly 500 installations will be down. Of course, we have a big test suite that has to pass, but tests are never perfect. To reduce the risk we deploy to about ten "beta" sites first. Everyone else gets that update two days later. Having ten beta sites down is something we can handle, and they're aware they're beta sites so they're a little more understanding. In practice, we've had very few major problems.

We have a single central version control (similar to Subversion). Anything committed to version control automatically gets deployed. The problem is when we're working on a bigger or riskier change, we can't send it to version control until it's finished. But not committing regularly leads to conflicts and merge issues, and also means we're only tracking the changes with a large granularity and can't revert back to intermediate steps. Plus, version control is the main way we share code. If the changes haven't been sent to version control, it's awkward for other people to get access to them for review or testing. I think the solution would be a distributed version control system like Git or Mercurial where we can have multiple repositories.

I'm looking forward to reading Continuous Delivery although I think the focus is on web applications.

Saturday, October 23, 2010

What, Why, How

Say why not what for version control comments, and comments in code. The code itself tells you "what", the useful extra information is "why". Don't say "added a new method", or "changed the xyz method" - that's obvious from the code. Do say, "changes for the xyz feature" or "fixing bug 1023".

Say what not how in names. i.e. interface and variable names. The users shouldn't need to know "how", they just care about "what". You want to be able to change the "how" implementation, without changing the users. Don't call a variable "namesLinkedList", just call it "namesList". It might currently be a linked list, but later you might want to implement it with a different kind of list.

Sunday, October 17, 2010

jSuneido Compiler Overhaul

Implementing a language to run on the JVM means deciding how to "map" the features of the language onto the features of the JVM.

When I started writing jSuneido it seemed like the obvious way to compile a Suneido class was as a Java class, so that's the direction I took.

The problem is, Suneido doesn't just have classes, it also has standalone functions (outside any class). So I made these Java classes with a single method.

Suneido also has blocks (closures). So I compiled these as additional methods inside the class currently being compiled.

As I gradually implemented Suneido's features, this approach got more and more complex and ugly. It all worked but I wasn't very happy with it. And it became quite fragile, any modification was likely to break things.

So I decided to overhaul the code and take a different approach - compiling each class method, standalone function, or block as a separate Java class with a single method. I just finished this overhaul.

Of course, the end result is never as simple and clean as you envision when you start. It's definitely better, but there are always awkward corners.

Unfortunately, more and more of the improvements I want to make are running into the limitations of single-pass compiling, an approach I carried over from cSuneido. I have a feeling that sooner or later I am going to have to bite the bullet and switch to compiling to an abstract syntax tree (AST) first, and then generate JVM byte code from it. That will open the way for a lot of other optimizations.

Friday, October 15, 2010

Java + Guava + Windows = Glitches

Some of my jSuneido tests started failing, some of them intermittently, but only on Windows. There were two problems, both related to deleting files.

The first was that deleting a directory in the tear down was failing every time. The test created the directory so I figured it probably wasn't permissions. I could delete the directory from Windows without any problems. The test ran fine in cSuneido.

I copied the Guava methods I was calling into my own class and added debugging. I tracked the problem down to Guava's Files.deleteDirectoryContents which is called by Files.deleteRecursively. It has the following:

// Symbolic links will have different canonical and absolute paths
if (!directory.getCanonicalPath().equals(directory.getAbsolutePath())) {
    return;
}

The problem was that getCanonicalPath and getAbsolutePath were returning slightly different values, even though there was no symbolic link involved - one had "jsuneido" and the other had "jSuneido". So the directory contents weren't deleted, and the directory delete failed. From Windows Explorer and from the command line it was only "jsuneido". I even renamed the directory and renamed it back. I don't know where the upper case version was coming from. It could have been named that way sometime in the past. I suspect the glitch may come from remnants of the old short and long filename handling in Windows, perhaps in combination with the way Java implements these methods on Windows.

I ended up leaving the code copied into my own class with the problem lines removed. Not an ideal solution but I'm not sure what else to do.

My other thought on looking at this Guava code was that if that test were extracted into a separate method called something like isSymbolicLink, then the code would be clearer and they wouldn't need the comment. And that might make it slightly more likely that someone would try to come up with a better implementation.
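
i.e. something like this (my sketch of the suggested refactoring), with the check in deleteDirectoryContents becoming simply if (isSymbolicLink(directory)) return;

private static boolean isSymbolicLink(File directory) throws IOException {
    return !directory.getCanonicalPath().equals(directory.getAbsolutePath());
}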

The other problem was that new RandomAccessFile was failing intermittently when it followed file.delete. My guess is that Windows does some of the file deletion asynchronously, and it doesn't always finish in time, so the file creation fails because the file still exists. The workaround was to do file.createNewFile before new RandomAccessFile. I'm not sure why this solves the problem; you'd think file.createNewFile would have the same problem. Maybe it calls some Windows API function that waits for pending deletes to finish. Again, not an ideal fix, but the best I could come up with.
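
In code, the workaround amounts to something like this (simplified, ignoring the return values and error handling):

file.delete();
file.createNewFile();   // workaround - without this, the RandomAccessFile
                        // construction intermittently fails on Windows
RandomAccessFile raf = new RandomAccessFile(file, "rw");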

Neither of these problems showed up on OS X. For the most part Java's write-once-run-anywhere has held true but there are always leaky abstractions.

Tuesday, September 28, 2010

Using ProGuard on jSuneido

ProGuard is a free Java class file shrinker, optimizer, obfuscator, and preverifier.

I'd seen a few mentions of ProGuard and then when I was updating to the latest version of the Guava library I saw that they were recommending it. So I decided to try using it on jSuneido.

It shrunk the size of the jSuneido jar from 2.5 mb to 1.4 mb.  That's pretty good for an hour or two of setup.

I haven't tried the optimization yet, partly because the Guava instructions disabled it. I'm not sure why.

ProGuard also does preverification which should improve startup time, although that's not a big issue for the server usage that jSuneido is aimed at.

It took some trial and error to get a configuration file that worked. I'm not sure it's optimal yet. I still get some warnings from Guava (mentioned in their instructions) and from jUnit, but they don't appear to cause any problems.
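
For reference, a minimal ProGuard configuration for this kind of setup looks roughly like this (the jar names and keep rules are placeholders, not the actual file I use, which also needs the extra -dontwarn/-keep entries to quiet the Guava and jUnit warnings):

-injars      jsuneido.jar
-outjars     jsuneido-small.jar
-libraryjars <java.home>/lib/rt.jar

# optimization left off for now, per the Guava instructions
-dontoptimize

# placeholder - keep the public API and entry points of the application
-keep public class suneido.** { public *; }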

Thursday, September 23, 2010

Eclipse += Mercurial

It wasn't quite as easy as I would have liked to get the MercurialEclipse plugin working, but it certainly could have been worse.

It was easy to install the Eclipse plugin using the update site: http://cbes.javaforge.com/update

But when I tried to import my project from Mercurial I got a bunch of errors about passwords.

I found a post from someone with the same problem, with a pointer to instructions on how to solve it.

I guess the simple username + password doesn't work with MercurialEclipse, at least on OS X. Sigh.

The answer is to use an SSH key, which I'd avoided till now, but it turned out to be straightforward with the help of the SourceForge instructions. It sounds like it might be a bit trickier on my Windows box - I haven't tried that yet.

I tweaked a few things for the new setup, committed those changes, and pushed the changes to the SourceForge repository. Seems like I'm good to go :-)

The Revenge of the Intuitive

Interesting article by Brian Eno on the user experience problem of too many options, a problem we struggle with all the time with our software. The challenge is that it is the users themselves that keep asking for more options, despite the fact that it ends up making the software harder to use.

Wired 7.01: The Revenge of the Intuitive

Tuesday, September 21, 2010

Moving jSuneido from Subversion to Mercurial

I've been planning on moving from Subversion to a distributed version control system for a while. Initially I assumed it would be Git since that's what I heard the most about. But I recently listened to a podcast with Eric Sink where he said Mercurial had better Windows support and was simpler to use, especially for Subversion users. An article on InfoQ confirmed this. And Google code has Subversion and Mercurial but not Git. And Joel Spolsky and Fog Creek had picked Mercurial. Joel even wrote a tutorial for Mercurial. So I decided to give Mercurial a try.

When explaining distributed version control a lot of people start by saying there's no central repository, it's peer to peer. This throws off a lot of people because they want a central repository. It's the explanation that's wrong. You can have a central repository, and most people do. The difference is that it's a matter of convention, it's not dictated by the software. And you can also have multiple repositories.

For me, the advantages would be having the complete history locally, even when I'm off-line, and being able to easily branch and merge locally.

Here are the steps I used to convert jSuneido from Subversion to Mercurial. (I haven't converted C Suneido yet, but that shouldn't be too hard, just more history so it'll be slower.)

Download and install Mercurial on OS X from mercurial.selenic.com

Convert the existing SourceForge Subversion repository to a local Mercurial repository:

hg convert http://suneido.svn.sourceforge.net/svnroot/suneido/jsuneido

Ideally you wouldn't convert directly across the network. Instead you'd clone your Subversion repository to your local machine and convert from there. That's primarily so if you have to tweak your conversion and re-run it, you're not dealing with the network delays multiple times.

I was lucky since my Subversion history did not have anything tricky like branches to deal with. And cloning the Subversion repository looked painful. I figured I could probably get away with doing the convert directly. It ended up taking a couple of hours to run.
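
For the record, the usual recipe for a local mirror (which I didn't end up needing, and haven't tested) is something like:

svnadmin create suneido-mirror
echo '#!/bin/sh' > suneido-mirror/hooks/pre-revprop-change
chmod +x suneido-mirror/hooks/pre-revprop-change
svnsync init file://$PWD/suneido-mirror http://suneido.svn.sourceforge.net/svnroot/suneido
svnsync sync file://$PWD/suneido-mirror
hg convert file://$PWD/suneido-mirror/jsuneido

Then re-running the conversion only hits the local mirror, not the network.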

One nice thing about convert is that you can run it again to pick up incremental changes. (I had to do this once because I had forgotten to commit my latest change to Subversion.)

To check out the result I copied the repository to Windows (under Parallels) and installed TortoiseHg. The repository looked reasonable. (A little roundabout, but I wanted TortoiseHg anyway.) TortoiseHg seems to work well.

Enable Mercurial for the Suneido SourceForge project. Add a second repository for jsuneido. See sourceforge.net/apps/trac/sourceforge/wiki/Mercurial

Push the local Mercurial repository to SourceForge.

hg push ssh://amckinlay@suneido.hg.sourceforge.net/hgroot/suneido/jsuneido

I could also have done this with TortoiseHg (but don't look for a "Push" menu option; it's under Synchronize).

I can now browse the repository on SourceForge.

I found the series of three blog posts starting with "From Subversion to Mercurial, Part 1: Setting the Stage" quite helpful.

Next I have to figure out the Eclipse plugin for Mercurial. I hope it's less hassle than the Subversion one!

Stylebot

Changing the Look of the Web with Stylebot - Google Open Source Blog

A very nice Chrome extension that lets you easily create custom stylesheets for web sites. For example, I often want a different text size for a given site, or to hide certain annoying elements (e.g. ads!).

This is a good example of how Chrome seems to be pulling ahead in the browser competition.

Friday, September 17, 2010

Balsamiq Mockups

I've heard several people say good things about Balsamiq Mockups. I haven't used it myself but I'm planning to give it a try. I like how it produces something that looks like a sketch so people don't get the mistaken idea that it's a finished design.

Saturday, September 11, 2010

What Can't the iPad Do?

The girl next to me on the bus had an iPad, so I asked her how she liked it. I expected the usual "it's great", but instead I got "I don't." I asked her why, and she said there was too much stuff it couldn't do. I didn't get a chance to dig deeper.

Someone else I know bought an iPad and immediately got someone to jailbreak it for them. Why? Because there's too much stuff it can't do otherwise. When I asked for examples they couldn't really give me any. This is a non-technical person; it's not that they wanted to do anything special.

Obviously, there are things that an iPad is not ideal for. And there are things that an iPad can't do. But there are a huge number of apps, and for most people I just can't see what it "can't" do. Most people just do email and the Internet anyway.

It can't run big programs like Photoshop, but most people don't need that. It can't run Microsoft Office, but you can get word processing and spreadsheet apps.

The touchscreen keyboard is not great for typing a lot, but most people don't type a lot. And if you really need it you can use a Bluetooth keyboard. Most people are happy typing on their cellphone! (I manage to type quite long blog posts on my iPhone.)

I would guess one of the reasons for this idea is that people get overinflated expectations from all the hype. And when it doesn't (can't) live up to them, they have to blame it on something.

Which is where the critics come in. People don't remember the specific criticisms like the lack of multi-tasking or cameras or memory cards. All they remember is that "there's stuff it can't do".

And there's always the most common reason people say you can't do something - simply because they don't know how to do it. We get that all the time with our software. You ask people how they like the software and they say it's ok but they really wish it could do xyz. Half the time it already can, they just didn't know it.


Friday, September 10, 2010

iPhone Undo/Redo

I was getting ready to write a blog post asking why the iPhone doesn't have undo/redo. Selecting text is tricky, and it's easy to delete more than you wanted and then have no way to get it back. But when I did a quick search before writing the post, I found it does have it!

If you shake your iPhone you get an undo/redo popup. It appears you can even undo multiple steps, at least in the Notes app.

I'm surprised I didn't run across some mention of this before now. I'm not surprised I didn't discover it on my own. I don't tend to stop and shake my phone in the middle of typing! Maybe the idea is that you get so frustrated you shake your phone?

Honestly, I think this is poor design. First, it's not discoverable. And when people, even experienced ones, don't discover a feature, they're probably going to assume it doesn't exist. Second, when I'm typing I generally hold the phone steady in my left hand and type with my right. Shaking is not a natural action; it's like having to take your hands off the keyboard.

Wouldn't it make sense to put an undo key on the keyboard? It could be on the secondary punctuation layout. At least then it would be discoverable. You could still keep the shake interface - presumably someone at Apple thinks it's great.

Friday, August 13, 2010

Connecting the Dots

Sometimes I'm a little slow :-( 

One of the talks at the JVM conference was about a JVM for mobile devices. It used a single pass compiler with no intermediate representation. Suneido also does this but there are some issues with it. Suneido takes the obvious approach of generating a push when it encounters an operand, and an operation when it encounters an operator (after processing its operands). This works great with simple expressions like "a + b" which compiles to "push a, push b, add". But it falls down for more complex things, even for "a = b" since you don't want to push the value of "a". I work around these issues but the code is ugly. And worse, a lot of the ugliness is in the parser, even though the issue is really with code generation.

I wasn't happy with it but, consciously or not, I assumed it was essential rather than accidental complexity. So I didn't look hard enough for a better solution.

So I was curious to hear more about their approach. The topic only got one bullet point, something like "code generation by abstract interpretation". 

I started thinking about what that meant. I'm pretty sure what they meant is that you only generate code when you encounter an operator. When you encounter an operand like a literal or variable, you push it on a compile time stack, "simulating" execution. Then you can defer generation of code until you know what is needed.
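
To make that concrete, here's a toy sketch in Java of how I understand the technique - this isn't jSuneido's actual code generator, just an illustration of deferring emission until an operator knows what it needs (the instruction names are made up):

import java.util.ArrayDeque;
import java.util.Deque;

// Toy code generator using a compile-time stack of pending operands.
// Nothing is emitted for an operand until an operator decides whether it
// needs the operand's value (e.g. the right side of an assignment) or
// just its name (e.g. the target of an assignment).
public class DeferredCodeGen {
    enum Kind { VARIABLE, LITERAL, RESULT }

    static class Operand {
        final Kind kind;
        final Object value; // variable name or literal value
        Operand(Kind kind, Object value) { this.kind = kind; this.value = value; }
    }

    private final Deque<Operand> stack = new ArrayDeque<Operand>();

    // the parser calls these when it sees operands - no code is emitted yet
    public void variable(String name) { stack.push(new Operand(Kind.VARIABLE, name)); }
    public void literal(Object value) { stack.push(new Operand(Kind.LITERAL, value)); }

    // "a + b" - both operands are needed as values, so load them now
    public void add() {
        Operand right = stack.pop();
        Operand left = stack.pop();
        load(left);
        load(right);
        emit("ADD");
        stack.push(new Operand(Kind.RESULT, null));
    }

    // "a = b" - the target is never loaded; only its name is needed for the store
    public void assign() {
        Operand value = stack.pop();
        Operand target = stack.pop();
        load(value);
        emit("STORE " + target.value);
        // for simplicity, treat assignment as a statement (push nothing)
    }

    private void load(Operand op) {
        switch (op.kind) {
        case VARIABLE: emit("LOAD " + op.value); break;
        case LITERAL: emit("PUSH " + op.value); break;
        case RESULT: break; // already on the run-time stack
        }
    }

    private void emit(String instruction) { System.out.println(instruction); }

    public static void main(String[] args) {
        DeferredCodeGen cg = new DeferredCodeGen();
        // a = b + 1  ==>  LOAD b, PUSH 1, ADD, STORE a
        cg.variable("a");
        cg.variable("b");
        cg.literal(1);
        cg.add();
        cg.assign();
    }
}

The parser still just reports operands and operators in the same order as before; the decision about what to emit moves entirely into the code generator, which is where it belongs.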

As I thought about this, I realized I knew about this technique, I'd read about it before. But for some reason I never connected it to my code generation issues.

So when I got home from the conferences I spent three days refactoring the code generation in jSuneido. It wasn't quite as straightforward as I imagined (it never is!). The parser code is definitely cleaner but I had to add more code to the code generator than I would have liked. Overall, I think it's better. I broke a lot of tests in the process, but I finally got them all passing again last night. 

Now I can quite easily add a number of optimizations that I've been wanting to do:
- call private methods directly (no method lookup required)
- call built-in global functions/classes directly

However, there are still things I'd like to do that are difficult with a single pass compile. For example, knowing which local variables are read or written by closures. One way to handle this would be to re-compile if the assumptions were wrong, using information from the first try to do it correctly the second time. But that seems ugly. Maybe I need to drop the single pass compile and just generate an AST. 

Wednesday, July 28, 2010

JVM Languages Summit Day 2


Another good day of talks, starting with Doug Lea on parallel recursive decomposition using fork-join. Amazing how much subtle tweaking is required to get good performance.

This led into Joshua Bloch's talk on performance. There is so much complexity in all the layers from CPU to OS to language that performance varies a lot. He showed a simple example of a Java program that gave consistent times on multiple runs within a given JVM, but sometimes when you restarted the JVM it would consistently give quite different results! Cliff Click's theory was that it was caused by non-deterministic behavior of the JIT compiler, since it runs concurrently. The behavior is still "correct", it can just settle into different states. The solution? Run tests over multiple (e.g. 40) different JVM instances. That's on any given machine; of course you should also test on different CPUs and different numbers of cores. Easy for them to say.
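
Out of curiosity, here's a rough sketch (mine, not theirs; the benchmark class name is a placeholder) of what driving the same benchmark across many fresh JVM instances could look like in Java:

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Launch the same benchmark in 40 separate JVMs, since the JIT can settle
// into different (but equally "correct") states in each instance.
public class MultiJvmBench {
    public static void main(String[] args) throws Exception {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        for (int run = 0; run < 40; ++run) {
            List<String> cmd = new ArrayList<String>();
            cmd.add(java);
            cmd.add("-cp");
            cmd.add(System.getProperty("java.class.path"));
            cmd.add("MyBenchmark"); // placeholder for the real benchmark's main class
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
            for (String line; (line = in.readLine()) != null; )
                System.out.println("run " + run + ": " + line);
            p.waitFor();
        }
    }
}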

Neal Gafter talked about Microsoft's LINQ technology - pretty cool, although nothing to do with the JVM. 

Kresten Thorup talked about his Erlang implementation on the JVM using Kilim for coroutines. Erlang is an interesting language, and quite different from Java so it was interesting to see how he implemented it. He actually runs the byte code produced by the existing Erlang compiler. 

I talked to Remi Forax about whether I should use his JSR292 backport in jSuneido. This would let me use the new technology before Java 7 is released (who knows when). Of course, he said I should. But ... it means developing with the "beta" JDK 7 which still has bugs and is not supported by IDE's. And then it requires an extra run-time agent. I'm not sure I want to complicate my life that much!

Monday, July 26, 2010

JVM Languages Summit Day 1

My hotel is "behind" the Sun/Oracle campus so I had to circle around through the endless acres of parking lots to get to the right entrance. But it was pretty easy, and not as far as it looked on the map. I've never had much to do with giant organizations, so it's still a little mind-boggling when you use an automated kiosk (like the ones at the airport) to get your visitor's badge.


The sessions were a real mixture, from stuff where the nuances were pretty much over my head, to thinly veiled sales pitches with no technical details. It was pretty neat to be in the company of people you're used to thinking of as gurus, like Doug Lea, John Rose, Joshua Bloch, and Cliff Click. And nice to see they have their frustrations with the technology also.


It's naive of me, but unconsciously I expect really smart people to be "rational", and it's always a bit of a disappointment when that proves to be untrue. Smart people have egos, are insecure, or argumentative, or negative, or defensive, or obnoxious, just like everyone else. 


Maybe I'm just too soft, but I felt bad for one guy who basically got told he was doing it wrong. It seems like they could just as well have asked "did you consider ..." or "what would you think about ...", rather than just "that's wrong, you should have done ..."


Mostly I just listened to the conversations. In the Java / JVM area I still feel like a relative novice and I'm not sure I have much to contribute yet. But for the most part I didn't feel too much out of my depth, so that's good.

Sunday, July 25, 2010

OSCON Wrapup

The last two days of OSCON were good.

I got to hear Rob Pike talk about the Go language. Rob is a legend in software and Go is a cool language. In some ways Go is more like C or C++ in that it's compiled to machine code with no VM. But unlike C++ it compiles extremely fast. For what he called a party trick, he showed it compiling every time he typed a character - and it kept up. It has features I miss in Java, like pointers (but safe) and values on the stack (instead of everything being heap allocated). It also has "goroutines" - lightweight threads, like coroutines. But its attractions aren't quite sufficient to tempt me away from Java. They don't even have a Windows version yet, let alone all the support libraries and frameworks that Java has.

I also got to hear another legend, Walter Bright, talk about his D language. I used Walter's C and C++ compilers for many years. D also has some very interesting features. Andrei Alexandrescu of Modern C++ Design fame is now working on D and has written The D Programming Language book.

One feature D has that is sorely missing from other languages is the ability to declare functions as "pure" (ie. no side effects) and have the compiler verify it. This (not lambdas) is the key to functional programming. And yet languages like Scala that claim to support functional programming don't have this.

I also went to a talk by Tim Bray, whose blog I read. He was working at Sun but moved to Google after the Oracle buyout. His talk was on concurrency. He was very pro Erlang and Clojure but didn't mention Scala. When asked about it he said he thought the Scala language was too big. It does have a sophisticated type system, but the actual syntax is quite simple - smaller than Java's. Scala's Actor implementation has been criticized, but it's been improved in 2.8 and there are alternatives like Akka.

Wednesday, July 21, 2010

OSCON Another Day

After the less than thrilling keynotes, I went to another Scala session. A lot of it was repetition, but I learn a bit more at each one. The recurring theme seems to be: if you're going to use Scala, use Simple Build Tool (SBT).

Next I went to a talk on GPars, a Groovy concurrency library. The talk was as much about concurrent programming patterns as about Groovy, which suited me.

Next, I was going to go to one of the database talks, but then I realized it was a vendor presentation and they're usually more sales spiel than technical. Looking for an alternative I noticed Robert (R0ml) Lefkowitz had a talk on Competition versus Collaboration. If you've never listened to one of his talks, it's well worth it. They are as much performances as presentations and always thought provoking. (search for Lefkowitz on the Conversations Network)

Next was a talk on Clojure (a Lisp that runs on the JVM). The title was "Practical Clojure Programming", which sounds like an intro, but it was actually about build tools, IDE's, and deployment. Like most of the audience, I would have rather heard more about Clojure itself. I guess we should have read the description more closely.

Finally, I snuck into the Emerging Languages Camp to hear Charles Nutter talk about Mirah (formerly Duby), his close-to-the-JVM-but-Ruby-like language. I would have been tempted to go to the Emerging Languages Camp (it was even free), but by the time they announced it, I'd already signed up for the regular sessions.

All in all, a pretty good day. Lots of food for thought, which is, of course, the point.

One thing I forgot to mention yesterday is that a lot of the Scala people are also (or ex-) Ruby/Rails people.  Maybe that's simply because they're the kind of people that like to learn/adopt new things.

But a lot of people left Java and went to Ruby, so it's surprising to see them coming almost full circle back to Scala. Scala is better than Java, but it's still a statically typed language like Java, which is part of what people seemed to reject. Maybe Ruby wasn't the silver bullet they were hoping for. Maybe it was performance issues. Maybe they realized that static typing does have advantages after all. Maybe they realized the advantages of running on the JVM (although jRuby allows that). Maybe Scala's improvements over Java are enough to win people back.

Tuesday, July 20, 2010

OSCON Part One - Scala

I just finished the first day and a half of OSCON - in the Scala Summit. And contrary to my anti-social nature, I even went to a Scala meetup last night.

As I've written before, I'm quite intrigued with the Scala programming language. It runs on the JVM and interfaces well with Java. It combines functional and object-oriented programming. It has actors for concurrency. And it has a rich type system that is probably Turing complete, like C++ templates. 

Scala seems to be attracting a lot of attention. Of course, the Scala Summit attendees are all keen, but there were also several other language developers (like Charles Nutter of jRuby) interested in borrowing ideas from Scala.

Programming in Java is like driving a truck. It's not fast or fuel efficient or fun to drive or comfortable. But it gets the job done without a lot of problems and it can haul just about anything. 

Why do people like to drive a fast car? It's not like they regularly need to go from 0 to 60 in 5 seconds, or hit 200 km/hr. But they like to know that if they "need" to, they could.

Similarly, I think people like to know their language "has a lot of power under the hood", even if they never use it. And it gives lots of potential for books and conference talks that keep people excited.

And just as you could use C++ simply as a better C, you can use Scala as a better Java. This lets less sophisticated users get involved as well.

The funny part is that it's very reminiscent of C++. There were all kinds of talks and articles and books about all the cool stuff you could do with templates and other fancy C++ features. Now, all you hear are negative comments about C++: too complicated, too hard to use, etc. I'm starting to think I'm the only person that liked C++.

But there are differences too. Scala didn't try to be backwards compatible with Java the way C++ was backwards compatible with C. And Java already had object-oriented programming, so Scala isn't as big a jump as C++ (in that respect). 

If nothing else, it's made me really want to spend some more time with Scala. It will sound crazy, but I'm very tempted to rewrite jSuneido in Scala. Thankfully, that's something that could be done gradually. I think it would let me make the code much smaller and cleaner. I think it would also make it easier to switch back and forth with Suneido coding, since Scala has things like named and default arguments, optional semicolons, and type inference that make the code much more similar to Suneido code.

Wednesday, July 07, 2010

Time Capsules

I use an Apple Time Capsule for my home wireless router and backup storage.

Time Capsules have a bad reputation for dying, and I've had mine for quite a few years, so I was a little nervous about it. If it died I wouldn't have a backup of my iMac, which would be OK unless the iMac happened to die at the same time. That seems unlikely, but it's surprising how often you do get simultaneous failures - for example, a power surge due to lightning. I can also keep the external drive at work so I have an offsite backup in case the house burns down.

I had a 500GB external drive that I'd used for backups, but it's not big enough to do a complete backup of my 1TB iMac. So I went and bought a LaCie 2TB external drive and used SuperDuper to make a backup. I used the free version, but I'll probably get the paid version so I can update the backup without redoing it.

Then I decided I should also back up my MacBook and my Mac mini. I didn't have much critical stuff on them, but a backup would save hassle if I needed to restore. But SuperDuper takes over the whole drive, so how could I back up additional machines? The answer seemed to be to partition the drive, but I didn't want to have to redo my iMac backup (600GB takes a while to back up, even with FireWire 800). I searched on the web and found various complicated ways to resize partitions. Finally, I found that with recent OS X you can resize right from Disk Utility. All the complicated instructions were for older versions of OS X.
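
(For what it's worth, there's also a command-line equivalent - something like "diskutil resizeVolume disk2s2 500G", where the device id and size are just examples - but Disk Utility was all I needed.)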

The "funny" part of this story is that a few days later I went to use wireless and it wasn't working. I went and checked on the Time Capsule and it was turned off. Strange, because I leave it running all the time. I turned it on and about 10 seconds later it turned itself back off.

My computer was still fine, so I didn't actually need the external backup, but I was glad to have it nonetheless.

I phoned the local Apple dealer (Neural Net). The receptionist wanted me to bring it in and they would look at it in the next few days. I didn't want to be without internet for days so she let me speak to the technician. When I described the symptoms he said the power supply had died. But Apple doesn't let them repair them and doesn't supply any parts. Apple has been promoting their environmentally friendly products, but no matter how they're built, a non-repairable "disposable" product isn't very environmentally friendly.

I would have preferred to give Neural Net the business but they didn't have any Time Capsules in stock so I picked up a 2tb Time Capsule from Future Shop. 10 minutes after opening the box I was back up and running. (Although the initial Time Machine backups took considerably longer.)

One nice thing is that Time Machine backups seem to be a lot less intrusive. Before, when Time Machine kicked in it would really bog down my machine. If I was trying to work I'd often stop it. But now I don't even notice when it runs. I'm not sure how a new external device would change the load on my computer, but it's nice anyway.

My old Time Capsule ran quite hot, even when it wasn't doing anything. I was hoping the new ones would be better, but it seems much the same. I haven't measured it, but I assume heat means power consumption. I'm not sure why it can't go into a low power mode when it's not active. The other reason they run hot is that they have no ventilation or heat sink. Apparently there is an internal fan but all it does is stir the air around inside a small sealed box. You'd think they could come up with some better heat management without compromising their design. I would guess the heat is one of the big reasons they have a reputation for dying. Electronic components tend to have a shorter life at higher temperatures.

Rather than throwing out the old Time Capsule, I passed it on to one of the guys at work that tinkers with hardware. I thought he could at least extract the 1tb drive. But he managed to repair the power supply and is using the whole unit. I guess the hardest part was getting the case open! I'm glad it was saved from the landfill for a while longer.

Now that I have a bigger drive I thought I might as well back up Shelley's Windows machine as well. I have it set up with Mozy for on-line backups, but just with the free 2GB account, so I'm only backing up selected things. I'd seen that Mozy will now back up to external drives, so I thought I'd set this up. Unfortunately, it only backs up to directly attached drives (i.e. internal or USB), not to network drives. I'm not sure what the logic is behind that choice. I could use different software, but I think what I'll do (I haven't got round to it yet) is use the old 500GB external drive.