Recently I've been reading about the various open-source Continuous Integration servers that are available. This chart gives a good feature comparison of many of the systems that are out there. We've been evaluating CruiseControl on some internal projects and generally trying to understand what the issues are with deploying a CI server on a hardware design project.
One of the things I've been struggling with is the real, meaningful difference between continuous integration and the more typical daily build and check-in smoke tests. Scheduled builds are often described as an anti-pattern when considering CI, but as far as I can tell the only practical difference is in the frequency of the builds. Certainly, you have the potential to find out about a broken build more quickly with CI, so it has less chance to impact other users. Also, you are always doing useful work, rather than perhaps re-running a version because no check-in has happened since the previous build. However, these distinctions are perhaps more significant in the software world than in hardware design projects, where build times are typically longer.
This, then, is the key point: build time is the significant factor in CI. The real benefit of using CI successfully is that you need to refine your processes to keep the build as quick as possible, striving for close to a 10 minute turnaround time, to stop things getting backed up. The consequence of this is that the entire Checkout-Build-Test loop keeps being optimised and refined. This doesn't just help with automated processes but can significantly improve productivity for the developers who do these steps manually every day. That's great for the software world, but is it really practical for hardware design with the current compilers and the speed of typical RTL simulation? If not, is it worth even bothering with?
The source control tools you choose can have a big impact on the first phase of the Checkout-Build-Test loop. When an update can take a quarter of an hour or longer, the source control system can become a significant productivity drain and stymie any chance of a quick turnaround. If merging changes, updating source and checking in a new revision takes hours, then there are real problems with the process and you certainly won't be doing multiple check-ins per day (another fundamental CI axiom is that every developer checks in at least daily, and preferably more often). In Linus' talk about Git, he describes being able to do a diff and merge on an entire kernel source tree (22k source files) in less than a second. I've used other SCM environments with similar amounts of code, where an update might take 30 minutes. These sorts of differences can significantly change how you work. It isn't just a matter of the time that the particular automated task takes, but what the developer does while they are waiting: reading email, writing documentation, or switching context to other distractions.
Similarly, if the build takes half an hour or longer before it fails with a trivial syntax error, you'll have switched to something else and then have to mentally switch context back again to work on the problem. Each of these switches has an associated, measurable attention and productivity hit. Improving the build step can have a big impact on how you manage your attention and keep engaged with the development process. A faster turnaround can stave off the onset of Zerstreutheit.
The final Test step is significantly slower for hardware design. Often this is used as a justification not to bother optimising the Checkout and Build phases, because they are comparatively much shorter. A multi-day or week long regression might fool you into thinking that an hour long build is relatively good. However, simulation & testing is the one step where the developer can be more out of the loop, with less impact. Typically, the user is not so tightly coupled into the testing loop, once the initial bugs are ironed out. Automation can certainly help here too, re-running failing tests with waveform dumping enabled or increased logging, to present a useful working environment for debug when the developer does come back to look at the fails.
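As a rough illustration of the automation described above, here is a minimal Python sketch that takes a list of test results and builds a debug re-run command for each failure. The `run_sim` command and its `--test`, `--seed`, `--dump-waves` and `--verbosity` flags are placeholders invented for the example, not the options of any real simulator, and `SimResult` is a hypothetical record type.

```python
from dataclasses import dataclass


@dataclass
class SimResult:
    """Outcome of one simulation test (hypothetical record type)."""
    name: str
    seed: int
    passed: bool


def debug_rerun_commands(results, sim_cmd="run_sim"):
    """Build one re-run command per failing test, with debug options on.

    `run_sim` and its flags are placeholders, not a real simulator CLI.
    """
    commands = []
    for r in results:
        if r.passed:
            continue  # only failures are worth the cost of a debug run
        commands.append([
            sim_cmd,
            f"--test={r.name}",
            f"--seed={r.seed}",   # reuse the seed so the failure reproduces
            "--dump-waves",       # hypothetical flag: enable waveform dumping
            "--verbosity=debug",  # hypothetical flag: increased logging
        ])
    return commands


if __name__ == "__main__":
    results = [
        SimResult("dma_basic", 1, True),
        SimResult("dma_burst", 42, False),
    ]
    for cmd in debug_rerun_commands(results):
        print(" ".join(cmd))
```

The commands are returned as argument lists rather than executed, so the same function could feed a job scheduler or a local run, whichever the flow uses.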
The point really is that there is still a significant advantage to be had in spending effort to optimize the SCM and compile stages in a development flow, to maximise designer productivity and attention, even if the simulation time is large. Also a build pipeline can be used in the CI server to stage the build and testing feedback, to further mitigate the length of time that running tests takes. Deploying CI brings attention to how long these processes take and might help improve the entire development environment. Having fast enough tools can help the developers keep focused on what they are doing, without breaks for swordfights or reading email. Optimizing the build is still important, even in a hardware design environment, even when the runtime for regressions might be in terms of days or weeks. You might think the build process is only a comparatively small part of the overall runtime for a regression, but the designers spend most of their time looping through that comparatively small part.
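To make the staged pipeline idea concrete, here is a minimal Python sketch, not tied to any particular CI server. It runs the stages in order, stops at the first failure so the quick stages report back before the slow ones start, and records how long each stage took. The stage names and the lambda bodies are placeholders; a real flow would call out to the SCM, compiler and simulator instead.

```python
import time


def run_pipeline(stages):
    """Run (name, callable) stages in order; stop at the first failure.

    Each callable returns True on success. Returns a list of
    (name, passed, seconds) for the stages that actually ran.
    """
    report = []
    for name, stage in stages:
        start = time.perf_counter()
        passed = bool(stage())
        report.append((name, passed, time.perf_counter() - start))
        if not passed:
            break  # later, slower stages are skipped
    return report


if __name__ == "__main__":
    # Placeholder stages standing in for real checkout/compile/test steps.
    stages = [
        ("checkout", lambda: True),
        ("compile", lambda: True),
        ("smoke", lambda: False),
        ("regression", lambda: True),
    ]
    for name, passed, seconds in run_pipeline(stages):
        print(f"{name:12s} {'ok' if passed else 'FAIL'} {seconds:.3f}s")
```

Keeping the per-stage timings in the report is the part that matters for the argument here: it makes the cost of the checkout and compile steps visible next to the regression time.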
So what is the takeaway? Does CI have a place in a hardware design flow? I think that CI servers can certainly be used to manage running regressions and nightly builds. Smoke tests and scheduled build approaches can be controlled with most of the CI servers. However, the real continual building process required to move from scheduled builds to CI seems to be hard to map to hardware design, simply because of the length of time of the checkout/build/test loops. Tool improvements and generally faster hardware seem to be key to increasing the frequency of integration tests, at least for now. However, optimizing the interaction between the users and automated tools is a key and often overlooked part of developing an effective design flow, whether or not you plan on using CI.

Good article. One quibble, you wrote:
The final Test step is significantly slower for hardware design. Often this is used as a justification not to bother optimising the Checkout and Build phases, because they are comparatively much shorter.
This is not necessarily true in my experience. The first level of testing is so-called smoke/sanity, where only a very small subset of tests runs for each environment. Assuming an environment does intelligent things like runtime loading or program block switching based on runtime switches, your "Build" may be much more expensive than the run, especially the further up the integration chain you go (subsystem, chip).
Of course this depends on your cutoff of Build and Test. I'm assuming simulator compile is part of Build.
Posted by: Etan | August 28, 2008 at 06:27 PM
Hi Etan,
Thanks for your comment. What you say is certainly true for basic operation tests like smoke/sanity tests, though even they can be relatively expensive. But where the difference really comes into effect is in the regression environments, which may take weeks to run just once. I've certainly heard this run-time used on occasion to explain away slow build & SCM tools, as they are only a relatively small percentage of the overall time.
I think my main concern is that given the combination of the 3, for reasonably complex HDL designs, the principles in CI don't really map too well to the current state of the art. I am curious to understand why that is - at least for the SCM & build phases. I can understand why simulation time is maybe larger than for software testing, but I'm not convinced that a similar argument applies to the other parts of the hardware design Checkout/Build/Test cycle.
Is there something very different about compiling an HDL design for simulation, compared to compiling a C/C++ program? If not, is it then possible to apply the best practices from the software world to significantly accelerate the typical HDL build? It just doesn't appear to get as much attention, unless I'm missing something fundamental (I'm certainly not a compiler expert).
Posted by: Gordon | August 28, 2008 at 07:37 PM
Hey Gordon,
Just curious - how long would you expect it takes to compile something like Microsoft Windows? What about running the regression suite? I would expect that would take significantly longer than what a large percentage of hardware design teams experience.
JL
Posted by: JL Gray | August 29, 2008 at 09:44 AM
Hi JL,
I'd really hope that it wouldn't be a single build step for the whole shebang, would you expect it to be? I would think that it would be composed of a lot of hierarchical parts, with sub environments and tests associated with each section.
But also, at least from what I've read, Microsoft uses the daily build approach too (one of the Joel on Software links in the post obliquely mentions that). I don't think they are doing CI either, though maybe that's changed.
Posted by: Gordon | August 29, 2008 at 10:15 AM
Gordon,
My point was that I don't know that it's a true statement that software is so much simpler than hardware (thus not explaining why software guys use continuous integration and hardware guys don't). Hardware designs obviously are broken up and tested in pieces as well.
JL
Posted by: JL Gray | August 30, 2008 at 03:32 PM
You have made very good points. A speedy checkout and build flow is key to quick resolution of the problem at hand, mostly when you need to do a bug fix. My experience is that long checkout and build leads to parallelization, pipelining and context switching for the person. Rebuilding the zone after a context switch takes time, and you get fewer things done when this happens!
Does working on multiple checkouts in parallel in order to be more effective by pipelining your work, turn into using more licenses for longer periods of time, and needing more disk space (due to the multiple checkouts)? I do not know. But I know I hate my long build times.
Posted by: Martin d'Anjou | October 21, 2008 at 10:09 PM
Hi Martin, thanks for commenting. The disk space and license issue may be true. I think the attention management might be an even higher cost, though less obvious.
Posted by: GordonMcGregor | October 22, 2008 at 07:57 AM
I was just wondering how your evaluation of CruiseControl went.
Posted by: Martin d'Anjou | February 05, 2009 at 02:00 PM