Jan 5, 2012

Pygmentize files and paste them into Microsoft Word in OS X

Writing some documents using Microsoft Word and I wanted to be able to drop in syntax highlighted versions of the source code. Pygmentize will generate RTF format, which is great for pasting in to Word, but you have to get it there. The best way I've found is using the OS X terminal (I like using iTerm2). I then use a feature of the command line open command, which makes it read from stdin and open the output in TextEdit. This text can then be cut and pasted directly in to Word, preserving all the formatting.

Run the command:

pygmentize -f rtf <file to highlight> | open -f

Then in the TextEdit window, press


c to select all and copy it. Switch to Word and hit

v to paste.

There are comments.

Simplified VPI iterators using PyHVL generators

I've been using PyHVL for a variety of verification tasks in the past few years. PyHVL is an open source Python integration for Verilog and SystemVerilog simulators. To give a quick taste of what it can do for you, consider the following SystemVerilog VPI C code.

void display_nets(mod)
    vpiHandle mod;
       vpiHandle net;
       vpiHandle itr;
       vpi_printf("Nets declared in module %s\n",
       vpi_get_str(vpiFullName, mod));
       itr = vpi_iterate(vpiNet, mod);

       while (net = vpi_scan(itr))
          vpi_printf("\t%s", vpi_get_str(vpiName, net));
          if (vpi_get(vpiVector, net))
             vpi_printf(" of size %d\n", vpi_get(vpiSize, net));
          else vpi_printf("\n");

Here is the equivalent VPI code, this time written in Python, using PyHVL.

def display_nets(module):
    print 'Nets declared in module', get_str(vpiFullName, module) 
    for net in vpi_iterator(module, vpiNet):
        print '\t%s %s' % ( get_str(vpiName, net), 
                get(vpiVector, net) and 'of size %d' % get(vpiSize, net) or '')

The magic happens in the implementation of the vpi_iterator() method, which uses a Python yield instruction to turn the method into a generator. Generators are much like functions, except they maintain a frozen stack frame, at the point where the yield occurs. All existing variables within the method maintain their state and execution picks up where it left off, just after the yield. The example also uses lazy evaluation of the result of get(vpiVector, net) to either call get(vpiSize, net) or not print the 'of size' additional string.

def vpi_iterator(handle, type=vpiNet):
    iterator = iterate(type, handle)
    if iterator:
        handle = scan(iterator)
        while handle:
            yield handle
            handle = scan(iterator)

This lets you write loops inside out as one of my colleagues aptly put it. The outcome is you can simplify the management of loops and indices and focus on the point of the loop. You write less code, you introduce fewer bugs. The code is easier to read and maintain as a result. This is just a very small example of some of the power of using a modern scripting language like Python, as an adjunct to a SystemVerilog simulator.

If you spend much time writing VPI code, you should take a look at PyHVL. It could make your life much simpler, or get in touch and I can help you with it.

There are comments.

Jan 8, 2009

maker's schedule, manager's schedule

I've always had a dread of mid-afternoon status meetings. Now maybe I understand it a bit better, because of Paul Graham's excellent essay on the difference between being on a maker's schedule and being on a manager's schedule. Seems to share a lot of ideas with Csikszentmihalyi and his ideas of creativity and flow states.

There are comments.

Jan 12, 2008

image recovery

I take a lot of pictures. On occasion, I get too impatient when downloading images from my compact flash cards. I'll swap the card without ejecting it properly and sometimes the cards get corrupted. Typically, when that happens the file allocation table of the previous card that was in the reader gets written onto the new card, or the FAT gets corrupted in some other way. The images are still there, but you can't access them. This happened to me last weekend and I didn't have any recovery software on this laptop. I had a look around online and the only recovery programs I could find were close to $100. I had a bit of free time so I decided to try writing my own instead. Turns out a basic recovery tool is actually really simple to put together.

A couple of things made it possible to do quite simple image recovery, successfully. Firstly, I always format the cards in the camera before I use them. So I know when the camera is writing images to the cards, the card is empty. Secondly, I never delete images in the camera. This means there is no fragmentation on the drive. The images are simply stored sequentially on the memory. The FAT format is fairly simple, based on sectors that are multiples of 512 bytes in size, that are collected together in clusters that vary depending on the disk formating. Images are written into linked lists of those clusters. Potentially the clusters could be fragmented across the drive, particularly if images are deleted and new ones stored on the disk. With a clean start and no images deleted, it is reasonable to assume that the images will just be stored on concurrent clusters. I think damaged sectors are managed at the a physical level on the disks, so they are mapped out of the available space (feel free to correct me on this). Anyway, with these assumptions made, it is possible to write a simple tool to parse a disk image and extract images, with a high likelihood of a successful result.

The first step is to get the data. I did the recovery on a unix system and used dd to get the initial image. You have to dump the actual physical device, not one of the disk partitions (as those are essentially what has become corrupt)

dd if=/dev/rdisk1 of=image.img bs=512

The block size is set to 512 to match the formating of the compact flash card. This step takes a while, but eventually you'll have an image file, image.img which is a low level copy of the data on the drive. The next step is to work out a way to identify the files you are looking to recover. I wrote a simple hex dump tool that prints the first few bytes of a file. I used this on a representative sample of the Canon cr2 RAW files to get a search key to identify the start of a file.

--- show_header.py ---

import sys

file = open(sys.argv[1], 'rb')

header = file.read(12)

headerhex = header.encode('hex')

print headerhex

--- end show_header.py ---

This little bit of python can be applied to a group of files with xargs

ls *.cr2 | xargs -n 1 python show_header.py

From that output, it is easy enough to find a representive number of bytes that can be used to identify the start of a file. I also had recorded some audio with the camera, so did a similar process with .wav files to extract them correctly.

Then all you have to do is iterate through the disk image in block_size chunks, checking for those file signatures at the start of each sector. When you find a file signature, start dumping all the data to a new file, until you find another signature. That's all there is to it. Note that there are no warranties with this. I'm offering no guarantees that it will work, or even will not wipe your computer. Use at your own risk. With this I was able to recover the 150+ images that I'd taken and several audio files. It actually works surprisingly quickly once the disk image has been made. Also worth mentioning that the JPEG header matching is untested, as I didn't have any JPEG files on this particular disk, but is included here for completeness.

Download the source for image_recovery.py (you'll probably need to change the file extension - web server doesn't like serving .py files)

There are comments.