Planet GLLUG

March 09, 2010

Richard WM Jones

rich

Something unknown was changing the labels on certain devices behind my back. We couldn’t find out what it was using ordinary diagnostics, so I decided to investigate if we could do this with SystemTap. I quickly found an existing script to monitor changes in ordinary file attributes. This won’t work for SELinux labels though because those are stored in ext2/3/4 extended attributes (xattrs).

Basically I had to modify that script to monitor calls to setxattr instead.

Using the LXR I found that the call is implemented in Linux in fs/xattr.c, function vfs_setxattr. I had to modify the script to probe that kernel function, and the parameters are slightly different too.

I also had to install the correct kernel-{,PAE-}debuginfo package corresponding to my installed kernel. This is how SystemTap is able to resolve symbols in the current kernel.

/* Watch changes to xattrs on an inode.
 * http://rwmj.wordpress.com/2010/03/09/tip-use-systemtap-to-monitor-selinux-changes-to-files/
 */

probe kernel.function("vfs_setxattr")
{
  dev_nr = $dentry->d_inode->i_sb->s_dev
  inode_nr = $dentry->d_inode->i_ino

  if (inode_nr == $1)
    printf ("%s(%d) %s 0x%x/%u %d %s %s\n",
      execname(), pid(), probefunc(), dev_nr, inode_nr, uid(),
      kernel_string ($name), kernel_string_n ($value, $size))
}

Then run it with:

# stap -v /tmp/inodewatchxattr.stp inodenum
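The script takes the inode number of the file to watch as its only argument. A minimal sketch of looking that number up and building the command line follows; the helper names `inode_of` and `stap_command` are made up for illustration, and the script path is the one used above. Note that the probe compares only the inode number, not the device, so a file on another filesystem with the same inode number would also match; the device is printed in the output so you can tell them apart.

```python
import os
import tempfile


def inode_of(path):
    """Return the inode number of path (symlinks are followed)."""
    return os.stat(path).st_ino


def stap_command(script, path):
    """Build the stap command line that watches the inode of path.

    Hypothetical helper: it only assembles the arguments, it does
    not run stap.
    """
    return ["stap", "-v", script, str(inode_of(path))]


# Demonstrate on a throwaway file; in practice you would pass the
# device node or file whose label keeps changing.
with tempfile.NamedTemporaryFile() as tmp:
    print(" ".join(stap_command("/tmp/inodewatchxattr.stp", tmp.name)))
```

The same number is what `ls -i` prints in front of the file name.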

The culprit turned out to be udevd, which I don’t think anyone was expecting …

libvirtd(4338) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:svirt_image_t:s0:c177,c272
udevd(28299) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:fixed_disk_device_t:s0
udevd(28299) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:fixed_disk_device_t:s0
udevd(28299) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:fixed_disk_device_t:s0

All in all, I’m impressed with SystemTap. It’s a simple, strongly-typed, sane programming language with type inference. Thankfully Python was not an influence on it.


March 09, 2010 12:41 PM

rich

Yesterday we improved virt-inspector so it can now fetch information about Windows guests by reading their Registries. In the XML output, this provides the ProductName and Windows internal version:

$ virt-inspector --xml Win2003x32
[...]
    <name>windows</name>
    <product_name>Microsoft Windows Server 2003</product_name>
    <arch>i386</arch>
    <major_version>5</major_version>
    <minor_version>2</minor_version>
[...]

In the raw output you get even more details from the Registry:

$ virt-inspector --perl Windows7x64
[...]
'arch' => 'x86_64',
'windows_registered_owner' => 'rjones',
'windows_current_type' => 'Multiprocessor Free',
'windows_system_hive' => '/Windows/System32/config/SYSTEM',
'windows_installation_type' => 'Client',
'os_major_version' => '6',
'os_minor_version' => '1',
'systemroot' => '/Windows',
'windows_software_hive' => '/Windows/System32/config/SOFTWARE',
'windows_software_type' => 'System',
'windows_registered_organization' => '',
'windows_current_build' => '7600',
'windows_edition_id' => 'Enterprise',
'product_name' => 'Windows 7 Enterprise',
[...]

March 09, 2010 10:16 AM

March 08, 2010

Martin A. Brooks

Game review: Assassin's Creed 2

Following my original review of the first Assassin's Creed game, I was dearly looking forward to reviewing the new episode in the series. Alas Ubisoft have taken the skull-smackingly stupid decision of making a single-player game need access to the Internet to work.

Don't buy this game, you will be funding idiocy if you do.

What next, Ubisoft, will you be making me not buy the upcoming Splinter Cell, too?

March 08, 2010 08:23 PM

Richard WM Jones

rich

From the spam folder of this blog today:

The layout for %BLOGURL% is a bit off in iCab. However I like your site. I may have to install a “normal” browser just to enjoy it.

I usually don’t usually post on many Blogs, yet I just has to say thank you for %BLOGTITLE%… keep up the amazing work. Ok regrettably its time to get to school.

Even more incompetently, neither of the spammy URLs they were trying to add actually worked (checked using ‘wget’, not a real browser of course).


March 08, 2010 05:02 PM

Rev. Simon Rumble

UI fail from Exetel


Cancel

Work is providing me a mobile, so I went to cancel my phone with Exetel. Unfortunately this is the UI you see. So first of all, you can helpfully cancel it in the past. But then the button is labelled "Cancel". So does that mean clicking it will cancel my service, or cancel the request to disconnect?

Submit

The resulting page is even more confusing. Does that mean my "Cancel" was successful? Or do I now need to "Submit" to make it happen? Terribly confused.

March 08, 2010 01:12 AM

March 07, 2010

www.DavidPashley.com/blog

Mod_fastcgi and external PHP

Has anyone managed to get a standard version of mod_fastcgi to work correctly with FastCGIExternalServer? There seems to be a complete lack of documentation on how to get this to work. I have managed to get it working by removing some code which appears to completely break AddHandler. However, people on the FastCGI list told me I was wrong for making it work. So, if anyone has managed to get it to work, please show me some working config.

March 07, 2010 11:02 PM

Martin A. Brooks

Filesystem-based greylisting for Exim

I have written a filesystem-based greylisting engine for Exim. See here for details.

March 07, 2010 06:47 PM

davblog - Dave Cross

Gigs for Old Gits

It's been a busy couple of weeks for gigs. On the assumption that at least some of my readers have similar tastes to me, here are brief reviews of the three gigs I've seen in the last couple of weeks.

Fairport Convention, Union Chapel, 20th Feb
This is the second year running that I've seen Fairport Convention on their "Wintour" at the Union Chapel. Last year was the first time I had seen them (which is bizarre for a band I've been a fan of for over thirty years). I can't quite put my finger on it, but this year's show wasn't as enjoyable as last year's. I suspect it was down to the number of songs taken from later Fairport albums that I'm not at all familiar with. Oh, and the arrangement of Matty Groves was very strange. The long instrumental that ends the song was unrecognisable.

Thomas Dolby and Friends, Union Chapel, 28th Feb
Something a little more up to date. This was Thomas Dolby bringing back together the band who had recorded and toured his second album, The Flat Earth. As an extra twist, the band (who hadn't played together for over twenty-five years) didn't rehearse at all. They met on stage and worked the songs out in a two-hour "live rehearsal". They then went off for a brief break before returning to play a half-hour set.

The rehearsal was fun. And the band sounded great for a band who hadn't played together for so long. There were also a few guest stars - including Trevor Horn who played bass on "Airwaves". The only slight disappointment was that the rehearsal overran so the final set had to be cut short.

John Cale, Royal Festival Hall, 5th March
I'm not a huge John Cale fan. I generally like the stuff of his that I hear, but I haven't really heard much of it. This concert had him playing the whole of his album "Paris 1919" (from 1973). This isn't an album that I'd heard at all until I started to listen to it in preparation for this show and it's really not that representative of the rest of his music. But it's a great album and it was interesting to hear it all played live. It is, however, a rather short album (many were back in the early 70s) and that part of the show only lasted forty minutes. After a short break (and it was really short) the band returned to play another forty minutes of "the best of John Cale". I was pretty surprised to realise that I recognised most of these songs. All in all, a great night out.

March 07, 2010 12:21 PM

March 06, 2010

Martin A. Brooks

Automatic email archiving

If you're on as many mailing lists as I am, you might find this to be handy.

March 06, 2010 11:22 AM

March 05, 2010

Richard WM Jones

rich

You can run arbitrary commands, shells, editors etc during your presentation. Here, I run a gnome-terminal with a prepopulated command history:

Download it from my git repository. The requirements are fairly light: perl, perl-Gtk2 and perl-Gtk2-MozEmbed (all in Fedora).

Previous angry rant about presentation software.

The diagram in the first slide was done using PGF and Tikz 2.00 (examples) (manual).


March 05, 2010 10:28 AM

March 03, 2010

Richard WM Jones

rich

How many dull presentations have you been to where the presenter simply reads bullet points off slides?

FrobSoft Express 2.0 is:

I’m giving a talk about libguestfs on 18th March and I hate reading out slides to people as much as I hate listening to presenters reading out slides to me. In every talk I’ve given in the last few years I have tried to keep my notes separate (written on paper in front of me, or memorized) from what is on the slides. Presentation software, such as the mighty, all-pervasive OpenOffice, doesn’t make this easy. Nor does it make it easy to demonstrate software in the middle of your talk. You end up having to switch away to another virtual desktop, where (hopefully) you’ve remembered to set up some xterms “su”‘d to root and “cd”‘d into the right directory. I usually need several virtual desktops set up like this so I can demonstrate different parts of the software, so I’m standing in front of an audience using [Alt][←] and [Alt][→] while I hastily try to remember which virtual desktop has the next stage of the talk.

Enough!

Introducing “Tech Talk”. Actually, Tech Talk is too generic in Google, so we brainstormed adding extra words on the end until it became unique: Introducing “Tech Talk Platinum Supreme Edition!” (Tech Talk PSE).

The concept is simple. You create a directory and drop a mixture of HTML files and shell scripts in there:

$ ls
10-hello.html
20-shell.sh
30-goodbye.html
$ techtalk-pse

When Tech Talk PSE runs, it sorts the files numerically, and then displays the HTML ones (using Mozilla embedding) as slides and runs the shell script ones. Next and previous keys move through the slides, ensuring that your demonstrations [the shell scripts] run automatically at the right place in the talk.

Only files matching ^\d+(-.*)\.(html|sh)$ are considered, everything else is ignored. So you can style your HTML using stylesheets, include READMEs and Makefiles, and move common shell functionality into sourced shell files:

#!/bin/bash -
# Source common functions and variables.
source functions
# Pre-populate the shell history.
cat > $HISTFILE <<EOF
guestfish -a vm.img
EOF
# Open gnome-terminal.
exec $TERMINAL --geometry=+100+100
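The file selection and ordering rule described above can be sketched in a few lines. This is only an illustration, not Tech Talk PSE's own code (which is Perl): the pattern is the one quoted in the text with the leading number captured so it can serve as the sort key, the function name `talk_sequence` is made up, and how the real program orders two files with the same number is not stated, so the tie-break by name here is an assumption.

```python
import re

# Pattern from the text, with the leading number captured (an addition
# made here) so that it can be used as the numeric sort key.
SLIDE_RE = re.compile(r"^(\d+)(-.*)\.(html|sh)$")


def talk_sequence(filenames):
    """Return (name, kind) pairs in presentation order.

    kind is "slide" for .html files and "script" for .sh files.
    Files that do not match the pattern are ignored.
    """
    matched = []
    for name in filenames:
        m = SLIDE_RE.match(name)
        if m:
            kind = "slide" if m.group(3) == "html" else "script"
            matched.append((int(m.group(1)), name, kind))
    # Sort numerically, so that 9-title.html comes before 10-hello.html.
    # Ties on the number fall back to the file name (an assumption).
    matched.sort()
    return [(name, kind) for _, name, kind in matched]


# "9-title.html" and "style.css" are hypothetical extra files.
files = ["30-goodbye.html", "functions", "10-hello.html",
         "style.css", "20-shell.sh", "README", "9-title.html"]
for name, kind in talk_sequence(files):
    print(kind, name)
```

Sorting on the integer rather than the string is what keeps a single-digit prefix ahead of a two-digit one.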

Tech Talk PSE itself doesn’t have to deal with rendering, which is pushed off to a browser, making it far more flexible, powerful and simpler than existing presentation software. This means you can show figures or play video in your presentation, or use Javascript to make your slides resolution-independent or to add animations. Additionally you can use any existing tool you want to write HTML. (If you’re like me, that tool will be emacs.)

You’ll be able to download Tech Talk PSE after my talk in two weeks’ time, or get early previews from my git repository. Requirements are Perl, Perl Gtk2 and Gtk2::MozEmbed.


March 03, 2010 05:41 PM

rich

KageSenshi’s HOWTO use Linux Containers (LXC) on Fedora 12 with libvirt is interesting. I discovered that they’re using febootstrap (see earlier postings) to build the Fedora root filesystem for the containers.


March 03, 2010 03:46 PM

rich


Watchdog, by Emmanuel Tabard, used with permission from Flickr

In a physical server, a watchdog is a simple piece of hardware which is supposed to restart the server if it hangs without needing any administrator intervention. Watchdogs used to come as separate cards, but nowadays the feature is found in many chipsets, often integrated with other useful bits of server / remote access functionality like remote serial port, wake-on-LAN, hardware event monitoring etc.

So how does the watchdog work? “Is the machine hung?” is a tricky question to answer if you just look at the hardware level. In a “hung” machine, the CPU is most likely still running, and the kernel might still be up and responding to pings.

Hardware watchdogs instead rely on a piece of software running on the machine which must “tickle” a particular port, say every 10 seconds. If the watchdog doesn’t get “tickled” after, say, 60 seconds (so 6 missed events), then it asserts the RESET line which results in a hard reboot. (Of course, how often you must tickle the hardware varies from watchdog device to watchdog device, and is usually configurable. Some hardware watchdogs have more elaborate states — for example, they can deliver a “second chance” interrupt shortly before they deal the final death blow to the machine. In reality no one uses anything but the basic tickle/reset function.)

So the hardware defers to some software which has to keep tickling the hardware, or else face reboot. But how does this software work? In Linux we use the venerable watchdog daemon project. This is an unusual case where you actually want the software to do lots of “useless” work. So the daemon will typically ping a remote network address, do a process listing, maybe write something to disk and access the service, and run some custom scripts, and only if all those succeed will it tickle the watchdog port. Every 10 seconds.

If you think about some ways in which a server can “hang” you can see why this works.

Example (1): Hard disk drive stops sending back interrupts. Processes begin to go into the uninterruptible “D” state, and don’t come out. Watchdog daemon itself enters the D state when it writes to the disk, hence the watchdog port is never tickled and the server gets rebooted.

Example (2): Web server parent process segfaults. Existing and some new requests are still being serviced, so the server appears to be working from the outside, but less reliably, at least for a little while. The watchdog daemon lists out the processes in the system and notices that the web server parent process is gone (because of a custom test). As a result, it doesn’t tickle the watchdog port, and so the machine is rebooted.

Example (3): A large SQL request results in an important database table getting locked. The watchdog daemon periodically fetches a database-backed web page from the web server. The watchdog daemon’s request hangs. The watchdog port isn’t tickled. The web server reboots.
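The tickle-or-reset logic can be simulated in a few lines. This is a toy model under stated assumptions, not the watchdog daemon's code: the class and function names are invented, time is a simulated counter in seconds, and a hung test is modelled as a check that returns False (a real daemon would simply block and never reach the tickle).

```python
class WatchdogTimer:
    """Simulated hardware watchdog: fires unless tickled in time."""

    def __init__(self, timeout=60):
        self.timeout = timeout
        self.last_tickle = 0

    def tickle(self, now):
        """Record that the software is still alive at time now."""
        self.last_tickle = now

    def expired(self, now):
        """True once timeout seconds have passed without a tickle."""
        return now - self.last_tickle >= self.timeout


def run_daemon(timer, checks, interval=10, duration=300):
    """Run all checks every interval seconds of simulated time.

    Each check takes the current time and returns True on success.
    The timer is tickled only if every check passes.  Returns the
    time at which the watchdog fired, or None if it never did.
    """
    for now in range(0, duration + 1, interval):
        if timer.expired(now):
            return now  # the hardware would assert RESET here
        if all(check(now) for check in checks):
            timer.tickle(now)
    return None


def ping_ok(now):
    return True          # stands in for pinging a remote address


def disk_ok(now):
    return now < 100     # the disk "stops responding" at t=100


print(run_daemon(WatchdogTimer(), [ping_ok]))
print(run_daemon(WatchdogTimer(), [ping_ok, disk_ok]))
```

With the 10 second interval and 60 second timeout used as examples in the text, the healthy run never fires, and the failing run fires 60 seconds after the last successful tickle.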

What does this have to do with virtualization? Virtual machines can also hang in various unpredictable ways, and for the same reasons I outlined above, it’s hard to know whether a VM is hanging, overloaded or just slow. And for all the same reasons, you might want to reboot a wedged VM without administrator intervention. For this reason I wrote a virtual watchdog device for qemu and KVM. It’s simple to configure the watchdog using libvirt:

# virsh edit domname

and add <watchdog model='i6300esb'/> into the devices section of the XML.

That will create a virtual Intel 6300 ESB (just the watchdog part of this multi-function Intel chipset). You’ll see this PCI device appear when the VM boots:

$ dmesg | grep 6300
i6300ESB timer: initialized (0xffffc20000016000). heartbeat=30 sec (nowayout=0)
$ /sbin/lspci | grep 6300
00:05.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer
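If you would rather script the XML change than make it by hand in `virsh edit`, the insertion can be sketched as below. The domain definition is a cut-down, hypothetical example, the function name `add_watchdog` is invented, and the sketch only transforms a string; feeding the result back to libvirt is left out.

```python
import xml.etree.ElementTree as ET

# A cut-down, hypothetical domain definition for illustration only.
DOMAIN_XML = """
<domain type='kvm'>
  <name>domname</name>
  <devices>
    <disk type='file' device='disk'/>
  </devices>
</domain>
"""


def add_watchdog(domain_xml, model="i6300esb"):
    """Return the domain XML with a watchdog element in <devices>."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        devices = ET.SubElement(root, "devices")
    # Do not add a second watchdog if one is already configured.
    if devices.find("watchdog") is None:
        ET.SubElement(devices, "watchdog", {"model": model})
    return ET.tostring(root, encoding="unicode")


print(add_watchdog(DOMAIN_XML))
```

Running the function twice on the same definition leaves a single watchdog element in place.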

In the guest, install the watchdog daemon. Linux contains a driver already for the i6300ESB, and for Windows you can download drivers from Intel’s website.

Configure /etc/watchdog.conf and perhaps write a few custom tests. Make sure the watchdog service is set to start at boot. Once the watchdog service is started, it will tickle the (virtual) hardware watchdog. If qemu / KVM notices that the software is no longer tickling the virtual port, it will hard reboot the VM.


March 03, 2010 03:31 PM

March 02, 2010

Richard WM Jones

rich

This question arose at work: does LVM carry a performance penalty compared to using straight partitions? To save you the trouble, the answer is “not really”. There is a very small penalty, but as with all benchmarks it does depend on what the benchmark measures versus what your real workload does. In any case, here is a small guestfish script you can use to compare the performance of various filesystems with or without LVM, with various operations. Whether you trust the results is up to you, but I would advise caution.

#!/bin/bash -

tmpfile=/tmp/test.img

for fs in ext2 ext3 ext4; do
    for lvm in off on; do
        rm -f $tmpfile
        if [ $lvm = "on" ]; then
            guestfish <<EOF
              sparse $tmpfile 1G
              run
              part-disk /dev/sda efi
              pvcreate /dev/sda1
              vgcreate VG /dev/sda1
              lvcreate LV VG 800
              mkfs $fs /dev/VG/LV
EOF
            dev=/dev/VG/LV
        else # no LVM
            guestfish <<EOF
              sparse $tmpfile 1G
              run
              part-disk /dev/sda efi
              mkfs $fs /dev/sda1
EOF
            dev=/dev/sda1
        fi
        echo "fs=$fs lvm=$lvm"
        sync
        guestfish -a $tmpfile -m $dev <<EOF
          time fallocate /file1 200000000
          time cp /file1 /file2
EOF
    done
done

fs=ext2 lvm=off
elapsed time: 2.74 seconds
elapsed time: 4.52 seconds
fs=ext2 lvm=on
elapsed time: 2.60 seconds
elapsed time: 4.24 seconds
fs=ext3 lvm=off
elapsed time: 2.62 seconds
elapsed time: 4.31 seconds
fs=ext3 lvm=on
elapsed time: 3.07 seconds
elapsed time: 4.79 seconds

# notice how ext4 is much faster at fallocate, because it
# uses extents

fs=ext4 lvm=off
elapsed time: 0.05 seconds
elapsed time: 3.54 seconds
fs=ext4 lvm=on
elapsed time: 0.05 seconds
elapsed time: 4.16 seconds
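To put the numbers side by side, the relative change with LVM enabled can be worked out from the timings above. This is just arithmetic on the single run shown, so differences of this size may well be noise; the dictionary layout and the function name are choices made for this sketch.

```python
# Elapsed times in seconds, copied from the output above:
# (filesystem, operation) -> (without LVM, with LVM)
RESULTS = {
    ("ext2", "fallocate"): (2.74, 2.60),
    ("ext2", "cp"): (4.52, 4.24),
    ("ext3", "fallocate"): (2.62, 3.07),
    ("ext3", "cp"): (4.31, 4.79),
    ("ext4", "fallocate"): (0.05, 0.05),
    ("ext4", "cp"): (3.54, 4.16),
}


def lvm_overhead_percent(without_lvm, with_lvm):
    """Relative change in elapsed time when LVM is enabled.

    Positive means LVM was slower, negative means it was faster.
    """
    return 100.0 * (with_lvm - without_lvm) / without_lvm


for (fs, op), (off, on) in RESULTS.items():
    print(f"{fs:5} {op:10} {lvm_overhead_percent(off, on):+6.1f}%")
```

In this run the sign of the difference is not even consistent across filesystems, which is one more reason to treat the figures with caution and repeat the test several times before drawing conclusions.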

March 02, 2010 09:21 PM

rich

FAQ entry: The API has hundreds of methods, where do I start? A: Start with the API overview in the manual page.

$ guestfish -h
    Command              Description
help                 display a list of commands or help on a command
quit                 quit guestfish
alloc                allocate an image
echo                 display a line of text
edit                 edit a file in the image
lcd                  local change directory
glob                 expand wildcards in command
more                 view a file in the pager
reopen               close and reopen libguestfs handle
sparse               allocate a sparse image file
time                 measure time taken to run command
add-cdrom            add a CD-ROM disk image to examine
add-drive            add an image to examine or modify
add-drive-ro         add a drive in snapshot mode (read-only)
add-drive-ro-with-if add a drive read-only specifying the QEMU block emulation to use
add-drive-with-if    add a drive specifying the QEMU block emulation to use
aug-close            close the current Augeas handle
aug-defnode          define an Augeas node
aug-defvar           define an Augeas variable
aug-get              look up the value of an Augeas path
aug-init             create a new Augeas handle
aug-insert           insert a sibling Augeas node
aug-load             load files into the tree
aug-ls               list Augeas nodes under augpath
aug-match            return Augeas nodes which match augpath
aug-mv               move Augeas node
aug-rm               remove an Augeas path
aug-save             write all pending Augeas changes to disk
aug-set              set Augeas path to value
available            test availability of some parts of the API
blockdev-flushbufs   flush device buffers
blockdev-getbsz      get blocksize of block device
blockdev-getro       is block device set to read-only
blockdev-getsize64   get total size of device in bytes
blockdev-getss       get sectorsize of block device
blockdev-getsz       get total size of device in 512-byte sectors
blockdev-rereadpt    reread partition table
blockdev-setbsz      set blocksize of block device
blockdev-setro       set block device to read-only
blockdev-setrw       set block device to read-write
case-sensitive-path  return true path on case-insensitive filesystem
cat                  list the contents of a file
checksum             compute MD5, SHAx or CRC checksum of file
chmod                change file mode
chown                change file owner and group
command              run a command from the guest filesystem
command-lines        run a command, returning lines
config               add qemu parameters
cp                   copy a file
cp-a                 copy a file or directory recursively
dd                   copy from source to destination using dd
debug                debugging and internals
df                   report file system disk space usage
df-h                 report file system disk space usage (human readable)
dmesg                return kernel messages
download             download a file to the local machine
drop-caches          drop kernel page cache, dentries and inodes
du                   estimate file space usage
e2fsck-f             check an ext2/ext3 filesystem
echo-daemon          echo arguments back to the client
egrep                return lines matching a pattern
egrepi               return lines matching a pattern
equal                test if two files have equal contents
exists               test if file or directory exists
fallocate            preallocate a file in the guest filesystem
fgrep                return lines matching a pattern
fgrepi               return lines matching a pattern
file                 determine file type
filesize             return the size of the file in bytes
fill                 fill a file with octets
find                 find all files and directories
find0                find all files and directories, returning NUL-separated list
fsck                 run the filesystem checker
get-append           get the additional kernel options
get-autosync         get autosync mode
get-direct           get direct appliance mode flag
get-e2label          get the ext2/3/4 filesystem label
get-e2uuid           get the ext2/3/4 filesystem UUID
get-memsize          get memory allocated to the qemu subprocess
get-path             get the search path
get-pid              get PID of qemu subprocess
get-qemu             get the qemu binary
get-recovery-proc    get recovery process enabled flag
get-selinux          get SELinux enabled flag
get-state            get the current state
get-trace            get command trace enabled flag
get-verbose          get verbose mode
getcon               get SELinux security context
getxattrs            list extended attributes of a file or directory
glob-expand          expand a wildcard path
grep                 return lines matching a pattern
grepi                return lines matching a pattern
grub-install         install GRUB
head                 return first 10 lines of a file
head-n               return first N lines of a file
hexdump              dump a file in hexadecimal
initrd-cat           list the contents of a single file in an initrd
initrd-list          list files in an initrd
inotify-add-watch    add an inotify watch
inotify-close        close the inotify handle
inotify-files        return list of watched files that had events
inotify-init         create an inotify handle
inotify-read         return list of inotify events
inotify-rm-watch     remove an inotify watch
is-busy              is busy processing a command
is-config            is in configuration state
is-dir               test if file exists
is-file              test if file exists
is-launching         is launching subprocess
is-ready             is ready to accept commands
kill-subprocess      kill the qemu subprocess
launch               launch the qemu subprocess
lchown               change file owner and group
lgetxattrs           list extended attributes of a file or directory
list-devices         list the block devices
list-partitions      list the partitions
ll                   list the files in a directory (long format)
ln                   create a hard link
ln-f                 create a hard link
ln-s                 create a symbolic link
ln-sf                create a symbolic link
lremovexattr         remove extended attribute of a file or directory
ls                   list the files in a directory
lsetxattr            set extended attribute of a file or directory
lstat                get file information for a symbolic link
lstatlist            lstat on multiple files
lvcreate             create an LVM volume group
lvm-remove-all       remove all LVM LVs, VGs and PVs
lvremove             remove an LVM logical volume
lvrename             rename an LVM logical volume
lvresize             resize an LVM logical volume
lvs                  list the LVM logical volumes (LVs)
lvs-full             list the LVM logical volumes (LVs)
lxattrlist           lgetxattr on multiple files
mkdir                create a directory
mkdir-mode           create a directory with a particular mode
mkdir-p              create a directory and parents
mkdtemp              create a temporary directory
mke2fs-J             make ext2/3/4 filesystem with external journal
mke2fs-JL            make ext2/3/4 filesystem with external journal
mke2fs-JU            make ext2/3/4 filesystem with external journal
mke2journal          make ext2/3/4 external journal
mke2journal-L        make ext2/3/4 external journal with label
mke2journal-U        make ext2/3/4 external journal with UUID
mkfifo               make FIFO (named pipe)
mkfs                 make a filesystem
mkfs-b               make a filesystem with block size
mkmountpoint         create a mountpoint
mknod                make block, character or FIFO devices
mknod-b              make block device node
mknod-c              make char device node
mkswap               create a swap partition
mkswap-L             create a swap partition with a label
mkswap-U             create a swap partition with an explicit UUID
mkswap-file          create a swap file
modprobe             load a kernel module
mount                mount a guest disk at a position in the filesystem
mount-loop           mount a file using the loop device
mount-options        mount a guest disk with mount options
mount-ro             mount a guest disk, read-only
mount-vfs            mount a guest disk with mount options and vfstype
mountpoints          show mountpoints
mounts               show mounted filesystems
mv                   move a file
ntfs-3g-probe        probe NTFS volume
part-add             add a partition to the device
part-disk            partition whole disk with a single primary partition
part-get-parttype    get the partition table type
part-init            create an empty partition table
part-list            list partitions on a device
part-set-bootable    make a partition bootable
part-set-name        set partition name
ping-daemon          ping the guest daemon
pread                read part of a file
pvcreate             create an LVM physical volume
pvremove             remove an LVM physical volume
pvresize             resize an LVM physical volume
pvs                  list the LVM physical volumes (PVs)
pvs-full             list the LVM physical volumes (PVs)
read-file            read a file
read-lines           read file as lines
readdir              read directories entries
readlink             read the target of a symbolic link
readlinklist         readlink on multiple files
realpath             canonicalized absolute pathname
removexattr          remove extended attribute of a file or directory
resize2fs            resize an ext2/ext3 filesystem
rm                   remove a file
rm-rf                remove a file or directory recursively
rmdir                remove a directory
rmmountpoint         remove a mountpoint
scrub-device         scrub (securely wipe) a device
scrub-file           scrub (securely wipe) a file
scrub-freespace      scrub (securely wipe) free space
set-append           add options to kernel command line
set-autosync         set autosync mode
set-direct           enable or disable direct appliance mode
set-e2label          set the ext2/3/4 filesystem label
set-e2uuid           set the ext2/3/4 filesystem UUID
set-memsize          set memory allocated to the qemu subprocess
set-path             set the search path
set-qemu             set the qemu binary
set-recovery-proc    enable or disable the recovery process
set-selinux          set SELinux enabled or disabled at appliance boot
set-trace            enable or disable command traces
set-verbose          set verbose mode
setcon               set SELinux security context
setxattr             set extended attribute of a file or directory
sfdisk               create partitions on a block device
sfdiskM              create partitions on a block device
sfdisk-N             modify a single partition on a block device
sfdisk-disk-geometry display the disk geometry from the partition table
sfdisk-kernel-geometry display the kernel geometry
sfdisk-l             display the partition table
sh                   run a command via the shell
sh-lines             run a command via the shell returning lines
sleep                sleep for some seconds
stat                 get file information
statvfs              get file system statistics
strings              print the printable strings in a file
strings-e            print the printable strings in a file
swapoff-device       disable swap on device
swapoff-file         disable swap on file
swapoff-label        disable swap on labeled swap partition
swapoff-uuid         disable swap on swap partition by UUID
swapon-device        enable swap on device
swapon-file          enable swap on file
swapon-label         enable swap on labeled swap partition
swapon-uuid          enable swap on swap partition by UUID
sync                 sync disks, writes are flushed through to the disk image
tail                 return last 10 lines of a file
tail-n               return last N lines of a file
tar-in               unpack tarfile to directory
tar-out              pack directory into tarfile
tgz-in               unpack compressed tarball to directory
tgz-out              pack directory into compressed tarball
touch                update file timestamps or create a new file
truncate             truncate a file to zero size
truncate-size        truncate a file to a particular size
tune2fs-l            get ext2/ext3/ext4 superblock details
umask                set file mode creation mask (umask)
umount               unmount a filesystem
umount-all           unmount all filesystems
upload               upload a file from the local machine
utimens              set timestamp of a file with nanosecond precision
version              get the library version number
vfs-type             get the Linux VFS type corresponding to a mounted device
vg-activate          activate or deactivate some volume groups
vg-activate-all      activate or deactivate all volume groups
vgcreate             create an LVM volume group
vgremove             remove an LVM volume group
vgrename             rename an LVM volume group
vgs                  list the LVM volume groups (VGs)
vgs-full             list the LVM volume groups (VGs)
wc-c                 count characters in a file
wc-l                 count lines in a file
wc-w                 count words in a file
write-file           create a file
zegrep               return lines matching a pattern
zegrepi              return lines matching a pattern
zero                 write zeroes to the device
zerofree             zero unused inodes and disk blocks on ext2/3 filesystem
zfgrep               return lines matching a pattern
zfgrepi              return lines matching a pattern
zfile                determine file type inside a compressed file
zgrep                return lines matching a pattern
zgrepi               return lines matching a pattern
    Use -h  / help  to show detailed help for a command.

March 02, 2010 07:08 PM

rich

A shout out to Debian packaging genius Guido Günther, who demonstrates how to use virt-inspector, virt-df, virt-edit and guestfish on Debian virtual machines. You can download the Debian packages via this link on our FAQ page or direct from alioth.


March 02, 2010 07:03 PM

GLLUG

18th March - Greater London LUG Meeting

Date: Thursday, March 18, 2010
Time: 7:30pm - 9:30pm
Location: Univ of Westminster Campus
Street: New Cavendish Street
City/Town: London


There will be a talk for the GLLUG by Richard Jones on libguestfs[1] on Thursday 18th March at 1930. It will be held at the Cavendish campus of the University of Westminster, in the Large Lecture Theatre. I will try to get a more accurate description of the lecture theatre's location and post it to the mailing list and website. Please arrive at 1900 so that we are ready to begin at 1930. After the talk I suggest we go to a decent local pub for a social.

Please email general.mooney-at-googlemail.com if you are attending so that we can get an idea of the expected number of people.

Ciaran Mooney

[1] http://libguestfs.org




March 02, 2010 12:42 PM

March 01, 2010

davblog - Dave Cross

OLB Non Enrolled Non Endorsed 1

When communicating with your customers, it's important to look at the information that you're sending from their point of view. Are they really going to be interested in the information that you send?

Earlier today I finally got round to unsubscribing from the MBNA marketing emails that have been annoying me for months. To confirm my unsubscription they sent me an email which started with this:

We are sorry that you unsubscribed from the newsletter OLB Non Enrolled Non Endorsed 1

Is there really any customer who is going to be even slightly interested in that level of detail? I don't care what your internal name for the newsletter is. I just want to stop seeing it in my inbox.

March 01, 2010 03:27 PM

The Learning Guitar

I don't play the guitar very well at all. I'll sometimes say that I play it better than average, but that's a claim that can only be justified by pointing out that the vast majority of people don't play guitar at all so anyone who knows two or three chords is already well above average.


I have, however, been playing guitar (for some loose definition of the word "playing") for a rather long time. Just how long was brought home to me this weekend.

We're having a lot of building work done in our house over the next few months and as a precursor to that we have had to clear pretty much everything out of the first floor. A lot of stuff has gone into storage, but we also took a lot of stuff to our local tip on Saturday. That load included three guitars and one of them was "The Learning Guitar".

The Learning Guitar was (as its name suggests) the guitar that I first learnt to play on. It was a cheap nylon-stringed Spanish guitar that my parents bought me when I started to take lessons. That was very soon after I started at secondary school in September 1974. There was an after school class which I joined. I think I stopped going after only a couple of months as we were learning boring stuff like "When The Saints Go Marching In" when I wanted to be playing stuff by Slade or David Bowie. At the time I assumed that we weren't learning that stuff because it was too difficult for beginners. Later I realised that a lot of the music I enjoyed was actually just as simple as the stuff we were taught - it was just that the teachers were a bit old-fashioned.

I carried on teaching myself though. I bought a Mel Bay book and spent hours practising in my bedroom. Of course I had no real idea what I was doing and I picked up a number of bad habits that hamper my playing even now. But I was enjoying myself.

Soon after moving to London to go to university I got another guitar. It was a Fender F3. A much nicer-sounding guitar. My original guitar was somewhat ignored. For a year I shared a flat with someone who played guitar really well and by watching him my playing improved a lot.

But the Learning Guitar still had some life in it. Over the next fifteen or twenty years I took to lending it to friends who wanted to learn guitar. The story was always the same. Someone borrowed it for a couple of years and when they thought the time was right, they'd buy a better guitar and give the Learning Guitar back to me. It was during this period that the guitar acquired its nickname. The last person to borrow it like this was my step-daughter who took it with her when she went to university. As always, it came back after a couple of years.

Over the last ten years, I've played guitar a lot less. I couldn't really justify storing the four guitars that I had cluttering up my study. So this weekend they all went except the Fender. We loaded up a van and took them to the Wandsworth Council dump. Of all of the things that I threw away on Saturday, the Learning Guitar was the thing that I felt most guilty about. I threw it high up on a mountain of rubbish at the dump. At one point I considered trying to retrieve it, but it was too far away.

It was never a particularly good guitar. But a lot of people have strummed their first tentative chords on that guitar. It's a shame to see it go.

Later this week, I hope to get rid of my collection of records. That has sat in a cupboard unused for over ten years. There's really no reason to keep it. But if you think I have got needlessly sentimental about an old guitar, you haven't seen anything yet. I'll be getting far more nostalgic about the records.

March 01, 2010 12:21 PM

February 27, 2010

Richard WM Jones

rich

I enjoyed playing a Japanese import of Densha De Go (an accurate Shinkansen train simulator) on my old Nintendo Wii. I played this legally (albeit expensively) imported game using some other software called Wii Freeloader. Since Nintendo does not like people using software from outside the “right” places, “Freeloader” had to exploit a bug in the firmware to allow Densha de Go to play.

Today I upgraded the firmware on my Wii console.

I am no longer able to play Densha De Go at all. Nintendo have successfully covered all options and there is no way to play my legally purchased and imported software from other “regions” of the world.

So today I learned my lesson. Never, absolutely never, buy or get involved in proprietary software. Never buy anything ever again from Nintendo. Never buy another phone from Apple, or Microsoft, or any computer with proprietary software, no matter how convenient it may seem in the short term.

Enough is enough.

If I bought the hardware, I want to do whatever I want with it.


February 27, 2010 09:59 PM

February 25, 2010

www.DavidPashley.com/blog

Reducing Coupling between modules

In the past, several of my Puppet modules have been tightly coupled. A perfect example is Apache and Munin. When I install Apache, I want munin graphs set up. As a result my apache class has the following snippet in it:

munin::plugin { "apache_accesses": }
munin::plugin { "apache_processes": }
munin::plugin { "apache_volume": }

This should make sure that these three plugins are installed and that munin-node is restarted to pick them up. The define was implemented like this:

define munin::plugin (
      $enable = true,
      $plugin_name = false,
      ) {

   include munin::node

   file { "/etc/munin/plugins/$name":
      ensure => $enable ? {
         true => $plugin_name ? {
            false => "/usr/share/munin/plugins/$name",
            default => "/usr/share/munin/plugins/$plugin_name"
         },
         default => absent
      },
      links => manage,
      require => Package["munin-node"],
      notify => Service["munin-node"],
   }
}

(Note: this is a slight simplification of the define). As you can see, the define includes munin::node, as it needs the definition of the munin-node service and package. As a result of this, installing Apache drags in munin-node on your server too. It would be much nicer if the apache class only installed the munin plugins if you also install munin on the server.

It turns out that this is possible, using virtual resources. Virtual resources allow you to define resources in one place, but not make them happen unless you realise them. Using this, we can make the file resource in the munin::plugin define virtual and realise it in our munin::node class. Our new munin::plugin looks like:

define munin::plugin (
      $enable = true,
      $plugin_name = false,
      ) {

   # removed "include munin::node"

   # Added @ in front of the resource to declare it as virtual
   @file { "/etc/munin/plugins/$name":
      ensure => $enable ? {
         true => $plugin_name ? {
            false => "/usr/share/munin/plugins/$name",
            default => "/usr/share/munin/plugins/$plugin_name"
         },
         default => absent
      },
      links => manage,
      require => Package["munin-node"],
      notify => Service["munin-node"],
      tag => munin-plugin,
   }
}

We add the following line to our munin::node class:

File<| tag == munin-plugin |>

The odd syntax in the munin::node class realises all the virtual resources that match the filter, in this case any that are tagged munin-plugin. We've had to define this tag ourselves, as the auto-generated tags don't seem to work. You'll also notice that we've removed the munin::node include from the munin::plugin define, which means that we no longer install munin-node just by using the plugin define. I've used a similar technique for logcheck, so additional rules are not installed unless I've installed logcheck. I'm sure there are several other places where I can use it to reduce such tight coupling between classes.

Read Comments (2)

February 25, 2010 09:30 AM

davblog - Dave Cross

Homeopathy Petition

We're all, of course, very happy about the results of the House of Commons Science and Technology committee's evidence check on homeopathy. But it's important to realise exactly what has happened. This is a House of Commons committee which has produced a list of recommendations. The government is under no obligation at all to take any notice of those recommendations. Unfortunately, Richard Wiseman's tweet, "yipppeeee it's official, NHS will no longer give people smarties", is likely to be somewhat premature.

So that's why I've set up a petition on the number 10 web site. The petition says:

We the undersigned petition the Prime Minister to Implement the recommendations of the House Commons Science and Technology committee evidence check on Homeopathy.

The House Commons Science and Technology committee has recently undertaken an evidence check on the usefulness of homeopathy and has now published its report.

The conclusions are unequivocal. They say "To maintain patient trust, choice and safety, the Government should not endorse the use of placebo treatments, including homeopathy. Homeopathy should not be funded on the NHS and the MHRA should stop licensing homeopathic products."

The government should implement these recommendations as soon as possible.

If we can get enough people to sign this petition I hope we can send a message to the government letting them know that we support the committee's findings and don't want the NHS's money wasted on nonsense like homeopathy.

Of course, this close to an election, the government is likely to be very wary of making any kind of a statement that might lose them support amongst the woo-mongers. We need to persuade them that skeptical (and rational) voters outnumber the idiots. At the very least, we should be able to get more signatures than this ridiculous petition.

So please sign the petition. And please pass the details on to anyone else who might be interested. The battle has not been won yet.

February 25, 2010 09:09 AM

Karanbir Singh

London Devops mailing list, googlegroups and google accounts

Just signed up to the London Devops mailing list, and thought I'd point out that one does not need a google account to join a google groups hosted list. Just send an email to <list-name>+subscribe@googlegroups.com with 'subscribe' in the subject line of the email, and their list management software will sign you up.

 

So in this case, sending an email with:


To: london-devops+subscribe@googlegroups.com
Subject: Subscribe

 

.. would be all that's needed to get you on there.

 

Google groups has a help page with details of which of the services they host don't need a google account to use.

February 25, 2010 01:49 AM

February 24, 2010

davblog - Dave Cross

NHS Money Wasted on Homeopathy

Don't have time to go into the detail that it deserves, but the House of Commons science and technology committee has published the results of its evidence check on homeopathy. The results won't, of course, come as any surprise to anyone who has been following the debate. But I have to admit to being impressed by the lack of ambiguity in their conclusions. This is paragraph 157:

By providing homeopathy on the NHS and allowing MHRA licensing of products which subsequently appear on pharmacy shelves, the Government runs the risk of endorsing homeopathy as an efficacious system of medicine. To maintain patient trust, choice and safety, the Government should not endorse the use of placebo treatments, including homeopathy. Homeopathy should not be funded on the NHS and the MHRA should stop licensing homeopathic products.

Absolutely no equivocation there.

So what's the next step? When do the homeopathic "hospitals" get closed down? When does the NHS get that money back for real medicine?

Update: The Woo-mongers in the House of Commons don't plan to take this lying down. They've proposed an Early Day Motion criticising the committee's report. Of course, only MPs with no grasp of science will be signing it. If your MP is on this list, then I suggest a strongly worded email might be in order.

February 24, 2010 02:59 PM

February 23, 2010

Richard WM Jones

rich

I just discovered that virt-top is now in Debian.

virt-top (which I wrote) is the far superior alternative to “xentop”. I wrote it after having to manage some servers with only xentop available. I wrote it because xentop was so crap.

Why is virt-top better? It supports more statistics. It works with libvirt so it works with almost every hypervisor not just Xen. It has a much nicer UI and documentation. And it has advanced logging modes so you can leave it running in the background and capture information for your website/database/stats/monitoring.


February 23, 2010 09:46 PM

rich

Guido Günther has built the very latest libguestfs 1.0.84 official packages for Debian.

Previously Ubuntu packages …


February 23, 2010 10:06 AM

February 22, 2010

davblog - Dave Cross

Homeopathic Dilutions

Like many press outlets, the Daily Mail pre-empted the publication of the Science and Technology committee report and published a story yesterday summing up the MPs' findings. Of course the Daily Mail is the home of the gullible reader and a good number of the comments on that story are attempting to defend the woo-mongers.

The Mail often stop taking comments on their stories after about a day (giving no indication that they've done so) and the number of comments on this story has stuck at eighteen since I first saw it last night so we must assume that they won't publish any more.

This is a shame as there's quite a lot of unchallenged nonsense there. In particular, the most recent comment published is from Dave in Basingstoke. Someone previously in the discussion had mentioned the ludicrous amount of dilution in homeopathic solutions. Dave replies with this:

"But scientists point to the fact that the 'cures' are so diluted that the cannot possibly contain even a single molecule of the original substance."

Ha! Maybe a climate change pseudo-scientist would say that, but a chemist never would because it isn't true. If you dilute a solution of anything by a million to one, there will still be thousands of molecules of the substance present in the diluted solution. The body can detect that amount, and work on it.

Let's ignore his childish dig at climate change and get to the meat of his argument. Dave thinks that if you dilute something by a factor of a million then there will still be molecules of the original substance in the solution. And he's right there. No one will argue with that fact at all. But homeopathic remedies aren't million to one dilutions.

Homeopathic dilutions are given a number on the "C scale". Each time you dilute something by a factor of a hundred, you get another point on the C scale. A dilution of a hundred to one would be called 1C. Dilute that solution by another hundred to one (that's now ten thousand to one from the original solution) and you get to 2C. Another step to 3C gives us a dilution of a million to one from the original solution. That's about the level of dilution that Dave is talking about.

But homeopaths don't stop there. 3C dilutions are nothing. Remember a key tenet of homeopathy is that the more dilute the solution, the stronger the effect. Homeopaths carry on diluting their solutions again and again and again. Next time you're in Boots have a look at the numbers on the tubes of homeopathic remedies that they sell. You'll see that 30C is a really common dilution. That solution has been diluted by a factor of a hundred to one thirty times. The original solution has been diluted by a factor of one to a number which is one followed by sixty zeroes. That's a huge number. With numbers like that involved, it's perfectly reasonable to say that there is none of the original material left.

Here's an example to help you get to grips with those numbers. The number of water molecules in a swimming pool is going to be around a one followed by thirty two zeroes. One molecule of something else in that pool will be equivalent to a 16C homeopathic remedy. See the Wikipedia entry on homeopathic dilutions for more examples like this.
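The arithmetic above is easy to check for yourself. This is only a small sketch: the function names are made up for illustration, and the swimming pool figure is the rough estimate quoted above, not a measurement.

```python
def dilution_factor(c_rating: int) -> int:
    """Total dilution after `c_rating` steps, each step being 100 to 1."""
    return 100 ** c_rating


def c_rating_for_power_of_ten(exponent: int) -> float:
    """C rating equivalent to one part in 10**exponent.

    Each C step multiplies the dilution by 10**2, so the rating is
    simply half the exponent.
    """
    return exponent / 2


# 3C is the million to one dilution that Dave is talking about.
print(dilution_factor(3))                  # 1000000

# 30C: a one followed by sixty zeroes.
print(len(str(dilution_factor(30))) - 1)   # 60

# One molecule among roughly 10**32 water molecules in a swimming pool.
print(c_rating_for_power_of_ten(32))       # 16.0
```

Going from 3C to 30C multiplies the dilution by a one followed by fifty-four zeroes, which is why the two cases can't be compared.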

This is why the slogan for the 10:23 campaign is "There's nothing in it". It's literally true. There is no active ingredient left in any homeopathic remedy that you find.

I'd love it if Dave from Basingstoke found this entry. It would be great if he could see just how misinformed he is.

February 22, 2010 05:00 PM

Martin A. Brooks

If Apple made computer racks....

* It'd be called the iRack.

* It'd be machined from a solid block of aluminium.

* Units would be 'i', not 'u'. One 'i' will equal 1.1686634u. The mounting holes will be suspiciously Apple shaped.

* The UPS would be built in and would use a non-industry standard connector. You can't change the battery yourself.

* There would be no exposed screws, bolts or hinges. There would be a glowing Apple logo on the front.

* Only equipment bought from the Apple Store can be installed.

* Got a problem with your iRack? Simply take it down to your nearest Apple shop after booking a Genius appointment.

* The built in accelerometers will automatically invalidate your warranty if your rack gets tilted.

February 22, 2010 03:59 PM

Ross Lawley

Planning to Scale

At Global Radio we recently relaunched Heart FM, which is now a conglomeration of 33 local Heart station websites, where previously it was 33 individual sites. To achieve this the team refactored our in-house CMS to handle these localisations in as sane a way as possible, with the aim being that with the CMS our editors could easily manage and share content across these sites and

February 22, 2010 02:18 PM

Daniel Roseman

Django patterns, part 4: forwards generic relations

My last post talked about how to follow reverse generic relations efficiently. However, there's a further potential inefficiency in using generic relations, and that's the forward relationship.

If once again we take the example of an Asset model with a GenericForeignKey used to point at Articles and Galleries, we can get from each individual Asset to its related item by doing asset.content_object. However, if we have a whole queryset of Assets, doing this:

{% for asset in assets %}
   {{ asset.content_object }}
{% endfor %}

will result in as many queries as there are assets - in fact it's n+m, where n is the number of assets and m is the number of different content types, as you'll have one extra query per type to get the ContentType object. (Although it might be slightly less than that if you've used ContentTypes elsewhere, as the model manager caches lookups on the assumption that they never change once they've been set.)

However, luckily we can make this much more efficient as well, again using a variation of the dictionary technique.

generics = {}
for item in queryset:
    generics.setdefault(item.content_type_id, set()).add(item.object_id)

content_types = ContentType.objects.in_bulk(generics.keys())

relations = {}
for ct, fk_list in generics.items():
    ct_model = content_types[ct].model_class()
    relations[ct] = ct_model.objects.in_bulk(list(fk_list))

for item in queryset:
    # Look up via the item's own type and ID (the bare names were undefined).
    setattr(item, '_content_object_cache',
            relations[item.content_type_id][item.object_id])

Here we get all the different content types used by the relationships in the queryset, and the set of distinct object IDs for each one, then use the built-in in_bulk manager method to get all the content types at once in a nice ready-to-use dictionary keyed by ID. Then, we do one query per content type, again using in_bulk, to get all the actual objects.

Finally, we simply set the relevant object to the _content_object_cache field of the source item. The reason we do this is that this is the attribute that Django would check, and populate if necessary, if you called x.content_object directly. By pre-populating it, we're ensuring that Django will never need to call the individual lookup - in effect what we're doing is implementing a kind of select_related() for generic relations.

February 22, 2010 11:50 AM

February 21, 2010

Rev. Simon Rumble

Josephine's Cheese: Terrible packaging


Josephine's Traditional Goats Cheese in Ash

Josephine's Cheese, given your site has no contact information, and doesn't even display the product I bought, I'll have to howl into the aether and assume you'll find it when I SEO your arse (given Google doesn't even see your image-only site, that won't be hard).

So I bought your "Traditional Goat's Cheese in Ash" last week. It was delicious. Unfortunately your packaging is abysmal. After very carefully peeling off the label, you're presented with this tube of squished cheese. If you take care from this point, and happen to have scissors or a sharp knife handy, you can just about get a single blob of cheese out ready to serve. More likely you'll end up with more like a tube of cheese toothpaste squirted out onto your serving location.

Great cheese. I won't buy it again in this packaging. Buy your competitors' products and learn.

Contact me

February 21, 2010 11:17 PM

February 20, 2010

Richard WM Jones

rich

Thanks to Benjamin Donnachie for alerting me to this nice bit of kit, a simple, relatively cheap weather station that you attach to your house and plug into the USB port of a PC. (There is an open source project to support this under Linux). The photos on this page should make it clearer how these things are physically attached.

Note in the US these parts are sold for $79. Not quite sure how they translated that into a £100+ price tag in the UK, but there you go.

“1-wire” is a semi-proprietary standard for sending power and low-bandwidth serial data over any two core cable (eg. telephone cable, speaker wire or Cat 5). The main advantages seem to be very long runs, and multiple devices can be connected to a single wire.

The device measures:

Not rainfall — but there are 1-wire-compatible parts for making other measurements.


February 20, 2010 11:13 AM

February 19, 2010

Richard WM Jones

rich

I’m giving a talk at Westminster University, London about libguestfs, guestfish and all the other tools we have been writing.

Date: Thursday 18th March 2010
Time: 19.30
Location: Westminster University, Cavendish Campus, Large lecture theatre (follow signs from the entrance).


February 19, 2010 08:19 PM

rich


See all posts in this series …

Pressing the ’s’ key interrupts the boot. I set the CF card to LBA mode, although I’m not actually sure that made any difference.

 Welcome to minicom 2.3

OPTIONS: I18n
Compiled on Aug 29 2008, 07:17:10.
Port /dev/ttyS0

               Press CTRL-A Z for help on special keys

PC Engines ALIX.2 v0.99h
640 KB Base Memory
261120 KB Extended Memory

01F0 Master 848A CF Card
Phys C/H/S 3936/16/32 Log C/H/S 984/64/32

BIOS setup:

(9) 9600 baud (2) 19200 baud *3* 38400 baud (5) 57600 baud (1) 115200 baud
*C* CHS mode (L) LBA mode (W) HDD wait (V) HDD slave (U) UDMA enable
(M) MFGPT workaround
(P) late PCI init
*R* Serial console enable
(E) PXE boot enable
(X) Xmodem upload
(Q) Quit

Grub didn’t have the right configuration, so I swapped the CF card back over to my Fedora machine and used guestfish to make some quick edits to the grub configuration. (Note: grub2 has a confusingly different configuration from grub1). I also edited /etc/inittab so that I get a login prompt on the serial console after boot.

# guestfish -i /dev/sdb

Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.

Type: 'help' for help with commands
      'quit' to quit the shell

><fs> ls /boot
System.map-2.6.32-trunk-486
config-2.6.32-trunk-486
grub
initrd.img-2.6.32-trunk-486
initrd.img-2.6.32-trunk-486.bak
vmlinuz-2.6.32-trunk-486
><fs> vi /boot/grub/
Display all 183 possibilities? (y or n) n
><fs> vi /boot/grub/grub.cfg
><fs> vi /etc/inittab

After that, it boots first time! There’s not a lot running …

rich@itx:~$ ps afx
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:00  \_ [ksoftirqd/0]
    4 ?        S      0:00  \_ [watchdog/0]
    5 ?        S      0:00  \_ [events/0]
    6 ?        S      0:00  \_ [cpuset]
    7 ?        S      0:00  \_ [khelper]
    8 ?        S      0:00  \_ [netns]
    9 ?        S      0:00  \_ [async/mgr]
   10 ?        S      0:00  \_ [pm]
   11 ?        S      0:00  \_ [sync_supers]
   12 ?        S      0:00  \_ [bdi-default]
   13 ?        S      0:00  \_ [kintegrityd/0]
   14 ?        S      0:00  \_ [kblockd/0]
   15 ?        S      0:00  \_ [kseriod]
   16 ?        S      0:00  \_ [kondemand/0]
   17 ?        S      0:00  \_ [khungtaskd]
   18 ?        S      0:00  \_ [kswapd0]
   19 ?        SN     0:00  \_ [ksmd]
   20 ?        S      0:00  \_ [aio/0]
   21 ?        S      0:00  \_ [crypto/0]
  111 ?        S      0:00  \_ [ksuspend_usbd]
  112 ?        S      0:00  \_ [khubd]
  123 ?        S      0:00  \_ [ata/0]
  124 ?        S      0:00  \_ [ata_aux]
  612 ?        S      0:00  \_ [flush-3:0]
    1 ?        Ss     0:02 init [2]
  206 ?        S<s    0:00 udevd --daemon
  245 ?        S<     0:00  \_ udevd --daemon
  246 ?        S<     0:00  \_ udevd --daemon
  256 ?        S<     0:00 /bin/sh -e /lib/udev/net.agent
 2110 ?        S<     0:00  \_ sleep 1
  255 ?        S<     0:00 /bin/sh -e /lib/udev/net.agent
 2116 ?        S<     0:00  \_ sleep 1
  257 ?        S<     0:00 /bin/sh -e /lib/udev/net.agent
 2113 ?        S<     0:00  \_ sleep 1
  538 ?        Sl     0:00 /usr/sbin/rsyslogd -c4
  564 ?        Ss     0:00 /usr/sbin/cron
  582 ?        Ss     0:00 /usr/sbin/sshd
 1898 ?        Ss     0:00  \_ sshd: rich [priv]
 1936 ?        S      0:00      \_ sshd: rich@pts/0
 1937 pts/0    Ss     0:00          \_ -bash
 2117 pts/0    R+     0:00              \_ ps afx
 1323 ttyS0    Ss+    0:00 /sbin/getty -L ttyS0 38400 vt100
rich@itx:~$ free
             total       used       free     shared    buffers     cached
Mem:        255616      22240     233376          0       1412      12856
-/+ buffers/cache:       7972     247644
Swap:            0          0          0
rich@itx:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 5
model           : 10
model name      : Geode(TM) Integrated Processor by AMD PCS
stepping        : 2
cpu MHz         : 498.110
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow
bogomips        : 996.22
clflush size    : 32
cache_alignment : 32
address sizes   : 32 bits physical, 32 bits virtual
power management:
rich@itx:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
tmpfs                 125M     0  125M   0% /lib/init/rw
udev                   10M   44K   10M   1% /dev
tmpfs                 125M  4.0K  125M   1% /dev/shm
rootfs                957M  458M  451M  51% /

February 19, 2010 11:49 AM

February 18, 2010

Karanbir Singh

Using screen automagically

A few days back I blogged about my .screenrc and what it could do, to get people interested in screen and start using it. But in order to really use 'screen' properly, don't forget the 'screen -xRR' in your .bash_profile.

That will make sure that when you ssh to a remote machine, it auto joins an existing screen - and if none exists then creates a new screen instance and drops you inside it. Try it, most people like it :)

- KB

February 18, 2010 03:45 PM

Richard WM Jones

rich

It’s quite popular to bash the Windows Registry in non-technical or lightly technical terms. I’ve just spent a couple of weeks reverse engineering the binary format completely for our hivex library and shell which now supports both reading and writing to the registry. So now I can tell you why the Registry sucks from a technical point of view too.

1. It’s a half-arsed implementation of a filesystem

It’s often said that the Registry is a “monolithic file”, compared to storing configuration in lots of discrete files like, say, Unix does under /etc. This misses the point: the Registry is a filesystem. Sure it’s stored in a file, but so is ext3 if you choose to store it in a loopback mount. The Registry binary format has all the aspects of a filesystem: things corresponding to directories, inodes, extended attributes etc.

The major difference is that this Registry filesystem format is half-arsed. The format is badly constructed, fragile, endian-specific, underspecified and slow. The format changes from release to release of Windows. Parts are undocumented, seemingly to the Windows developers themselves (judging by the NT debug symbols that one paper has reproduced). Parts of the format waste space, while in other parts silly “optimizations” are made to save a handful of bytes (at the cost of making access much more complex).

2. Hello Microsoft programmers, a memory dump is not a file format

The format is essentially a dump of 32 bit C structures in a C memory heap. This was probably done originally for speed, but it opens the format to all sorts of issues:

  1. You can hide stuff away in unused blocks.
  2. You can create registries containing unreachable blocks or loops or pointers outside the heap, and cause Windows to fail or hang (see point 3).
  3. It’s endian and wordsize specific.
  4. It depends on the structure packing of the original compiler circa 1992.
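Points 3 and 4 are easy to demonstrate with a few bytes. The sketch below is not the real hive layout: the 32 bit value and the two-field record are invented purely for illustration. It shows how a raw dump of a C structure reads differently depending on byte order, and how the size of a record depends on the packing rules in force.

```python
import struct

# A hypothetical 32 bit offset, dumped the way a little-endian machine
# would write it straight out of memory.
raw = struct.pack("<I", 0x1020)
print(raw.hex())                       # 20100000

little = struct.unpack("<I", raw)[0]   # reader assumes little-endian
big = struct.unpack(">I", raw)[0]      # reader assumes big-endian
print(hex(little))                     # 0x1020
print(hex(big))                        # 0x20100000

# Structure packing: a 1 byte field followed by a 4 byte field.
# With no padding the record is 5 bytes long.
print(struct.calcsize("<BI"))          # 5
# With the native compiler's alignment rules it is usually longer,
# but the exact size depends on the platform.
print(struct.calcsize("@BI"))
```

A reader that assumes the wrong byte order or the wrong padding gets no error at all, just a different number.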

3. The implementation of reading/writing the Registry in Windows NT is poor

You might expect, given how critical the Registry is to Windows’ integrity, that the people who wrote the code that loads it would have spent a bit of time thinking about checking the file for consistency, but apparently this is not done.

  1. All versions of Windows tested will simply ignore blocks which are not aligned correctly.
  2. Ditto, will ignore directory entries which are not in alphabetical order (it just stops reading at the first place it finds a subdirectory named B > next entry A).
  3. Ditto, will ignore file entries which contain various sorts of invalid field.

The upshot of this is you can easily hide stuff in the Registry binary which is completely invisible to Windows, but will be apparent in other tools. From the point of view of other tools (like our hivex tool) we have to write exactly the same bits that Windows would write, to be sure that Windows will be able to read it. Any mistakes we make, even apparently innocuous ones, are silently punished.

Compare this to using an established filesystem format, where everyone knows the rules, and consistency (eg. fsck/chkdsk) matters.

Writing sucks too, because the programmers don’t correctly zero out fields, so you’ll find parts (particularly the Registry header) which contain random bits of memory, presumably kernel memory, dumped into the file. I didn’t find anything interesting there yet …

I also found Registries containing unreachable blocks (and not, I might add, ones which I’d tried modifying). I find it very strange that relatively newly created Windows 7 VMs which don’t have any sort of virus infection, have visible Registry corruption.

4. Types are not well specified

Each registry field superficially is typed, so REG_SZ is a string, and REG_EXPAND_SZ is, erm, also a string. Good, right? No, because what counts as a “string” is not well-defined. A string might be encoded in 7 bit ASCII, or UTF-16-LE. The only way to know is to know what versions of Windows will use the registry.
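Here is a short Python sketch of the problem. The same string produces quite different bytes in the two encodings, and a reader can only guess which one it has. The guessing function is a crude heuristic of my own for illustration (it assumes NUL terminators and would misfire on non-Latin text); it is not what hivex or Windows does.

```python
text = "Value"
as_ascii = text.encode("ascii") + b"\x00"
as_utf16 = text.encode("utf-16-le") + b"\x00\x00"


def guess_decode(raw):
    """Heuristic only: even length with NUL in every high byte suggests UTF-16-LE."""
    if len(raw) >= 2 and len(raw) % 2 == 0 and all(b == 0 for b in raw[1::2]):
        return raw.decode("utf-16-le").rstrip("\x00")
    return raw.decode("ascii", errors="replace").rstrip("\x00")


print(as_ascii, guess_decode(as_ascii))
print(as_utf16, guess_decode(as_utf16))
```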

Strings are also stored in REG_BINARY fields (in various encodings), but also raw binary data is stored in these fields.

Count yourself lucky if you only access official Microsoft fields though because some applications don’t confine themselves to the published types at all, and just use the type field for whatever they feel like.

And what’s up with having REG_DWORD (little-endian of course) and REG_DWORD_BIG_ENDIAN, and REG_QWORD, but no REG_QWORD_BIG_ENDIAN?

5. Interchange formats are not well specified

A critical part of installing many drivers is making registry edits, and for this a text format (.REG) is used along with the REGEDIT program. The thing is though that the .REG format is not well-specified in terms of backslash escaping. You can find examples of .REG files that have both:

"Name"="\Value"

and

"Name"="\\Value"

In addition the encoding of strings is again not specified. It seems to depend on the encoding of the actual .REG file, as far as anyone can tell. eg. If your .REG file itself is UTF-16-LE, then REGEDIT will encode all strings you define this way. Presumably if you transfer the .REG file to a system that changes the encoding, then you’ll get different results when you load the registry.

6. The Registry arrangement is a mess

Take a look at this forensic view of interesting Registry keys (PDF). List of mounted drives? HKEY_LOCAL_MACHINE\SYSTEM\MountedDevices. But what the user sees is stored in HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\MountPoints2\CPC\Volume\. Unless you mean USB devices which might be in the above list, or in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\USBSTOR. And the entries in those lists are by no means obvious — containing impenetrable binary fields and strange Windows paths.

If you browse through the Registry some time you’ll see it’s a giant accreted mess of non-standardized, overlapping information stored in random places. Some of it is configuration, much of it is runtime data. This is a far cry from /etc/progname.conf in Linux.

7. The Registry is a filesystem

Back to point 1, the Registry is a half-assed, poor quality implementation of a filesystem. Importantly, it’s not a database. It should be a database! It could benefit from indices to allow quick lookups, but instead we have to manually and linearly traverse it.

This leads to really strange Registry keys like:

\ControlSet001\Control\CriticalDeviceDatabase\pci#ven_1af4&dev_1001&subsys_00000000

which are crying out to be implemented as indexed columns in a real database.
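As a rough sketch of what I mean, here is the same information in an indexed SQLite table, using Python's sqlite3 module. The table and column names are hypothetical, and the key naming scheme is inferred only from the single example key above.

```python
import re
import sqlite3

key = "pci#ven_1af4&dev_1001&subsys_00000000"

# Assumed naming scheme, inferred only from the example key.
m = re.fullmatch(r"pci#ven_([0-9a-f]{4})&dev_([0-9a-f]{4})&subsys_([0-9a-f]{8})", key)
vendor, device, subsys = m.groups()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE critical_device (bus TEXT, vendor TEXT, device TEXT, subsys TEXT)")
db.execute("CREATE INDEX idx_vendor_device ON critical_device (vendor, device)")
db.execute("INSERT INTO critical_device VALUES ('pci', ?, ?, ?)", (vendor, device, subsys))

# An indexed lookup instead of a linear walk over string-encoded key names.
row = db.execute(
    "SELECT subsys FROM critical_device WHERE vendor = ? AND device = ?",
    ("1af4", "1001"),
).fetchone()
print(row)
```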

8. Security, ha ha, let’s pretend

Despite the fact that the Registry is just a plain file that you can modify using all sorts of external tools (eg. our hivex shell), you can create “unreadable” and “unwritable” keys. These are “secure” from the point of view of Windows, unless you just modify the Registry binary file directly.

Windows also uses an unhealthy dose of security-through-obscurity. It hides password salts in the obscure “ClassName” field of the Registry key. The “security” here relies entirely on the fact that the default Windows REGEDIT program cannot view or edit the ClassName of a key. Anyone with a binary editor can get around this restriction trivially.

9. The Registry is obsolete, sorta

Well the good news is the Registry is obsolete. The bad news is that Vista introduced another, incompatible way to store application data, in the AppData/Local and AppData/LocalLow directories, yet Windows Vista and Windows 7 continue to rely on the Registry for all sorts of critical data, and it doesn’t look like this mess is going to go away any time soon.

* * *

libguestfs on Fedora now provides the tools you need to manage the Registry in Windows virtual machines. For more details, see hivexsh and virt-win-reg documentation.

Update

Thanks to all who commented. There is further discussion here on Reddit and here on Hacker News (including discussion of inaccuracies in what I wrote). If you want to look at our analysis code, it’s all here in our source repository. For further references on the Registry binary format, follow the links in the hivex README file.


February 18, 2010 02:49 PM

Karanbir Singh

Blog Update to b2evo 3.3

Just upgraded to b2evo 3.3, let's see if this is any better than the b2evo 2.2 install I had on here. The db migration and content move went through fairly painlessly. Just need to tweak the skin back to my own liking a bit.

First thing though - not sure if I like this 'rich text' editing textbox area that now seems to be the default in b2evo 3.3; might live with it though.

- KB

February 18, 2010 02:15 AM

February 17, 2010

Richard WM Jones

rich

Radio 4 is running a brilliant series called A History of the World in A Hundred Objects. However should you wish to download this series so you can listen to it at a convenient place and time, how should you do it? Certainly not through the over-complex official website, that’s for sure.

But with some shell script you can grab the MP3 files and listen to this wonderful educational series at your pleasure:

i=1
e=100
y=2010
m=1
d=18
while [ $i -le $e ]; do
  date=$(printf "%04d%02d%02d" $y $m $d)
  for x in a c; do
    # Use $t here, not $y: reusing y would overwrite the year set above.
    for t in 1000 1005; do
      wget -nc "http://downloads.bbc.co.uk/podcasts/radio4/ahow/ahow_$date-$t$x.mp3"
    done
  done
  i=$(($i+1))
  d=$(($d+1))
  if [ $m -eq 1 ]; then
    endm=32
  elif [ $m -eq 2 ]; then
    endm=29
  elif [ $m -eq 3 ]; then
    endm=32
  elif [ $m -eq 4 ]; then
    endm=31
  fi
  if [ $d -eq $endm ]; then
    m=$(($m+1))
    d=1
  fi
done
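The date arithmetic is the fiddly part of that script, so here is a Python sketch that only generates the same candidate URLs and leaves the month lengths to the datetime module. It does not download anything. The URL pattern and the a/c and 1000/1005 suffixes are copied from the script above and are guesses at the file names, as before; the function name is my own.

```python
from datetime import date, timedelta

BASE = "http://downloads.bbc.co.uk/podcasts/radio4/ahow"


def candidate_urls(start=date(2010, 1, 18), days=100):
    """Build the list of URLs the shell loop would try, one day at a time."""
    urls = []
    for n in range(days):
        stamp = (start + timedelta(days=n)).strftime("%Y%m%d")
        for suffix in ("a", "c"):
            for slot in ("1000", "1005"):
                urls.append(f"{BASE}/ahow_{stamp}-{slot}{suffix}.mp3")
    return urls


urls = candidate_urls()
print(urls[0])
print(len(urls))
```

The resulting list can be written to a file and handed to wget -nc -i if you prefer that to the loop.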

Update

A reader has pointed me to the full list of MP3 (podcast) files available from the BBC.


February 17, 2010 10:49 PM

rich


Left-to-right: serial, 3xnetwork, USB, power. Note no graphics.

I bought an Alix 2D3 from LinITX. These are interesting little single board computers, featuring a slow AMD Geode processor, VIA chipset, no graphics, three 100 Mbps network ports, and a serial port. They boot from a CF card and cost around £100 including VAT and delivery.

I’ve installed Debian on the CF card, and had to deal with a Connector Conspiracy which has prevented me from mating my laptop with the serial console. Why Debian? Fedora is too bloated because of all the desktop/Python/*Kit crap that comes by default, whereas Debian can boot comfortably in 256 MB of RAM. Or indeed much less — in my previous job I used to install Debian virtual servers by default with 64 MB of RAM, and sometimes 32 MB. That’s more than an order of magnitude smaller than the minimal Fedora install. Second choice was FreeBSD which has a much more streamlined and visibly faster kernel than Linux, but Debian seems to be working out so far.

More to follow …


February 17, 2010 08:54 PM

davblog - Dave Cross

David Wright and Twitter

you can put lipstick on a scum-sucking pig but it is still a scum-sucking pig

That's apparently what Labour MP David Wright said about the Tory party on Twitter on Monday. I say "apparently" because he claims he didn't say it. He says that his tweets have been tinkered with. He says that the words "scum-sucking" have been inserted by someone else. But how likely does that seem? I think that we need to consider three alternatives.

  1. The tweet was changed by someone before it was passed to Twitter. This would imply that Wright has someone else who actually runs his Twitter account for him and who decided to have a little fun. No such Twitter assistant has been mentioned by Wright, so I think we can dismiss this possibility.
  2. The tweet has been edited since it was initially submitted to Twitter. Now Twitter doesn't offer users the facility to edit old tweets so this would involve someone hacking into the Twitter database, tracking down one tweet by an obscure British politician and changing that. I know that if I wanted to start hacking into Twitter and changing tweets, David Wright would be a long way down my list of potential targets. I think we can dismiss this possibility too.
  3. David Wright is being dishonest. Given that his first response to people complaining about the tweet was "Oh dear, upsetting Tories again. Must've hit a nerve" and that the tampering story only emerged later, I think this is a far more likely explanation.

Which leaves us wondering why he thought that lying about it like this was a good idea. This time I think we have two options to consider:

  1. Wright knows nothing about how any of this stuff works and doesn't realise how ridiculous his story sounds.
  2. Wright knows how ridiculous he sounds, but believes that none of his constituents understand this internet stuff so they won't realise that he's talking nonsense.

To be honest, I don't think that either of those alternatives show Wright in a particularly good light. In the first option, Wright is an elected representative who is apparently jumping on the Twitter bandwagon without understanding the first thing about the tools he is using. And in the second he's someone who doesn't mind deliberately lying to the electorate as long as he thinks there's a good chance that he won't be found out.

I have no way of knowing which of those two alternatives is an accurate description of what happened here. But if I was voting in Telford I'd be having a serious think about whether I wanted a man like Wright representing me in Parliament.

Update: David Wright has issued a statement. He says:

"My Twitter account has been hacked into and changed," he said. "I have demanded that Twitter provide me with the identity of whoever has inputted into my site. I will make a further statement when that information is available, and I will be seeking a meeting with ministers to discuss the general protection of blog sites."

To which I can only respond "Bwah Ha Ha!"

February 17, 2010 08:05 PM

February 15, 2010

Daniel Roseman

Django patterns part 3: efficient generic relations

I've previously talked about how to make reverse lookups more efficient using a simple dictionary trick. Today I want to write about how this can be extended to generic relations.

At its heart, a generic relationship is defined by two elements: a foreign key to the ContentType table, to determine the type of the related object, and an ID field, to identify the specific object to link to. Django uses these two elements to provide a content_object pseudo-field which, to the user, works similarly to a real ForeignKey field. And, again just like a ForeignKey, Django can helpfully provide a reverse relationship from the linked model back to the generic one, although you do need to explicitly define this using generic.GenericRelation to make Django aware of it.

As usual, though, the real inefficiency arises when you are accessing reverse relationships for a whole lot of items - say, each item in a QuerySet. As with reverse foreign keys, Django will attempt to resolve this relationship individually for each item, resulting in a whole lot of queries. The solution is a little different, though, to take into account the added complexity of generic relations.

Assuming the list of items is all of one type, the first step is to get the content type ID for this model. From that, we can get the object IDs, and then do the query in one go. From there, we can use the dictionary trick described last time to associate each item with its particular related items. In this example, we have an Asset model that is the generic model, holding assets for other models such as Article and Gallery.

from django.contrib.contenttypes.models import ContentType

# Article and Asset are the application's own models.
articles = Article.objects.all()
article_dict = dict((article.id, article) for article in articles)

article_ct = ContentType.objects.get_for_model(Article)
assets = Asset.objects.filter(
    content_type=article_ct,
    object_id__in=[a.id for a in articles]
)
# Group the assets by the ID of the article they belong to.
asset_dict = {}
for asset in assets:
    asset_dict.setdefault(asset.object_id, []).append(asset)
for article_id, related_items in asset_dict.items():
    article_dict[article_id]._assets = related_items

This is good as far as it goes, but what about when we have a heterogeneous list of items? That, after all, is the point of generic relations. So what if our starting point is a collection of both Galleries and Articles, and we still want to get all the related Assets in one go? As it turns out, the solution is not massively different: we just need to change the way we key the items in the intermediate dictionary, to record the content type as well as the object ID.

from django.db.models import Q

# articles and galleries are the collections of items fetched earlier.
article_ct = ContentType.objects.get_for_model(Article)
gallery_ct = ContentType.objects.get_for_model(Gallery)
assets = Asset.objects.filter(
    Q(content_type=article_ct,
      object_id__in=[a.id for a in articles]) |
    Q(content_type=gallery_ct,
      object_id__in=[g.id for g in galleries])
)

asset_dict = {}
for asset in assets:
    asset_dict.setdefault("%s_%s" % (asset.content_type_id, asset.object_id),
                          []).append(asset)

for article in articles:
    article._assets = asset_dict.get("%s_%s" % (article_ct.id, article.id), None)

for gallery in galleries:
    gallery._assets = asset_dict.get("%s_%s" % (gallery_ct.id, gallery.id), None)

Here we first of all use Q objects to get all the assets of type Article with IDs in the list of articles, plus all those of type Gallery with IDs in the list of galleries. Then we use the fact that each asset knows its own content type ID to create the dictionary keys in the form <content_type_id>_<object_id>. Finally, we loop through the articles and the galleries separately to get the relevant assets for each item.
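To see the keying idea in isolation, here is a sketch that runs without Django. The namedtuples are stand-ins for model instances and the content type IDs are made up. Note that it swaps the formatted string key for a (content_type_id, object_id) tuple, which is an alternative to the approach above rather than what the snippet itself does; it also defaults to an empty list rather than None.

```python
from collections import defaultdict, namedtuple

# Stand-ins for model instances; field names mirror the ones used above.
Asset = namedtuple("Asset", "content_type_id object_id name")
Item = namedtuple("Item", "id")

ARTICLE_CT, GALLERY_CT = 10, 11  # hypothetical content type IDs

assets = [
    Asset(ARTICLE_CT, 1, "a.jpg"),
    Asset(GALLERY_CT, 1, "g.jpg"),
    Asset(ARTICLE_CT, 1, "b.jpg"),
]

# Key on (content type, object ID) so an Article and a Gallery that share
# the same primary key do not collide.
asset_dict = defaultdict(list)
for asset in assets:
    asset_dict[(asset.content_type_id, asset.object_id)].append(asset.name)

article, gallery = Item(1), Item(1)
article_assets = asset_dict.get((ARTICLE_CT, article.id), [])
gallery_assets = asset_dict.get((GALLERY_CT, gallery.id), [])
print(article_assets, gallery_assets)
```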

February 15, 2010 08:04 PM

Richard WM Jones

rich

By popular demand, I’ve built Ubuntu packages for the latest libguestfs, guestfish, virt-inspector, virt-cat and virt-df:

http://www.annexia.org/tmp/debian/

These are experimental. If they break, you get to keep both pieces.

One point in particular is there is no Perl Sys::Virt package in Ubuntu, which means you have to supply this yourself, else virt-inspector etc. won’t work.


February 15, 2010 06:35 PM

February 12, 2010

Richard WM Jones

rich


What’s new in this release? Far too much to mention.

Strangely enough the guestfish manual page in this release is exactly 3333 lines long …

$ man guestfish
~
~
~
libguestfs-1.0.84                 2010-02-12                      guestfish(1)
lines 3273-3333/3333 byte 118136/118136 (END)   (press RETURN)

February 12, 2010 11:11 PM

Daniel Berrange

Controlling guest CPU & NUMA affinity in libvirt with QEMU, KVM & Xen

When provisioning new guests with libvirt, the standard policy for affinity between the guest and host CPUs / NUMA nodes, is to have no policy at all. In other words the guest will follow whatever the hypervisor's own default policy is, which is usually to run the guest on whatever host CPU is available. There are times when an explicit policy may be better, in particular to make the most of a NUMA architecture it is usually desirable to lock a guest to a particular NUMA node so that its memory allocations are always local to the node it is running on, avoiding the cross-node memory transports which have less bandwidth. As of writing, libvirt supports this capability for QEMU, KVM and Xen guests. Even on a non-NUMA system some form of explicit placement across the hosts' sockets, cores & hyperthreads may be desired.

Querying host CPU / NUMA topology

The first step in deciding what policy to apply is to figure out what the host's topology is. The virsh nodeinfo command provides information about how many sockets, cores & hyperthreads there are on a host.

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              8
CPU frequency:       1000 MHz
CPU socket(s):       2
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         8179176 kB

There are a total of 8 CPUs, in 2 sockets, each with 4 cores.

More interesting though is the NUMA topology. This can be significantly more complex, so the data is provided in a structured XML document, as part of the virsh capabilities output

# virsh capabilities
<capabilities>

  <host>
    <cpu>
      <arch>x86_64</arch>
    </cpu>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='4'>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <secmodel>
      <model>selinux</model>
      <doi>0</doi>
    </secmodel>
  </host>

 ...removed remaining XML...

</capabilities>

This tells us that there are two NUMA nodes (aka cells), each containing 4 logical CPUs. Since we know there are two sockets, we can obviously infer from this that each socket is in a separate node, not that this really matters for what we need later. If we're intending to run a guest with 4 virtual CPUs, we can see that it will be desirable to lock the guest to physical CPUs 0-3, or 4-7 to avoid non-local memory accesses. If our guest workload required 8 virtual CPUs, since each NUMA node only has 4 physical CPUs, better utilization may be obtained by running a pair of 4 cpu guests & splitting the work between them, rather than using a single 8 cpu guest.
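If you want to pull the cell-to-CPU mapping out of that XML programmatically, a minimal Python sketch using only the standard library might look like the following. The XML string is a trimmed copy of the topology section shown above, and the function name is my own, not part of libvirt.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <topology> section from the capabilities output above.
CAPS = """<capabilities><host><topology><cells num='2'>
  <cell id='0'><cpus num='4'>
    <cpu id='0'/><cpu id='1'/><cpu id='2'/><cpu id='3'/>
  </cpus></cell>
  <cell id='1'><cpus num='4'>
    <cpu id='4'/><cpu id='5'/><cpu id='6'/><cpu id='7'/>
  </cpus></cell>
</cells></topology></host></capabilities>"""


def cells_to_cpus(xml_text):
    """Map each NUMA cell ID to the list of host CPU IDs it contains."""
    root = ET.fromstring(xml_text)
    return {
        int(cell.get("id")): [int(cpu.get("id")) for cpu in cell.findall("./cpus/cpu")]
        for cell in root.findall("./host/topology/cells/cell")
    }


topology = cells_to_cpus(CAPS)
print(topology)
```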

Deciding which NUMA node to run the guest on

Locking a guest to a particular NUMA node is rather pointless if that node does not have sufficient free memory to satisfy the guest's memory allocations locally. Indeed, it would be very detrimental to utilization. The next step is to ask libvirt what the free memory is on each node, using the virsh freecell command

# virsh freecell 0
0: 2203620 kB

# virsh freecell 1
1: 3354784 kB

If our guest needs to have 3 GB of RAM allocated, then clearly it needs to be run on NUMA node (cell) 1, rather than node 0, since the latter only has 2.2 GB available.
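The same decision can be written down as a few lines of Python. This is only a sketch of the arithmetic: the free memory figures are the ones printed by virsh freecell above, and the helper name is hypothetical.

```python
def nodes_with_room(free_kb_by_node, needed_kb):
    """Return the NUMA nodes whose free memory covers the guest's allocation."""
    return sorted(node for node, free in free_kb_by_node.items() if free >= needed_kb)


free = {0: 2203620, 1: 3354784}   # kB, from the virsh freecell output above
needed = 3 * 1024 * 1024          # a 3 GB guest, expressed in kB
candidates = nodes_with_room(free, needed)
print(candidates)
```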

Locking the guest to a NUMA node or physical CPU set

We have now decided to run the guest on NUMA node 1, and referring back to the capabilities data about NUMA topology, we see this node has physical CPUs 4-7. When creating the guest XML we can now specify this as the CPU mask for the guest. Where the guest virtual CPU count is specified

<vcpu>4</vcpu>

we can now add the mask

<vcpu cpuset='4-7'>4</vcpu>

As mentioned earlier, this works for QEMU, KVM and Xen guests. In the QEMU/KVM case, libvirt will use the sched_setaffinity call at guest startup, while in the Xen case libvirt will instruct XenD to make an equivalent hypercall.

Automatic placement using virt-install

This walkthrough illustrated the concepts in terms of virsh commands. If writing a management application using libvirt, you would of course use the equivalent APIs for looking up this data, virNodeGetInfo, virConnectGetCapabilities and virNodeGetCellsFreeMemory. The virt-install provisioning tool has done exactly this and provides a simple way to automatically apply a 'best fit' NUMA policy when installing guests. Quoting its manual page

   --cpuset=CPUSET
   
   Set which physical cpus the guest can use. "CPUSET" is a comma separated
   list of numbers, which can also be specified in ranges. Example:

     0,2,3,5     : Use processors 0,2,3 and 5
     1-3,5,6-8   : Use processors 1,2,3,5,6,7 and 8

   If the value "auto" is passed, virt-install attempts to automatically
   determine an optimal cpu pinning using NUMA data, if available.

So if you have a NUMA machine and use virt-install, simply always add --cpuset=auto whenever provisioning a new guest.
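For reference, the CPUSET syntax quoted above is simple to expand by hand. The sketch below handles only the two forms shown in the manual page excerpt (single numbers and ranges); the function is my own illustration and is not how virt-install parses the option.

```python
def parse_cpuset(spec):
    """Expand a string such as '1-3,5,6-8' into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpuset("0,2,3,5"))
print(parse_cpuset("1-3,5,6-8"))
print(parse_cpuset("4-7"))
```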

Fine tuning CPU affinity at runtime

The scheme outlined above is focused on the initial guest placement at boot time. There may be times where it becomes necessary to fine-tune the CPU affinity at runtime. libvirt/virsh can cope with this need too, via the vcpuinfo and vcpupin commands. First, the virsh vcpuinfo command gives you the latest data about where each virtual CPU is running. In this example, rhel5xen is a guest on a Fedora KVM host which I used for RHEL5 Xen package maintenance work. It has 4 virtual CPUs and is being allowed to run on any host CPU

# virsh vcpuinfo rhel5xen
VCPU:           0
CPU:            3
State:          running
CPU time:       0.5s
CPU Affinity:   yyyyyyyy

VCPU:           1
CPU:            1
State:          running
CPU Affinity:   yyyyyyyy

VCPU:           2
CPU:            1
State:          running
CPU Affinity:   yyyyyyyy

VCPU:           3
CPU:            2
State:          running
CPU Affinity:   yyyyyyyy

Now let's say I want to lock each of these virtual CPUs to a separate host CPU in the 2nd NUMA node.

# virsh vcpupin rhel5xen 0 4

# virsh vcpupin rhel5xen 1 5

# virsh vcpupin rhel5xen 2 6

# virsh vcpupin rhel5xen 3 7

The vcpuinfo command can be used again to confirm the placement

# virsh vcpuinfo rhel5xen
VCPU:           0
CPU:            4
State:          running
CPU time:       32.2s
CPU Affinity:   ----y---

VCPU:           1
CPU:            5
State:          running
CPU time:       16.9s
CPU Affinity:   -----y--

VCPU:           2
CPU:            6
State:          running
CPU time:       11.9s
CPU Affinity:   ------y-

VCPU:           3
CPU:            7
State:          running
CPU time:       14.6s
CPU Affinity:   -------y
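The CPU Affinity column is a per-host-CPU mask, with one character per CPU. As a small sketch of how to read it, assuming (as the output above suggests) that "y" means allowed and "-" means not allowed, with the leftmost character being CPU 0:

```python
def affinity_cpus(mask):
    """Turn a vcpuinfo affinity string such as '----y---' into a list of host CPUs."""
    return [cpu for cpu, flag in enumerate(mask) if flag == "y"]


print(affinity_cpus("yyyyyyyy"))
print(affinity_cpus("----y---"))
print(affinity_cpus("-------y"))
```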

And just to prove I'm not faking it all, here's the KVM process running on the host and its /proc status

# grep pid /var/run/libvirt/qemu/rhel5xen.xml 
<domstatus state='running' pid='4907'>

# grep Cpus_allowed_list /proc/4907/task/*/status
/proc/4907/task/4916/status:Cpus_allowed_list: 4
/proc/4907/task/4917/status:Cpus_allowed_list: 5
/proc/4907/task/4918/status:Cpus_allowed_list: 6
/proc/4907/task/4919/status:Cpus_allowed_list: 7

Future work

The approach outlined above relies on the fact that the kernel will always try to allocate memory from the NUMA node that matches the one the guest CPUs are executing on. While this is sufficient in the simple case, there are some pitfalls along the way. Between the time the guest is started & memory is allocated, RAM from the NUMA node in question may have been used up, causing the OS to fall back to allocating from another node. For this reason, if placing guests on NUMA nodes, it is crucial that all guests running on the host have fixed placement, with none allowed to float free. In some weird and wonderful NUMA topologies (hello Itanium !) there can be NUMA nodes which have only CPUs, and/or only RAM. To cope with these it will be necessary to extend libvirt to allow an explicit memory allocation node to be listed in the guest configuration.

February 12, 2010 05:19 PM

davblog - Dave Cross

The Pod Delusion

I've never really been a fan of podcasts. It's related to the reason why I rarely listen to talk radio - I think that audio is a really inefficient way to absorb information. I could take in the information at three or four times the speed if I could read it.

But there's an exception to every rule and I've finally found a podcast that is interesting enough to download and listen to every week.

It's called The Pod Delusion and it describes itself as "a podcast about interesting things". It's run by James O'Malley and each week it features four or five reports by James or his many co-conspirators. Subjects usually stay pretty close to the kind of skeptic-atheist-lefty areas that I often write about here, but there are also interesting other reports - for example, Mark Thompson's look at ITV's move away from regional broadcasting from episode 17.

Some subjects are covered very often. There's usually something taking the piss out of homoeopathy (often including plugs for the 10:23 campaign) and as far as I'm concerned you just can't knock homoeopaths enough. There's also strong coverage of skeptic events in London (see Crispian Jago's report from TAM London in episode 4 or James O'Malley reporting from last night's BHA Darwin Lecture in today's new episode - including an interview with Richard Dawkins).

If you're the kind of person who is interested in the kinds of things I write about here then you'll probably find The Pod Delusion interesting too. I recommend you give it a listen.

February 12, 2010 01:42 PM

February 11, 2010

Dean Wilson

BSD Magazine - A decent read

While looking for an OpenBSD baseball cap on the BSD stalls at FOSDEM I was given a couple of issues of the BSD Magazine to flick through - and it's a lot better than I'd hoped.

As most of the UK Linux magazines have become very desktop focused it's nice to see some actual low-level code - packaging for OpenBSD, writing sound drivers for your NetBSD NSLU2, custom Jabber components and basic GDB were all in the two issues I skimmed. While it's not the dearly departed Sysadmin Magazine, and it could do with an editor or two - much as I could, it is a decent read and I'm considering a subscription.

February 11, 2010 10:11 PM

February 10, 2010

Richard WM Jones

rich


The Fedora build system Koji runs on RHEL 5 Xen and builds everything on top of that using mock. This can lead to some rather difficult to debug problems where your package builds and tests OK for you on your local Rawhide machine, but fails in Koji. The reason it can fail in Koji is because it is running on the RHEL 5 Linux kernel (2.6.18). Your program, or any program you depend on during the build, might make assumptions about system calls that work for a Rawhide kernel, but fail for a RHEL 5 kernel.

Reproducing these bugs is difficult. Hopefully this posting should be a good start.

Koji is doing roughly the equivalent of this command (on a RHEL 5 host):

mock -r fedora-rawhide-x86_64 --rebuild your.src.rpm

That command doesn’t work straightaway. There are some things you have to install and upgrade first before that works:

  1. Install RHEL 5 (or use CentOS or another no-cost alternative).
  2. Install EPEL.
  3. Install or update yum, python-hashlib, mock.
  4. Use /usr/sbin/vigr to add yourself to the “mock” group.
  5. The version of RPM from RHEL 5 is too old to understand the new xz-based compression format used by Rawhide RPMs. You have to build the Fedora 12 RPM (NB: Fedora 13 RPM definitely doesn’t work because it requires Python 2.6). The Fedora 12 specfile is a starting point, but it won’t work directly. There are some small changes you have to make, and a single patch to the source code, but hopefully those will be obvious. Update: Here for a short time is a scratch build of the Fedora 12 RPM made to work on RHEL 5.4. Once you’ve built the new rpm RPM (!), install it.

At this point you can use the mock command above to test-build SRPMs using the unusual RHEL 5 kernel / Rawhide userspace combination.

February 10, 2010 05:01 PM

davblog - Dave Cross

Comments

In the middle of last week I upgraded this site to Movable Type 5. And at (I assume) the same time comments stopped working.

It seems that it was some incompatibility between the MT5 Javascript that drives the comment system and the old MT4 templates that I was using. I've now rebuilt the site using MT5 templates and everything seems to be working.

Sorry for the inconvenience. If you've tried to leave a comment recently and just got a never-ending spinner then I hope you'll try again now.

And a tip to people upgrading to MT5 - rebuild your templates.

Update: Looks like I spoke too soon. Commenting is still completely broken. I'm looking into it. Sorry for any inconvenience.

Update 2: Ok, that seems to be fixed now. It was a corrupt database table. Should work ok now. Feel free to comment away.

February 10, 2010 04:26 PM

February 09, 2010

Richard WM Jones

rich


Unbootable virtual machine? Here are three useful guestfish commands to help. (You can also consider using virt-rescue).

1. Edit /boot/grub/grub.conf

$ guestfish -i Rawhide

Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.

Type: 'help' for help with commands
      'quit' to quit the shell

><fs> ls /boot/
System.map-2.6.32.1-9.fc13.x86_64
System.map-2.6.32.3-21.fc13.x86_64
System.map-2.6.33-0.40.rc7.git0.fc13.x86_64
config-2.6.32.1-9.fc13.x86_64
config-2.6.32.3-21.fc13.x86_64
config-2.6.33-0.40.rc7.git0.fc13.x86_64
[...]

Use the “edit”, “emacs” or “vi” commands to edit grub.conf:

><fs> vi /boot/grub/grub.conf

From here you can change the boot kernel, change it to boot in single user mode, enable the grub menu, remove the “rhgb quiet” option so you can see boot messages, and much more.

2. Look at the /init script

When the kernel panics because it cannot mount root, it’s often because the initrd or initramfs is broken in some way. Two commands help here:

><fs> initrd-list /boot/initramfs-2.6.33-0.40.rc7.git0.fc13.x86_64.img | less
><fs> initrd-cat /boot/initramfs-2.6.33-0.40.rc7.git0.fc13.x86_64.img init | less

The first command lists all the files in the initrd, which lets you see if the right drivers got included for the (virtual) hardware. The second command lists out the init script — which is the shell script that runs first before the OS proper starts to boot.

February 09, 2010 12:12 PM

February 08, 2010

Richard WM Jones

rich


I just added this section to the libguestfs man page. Always good to be upfront about your mistakes.

   LIBGUESTFS GOTCHAS
   Gotcha (programming): "A feature of a
   system [...] that works in the way it is documented but is
   counterintuitive and almost invites mistakes."

   Since we developed libguestfs and the associated tools, there are
   several things we would have designed differently, but are now stuck
   with for backwards compatibility or other reasons.  If there is ever a
   libguestfs 2.0 release, you can expect these to change.  Beware of
   them.

   Autosync / forgetting to sync.
       When modifying a filesystem from C or another language, you must
       unmount all filesystems and call "guestfs_sync" explicitly before
       you close the libguestfs handle.  You can also call:

        guestfs_set_autosync (handle, 1);

       to have the unmount/sync done automatically for you when the handle
       is closed.  (This feature is called "autosync",
       guestfs_set_autosync q.v.)

       If you forget to do this, then it is entirely possible that your
       changes won’t be written out, or will be partially written, or
       (very rarely) that you’ll get disk corruption.

       Note that in guestfish(3) autosync is the default.  So quick and
       dirty guestfish scripts that forget to sync will work just fine,
       which can make this extra-puzzling if you are trying to debug a
       problem.

   Read-only should be the default.
       In guestfish(3), --ro should be the default, and you should have to
       specify --rw if you want to make changes to the image.

       This would reduce the potential to corrupt live VM images.

       Note that many filesystems change the disk when you just mount and
       unmount, even if you didn’t perform any writes.  You need to use
       "guestfs_add_drive_ro" to guarantee that the disk is not changed.

   guestfish command line is hard to use.
       "guestfish disk.img" doesn’t do what people expect (open "disk.img"
       for examination).  It tries to run a guestfish command "disk.img"
       which doesn’t exist, so it fails, and it fails with a strange and
       unintuitive error message.  Like the Bourne shell, we should have
       used "guestfish -c command" to run commands.

February 08, 2010 04:53 PM