November, 2011 - psyphi.net blog

I’ve been fiddling around recently with some stuff which I’m sure I covered in my CS degree 16 (Gah! Really?) years ago but have had to re-educate myself about. Namely a few different implementations of sort. I implemented three types in Perl with some reacquaintance of the mechanisms via Wikipedia. I found a few existing examples of sort algorithms in Perl but they generally looked a bit unpleasant so like all programmers I decided to write my own (complete with errors, as an exercise to the reader). Here I’ve also added some notes, mostly to myself, which are largely unsubstantiated because I haven’t measured memory, speed or recursion depth, for example (though these are well-documented elsewhere).

1. Bubble Sort

#!/usr/bin/perl -w
use strict;
use warnings;

my $set    = [map { int rand() * 99 } (0..40)];
print "in:  @{$set}\n";

my $sorted = bubblesort($set);
print "out: @{$sorted}\n";

sub bubblesort {
  my ($in)     = @_;
  my $out      = [@{$in}];
  my $length   = scalar @{$in};
  my $modified = 1;

  while($modified) {
    $modified = 0;
    for my $i (0..$length-2) {
      if($out->[$i] > $out->[$i+1]) {
	($out->[$i], $out->[$i+1]) = ($out->[$i+1], $out->[$i]);
	$modified = 1;
      }
    }
  }

  return $out;
}

Bubblesort iterates through each element of the list up to the last but one, comparing to the next element in the list. If it’s greater the values are swapped. The process repeats until no modifications are made to the list.

Pros: doesn’t use much memory – values are swapped in situ; doesn’t perform deep recursion; is easy to read

Cons: It’s pretty slow. The worst-case complexity is O(n²) passes (for each value in the list each value in the list is processed once).

2. Merge Sort

#!/usr/bin/perl
use strict;
use warnings;

my $set    = [map { int rand() * 99 } (0..40)];
print "in:  @{$set}\n";

my $sorted = mergesort($set);
print "out: @{$sorted}\n";

sub mergesort {
  my ($in) = @_;

  my $length = scalar @{$in};
  if($length < = 1) {
    return $in;
  }

  my $partition = $length / 2;
  my $left      = [@{$in}[0..$partition-1]];
  my $right     = [@{$in}[$partition..$length-1]];

  return merge(mergesort($left), mergesort($right));
}

sub merge {
  my ($left, $right) = @_;
  my $merge = [];

  while(scalar @{$left} || scalar @{$right}) {
    if(scalar @{$left} && scalar @{$right}) {
      if($left->[0] < $right->[0]) {
	push @{$merge}, shift @{$left};
      } else {
	push @{$merge}, shift @{$right};
      }
    } elsif(scalar @{$left}) {
      push @{$merge}, shift @{$left};
    } elsif(scalar @{$right}) {
      push @{$merge}, shift @{$right};
    }
  }
  return $merge;
}

Mergesort recurses through the list, in each iteration breaking the remaining list in half. Once broken down to individual elements, each pair of elements/lists at each depth of recursion is reconstituted into a new ordered list and returned.

Pros: generally quicker than bubblesort; O(n log n) complexity.

Cons: quite difficult to read

3. Quicksort

#!/usr/bin/perl
use strict;
use warnings;

my $set    = [map { int rand() * 99 } (0..40)];
print "in:  @{$set}\n";

my $sorted = quicksort($set);
print "out: @{$sorted}\n";

sub quicksort {
  my ($in) = @_;

  my $length = scalar @{$in};
  if($length < = 1) {
    return $in;
  }

  my $pivot = splice @{$in}, $length / 2, 1;
  my $left  = [];
  my $right = [];

  for my $v (@{$in}) {
    if($v < $pivot) {
      push @{$left}, $v;
    } else {
      push @{$right}, $v;
    }
  }

  return [@{quicksort($left)}, $pivot, @{quicksort($right)}];
}

Quicksort is probably the best known of all the sort algorithms out there. It’s easier to read than Mergesort, though arguably still not as easy as Bubblesort, but it’s a common pattern and its speed makes up for anything lacking in readability. At each iteration a pivot is selected and removed from the list. The remaining list is scanned and for element lower than the pivot is put in a new “left” list and each greater element is put into a new “right” list. The returned result is a merged recursive quicksort of the left list, the pivot and the right list.

In this example I’m picking the middle element of the list as the pivot. I’m sure there are entire branches of discrete mathematics dealing with how to choose the pivot based on the type of input data.

Pros: (One of?) the fastest sort algorithm(s) around; Reasonably efficient memory usage and recursion depth. Average O(n log n) complexity again (worst is O(n²)).

Perhaps it’s worth noting that in 25-odd years of programming computers I’ve only ever had to examine the inner workings of sort routines as part of my degree – never before, nor after, but it’s certainly brought back a few memories.

For a while now I’ve been using Test::Perl::Critic as an integral part of my daily coding. Test::Perl::Critic wraps Perl::Critic into a set of tests which can be automatically run against a distribution. Perl Critic implements Damien Conway’s set of standard Perl Best Practices.

I’m not going to go into the arguments of whether they’re good or bad rules right now. Suffice to say I use nearly all of them and my code has improved because of it. Anyway, one of the rules states you shouldn’t use parentheses for core method calls like length(), int() or rand(), so most of the time I don’t, but today I wanted to do this:

my @array = map { int rand * 45 } (1..10);

The results come out as an array of zeros. Why? Well it’s not a broken PRNG, that’s for sure. It’s a simple case of misunderstood parsing, operator precedence and Perl internals. Looking closely * 45 isn’t what you’d expect, *45 refers to the typeglob named 45 in the main package. I actually have no idea how this is cast into a number in Perl but however it’s done, it evaluates to zero. rand 0 seems to exhibit the same behaviour as rand 1, yielding a random float between zero and 1. int rand 0 will always be zero.

So? Well to stop the parser from taking *45 as an argument to rand you need to upset PerlCritic and add those parentheses back in:

my @array = map { int rand() *45 } (1..10);

and it works now so back to work!

[ The astute amongst you will realise you can just say int rand(45). I know, but that doesn’t work nearly so well as a mildly interesting example. I blame PerlCritic ;) ]

Month: November 2011

3 sorts of sort

Neat Perl Gotcha