An Interview Question

I’d like to share a basic interview question I’ve used in the past. I’ve used this in a number of different guises over the years, both at Sanger and at ONT but the (very small!) core remains the same. It still seems to be able to trip up a lot of people who sell themselves as senior developers on their CVs and demand £35k+ salaries.

You have a list of characters.

  1. Remove duplicates

The time taken for the interviewee to scratch their head determines whether they’re a Perl programmer, or at least think like one – this is an idomatic question in Perl. It’s a fairly standard solution to anyone who uses hashes, maps or associative arrays in any language. It’s certainly a lot harder without them.

The answer I would expect to see would run something like this:

#########
# pass in an array ref of characters, e.g.
# remove_dupes([qw(a e r o i g n o s e w f e r g e r i g e o n k)]);
#
sub remove_dupes {
  my $chars_in  = shift;
  my $seen      = {};
  my $chars_out = [];

  for my $char (@{$chars_in}) {
    if(!$seen->{$char}++) {
      push @{$chars_out}, $char;
    }
  }

  return $chars_out;
}

Or for the more adventurous, using a string rather than an array:

#########
# pass in a string of characters, e.g.
# remove_dupes(q[uyavubnopwemgnisudhjopwenfbuihrpgbwogpnskbjugisjb]);
#
sub remove_dupes {
  my $str  = shift;
  my $seen = {};
  $str     =~ s/(.)/( !$seen->{$1}++ ) ? $1 : q[]/smegx;
  return $str;
}

The natural progression from Q1 then follows. It should be immediately obvious to the interviewee if they answered Q1 inappropriately.

  1. List duplicates
#########
# pass in an array ref of characters, e.g.
# list_dupes([qw(a e r o i g n o s e w f e r g e r i g e o n k)]);
#
sub list_dupes {
  my $chars_in  = shift;
  my $seen      = {};
  my $chars_out = [];

  for my $char (@{$chars_in}) {
    $seen->{$char}++;
  }

  return [ grep { $seen->{$_} > 1 } keys %{$seen} ];
}

and with a string

#########
# pass in a string of characters, e.g.
# list_dupes(q[uyavubnopwemgnisudhjopwenfbuihrpgbwogpnskbjugisjb]);
#
sub list_dupes {
  my $str  = shift;
  my $seen = {};
  $str     =~ s/(.)/( $seen->{$1}++ > 1) ? $1 : q[]/smegx;
  return $str;
}

The standard follow-up is then “Given more time, what would you do to improve this?”. Well? What would you do? I know what I would do before I even started – WRITE SOME TESTS!

It’s pretty safe to assume that any communicative, personable candidate who starts off writing a test on the board will probably be head and shoulders above any other.

If I’m interviewing you tomorrow and you’re reading this now, it’s also safe to mention it. Interest in the subject and a working knowledge of the intertubes generally comes in handy for a web developer. I’m hiring you as an independent thinker!

6 thoughts on “An Interview Question”

  1. Nice one, Nick – thanks. I don’t think I’ve ever been given that one, though I’d probably be happy to see it. I’m sure there’s something very similar in Perl 6, certainly using “uniq @list” for Q1. I bet there’s something particularly snazzy in Haskell or Erlang.

    I’ve had untold amounts of complicated C++ and Java, and once even shell script using sort -u and/or uniq I can’t remember. I do remember the shell-scripter coming unstuck with Q2.

  2. I agree there are a lot of job candidates that will claim advanced knowledge of all the languages under the sun and become un-stuck with simple problems like this. Definitely worth setting a basic coding test when interviewing.

    Also, being taught something in CompSci BSc != being able to do it!

    The second part in Python, for completeness:

    print [k for k in set(the_string) if the_string.count(k) > 1]

  3. Nick, so the follow on question would be. What’s the time complexity of your solution to Q2? Is there a more efficient way of doing this?

    The implication isn’t that this isn’t the “right” way to do it, but that you should be aware of potential optimisations and complexity issues in general.

  4. Also suprised that the shell scripter didn’t get Q2. uniq has an option that lists the number of times that the string occured on the input. A bash solution bahaha:

    sed ‘s/\(.\)/\1 /g’ iii | awk ‘BEGIN{RS=” “}{print $0}’ | sort | uniq -c | grep -v ” 1 ” | awk ‘{print $2}’

    The horrible thing is that when working with large files this is /probably/ more efficient that a lots of solutions. sort can have amazing performance at times.

Comments are closed.