Bookmarks for April 25th through April 29th

These are my links for April 25th through April 29th:

Active Directory + Linux account integration

Firstly a note of warning. I’ve done this mostly using CentOS but there’s no reason it shouldn’t work just as well on other distributions. I’ve gleaned a lot of this information by scouring a lot of other resources around the internet, FAQs, newsgroups etc. but as far as I can remember I wasn’t able to find a coherent article which described all of the required pieces of the puzzle.

Secondly the objective of this article is to have unified accounting across Windows & Linux, or at least as close as possible. We’re going to use Microsoft Active Directory, Kerberos, Samba, Winbind, pam and nsswitch. We’re also going to end up with consistent uids and gids across multiple linux clients.

/etc/samba/smb.conf

[global]
	workgroup = PSYPHI
	realm = PSYPHI.LOCAL
	security = ADS
	allow trusted domains = No
	use kerberos keytab = Yes
	log level = 3
	log file = /var/log/samba/%m
	max log size = 50
	printcap name = cups
	idmap backend = idmap_rid:PSYPHI=600-20000
	idmap uid = 600-20000
	idmap gid = 600-20000
	template shell = /bin/bash
	winbind enum users = Yes
	winbind enum groups = Yes
	winbind use default domain = Yes

/etc/krb5.conf

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = PSYPHI.LOCAL
 dns_lookup_realm = true
 dns_lookup_kdc = true
 ticket_lifetime = 24h
 forwardable = yes

[realms]
 EXAMPLE.COM = {
  kdc = kerberos.example.com:88
  admin_server = kerberos.example.com:749
  default_domain = example.com
 }

 PSYPHI.LOCAL = {
 }

[domain_realm]
 .example.com = EXAMPLE.COM
 example.com = EXAMPLE.COM

 psyphi.local = PSYPHI.LOCAL
 .psyphi.local = PSYPHI.LOCAL
[appdefaults]
 pam = {
   debug = false
   ticket_lifetime = 36000
   renew_lifetime = 36000
   forwardable = true
   krb4_convert = false
 }

Next we join the machine to the AD domain – it’s necessary to specify a user with the right privileges. It also prompts for a password.

net ads join -U administrator

We can check things are working so far by trying to create a kerberos ticket using an existing username. Again it prompts us for a password.

kinit (username)

Then klist gives us output something like this:

Ticket cache: FILE:/tmp/krb5cc_0
Default principal: username@PSYPHI.LOCAL

Valid starting     Expires            Service principal
04/28/10 10:57:32  04/28/10 20:57:34  krbtgt/PSYPHI.LOCAL@PSYPHI.LOCAL
	renew until 04/29/10 10:57:32


Kerberos 4 ticket cache: /tmp/tkt0
klist: You have no tickets cached

Cool, so we have a machine joined to the domain and able to use kerberos tickets. Now we can tell our system to use winbind for fetching account information:

/etc/pam.d/system-auth-ac

auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 500 quiet
auth        sufficient    pam_krb5.so use_first_pass
auth        required      pam_deny.so

account     required      pam_unix.so broken_shadow
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 500 quiet
account     [default=bad success=ok user_unknown=ignore] pam_krb5.so
account     required      pam_permit.so

password    requisite     pam_cracklib.so try_first_pass retry=3
password    sufficient    pam_unix.so md5 shadow nullok try_first_pass use_authtok
password    sufficient    pam_krb5.so use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      /lib/security/pam_mkhomedir.so 
session     required      pam_unix.so
session     optional      pam_krb5.so

If we’re on a 64-bit distribution we’ll find that references to /lib need to be switched for /lib64, e.g. /lib64/security/pam_mkhomedir.so . This file will also create new home directories for users if they’re not present during first log-in.

/etc/nsswitch.conf

passwd:     files winbind
shadow:     files winbind
group:      files winbind

hosts:      files dns

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files

netgroup:   nisplus

publickey:  nisplus

automount:  files nisplus
aliases:    files nisplus

Now we need to tell a few services to start on boot

chkconfig smb on
chkconfig winbind on

and start a few services now

service smb start
service winbind start

The Winbind+pam configuration can sometimes take a few minutes to settle down – I occasionally find it’s necessary to wait 5 or 10 minutes before accounts are available. YMMV.

getent passwd

Should now list local accounts (which take precedence) followed by domain accounts. Using ssh to the box as a domain user should make new home directories in /home/PSYPHI/username. If you decide to migrate home directories from /home make sure you change uid and gid to the new domain values for that user, then remove the old local account.

There are a handful of limitations of this approach –

  1. Though usernames and groupnames map ok, linux uids still don’t map to the windows uids so permissions don’t quite work across smb/cifs mounts
  2. The standard linux tools for user & group modification don’t work for domain accounts (adduser/usermod/groupadd/… etc.)
  3. Winbind seems unstable. On a lot of systems I’ve resorted to cronning a service winbind restart every 15 minutes, which seriously sucks
  4. … and probably others too

For debugging /var/log/secure is very useful, as are the samba logs in /var/log/samba/.

A Little SQL “EXPLAIN” example

http://www.flickr.com/photos/banyan_tree/2802823634/sizes/m/

Background: I have a big table called “channel” with a few hundred thousand rows in – nothing vast, but big enough to cause some queries to run slower than I want.

Today I was fixing something else and happened to run

show full processlist;

I noticed this taking too long:

SELECT payload FROM channel  LIMIT 398800,100;

This is a query used by some web-app paging code. Stupid really – payload isn’t indexed and there’s no use of any other keys in that query. Ok – how to improve it? First of all, see what EXPLAIN says:

mysql> explain SELECT payload FROM channel  LIMIT 398800,100;
+----+-------------+---------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows   | Extra |
+----+-------------+---------+------+---------------+------+---------+------+--------+-------+
|  1 | SIMPLE      | channel | ALL  | NULL          | NULL | NULL    | NULL | 721303 |       |
+----+-------------+---------+------+---------------+------+---------+------+--------+-------+
1 row in set (0.00 sec)

Ok. So the simplest way would be to limit a selection of id_channel (the primary key) then select payloads in that set. First I tried this:

SELECT payload
FROM channel
WHERE id_channel IN (
    SELECT id_channel FROM channel LIMIT 398800,100
);

Seems straightforward, right? No, not really.

ERROR 1235 (42000): This version of MySQL doesn't yet support
  'LIMIT & IN/ALL/ANY/SOME subquery'

Next!

Second attempt, using a temporary table, selecting and saving the id_channels I’m interested in then using those in the actual query:

CREATE TEMPORARY TABLE channel_tmp(
  id_channel BIGINT UNSIGNED NOT NULL PRIMARY KEY
) ENGINE=innodb;

INSERT INTO channel_tmp(id_channel)
  SELECT id_channel
  FROM channel LIMIT 398800,100;

SELECT payload
  FROM channel
  WHERE id_channel IN (
    SELECT id_channel FROM channel_tmp
  );
mysql> explain select id_channel from channel limit 398800,100;
+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | channel | index | NULL          | PRIMARY | 8       | NULL | 722583 | Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)
mysql> explain select payload from channel where id_channel in (select id_channel from channel_tmp);
+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+
| id | select_type  | table       | type            | poss_keys | key     | key_len | ref  | rows   | Extra       |
+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+
|  1 | PRIMARY      | channel     | ALL             | NULL      | NULL    | NULL    | NULL | 722327 | Using where |
|  2 | DEP SUBQUERY | channel_tmp | unique_subquery | PRIMARY   | PRIMARY | 8       | func |      1 | Using index |
+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+
2 rows in set (0.00 sec)

Let’s try a self-join doing all of the above without explicitly making a temporary table. Self-joins can be pretty powerful – neat in the right places..

mysql> explain SELECT payload
  FROM channel c1,
       (SELECT id_channel FROM channel limit 398800,100) c2
  WHERE c1.id_channel=c2.id_channel;
+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref           | rows   | Extra       |
+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+
|  1 | PRIMARY     | derived2   | ALL    | NULL          | NULL    | NULL    | NULL          |    100 |             |
|  1 | PRIMARY     | c1         | eq_ref | PRIMARY       | PRIMARY | 8       | c2.id_channel |      1 |             |
|  2 | DERIVED     | channel    | index  | NULL          | PRIMARY | 8       | NULL          | 721559 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+
3 rows in set (0.21 sec)

This pulls out the right rows and even works around the “no limit in subselect” unsupported mysql feature but that id_channel selection in c2 still isn’t quite doing the right thing – I don’t like all the rows being returned, even if they’re coming straight out of the primary key index.

A little bit of rudimentary benchmarking appears to suggest that the self-join is the fastest, followed by the original query at approximately one order of magnitude slower and trailing a long way behind at around another four-times slower than that, the temporary table. I’m not sure how or why the temporary table performance happens to be the slowest – perhaps down to storage access, or more likely my lack of understanding. Some time I might even try the in-memory table too for comparison.

Bookmarks for April 22nd through April 24th

These are my links for April 22nd through April 24th:

Bookmarks for April 17th through April 20th

These are my links for April 17th through April 20th:

Bookmarks for April 14th through April 16th

These are my links for April 14th through April 16th:

Bookmarks for March 17th through April 12th

These are my links for March 17th through April 12th: