Category Archives: Know How

Debian Jessie: bye bye bind9 + dnssec-tools, hello PowerDNS

I recently upgraded my DNS server to Debian Jessie. In fact I reinstalled it from scratch and used puppet to install and configure all the required components. This DNS server, running bind9, is the authoritative nameserver for nethuis.nl.

nethuis.nl uses DNSSEC. To apply DNSSEC I used dnssec-tools, which gives you tools like zonesigner, rollerd and donuts to sign, roll and check your DNSSEC enabled zones. Two years ago I had a hard time setting this up, hitting various bugs in dnssec-tools 1.13-1 from Debian Wheezy. I ended up running a quite stable setup after packaging dnssec-tools 1.14 and using a patched version of zonesigner that didn’t increase the serial of the zone.

While installing the same setup on Debian Jessie, I noticed that dnssec-tools wasn’t in Jessie because of a bug in rollerd. I decided to install the dnssec-tools 1.14 package I had used before on Debian Wheezy. This all seemed fine until I received this email from my daily donuts run:

undefined method Net::DNS::RR::new_from_hash at /usr/lib/x86_64-linux-gnu/perl5/5.20/Net/DNS/RR.pm line 791.
Net::DNS::RR::AUTOLOAD("Net::DNS::RR", "rname", "hostmaster.nethuis.nl", "serial", 2014081039, "class", "IN", "expire", 1814400, ...) called at /usr/share/perl5/Net/DNS/ZoneFile/Fast.pm line 201
Net::DNS::ZoneFile::Fast::parse("file", "nethuis.nl.signed", "origin", "nethuis.nl.", "soft_errors", 1, "on_error", CODE(0x4ec0698)) called at /usr/sbin/donuts line 338

This thread indicated there are more related issues in the dnssec-tools package.

Time to re-evaluate. Debian Jessie is frozen, dnssec-tools didn’t get in, and there is not much conversation going on in bug report #754704, which kicked dnssec-tools out of testing. I also can’t update the signed zones as long as this is broken, and the current signed zone is only valid for another 3 weeks. 😥

OpenDNSSEC looked like an alternative, and I could also have used the tools that come with bind9 to sign, roll and check my zones. But I wanted to try something new: PowerDNS.

# apt-get install pdns-server

Coming from bind9, the easiest route was to put all the zone configuration from my original named.conf into /etc/powerdns/bindbackend.conf. I was amazed. It just worked. 😀
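
For reference, a minimal sketch of the configuration (launch=bind and bind-config are the PowerDNS bind backend settings; the zone file path below is just an example):

# /etc/powerdns/pdns.d/pdns.simplebind.conf
launch=bind
bind-config=/etc/powerdns/bindbackend.conf

# /etc/powerdns/bindbackend.conf -- named.conf-style zone statements
zone "nethuis.nl" {
    type master;
    file "/var/lib/powerdns/nethuis.nl.signed";
};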

The nethuis.nl zone was still a pre-signed DNSSEC zone. While reading the PowerDNS documentation I found out that PowerDNS can do “front-signing”, which is an amazing feature: PowerDNS does the signing on the fly, so there is no need to re-sign the zone every time you make a change to it.

First of all I changed the filename in /etc/powerdns/bindbackend.conf to the unsigned one. After that I created a database to manage the DNSSEC keys, added a line to the PowerDNS configuration to use this database and restarted PowerDNS.

# pdnssec create-bind-db /var/lib/powerdns/bind-dnssec-db.sqlite3
# echo "bind-dnssec-db=/var/lib/powerdns/bind-dnssec-db.sqlite3" >> /etc/powerdns/pdns.d/pdns.simplebind.conf
# systemctl restart pdns

I wanted to keep the KSK and ZSKs I was already using for my zone, so I imported those.

# pdnssec import-zone-key nethuis.nl Knethuis.nl.+008+00754.private KSK
# pdnssec import-zone-key nethuis.nl Knethuis.nl.+008+43743.private ZSK
# pdnssec import-zone-key nethuis.nl Knethuis.nl.+008+63186.private ZSK
# pdnssec deactivate-zone-key nethuis.nl 3
# pdnssec rectify-zone nethuis.nl
# dig +short +dnssec nethuis.nl SOA
ns1.nethuis.nl. hostmaster.nethuis.nl. 2014081039 28800 3600 1814400 600
SOA 8 2 600 20150115000000 20141225000000 43743 nethuis.nl. lqH6nrHf6YPcLv2TgQgC4gOI4gOGORsmfj/LDJAhu+GpWpiFTnQGtj08 I2TocYQ0jwkoar370quZyvKNAyjTBGNUw6rOxdjbxAn8DhMpBPi7TMfq PP7NXJLkxbx2aIW9r1C0iMk5WAYbi01bEsJY014WiX+s+QdRDPwWaanZ zFI=

That’s it. I’m really happy PowerDNS integrated DNSSEC into its product instead of requiring an additional toolset to manage pre-signed DNSSEC zones.

Update

On January 19th, 20:39:59 UTC, it got completely out of hand. The images below from dnsviz.net showed me that the nethuis.nl zone had expired on all the Authoritative DNS slaves.

nethuis.nl-dnssec-issues

Hovering with my mouse over the purple lines showed me the expired status:

nethuis.nl-dnssec-issues2

While the nethuis.nl zone hosted on the Authoritative DNS master was completely fine:

nethuis.nl-dnssec-issues3

What was going on here? 😕

It was clear that the slaves didn’t transfer the zone after it was re-signed by the Authoritative DNS master. According to RFC 1996 the SOA serial should be increased if you want the Authoritative DNS slaves to update their zones, and that clearly wasn’t happening in my case.

I found the SOA-EDIT setting. My current SERIAL is configured in the YYYYMMDDSS format, so I configured the SOA-EDIT setting to use INCEPTION-INCREMENT.

# pdnssec set-meta nethuis.nl SOA-EDIT INCEPTION-INCREMENT

This overrules the SERIAL configured in the on-disk zone file. Every Thursday, after the zone is re-signed, the SERIAL is automatically increased and all Authoritative DNS slaves will transfer the new zone.
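
A quick sanity check is to compare the serial the master serves (with SOA-EDIT applied) against what a slave serves; after each weekly re-sign and the following zone transfer they should be identical. A sketch, with the slave address left as a placeholder:

# serial as served by the Authoritative DNS master
dig +short nethuis.nl SOA @localhost | awk '{print $3}'
# serial as served by one of the Authoritative DNS slaves
dig +short nethuis.nl SOA @[slave nameserver] | awk '{print $3}'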

SSD caching using Linux and bcache

A couple of manufacturers are selling solutions to speed up your big HDD using a relatively small SSD.

But there is also a lot of development on the Linux front. There is Flashcache developed by Facebook, dm-cache by VISA and bcache by Kent Overstreet. The last one is interesting, because it’s a patch on top of the Linux kernel and will hopefully be accepted upstream some day.

Hardware setup

In my low power home server I use a 2TB Western Digital Green disk (5400 RPM). To give bcache a try I bought a 60GB Intel 330 SSD. Some facts about these two drives: the 2TB WD can do about 110 MB/s of sequential reads/writes, and as a traditional HDD it does about 100 random operations per second. The 60GB Intel 330 can sequentially read about 500 MB/s and write about 450 MB/s, and it does random reads at about 42,000 operations per second and random writes at about 52,000. The SSD is much faster!

The image below shows the idea of SSD caching. Frequently accessed data is cached on the SSD to gain better read performance. Writes can be cached on the SSD using the writeback mechanism.

Prepare Linux kernel and userspace software

To be able to use bcache, there are 2 things needed:

  1. A bcache patched kernel
  2. bcache-tools for commands like make-bcache and probe-bcache

I used the latest available 3.2 kernel. The bcache-3.2 branch from Kent’s git repo merged successfully. Don’t forget to enable the BCACHE module before compiling.
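
Roughly, assuming Kent’s repo has been added as a git remote named bcache (the branch name is the one mentioned above):

# on top of a vanilla 3.2 kernel tree, merge the bcache branch
git merge bcache/bcache-3.2
# enable the bcache module (CONFIG_BCACHE=m), e.g. via menuconfig
make menuconfig
# build Debian kernel packages
make deb-pkg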

On my low power home server I use Debian. Since there was no bcache-tools Debian package available yet, I created my own. Fortunately damoxc had already packaged bcache-tools for Ubuntu once.

Debian package: pommi.nethuis.nl/…/bcache/
Git web: http://git.nethuis.nl/?p=bcache-tools.git;a=summary
Git: http://git.nethuis.nl/pub/bcache-tools.git

Bcache setup

Unfortunately bcache isn’t plug-and-play: you can’t use bcache on an existing formatted partition. First you have to create a caching device (SSD) and a backing device (HDD) on top of two existing devices. Those devices can then be attached to each other to create a /dev/bcache0 device, which can be formatted with your favourite filesystem. Creating a caching and a backing device is necessary because bcache is a software implementation and needs to know what is going on; when booting, for example, bcache needs to know which devices to attach to each other. The commands for this procedure are sketched below.
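
A rough sketch, assuming the HDD partition is /dev/sda[X], the SSD partition is /dev/sdb[Y], and ext4 is just an example filesystem:

# create the backing device on the HDD partition
make-bcache -B /dev/sda[X]
# create the caching device on the SSD partition
make-bcache -C /dev/sdb[Y]
# register both devices with the kernel (needed again after every boot)
echo /dev/sda[X] > /sys/fs/bcache/register
echo /dev/sdb[Y] > /sys/fs/bcache/register
# attach the caching device to the backing device
echo [SSD bcache UUID] > /sys/block/sda/sda[X]/bcache/attach
# format and mount the combined device
mkfs.ext4 /dev/bcache0
mount /dev/bcache0 /mnt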

After this I had a working SSD caching setup. Frequently used data is stored on the SSD. Accessing and reading frequently used files is much faster now. By default bcache uses writethrough caching, which means that only reads are cached. Writes are being written directly to the backing device (HDD).

To speed up writes you have to enable writeback caching. But bear in mind that there is a risk of losing data when using a writeback cache, for example when there is a power failure or when the SSD dies. Bcache uses a fairly simple journalling mechanism on the caching device and will try to recover the data after a power failure, but there is a chance you will end up with corruption.

echo writeback > /sys/block/sda/sda[X]/bcache/cache_mode

While cached writes have not yet been written to the backing device, the cache is called dirty; it is clean again once all cached writes have hit the backing device. You can check the state of the writeback cache via:

cat /sys/block/sda/sda[X]/bcache/state

To detach the caching device from the backing device run the command below (/dev/bcache0 will still be available). This can take a while when the write cache contains dirty data, because it must be written to the backing device first.

echo 1 > /sys/block/sda/sda[X]/bcache/detach

Attach the caching device again (or attach another caching device):

echo [SSD bcache UUID] > /sys/block/sda/sda[X]/bcache/attach

Unregister the caching device (can be done with or without detaching) (/dev/bcache0 will still be available because of the backing device):

echo 1 > /sys/fs/bcache/[SSD bcache UUID]/unregister

Register the caching device again (or register another caching device):

echo /dev/sdb[Y] > /sys/fs/bcache/register

Attach the caching device:

echo [SSD bcache UUID] > /sys/block/sda/sda[X]/bcache/attach

Stop the backing device (after unmounting /dev/bcache0 it will be stopped and removed, don’t forget to unregister the caching device):

echo 1 > /sys/block/sda/sda[X]/bcache/stop

Benchmark

To benchmark this setup I used two different tools: Bonnie++ and fio (Flexible IO tester).


Bonnie++

Unfortunately Bonnie++ isn’t that well suited to test SSD caching setups.

This graph shows that I’m hitting the limit on sequential input and output in the HDD-only and SSD-only tests. The bcache test doesn’t show much difference to the HDD-only test in this case. Bonnie++ isn’t able to warm up the cache and all sequential writes are bypassing the write cache.

In the File metadata tests the performance improves when using bcache.


fio

The Flexible IO tester is much better suited to benchmarking these situations. For these tests I used the ssd-test example jobfile and changed the size parameter to 8G.
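
This is roughly what that jobfile looks like (reconstructed from fio’s examples/ssd-test with the size changed; the directory is a hypothetical mountpoint of the device under test):

[global]
bs=4k
ioengine=libaio
iodepth=4
size=8g
direct=1
runtime=60
directory=/mnt/test
filename=ssd.test.file

[seq-read]
rw=read
stonewall

[rand-read]
rw=randread
stonewall

[seq-write]
rw=write
stonewall

[rand-write]
rw=randwrite
stonewall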

HDD-only:
seq-read: io=4084MB, bw=69695KB/s, iops=17423, runt= 60001msec
rand-read: io=30308KB, bw=517032B/s, iops=126, runt= 60026msec
seq-write: io=2792MB, bw=47642KB/s, iops=11910, runt= 60001msec
rand-write: io=37436KB, bw=633522B/s, iops=154, runt= 60510msec

SSD-only:
seq-read: io=6509MB, bw=110995KB/s, iops=27748, runt= 60049msec
rand-read: io=1896MB, bw=32356KB/s, iops=8088, runt= 60001msec
seq-write: io=2111MB, bw=36031KB/s, iops=9007, runt= 60001msec
rand-write: io=1212MB, bw=20681KB/s, iops=5170, runt= 60001msec

bcache:
seq-read: io=4127.9MB, bw=70447KB/s, iops=17611, runt= 60001msec
rand-read: io=262396KB, bw=4367.8KB/s, iops=1091, runt= 60076msec
seq-write: io=2516.2MB, bw=42956KB/s, iops=10738, runt= 60001msec
rand-write: io=2273.4MB, bw=38798KB/s, iops=9699, runt= 60001msec

In these tests the SSD is much faster at random operations, and with bcache random operations are done a lot faster compared to the HDD-only tests. It’s interesting that I’m not able to hit the sequential IO limits of the HDD and SSD in these tests. I think this is because my CPU (Intel G620) isn’t powerful enough; fio does hit the IO limits of the SSD in another machine with an Intel i5 processor.

Less CPU overhead with Qemu-KVM from Debian Wheezy

An interesting thing happened last week when I upgraded qemu-kvm from version 0.12.5 (Debian Squeeze) to 1.1.2 (Debian Wheezy). After a reboot (shutdown and start) of all my VMs, they use less CPU in total! I noticed this from the stats Collectd is collecting for me.

I’m running about 5 Debian Linux VMs on my low power home server (Intel G620, 8G DDR3, DH67CF). Most of the time the VMs are idle. As you can see in the graph below, the CPU usage dropped, in particular the System CPU usage. The Wait-IO usage is mostly from Collectd saving all the stats.

Looking a bit further I also noticed that the Local Timer Interrupts and Rescheduling Interrupts have halved.

They’ve done a nice job at Qemu-KVM!

Testing a D-Link Green Switch

For a while now I’ve been monitoring the power consumption of devices in my home using a power meter from BespaarBazaar.nl (recommended by Remi). This power meter is a good one because it is very precise: it starts measuring at 0.2 Watt.

I needed an Ethernet switch to connect my TV, NMT and PS3 to my home network. While searching for a proper switch, I came across DLinkGreen.com. It looked promising. The Green Calculator, an 8.7MB Flash app that uses a lot of CPU (hello D-Link! Is this green?!? What about HTML5?), showed me I could save 70.98% of energy (2h of use on 1-5 ports: 28.7Wh per day with D-Link Green vs. 99Wh conventional) using D-Link’s Green technology.

I couldn’t find any green switches from other manufacturers, so I gave it a try and bought a D-Link DGS-1005D. It’s a 5-port unmanaged Gigabit Ethernet switch, supporting IEEE 802.3az (Energy Efficient Ethernet), IEEE 802.3x (Flow Control), 9000-byte Jumbo Frames and IEEE 802.1p QoS (4 queues).

So I did some tests using the power meter. As a reference I used an HP Procurve 408 (8-port 100Mbit switch).

HP Procurve 408

Ports 1-5            Watt   Wh per 24h   Wh (2h + 22h idle)   kWh annually
adapter only         1.4    33.6         33.6                 12.264
no ports connected   4.4    105.6        105.6                38.544
m                    4.9    117.6        106.6                38.909
m m                  5.4    129.6        107.6                39.274
m m m                5.9    141.6        108.6                39.639
m m m m              6.4    153.6        109.6                40.004
m m m m m            6.8    163.2        110.4                40.296

D-Link DGS-1005D

Ports 1-5            Watt   Wh per 24h   Wh (2h + 22h idle)   kWh annually
adapter only         0.0    0.0          0.0                  0.0 🙂
no ports connected   1.1    26.4         26.4                 9.636
g                    1.6    38.4         27.4                 10.001
g m                  1.8    43.2         27.8                 10.147
g m g                2.1    50.4         28.4                 10.366
g m g g              2.5    60.0         29.2                 10.658
g m g g g            2.9    69.6         30.0                 10.950
g m m g g            2.7    64.8         29.6                 10.804
g m m m g            2.5    60.0         29.2                 10.658
g m m m m            2.3    55.2         28.8                 10.512

m = 100 Mbit, g = 1 Gbit

First of all it’s interesting to see that the power adapter from HP uses 1.4 watts on its own already. Besides that, it’s nice to know that a 100Mbit port uses less energy than a Gigabit port. The Green Calculator is quite right in my case: I’m saving about 72-74% of energy.
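
As a quick check of that number, using the “2h + 22h idle” columns from the tables above:

# one port in use for 2 hours a day: D-Link (1 Gbit) vs. HP (100 Mbit)
awk 'BEGIN { printf "%.1f%%\n", (1 - 27.4/106.6) * 100 }'    # ~74.3%
# all five ports in use for 2 hours a day
awk 'BEGIN { printf "%.1f%%\n", (1 - 30/110.4) * 100 }'      # ~72.8%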

ext3 overhead on 1TB storage

Recently I bought a portable hard drive from Western Digital. The Western Digital Elements (WDE1U10000E) carries 1 Terabyte of space and can be connected via USB 2.0. 1TB for just 99.00 Euro (2008-12-27). According to Wikipedia and the SI standard the drive must contain 1,000,000,000,000 bytes (1TB). fdisk shows us:

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes

So that is correct.

The disk comes preformatted with FAT32. After mounting it, df shows me there is actually 976283280 KiB (932 GiB) available, which is about 999714078720 bytes. It looks like 490807296 bytes (468 MiB) is gone, but it must be used for the File Allocation Tables.

Because FAT32 is old and crappy, and I use Linux and want a journaling filesystem, I reformatted the device with ext3 after setting the right partition type via fdisk.

$ mkfs.ext3 -m0 /dev/sda1
mke2fs 1.41.3 (12-Oct-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
61054976 inodes, 244190000 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
7453 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 36 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

After 10 minutes or so (yeah! USB 2.0) it was all done. First of all, 2646016 bytes (2584 KiB) is not formatted at all (1000204886016 – (244190000 * 4096)). After mounting the disk, df shows me this time there is 961432072 KiB (917 GiB) available. This is less than FAT32, but we have a journaling filesystem now. 15327928 KiB (976760000 – 961432072) is used for that. But why and how?

dumpe2fs /dev/sda1 shows us:

dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          0bdd2888-06fc-4b22-a6e5-987ac65236ee
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              61054976
Block count:              244190000
Reserved block count:     0
Free blocks:              240306876
Free inodes:              61054965
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      965
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Sun Dec 28 14:22:22 2008
Last mount time:          Sun Dec 28 14:35:20 2008
Last write time:          Sun Dec 28 14:35:20 2008
Mount count:              1
Maximum mount count:      36
Last checked:             Sun Dec 28 14:22:22 2008
Check interval:           15552000 (6 months)
Next check after:         Fri Jun 26 15:22:22 2009
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      0bda5622-6cc2-4a1a-8135-c3f810580d43
Journal backup:           inode blocks
Journal size:             128M
  • /dev/sda1 has 7453 block groups.
  • Inode size is 256 bytes.
  • 8192 inodes for each block group.

7453 * 256 * 8192 makes 15630073856 bytes (15263744 KiB) for inode space.

15327928 – 15263744 = 64184 unexplained KiB left

Besides the primary superblock, 18 backup superblocks are stored on the disk. A superblock is 1024 bytes, though it occupies a full 4 KiB block. 19 * 4096 makes 77824 bytes (76 KiB).

64184 – 76 = 64108 unexplained KiB left

If someone has an explanation for it, please leave a reply.
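
For what it’s worth, here is a breakdown that adds up exactly, assuming each of the 7453 block groups has one block bitmap and one inode bitmap (one 4 KiB block each), and the group descriptor table (7453 * 32 bytes, rounded up to 59 blocks) is replicated next to every one of the 19 superblock copies:

# block + inode bitmaps: 7453 groups * 2 blocks * 4 KiB = 59624 KiB
# group descriptor tables: 19 copies * 59 blocks * 4 KiB = 4484 KiB
echo $(( 7453 * 2 * 4 + 19 * 59 * 4 )) KiB    # 64108 KiB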

61054976 inodes means over 61 million files can be stored on the formatted 917 GiB. That is way too much for me; 10% of it is more than enough, and fewer inodes also means less space is needed for storing them. Formatting the disk with the option -i 131072 should fit me better.
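
That would look something like this; -i sets the bytes-per-inode ratio, so one inode per 128 KiB of disk space, or roughly 7.6 million inodes on this disk:

$ mkfs.ext3 -m0 -i 131072 /dev/sda1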