Adding two SATA SSDs to a six-drive SATA RAID10 array.

I needed a bit more space on my NAS's RAID10 array, which was 6 x 2TB drives. To be honest I'm not sure why I am using RAID10. The array was initially on my main Linux workstation; then I decided I needed a separate NAS as my existing NAS was way too small, so I moved it across. RAID5/6 would give me more space, but I guess I like the flexibility and the ability to survive the failure of two disks (as long as they are not in the same mirror pair), even though it does reduce usable space by 50%!

The server is running Debian and is headless. I know there are loads of NAS OSs, but I do prefer to do things myself. The boot/root partitions are on a single SATA SSD and the (now) eight data drives are plugged into a Seagate Smart Host Bus Adaptor H240.

I found two “consumer” Crucial 2TB SSDs on Black Friday for £65, which seemed reasonable. I did wonder how well two SSDs would do in a RAID array alongside spinning drives. Let's find out! So this is what I did (which I am blogging about so I do not have to remember next time!). Interestingly, the last time I blogged about growing a RAID array was quite some time ago, and it's also the only time I've ever got comments on my blog (127 to be exact!). I think growing RAID arrays with mdadm was quite new back then.

The procedure to add new devices and grow the raid array is:

Procedure to grow an array with two new disks

  • Physically add the new disks
  • Partition them
  • Add the disks to the array
  • Increase the number of active disks to grow the array
  • Grow the filesystem

The new disks are /dev/sda and /dev/sdd.

Partition

For GPT disks, use sgdisk to copy the partition table from an existing disk to a new one.

Back up first of course!
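sgdisk can dump a disk's GPT to a file, which makes a handy safety net before touching anything; the path here is just an example:

sgdisk --backup=/root/sdb-gpt.bak /dev/sdb

If something goes wrong it can be restored later with sgdisk --load-backup=/root/sdb-gpt.bak /dev/sdb.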

sgdisk /dev/sdX -R /dev/sdY

sgdisk -G /dev/sdY

The first command copies the partition table of sdb to the new disks (sda and sdd):

sgdisk /dev/sdb -R /dev/sda

sgdisk /dev/sdb -R /dev/sdd

Now randomise the GUID of each device:

sgdisk -G /dev/sdd

sgdisk -G /dev/sda

Add new devices

mdadm --add /dev/md1 /dev/sdd1 /dev/sda1

mdadm: added /dev/sdd1

mdadm: added /dev/sda1

These are added as spares as the number of active devices does not change. Let’s check:

# cat /proc/mdstat
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md1 : active raid10 sda1[8](S) sdd1[7](S) sdc1[5] sdg1[0] sde1[6] sdh1[3] sdb1[1] sdf1[4]
5860141056 blocks super 1.2 512K chunks 2 near-copies [6/6] [UUUUUU]
bitmap: 4/22 pages [16KB], 131072KB chunk

Increase number of devices to include new ones

mdadm --grow --raid-devices=8 --backup-file=/mnt/USB/grown_md1.bak /dev/md1

The --backup-file option creates a backup file in case the power goes mid-reshape. It's not essential as I have a UPS, and note the filesystem is still mounted throughout. However, to speed things up I turned off all services except the DNS/DHCP server: the less disk activity, the quicker the reshape will finish.
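To keep an eye on the reshape, and optionally let it run faster, the usual mdstat and sysctl knobs apply; the value below is just an example:

watch cat /proc/mdstat

echo 100000 > /proc/sys/dev/raid/speed_limit_min

The second command raises the minimum resync/reshape speed (in KB/s) so background throttling doesn't hold it back as much.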

The reshaping took about 20 hours. Much less than I thought.

Now we need to resize the filesystem. Unmounting is not essential for growing an ext4 filesystem (although it is for shrinking), but hey, it's a lot safer, so I shut everything off and unmounted it:

systemctl stop smbd
systemctl stop docker
umount /mnt/storage

resize2fs /dev/md1

This gave an error that the filesystem needed checking first.

e2fsck -f /dev/md1

resize2fs /dev/md1

This took about 30 minutes.

Finishing off.

Now let’s get it all back up and running.

mount /mnt/storage
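Before starting everything back up, it's worth a quick sanity check that the array and filesystem really are bigger (the grep pattern is just illustrative):

mdadm --detail /dev/md1 | grep -E 'Array Size|Raid Devices'

df -h /mnt/storage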

systemctl start smbd

systemctl start docker

The entry for the mdadm device in mdadm.conf does not need updating. Previously it did, but I think that was when I was using the 0.9 metadata format.

mdadm --detail --scan

cat /etc/mdadm/mdadm.conf

One bizarre issue was that when I restarted all the Docker containers they downloaded new images rather than using the existing ones. I have no idea why that happened.

Migrating My Root Drive RAID1 Array to a Pair of NVMe Drives.

I've always tried to keep upgrading my own Linux boxes. I enjoy it, and as I found out a few years ago, if I do not regularly update the hardware then I completely lose the knowledge of how to do so.
My latest project is to move all the services from my workstation to a separate box. I’ve got a few RPis around the house running a few services, but I’ve always used my workstation as a games machine/server/development/everything else. In fact for a number of years I stopped using Linux as a workstation at all and this machine was a headless server.
Because of this cycle of continuous upgrades this computer has existed for probably twenty years. Always running some form of Linux (mainly Gentoo).
Currently it's a bit of a space heater of a server: a pair of E5-2697 v2s on a Supermicro X9DRi-LN4 motherboard with 128GB of ECC DDR3 RAM (very cheap!). That's fine over winter (my office has no heating), but I do need to finish transferring all the services to the low-power Debian box under my desk instead!
Anyway, this computer boots from a pair of SATA SSDs in a RAID1 array, with a six-disk RAID10 array for data. That array needs to be replaced by a single large drive when I've finished moving services to the new machine!
The motherboard is too old to EFI boot from NVMe drives. However, whilst browsing Reddit I came across some people talking about using an adaptor card to add four NVMe drives, relying on PCIe bifurcation to give each drive the four PCIe lanes that NVMe devices need: x4/x4/x4/x4 instead of x16.
This was not supported on this board originally, but it turns out Supermicro did release a newer BIOS that does support bifurcation.
So I bought the card they suggested and a pair of 1TB NVMe drives. The drives are only PCIe gen 3, as that's all the motherboard supports. PCIe is backwards/forwards compatible, but gen 4 drives are considerably more expensive than gen 3 ones. I may as well get a pair of these; when I eventually upgrade to a PCIe gen 4 motherboard, the available drives will likely be larger and cheaper!
– Asus M.2 x16 Gen 4
– 2 x WD Blue SN570 NVMe SSD 1TB
The adaptor and drives arrived. The adaptor has a lovely heatsink that sandwiches the drives in, with a small low-noise fan.
The adaptor took ten minutes to install. Once booted, enabling bifurcation in the BIOS setup was a little tricky as the slots are numbered from the bottom; this one was CPU1/slot 1.
I had to recompile the kernel to add NVMe support, but once booted the pair of drives were there.
After many, many years of using /dev/sdX to refer to storage devices (I was using SCSI hardware before SATA), it does seem a little strange to be running parted on /dev/nvme1n1 then getting partition devices like /dev/nvme1n1p2
I know I should probably move to ZFS, but I'm knowledgeable enough about mdadm not to completely mess things up! And replacing a pair of RAID1 devices is just so easy with mdadm.

Workflow is:
– Partition the new drive.
– Add it to the RAID array as a spare.
– Fail the drive to be removed, then remove it.
– Wait until the RAID1 array is synced again.
– Repeat with the second drive.
– Resize the array, then resize the filesystem.

Procedure

fdisk /dev/nvme0n1

We can use fdisk again as fdisk is now GPT-aware. Previously we'd always used parted, but I prefer fdisk as I know it! The steps (an equivalent non-interactive sgdisk sketch follows the list):
– Label the drive as GPT.
– Make a 256MB partition and mark it as EFI boot.
– Make a second partition covering the rest of the drive and mark it as type Linux RAID.
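For reference, the same layout can be scripted with sgdisk; this is just a sketch of the equivalent, using the sizes above (--zap-all wipes any existing table, which is fine on a brand-new drive):

sgdisk --zap-all /dev/nvme0n1

sgdisk -n 1:0:+256M -t 1:ef00 /dev/nvme0n1

sgdisk -n 2:0:0 -t 2:fd00 /dev/nvme0n1

Here ef00 is the EFI System partition type code and fd00 is Linux RAID.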

Now add that drive to our RAID1 array. For some reason it was not added as a spare, but was instead immediately synced in to make a three-drive RAID1 array. I think this is because I previously created this array as a three-drive array (for reasons I forget); I guess that's stored in the metadata of the array.
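That guess is easy enough to check, since mdadm --detail reports the configured device count:

mdadm --detail /dev/md127 | grep -i 'raid devices'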

mdadm /dev/md127 --add /dev/nvme0n1p2

We can watch the progress:

watch cat /proc/mdstat

Once completed, we can fail and then remove the old drive:

mdadm --manage /dev/md127 --fail /dev/sdh3
mdadm /dev/md127 --remove /dev/sdh3

Then let's update our mdadm.conf file:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf

Then remove the old lines:

vi /etc/mdadm/mdadm.conf
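An alternative to hand-editing is to strip the stale ARRAY lines and re-append the scan in one go, and (on Debian-style systems) rebuild the initramfs so early boot picks up the change. A rough sketch, not exactly what I ran:

cp /etc/mdadm/mdadm.conf /etc/mdadm/mdadm.conf.bak

sed -i '/^ARRAY/d' /etc/mdadm/mdadm.conf

mdadm --detail --scan >> /etc/mdadm/mdadm.conf

update-initramfs -u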

Finally, let's wipe the RAID metadata from the old partition so mdadm does not try to assemble it back into the array.

wipefs -a /dev/sdh3

A reboot is a good idea now to ensure the array is correctly assembled (and the new partition table re-read).

Now let's copy the partition table to the second new drive.

sgdisk /dev/nvme0n1 -R /dev/nvme1n1

Then randomise the GUIDs:

sgdisk -G /dev/nvme1n1

Check all is OK
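For example, a couple of quick read-only checks (purely illustrative):

sgdisk -p /dev/nvme1n1

lsblk /dev/nvme0n1 /dev/nvme1n1

The two drives should now show matching partition layouts.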

Now repeat adding the second new drive:

mdadm /dev/md127 --add /dev/nvme1n1p2
mdadm --manage /dev/md127 --fail /dev/sdg3
mdadm /dev/md127 --remove /dev/sdg3
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
wipefs -a /dev/sdg3

Resize after adding the new devices

mdadm --grow --size=max /dev/md127

df -h

Then resize the filesystem:

resize2fs -p /dev/md127

Benchmarks

dd if=/dev/zero of=/home/chris/TESTSDD bs=1G count=2 oflag=dsync 

2+0 records in
2+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 4.7162 s, 455 MB/s

dd if=/dev/zero of=/mnt/storage/TESTSDD bs=1G count=2 oflag=dsync

2+0 records in
2+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 10.0978 s, 213 MB/s

dd if=/home/chris/TESTSDD of=/dev/null bs=8k

262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 0.666261 s, 3.2 GB/s

dd if=/mnt/storage/TESTSDD of=/dev/null bs=8k

262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 0.573111 s, 3.7 GB/s

I did think that the write speed would be faster, but then dd is not the most accurate of benchmarking tools.
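If I wanted a more meaningful number, something like fio gives far better control than dd. A rough sequential-write sketch; the file name and sizes are just examples:

fio --name=seqwrite --filename=/home/chris/fio.test --size=2G --rw=write --bs=1M --direct=1 --ioengine=libaio --iodepth=16

rm /home/chris/fio.test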

Use of Backup in Anger!

I lost my entire RAID10 array yesterday. In a fit of “too much noise in the office” I removed the hot-swap SCSI array box from my workstation, attached it to a wooden platform, and suspended it in a large plastic box using an old inner tube from my bike. This really reduced the noise. However, like a moron, I did not attach the SCSI cable properly and two drives got kicked from the array. That in itself was not a problem; the problem came when I tried to re-assemble the array without checking the cable and ended up wiping one of the RAID partitions. Still not a major issue, except I subsequently zeroed out the superblock of the missing drive in order to add it back in. Anyway, that was my array lost!

As my main backup strategy I use a homebrewed incremental rsync script to back up my Linux workstation every night to a 2TB ReadyNAS+ box (the Macs are backed up with a combination of Time Machine and SuperDuper!). So now I had a chance to test it out. After recreating the array and copying the data back across the network, I was up and running again!

# recreate the four-drive RAID10 array (far layout, 2 copies, 256K chunks)
mdadm --create /dev/md1 --chunk=256 -R -l 10 -n 4 -p f2 /dev/sd[abcd]3
# let the initial sync run as fast as the drives allow
echo 300000 > /sys/block/md1/md/sync_speed_max
watch cat /proc/mdstat
# new filesystem, mount it, then pull everything back from the backup
mkfs.xfs /dev/md1
mount /home
rsync -avP /mnt/backup/SCOTGATEHome/current/ /home/

It took about an hour to sync, and then three hours to copy the 156GB of files across the network.

It all worked great, and I’m very pleased to know that my backup strategy is working!

Now back to completing the “silent and suspended hard drive array”!

Linux RAID Wiki

A lot of people keep coming back to a post from 2006 I wrote about expanding a RAID5 array. Although that post is still very much relevant, two years is a looong time in the Linux kernel world, and things have moved on. Rather than use the information here, you're better off going to the Linux RAID wiki. This is very much an active and up-to-date resource for playing with, creating and using Linux software RAID and mdadm. It contains the thoughts and resources of many people from the very interesting Linux kernel RAID mailing list.

Linux Raid Wiki

There's a lot of outdated stuff concerning Linux software RAID out there. Over the last six months there's been a concerted effort by people on the Linux RAID mailing list to improve this situation. The continuing results can be seen on this wiki:

http://linux-raid.osdl.org/

It’s a great resource already, and getting better. Go have a look if you want to know about the current status of just what cool stuff you can do.

RAID10 over 6 devices

I've been struggling to find the best layout for my six-drive RAID10 device. I settled on the far layout with two copies (f2) and a 1MB chunk size. As far as I understand it, this gives me read/write speed equivalent to RAID0: each block has two copies, striped across two drives, so you get RAID0 speed with the redundancy of RAID1. Plus, since this is kernel RAID10, it's a lot easier to create and more flexible than a mirror of RAID0 devices.
mdadm --create /dev/md1 --chunk=1024 --level=raid10 --layout=f2 --raid-devices=6 /dev/sd[defghi]1
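Once created, the layout and chunk size can be read back from the array details:

mdadm --detail /dev/md1 | grep -E 'Layout|Chunk Size'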

So far the speed seems to be OK, nothing at all spectacular though. At the moment I think my limitation is that I am running a 64-bit PCI-X U320 card in a 32-bit PCI slot, since my motherboard only has those. I need to get myself a PCI-X dual Socket 604 motherboard; trouble is, these are quite expensive! In my quest to silence the machine I replaced the fan on my graphics card with a quieter one. Unfortunately, whilst doing so I knocked off a small component. The card works fine except that I now get visual artefacts on any hardware-accelerated graphics. I've always steered away from upgrading my motherboard because I would also need to buy a new GPU to go with PCI-E slots. But if this card is now damaged, perhaps I should splurge. Hmmm.

Raid 5 to 6 conversion possible?

A recent post on Raj's blog made me think about giving RAID6 a play, which then makes me wonder whether a RAID5 to RAID6 conversion is possible. I guess this is completely impossible without a complete reformat. But then again, growing a RAID5 array is a similar sort of operation, and that is now possible. I would suggest that both require adjustment of the stripes. Or am I really talking out of my behind?

I recently used mdadm’s new grow function to increase my 3 drive RAID5 array to 4 drives. This worked perfectly. I trust my backups, so am quite happy to risk its integrity to convert to RAID6. Hmmmm!