04/29/2023 (Sat) 00:34
I broke it. Not the internet, my PC.
Do any of you know what a RAID-50 array is? Actually, knowing anything about RAID should let you follow the end of this story. Some time ago I bought an Epyc 7343, the parts to build around it, and an RTX 3060, thinking single-GPU passthrough was a solved problem. Turns out the Ampere card refuses those solutions.
Now, back on the previous generation of hardware I'd never heard of "passthrough", so I assumed dual-booting was the only way forward. Because I was tired of Linux and Windows sitting in separate cathedrals, I bought a hardware RAID card so they would share a logical hard drive, making file sharing a little easier.
Back to the current Epyc build: after getting non-GPU passthrough to work, and wishing I could have more than one virtual machine running across my thirty-two vCPUs, I set up a trial run of OpenMediaVault, liked the result, and after some further consideration bought a second video card, this time a Radeon Pro W5700.
Getting the 300 W to be not only available but believably available (as far as the card was concerned) took a lot of doing. I'm using a single SATA power line for the 150 W plug; it's "officially" rated for 54 W but is pretty stout and can probably pass more. But still... Then came other troubles, such as getting the keyboard and mouse to pass over to the matching video output.
I finally hit a winning combination: virtual-monitor displaying the QXL video output, which brought sound in addition to accepting "direct" keystrokes, plus NoMachine to peek in at the accelerated output on the EDID loopback plug (the Unigine Heaven benchmark ran from 140 fps to 175+; working nicely, it seemed).
I was ready! I ... couldn't just copy my working game directory from the currently mounted dual-boot drive to the virtual machine's filesystem.
SO! I tried copying it to the NAS virtual machine, but there was a permissions issue. SO I copied it to my home directory, adjusted the permissions, and proceeded to copy the 15+ GB directory to the OMV NAS.
Then everything stopped.
Oh, the windows were still there, but the window manager had become completely unresponsive, and the xterm window insisted /bin/ls was not actually a file, but only a figment of my fevered typing.
It seems that copy job overheated the recently moved RAID card. It was always a touch flaky in the heat of summer, and I had moved it up next to the 3060 (only LATER did I realize the case fan nearest it was frozen solid, never to turn again). The card had "failed" one drive and, during the copy, spuriously marked another one as offline, in the same _0 half of the RAID-50.
The system as a whole didn't realize until a full stripe of data had been written to the still-functioning half.
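For anyone who doesn't know RAID-50: it's a RAID-0 stripe laid across two or more RAID-5 sets ("legs"). Each leg can lose one drive and keep going on parity, but the stripe needs every leg, so two drives dropping out of the same half takes the whole array down. Here's a minimal Python sketch of just that survival rule. The six-drive layout and drive names are hypothetical (the post never says how many drives there are), and it doesn't model what a real controller does with writes once a leg is gone.

```python
from typing import Iterable, List, Set


def raid50_survives(legs: List[List[str]], failed: Iterable[str]) -> bool:
    """Return True if a RAID-50 array (RAID-0 over RAID-5 legs) is still readable.

    Each RAID-5 leg tolerates at most one missing member. The RAID-0 layer
    stripes data across every leg, so losing any single leg loses everything.
    """
    failed_set: Set[str] = set(failed)
    return all(sum(1 for d in leg if d in failed_set) <= 1 for leg in legs)


# Hypothetical six-drive layout: two three-drive RAID-5 legs ("_0" and "_1").
legs = [["d0", "d1", "d2"], ["d3", "d4", "d5"]]

print(raid50_survives(legs, ["d0"]))        # one failure: degraded but readable
print(raid50_survives(legs, ["d0", "d3"]))  # one failure in each leg: still readable
print(raid50_survives(legs, ["d0", "d1"]))  # two failures in the same leg: array lost
```

The three calls print `True`, `True`, `False`; the last one is the situation described above.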
Hard power-down, and after giving it some time to cool, I booted a rescue system to examine my Linux half. And it was a mess. Thousands of inodes and links showed every error I've ever encountered, and after that came hundreds (maybe thousands; I gave up after a while) of files with inodes that, according to fsck, were being argued over by sometimes half a dozen other files in far-flung sections of the filesystem.
Not only is the data drive on the Windows half, where I keep downloads in addition to the games I play, trashed and useless; everything on the Linux side that wasn't on a separate physical device is gone too, and I'll never actually know what I lost.
I think I still have about 85% of it on other machines. I had plans to centralize, to call the migration complete and re-unify my data, but the need to dual-boot had actually been the reason I held off.
Now I have to install Linux all over again. Mind you, since I've finished the job I started much, MUCH earlier, slowly replacing each 3TB member of the array with a 4TB drive, I can make some good of this: destroy the array, create a new one, and gain 4TB in the process.
And start filling it, I guess. And maybe develop an actual backup plan since, as any sysadmin will tell you, "RAID" isn't enough protection by its lonesome.
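That "gain 4TB" is just RAID-50 capacity arithmetic: each RAID-5 leg gives up one drive's worth of space to parity, so usable space is legs × (drives per leg - 1) × drive size. Getting 4TB out of drives that are each 1TB bigger means four data drives' worth, which (my inference, not something stated above) fits two three-drive legs. A quick Python sketch, with the function name made up for illustration:

```python
def raid50_usable_tb(legs: int, drives_per_leg: int, drive_tb: float) -> float:
    """Usable RAID-50 capacity: each RAID-5 leg loses one drive to parity."""
    if legs < 2:
        raise ValueError("RAID-50 stripes across at least two RAID-5 legs")
    if drives_per_leg < 3:
        raise ValueError("a RAID-5 leg needs at least three drives")
    return legs * (drives_per_leg - 1) * drive_tb


# Assumed layout: two legs of three drives each (inferred, not confirmed).
old = raid50_usable_tb(2, 3, 3.0)  # 3TB members
new = raid50_usable_tb(2, 3, 4.0)  # 4TB members
print(old, new, new - old)
```

This prints `12.0 16.0 4.0`. The old array stays sized to its smallest (original 3TB) members until it's rebuilt, which is why destroying and recreating it is what unlocks the extra space.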