[packman] BuildService API error when branching a package
Stefan Botter
jsj at jsj.dyndns.org
Mon Apr 27 17:49:52 CEST 2020
Hi Packmans,
I know, it is lame to self-reply, ... but anyhow ...
a tl;dr is at the end of the mail :)
On Monday, 20 April 2020, 15:21:42 CEST, Stefan Botter wrote:
...
> I hope, I can give more insight in the next few days.
Now is "next few days".
What happened?
The initial problem arose during the evening hours of Apr 1st, when a
rather unusual blackout hit the part of town where my servers are
hosted.
I have a UPS, but it only holds up for 8-10 minutes, and the blackout
lasted 30 minutes. There should be emergency power from a diesel
generator (which, by the way, was scheduled to be replaced the following
weekend, but this has been postponed due to COVID-19), but for an
unknown reason the generator did not kick in. I could restart everything
on Thursday morning.
A secondary problem then surfaced. It affected the whole system badly,
and I was rather clueless about it until today.
PMBS runs alongside my personal VMs as a VMware guest on my lab system
(two ESXi hosts). The lab is set up according to best practices, with
two network-facing switches and two separate switches for storage. The
storage device is a Synology DS620 with four 1 TB SSDs, connected via
iSCSI.
Backup is done inside the storage network to a separate DS216+II. Until
Apr 10th it was done by Synology's Active Backup for Business, which
basically takes snapshots of the VMs and copies the changed blocks to
the backup storage space.
Since the blackout, every time the backup ran, at least one of the ESXi
hosts froze or lost network connectivity.
Since Apr 15th PMBS has been backed up simply with rsync, with one
backup copy created daily. This does not seem to put such a heavy
strain on the network.
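
For anyone wanting to set up something similar, a daily rsync mirror
could look roughly like the sketch below. It is only an illustration:
the function names and the paths in the usage note are made up, and the
choice of "-a --delete" is an assumption, not the options actually used
on PMBS.

```python
from __future__ import annotations

import subprocess


def build_rsync_command(source: str, destination: str) -> list[str]:
    """Build an rsync invocation that mirrors `source` into `destination`.

    Hypothetical helper; the flags are an assumption, not the PMBS setup.
    """
    # A trailing slash on the source makes rsync copy the directory's
    # contents rather than the directory itself.
    src = source.rstrip("/") + "/"
    return [
        "rsync",
        "-a",        # archive mode: recurse, keep permissions, times, owners
        "--delete",  # remove files from the copy that are gone at the source
        src,
        destination,
    ]


def run_daily_backup(source: str, destination: str) -> None:
    """Run the mirror once; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source, destination), check=True)
```

Called once a day from cron or a systemd timer, for example as
run_daily_backup("/srv/obs", "/backup/pmbs") with placeholder paths,
this keeps exactly one copy, which matches the "one backup copy created
daily" above but gives no version history.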
I am still contemplating a versioned backup with rdiff-backup, which I
use regularly with my other machines, but I am not sure whether my
available backup space will be sufficient, or how long backup runs would
take on PMBS. So this is on the "maybe-ToDo-list".
Still, I did not know the cause of the lock-ups.
By chance I discovered very similar behavior early last week: network
connectivity was lost during a download of a VM image to my home
system. It recovered automagically after 10-30 minutes, and was
reproducible.
Over the course of the weekend and today I managed to investigate
further, and found that one of the network add-in cards in one of the
servers acted strangely under load. I reconfigured the ESXi servers to
use the LAN-on-mainboard (LOM) adapters only, and am now more confident
that the system runs stably again.
I have some spare quad-port cards lying around, and will replace the
thought-to-be-defective adapters some time in the future, so that the
lab conforms to best practices again, but for now everything should
work without frequent interruptions.
As the world-wide COVID-19 calamity and the now emergency-emerging ;)
changes to the schooling environment are putting heavy demand for
immediate action on the school's IT, I have had rather little time to
work on "personal fun", so it took a while longer to resolve the
branching issue which caused this thread. The reported errors were
caused by the frequent unwanted shutdowns, which left some
state-recording files for sourceserver and schedulers with binary
garbage at the end.
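
In case someone runs into the same symptom, a quick check for such
damage could look like the sketch below. It assumes the garbage is a
run of NUL bytes at the end of the file, which is one pattern that can
show up after an unclean shutdown; this mail does not say what the
garbage actually looked like, and the function names are made up for
illustration.

```python
def trailing_garbage_length(data: bytes) -> int:
    """Return how many NUL bytes sit at the very end of `data`.

    Assumption: "binary garbage" means NUL padding; other kinds of
    corruption are not detected by this check.
    """
    return len(data) - len(data.rstrip(b"\x00"))


def check_state_file(path: str) -> int:
    """Read the file at `path` and report its trailing NUL byte count."""
    with open(path, "rb") as handle:
        return trailing_garbage_length(handle.read())
```

A non-zero result only flags a file as suspicious. Whether it can be
truncated safely or has to be regenerated depends on the file, so this
sketch deliberately does not modify anything.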
I thought it was a good idea to document the events and the
sort-of-solution, for you to enjoy and for me to remember, as I will
probably forget what happened and what I did in a few weeks :)
tl;dr: everything should work again without frequent interruptions.
Greetings,
Stefan
--
Stefan Botter zu Hause
Bremen