Add 'computing_htcondor'

Blake Leverington 2021-07-28 09:18:58 +02:00
parent 718be34255
commit 7c14429048

36
computing_htcondor.md Normal file

@ -0,0 +1,36 @@
*Email from Alexey:*
Dear colleagues,
I have achieved what I planned as the default behavior of our batch system (HTCondor) when jobs are submitted from the CentOS7 container (currently only from lhcba1, port 30).
`ssh -p 30 lhcba1.physi.uni-heidelberg.de`
Without any extra flags in the configuration, jobs should run under CentOS7 (local container), after the login scripts have been applied, and in the directory that was current at the time of submission. Jobs will also run only on "LHCb software compatible" servers.
There are currently 3 servers with 90 slots in total that support this model. There will be one more with 16 slots.
Four other servers are interactive at the moment: lhcba1, d0new, lhcbi1 and the not yet updated d0bar-new. They could be added for batch processing (possibly time-limited, e.g. at night and on weekends), but there are no such plans at the moment.
All other servers (many...) are "old". They will be updated to support the submission model described above, but they may fail to run particular versions of the LHCb software.
A simple test can be started from lhcba1, port 30, with the command
`condor_submit -interactive`
An example submission file is in
/auto/work/zhelezov/singularity/batch_centos7. Do not forget to submit jobs from a directory you can write to; otherwise the log files cannot be written and your jobs will stay in the "on hold" state forever.
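*Sketch, not part of the email:* the write-permission requirement above can be checked before calling `condor_submit`. The helper name `can_write_logs` is hypothetical, and the check only tests the directory itself, on the assumption that the log files are written there.

```shell
#!/bin/sh
# Hypothetical helper (not from the email): succeed only if the given
# directory (default: the current one) exists and is writable, so that
# job log files could be created there.
can_write_logs() {
    dir="${1:-.}"
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Check the current directory before submitting anything.
if can_write_logs; then
    echo "OK: $(pwd) is writable, log files can be written here"
else
    echo "WARNING: $(pwd) is not writable, jobs may end up on hold" >&2
fi
```

If the warning appears, change to a writable directory first and submit from there.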
I still recommend using the Singularity-based approach when possible, as demonstrated in /auto/work/zhelezov/singularity/FCNCfitter.
It allows SLC6, CentOS8, etc. to be used without a local installation on every server.
Although I have not really checked, I believe the environment closely mimics the current CERN/DESY HTCondor setups. Note that the default resource reservations are conservative (everywhere): 1 core and 512 MB RAM.
It is better to specify the required resources explicitly (as documented in the general HTCondor manual).
For the moment there are no multi-core slots, and at most 8 GB RAM is available per slot.
Jobs with higher requirements will find no working nodes. Please let me know if you hit this problem.
Regards,
Alexey.
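
*Sketch, not part of the email:* one possible way to state resource requests explicitly, as the email recommends. This is not the file in /auto/work/zhelezov/singularity/batch_centos7, whose contents are not reproduced here. The helper `write_submit_file`, the script name `run_analysis.sh` and the `job.*` file names are hypothetical. `request_cpus`, `request_memory`, `log`, `output`, `error` and `queue` are standard HTCondor submit commands, and the defaults in the helper mirror the 1 core / 512 MB defaults mentioned in the email.

```shell
#!/bin/sh
# Hypothetical helper (not from the email): print a minimal HTCondor
# submit description with explicit resource requests to stdout.
# Arguments: executable, cores (default 1), memory (default 512MB).
write_submit_file() {
    exe="$1"
    cpus="${2:-1}"
    mem="${3:-512MB}"
    # \$(...) keeps HTCondor's own macros literal inside the heredoc.
    cat <<EOF
executable     = $exe
request_cpus   = $cpus
request_memory = $mem
log            = job.\$(Cluster).log
output         = job.\$(Cluster).\$(Process).out
error          = job.\$(Cluster).\$(Process).err
queue
EOF
}

# Example: a single-core job asking for 2 GB, below the 8 GB per-slot limit.
write_submit_file ./run_analysis.sh 1 2GB
```

Redirect the output to a file in a writable directory, for example `write_submit_file ./run_analysis.sh 1 2GB > example.sub`, and pass that file to `condor_submit`. Requests above 1 core or 8 GB would currently match no slot, according to the email.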