From 03667e99797d6ec539e514c3f921667479279f60 Mon Sep 17 00:00:00 2001
From: Blake Leverington
Date: Fri, 25 Jun 2021 11:21:40 +0200
Subject: [PATCH] Update 'computing_cluster'

---
 computing_cluster.md | 51 +++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/computing_cluster.md b/computing_cluster.md
index b634bce..76e8f3d 100644
--- a/computing_cluster.md
+++ b/computing_cluster.md
@@ -47,10 +47,14 @@ Note: the LbEnv script works for both bash and tcsh.
 
 # Computing resources
 ## Interactive nodes.
- - **sigma0 _(SLC6)_** - local "/work", relatively low in memory. So not for running any long tasks (can be killed by admin any time he see them and they disturb other people...)
- - **d0new, d0bar-new _(SLC6)_** - general perpose interactive nodes. d0bar-new is also SciFi Web server. So "killing" it (running jobs which eat all RAM, etc.) can have administrative consequences...
- - **delta _(CentOS7)_** - the only general perpose CentOS7 node we have at the moment. It is in testing phase, please report any problems.
- - **lhcbi1 _(CentOS7)_** - this server is for special perpose. Please avoid using it in case you can (till you are sure it is there for you).
+```
+ lhcba1 (Ubuntu 20.04 / CentOS7) - general-purpose, modern AMD-based interactive server
+ d0new (Ubuntu 20.04 / SLC6 / CentOS7) - general-purpose, older AMD-based interactive server
+ delta (Ubuntu 18.04 / CentOS7) - general-purpose, very old AMD-based interactive server
+ lhcbi1 (CentOS7) - special-purpose, modern Intel-based interactive server. Please avoid using it unless you are sure it is meant for you.
+ d0bar-new (SLC6) - general-purpose interactive node, also the SciFi web server at the moment. "Killing" it (running jobs that eat all the RAM, etc.) can have administrative consequences...
+ sigma0 (SLC6) - local "/work", relatively low in memory, so not for running long tasks (they can be killed by the admin at any time if they disturb other people...)
+```
 
 ## OS.
 to login into particular OS, use different ports (not all OSes exist on all nodes):
@@ -61,24 +65,35 @@ to login into particular OS, use different ports (not all OSes exist on all node
 28 - SLC6
 30 - CentOS7
 ```
-Note that default "CMTCONFIG" is historical, each user should specify what he/she is using (f.e. gcc incarnation)
-### Batch system.
-Most batch nodes are dead (in hardware). We have 4 servers which are working at the moment.
+Note that the default "CMTCONFIG" is historical; each user should specify what he/she is using (e.g. the gcc incarnation).
+CentOS7 supports only the CVMFS-based LHCb environment.
+
+
+### Batch system.
+
+SGE is deprecated.
+
+A default HTCondor configuration is deployed, currently with ~200 slots. Submit hosts are lhcba1, d0new and delta (Ubuntu). Note that jobs run under Ubuntu (18.04 / 20.04) on hosts WITHOUT a local CentOS7 environment, so at the moment the batch system is usable only with Singularity containers (sufficient for the PAT group).
+
+
+### Containers.
+```
+ Singularity is supported from the Ubuntu environment; you can add it to your PATH with:
+
+ export PATH=/work/software/singularity/latest/`/work/software/os_version`/bin:$PATH
+```
 
-### Custom tuned (very old) SGE.
-Supports submission to particular OS incarnation, resource limitation, interactive jobs and has affinity protection against multi-thread jobs (which are not specified as such). Concrete recommendations how to use it (including binding to Ganga) theoretically can be written, but because of flexible features of SGE and rapid development of Ganga will be comparable in size with own SGE documentation.
-See the ***[qsub Wiki Page](/computing/qsub)***
 
 
 ### Best Practices:
-1. store the code on /work, compile on delta, d0new (d0bar-new, not intensive compilations on sigma0, special on lhcbi1)
-2. run tests: d0new, d0bar-new, delta
-3. run jobs: batch system, d0new, d0bar-new
-4. GANGA: sigma0, d0new, d0bar-new, delta (depends what it really does and where it runs jobs)
-5. store big files: /auto/data (only), that in general also includes GRID produced files from GANGA. But having job Ganga directory on /auto/data is not a good idea.
-6. if you need CentOS7 use delta (in special cases lhcbi1). If there are too many people on delta, please ask me to add one more node (that is possible).
- - as with everything on "common" resources, it is ok to do things which are required and do not disturb other more then necessary. But first and more important, the consequences of any operation should be clear before starting the operation.
- - So "do not do anything you do not understand". True for computers, conference systems, touching high voltage cables and working with radioactive materials...
+ 1. Compile: store the code on /work; compile on lhcba1, d0new or delta (also on d0bar-new; only light compilations on sigma0; lhcbi1 for special cases).
+ 2. Run tests: lhcba1, d0new, delta, d0bar-new.
+ 3. Run jobs: batch system, lhcba1, d0new, d0bar-new.
+ 4. GANGA: lhcba1, d0new, d0bar-new, delta (depending on what it actually does and where it runs the jobs).
+ 5. Store big files: /auto/data only; in general this also includes GRID-produced files from GANGA. However, keeping the Ganga job directory on /auto/data is not a good idea.
+ 6. As with everything on "common" resources, it is OK to do what is required as long as it does not disturb others more than necessary. But first and most important, the consequences of any operation should be clear before starting it.
+ 7. So "do not do anything you do not understand". This is true for computers, conference systems, touching high-voltage cables and working with radioactive materials...
+
 
 
 # Environment Setup
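The "Batch system" and "Containers" notes in this patch describe two steps: putting Singularity on PATH, then running HTCondor jobs inside a container. They do not show the two steps together. The POSIX shell below is a minimal sketch under stated assumptions, not the site's official recipe. The function names `add_singularity_to_path` and `write_submit_file`, the file name `job.sub` and any image path you pass in are hypothetical. The sketch also assumes that the Singularity binary resolved on the submit host exists at the same path on the execute hosts. That may not hold, since those hosts run either Ubuntu 18.04 or 20.04.

```shell
#!/bin/sh
# Minimal sketch, not an official recipe. Hypothetical names: both functions,
# job.sub, and whatever image path the caller passes in.

# Prepend <root>/<os tag>/bin to PATH, mirroring the export line on the page.
# The OS tag is whatever the site helper /work/software/os_version prints.
add_singularity_to_path() {
    sing_root=${1:-/work/software/singularity/latest}
    os_version_cmd=${2:-/work/software/os_version}
    if [ ! -x "$os_version_cmd" ]; then
        echo "cannot execute $os_version_cmd" >&2
        return 1
    fi
    PATH="$sing_root/$("$os_version_cmd")/bin:$PATH"
    export PATH
}

# Write a minimal vanilla-universe HTCondor submit description (job.sub in the
# current directory) that runs "singularity exec <image> <command...>".
# Assumption: the singularity binary found here exists at the same path on the
# execute hosts, so it is not transferred with the job. If singularity is not
# on PATH, the bare name is written instead and the job will most likely fail.
write_submit_file() {
    image=$1
    shift
    sing_bin=$(command -v singularity || echo singularity)
    cat > job.sub <<EOF
universe            = vanilla
executable          = $sing_bin
transfer_executable = false
arguments           = exec $image $*
output              = job.out
error               = job.err
log                 = job.log
queue
EOF
}
```

Possible usage on a submit host such as lhcba1 (the image path is a placeholder): `add_singularity_to_path && write_submit_file /path/to/centos7-image.sif /bin/echo hello && condor_submit job.sub`. The job is kept in the vanilla universe so that it relies only on the Singularity installation described above. It does not depend on any container-specific HTCondor configuration, which this page does not document.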