Update 'computing_cluster'

Blake Leverington 2021-06-25 11:21:40 +02:00
parent 5e5e22b711
commit 03667e9979

@ -47,10 +47,14 @@ Note: the LbEnv script works for both bash and tcsh.
# Computing resources # Computing resources
## Interactive nodes. ## Interactive nodes.
- **sigma0 _(SLC6)_** - local "/work", relatively low on memory, so not for running long tasks (the admin can kill them at any time if he sees them disturbing other people...) ```
- **d0new, d0bar-new _(SLC6)_** - general purpose interactive nodes. d0bar-new is also the SciFi Web server, so "killing" it (running jobs which eat all RAM, etc.) can have administrative consequences... lhcba1 (Ubuntu 20.04 / CentOS7) - general purpose modern AMD based interactive server
- **delta _(CentOS7)_** - the only general purpose CentOS7 node we have at the moment. It is in testing phase, please report any problems. d0new (Ubuntu 20.04 / SLC6 / CentOS7) - general purpose older AMD based interactive server
- **lhcbi1 _(CentOS7)_** - this server is for special purposes. Please avoid using it if you can (unless you are sure it is there for you). delta (Ubuntu 18.04 / CentOS7) - general purpose very old AMD based interactive server
lhcbi1 (CentOS7) - special purpose modern Intel based interactive server. Please avoid using it if you can (unless you are sure it is there for you).
d0bar-new (SLC6) - general purpose interactive node, also the SciFi Web server at the moment. "Killing" it (running jobs which eat all RAM, etc.) can have administrative consequences...
sigma0 (SLC6) - local "/work", relatively low on memory, so not for running long tasks (the admin can kill them at any time if he sees them disturbing other people...)
```
## OS. ## OS.
To log in to a particular OS, use different ports (not all OSes exist on all nodes): To log in to a particular OS, use different ports (not all OSes exist on all nodes):
@ -61,24 +65,35 @@ to login into particular OS, use different ports (not all OSes exist on all node
28 - SLC6 28 - SLC6
30 - CentOS7 30 - CentOS7
``` ```
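As a rough illustration of the port scheme above, the helper below maps an OS name to its SSH port and prints the resulting command. Only the two ports visible in this hunk (28 and 30) are mapped. The function names `os_port` and `login_cmd`, and the `user@d0new` argument, are hypothetical placeholders and not something the wiki defines.

```shell
# Map an OS name to the SSH port listed above (only the ports shown in this hunk).
os_port() {
    case "$1" in
        SLC6)    echo 28 ;;
        CentOS7) echo 30 ;;
        *)       echo "unknown OS: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the ssh command for a given login and OS.
# Hypothetical helper: pass your real login name and full host name.
login_cmd() {
    port=$(os_port "$2") || return 1
    echo "ssh -p $port $1"
}

login_cmd user@d0new CentOS7
```

Printing the command rather than running it keeps the sketch safe to try anywhere; remember that not every OS exists on every node.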
Note that the default "CMTCONFIG" is historical; each user should specify what they are using (e.g. gcc incarnation) Note that the default "CMTCONFIG" is historical; each user should specify what they are using (e.g. gcc incarnation).
### Batch system. CentOS7 supports the CVMFS based LHCb environment only.
Most batch nodes are dead (in hardware). We have 4 servers which are working at the moment.
### Batch system.
SGE is deprecated.
The default HTCondor configuration is deployed, currently with ~200 slots. Submit hosts are lhcba1, d0new and delta (Ubuntu). Note that jobs run under Ubuntu (18.04 / 20.04) on hosts WITHOUT a local CentOS7 environment, so at the moment the batch system is usable with singularity containers only (sufficient for the PAT group).
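A minimal sketch of what a container-based HTCondor job could look like, assuming a shared filesystem between submit and worker hosts. The file names `run_in_container.sh`, `job.sub` and `payload.sh` and the image path `/path/to/centos7.sif` are hypothetical; the submit description uses only standard HTCondor commands, and the PATH line is the one given in the Containers section of this page.

```shell
# Wrapper that HTCondor starts on the worker node; it runs the payload in a container.
cat > run_in_container.sh <<'EOF'
#!/bin/bash
# PATH line as given in the Containers section of the wiki.
export PATH=/work/software/singularity/latest/`/work/software/os_version`/bin:$PATH
# Hypothetical image and payload: replace both with your own.
IMAGE=/path/to/centos7.sif
exec singularity exec "$IMAGE" ./payload.sh "$@"
EOF
chmod +x run_in_container.sh

# Minimal submit description; the quoted heredoc keeps $(...) for HTCondor, not the shell.
cat > job.sub <<'EOF'
universe   = vanilla
executable = run_in_container.sh
arguments  = $(Process)
output     = job.$(Cluster).$(Process).out
error      = job.$(Cluster).$(Process).err
log        = job.$(Cluster).log
queue 1
EOF
```

The job would then be submitted with `condor_submit job.sub` from one of the submit hosts. Whether extra requirements or bind mounts are needed depends on the local setup, which this page does not describe.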
### Containers.
Singularity is supported from the Ubuntu environment; you can add it to your path with:
```
export PATH=/work/software/singularity/latest/`/work/software/os_version`/bin:$PATH
```
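If the PATH line goes into a shell startup file, it may be sourced more than once. The sketch below is one way to guard against duplicate entries; the helper name `prepend_path` is hypothetical, while the directory layout and the `os_version` helper are taken from the line above.

```shell
# Prepend a directory to PATH only if it is not already there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# Only touch PATH where the site-provided os_version helper exists.
if [ -x /work/software/os_version ]; then
    prepend_path "/work/software/singularity/latest/$(/work/software/os_version)/bin"
    export PATH
fi

# Report whether singularity is reachable now.
command -v singularity >/dev/null 2>&1 || echo "singularity not found on PATH" >&2
```

On a machine without `/work/software` the block changes nothing and only prints the warning.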
### Custom tuned (very old) SGE.
Supports submission to a particular OS incarnation, resource limits and interactive jobs, and has affinity protection against multi-threaded jobs that are not declared as such. Concrete recommendations on how to use it (including binding to Ganga) could in theory be written, but given the flexibility of SGE and the rapid development of Ganga they would be comparable in size to SGE's own documentation.
See the ***[qsub Wiki Page](/computing/qsub)***
### Best Practices: ### Best Practices:
1. store the code on /work, compile on delta, d0new (d0bar-new; non-intensive compilations on sigma0; special cases on lhcbi1)
2. run tests: d0new, d0bar-new, delta
3. run jobs: batch system, d0new, d0bar-new
4. GANGA: sigma0, d0new, d0bar-new, delta (depends what it really does and where it runs jobs)
5. store big files: /auto/data (only); in general this also includes GRID produced files from GANGA. However, keeping the Ganga job directory on /auto/data is not a good idea.
6. if you need CentOS7, use delta (in special cases lhcbi1). If there are too many people on delta, please ask me to add one more node (that is possible).
- as with everything on "common" resources, it is ok to do things which are required and do not disturb others more than necessary. But first and most important, the consequences of any operation should be clear before starting the operation. 1. Compile: store the code on /work, compile on lhcba1, d0new, delta (d0bar-new; non-intensive compilations on sigma0; special cases on lhcbi1)
- So "do not do anything you do not understand". True for computers, conference systems, touching high voltage cables and working with radioactive materials... 2. run tests: lhcba1, d0new, delta, d0bar-new
3. run jobs: batch system, lhcba1, d0new, d0bar-new
4. GANGA: lhcba1, d0new, d0bar-new, delta (depends what it really does and where it runs jobs)
5. store big files: /auto/data (only); in general this also includes GRID produced files from GANGA. However, keeping the Ganga job directory on /auto/data is not a good idea.
6. as with everything on "common" resources, it is ok to do things which are required and do not disturb others more than necessary. But first and most important, the consequences of any operation should be clear before starting the operation.
7. So "do not do anything you do not understand". True for computers, conference systems, touching high voltage cables and working with radioactive materials...
# Environment Setup # Environment Setup