Add 'computing_cluster'
parent
36d368988d
commit
17f85a9ebf
120
computing_cluster.md
Normal file
@ -0,0 +1,120 @@
# The D0 Cluster

---

The d0 cluster got its name from its first node, "d0", which was deployed in 2006. The original hardware has died but was replaced with an equivalent machine. It still hosts the login scripts, cluster monitoring, the batch system and GuestNet. But since it is less powerful than a modern phone and still runs the SLC4 OS, it makes no sense to use it interactively.

We plan to restructure the whole cluster. Work is in progress.

# Storage
## "/work" storage.
Relatively high-performance 22 TB storage with daily backups. Physically it sits inside the "sigma0" node, so all data-related operations (copy, backup, delete, etc.) are best done interactively on "sigma0".

Please do not use this space for data sets larger than O(100 GB). Dividing the total size by the number of users makes it clear that we cannot provide several TB to each user on this storage.

Note that sigma0 is an old server. With 32 GB of RAM it is not a good option for intensive parallel compilation, analysis, etc.
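
To check a data set against the O(100 GB) guideline before copying it to /work, something like the following sketch could be used. The function name `check_work_size` is hypothetical, and the 100 GB default only mirrors the guideline above; it is not an enforced quota.

```sh
# Hypothetical helper: report whether a directory fits the /work guideline.
# Usage: check_work_size <directory> [limit in GB, default 100]
check_work_size() {
    dir=$1
    limit_gb=${2:-100}
    # du -sk prints the total size of the directory in kilobytes
    size_kb=$(du -sk "$dir" | awk '{print $1}')
    limit_kb=$((limit_gb * 1024 * 1024))
    if [ "$size_kb" -gt "$limit_kb" ]; then
        echo "too big for /work: $dir"
        return 1
    fi
    echo "ok: $dir"
}
```

Running it on sigma0 itself should be faster than on other nodes, since /work is local there.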
## "/auto/data" storage.
Slow distributed 260 TB storage for big files. No backups. Physically it is spread over several servers, so there is no particular node from which access is faster.

If you need space for huge data sets, backups, etc., this is the correct place. Note that working with many small files is slow, so keeping your GANGA directory there is not the best option (better to put it on /work, but configure it to put data files on /auto/data, or at least clean GANGA regularly, moving results when required).
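
Since many small files are slow on /auto/data, one possible approach when moving results there is to pack them into a single archive first. This is only a sketch: the function name `pack_to_archive` and the example paths in the comments are hypothetical.

```sh
# Hypothetical helper: pack a directory of many small files into one
# compressed archive inside a destination directory.
# Usage: pack_to_archive <source directory> <destination directory>
# Example (paths are placeholders):
#   pack_to_archive /work/$USER/old_results /auto/data/$USER
pack_to_archive() {
    src=$1
    dest=$2
    name=$(basename "$src")
    # -C switches to the parent directory so the archive holds relative paths
    tar -czf "$dest/$name.tar.gz" -C "$(dirname "$src")" "$name"
}
```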

CVMFS.

# Login

- To automatically start the LHCb login scripts from CVMFS, create the file ".cvmfs" in your home directory. If you need a particular resource (other than lhcb.cern.ch), please contact me.

- To not start them automatically:
For a supported OS with a working /cvmfs mount (e.g. lxplus.cern.ch or lxplus7.cern.ch), you have to disable the default login environment by creating the file ~/.nogrouplogin:
```
touch ~/.nogrouplogin
```
and log in again to the machine.

Then set the environment up:
```
source /cvmfs/lhcb.cern.ch/lib/LbEnv
```
or, to test the upcoming new features:
```
source /cvmfs/lhcb.cern.ch/lib/LbEnv-dev
```
Note: the LbEnv script works for both bash and tcsh.
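
For the first option in this section, creating the ".cvmfs" file could look like the sketch below. It assumes that an empty file is sufficient; the login scripts themselves are not shown here, so the file may need content if you use a resource other than lhcb.cern.ch.

```sh
# Assumption: an empty marker file is enough to enable the LHCb login scripts
touch "$HOME/.cvmfs"

# Confirm that the marker file exists
ls -l "$HOME/.cvmfs"
```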
# Computing resources
## Interactive nodes.

- **sigma0 _(SLC6)_** - hosts the local "/work" and is relatively low on memory, so it is not for running long tasks (the admin can kill them at any time if he sees them disturbing other people...)
- **d0new, d0bar-new _(SLC6)_** - general purpose interactive nodes. d0bar-new is also the SciFi web server, so "killing" it (running jobs which eat all the RAM, etc.) can have administrative consequences...
- **delta _(CentOS7)_** - the only general purpose CentOS7 node we have at the moment. It is in a testing phase, so please report any problems.
- **lhcbi1 _(CentOS7)_** - this server is for special purposes. Please avoid using it if you can (unless you are sure it is there for you).

## OS.
To log in to a particular OS, use a different port (not all OSes exist on all nodes):
```
22 (default) - SLC4
24 - host Ubuntu
26 - SLC5
28 - SLC6
30 - CentOS7
```
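
The port table above can be wrapped in a small helper so that the OS is chosen by name. This is a sketch: the function names `d0_port` and `d0_ssh_cmd` are hypothetical, and "user" and the bare node name are placeholders, since the full host names are not given in this document.

```sh
# Map an OS name to the SSH port from the table above.
d0_port() {
    case $1 in
        slc4)    echo 22 ;;
        ubuntu)  echo 24 ;;
        slc5)    echo 26 ;;
        slc6)    echo 28 ;;
        centos7) echo 30 ;;
        *) echo "unknown OS: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the ssh command for a given node and OS.
# Example: d0_ssh_cmd delta centos7
d0_ssh_cmd() {
    port=$(d0_port "$2") || return 1
    echo "ssh -p $port user@$1"
}
```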
Note that the default "CMTCONFIG" is historical; each user should specify what he/she is using (e.g. the gcc version).

### Batch system.
Most batch nodes are dead (hardware failures). We have 4 servers which are working at the moment.

### Custom tuned (very old) SGE.
It supports submission to a particular OS version, resource limits and interactive jobs, and has affinity protection against multi-threaded jobs which are not declared as such. Concrete recommendations on how to use it (including the binding to Ganga) could in theory be written, but because of the flexibility of SGE and the rapid development of Ganga they would be comparable in size to the SGE documentation itself.

See the ***[qsub Wiki Page](/computing/qsub)***

### Best Practices:
1. store the code on /work; compile on delta or d0new (also d0bar-new; only non-intensive compilations on sigma0; special cases on lhcbi1)
2. run tests: d0new, d0bar-new, delta
3. run jobs: batch system, d0new, d0bar-new
4. GANGA: sigma0, d0new, d0bar-new, delta (depends on what it really does and where it runs jobs)
5. store big files: /auto/data (only); in general that also includes GRID-produced files from GANGA. But keeping the Ganga job directory on /auto/data is not a good idea.
6. if you need CentOS7, use delta (in special cases lhcbi1). If there are too many people on delta, please ask me to add one more node (that is possible).

- as with everything on "common" resources, it is OK to do things which are required and do not disturb others more than necessary. But first and foremost, the consequences of any operation should be clear before starting it.
- So "do not do anything you do not understand". This is true for computers, conference systems, touching high-voltage cables and working with radioactive materials...

# Environment Setup

- The LHCb User Environment: https://twiki.cern.ch/twiki/bin/view/LHCb/LbEnv
- Tools for the LHCb Software Environment: https://twiki.cern.ch/twiki/bin/view/LHCb/SoftwareEnvTools#Runtime

See what is already defined in your environment:
```sh
env
```

Basic environment, which gives you access to the python-based _lb-XXXX_ env-tools:
```sh
source /cvmfs/lhcb.cern.ch/lib/LbEnv
```

Check which platforms (`<arch-os-compiler>`) are available on the server you are logged in to:
```sh
lb-describe-platform
```
Most recent available platforms: (last checked 26.02.2021)
- sigma0: x86_64-slc6-gcc49-opt
- delta: x86_64-centos7-gcc49-opt, x86_64-slc6-gcc49-opt
- d0new: x86_64-slc6-gcc8-opt
- d0bar-new: x86_64-slc6-gcc8-opt
- lhcbi1: x86_64-centos7-gcc9+py3-opt, x86_64-centos7-clang10-opt
- *new server to be delivered soon (Intel). A lack of AMD support is part of the reason for the lack of recent platform availability.
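
The platform list above can be kept in a small lookup function so that scripts do not hard-code the platform string. This is only a sketch: the function name `default_platform` is hypothetical, the values are copied from the list as last checked on 26.02.2021, and for nodes with several platforms the first listed one is returned. Run `lb-describe-platform` on the node to get the current answer.

```sh
# Hypothetical lookup: node name -> first platform listed for that node above.
# The result can be passed to the -c option of lb-run, e.g.
#   lb-run -c "$(default_platform sigma0)" <Project>/<version> executable
default_platform() {
    case $1 in
        sigma0)          echo x86_64-slc6-gcc49-opt ;;
        delta)           echo x86_64-centos7-gcc49-opt ;;
        d0new|d0bar-new) echo x86_64-slc6-gcc8-opt ;;
        lhcbi1)          echo x86_64-centos7-gcc9+py3-opt ;;
        *) echo "no platform recorded for $1" >&2; return 1 ;;
    esac
}
```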

You can produce a list of the installed Projects and their version names and compatible platforms.
```sh
lb-export-project-info out.txt
```

You can run an executable with the Project environment:

```sh
lb-run -c <platform> <Project>/<version> executable
```

You can create a shell in the terminal with the Project environment by using _bash_ as the executable.
For example, on sigma0:
```sh
lb-run -c x86_64-slc6-gcc49-opt Urania/v7r0 bash
```
will provide a nominal environment for compiling and running C++ code (gcc) with ROOT.