# How to index, slice, modify, and delete data points

We often need to select, modify, or even delete the data of certain shots. This section gives a short introduction on how to do that.

Let us start by importing the supporting packages and loading the data. To keep things simple, we import all the packages here, although not all of them will be used.

## Load some example data

### Import supporting packages

In [1]:
# Set the system path for importing packages
# This is just because I put all example scripts in another folder
# You DO NOT need to do this 
# -------------- You do NOT need following part --------------
import sys
import os
sys.path.insert(0, os.path.abspath('..'))
# -------------- You do NOT need above part --------------

import copy
import glob
from datetime import datetime

# The package for data structure
import xarray as xr
import pandas as pd
import numpy as np

# The packages for working with uncertainties
from uncertainties import ufloat
from uncertainties import unumpy as unp
from uncertainties import umath

# The package for plotting
import matplotlib.pyplot as plt
plt.rcParams['font.size'] = 18 # Set the global font size

# -------------- The modules written by us --------------

# The packages for reading data
from DataContainer.ReadData import read_hdf5_file, read_hdf5_global, read_hdf5_run_time, read_csv_file

# The packages for data analysis
from Analyser.ImagingAnalyser import ImageAnalyser
from Analyser.FitAnalyser import FitAnalyser
from Analyser.FitAnalyser import ThomasFermi2dModel, DensityProfileBEC2dModel, Polylog22dModel
from Analyser.FFTAnalyser import fft, ifft, fft_nutou
from ToolFunction.ToolFunction import *

# Add errorbar plot to xarray package
from ToolFunction.HomeMadeXarrayFunction import errorbar, dataarray_plot_errorbar
xr.plot.dataarray_plot.errorbar = errorbar
xr.plot.accessor.DataArrayPlotAccessor.errorbar = dataarray_plot_errorbar

### Start a client for parallel computing

In [2]:
from dask.distributed import Client
client = Client(n_workers=6, threads_per_worker=10, processes=True, memory_limit='10GB')
client

Client output (summary):
Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 6, Total threads: 60, Total memory: 55.88 GiB
Status: running, Using processes: True
Each worker: 10 threads, 9.31 GiB memory

### Set the path for different cameras

In [3]:
groupList = [
    "images/MOT_3D_Camera/in_situ_absorption",
    "images/ODT_1_Axis_Camera/in_situ_absorption",
    "images/ODT_2_Axis_Camera/in_situ_absorption",
]

# give a short name to each path (or let's say each camera)
dskey = {
    "images/MOT_3D_Camera/in_situ_absorption": "camera_0",
    "images/ODT_1_Axis_Camera/in_situ_absorption": "camera_1",
    "images/ODT_2_Axis_Camera/in_situ_absorption": "camera_2",
}

### Set global path for experiment

In [4]:
img_dir = '//DyLabNAS/Data/'
SequenceName = "Evaporative_Cooling" + "/"
folderPath = img_dir + SequenceName + '2023/04/17'  # get_date()

### Load shot 0058

In [5]:
shotNum = "0058"
filePath = folderPath + "/" + shotNum + "/*.h5"

dataSetDict = {
    dskey[groupList[i]]: read_hdf5_file(filePath, groupList[i])
    for i in [0] # range(len(groupList)) # uncomment to load data for all three cameras
}
dataSet = dataSetDict["camera_0"]

dataSet = swap_xy(dataSet)

scanAxis = get_scanAxis(dataSet)

dataSet = auto_rechunk(dataSet)

dataSet.load()

## Index and select data

Since we use the xarray package, its well-developed tools for indexing and selecting data are directly available. For details, have a look at the following link:

https://docs.xarray.dev/en/stable/user-guide/indexing.html
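
As a quick orientation, the two most common selection styles from that guide can be sketched on a small toy data set. The variable name `atom_number` and all values are made up for illustration; only the dimension names `runs` and `truncation_value` follow this notebook.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the real data set: 2 runs x 3 truncation values.
# The values are invented; only the dimension names mirror this notebook.
toy = xr.Dataset(
    data_vars={
        "atom_number": (
            ("runs", "truncation_value"),
            np.arange(6.0).reshape(2, 3),
        ),
    },
    coords={
        "runs": [0, 1],
        "truncation_value": [0.6, 0.8, 1.0],
    },
)

# Label-based selection: pick by coordinate value
by_label = toy.atom_number.sel(runs=0, truncation_value=0.8)

# Position-based selection: pick by integer index
by_index = toy.atom_number.isel(runs=0, truncation_value=1)

# Slicing by label keeps the dimension; label slices include both ends
sliced = toy.atom_number.sel(truncation_value=slice(0.8, 1.0))

print(float(by_label), float(by_index), sliced.shape)
```

Here `sel` and `isel` address the same element, once by coordinate value and once by position.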

## Modify the data

There are two ways to select an element in xarray and modify its value.

### Select by value

The first method is to select the element by the value of its coordinates.

Here is an example that selects the element of 'shotNum' at 'runs'=0, 'truncation_value'=0.8, and changes it to '-1'.

In [6]:
dataSet.shotNum.loc[
        {
            'runs': 0,
            'truncation_value': 0.8,
        }
    ] = '-1'

# Or write it in short form
dataSet.shotNum.loc[0, 0.8] = '-1'

dataSet.shotNum
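
The same pattern can be tried without access to the experiment data. The following is a minimal sketch on a toy array; the shot numbers are made up, and the array is assumed to have the dimension order ('runs', 'truncation_value') as in the cell above.

```python
import numpy as np
import xarray as xr

# Toy stand-in for dataSet.shotNum (invented values, same dimension names)
shot_num = xr.DataArray(
    np.array([["0000", "0001", "0002"],
              ["0003", "0004", "0005"]]),
    dims=("runs", "truncation_value"),
    coords={"runs": [0, 1], "truncation_value": [0.6, 0.8, 1.0]},
    name="shotNum",
)

# Dictionary form: dimensions are named, so their order does not matter
shot_num.loc[{"truncation_value": 0.8, "runs": 0}] = "-1"

# Positional form: labels must follow the order of shot_num.dims
shot_num.loc[1, 1.0] = "-1"

print(shot_num.values)
```

The dictionary form is a little longer, but it keeps working if the dimension order of the array changes.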

### Select by index

The second method is to select the element by its index.

Here is an example that selects the element of 'shotNum' at index (0, 1) and changes it to '-2'.

In [7]:
dataSet.shotNum[0, 1] = '-2'
dataSet.shotNum
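
Index-based assignment also accepts a dictionary of dimension names, which avoids relying on the dimension order. A minimal sketch on a toy array (the name `counts` and all values are made up for illustration):

```python
import numpy as np
import xarray as xr

# Toy array with the same dimension names as in this notebook
counts = xr.DataArray(
    np.zeros((2, 3)),
    dims=("runs", "truncation_value"),
    coords={"runs": [0, 1], "truncation_value": [0.6, 0.8, 1.0]},
)

# Plain positional indexing: indices follow the order of counts.dims
counts[0, 1] = -2.0

# Dictionary form: name the dimensions, so their order no longer matters
counts[dict(truncation_value=2, runs=1)] = -3.0

# Look up the same two positions by label for comparison
first = float(counts.sel(runs=0, truncation_value=0.8))
second = float(counts.sel(runs=1, truncation_value=1.0))
print(first, second)
```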

## Remove the data

### Simply remove element or variables

To simply remove elements or variables in xarray, please read the introduction and examples in the xarray documentation at the following links:

    https://docs.xarray.dev/en/stable/generated/xarray.Dataset.drop_vars.html
    https://docs.xarray.dev/en/latest/generated/xarray.Dataset.drop_sel.html
    https://docs.xarray.dev/en/latest/generated/xarray.Dataset.drop_isel.html
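
A minimal sketch of the three methods on a toy data set follows. The variable names `atoms` and `background` and all values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Toy data set with two variables sharing the same dimensions
toy = xr.Dataset(
    data_vars={
        "atoms": (("runs", "truncation_value"), np.ones((2, 3))),
        "background": (("runs", "truncation_value"), np.zeros((2, 3))),
    },
    coords={"runs": [0, 1], "truncation_value": [0.6, 0.8, 1.0]},
)

# Drop a whole variable by name
no_background = toy.drop_vars("background")

# Drop entries by coordinate value
without_label = toy.drop_sel(truncation_value=[0.8])

# Drop entries by integer position
without_index = toy.drop_isel(runs=[0])

# All three calls return new objects; `toy` itself is unchanged
print(list(no_background.data_vars))
print(without_label.truncation_value.values)
print(without_index.runs.values)
```

Note that dropping entries changes the shape of the data, since the affected coordinate values disappear from the result.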

### Remove bad shot

However, if one wants to remove a bad shot, it is highly recommended to set its value to 'np.nan' instead of deleting it directly.

To do that, in addition to the two ways demonstrated in the section 'Modify the data', we have also implemented a function for it.

In [8]:
data = copy.deepcopy(dataSet.atoms)
# We first need to change the dtype from unsigned 16-bit integer (uint16) to float,
# since uint16 cannot store NaN values.
data = data.astype(float)
data = remove_bad_shots(data, runs=0, truncation_value=0.8)
data
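
For comparison, the same idea can be sketched with plain xarray on a toy array. This is not the implementation of `remove_bad_shots`; it only illustrates the assumed effect of marking one shot as NaN, and why that can be more convenient than dropping it. All values are made up.

```python
import numpy as np
import xarray as xr

# Toy stand-in for dataSet.atoms, stored as uint16 like the camera data
atoms = xr.DataArray(
    np.array([[100, 110, 120], [105, 0, 125]], dtype=np.uint16),
    dims=("runs", "truncation_value"),
    coords={"runs": [0, 1], "truncation_value": [0.6, 0.8, 1.0]},
)

# Integer dtypes cannot represent NaN, so convert to float first.
# astype returns a new array; `atoms` keeps its uint16 dtype.
masked = atoms.astype(float)

# Mark the bad shot (runs=1, truncation_value=0.8) as missing
masked.loc[{"runs": 1, "truncation_value": 0.8}] = np.nan

# The array keeps its shape, and reductions skip NaN by default
mean_over_runs = masked.mean(dim="runs")
print(masked.shape)
print(mean_over_runs.values)
```

Because the shape is unchanged, the coordinates of the remaining shots stay aligned with the other variables of the data set.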