Google Cloud Machine Learning

1. Create a Google Compute Engine instance.
2. Enable the Google Machine Learning API for this project.
3. Create a Google Cloud Storage bucket.
4. Log in to the Compute Engine instance and create a folder for the ML project (in my case, MLtest). Inside this folder, two basic configuration files are required.

a. config.yaml

It is important to set "runtimeVersion" to the latest version; otherwise, some functions may not be available.

b. setup.py

5. The folder has the file structure shown below.

To run a training module LOCALLY, such as 8-output.py, type:

To submit the job to Google Cloud ML, use the following command:

However, you cannot read from or write to the Cloud Storage bucket directly, so the commands below will NOT work:

File I/O has to be handled through:

Similarly, you can read a file the same way.

6. To check the current status of jobs:

 

Anaconda virtual environment setup: Python 2, 3 and R

After installing Anaconda 3:

 

Point In Time vs. Through The Cycle

Point In Time (PIT): 1-year period (short-term)

Through The Cycle (TTC): 5-year observation period (long-term)

The following article explains well why the two terms are used.

https://www.z-riskengine.com/media/1029/point-in-time-versus-through-the-cycle-ratings.pdf

In case the link on the original website breaks, a copy is kept here.

 

 

Run Cobo with internal pulser

After decompressing the tarball:

- open a terminal and, from the config subdirectory, run getEccServer
- open a terminal and, from the data directory, run dataRouter (GetController must not be running, as dataRouter uses the same port)
- open a terminal and, from the config subdirectory, run getEccClient
- from the prompt of getEccClient, execute "exec init_and_start.ecc"

Installing GET software

For Ubuntu 16.04 LTS, the DAQ software prerequisites can be installed by simply typing:
The software package can be found in:
However, the installation path needs to be changed to PREFIX=/usr, as /usr/local/bin is not the default path that Ubuntu searches.

 

20170928 is the release date, which you can find at https://dsm-trac.cea.fr/get/wiki/Releases.

 

VertexAnalyzor.py

The main functions should be called in the sequence below:

Each of these functions calls sub-functions following the flow presented in the graph below:

 

(old, replaced)

 

An example of the original image is shown below:

 

FilterBackground(image): since the image is messy, this function takes the convex hull of the connected region and clears everything outside this hull.

image (numpy 2d uint8 array): the original image

GetEventPositions(pic,debug_mode=0): get all three tip points and the vertex point

pic (numpy 2d uint8 array): the original image
debug_mode (bool): plot some debug features; this should be turned off in batch mode

return (numpy array 3*2, (float)*2): the three tip points and the vertex point

GetEventPositions_(pic,debug_mode=0, center_width = 12, quadrant_thresh=100, center_thresh=300, err_thresh =12, spread_thresh=6 ): get all three tip points and the vertex point (currently not used)

pic (numpy 2d uint8 array): the original image
debug_mode (bool): plot some debug features; this should be turned off in batch mode
center_width (float): not used for now
quadrant_thresh (float): threshold for the number of pixels in the reaction-product part
center_thresh (float): threshold for the beam part
err_thresh (float): threshold for the average distance to the fit
spread_thresh (float): threshold for the x, y spread

 

AveDist(x,y,k,b): calculate the average distance from the points (x,y) to the straight line y = k*x + b.

x (numpy float array): the x positions
y (numpy float array): the y positions
k (float): the slope of the line
b (float): the y-intercept of the line

return (float): average distance from (x,y) to the line (k,b)
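A minimal sketch of AveDist, assuming the line is y = k*x + b and that "distance" means the perpendicular point-to-line distance |k*x - y + b| / sqrt(k^2 + 1). This illustrates the formula; it is not the original implementation.

```python
import numpy as np


def AveDist(x, y, k, b):
    """Average perpendicular distance from the points (x, y) to the line y = k*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Write the line as k*x - y + b = 0; the distance of a point to it is
    # |k*x - y + b| / sqrt(k^2 + 1).
    d = np.abs(k * x - y + b) / np.sqrt(k * k + 1.0)
    return float(np.mean(d))
```

For example, AveDist([0, 1, 2], [1, 1, 1], 0.0, 0.0) returns 1.0, since every point sits one unit above the line y = 0.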

r2(x,y,k,b): calculate the r2 score of the fit

parameters are the same as for the function above

return (float): r2 score
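A sketch of r2, assuming it is the usual coefficient of determination of the y values against the line predictions y_hat = k*x + b. Note that this scores vertical residuals, unlike the perpendicular distances used by AveDist.

```python
import numpy as np


def r2(x, y, k, b):
    """Coefficient of determination of y against the line y_hat = k*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = k * x + b
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares around the mean
    return float(1.0 - ss_res / ss_tot)
```

When all y values are equal, ss_tot is zero and the score is undefined (NumPy yields nan or -inf with a RuntimeWarning), so a perfectly horizontal pixel cloud needs special handling.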

 

VertexPos_(fits,y0): estimate the vertex position using all fitting results and the y position of the rightmost tip point (currently not used)

The function divides the calculation into two scenarios: 1. if there are 2 or 3 fitted lines, the parameters of the first two lines are used for the calculation; 2. if there is only 1 line, it cannot be the center line (because of the earlier fitting conditions), so the center line is assumed to be the horizontal line y = y0.

fits ([int,float,float,float]*3): fit results for the three parts of the image
y0 (float): the y position of the rightmost tip point

return (float,float): (x,y)
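The two scenarios above reduce to intersecting two straight lines. A sketch, assuming each fit is (n, k, b, average distance) as returned by tbjcfit and, hypothetically, that a part without a valid fit is stored as None:

```python
def VertexPos_(fits, y0):
    """Estimate the vertex (x, y) from the fitted lines.

    Assumption (hypothetical): each entry of fits is (n, k, b, avg_dist),
    and a part without a valid fit is stored as None.
    """
    lines = [(f[1], f[2]) for f in fits if f is not None]
    if len(lines) >= 2:
        # Scenario 1: intersect the first two lines y = k1*x + b1 and y = k2*x + b2.
        (k1, b1), (k2, b2) = lines[0], lines[1]
        x = (b2 - b1) / (k1 - k2)
        return x, k1 * x + b1
    if len(lines) == 1:
        # Scenario 2: the single line is not the center line, so intersect it with y = y0.
        k, b = lines[0]
        return (y0 - b) / k, y0
    raise ValueError("no fitted line available")
```

Parallel lines in scenario 1, or a horizontal line in scenario 2, raise ZeroDivisionError; the earlier fitting conditions are presumably what keep those cases out.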

VertexPos(image_,ps): estimate the vertex position

The function calculates the minimal distance from each pixel to each of the three lines (semi-open rays starting at the vertex point and pointing toward the tip points). If a pixel falls on the closed-end side of a ray (i.e. behind the vertex), a high penalty (1e5) is used as its distance.

image_ (numpy 2d uint8 array): the filtered image

ps (numpy array 3x2): the positions of the three tip points

return (float,float): (x,y)
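The description above does not say how candidate vertices are chosen. The sketch below assumes a brute-force search over the nonzero pixels: every pixel is charged its distance to the nearest of the three rays (1e5 when it lies behind a ray's start), and the candidate with the smallest total wins. Pixel coordinates are taken as (x, y) = (column, row), and the helper ray_distances is hypothetical.

```python
import numpy as np

PENALTY = 1e5  # distance assigned to a pixel lying behind the start of a ray


def ray_distances(points, origin, tip):
    """Distance from each (x, y) point to the ray starting at origin and pointing at tip."""
    u = np.asarray(tip, dtype=float) - origin
    norm = np.linalg.norm(u)
    if norm == 0:  # degenerate ray: the candidate vertex coincides with the tip
        return np.full(len(points), PENALTY)
    u = u / norm                                  # unit direction of the ray
    rel = points - origin                         # vectors from the ray origin to each point
    s = rel @ u                                   # signed projection onto the ray direction
    perp = np.linalg.norm(rel - np.outer(s, u), axis=1)
    return np.where(s >= 0, perp, PENALTY)        # points behind the origin get the penalty


def VertexPos(image_, ps):
    """Brute-force vertex estimate (assumed strategy): try every lit pixel as the vertex."""
    rows, cols = np.nonzero(image_)
    points = np.column_stack([cols, rows]).astype(float)  # (x, y) = (column, row)
    best, best_cost = None, np.inf
    for v in points:
        # Each pixel is charged its distance to the nearest of the three rays.
        d = np.min([ray_distances(points, v, tip) for tip in ps], axis=0)
        cost = d.sum()
        if cost < best_cost:
            best, best_cost = (float(v[0]), float(v[1])), cost
    return best
```

The search is O(N²) in the number of lit pixels, so it is only practical on the filtered image.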

tbjcfit(xs,ys): use SVD to compute a least-squares fit that minimizes the perpendicular DISTANCE (not the y residual)

xs (numpy float array): the x positions
ys (numpy float array): the y positions

return (int,float,float,float):(number of pixels, k,b,average distance)
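A sketch of one standard way to do such an SVD fit (total least squares): center the points, take the first right-singular vector as the line direction and the second as its normal. The return layout follows the documented (number of pixels, k, b, average distance); everything else is an assumption about the original code.

```python
import numpy as np


def tbjcfit(xs, ys):
    """Total-least-squares line fit via SVD.

    Returns (number of points, slope k, intercept b, average perpendicular distance).
    """
    pts = np.column_stack([xs, ys]).astype(float)
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    # Rows of vt are the principal directions: vt[0] is the best-fit direction,
    # vt[1] is the normal to the fitted line.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    k = dy / dx                      # infinite (with a warning) for a perfectly vertical line
    b = centroid[1] - k * centroid[0]
    avg_dist = np.mean(np.abs(centered @ vt[1]))
    return len(pts), float(k), float(b), float(avg_dist)
```

Unlike np.polyfit, which minimizes vertical residuals, this fit treats x and y symmetrically, so steep tracks are fitted as well as shallow ones.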

 

GetFit(image_, part_thresh=60, err_thresh =1.2,spread_thresh=6): this function extracts the x, y positions of every pixel with a value above 0 and fits them using tbjcfit. The result is then checked against a few conditions to decide whether the fit is good, e.g. whether the average distance from the points to the line is within err_thresh and whether the scattered positions form a reasonably line-shaped distribution.

image_ (numpy 2d uint8 array): the part of the image you want to obtain a line fitting
part_thresh (float): the minimum number of pixels in the image required for a fit
err_thresh (float): the maximum allowed average distance from the points to the fitted line
spread_thresh (float): the threshold requiring the data to be spread out along either the x or the y axis

return (numpy 2d uint8 array): a copy of filtered image
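The acceptance conditions of GetFit can be sketched as a standalone check. The helper names pixel_positions and fit_is_good are hypothetical, and measuring the "spread" as the standard deviation of x or y is an assumption; the default thresholds mirror the GetFit signature.

```python
import numpy as np


def pixel_positions(image_):
    """x, y positions of every pixel with a value above 0 (x = column, y = row)."""
    rows, cols = np.nonzero(np.asarray(image_) > 0)
    return cols.astype(float), rows.astype(float)


def fit_is_good(xs, ys, n, avg_dist, part_thresh=60, err_thresh=1.2, spread_thresh=6):
    """Hypothetical acceptance test mirroring the GetFit conditions.

    n and avg_dist are the pixel count and average distance from a line fit
    such as tbjcfit; the spread is measured as a standard deviation (assumption).
    """
    if n < part_thresh:          # too few pixels for a meaningful fit
        return False
    if avg_dist > err_thresh:    # points scatter too far from the fitted line
        return False
    # Require the points to be spread out along at least one axis (line-like shape).
    return bool(max(np.std(xs), np.std(ys)) >= spread_thresh)
```

For instance, a compact 8 × 8 blob of 64 pixels passes the pixel-count check but fails the spread check, because the standard deviation of its x (or y) positions is only about 2.3.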

 

 

 

GetLineInfo(p1,p2, L_thre = -5): calculate the length and angle of the line between two points

p1 (float,float): the tip point
p2 (float,float): the vertex point
L_thre = -5: not used for now
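A minimal sketch of GetLineInfo, assuming it returns (theta, length) in the same order GetEventInfo reports, with theta in degrees from math.atan2 measured from the vertex p2 toward the tip p1. The unused L_thre parameter is omitted.

```python
import math


def GetLineInfo(p1, p2):
    """Angle and length of the segment from the vertex p2 to the tip p1.

    Assumed convention: theta is in degrees, measured from the +x axis with
    math.atan2, in the range (-180, 180].
    """
    dx = p1[0] - p2[0]
    dy = p1[1] - p2[1]
    theta = math.degrees(math.atan2(dy, dx))
    length = math.hypot(dx, dy)
    return theta, length
```

For example, GetLineInfo((3, 4), (0, 0)) gives a length of 5.0 and an angle of about 53.13 degrees. Because image rows grow downward, a positive angle in this convention appears clockwise on screen.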

GetEventInfo(points,p0): calculate the length and angle between each tip point and the vertex point

points ((float,float)*3): the positions of the tip points
p0 (float,float): the position of the vertex point

return ((float,float)*3,float): (theta,length) for each pair, and the reaction range

Distance(contours,n1,n2): calculate the minimum distance between two contours

contours [numpy.array (n,1,2)]:all the contours
n1 (int): index of the first contour
n2 (int): index of the second contour

return (float): the minimum distance
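Distance can be written with NumPy broadcasting over all point pairs of the two contours; the (n,1,2) arrays match the layout returned by OpenCV's cv2.findContours. This is a sketch rather than the original code, and it measures only between stored contour points, so contours compressed with CHAIN_APPROX_SIMPLE may overestimate the true gap.

```python
import numpy as np


def Distance(contours, n1, n2):
    """Minimum Euclidean distance between any point of contour n1 and any point of contour n2."""
    a = contours[n1].reshape(-1, 2).astype(float)  # (N1, 2), dropping the middle axis
    b = contours[n2].reshape(-1, 2).astype(float)  # (N2, 2)
    diff = a[:, None, :] - b[None, :, :]            # (N1, N2, 2) pairwise differences
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())
```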

Groups(contours): combine all adjacent contours

contours [numpy.array (n,1,2)]:all the contours

return ((float)*n, (float)*n): grouped contours
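The rule for "adjacent" is not stated above. This sketch assumes two contours are adjacent when their minimum point-to-point distance is at most a hypothetical max_gap, and uses union-find so that chains of neighbours merge into one group. It returns lists of contour indices rather than the documented return value.

```python
import numpy as np


def min_distance(c1, c2):
    """Minimum distance between the points of two (n, 1, 2) contours."""
    a = c1.reshape(-1, 2).astype(float)
    b = c2.reshape(-1, 2).astype(float)
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)).min()


def group_contours(contours, max_gap=5.0):
    """Group contours whose minimum distance is at most max_gap (hypothetical threshold).

    Returns a sorted list of groups, each a sorted list of contour indices.
    """
    parent = list(range(len(contours)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(contours)):
        for j in range(i + 1, len(contours)):
            if min_distance(contours[i], contours[j]) <= max_gap:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(contours)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The points of each group could then be stacked with np.vstack before computing a convex hull.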

convexHull(thresh, debug_mode = 0): calculate the convex hull of the largest grouped contour

thresh (numpy 2d uint8 array): image after preliminary processing

return ((float,float)*n): the hull points

MaxEnclosedTriangle(hull): calculate the maximum enclosed triangle using the hull points

hull ((float,float)*n): the hull points

return (int, int, int): indices of the hull points that form the maximum enclosed triangle
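A brute-force sketch of MaxEnclosedTriangle that tries every triple of hull points and keeps the one with the largest area (half the absolute cross product of two edges). Whether the original uses brute force is not stated; it is O(n³), which is fine for hulls of a few dozen points, and faster algorithms exist for convex polygons.

```python
from itertools import combinations

import numpy as np


def MaxEnclosedTriangle(hull):
    """Indices (i, j, k) of the three hull points spanning the largest triangle."""
    pts = np.asarray(hull, dtype=float).reshape(-1, 2)  # also accepts OpenCV's (n, 1, 2) layout
    best, best_area = None, -1.0
    for i, j, k in combinations(range(len(pts)), 3):
        (x1, y1), (x2, y2), (x3, y3) = pts[i], pts[j], pts[k]
        # Half the absolute cross product of edges (p1->p2, p1->p3) is the triangle area.
        area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0
        if area > best_area:
            best, best_area = (i, j, k), area
    return best
```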

TipFinder(thresh, debug_mode = 0): return the positions of the three tips, i.e. the corners of the maximum enclosed triangle

thresh (numpy 2d uint8 array): image after preliminary processing

return ((float,float)*3): the positions of the three corners of the maximum enclosed triangle

DataFactory.py

The module reads two SQLite databases: the data file and the map file.

In the ADCdf table, each entry is the signal trace of one of the 253 channels of the chamber.

The first preliminary filter applies two thresholds to each signal trace: 1. the amplitude must be larger than 20% of the largest amplitude; 2. it must be larger than 20. A time bin is kept only if it exceeds both thresholds.

A list of positions (R, z) is extracted from the traces, where R is the radial position of the pad centroid and z is the time-bin number.
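The preliminary trace filter and the (R, z) extraction can be sketched as below. Assumptions: "the largest amplitude" is the maximum of the same trace, the pad centroid (pad_x, pad_y) comes from the map file, and the helper names filter_trace and trace_positions are hypothetical.

```python
import numpy as np


def filter_trace(trace, frac=0.2, abs_thresh=20):
    """Indices of the time bins of one trace that pass both preliminary thresholds."""
    trace = np.asarray(trace, dtype=float)
    # Keep a bin only if it exceeds 20% of the trace maximum AND the fixed threshold.
    keep = (trace > frac * trace.max()) & (trace > abs_thresh)
    return np.nonzero(keep)[0]


def trace_positions(trace, pad_x, pad_y):
    """(R, z) points for one pad: R is the radial distance of the pad centroid, z the time bin."""
    r = float(np.hypot(pad_x, pad_y))
    return [(r, int(z)) for z in filter_trace(trace)]
```

For a trace [0, 5, 30, 100, 15, 25, 50], both thresholds equal 20, so bins 2, 3, 5 and 6 survive.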

The four quadrants are used to construct two images, one from each pair of opposite quadrants. An overlap score is then calculated to determine the direction in which to align the two images; an overlap at the edge (y=0 or 300) yields a larger score than an overlap in the center (y=150).

An enlarged reconstructed image is presented below:

In addition, the image is heavily contaminated by noisy data and disconnected points, so the preprocessing steps consist of GaussianBlur, threshold, erode and dilate.

__init__(self,data_path,map_path): initialization of DataFactory

data_path (str): the relative path to the data file
map_path (str): the relative path to the ATTPC map file

This method loads the ADC table into a pandas DataFrame. It then iterates through all channels to find the time bins at which the signal amplitude is above threshold, and stores all the filtered signals in t3.

ConstructImage(self,EID):

EID (int): the EventID of the event for constructing the image

This function takes the table t3 built in __init__ and produces an image from the positions belonging to the given "EventID".