Multiprocessing 1

I have been using Python's multiprocessing module for quite a while, but only through simple pools, and I would like to understand it more deeply than just blindly using the package.

  1. Given a list of words, start two processes that take turns printing the list in sequence.
from multiprocessing import Process, Queue

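def helper1(words, q1, q2):
    # Wait for an index on q1, print that word, then hand the next index to q2.
    while True:
        i = q1.get(timeout=1)  # raises queue.Empty if nothing arrives within 1 s
        if i >= len(words):
            # Past the end: forward the index so the other process can stop too.
            q2.put(i+1)
            return
        print(words[i])
        q2.put(i+1)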

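def helper2(words, q1, q2):
    # Mirror image of helper1: wait on q2, print, then hand the next index to q1.
    while True:
        i = q2.get(timeout=1)  # raises queue.Empty if nothing arrives within 1 s
        if i >= len(words):
            # Past the end: forward the index so the other process can stop too.
            q1.put(i+1)
            return

        print(words[i])
        q1.put(i+1)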


def main():
   
    words = list(map(str, range(10)))
    q1 = Queue()
    q2 = Queue()
    q1.put(0)
    p1 = Process(target=helper1, args=(words, q1, q2), name='p1')
    p2 = Process(target=helper2, args=(words, q1, q2), name='p2')

    p1.start()
    p2.start()
    p1.join()
    p2.join()


if __name__ == "__main__":
    main()
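
Both helpers call `get(timeout=1)`, so it is worth knowing what happens when nothing arrives in time. The sketch below runs in a single process and only demonstrates that behavior; `get_or_none` is a hypothetical helper name, not part of the code above.

```python
import queue
from multiprocessing import Queue


def get_or_none(q, timeout=0.1):
    """Return the next item, or None if nothing arrives within `timeout` seconds."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:  # multiprocessing.Queue raises the standard queue.Empty
        return None


q = Queue()
q.put(0)
first = get_or_none(q, timeout=1)  # the index that was just put
second = get_or_none(q)            # the queue is empty now, so this times out
print(first, second)
```

In the two-process version, an uncaught `queue.Empty` would end the waiting process with a traceback, which is arguably better than hanging forever if the other process dies.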

The same behavior can also be achieved with pipes:



from multiprocessing import Pipe, Process

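def helper(words, pe, ps):
    # pe: receiving end of this process's pipe; ps: sending end of the peer's pipe.
    while True:
        i = pe.recv()  # blocks until the peer sends an index
        if i >= len(words):
            # Past the end: forward the index so the peer can stop too.
            ps.send(i+1)
            return
        print(words[i])
        ps.send(i+1)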


def main():
   
    words = list(map(str, range(10)))
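    # Pipe(False) is one-way: the first connection only receives, the second only sends.
    p1e, p1s = Pipe(False)
    p2e, p2s = Pipe(False)
    
    p1s.send(0)  # seed the exchange so that p1 prints words[0] first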
    p1 = Process(target=helper, args=(words, p1e, p2s), name='p1')
    p2 = Process(target=helper, args=(words, p2e, p1s), name='p2')

    p1.start()
    p2.start()
    p1.join()
    p2.join()


if __name__ == "__main__":
    main()
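
The pipe version relies on `Pipe(False)` returning a receive-only connection followed by a send-only one. A minimal single-process sketch of that ordering follows; the variable names are illustrative only.

```python
from multiprocessing import Pipe

# duplex=False is the keyword form of Pipe(False)
recv_end, send_end = Pipe(duplex=False)

had_data_before = recv_end.poll()  # nothing has been sent yet
send_end.send(0)
had_data_after = recv_end.poll(1)  # the index is now waiting to be read
value = recv_end.recv()
print(had_data_before, had_data_after, value)
```

This is why `main()` seeds the exchange through `p1s` while `p1` listens on `p1e`: they are the two ends of the same one-way pipe.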

Google Compute Engine: enable SSH login with a password

chown -R tbjc1magic:tbjc1magic .ssh
chmod 700 .ssh
chmod 600 .ssh/authorized_keys

sudo vi /etc/ssh/sshd_config
# set these two options in the file, then restart the SSH daemon so they take effect:
PasswordAuthentication yes
AllowUsers tbjc1magic

## You also need to reset the password of the user (tbjc1magic),
## but that fails at first because of missing permissions, so remount / first:
sudo mount -rw -o remount /
sudo passwd tbjc1magic

Find Nearest K points

Efficiently finding the nearest points with a k-d tree

The basic idea is illustrated here

k-d Tree and Nearest Neighbor Search

though I don’t think the pruned areas are plotted correctly.

This algorithm is used for k-nearest neighbors (KNN).
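
To make the pruning step concrete, here is a minimal sketch of a k-d tree with a k-nearest search. It assumes points are tuples of equal length and uses squared Euclidean distance; `build_kdtree`, `k_nearest`, and the dict-based node layout are my own illustrative choices, not taken from the linked article.

```python
import heapq


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree as nested dicts; an empty list gives None."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2         # the median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def k_nearest(root, target, k):
    """Return the k points closest to target, nearest first."""
    heap = []  # max-heap via negated distances: heap[0] is the worst of the best k

    def visit(node):
        if node is None:
            return
        d = squared_distance(node["point"], target)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node["point"]))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node["point"]))
        diff = target[node["axis"]] - node["point"][node["axis"]]
        near, far = ("left", "right") if diff < 0 else ("right", "left")
        visit(node[near])
        # Prune: the far side can only help if the splitting plane is closer
        # than the current k-th best distance (or we still have fewer than k).
        if len(heap) < k or diff ** 2 < -heap[0][0]:
            visit(node[far])

    visit(root)
    return [p for _, p in sorted(heap, reverse=True)]


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
nearest = k_nearest(tree, (9, 2), k=2)
print(nearest)  # [(8, 1), (7, 2)]
```

The `diff ** 2` test is the whole point of the structure: a subtree is skipped when even its closest possible point, the one directly across the splitting plane, cannot beat the current k-th best.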

Similar tree structures, the quadtree and the octree, are explained here:

https://www.quora.com/What-is-the-difference-between-kd-tree-and-octree-Which-one-is-advantageous

 

SVM vs LR

SVM and logistic regression (LR) typically give similar results.

  1. LR is probabilistic, while SVM is a non-probabilistic binary classifier (there are ways to get around this).
  2. SVM is determined by its support vectors (the points lying on or inside the soft margin; the size of this region is controlled by C/lambda), while LR is affected by all points. This is only true when no kernel is applied. Some people say SVM is less sensitive to outliers, though I have also seen the opposite claim.
  3. For the same reason, a linear SVM (no kernel) needs feature normalization, while LR does not. SVM may also perform worse than LR because distance measurements become complicated in high-dimensional space.
  4. With the kernel trick, SVM is found to give a sparser solution, so SVM is better in computational complexity. (This is commonly brought up, but I have not seen why it holds.)
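
One way to see point 2 is to compare the per-point losses the two models minimize. The sketch below uses the standard textbook hinge and logistic losses written in terms of the margin y * f(x), with y in {-1, +1}; the function names are illustrative and do not come from any library.

```python
import math


def hinge_loss(margin):
    """SVM loss for one point; exactly zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin):
    """LR loss for one point; small for large margins but never zero."""
    return math.log(1.0 + math.exp(-margin))


for m in [-1.0, 0.0, 0.5, 1.0, 2.0, 5.0]:
    print(f"margin={m:5.1f}  hinge={hinge_loss(m):.4f}  logistic={logistic_loss(m):.4f}")
```

Points with margin above 1 contribute nothing to the hinge loss, so moving them around does not change the SVM solution. Every point contributes something to the logistic loss, which is the sense in which LR is affected by all points.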