After discussing with Mengxi Wu, the poor prediction rate may come from the fact that the inputs are not normalized. Mengxi also points out that the initialization of the weights may need to depend on the number of inputs feeding each hidden neuron. The answer at http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network suggests initializing the weights uniformly in $$(-\frac{1}{\sqrt{d}},\frac{1}{\sqrt{d}})$$, where $$d$$ is the number of inputs to a given neuron.
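That rule can be sketched in a few lines of NumPy; the function name and the layer sizes below are illustrative, not from the actual model:

```python
import numpy as np

def init_weights(d_in, d_out, rng=np.random.default_rng(0)):
    # Uniform initialization in (-1/sqrt(d), 1/sqrt(d)),
    # where d is the number of inputs to each neuron.
    limit = 1.0 / np.sqrt(d_in)
    return rng.uniform(-limit, limit, size=(d_in, d_out))

# e.g. a layer with 100 inputs and 20 hidden neurons
W = init_weights(100, 20)
```

Every entry of `W` then lies within ±1/√100 = ±0.1.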

The first modification is normalizing each input column so that its standard deviation equals 1.

```python
std = df.std()
df = df / std
```
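A quick sanity check on toy data (the two columns here are stand-ins for the real features) confirms that dividing by the column-wise standard deviation leaves every column with unit standard deviation:

```python
import numpy as np
import pandas as pd

# Toy DataFrame standing in for the real feature matrix.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                   "b": [10.0, 20.0, 30.0, 40.0]})

std = df.std()
df = df / std

print(df.std())  # both columns now have std == 1
```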

The prediction rate is consistently above 96%, clearly better than the previous results without normalization.

This comparison is repeated for hidden = [20, 30, 40, 50, 60, 80, 100]; the prediction rate converges more tightly as the number of hidden neurons grows. The same plot is also made without normalization for comparison.

Later on, we added weight initialization using the Xavier method. The results are also attached.

```python
W1 = tf.get_variable("W1", shape=[FN, hidden],
                     initializer=tf.contrib.layers.xavier_initializer())
W2 = tf.get_variable("W2", shape=[hidden, 1],
                     initializer=tf.contrib.layers.xavier_initializer())
```
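For reference, the default (uniform) behavior of `tf.contrib.layers.xavier_initializer()` can be sketched in NumPy; the function name and the sizes below are illustrative:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    # Xavier/Glorot uniform: sample from U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)), which is what
    # tf.contrib.layers.xavier_initializer() does by default.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# e.g. weights between 784 input features and 100 hidden neurons
W = xavier_uniform(784, 100)
```

Compared to the earlier $$1/\sqrt{d}$$ rule, Xavier scales the range by both the fan-in and the fan-out, which keeps the variance of activations and gradients roughly balanced in both directions.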