## Use neural network (TensorFlow) for regression 2

Alright, I also want to see what happens if only the features (x) are normalized; the y range in the previous formula is too close to 1. So I change the formula to be

Then the calculation is repeated with and without normalizing y. The results and conclusions are obvious: yes, you definitely want to normalize y.

## Use neural network (TensorFlow) for regression 1

After a discussion with Mengxi Wu, the poor prediction rate may come from the inputs not being normalized. Mengxi also points out that the initialization of the weights may need to be related to the number of inputs to each hidden neuron. As http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network points out, the weights should be uniformly initialized within $$(-\frac{1}{\sqrt{d}},\frac{1}{\sqrt{d}})$$, where $$d$$ is the number of inputs to a given neuron.
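The suggested initialization can be sketched in NumPy as follows (the layer sizes here are illustrative, not from the original code):

```python
import numpy as np

def init_weights(n_in, n_out, rng=None):
    """Uniformly initialize a weight matrix in (-1/sqrt(d), 1/sqrt(d)),
    where d = n_in is the number of inputs to each neuron."""
    rng = np.random.default_rng() if rng is None else rng
    limit = 1.0 / np.sqrt(n_in)
    return rng.uniform(-limit, limit, size=(n_in, n_out))

W = init_weights(2, 60)  # e.g. 2 input features, 60 hidden neurons
print(W.shape)           # (2, 60)
```

The Xavier (Glorot) uniform scheme mentioned below is a close relative: it uses the limit $$\sqrt{6/(d_{in}+d_{out})}$$, accounting for both fan-in and fan-out.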

The first step of the modification is normalizing all inputs so that each has a standard deviation of 1.
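A minimal sketch of that normalization step (the data here is made up for illustration; the original used the features from the formula above):

```python
import numpy as np

def normalize(x):
    """Scale each feature column to zero mean and unit standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Illustrative data with very different scales per feature.
x = np.random.randn(1000, 2) * np.array([5.0, 0.3]) + np.array([10.0, -2.0])
x_norm = normalize(x)
print(x_norm.std(axis=0))  # both columns now have std 1
```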

The training accuracy is consistently above 96% and is clearly better than the previous results without normalization. This comparison is repeated for hidden = [20, 30, 40, 50, 60, 80, 100]; the prediction rate converges better with larger numbers of hidden neurons. The same plot is made without normalization. Later on, we added weight initialization using the Xavier method. The results are also attached.

## Use neural network (TensorFlow) for regression 0

This post is not a tutorial, but rather a logbook of what we attempted.

The learning logbook starts with using a neural network to do regression.

The data is manually generated using a very simple formula.

$$y = 3 x_0 + \sin(x_1)$$; initially, we do not add any noise term.
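Data generation for this formula could look like the following (the input range and sample count are assumptions, as the original snippet was not preserved):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(1000, 2))  # two input features
y = 3.0 * x[:, 0] + np.sin(x[:, 1])         # y = 3*x0 + sin(x1), no noise
y = y.reshape(-1, 1)                        # column vector for regression
print(x.shape, y.shape)                     # (1000, 2) (1000, 1)
```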

The accuracy function is below.
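The original function was not preserved here; one plausible definition for a regression "prediction accuracy" — the fraction of predictions within a relative tolerance of the truth (the 5% tolerance is a guess) — would be:

```python
import numpy as np

def accuracy(y_pred, y_true, tol=0.05):
    """Fraction of predictions within a relative tolerance of the truth.
    (A guess at the original metric; the tolerance is an assumption.)"""
    rel_err = np.abs(y_pred - y_true) / np.maximum(np.abs(y_true), 1e-8)
    return float(np.mean(rel_err < tol))

# 2 of 3 predictions are within 5% of the truth.
print(accuracy(np.array([1.0, 2.0, 3.3]), np.array([1.0, 2.0, 3.0])))
```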

In the first attempt, a neural network with a single hidden layer is applied.

It is observed that, with 60 hidden neurons and 25,000 training steps, the prediction accuracy fluctuates substantially depending on the initialization values.
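The original TensorFlow snippet was not kept; as a framework-free sketch of the same architecture — one sigmoid hidden layer, a linear output, mean-squared-error loss trained by plain gradient descent — under assumed data range, learning rate, and step count:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from the formula above: y = 3*x0 + sin(x1), no noise.
x = rng.uniform(-3.0, 3.0, size=(500, 2))
y = (3.0 * x[:, 0] + np.sin(x[:, 1])).reshape(-1, 1)

n_hidden = 60  # as in the experiment above
W1 = rng.uniform(-1.0, 1.0, (2, n_hidden))
b1 = np.zeros((1, n_hidden))
W2 = rng.uniform(-1.0, 1.0, (n_hidden, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    return sigmoid(x @ W1 + b1) @ W2 + b2  # linear output for regression

mse0 = float(np.mean((predict(x) - y) ** 2))  # loss before training

lr = 0.01
for step in range(3000):
    h = sigmoid(x @ W1 + b1)
    y_hat = h @ W2 + b2
    grad_out = 2.0 * (y_hat - y) / len(x)       # d(MSE)/d(y_hat)
    gW2 = h.T @ grad_out
    gb2 = grad_out.sum(axis=0, keepdims=True)
    grad_h = (grad_out @ W2.T) * h * (1.0 - h)  # backprop through sigmoid
    gW1 = x.T @ grad_h
    gb1 = grad_h.sum(axis=0, keepdims=True)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((predict(x) - y) ** 2))
print(mse0, mse)  # training should reduce the loss
```

Rerunning with a different seed changes the final loss noticeably, which mirrors the initialization sensitivity observed above.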

## Tensorflow strides

`strides` determines how much the window shifts in each of the dimensions. The first stride (shift along the batch) and the last stride (shift along the depth) must be 1.

input size: [batch, in_height, in_width, in_channels]
filter size: [filter_height, filter_width, in_channels, out_channels]
stride size: [batch_shift, height_shift, width_shift, channel_shift]

https://www.tensorflow.org/api_docs/python/tf/nn/conv2d
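The effect of strides on the output shape can be checked with the shape formulas from the page above (`ceil(in / stride)` for `SAME` padding, `ceil((in - filter + 1) / stride)` for `VALID`); here is a small helper that computes it without needing TensorFlow installed:

```python
import math

def conv2d_output_shape(input_shape, filter_shape, strides, padding="VALID"):
    """Output shape of tf.nn.conv2d for NHWC input.
    input_shape  = [batch, in_height, in_width, in_channels]
    filter_shape = [filter_height, filter_width, in_channels, out_channels]
    strides      = [1, height_stride, width_stride, 1]  (first/last must be 1)
    """
    batch, in_h, in_w, _ = input_shape
    f_h, f_w, _, out_c = filter_shape
    assert strides[0] == 1 and strides[3] == 1
    if padding == "SAME":
        out_h = math.ceil(in_h / strides[1])
        out_w = math.ceil(in_w / strides[2])
    else:  # VALID: the window must fit entirely inside the input
        out_h = math.ceil((in_h - f_h + 1) / strides[1])
        out_w = math.ceil((in_w - f_w + 1) / strides[2])
    return [batch, out_h, out_w, out_c]

# 5x5 filter over a 28x28 image, shifting by 2 in height and width:
print(conv2d_output_shape([1, 28, 28, 1], [5, 5, 1, 32], [1, 2, 2, 1], "SAME"))
# [1, 14, 14, 32]
```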

## Softmax vs softmax_cross_entropy_with_logits

Cross entropy means $$H = -\sum_i y_i' \log(y_i)$$

Computing the softmax explicitly and then applying this formula is equivalent to calling `tf.nn.softmax_cross_entropy_with_logits` directly on the raw logits. However, `tf.nn.softmax_cross_entropy_with_logits` is numerically more stable.
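The equivalence, and the stabilization trick the fused op uses internally (log-sum-exp with the max subtracted, so large logits don't overflow), can be sketched in NumPy — the logits and labels here are made up:

```python
import numpy as np

logits = np.array([[2.0, 1.0, 0.1]])
labels = np.array([[1.0, 0.0, 0.0]])  # one-hot y'

# Naive version: softmax first, then H = -sum_i y'_i * log(y_i).
y = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
naive = -(labels * np.log(y)).sum(axis=1)

# Stable version: compute log-softmax via the log-sum-exp trick.
m = logits.max(axis=1, keepdims=True)
log_softmax = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
stable = -(labels * log_softmax).sum(axis=1)

print(float(naive[0]), float(stable[0]))  # equal up to float rounding
```

With moderate logits both versions agree; with very large logits the naive `np.exp(logits)` overflows while the stable form does not.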

## ReLU vs Sigmoid vs Softmax

ReLU: Rectified Linear Unit; $$y = \max(0, x)$$

Sigmoid: $$y(x) = \frac{1}{1+e^{-x}}$$

Softmax: $$y(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$ for $$j = 1, 2, \dots, K$$

ReLU and Sigmoid are used for hidden layers. Softmax is used for the last layer; it normalizes the outputs into a probability distribution over the groups.
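The three definitions above, written out directly (the input vector is just an example):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-1.0, 0.0, 2.0])
print(relu(z))        # [0. 0. 2.]
print(sigmoid(0.0))   # 0.5
p = softmax(z)
print(p.sum())        # 1.0 — softmax normalizes to a probability distribution
```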