*Keep checking this blog post, as I will keep updating it. (I will also write the answers soon.)*

**Analytics and Consulting firms**

- Explain logistic regression. Why do we use it? What are the assumptions of linear regression?
- Clustering questions: How do you choose between k-means and hierarchical clustering?
- Explain the ROC curve and precision-recall.
- What do you mean by p-value? (My favorite question. Most people don't know the answer to this one.)
- Explain the steps in a data science project.
- Difference between machine learning and statistical modeling.
- Explain logistic regression to me in LAYMAN'S TERMS. (Without using technical words)
- What is correlation? Is it bad or good?
- What do you mean by data science? (Another favorite)
- Types of joins. (Must)
- What is R-squared?
- What is random forest?
- Explain any algorithm end to end. (Most often logistic regression or decision tree)
- What's the most challenging project you have done? How did you overcome the challenges?
- Explain the central limit theorem.


Deep-learning networks are distinguished from the more commonplace single-hidden-layer neural networks by their depth, that is, the number of node layers through which data passes in a multistep process of pattern recognition. Three or more layers (including input and output) counts as deep learning; anything less is simply machine learning.

Deep learning is motivated by intuition, theoretical arguments from circuit theory, empirical results, and current knowledge of neuroscience.

- The main concept in deep learning algorithms is automating the extraction of representations (abstractions) from the data.

- A key concept underlying Deep Learning methods is distributed representations of the data, in which a large number of possible configurations of the abstract features of the input data is feasible, allowing for a compact representation of each sample and leading to a richer generalization.

- Deep learning algorithms lead to abstract representations because more abstract representations are often constructed based on less abstract ones. An important advantage of more abstract representations is that they can be invariant to local changes in the input data.

- Deep learning algorithms are actually deep architectures of consecutive layers.

- Stacking up the nonlinear transformation layers is the basic idea in deep learning algorithms.

- It is important to note that the transformations in the layers of deep architecture are non-linear transformations which try to extract underlying explanatory factors in the data.

- The final representation of data constructed by the deep learning algorithm (output of the final layer) provides useful information from the data which can be used as features in building classifiers, or even can be used for data indexing and other applications which are more efficient when using abstract representations of data rather than high dimensional sensory data.

Let’s understand in layman’s terms-

Imagine you're building a shopping recommendation engine, and you discover that if an item is trending *and* a user has browsed that item's category in the last day, the user is very likely to buy the trending item.

These two variables are so predictive together that you can combine them into a single new variable, or **feature** (call it "interested_in_trending_category", for example).

Finding connections between variables and packaging them into a new single variable is called **feature engineering**.

Deep learning is *automated* feature engineering.
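As a contrast, the manual feature-engineering step described above can be sketched in plain Python (the column names are made up for illustration; a real engine would pull these from logs):

```python
# Hypothetical column names; illustration of manual feature engineering.
def add_trending_feature(user_rows):
    """Derive 'interested_in_trending_category' from two existing signals."""
    for row in user_rows:
        row["interested_in_trending_category"] = (
            row["item_is_trending"] and row["browsed_category_last_day"]
        )
    return user_rows

rows = add_trending_feature([
    {"item_is_trending": True, "browsed_category_last_day": True},
    {"item_is_trending": True, "browsed_category_last_day": False},
])
print([r["interested_in_trending_category"] for r in rows])  # [True, False]
```

A deep network learns combinations like this on its own, layer by layer, instead of requiring an analyst to spot them.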

References:

1. http://stats.stackexchange.com/

2. http://stackoverflow.com/

3. https://www.quora.com/


I was curious to check deep learning performance on my laptop, which has a GeForce GT 940M GPU.

Today I will walk you through how to set up a GPU-based deep learning machine. I have used TensorFlow for deep learning on a Windows system. Using the GPU on Windows is a real pain: you can't get it to work unless you follow the correct steps. But if you do follow them, setting up TensorFlow with GPU support on Windows is easy.

**Requirement:**

- Python 3.5 (currently TensorFlow on Windows doesn't support Python 2.7)
- An NVIDIA CUDA-capable GPU

**Installation:**

**CUDA Toolkit**

Use this link to install CUDA: https://developer.nvidia.com/cuda-downloads

Install the toolkit that matches your Windows version. Recommended version: *CUDA Toolkit 8.0*.

**cuDNN**

Use this link to install cuDNN: https://developer.nvidia.com/cudnn

You need to register to download it. Choose *cuDNN v5.1*; I tried the latest version, but it didn't work out. After downloading, copy and replace the files into this location: *C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0*. Now you also need to set the path in your environment variables. Check the snapshots below and make the required changes; if the entries are not there, you have to add them manually.

**Python**

Install Python using Anaconda. Use whichever Anaconda (Python 2.7 or 3.5) you prefer for your daily tasks, because we will create a separate environment with Python 3.5.

**TensorFlow with GPU**

Create a virtual environment for TensorFlow:

`conda create --name tensorflow-gpu python=3.5`

Then activate this virtual environment:

`activate tensorflow-gpu`

And finally, install TensorFlow with GPU support:

`pip install tensorflow-gpu`

Test the TensorFlow installation:

```
python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow!
```

If you run into any error, check the link below:

**https://www.tensorflow.org/install/install_windows**

Any other link might lead you to different problems.

**Let's play with TensorFlow GPU**

Let's check performance on MNIST data using a convolutional neural network.

Download the code: https://github.com/tensorflow/models/blob/master/tutorials/image/mnist/convolutional.py

Now let's run it and check its performance.

**GPU-based TensorFlow**

We can see that each step takes roughly ~40 ms. Now we want to see whether this GPU performance is worth it.

**CPU Tensorflow**

Let's take a look at CPU performance. Really? Each step takes ~370 ms. **Wow, what a performance! TensorFlow with GPU is roughly 10x faster than TensorFlow with CPU.**
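The arithmetic behind that claim is simple; a quick sanity check using the approximate per-step timings above:

```python
# Sanity-check the speedup claim from the approximate per-step timings above.
gpu_ms_per_step = 40   # observed on the GT 940M run
cpu_ms_per_step = 370  # observed on the CPU run
speedup = cpu_ms_per_step / gpu_ms_per_step
print(round(speedup, 2))  # 9.25, i.e. roughly 10x
```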

**Next steps:**

Next, you can install the Keras library to do more advanced things in deep learning. Keras is a high-level neural networks API, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Keras uses TensorFlow as its backend and works seamlessly on CPU and GPU. Follow the commands below. Install Jupyter Notebook too if you love working with notebooks.

```
conda install jupyter
conda install scipy pandas
conda install mingw libpython  # Theano dependencies
conda install theano
pip install keras
```

In case of any trouble, leave a comment and let me know your thoughts about this article.

Happy hunting with deep learning!!


We will create bar charts, histograms, scatter plots, box plots, etc. We will also check the association between two variables. Let's start.


Derived variables help us understand the data better. For example, we created a derived variable by dividing a variable's values into buckets at cutoff points.

How do we decide the values of these cutoff points?

This is answered by the business people, or you have to explore the data yourself to divide it into different buckets.
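A minimal sketch of such bucketing in Python (the cutoffs and labels here are hypothetical; in practice they come from business input or from exploring the distribution):

```python
# Hypothetical cutoffs/labels for illustration of bucketing at cutoff points.
def bucketize(value, cutoffs, labels):
    """Map a numeric value to a bucket label using sorted cutoff points."""
    for cutoff, label in zip(cutoffs, labels):
        if value <= cutoff:
            return label
    return labels[-1]  # anything above the last cutoff falls in the top bucket

cutoffs = [18, 35, 60]                           # bucket upper bounds
labels = ["young", "adult", "middle", "senior"]  # one more label than cutoffs
print([bucketize(a, cutoffs, labels) for a in (12, 40, 75)])
# ['young', 'middle', 'senior']
```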

If you have any query, let me know in the comment section.

I have been asked many times how one can become a data scientist, or what one should do to become a data scientist. I have written here about some of the technical skills required to be a data scientist:

https://d4datascience.wordpress.com/2016/02/13/how-to-make-career-into-data-science/

But the above article was missing one thing, and **Monica Rogati** has summed it up perfectly.

https://www.quora.com/How-can-I-become-a-data-scientist-1/answer/Monica-Rogati

This advice from Monica Rogati is a must-read for all beginners who want to get into data science. In one sentence, she says:

**Do a project you care about. Make it good and share it.**

Now the question is how to choose a data science project?

*So I thought I would give you some ideas about projects, and about how to start a project and finish it. I am starting a series of blog posts in which I will show you, step by step, the journey of a data science project.*

I am sharing one of my projects, which I undertook during a course. It is a basic project; read this article to understand the approach you will follow in any project. You will figure out why hypothesis generation is necessary. You will also get to know why people say every good data analysis starts with a question, and you will understand what types of questions data science can answer. You can choose any project topic as long as you have relevant data available.

My research project topic was **how life expectancy is related to social, economic, and environmental factors.**

I downloaded the dataset from Gapminder: http://www.gapminder.org/data/. Details of the data are given in the codebook.

This is the Gapminder data codebook link:

https://www.dropbox.com/s/lfzmkvnwb5r84ff/GapMinder%20Codebook%20.pdf?dl=0

After looking through the codebook for the *Gapminder* study, I have decided that I am particularly interested in life expectancy. I want to study life expectancy's behavior with:

- alcconsumption (*alcohol consumption*)
- co2emissions (*cumulative CO2 emission*)
- oilperperson (*oil consumption*)
- incomeperperson

I performed a literature review to see what work has been done on this topic, so that I can bring out some new research. I have taken references from the research below:

- Life expectancy of people with intellectual disability: a 35-year follow-up study (*K. Patja, M. Iivanainen, H. Vesala, H. Oksanen & I. Ruoppila*)
- Income distribution and life expectancy (*R. G. Wilkinson*)
- Changes in life expectancy in Russia in the mid-1990s (*Vladimir Shkolnikov, PhD, Prof. Martin McKee, Prof. David A. Leon, PhD*)
- Drinking Pattern and Mortality: The Italian Risk Factor and Life Expectancy Pooling Project
After going through that literature, I generated my **hypothesis**: I believe that people who have better income, lower alcohol consumption, and better environmental conditions have better life expectancy. **According to my hypothesis, these factors are positively correlated with life expectancy.**

We would like to find out the association between life expectancy and social, economic, and environmental factors. We would like to address several questions:

- Does alcohol consumption and oil consumption determine life expectancy?
- Does better income mean better life expectancy?
- Is there a causal relationship?
- If there is more CO2 emission, does that mean lower life expectancy?

The variables that we will consider for the hypothesis are:

- incomeperperson (*Gross Domestic Product per capita in constant 2000 US$*)
- alcconsumption (*alcohol consumption per adult (age 15+), litres: recorded and estimated average alcohol consumption, adult (15+) per capita, in litres of pure alcohol*)
- co2emissions (*cumulative CO2 emission (metric tons): total amount of CO2 emitted in metric tons since 1751*)
- lifeexpectancy (*life expectancy at birth (years): the average number of years a newborn child would live if current mortality patterns were to stay the same*)
- oilperperson (*oil consumption per capita (tonnes per year per person)*)

This will help us find answers to important questions such as: How can a country have better life expectancy without compromising on the development projects that cause a lot of CO2 emission? Where should a country focus its agenda to have a better life expectancy? So let's dive into the data…


Neural networks have been one of the most promising fields of research for quite some time, and recently they have picked up even more pace. In the early days of neural networks, we could implement only a single hidden layer, and still we saw good results.

Deep learning methods are becoming exponentially more important due to their demonstrated success at tackling complex learning problems. At the same time, increasing access to high-performance computing resources and state-of-the-art open-source libraries are making it more and more feasible for enterprises, small firms, and individuals to use these methods.

Neural network models have become the center of attraction in solving machine learning problems.

Now, what's the use of knowing something if we can't apply our knowledge intelligently? There are various problems with neural networks when we implement them, and if we don't know how to deal with them, the so-called "neural network" becomes useless.

Some Issues with Neural Network:

- Sometimes neural networks fail to converge due to low dimensionality.
- Even a small change in weights can lead to a significant change in output; sometimes results may get worse.
- The gradient may become zero, in which case weight optimization fails.
- The model overfits the data.
- Time complexity is too high; sometimes the algorithm runs for days even on a small dataset.
- We get the same output for every input when we predict.

So what next!!

One day I sat down (I am not kidding!) with neural networks to see what I could do to make them perform better. I have tried and tested various use cases to discover solutions.

Let's dig deeper now and check out proven ways to improve the performance (both speed and accuracy) of neural network models:

**1. Increase hidden layers**

We have always wondered what happens if we implement more hidden layers. In theory, it has been established that many functions converge at a higher level of abstraction, so it seems that more layers give better results.

Multiple hidden layers can be created using the mlp function in the RSNNS package and the neuralnet function in the neuralnet package. As far as I know, these are the only neural network functions in R that can create multiple hidden layers (I am not talking about deep learning here); all others use a single hidden layer. Let's explore the neuralnet package first.

I won't go into the details of the algorithms; you can look up their training processes yourself. I have used a dataset and want to predict a response/target variable. Below is sample code for four hidden layers.

R code

** A. Neuralnet Package**

```
library(neuralnet)
set.seed(1000000)
multi_net <- neuralnet(action_click ~ FAL_DAYS_last_visit_index + NoofSMS_30days_index +
                         offer_index + Days_last_SMS_index + camp_catL3_index + Index_weekday,
                       algorithm = "rprop+", data = train,
                       hidden = c(6, 9, 10, 11), stepmax = 1e9,
                       err.fct = "ce", linear.output = FALSE)
```

I have tried several iterations. Below are the confusion matrices for some of the results.

** B. RSNNS Package**

```
library(RSNNS)
set.seed(10000)
a <- mlp(train[, 2:7], train$action_click, size = c(5, 6), maxit = 5000,
         initFunc = "Randomize_Weights", initFuncParams = c(-0.3, 0.3),
         learnFunc = "Std_Backpropagation", learnFuncParams = c(0.2, 0),
         hiddenActFunc = "Act_Logistic", shufflePatterns = TRUE, linOut = FALSE)
```

I have tried several iterations. Below are the confusion matrices for some of the results.

From my experiments, I have concluded that increasing layers may result in better accuracy, but it's not a rule of thumb; you have to test with different numbers of layers. I have tried several datasets over several iterations, and it seems the neuralnet package performs better than RSNNS. Always start with a single layer, then gradually add layers if you don't see a performance improvement.

Figure 2: A multi-layered neural network

**2. Change Activation function**

Changing the activation function can be a deal breaker for you. I have tested results with sigmoid, tanh, and rectified linear units (ReLU). The simplest and most successful activation function is the rectified linear unit. Mostly we use sigmoid-activated networks, but compared to sigmoid, the gradient of ReLU does not approach zero when x is very large. ReLU also converges faster than other activation functions. You should know how to use these activation functions: for example, when you use the tanh activation function, you should encode your binary classes as -1 and 1; classes encoded as 0 and 1 won't work with tanh.
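A tiny numerical check of the vanishing-gradient point: the sigmoid derivative s(x) * (1 - s(x)) shrinks toward zero for large x, while ReLU's derivative stays at 1 for any positive input:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)        # derivative of sigmoid is s * (1 - s)
    return s * (1.0 - s)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # derivative of max(0, x)

x = 10.0
print(sigmoid_grad(x) < 1e-4)  # True: the sigmoid gradient nearly vanishes
print(relu_grad(x))            # 1.0: the ReLU gradient stays constant
```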


**3. Change Activation function in Output layer**

I have experimented with using a different activation function in the output layer than in the hidden layers. In some cases the results were better, so it's worth trying a different activation function for the output neuron.

As with the single-layered ANN, the choice of activation function for the output layer will depend on the task that we would like the network to perform (i.e. categorization or regression). However, in multi-layered NN, it is generally desirable for the hidden units to have nonlinear activation functions (e.g. logistic sigmoid or tanh). This is because multiple layers of linear computations can be equally formulated as a single layer of linear computations. Thus using linear activations for the hidden layers doesn’t buy us much. However, using linear activations for the output unit activation function (in conjunction with nonlinear activations for the hidden units) allows the network to perform nonlinear regression.

**4. Increase number of neurons**

If an inadequate number of neurons are used, the network will be unable to model complex data, and the resulting fit will be poor. If too many neurons are used, the training time may become excessively long and, worse, the network may overfit the data. When overfitting occurs, the network begins to model random noise in the data; the model then fits the training data extremely well but generalizes poorly to new, unseen data. Validation must be used to test for this.

There is no rule of thumb for choosing the number of hidden neurons, but you can consider these heuristics, where N is the number of hidden neurons:

- N = 2/3 the size of the input layer, plus the size of the output layer
- N < twice the size of the input layer
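The two heuristics above can be combined into a small helper; this is purely illustrative, and the output should be treated as a starting point for experimentation, not a prescription:

```python
# Illustrative sketch of the hidden-layer sizing heuristics above.
def hidden_neuron_range(n_inputs, n_outputs):
    """Return (suggested N, upper bound) per the rules of thumb above."""
    suggested = round(2 * n_inputs / 3 + n_outputs)
    upper_bound = 2 * n_inputs  # N should stay below twice the input size
    return suggested, upper_bound

suggested, upper = hidden_neuron_range(n_inputs=6, n_outputs=1)
print(suggested, upper)  # 5 12
```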

**5. Weight initialization**

When training neural networks, the initial weights are assigned randomly. Although weight updates do take place, the network can sometimes converge to a local minimum. With multilayered architectures, random weights often do not perform well, so we can supply better initial weights. Try different random seeds to generate different initial weights, then choose the seed that works well for your problem.

You can use methods like adaptive weight initialization or Xavier weight initialization to initialize weights.

Random initial synaptic weights generally lead to a large error, so learning means finding proper values for the synaptic weights that minimize the output error. The figure below shows gradient descent getting trapped in a local minimum while searching for optimal weights.

Figure 3: Local minimum problem due to random initialization of weights
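As one concrete example, Xavier (Glorot) uniform initialization draws weights from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)); a minimal pure-Python sketch (the seed value is arbitrary):

```python
import math
import random

def xavier_init(fan_in, fan_out, seed=42):
    """Draw a fan_in x fan_out weight matrix from U(-limit, limit)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

weights = xavier_init(fan_in=6, fan_out=4)
limit = math.sqrt(6.0 / (6 + 4))
print(all(abs(w) <= limit for row in weights for w in row))  # True
```

Trying several seeds and keeping the one that trains best is exactly the seed-search idea described above.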

**6. More data**

When we have lots of data, a neural network generalizes well; otherwise, it may overfit the data, so it's better to have more data. Overfitting is a general problem when using neural networks. The amount of data needed to train a neural network is very much problem-dependent. The quality of training data (i.e., how well the available training data represents the problem space) is as important as the quantity (i.e., the number of records, or examples of input-output pairs). The key is to use training data that generally spans the problem data space. For relatively small datasets (fewer than 20 input variables, 100 to several thousand records), a minimum of 10 to 40 records (examples) per input variable is recommended for training. For relatively large datasets (more than 20,000 records), the dataset should be sub-sampled to obtain a smaller dataset that contains 30-50 records per input variable. In either case, any "extra" records should be used for validating the neural networks produced.

**7. Normalizing/Scaling data**

Most of the time, scaling/normalizing your input data leads to improvement. There are a variety of practical reasons why standardizing the inputs can make training faster and reduce the chances of getting stuck in local optima. Also, weight decay and Bayesian estimation can be done more conveniently with standardized inputs. When a neural network uses gradient descent to optimize parameters, standardizing the covariates may speed up convergence (because with unscaled covariates, the corresponding parameters may inappropriately dominate the gradient).
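As a minimal illustration, z-score standardization (subtract the mean, divide by the standard deviation) can be done by hand for one input column:

```python
# Sketch of z-score standardization for a single input column.
def standardize(values):
    """Rescale values to zero mean and unit variance (population std)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5
    return [(v - mean) / std for v in values]

scaled = standardize([10.0, 20.0, 30.0])
print([round(v, 4) for v in scaled])  # [-1.2247, 0.0, 1.2247]
```

In practice you would compute the mean and standard deviation on the training set only and reuse them for the test set.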

**8. Change learning algorithm parameters**

Try different learning rates (0.01 to 0.9). Also try different momentum parameters, if your algorithm supports them (0.1 to 0.9). Changing the learning rate can help us identify whether we are getting stuck in a local minimum.

The two plots below nicely emphasize the importance of choosing the learning rate by illustrating the two most common problems with gradient descent:

(i) If the learning rate is too large, gradient descent will overshoot the minima and diverge.

(ii) If the learning rate is too small, the algorithm will require too many epochs to converge and can become trapped in local minima more easily.

Figure 4: Effect of learning rate parameter values
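Both failure modes can be reproduced with a few lines of gradient descent on f(x) = x², whose gradient is 2x (the learning rates below are chosen purely for illustration):

```python
# Illustrative gradient descent on f(x) = x**2, whose gradient is 2 * x.
def gradient_descent(lr, x0=1.0, steps=50):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # standard update: x = x - lr * f'(x)
    return x

print(abs(gradient_descent(lr=0.1)) < 1e-4)   # True: converges quickly
print(abs(gradient_descent(lr=1.1)) > 1e3)    # True: overshoots and diverges
print(gradient_descent(lr=0.001) > 0.9)       # True: barely moves in 50 steps
```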

**9. Deep learning for auto feature generation**

Machine learning is one of the fastest-growing and most exciting fields out there, and deep learning represents its true bleeding edge. Usual neural networks are not efficient at creating features. Like other machine learning models, a neural network's performance also depends on the quality of its features: with better features we get better accuracy. When we use a deep architecture, features are created automatically, and every layer refines them.

**10. Misc:** You can try a different number of epochs and a different random seed. Various parameters like the dropout ratio, regularization weight penalties, and early stopping can be changed while training neural network models.

To improve generalization on small, noisy data, you can train multiple neural networks and average their outputs, or take a weighted average. There are various types of neural network models, and you should choose according to your problem; for example, for stock prediction you should first try recurrent neural network models.
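The averaging described above can be sketched in a few lines (the prediction values are made up for illustration):

```python
# Sketch of (weighted) averaging of predictions from several trained models.
def average_predictions(model_preds, weights=None):
    """model_preds: list of per-model prediction lists, all the same length."""
    n_models = len(model_preds)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # plain average by default
    n_samples = len(model_preds[0])
    return [sum(w * preds[i] for w, preds in zip(weights, model_preds))
            for i in range(n_samples)]

preds = [[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]]  # hypothetical model outputs
print([round(v, 6) for v in average_predictions(preds)])  # [0.8, 0.3]
```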

Figure 5: After dropout, insignificant neurons do not participate in training

References:

1. http://stats.stackexchange.com/

2. http://stackoverflow.com/

3. https://www.quora.com/

4. http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html

5. http://www.nexyad.net/html/upgrades%20site%20nexyad/e-book-Tutorial-Neural-Networks.html