Thursday, December 26, 2013

Logistic Regression in Machine Learning

Before studying logistic regression, I recommend going through these tutorials.
The first and most important thing about logistic regression is that it is not a "regression" algorithm but a "classification" algorithm. The name itself is somewhat misleading. Regression gives a continuous numeric output, but most of the time we need the output in classes (i.e. categorical, discrete). For example, we may want to classify emails as "spam" or "not spam", treatments as "success" or "failure", statements as "right" or "wrong", transactions as "fraudulent" or "non-fraudulent", and so on. These are examples of logistic regression with a binary output (also called dichotomous). The output need not always be binary, but in this article I talk only about binary output.
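
To make the idea of a binary output concrete, here is a minimal C++ sketch of how a trained logistic regression model might turn a score into a class. The function names, the parameter layout (theta[0] as the bias) and the 0.5 threshold are assumptions for illustration, not taken from the article.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic (sigmoid) function: maps any real number into the range (0, 1).
double sigmoid(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}

// Hypothetical helper: returns class 1 if the predicted probability reaches
// the threshold, otherwise class 0.
// Assumed layout: theta[0] is the bias, theta[1..] are the feature weights.
int classify(const std::vector<double>& theta,
             const std::vector<double>& x,
             double threshold = 0.5) {
    assert(theta.size() == x.size() + 1);
    double z = theta[0];
    for (std::size_t i = 0; i < x.size(); ++i) {
        z += theta[i + 1] * x[i];
    }
    return sigmoid(z) >= threshold ? 1 : 0;
}
```

The probability itself is continuous; it is the threshold that turns it into one of two classes such as "spam" and "not spam".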

Saturday, December 21, 2013

Nepal over last Four Decades with R

The World Bank DataBank provides data for all countries on more than 1000 indicators since 1960. It can be used for various kinds of statistical analysis. This weekend, I downloaded the data for Nepal and tried a few simple things in R, mainly for the purpose of learning R. R is a very powerful statistical analysis tool.

The data was downloaded as a csv file, which can be read with the read.csv() function in R. We can run R commands from the R command line, and it's really interactive.

Gradient descent versus normal equation

Gradient descent and the normal equation are both methods for finding the parameters that minimize a cost function. Gradient descent is iterative, while the normal equation solves for the parameters directly in closed form. I have given some intuition about gradient descent in a previous article. In gradient descent we start with an initial guess of the parameters (usually zeros, but not necessarily) and gradually adjust those parameters until we get the function that best fits the given data points. Here is the mathematical formulation of gradient descent in the case of linear regression.
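
For reference, the standard textbook formulation for linear regression can be sketched as follows. The notation (m training examples, learning rate \alpha, design matrix X) is assumed here and may differ from the one used in the full article.

```latex
% Hypothesis and cost function for linear regression (assumed notation)
h_\theta(x) = \theta^{T} x, \qquad
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

% Gradient descent: repeat until convergence, updating every \theta_j simultaneously
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

% Normal equation: closed-form solution, no iteration and no learning rate
\theta = (X^{T} X)^{-1} X^{T} y
```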

Tuesday, December 10, 2013

Numerical Methods Tutorials

This section consists of various numerical methods problems and their solutions in C. You can click each link to view the C source code for the corresponding problem.

  1. Solution of Differential Equation using RK4 method
  2. Solution of Non-linear equation by Bisection Method
  3. Solution of Non-linear equation by Newton Raphson Method
  4. Solution of Non-linear equation by Secant Method
  5. Interpolation with unequal intervals by Lagrange's Method
  6. Linear Curve Fitting
  7. Parabolic Curve Fitting
  8. Gauss Jordan Method
  9. Determinant of an NxN Matrix
  10. Inverse of an NxN Matrix
  11. Integration using Trapezoidal Rule
  12. Integration using Simpson's 3/8 Rule
  13. Integration using Simpson's 1/3 Rule
  14. Greatest Eigen value and Eigen vector using Power Method
  15. Condition number and ill condition checking 
  16. Newton's Forward and Backward interpolation
  17. 2 Dimensional matrix multiplication 
Note: All the code was compiled with the GCC MinGW compiler on Windows. Compiling with another compiler or on another platform may produce errors. These tutorials are aimed at students and meant for learning, so the code may not be optimized for real-world use. Some operations, such as matrix inversion and determinant, are done without pivoting, so a divide-by-zero error may occur in some cases. Partial or full pivoting is recommended.

Friday, November 1, 2013

Business Intelligence (BI) implementation in Nepalese Telecommunications

This article aims at discovering the business intelligence technologies that can be applied in the telecommunication industry to facilitate the business process. I have used the data warehouse approach to build a warehouse of telecom data, from which various reporting tools can be used to visualize the data and perform various types of analysis to support the decision making process. Furthermore, I have pointed out some data mining methods that can be added to make it more intelligent, as its name "Business Intelligence" suggests. This article shows all the steps from data collection to report generation through the warehouse ETL process, with related technological background, tools and technologies wherever necessary.

You can download the full article from here.

Linear Regression in Machine Learning

Linear regression is used in machine learning to predict the output for new data based on a previous data set. Suppose you have a data set of 100 shoes of different sizes along with their prices. Now if you want to predict the price of a shoe of size (say) 9.5, then one way of doing the prediction is by using linear regression. We train the model on those 100 data points. After training we will have a hypothesis, and based on this hypothesis we can predict the price of the new shoe.

Monday, July 22, 2013

Fetch Data from MYSQL using Jquery Ajax

This is a really interesting article where I will show you how to fetch data from a MySQL table using jQuery Ajax. If you are familiar with PHP, then you may already know how to fetch data using PHP. But in this tutorial I will show a completely different way of doing this. Ajax is a technique that fetches and displays web content without reloading the full page. I can update a single HTML <div> without reloading the whole page. Now let's move to the step by step procedure for pulling the data out of a MySQL table and displaying it in an HTML table.
The first thing you need to do is create a simple MySQL database with some tables and some data. You can do this using simple SQL queries, but I already have data on the population of the districts of Nepal in my database. I will use it in this tutorial. If you want to display different data, that's fine: just create a MySQL table and insert some data into it.

Thursday, July 18, 2013

Solving Knapsack problem using Dynamic Programming

This post is about solving the Knapsack problem, a very famous problem in the optimization community, using dynamic programming. This problem can be solved using various approaches with different complexities, but here I shall talk only about dynamic programming, specifically the bottom-up approach. So let's first talk about what the Knapsack problem is, for people who are unfamiliar with it. The Knapsack problem states that “Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible”. Let's consider a concrete example. Suppose you have a knapsack and lots of books and articles. You are planning to go somewhere on your vacation and you want to keep some important books and articles with you. The knapsack has a fixed capacity, so you cannot fit all the books and articles you have inside it.
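
The bottom-up idea can be sketched in a few lines of C++. This is a minimal sketch of the 0/1 variant (each book is either taken once or left behind), which is an assumption on my part; the function name knapsack01 and the use of non-negative integer weights are also illustrative choices, not the code from the full post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Bottom-up dynamic programming for the 0/1 knapsack problem.
// best[c] holds the largest total value achievable with capacity c
// using the items processed so far. Weights are assumed non-negative.
int knapsack01(const std::vector<int>& weight,
               const std::vector<int>& value,
               int capacity) {
    assert(weight.size() == value.size() && capacity >= 0);
    std::vector<int> best(capacity + 1, 0);
    for (std::size_t i = 0; i < weight.size(); ++i) {
        // Walk capacities downwards so that item i is used at most once.
        for (int c = capacity; c >= weight[i]; --c) {
            best[c] = std::max(best[c], best[c - weight[i]] + value[i]);
        }
    }
    return best[capacity];
}
```

For weights {1, 3, 4, 5}, values {1, 4, 5, 7} and capacity 7, the best choice is the items of weight 3 and 4, for a total value of 9. The running time is proportional to the number of items times the capacity.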

Saturday, June 22, 2013

C Mini Project Ideas with a Sample Calculator Project

Do you want to build a simple application in C but don’t know how and where to start? Or do you know how to build a C application but have no project ideas? Then do not worry, you are at the right place. If you have finished learning C and become familiar with its programming paradigm, then I encourage you to build some applications (whether application software or system software) to actually sharpen your skills in C. If you do projects, you will learn how to apply those programming constructs correctly in building a project. So here I will explain how to start a new C project for a complete beginner, and give some ideas about what types of application you can build using C.
I have some suggestions for people who are about to write their first C application.

Friday, May 17, 2013

Basic Euclidean vector operations in C++

Euclidean vectors are very important quantities in mathematics, applied mathematics, physics, engineering etc. Formally, vectors are quantities which have both magnitude and direction. Velocity is an example: it has both magnitude (speed, like 2 km/h) and direction (e.g. east). From a programmer’s perspective, there are many situations where you need to compute different vector operations. For example, consider a moving ball. To simulate the motion of the ball, you must calculate its velocity. The velocity has both magnitude (i.e. the rate of change of position of the center of the ball) and direction (e.g. along the x, y or z axis). This is just one example, but there are many others where vector operations are essential.
In this article, I will show you how to compute different vector operations like the sum of two vectors, multiplication by a scalar, dot product, cross product, normalization etc. in C++. I chose C++ instead of C because it is object oriented. The vector can be represented by an object. The components of the vector along the x, y and z axes will be the data members of the object. The data members are attributes of the object. Similarly, the different operations will be the member functions of the object. Now the vector object can completely represent the vector quantity.
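
A minimal sketch of such an object is given below. The type name Vec3 and the member names are assumptions for illustration; the class in the full article may be organized differently.

```cpp
#include <cmath>

// A 3D Euclidean vector with its components as data members and the
// common operations as member functions.
struct Vec3 {
    double x, y, z;

    // Sum of two vectors.
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }

    // Multiplication by a scalar.
    Vec3 operator*(double k) const { return {x * k, y * k, z * k}; }

    // Dot product: a scalar.
    double dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }

    // Cross product: a vector perpendicular to both operands.
    Vec3 cross(const Vec3& o) const {
        return {y * o.z - z * o.y, z * o.x - x * o.z, x * o.y - y * o.x};
    }

    // Length of the vector.
    double magnitude() const { return std::sqrt(dot(*this)); }

    // Unit vector in the same direction; the zero vector is returned unchanged.
    Vec3 normalized() const {
        const double m = magnitude();
        return m > 0.0 ? Vec3{x / m, y / m, z / m} : *this;
    }
};
```

With this, Vec3{1, 0, 0}.cross(Vec3{0, 1, 0}) gives the unit vector along z, and Vec3{3, 0, 4}.normalized() has magnitude 1.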

Tuesday, May 14, 2013

Calculation of Discrete Fourier Transform(DFT) in C/C++ using Naive and Fast Fourier Transform (FFT) method

The Discrete Fourier Transform has great importance in Digital Signal Processing (DSP). There are many situations where analyzing a signal in the frequency domain is better than in the time domain. The Fourier transform converts a function from the time domain to the frequency domain, some processing is done in the frequency domain, and finally the inverse Fourier transform converts the signal back into the time domain. The term discrete means the signal is not continuous; rather, it is sampled at some sampling frequency, i.e. samples are taken only at a certain interval (also called the period). The sampling frequency depends upon the frequency of the original signal and must satisfy the Nyquist criterion. A simple example of the Fourier transform is applying filters in the frequency domain in digital image processing. Before looking into the implementation of the DFT, I recommend you first read in detail about the Discrete Fourier Transform on Wikipedia. If you are already familiar with it, then you can go to the implementation directly. The 1 dimensional DFT can be calculated using the following formula
X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}, \quad k = 0, 1, \dots, N-1
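
The naive method evaluates the standard DFT sum X_k = sum over n of x_n * exp(-i*2*pi*k*n/N) directly for every k. Here is a minimal C++ sketch of that direct evaluation; the function name naiveDft is an assumption, and this is not the listing from the full article.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) discrete Fourier transform: one full sum per output bin.
std::vector<std::complex<double>> naiveDft(
        const std::vector<std::complex<double>>& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            // The twiddle factor is periodic in k*n with period N, so
            // reducing modulo N keeps the angle small and accurate.
            const double angle =
                -2.0 * pi * static_cast<double>((k * n) % N) /
                static_cast<double>(N);
            sum += x[n] * std::polar(1.0, angle);
        }
        X[k] = sum;
    }
    return X;
}
```

For a constant signal such as {1, 1, 1, 1}, all the energy lands in X[0] and the other bins are zero up to rounding. An FFT computes the same values in O(N log N) operations.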

Monday, May 6, 2013

Pascal’s Triangle using C program

Printing Pascal’s Triangle is a famous problem in C. You may be familiar with Pascal's triangle from math class, when you learnt permutations and combinations or binomial coefficients. If you are not familiar with it then don't worry, just look at Wikipedia to get an idea of Pascal's Triangle and see the code below.

There are various methods to generate Pascal’s Triangle in a C program. Among them, the most popular one is using combinations. Since the elements of the triangle are the binomial coefficients, and a binomial coefficient can be calculated as a combination, I decided to generate Pascal’s Triangle using the combination method. Here is the source code
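
The combination method can be sketched as follows, written here in C++ rather than C. The function names are assumptions, and the multiplicative formula is used instead of factorials to avoid overflow for moderate row counts; the full post may compute the combination differently.

```cpp
#include <cassert>
#include <vector>

// C(n, r) computed multiplicatively. After step i the running result
// equals C(n - r + i, i), so every division is exact.
unsigned long long combination(unsigned n, unsigned r) {
    assert(r <= n);
    if (r > n - r) r = n - r;  // use the symmetry C(n, r) = C(n, n - r)
    unsigned long long result = 1;
    for (unsigned i = 1; i <= r; ++i) {
        result = result * (n - r + i) / i;
    }
    return result;
}

// Row n of Pascal's triangle is C(n, 0), C(n, 1), ..., C(n, n).
std::vector<std::vector<unsigned long long>> pascalTriangle(unsigned rows) {
    std::vector<std::vector<unsigned long long>> triangle;
    for (unsigned n = 0; n < rows; ++n) {
        std::vector<unsigned long long> row;
        for (unsigned r = 0; r <= n; ++r) {
            row.push_back(combination(n, r));
        }
        triangle.push_back(row);
    }
    return triangle;
}
```

Calling pascalTriangle(5) gives the rows 1 / 1 1 / 1 2 1 / 1 3 3 1 / 1 4 6 4 1, which can then be printed with whatever spacing you like.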

Tuesday, March 12, 2013

Setting up FTP server vsftpd in Fedora 16


The File Transfer Protocol (FTP) can be used to transfer files between different systems. In this article, I set up the secure FTP server vsftpd on my Fedora machine and access the files on this server from my friend Niroj Karki’s client computer. To set up the file server on Fedora, you first need to install the FTP server using the following commands

$su
$password:
$yum -y install vsftpd.x86_64

Open the file /etc/vsftpd/vsftpd.conf and uncomment/add the following line

Setting up SSH server in Fedora 16 : Theory and Configuration Details

rlogin and ssh are both used to log in to a remote machine and access the resources available on it. The key difference between them is security. In rlogin, all information, including passwords, is transmitted unencrypted (making it vulnerable to interception). So nowadays ssh (secure shell) is used most often.

The original Berkeley package which provides rlogin also features rcp (remote-copy, allowing files to be copied over the network) and rsh (remote-shell, allowing commands to be run on a remote machine without the user logging into it). These share the hosts.equiv and .rhosts access-control scheme (although they connect to a different daemon, rshd), and as such suffer from the same security problems. The ssh suite contains suitable replacements for both: scp replaces rcp, and ssh itself replaces both rlogin and rsh.


Steps needed to configure SSH server in Fedora 16

Monday, March 4, 2013

Sobel and Prewitt edge detector in C++: Image Processing

Sobel and Prewitt are used extensively for detecting edges in image processing.

Sobel Operator

The operator calculates the gradient of the image intensity at each point, giving the direction of the largest possible increase from light to dark and the rate of change in that direction. The result therefore shows how "abruptly" or "smoothly" the image changes at that point, and therefore how likely it is that part of the image represents an edge, as well as how that edge is likely to be oriented. The Sobel kernels are given by

Here the kernel hx is sensitive to changes in the x direction, i.e., edges that

Saturday, March 2, 2013

Gaussian blurring using separable kernel in C++

Gaussian blurring is used to reduce the noise and detail in an image. It reduces the image's high frequency components, and thus it is a type of low pass filter. Gaussian blurring is obtained by convolving the image with a Gaussian function. For more information about the Gaussian function see the Wikipedia page. Since the 2D Gaussian function can be obtained by multiplying two 1D Gaussian functions, the blurring can be done using a separable kernel.

Use of Separable Kernel

Thursday, February 28, 2013

Loading large data into Oracle database using SQL* Loader

SQL*Loader is very useful when loading large amounts of data into a database, which is otherwise a very tedious and time consuming task. Suppose you have 10000 rows in a .csv file and you have to load all the rows into the database. Now what would you do? Would you enter each and every row individually and spend years doing it? Obviously not. For this task there is a very useful tool, SQL*Loader, which loads a large data file into the database in a very short time. Here I will give a small example of loading data from a .csv file into an Oracle database located on a remote host (the same steps can be used to load into a local machine).

Wednesday, February 27, 2013

C/C++ program to determine the day of a week

This tutorial is about finding the day of the week for a given date. For example, if I want to know the day corresponding to the date 1991 March 10, the program returns "Sunday". Various algorithms have been proposed for this. The below program is based on and valid for

Source Code
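
As one possible approach, here is a minimal sketch using Sakamoto's method for the Gregorian calendar. This is an assumed choice of algorithm and not necessarily the one the program in the full post is based on; the function name dayOfWeek is also hypothetical.

```cpp
#include <cassert>
#include <string>

// Day of the week for a Gregorian calendar date, using Sakamoto's method.
// month is 1..12 and day is 1..31; the date itself is not validated further.
std::string dayOfWeek(int year, int month, int day) {
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    static const int offset[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    static const char* const names[] = {"Sunday", "Monday", "Tuesday",
                                        "Wednesday", "Thursday", "Friday",
                                        "Saturday"};
    // January and February are counted as part of the previous year so that
    // the leap day falls at the end of the cycle.
    if (month < 3) {
        year -= 1;
    }
    const int index =
        (year + year / 4 - year / 100 + year / 400 + offset[month - 1] + day) % 7;
    return names[index];
}
```

With this sketch, dayOfWeek(1991, 3, 10) returns "Sunday", matching the example above.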

Tuesday, February 19, 2013

Gaussian Filter generation using C/C++

Gaussian filtering is used extensively in image processing to reduce the noise in an image. In this article I will generate the 2D Gaussian kernel that follows the Gaussian distribution, which is given by
G(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}
where σ is the standard deviation of the distribution, x is the distance from the origin along the horizontal axis, and y is the distance from the origin along the vertical axis. The mean is assumed to be at the origin O(0,0). The pictorial view of Gaussian Distribution for σ= 0 and mean at origin is
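
A minimal C++ sketch of generating such a kernel is shown below. It samples the standard 2D Gaussian exp(-(x^2 + y^2) / (2σ^2)) on an integer grid centred at the origin and then normalizes the weights to sum to 1, so the constant factor in front of the exponential is not needed. The function name gaussianKernel and the radius parameter are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Builds a (2*radius + 1) x (2*radius + 1) Gaussian kernel with the mean
// at the centre cell. The weights are normalized so that they sum to 1.
std::vector<std::vector<double>> gaussianKernel(int radius, double sigma) {
    assert(radius >= 0 && sigma > 0.0);
    const int size = 2 * radius + 1;
    std::vector<std::vector<double>> kernel(size, std::vector<double>(size));
    double sum = 0.0;
    for (int y = -radius; y <= radius; ++y) {
        for (int x = -radius; x <= radius; ++x) {
            const double w = std::exp(-(x * x + y * y) / (2.0 * sigma * sigma));
            kernel[y + radius][x + radius] = w;
            sum += w;
        }
    }
    // Normalize so that filtering does not change the overall brightness.
    for (auto& row : kernel) {
        for (double& w : row) {
            w /= sum;
        }
    }
    return kernel;
}
```

For example, gaussianKernel(2, 1.0) gives a 5x5 kernel whose largest weight is at the centre and whose weights fall off symmetrically towards the corners.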

Thursday, February 14, 2013

Median Filter using C++ and OpenCV: Image Processing

Basic Theory

The median filter also reduces the noise in an image like a low pass filter, but it is better than a low pass filter in the sense that it preserves edges and other details. The process of calculating the intensity of the central pixel is the same as in low pass filtering, except that instead of averaging all the neighbours, we sort the window and replace the central pixel with the median of the sorted window. For example, suppose we have a window like this
Now we sort the given window and get the sorted array [1 1 2 2 3 3 5 6 7]. The median of this array is the middle (5th) element, i.e. 3. Now we replace the central element 7 with 3. That's it.
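
The median step for a single 3x3 window can be sketched in C++ as below. The function name medianOf3x3 is an assumption; std::nth_element is used instead of a full sort because only the middle element is needed.

```cpp
#include <algorithm>
#include <array>

// Returns the median of the nine intensities in a 3x3 window.
// The window is taken by value, so the caller's copy is left untouched.
int medianOf3x3(std::array<int, 9> window) {
    // Partially orders the array so that the element at index 4 is the one
    // that would be there if the whole window were sorted.
    std::nth_element(window.begin(), window.begin() + 4, window.end());
    return window[4];
}
```

For a window holding the values 1 1 2 2 3 3 5 6 7 in any arrangement, the function returns 3, which then replaces the central pixel.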

Source Code

Saturday, February 2, 2013

Calculating a convolution of an Image with C++: Image Processing

In convolution, the calculation performed at a pixel is a weighted sum of grey levels from a neighbourhood surrounding the pixel. Grey levels taken from the neighbourhood are weighted by coefficients that come from a matrix, the convolution kernel. The kernel's dimensions define the size of the neighbourhood in which the calculation takes place. The most common dimension is 3x3, and I am using this size of matrix in this article. During convolution, we take each kernel coefficient in turn and multiply it by a value from the neighbourhood of the image lying under the kernel. We apply the kernel to the image in such a way that the value at the top-left corner of the kernel is multiplied by the value at the bottom-right corner of the neighbourhood. This can be expressed by the following mathematical expression for a kernel of size mxn.

Wednesday, January 30, 2013

Low pass filters (blurring) in Image Processing using C++

Theory

Low pass filtering, also called "blurring" or "smoothing", is a very basic filtering operation in image processing. The simplest low-pass filter just calculates the average of a pixel and all of its eight immediate neighbours. The result replaces the original value of the pixel. The process is repeated for every pixel in the image. An example of a kernel used for a simple low pass filter is
In this kernel, pixel values from the neighbourhood are summed without being weighted, and the sum is divided by the number of pixels in the neighbourhood. Here is the C++ code for the low pass filtering operation using the above kernel (averaging operation). Here I implement the convolution operation.

Source Code : C++ 

Saturday, January 12, 2013

Histogram equalization using C++: Image Processing

Theory

Histogram equalization is an approach to enhancing a given image. The approach is to design a transformation T such that the gray values in the output are uniformly distributed in [0, 1].

Algorithm

Compute a scaling factor, α= 255 / number of pixels
Calculate histogram of the image
Create a look up table LUT with
    LUT[0] =  α * histogram[0]
for all remaining grey levels, i, do
    LUT[i] = LUT[i-1] + α * histogram[i]
end for
for all pixel coordinates, x and  y, do
    g(x, y) = LUT[f(x, y)]
end for
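
The algorithm above can be sketched in C++ without any image library by treating the image as a flat array of 8-bit pixels. The function name equalize, the flat-vector representation and the round-to-nearest step are assumptions for illustration; the full post's code may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Histogram equalization of an 8-bit grayscale image stored as a flat array.
std::vector<std::uint8_t> equalize(const std::vector<std::uint8_t>& image) {
    if (image.empty()) {
        return {};
    }
    // Scaling factor: alpha = 255 / number of pixels.
    const double alpha = 255.0 / static_cast<double>(image.size());

    // Histogram of the input image.
    std::array<std::size_t, 256> histogram{};
    for (std::uint8_t p : image) {
        ++histogram[p];
    }

    // Look-up table: LUT[i] = alpha * (number of pixels with level <= i),
    // which is the same as LUT[i] = LUT[i-1] + alpha * histogram[i].
    std::array<std::uint8_t, 256> lut{};
    std::size_t cumulative = 0;
    for (std::size_t i = 0; i < 256; ++i) {
        cumulative += histogram[i];
        lut[i] = static_cast<std::uint8_t>(alpha * cumulative + 0.5);
    }

    // Map every pixel through the table: g(x, y) = LUT[f(x, y)].
    std::vector<std::uint8_t> output(image.size());
    for (std::size_t i = 0; i < image.size(); ++i) {
        output[i] = lut[image[i]];
    }
    return output;
}
```

For the four-pixel image {0, 1, 2, 3}, the dark, tightly packed levels are spread over the full range and become {64, 128, 191, 255}.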

Source Code : C++

Thursday, January 10, 2013

Intensity Histogram using C++ and OpenCV: Image Processing

Theory
The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function h(rk) = nk, where rk is the kth gray level and nk is the number of pixels in the image having gray level rk.
For an 8-bit grayscale image there are 256 different possible intensities, and so the histogram will graphically display 256 numbers showing the distribution of pixels among those grayscale values.

Algorithm

1. Assign zero to all elements of the array hf;
2. For every pixel (x, y) of the image f, increment hf[f(x, y)] by 1.

Tuesday, January 8, 2013

Contrast Stretching using C++ and OpenCV: Image Processing

Theory

Contrast stretching is one of the piecewise linear functions. It increases the dynamic range of the grey levels in the image being processed.
Points (r1, s1) and (r2, s2) control the shape of the transformation. The selection of control points depends upon the type of image and varies from one image to another. If r1 = s1 and r2 = s2 then the transformation is linear and doesn't affect the image. Otherwise we can calculate the intensity of the output pixel, given that the intensity of the input pixel is x, as follows
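
One common way to write the three segments is sketched below in C++: a line from (0, 0) to (r1, s1), a line from (r1, s1) to (r2, s2), and a line from (r2, s2) to (255, 255). This particular form, the function name stretch and the maxLevel parameter are assumptions for illustration and may not match the expression in the full post term for term.

```cpp
#include <cassert>

// Piecewise linear contrast stretching of a single intensity x.
// Assumes 0 < r1 < r2 < maxLevel so that no segment has zero width.
double stretch(double x, double r1, double s1, double r2, double s2,
               double maxLevel = 255.0) {
    assert(0.0 < r1 && r1 < r2 && r2 < maxLevel);
    if (x < r1) {
        // First segment: (0, 0) to (r1, s1).
        return (s1 / r1) * x;
    }
    if (x <= r2) {
        // Middle segment: (r1, s1) to (r2, s2).
        return s1 + (s2 - s1) / (r2 - r1) * (x - r1);
    }
    // Last segment: (r2, s2) to (maxLevel, maxLevel).
    return s2 + (maxLevel - s2) / (maxLevel - r2) * (x - r2);
}
```

With control points (64, 32) and (192, 224), the mid-range is stretched: inputs 64 and 192 map to 32 and 224, so that band of width 128 now covers 192 grey levels. With r1 = s1 and r2 = s2 every segment has slope 1 and the pixel is left unchanged.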

Wednesday, January 2, 2013

Producing negative of a grayscale image: C++ and OpenCV

Theory
The concept behind the negative of a grayscale image is very simple: just subtract each intensity level of the image from 255. The negative transformation is given by the function
                         s = L - 1 - r
where s is the pixel value after transformation, r is the pixel value before transformation, and L is the number of intensity levels (in our case 256, so the maximum intensity L - 1 is 255).

Program
The program is written in C++ using the OpenCV library in the Qt IDE. If you are using the Qt IDE, then add the following lines to the .pro file.

Tuesday, January 1, 2013

[Solved] Error: Cannot find -lGL / Cannot find existing library

While I was running an OpenCV project in Qt, I came across a weird error. It was like "cannot find -lGL". I was working on a Fedora 16 x86_64 system.
Later I learned that the error was due to the Shared Object (SO) name, also called the soname, and its versioning.

When linking, the link editor (ld) can take a library as a parameter via a link-library parameter (-lGL). When it goes to look for the library, it simply searches for a file name formed by replacing -l with lib and appending .so to the end of the link-library parameter. So the parameter becomes libGL.so. However, it may not find exactly libGL.so and may find only versioned files like libGL.so.1, libGL.so.1.1 etc. This was the actual reason for my problem. The linker was looking for libGL.so but I only had libGL.so.1.