# introduction of Regression Analysis

November 24, 2017

Extracting patterns and models of interest from large databases is attracting much attention in a variety of disciplines. Knowledge discovery in databases (KDD) and data mining are areas of common interest to researchers in machine learning, pattern recognition, statistics, artificial intelligence, and high performance computing.

### Regression General Overview

Regression is a data mining function that predicts a number. Profit, sales, mortgage rates, house values, square footage, temperature, or distance could all be predicted using regression techniques. For example, a regression model could be used to predict the value of a data warehouse based on web-marketing, number of data entries, size, and other factors. A regression task begins with a data set in which the target values are known. For example, a regression model that predicts data warehouse values could be developed based on observed data for many data warehouses over a period of time. In addition to the value, the data might track the age of the data warehouse, size and number of clusters and so on. Data warehouse value would be the target, the other attributes would be the predictors, and the data for each data warehouse would constitute a case. In the Regression is a data mining function that predicts a number. Profit, sales, mortgage rates, house values, square footage, temperature, or distance could all be predicted using regression techniques.

For example, a regression model could be used to predict the value of a data warehouse based on web-marketing, number of data entries, size, and other factors. A regression task begins with a data set in which the target values are known. For example, a regression model that predicts data warehouse values could be developed based on observed data for many data warehouses over a period of time. In addition to the value, the data might track the age of the data warehouse, size and number of clusters and so on. Data warehouse value would be the target, the other attributes would be the predictors, and the data for each data warehouse would constitute a case.

### Regression Analysis Definition

Regression is a data mining (machine learning) technique used to fit an equation to a dataset. The simplest form of regression, linear regression, uses the formula of a straight line  and determines the appropriate values for m and b to predict the value of y based upon a given value of. Basically a Linear regression models are used to show or predict the relationship between two variables or factors. The factor that is being predicted (the factor that the equation solves for) is called the dependent variable. The factors that are used to predict the value of the dependent variable are called the independent variable.

### Regression Analysis Example

In statistics, it’s hard to stare at a set of random numbers in a table and try to make any sense of it. For example, global warming may be reducing average snowfall in your town and you are asked to predict how much snow you think will fall this year. Looking at the following table you might guess somewhere around 10-20 inches. That’s a good guess, but you could make a better guess, by using regression.

 Year Amount (Inches) 2000 40 2001 39 2002 41 2003 29 2004 32 2005 30 2006 33 2007 15 2008 10 2009 11 2010 20 2011 24 2012 10 2013 15 2014 26 2015 32 2016 19

Essentially, regression is the “best guess” at using a set of data to make some kind of prediction. It’s fitting a set of points to a graph. There’s a whole host of tools that can run regression, including graph

Figure 1: Example of Regression

Just by looking at the regression line running down through the data, you can fine tune your best guess a bit.

### Need of Regression Technique

There are various reasons for using regression technique in data mining. Some of these are listed below:

• A regression task begins with a data set in which the target values are known. For example, a regression model that predicts children’s height could be developed based on observed data for many children over a period of time. The data might track age, height, weight, developmental milestones, family history, and so on. Height would be the target, the other attributes would be the predictors, and the data for each child would constitute a case.
• In the model build (training) process, a regression algorithm estimates the value of the target as a function of the predictors for each case in the build data. These relationships between predictors and target are summarized in a model, which can then be applied to a different data set in which the target values are unknown.
• Regression models are tested by computing various statistics that measure the difference between the predicted values and the expected values.

References

[1] Nanhay Singh, Ram Shringar Raw And Chauhan R.K., “Data Mining With Regression Technique”, Journal of Information Systems and Communication, Volume 3, Issue 1, 2012, pp.-199-202

[2] “Regression Analysis: Step by Step Articles, Videos, and Simple Definitions”, available online at: http://www.statisticshowto.com/probability-and-statistics/regression-analysis/

[3] Swati Gupta, “A Regression Modeling Technique on Data Mining”, International Journal of Computer Applications (IJCA), Volume 116 – No. 9, April 2015

$${}$$