# What is Knowledge Discovery in Databases?

January 4, 2018

The desire and need for information have led to the development of systems and equipment that can generate and collect massive amounts of data. Knowledge Discovery in Databases (KDD) is an interdisciplinary area focused on methodologies for extracting useful knowledge from data. The ongoing rapid growth of online data due to the Internet, together with the widespread use of databases, has created an immense need for KDD methodologies. The challenge of extracting knowledge from data draws upon research in statistics, databases, pattern recognition, machine learning, data visualization, optimization, and high-performance computing to deliver advanced business intelligence and web discovery solutions.

### What is Knowledge Discovery in Databases (KDD)?

Knowledge Discovery in Databases is the process of searching for hidden knowledge in the massive amounts of data that we are technically capable of generating and storing. Data in its raw form is simply a collection of elements from which little knowledge can be gleaned. Data discovery techniques significantly increase the value of that data.

In the real world, huge amounts of data are available in education, medicine, industry, and many other areas. Such data can provide knowledge and information for decision making. For example, university records can be used to identify students who drop out, and a shopping database can be analyzed for sales data. Data can be analyzed and summarized in order to understand and meet such challenges. Data mining is a powerful approach to data analysis: it is the process of discovering interesting patterns in huge amounts of data stored in sources such as databases, data warehouses, the World Wide Web, and external sources. An interesting pattern is one that is easy to understand, previously unknown, valid, and potentially useful. In short, data mining sifts through large databases to extract hidden patterns.

The goals of data mining include fast retrieval of data or information, knowledge discovery from databases, identification of hidden and previously unexplored patterns, reduced complexity, and time savings. Data mining refers to extracting, or mining, knowledge from large amounts of data. The term is sometimes treated as a synonym for Knowledge Discovery in Databases (KDD). KDD is an iterative process consisting of the steps shown in Figure 1.

Figure 1: Stages of the Knowledge Discovery Process

Knowledge Discovery in Databases is the process of retrieving high-level knowledge from low-level data. It is an iterative process comprising the following steps: selection of data, pre-processing of the selected data, transformation of the data into an appropriate form, data mining to extract the necessary information, and interpretation/evaluation of the mined patterns.

The selection step collects heterogeneous data from varied sources for processing. Real-life data, such as medical data, may be incomplete, complex, noisy, inconsistent, and/or irrelevant, so a selection process is needed to gather the important data from which knowledge is to be extracted.
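As a rough illustration of the selection step, the sketch below gathers records from two differently shaped sources and keeps only the attributes judged relevant. The two sources, all field names, and the `select` helper are hypothetical and were invented for this example.

```python
import csv
import io

# Hypothetical source 1: a CSV export (field names are assumptions).
CSV_SOURCE = """patient_id,age,blood_pressure,favorite_color
1,54,130,blue
2,61,145,green
"""

# Hypothetical source 2: records already held as dictionaries,
# using different key names for the same attributes.
DICT_SOURCE = [
    {"id": 3, "age_years": 47, "bp": 120, "notes": "n/a"},
]

# Attributes judged relevant for the (hypothetical) analysis.
RELEVANT = ("patient_id", "age", "blood_pressure")


def select(csv_text, dict_rows):
    """Merge both sources into one list of records with a common schema."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the relevant columns; CSV values arrive as strings.
        selected.append({key: int(row[key]) for key in RELEVANT})
    for row in dict_rows:
        # Map the second source's key names onto the common schema.
        selected.append({
            "patient_id": row["id"],
            "age": row["age_years"],
            "blood_pressure": row["bp"],
        })
    return selected


records = select(CSV_SOURCE, DICT_SOURCE)
print(records)
```

Irrelevant attributes (`favorite_color`, `notes`) are dropped here, so the later steps only see the data from which knowledge is to be extracted.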

The pre-processing step performs basic operations: eliminating noisy data, finding missing data or developing a strategy for handling it, detecting or removing outliers, and resolving inconsistencies in the data.
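Two of these operations can be sketched in a few lines. The example below assumes one particular strategy for each: missing values are filled with the median of the observed values, and outliers are dropped when they lie more than `k` median absolute deviations from the median. Both choices, the threshold `k`, and the sample readings are assumptions made for illustration; many other strategies exist.

```python
from statistics import median

# Hypothetical raw readings; None marks a missing value.
raw = [12.0, 11.8, None, 12.3, 95.0, 11.9, None, 12.1]


def fill_missing(values):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers(values, k=3.0):
    """Drop values more than k median absolute deviations from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return list(values)  # no spread to measure against; keep everything
    return [v for v in values if abs(v - center) <= k * mad]


filled = fill_missing(raw)
cleaned = remove_outliers(filled)
print(filled)
print(cleaned)
```

The median is used instead of the mean because the extreme reading (95.0) would otherwise distort the fill value.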

The transformation step converts the data into forms suitable for mining by performing tasks such as aggregation, smoothing, normalization, generalization, and discretization. Data reduction shrinks the data, representing it in less volume while producing similar analytical outcomes. Data mining is a central component of the KDD process.
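Normalization and discretization are the easiest of these tasks to show concretely. The sketch below assumes min-max scaling to the range [0, 1] and equal-width binning, which are only two of several common variants. The function names and the `ages` values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 35, 47, 60, 78]  # hypothetical attribute values
normalized = min_max_normalize(ages)
bins = discretize(ages, n_bins=3)
print(normalized)
print(bins)  # [0, 0, 0, 1, 2, 2]
```

With three bins over the range 18 to 78, each bin is 20 units wide, so 18, 22, and 35 fall into bin 0, 47 into bin 1, and 60 and 78 into bin 2.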

The data mining step includes choosing the data mining algorithm(s) and using them to generate previously unknown and potentially useful information from the data stored in the database. This involves deciding which models, algorithms, and parameters may be suitable, and matching a specific data mining method to the overall criteria of the KDD process. Data mining methods include classification, summarization, clustering, regression, etc. [5].
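To make one of these methods concrete, here is a minimal clustering sketch using k-means (Lloyd's algorithm). The article does not prescribe this algorithm; it is chosen here only as a familiar example. The data points, the choice of `k`, and the initialization (the first `k` points) are assumptions made for illustration.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    # Simplification: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if not members:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
                continue
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
            )
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D data forming two visibly separate groups.
data = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = k_means(data, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Choosing `k` and the initialization is exactly the kind of parameter decision this step involves; a different choice can produce a different grouping.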

The evaluation step includes presenting the mined patterns in an understandable form. Different types of information need different types of representation; in this step the mined patterns are interpreted. The outcomes are evaluated with statistical justification and significance testing.
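As one possible form of significance testing, the sketch below uses a permutation test to ask whether the difference in means between two groups could plausibly arise by chance. The article does not name a specific test, so this choice is an assumption, as are the two groups of scores, the number of permutations, and the fixed random seed.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to two groups of the same sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical scores for two groups found in an earlier mining step.
group_a = [72, 75, 78, 71, 74, 77]
group_b = [85, 88, 84, 90, 86, 89]
p_value = permutation_test(group_a, group_b)
print(p_value)
```

A small p-value suggests that the separation between the groups is unlikely to be an accident of how the records happened to be split, which is one way to give a mined pattern statistical justification.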

“The basic task of KDD is to extract knowledge (or information) from lower level data (databases).” (2) There are several formal definitions of KDD; all agree that the intent is to harvest information by recognizing patterns in raw data. Consider the definition proposed by Fayyad, Piatetsky-Shapiro, and Smyth: “Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.” (3) The goal is to distinguish, from unprocessed data, something that may not be obvious but is valuable or enlightening in its discovery. Extraction of knowledge from raw data is accomplished by applying data mining methods. KDD has a much broader scope: data mining is one step in a multidimensional process.

### References

[1] Han, Jiawei, Jian Pei, and Micheline Kamber, Data mining: concepts and techniques, Elsevier, 2011.

[2] Er. Rimmy Chuchra, “Use of Data Mining Techniques for the Evaluation of Student Performance: A Case Study”, International Journal of Computer Science and Management Research, Volume 1, Issue 3, October 2012.

[3] Maimon, Oded, and Lior Rokach, “Introduction to knowledge discovery and data mining”, In Data Mining and Knowledge Discovery Handbook, pp. 1-15, Springer US, 2009.

[4] “Knowledge Discovery in Databases”, available online at: https://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/KDD3.htm
