K-Means Algorithm with Numerical Explanation

K-means clustering overview
Among the classical data mining techniques, the k-means algorithm is one of the most popular techniques for unsupervised learning, or clustering. It can be used with many kinds of mining: web mining, text mining, or any structured data mining. The algorithm partitions data into K clusters, which is why it is also known as a partition-based clustering approach.

Basic functioning
The technique first selects K initial centroids. These centroids are compared with the remaining data objects in the input data samples, using a distance function such as the Euclidean distance to measure how close each object is to each centroid.

Basics of clustering
Clustering is the grouping of a particular set of objects based on their characteristics, aggregating them according to their similarities. In data mining, this methodology partitions the data using a specific clustering algorithm, the one most suitable for the desired information analysis. When each object either belongs strictly to a cluster or is not part of it at all, the grouping is called hard partitioning. On the other hand, soft partitioning states that every object belongs to a cluster to a determined…
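The comparison step described above, extended with the usual k-means step of recomputing each centroid as the mean of its members, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the post's own implementation: the function names, the seven sample points, and the choice of the first and fourth points as initial centroids are all hypothetical choices for this worked example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: assign points to the nearest centroid, then move centroids to member means."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        # (min() resolves ties in favour of the lower centroid index).
        new_labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster happens to be empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Hypothetical 2-D data set; initial centroids are the first and fourth points.
points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
centroids, labels = kmeans(points, [points[0], points[3]])
print(centroids)  # [[1.25, 1.5], [3.9, 5.1]]
print(labels)     # [0, 0, 1, 1, 1, 1, 1]
```

With these starting centroids the loop needs three assignment passes. In the first pass (3, 4) is exactly equidistant from (1, 1) and (5, 7), so it joins cluster 0 by the tie rule; after the centroids move it switches to cluster 1, and the third pass changes nothing.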

Classification and prediction in data mining

Classification is a supervised learning technique in data mining. It is applied when the data patterns or samples carry predefined class labels. A supervised learning algorithm first builds a data model from the existing labelled patterns, which are known as training samples; building this model is called training the algorithm. After training, the data model is used to recognize similar, newly appearing samples or patterns. Classification is an essential and popular technique in data mining because it is used wherever precise outcomes must be obtained.

Figure 1: Classification

There are two forms of data analysis that can be used to extract models describing important classes or to predict future data trends: classification and prediction. Both help us better understand large amounts of data. Classification predicts categorical (discrete) class labels, whereas prediction models continuous-valued functions. For example, we can build a classification model to categorize bank loan applications as either safe or risky, or a prediction model to predict how much, in dollars, potential customers will spend on computer equipment given their income and occupation…
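To make the categorical versus continuous distinction concrete, here is a minimal sketch in plain Python. The loan and spending figures are invented for illustration, and the two techniques are stand-ins chosen for simplicity, not methods named in the post: a 1-nearest-neighbour classifier for the safe/risky loan label, and an ordinary least-squares line for predicting spending from income.

```python
import math

# Hypothetical training samples: (annual income, existing debt), both in $1000s -> class label
loan_training = [
    ((85, 5), "safe"),
    ((60, 10), "safe"),
    ((30, 40), "risky"),
    ((25, 30), "risky"),
]

def classify_1nn(sample, training):
    """Classification: return the label of the closest training sample (1-nearest neighbour)."""
    _, label = min(training, key=lambda item: math.dist(sample, item[0]))
    return label

# Hypothetical (income, spending on computer equipment) pairs, both in $1000s
spend_training = [(20, 1.0), (40, 2.0), (60, 3.0), (80, 4.0)]

def fit_line(pairs):
    """Prediction: ordinary least-squares fit of y = a + b * x for a continuous target."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
        (x - mean_x) ** 2 for x, _ in pairs
    )
    a = mean_y - b * mean_x
    return a, b

print(classify_1nn((70, 8), loan_training))  # safe
a, b = fit_line(spend_training)
print(a + b * 50)  # predicted spending for an income of 50
```

The classifier can only ever answer with one of the labels it was trained on, while the fitted line returns any real number; that is the practical difference between the two forms of analysis.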

Clustering in Data mining

Clustering is a technique for grouping similar data objects based on their internal similarity. Each data object represents the properties, or features, of an individual data pattern, and these features are used to compute the similarity or difference between data objects. A group of data objects formed in this way is termed a cluster. Cluster analysis involves two main components: the centroid, and the data objects or cluster members. Clustering is the grouping of a particular set of objects based on their characteristics, aggregating them according to their similarities. In data mining, this methodology partitions the data using a specific clustering algorithm, the one most suitable for the desired information analysis. When each object either belongs strictly to a cluster or is not part of it at all, the grouping is called hard partitioning [1]. On the other hand, soft partitioning states that every object belongs to a cluster to a certain degree. More specific divisions are also possible, such as letting objects belong to multiple clusters, forcing an object to participate in only one cluster, or even constructing hierarchical trees of group relationships. There are…
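The difference between hard and soft partitioning can be shown for a single point and a fixed set of centroids. The sketch below is an assumption-laden illustration: for the soft case it borrows the membership rule of fuzzy c-means, one well-known soft clustering method, with the fuzzifier m = 2 chosen arbitrarily; the post itself does not say which soft method it has in mind, and the function names are hypothetical.

```python
import math

def hard_membership(point, centroids):
    """Hard partitioning: membership 1 in the nearest cluster and 0 in every other cluster."""
    nearest = min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))
    return [1.0 if j == nearest else 0.0 for j in range(len(centroids))]

def soft_membership(point, centroids, m=2.0):
    """Soft partitioning via the fuzzy c-means rule: degrees in [0, 1] that sum to 1."""
    dists = [math.dist(point, c) for c in centroids]
    zeros = dists.count(0.0)
    if zeros:  # point lies exactly on a centroid: share full membership among those centroids
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]

centroids = [(1, 1), (5, 5)]
print(hard_membership((2, 2), centroids))  # [1.0, 0.0]
print(soft_membership((2, 2), centroids))  # roughly [0.9, 0.1]
```

The point (2, 2) is three times closer to (1, 1) than to (5, 5), so hard partitioning puts it wholly in the first cluster, while the soft rule keeps a small degree of membership in the second.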

What is web mining

Overview of web mining
The Internet is a large source of data and information, and the data on the web is frequently accessed and changed. Extracting important and useful knowledge from the World Wide Web is an application of data mining techniques.

Figure 1: Categories of web mining

Exploring web data with data mining algorithms in order to discover significant patterns is termed web mining. Information on the web is available directly through page contents and links, or indirectly through access logs and other kinds of log formats. According to the mining algorithms or techniques applied, web mining can be categorized into three main classes:
Web content mining: this technique is also known as text mining and is generally the second step in web data mining. Content mining is the scanning and mining of the text, pictures, and graphs of a web page to determine the significance of its content.
Web structure mining: one of the three categories of web mining, it is used to recognize the connections between web pages linked by information or by direct links. This organization of data is discoverable by the condition of web structure…
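Web structure mining works on the links between pages rather than their text. A minimal sketch of one such structural measure, counting how many distinct pages link to each page, is shown below; the site map and the helper name are hypothetical, and in-link counting is just one simple example of structure analysis, not the post's method.

```python
from collections import Counter

# Hypothetical site: each page maps to the pages it links to
links = {
    "/home": ["/products", "/about", "/blog"],
    "/products": ["/home", "/contact"],
    "/blog": ["/products", "/home"],
    "/about": ["/home"],
    "/contact": [],
}

def in_link_counts(graph):
    """Count how many distinct pages link to each page (the in-degree of the link graph)."""
    counts = Counter({page: 0 for page in graph})
    for source, targets in graph.items():
        for target in set(targets):  # ignore duplicate links from the same page
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts

print(in_link_counts(links).most_common(2))  # [('/home', 3), ('/products', 2)]
```

Pages with many incoming links are candidates for the most important or authoritative pages on the site; link-analysis methods such as PageRank refine this idea by also weighting each link by the importance of the page it comes from.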

What is text mining

When data mining techniques and algorithms are applied to unstructured sources of digital documents, such as text files or web documents, the process is known as text mining. In web mining, content mining uses text mining techniques to find text-based patterns in web documents. Text mining approaches have traditionally required considerable effort to find specific patterns in data. The first text mining approaches were introduced in the mid-1980s, and technological progress has since continuously addressed their earlier limitations. Text mining is a domain with a wide range of applications in information retrieval, machine learning, data mining, statistics, and computational semantics. Among the different definitions of text mining, it can be: “This kind of system able to gain information across languages and also capable to group similar data from different kind of language sources according to their original semantics.” A Business Intelligence System in which text mining of unstructured data is addressed as the major issue is described as a system that will: “Utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the ‘action points’ in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a…
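A common first step in finding text-based patterns is turning each document into term counts (a bag of words). The sketch below assumes a simple lowercase, letters-only tokenizer and a tiny hand-picked stop-word list; both documents and the stop-word list are hypothetical, and real text mining pipelines usually add stemming, larger stop lists, and weighting such as TF-IDF.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard list)
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}

def term_frequencies(text):
    """Lowercase the text, split it into word tokens, drop stop words, and count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

# Hypothetical documents
docs = [
    "Text mining finds patterns in text documents.",
    "Web documents are a source of text for mining.",
]
tf = [term_frequencies(d) for d in docs]
common = set(tf[0]) & set(tf[1])  # terms shared by both documents

print(tf[0].most_common(1))  # [('text', 2)]
print(sorted(common))        # ['documents', 'mining', 'text']
```

Shared terms like these are the raw material for grouping similar documents, for example by feeding the count vectors into a clustering algorithm such as k-means.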

What is data mining

Data mining is a technique for exploring and analysing data using computational algorithms. The analysis reveals similar or dissimilar patterns, which are used to design and develop applications such as recognition and decision making. Data mining supports various kinds of data modeling, such as classification, prediction, association, and cluster analysis. Which mining techniques are used depends on the application. Data mining algorithms consume the data samples supplied for mining. In the real world, huge amounts of data are available from various domains such as education and medicine. This data can be used to extract knowledge and information for making decisions and recognizing similar patterns. For example, we can find monthly sales patterns in a shopping database. Data can be analysed, summarized, and visualized to understand it and to meet challenges [1]. The goals of data mining include fast retrieval of data, knowledge discovery, identification of hidden patterns, and reduction of complexity [2]. Data mining is treated as knowledge discovery in databases (the KDD process). KDD is an iterative process that includes the following steps.

Figure 1: Data mining process

Types of Data…
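The shopping example above, finding sales patterns per month, can be sketched as a simple grouping and summarizing step. The transaction records and the function name are hypothetical; the sketch only totals units per month and picks the best-selling item, which is a small instance of the "analysed, summarized" stage rather than a full KDD pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical transactions: (date as YYYY-MM-DD, item, quantity)
sales = [
    ("2023-01-05", "milk", 3),
    ("2023-01-17", "bread", 2),
    ("2023-01-20", "milk", 1),
    ("2023-02-02", "eggs", 4),
    ("2023-02-14", "bread", 5),
]

def monthly_patterns(records):
    """Summarize records per month as (total units sold, best-selling item)."""
    per_month = defaultdict(Counter)
    for date, item, qty in records:
        per_month[date[:7]][item] += qty  # the "YYYY-MM" prefix is the month key
    return {
        month: (sum(items.values()), items.most_common(1)[0][0])
        for month, items in sorted(per_month.items())
    }

print(monthly_patterns(sales))  # {'2023-01': (6, 'milk'), '2023-02': (9, 'bread')}
```

On a real shopping database the same grouping would normally be done with a database query or a dataframe library, but the pattern (group by month, aggregate, rank) is the same.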
