What is text mining

August 10, 2017 Author: munishmishra04_3od47tgp

Text mining is the application of data mining techniques and algorithms to unstructured digital documents such as text files and web documents. In web mining, content mining uses text mining techniques to find text-based patterns in web documents. Text mining approaches generally require considerable effort to find a specific pattern in the data. The first text mining approaches were introduced in the mid-1980s, and technological progress has steadily addressed their early limitations. Text mining has a wide range of applications in information retrieval, machine learning, data mining, statistics, and computational semantics. Definitions of text mining vary. One describes it as follows:

“This kind of system able to gain information across languages and also capable to group similar data from different kind of language sources according to their original semantics.”

Another definition comes from the description of a Business Intelligence System in which text mining of unstructured data is addressed as the major issue. It describes a system that will:

“Utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the ‘action points’ in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points”.
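The routing idea in this description can be illustrated with a minimal sketch. It assumes that a "word pattern" is simply the set of distinct words in a document and that an interest profile is a set of keywords. The profile contents and the function names `word_pattern` and `route` are hypothetical and are not taken from the quoted system.

```python
import re

# Hypothetical interest profiles for two "action points" (here, departments).
PROFILES = {
    "sales": {"order", "customer", "price", "invoice"},
    "engineering": {"defect", "design", "test", "prototype"},
}


def word_pattern(document):
    """Characterize a document by its set of distinct lowercase words."""
    return set(re.findall(r"[a-z]+", document.lower()))


def route(document, profiles=PROFILES):
    """Return the action points whose profile shares a word with the document,
    ordered from the largest overlap to the smallest."""
    pattern = word_pattern(document)
    scores = {name: len(pattern & words) for name, words in profiles.items()}
    matched = [name for name, score in scores.items() if score > 0]
    return sorted(matched, key=lambda name: scores[name], reverse=True)


print(route("Customer asked about the price of the prototype"))
```

Counting shared words is only one possible matching rule. A real system would likely weight terms and normalize for document length.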

Text mining is a form of “text analytics” that provides a way to make qualitative, or “unstructured”, data usable by a computer. Qualitative data is descriptive data that cannot be measured in numbers; it often includes qualities of appearance such as color and texture, as well as textual descriptions. Quantitative data is numerical, structured data that can be measured. However, the qualitative and quantitative categories are often confused [1]. Text mining is also known as data mining for unstructured textual databases: it refers to the process of extracting interesting patterns or knowledge from text in any format. Roughly equivalent to text analytics, text mining derives high-quality information from text, typically by identifying patterns and trends through means such as statistical pattern learning.

Text Mining Process

The text mining process starts with pre-processing, which transforms raw input text into structured information. The input to pre-processing is unstructured or semi-structured data, such as HTML pages. The data submitted to this step is cleaned, and useful features are extracted. The output is stored in a database or another structured format [2]. Text mining techniques are then applied to this structured form of the data [3]. Different data mining algorithms can be applied to it to model the text for applications such as information retrieval systems.
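As a rough illustration of the pre-processing step, the sketch below takes a semi-structured HTML fragment, strips the markup, removes a few common words, and returns term counts as a structured record. The stop-word list and the function names `strip_html` and `preprocess` are assumptions made for this example; they do not come from the cited papers, and real pipelines usually add steps such as stemming.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list for illustration; real systems use larger lists.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "from"}


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Return the visible text of a semi-structured HTML string."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


def preprocess(raw):
    """Clean raw text and return a structured record mapping terms to counts."""
    text = strip_html(raw).lower()
    tokens = re.findall(r"[a-z]+", text)          # keep alphabetic tokens only
    terms = [t for t in tokens if t not in STOP_WORDS]
    return dict(Counter(terms))


record = preprocess("<p>Text mining extracts patterns from <b>text</b>.</p>")
print(record)
```

The resulting dictionary could be stored as one row per term in a database table, which is the kind of structured form that later mining algorithms operate on.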

Structured data is data that resides in a fixed field within a record or file, such as data held in relational databases and spreadsheets. Unstructured data is the opposite: information that does not reside in a traditional row-column database. Figure 1 depicts the steps for processing text data.

Figure 1: Text mining process

Text Mining Applications

Some applications of text mining techniques are [4] [5]:

Software applications: Major firms are researching and developing text mining to further automate mining and analysis processes.

Security applications: Many text mining packages are marketed for security applications, especially the monitoring and analysis of text sources such as news and blogs.

Online media applications: Large media companies use text mining to clarify information and to give readers a better experience.

Marketing applications: Text mining is starting to be used in marketing, specifically in analytical customer relationship management. Surveys and marketing campaigns are carried out with the help of many text mining tools.

Academic applications: Text mining is important to publishers who hold large databases of information that need indexing for retrieval. This is especially true in scientific disciplines, in which highly specific information is often contained within written text.

Sentiment analysis: Sentiment analysis may involve analyzing reviews to estimate how favorable a review is toward a product or service.
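To make the sentiment analysis item concrete, here is a minimal lexicon-based sketch that estimates how favorable a review is. The word lists and the function name `favorability` are hypothetical, chosen only for illustration. The approach ignores negation and context, so it should be read as one simple way to score reviews, not as the method used by any particular tool.

```python
import re

# Hypothetical, tiny sentiment lexicon for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def favorability(review):
    """Score a review between -1.0 (unfavorable) and 1.0 (favorable).

    The score is (positive - negative) / (positive + negative) over the
    opinion words found; a review with no opinion words scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", review.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


print(favorability("Great phone, excellent battery, but poor camera."))
```

In this example the review contains two positive words and one negative word, so the score is positive but below 1.0.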


[1] Latika Kaushik, “Text Mining – Scope and Applications”, Journal of Computer Science and Applications, Vol. 5, No. 2, 2013, pp. 51-55.

[2] A. H. Tan, “Text Mining: The State of the Art and the Challenges”, in PAKDD99 Workshop on Knowledge Discovery from Advanced Databases, Beijing, China, April 1999.

[3] U. Y. Nahm and R. J. Mooney, “Using Information Extraction to Aid the Discovery of Prediction Rules from Text”, in KDD2000 Workshop on Text Mining, Boston, Massachusetts, USA, August 2000.

[4] S. Vijayarani, J. Ilamathi and Nithya, “Preprocessing Techniques for Text Mining – An Overview”, International Journal of Computer Science & Communication Networks, Vol. 5(1), pp. 7-16.

[5] Vishal Gupta, “A Survey of Text Mining Techniques and Applications”, Journal of Emerging Technologies in Web Intelligence, Vol. 1, No. 1, August 2009.

[6] http://litablog.org/2015/11/brave-new-workplace-text-mining/ (image)
