Privacy Preserving Data Mining

October 22, 2017 Author: virendra

Privacy is a matter of individual perception, so an infallible, universal solution to the tension between sharing data and protecting it is infeasible. In distributed computation, the common notion of privacy limits the information leaked by the computation to what can be learned from its designated output. The current state-of-the-art paradigm for privacy-preserving data mining is differential privacy, which lets untrusted parties access private data only through aggregate queries.
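
As a minimal sketch of how an aggregate query can be answered under differential privacy, the Python below adds Laplace noise to a counting query. The choice of the Laplace mechanism, the function name private_count, the sample records and the epsilon value are all illustrative assumptions rather than anything prescribed by this article.

```python
import random


def private_count(records, predicate, epsilon, rng):
    """Answer "how many records satisfy predicate?" with Laplace noise.

    Assumption: a counting query has sensitivity 1, because adding or
    removing one person's record changes the true count by at most 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two independent Exponential(rate=epsilon) draws
    # follows a Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical data: the analyst never sees these rows, only noisy answers.
records = [{"age": age} for age in (23, 35, 41, 58, 62)]
rng = random.Random()
noisy = private_count(records, lambda r: r["age"] >= 40, epsilon=1.0, rng=rng)
print(f"noisy count of people aged 40 or over: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon gives more accurate answers at the cost of weaker protection.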

Privacy Preserving Data Mining: Overview

Data anonymization transforms data so that the individuals it describes can no longer be readily identified. In recent years, data anonymization techniques for privacy-preserving publishing of micro-data have received a lot of attention. Micro-data contains information about an individual entity, such as a person, a household or an organization. The attributes in each record fall into three categories: i) Identifiers, which uniquely identify an individual, such as Name or Social Security Number; ii) Sensitive Attributes (SAs), such as disease and salary; and iii) Quasi-Identifiers (QIs), such as zip code, age, and sex, which may also appear in publicly available databases and whose values, when taken together, can potentially identify an individual. Data anonymization enables the transfer of information across a boundary, such as between two departments within an agency or between two agencies, while reducing the risk of unintended disclosure.
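
The three attribute categories can be made concrete with a small sketch. The Python below drops the identifiers, coarsens the quasi-identifiers, and then measures the smallest group of records that share the same quasi-identifier values (the k of k-anonymity). The k-anonymity check, the field names, the generalization rules and the sample records are assumptions chosen for illustration; this is one common approach, not the only one.

```python
from collections import Counter

IDENTIFIERS = {"name", "ssn"}              # removed outright
QUASI_IDENTIFIERS = ("zip", "age", "sex")  # coarsened before release
# "disease" plays the role of the sensitive attribute and is kept as-is.


def generalize(record):
    """Drop direct identifiers and coarsen zip code and age."""
    out = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    out["zip"] = record["zip"][:3] + "**"        # keep only the zip prefix
    decade = (record["age"] // 10) * 10
    out["age"] = f"{decade}-{decade + 9}"        # replace age by a 10-year band
    return out


def k_anonymity(records):
    """Size of the smallest group sharing identical quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)
    return min(groups.values())


# Hypothetical micro-data records.
raw = [
    {"name": "A", "ssn": "000-00-0001", "zip": "47677", "age": 29, "sex": "F", "disease": "flu"},
    {"name": "B", "ssn": "000-00-0002", "zip": "47602", "age": 22, "sex": "F", "disease": "asthma"},
    {"name": "C", "ssn": "000-00-0003", "zip": "47678", "age": 27, "sex": "F", "disease": "flu"},
    {"name": "D", "ssn": "000-00-0004", "zip": "47905", "age": 43, "sex": "M", "disease": "diabetes"},
    {"name": "E", "ssn": "000-00-0005", "zip": "47909", "age": 47, "sex": "M", "disease": "flu"},
]

released = [generalize(r) for r in raw]
print("k before generalization:", k_anonymity(raw))
print("k after generalization:", k_anonymity(released))
```

In the raw data every record is unique on its quasi-identifiers; after generalization each record shares its quasi-identifier values with at least one other record, which makes linking a released row to a single person harder.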

Privacy Preserving Data Mining: Significance

Advances in computer science, including new paradigms such as cloud computing, have created great demand for integrating data held by different database owners. Integrated data enables better analysis, which in turn supports better decisions and higher-quality services. Data mining is the process of extracting hidden predictive information from large databases to support decision making. Extracting these hidden details can, however, reveal sensitive information stored in the data warehouse. To guard against this, a method has been proposed to securely integrate person-specific sensitive data from two data providers so that the integrated data still retains the information essential for data mining tasks. Most real-life scenarios require data sharing and privacy preservation of person-specific sensitive data at the same time. The enhanced-security approach for two-party confidential data adopts differential privacy, which provides a rigorous privacy model and makes no assumption about an adversary’s background knowledge. Existing systems use two-party algorithms for the private release of vertically-partitioned data, where each party holds different attributes of the same individuals, in the semi-honest adversary model, where each party follows the protocol but may try to learn more from the messages it sees.

Figure 1 Privacy Preserving Data Mining

Categories of Privacy Violation

A privacy breach occurs when private and confidential information about a user is disclosed to an adversary. Preserving the privacy of individuals while publishing collected user data is therefore an important research area. Privacy breaches in social networks can be categorized into three types:

  • Identity Disclosure – Identity disclosure occurs when the individual behind a record is exposed. This type of breach reveals information about a user and the relationships he/she shares with other individuals in the network.
  • Sensitive Link Disclosure – Sensitive link disclosure occurs when the association between two individuals is revealed. Users generate this type of information through their social activities when they use social media services.
  • Sensitive Attribute Disclosure – Sensitive attribute disclosure takes place when an attacker obtains the value of a sensitive, confidential user attribute. Sensitive attributes may be associated with an entity or with a link relationship.

All of these privacy breaches pose severe threats such as stalking, blackmail and robbery, and they violate users’ expectation that the service provider will keep their data private. They also damage an individual’s image and reputation. Several accidental disclosures of users’ private information, such as the AOL search data release and the attacks on the Netflix data, have made organizations conservative about releasing network data. Social networks promise their users privacy, so these issues need to be addressed. Data must therefore be released to third parties in a way that ensures the privacy of the users, which means anonymising it before it is released or published.
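
To illustrate why removing names alone does not prevent identity disclosure, the sketch below links a "de-identified" release to a public list using shared attributes. The function name link_records, the attribute names and both datasets are hypothetical and exist only for this example; it is a toy version of a linkage attack, not a reconstruction of the AOL or Netflix incidents.

```python
def link_records(released, public, keys):
    """Return {released_row_index: name} for rows matching exactly one public record."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for i, row in enumerate(released):
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the row
            matches[i] = names[0]
    return matches


# Released data: names removed, but zip, age and sex left untouched.
released = [
    {"zip": "47677", "age": 29, "sex": "F", "disease": "flu"},
    {"zip": "47905", "age": 43, "sex": "M", "disease": "diabetes"},
    {"zip": "47909", "age": 47, "sex": "M", "disease": "asthma"},
]

# Public data an attacker might already hold (for example, a directory).
public = [
    {"name": "Alice", "zip": "47677", "age": 29, "sex": "F"},
    {"name": "Bob", "zip": "47905", "age": 43, "sex": "M"},
    {"name": "Carol", "zip": "47909", "age": 47, "sex": "F"},
    {"name": "Dan", "zip": "47909", "age": 47, "sex": "M"},
    {"name": "Eve", "zip": "47909", "age": 47, "sex": "M"},
]

for row_index, name in link_records(released, public, ("zip", "age", "sex")).items():
    print(f"{name} is linked to released row {row_index}: {released[row_index]['disease']}")
```

The first two released rows each match a single person, so both the identity and the sensitive attribute are disclosed. The third row matches two people and stays ambiguous, which is the effect anonymisation tries to achieve for every row.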


[1] P. Samarati, “Protecting Respondents’ Identities in Microdata Release”, IEEE Transactions on Knowledge and Data Engineering, 13(6):1010–1027, 2001.

[2] G. Sumathi and M. Indumathi, “Enhanced Security for Two Party Confidential Data”, International Journal of Engineering Sciences & Research Technology, Volume 3, Issue 5, May 2014.

[3] J. Liu, J. Luo and J. Z. Huang, “Rating: Privacy Preservation for Multiple Attributes with Different Sensitivity Requirements”, in Proceedings of the 11th IEEE International Conference on Data Mining Workshops, IEEE, 2011.


