# Community Detection : Unsupervised Learning

November 9, 2017

Advances in technology and computation have provided the possibility of collecting and mining a massive amount of real-world data. Mining such “big data” allows us to understand the structure and the function of real systems and to find unknown and interesting patterns. This section provides the brief overview of the community structure.

# Introduction of Community Detection

In the actual interconnected world, and the rising of online social networks the graph mining and the community detection become completely up-to-date. Understanding the formation and evolution of communities is a long-standing research topic in sociology in part because of its fundamental connections with the studies of urban development, criminology, social marketing, and several other areas. With increasing popularity of online social network services like Facebook, the study of community structures assumes more significance. Identifying and detecting communities are not only of particular importance but have immediate applications. For instance, for effective online marketing, such as placing online ads or deploying viral marketing strategies [10], identifying communities in social network could often lead to more accurate targeting and better marketing results. Albeit online user profiles or other semantic information is helpful to discover user segments this kind of information is often at a coarse-grained level and overlooks community structure that reveals rich information at fine-grained level.

### Significance of Community Detection

Many real-world complex systems, such as social or computer networks can be modeled as large graphs, called complex networks. Because of the increasing volume of data and the need to understand such huge systems, complex networks have been extensively studied these last ten years. Communities are clearly overlapping in real world systems, especially in social networks, where every individual belongs to various communities: family, colleagues, groups of friends, etc. Finding all these overlapping communities in a huge graph is very complex: in a graph of  nodes there are  2n such possible communities and  such possible community structures. Even if these communities could be efficiently computed, it may lead to uninterruptable results. Because of the complexity of overlapping communities’ detection, most studies have restricted the community structure to a partition, where each node belongs to one and only one community.

Identifying network communities can be viewed as a problem of clustering a set of nodes into communities, where a node can belongs multiple communities at once. Because nodes in communities share common properties or attributes, and because they have many relationships among themselves, there are two sources of data that can be used to perform the clustering task. The first is the data about the objects (i.e., nodes) and their attributes. Known properties of proteins, users’ social network profiles, or authors’ publication histories may tell us which objects are similar, and to which communities or modules they may belong. The second source of data comes from the network and the set of connections between the objects. Users form friendships, proteins interact, and authors collaborate.

### Definition of Community Detection

Community detection is a key to understanding the structure of complex networks, and ultimately extracting useful information from them. An excessively studied structural property of real-world networks is their community structure. The community structure captures the tendency of nodes in the network to group together with other similar nodes into communities. This property has been observed in many real-world networks. Despite excessive studies of the community structure of networks, there is no consensus on a single quantitative definition for the concept of community and different studies have used different definitions. A community, also known as a cluster, is usually thought of as a group of nodes that have many connections to each other and few connections to the rest of the network. Identifying communities in a network can provide valuable information about the structural properties of the network, the interactions among nodes in the communities, and the role of the nodes in each community.

In the clustering framework a community is a cluster of nodes in a graph, but a very important question is what a cluster is? but most of the time, the objects in a cluster must be more similar than objects out of this cluster: the objects are clustered or grouped based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity. Let us remark that, this definition implies the necessity to define the notions of similarity measure and/or cluster fitness measure.

Figure 1 Social Network Community Structure

Community detection has been widely used in social network analysis to study the behaviour and interaction patterns of people in social network. Community detection is also important to identify powerful nodes in the network, based on their structural position to initiate influential campaigns.

### References

[12] Chayant Tantipathananandh, “Detecting and Tracking Communities in Social Networks”, Dissertation Northwestern University, 2013

[2] J. Chang and D. M. Blei, Relational topic models for document networks. In AISTATS ’09, 2009

[3] Clara Granell, Sergio G´omez and Alex Arenas, “Data clustering using community detection algorithms”, Int. J. Complex Systems in Science volume 1 (2011), pp. 21–24

[4] S. Fortunato, “Community detection in graphs”, Physics Reports, vol. 486, no. 3-5, pp. 75 – 174, 2010, online available at: http://www.sciencedirect.com/science/article/B6TVP-4XPYXF1- 1/2/99061fac6435db4343b2374d26e64ac1

$${}$$