Sampling, Generation, and Application of Social Networks

Bibliographic Details
Main Authors: Hong-Han Shuai, 帥宏翰
Other Authors: Ming-Syan Chen
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/85551723568850623386
Description
Summary: Ph.D. === National Taiwan University === Graduate Institute of Communication Engineering === 103 === With the spread of mobile phones and the development of Web 2.0, online social network websites have become very popular. For example, Facebook and Twitter have 1.26 billion and 550 million active users, respectively. The emergence of Online Social Networks (OSNs) has in turn motivated a great deal of research on social network analysis. In this dissertation, we therefore study several important and fundamental problems in social networks. Social network analysis poses three challenges. First, unlike transaction data, a social network consists of nodes connected by edges that form a graph, which complicates the analysis. As a result, many graph-related problems are NP-hard, including the third problem studied in this dissertation. Second, our study takes the graph properties of social networks into account, so we must carefully address the interplay between these properties. Third, the computation required is much greater than in the transaction case, which demands carefully designed data structures and algorithms. In this study, we first explore the joint sampling of multiple OSNs and propose an approach called Quality-guaranteed Multi-network Sampler (QMSampler), which crawls and jointly samples multiple OSNs. QMSampler provides a statistical guarantee on the difference between the crawled real dataset and the ground truth (the perfect integration of all OSNs).
Next, we turn to graph generation. Most available real datasets contain only millions of nodes, and current popular statistical graph generators are designed to preserve only the statistical metrics of the original social graphs, such as the degree distribution, diameter, and clustering coefficient, without considering the importance of frequent graph patterns. We therefore make the first attempt to design a Pattern Preserving Graph Generation (PPGG) algorithm, which generates a graph that includes all frequent patterns and preserves three of the most popular statistical parameters: degree distribution, clustering coefficient, and average vertex degree. In addition, given the available social network datasets, we explore group formation with willingness optimization for social group activities. We design a new randomized algorithm that solves this problem effectively and efficiently. Given the available computational budget, the proposed algorithm allocates the resources optimally and finds a solution with an approximation ratio. We implement the proposed algorithm on Facebook, and a user study demonstrates that the social groups obtained by the proposed algorithm significantly outperform the solutions manually configured by users.
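
To make the three statistical parameters named in the summary concrete, the following is a minimal sketch of how they might be computed for a small undirected graph. It is not the PPGG algorithm and is not taken from the dissertation; the function name `graph_statistics`, the edge-list input format, and the use of the average local clustering coefficient are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def graph_statistics(edges):
    """Illustrative helper (hypothetical, not from the dissertation).

    Takes an undirected edge list and returns a tuple of
    (degree distribution, average vertex degree, average local clustering).
    """
    # Build adjacency sets; self-loops are ignored, duplicate edges collapse.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        raise ValueError("graph has no edges")

    degrees = {node: len(neighbors) for node, neighbors in adj.items()}
    # Degree distribution: how many vertices have each degree value.
    degree_distribution = Counter(degrees.values())
    average_degree = sum(degrees.values()) / len(adj)

    def local_clustering(node):
        # Fraction of neighbor pairs that are themselves connected.
        neighbors = adj[node]
        k = len(neighbors)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        return 2 * links / (k * (k - 1))

    average_clustering = sum(local_clustering(n) for n in adj) / len(adj)
    return degree_distribution, average_degree, average_clustering


# A triangle (1, 2, 3) with one pendant vertex (4) attached to vertex 3.
distribution, avg_degree, avg_clustering = graph_statistics(
    [(1, 2), (2, 3), (1, 3), (3, 4)]
)
```

A generator that aims to preserve these parameters could compare such values between the original graph and the generated one; how PPGG itself does so is not described in this record.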