519 Wahrscheinlichkeitstheorien, mathematische Statistik
Networks are commonly used to model complex real-world systems such as the cell or the brain. An important notion in analyzing networks is modularity. Usually, a network is considered modular if all its nodes can be partitioned into dense, connected subsets that are sparsely connected to one another. In this thesis, we extend the definition of modularity to cover networks that do contain dense modules that are sparsely connected to the rest of the network, but some nodes can belong to the region outside of modules, the so-called transition region. Accordingly, in contrast to full partitions, this work deals considers assignments: The association of every node to a module in the modular region or to the transition region.
The first task to be tackled in this thesis is the construction of a formal definition of modularity. After surveying some of the prominent existing definitions in the literature, we found that they cannot be directly applied to networks that are not fully-partitionable. We then developed an approach based on time-continuous random walks for analyzing networks. Here we built on a rich literature on the topic of spectral clustering, equating network modules with metastable sets of a random walk. This allows us to define a network as modular if there is a choice of modules that produces a coarse-grained Markov process whose dynamics well-approximate those of the original random walk on the network. We first demonstrate that this score matches our intuition of modularity on synthetic networks, increasing with the density of the modules and in the case where the modular region is large. Perturbation analysis is then used to formally prove the correct behavior of the score on two specifically constructed classes of networks.
Motivated both by the need to provide a good assignment as input to the modularity score and by the established usefulness of clustering methods, the second task to be tackled is that of finding good assignments. Continuing with the approach of Markov models, we first describe an algorithm based on a continuous random walk on the network. Next, modifications to several leading full-partition algorithms are described so that they output modules and transition regions rather than full partitions. The performance of all algorithms is then tested on a class of benchmark networks. Finally, we developed a combinatorial model of networks that are not fully partitionable as a union of split graphs, and posed the problem of finding modules and transition regions as a graph editing problem, proving its hardness and providing exact algorithms and heuristics.
The discussion of each of the two tasks is completed by an example of a biological application. Identifying good assignments gives us a way to detect protein complexes in protein interaction networks, a well-studied topic. The flexibility of assigning proteins to modules or to the transition region helps identify more complicated structures as complexes. The modularity score is used to analyze the brain networks of autistic and typically-developed children, where the distribution of modularity scores is different between the two types. In future work it will be interesting to further investigate the modularity of biological networks and draw conclusions about their organization.
PDF-Datei von FUDISS_thesis_000000098121
Falls Ihr Browser eine Datei nicht öffnen kann, die Datei zuerst herunterladen und dann öffnen.