31 results for distributed algorithms
Abstract:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable -- and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra.
The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability -- factor matrices are of the same type as the original matrix -- and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Several other decomposition methods are also described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
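To make the Boolean matrix multiplication mentioned in this abstract concrete, here is a minimal sketch in plain Python. The function names and the toy matrices are illustrative assumptions, not taken from the thesis, and the sketch only evaluates a given factorization; it does not implement any of the thesis's heuristics for finding one.

```python
def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is 1 if some k has B[i][k] = C[k][j] = 1."""
    inner, cols = len(C), len(C[0])
    return [
        [int(any(row[k] and C[k][j] for k in range(inner))) for j in range(cols)]
        for row in B
    ]


def reconstruction_error(A, B, C):
    """Number of entries where A differs from the Boolean product of B and C."""
    P = boolean_product(B, C)
    return sum(a != p for row_a, row_p in zip(A, P) for a, p in zip(row_a, row_p))


# Hypothetical toy data: a 3x3 binary matrix and binary factors of inner dimension 2.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
B = [[1, 0],
     [1, 1],
     [0, 1]]
C = [[1, 1, 0],
     [0, 1, 1]]

print(boolean_product(B, C))
print(reconstruction_error(A, B, C))
```

In this toy case the second row of B selects both rows of C. Under ordinary multiplication the middle entry would be 2, whereas under the Boolean product it stays 1, so the product remains a binary matrix of the same type as A.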
Abstract:
The metabolism of an organism consists of a network of biochemical reactions that transform small molecules, or metabolites, into others in order to produce energy and building blocks for essential macromolecules. The goal of metabolic flux analysis is to uncover the rates, or the fluxes, of those biochemical reactions. In a steady state, the sum of the fluxes that produce an internal metabolite is equal to the sum of the fluxes that consume the same molecule. Thus the steady state imposes linear balance constraints on the fluxes. In general, the balance constraints imposed by the steady state are not sufficient to uncover all the fluxes of a metabolic network. The fluxes through cycles and alternative pathways between the same source and target metabolites remain unknown. More information about the fluxes can be obtained from isotopic labelling experiments, where a cell population is fed with labelled nutrients, such as glucose that contains 13C atoms. Labels are then transferred by biochemical reactions to other metabolites. The relative abundances of different labelling patterns in internal metabolites depend on the fluxes of the pathways producing them. Thus, the relative abundances of different labelling patterns contain information about the fluxes that cannot be uncovered from the balance constraints derived from the steady state. The field of research that estimates the fluxes using measured constraints on the relative abundances of different labelling patterns induced by 13C labelled nutrients is called 13C metabolic flux analysis. There are two approaches to 13C metabolic flux analysis. In the optimization approach, a non-linear optimization task is constructed in which candidate fluxes are iteratively generated until they fit the measured abundances of different labelling patterns.
In the direct approach, linear balance constraints given by the steady state are augmented with linear constraints derived from the abundances of different labelling patterns of metabolites. Thus, mathematically involved non-linear optimization methods that can get stuck in local optima can be avoided. On the other hand, the direct approach may require more measurement data than the optimization approach to obtain the same flux information. Furthermore, the optimization framework can easily be applied regardless of the labelling measurement technology and with all network topologies. In this thesis we present a formal computational framework for direct 13C metabolic flux analysis. The aim of our study is to construct as many linear constraints on the fluxes as possible from the 13C labelling measurements, using only computational methods that avoid non-linear techniques and are independent of the type of measurement data, the labelling of external nutrients and the topology of the metabolic network. The presented framework is the first representative of the direct approach for 13C metabolic flux analysis that is free from restrictive assumptions about these parameters. In our framework, measurement data is first propagated from the measured metabolites to other metabolites. The propagation is facilitated by the flow analysis of metabolite fragments in the network. Then new linear constraints on the fluxes are derived from the propagated data by applying the techniques of linear algebra. Based on the results of the fragment flow analysis, we also present an experiment planning method that selects sets of metabolites whose relative abundances of different labelling patterns are most useful for 13C metabolic flux analysis. Furthermore, we give computational tools to process raw 13C labelling data produced by tandem mass spectrometry into a form suitable for 13C metabolic flux analysis.
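The steady-state balance constraints described in this abstract can be illustrated with a small sketch. The five-reaction network, the matrix `S` and the flux values below are hypothetical toy data chosen for illustration, not taken from the thesis. The sketch only checks the balance constraints and shows why they leave the split between two alternative pathways undetermined.

```python
def balance_residuals(S, v):
    """For each internal metabolite, production minus consumption under fluxes v."""
    return [sum(coeff * flux for coeff, flux in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if every internal metabolite is balanced (S v = 0 within tol)."""
    return all(abs(r) <= tol for r in balance_residuals(S, v))


# Hypothetical network with two alternative pathways from A to B:
#   v1: uptake -> A,  v2: A -> B,  v3: A -> C,  v4: C -> B,  v5: B -> excretion
# Rows are the internal metabolites A, B, C; columns are the reactions v1..v5.
S = [
    [1, -1, -1,  0,  0],  # A
    [0,  1,  0,  1, -1],  # B
    [0,  0,  1, -1,  0],  # C
]

# Two different flux distributions that both satisfy the balance constraints.
v_a = [10, 3, 7, 7, 10]
v_b = [10, 6, 4, 4, 10]

print(is_steady_state(S, v_a), is_steady_state(S, v_b))
```

Both distributions are balanced, so the steady state alone cannot say how much of the flux from A to B goes directly and how much goes through C. This is the gap that additional linear constraints from labelling measurements are meant to close in the direct approach.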
Abstract:
Conferencing systems in IP Multimedia (IM) networks are undergoing a restructuring that is to be completed in the near future. One of the changes introduced is the concept of floors and floor control in its current form, with matching entity roles. The Binary Floor Control Protocol (BFCP) is a new protocol intended for distributed, tightly coupled conferencing services. The protocol defines the floor control server (FCS), which implements floor control and grants access to shared resources. As the current trend is to distribute conferencing services, the locations of the different functional units play an important role in developing the standards. The standardization bodies have not yet agreed on the location of the floor control server, and the debate continues over where to place it within the media server that provides the conferencing service. The main objective of the thesis is to evaluate two distinct alternatives with respect to the Mp interface protocol between the respective nodes, as the interface is currently under standardization with regard to floor control. The thesis gives a brief introduction to the IMS network and the nodes of interest, including the floor control server and conferencing. Knowledge of several protocols (BFCP, SDP, SIP and H.248) provides important background for understanding the functional changes introduced in the Mp interface, and therefore these protocols and their place in the overall picture are introduced. The analysis of the impact of the floor control server on the Mp reference point is carried out for each location, giving basic flows and a requirements analysis, including a limited implementation proposal for the supporting protocol parameters. The overall conclusion of the thesis is that although both choices appear useful, neither location is clearly the more suitable in the light of this work.
The thesis suggests a solution in which both possibilities remain available, to be chosen according to circumstances and realized through consistent standardization. It is evident that if the preliminary assumption of the analysis, that there is only one right place for the floor control server, is kept, more work is needed in related areas to discover the single most appropriate location.
Abstract:
Diffuse large B-cell lymphoma (DLBCL) is the most common of the non-Hodgkin lymphomas. As DLBCL is characterized by heterogeneous clinical and biological features, its prognosis varies. To date, the International Prognostic Index has been the strongest predictor of outcome for DLBCL patients. However, it does not take any biological characteristics of the disease into account. Gene expression profiling studies have identified two major cell-of-origin phenotypes in DLBCL with different prognoses, the favourable germinal centre B-cell-like (GCB) and the unfavourable activated B-cell-like (ABC) phenotypes. However, results on the prognostic impact of the immunohistochemically defined GCB and non-GCB distinction are controversial. Furthermore, since the addition of the CD20 antibody rituximab to chemotherapy has been established as the standard treatment of DLBCL, all molecular markers need to be evaluated in the post-rituximab era. In this study, we aimed to evaluate the predictive value of immunohistochemically defined cell-of-origin classification in DLBCL patients. The GCB and non-GCB phenotypes were defined according to the Hans algorithm (CD10, BCL6 and MUM1/IRF4) among 90 immunochemotherapy- and 104 chemotherapy-treated DLBCL patients. In the chemotherapy group, we observed a significant difference in survival between GCB and non-GCB patients, with a good and a poor prognosis, respectively. However, in the rituximab group, no prognostic value of the GCB phenotype was observed. Likewise, among 29 high-risk de novo DLBCL patients receiving high-dose chemotherapy and autologous stem cell transplantation, the survival of non-GCB patients was improved, but no difference in outcome was seen between GCB and non-GCB subgroups. Since the results suggested that the Hans algorithm was not applicable in immunochemotherapy-treated DLBCL patients, we aimed to further focus on algorithms based on ABC markers.
We examined a modified activated B-cell-like algorithm (based on MUM1/IRF4 and FOXP1), as well as the previously reported Muris algorithm (BCL2, CD10 and MUM1/IRF4), among 88 DLBCL patients uniformly treated with immunochemotherapy. Both algorithms distinguished the unfavourable ABC-like subgroup with a significantly inferior failure-free survival relative to the GCB-like DLBCL patients. Similarly, results for the individual predictive molecular markers, the transcription factor FOXP1 and the anti-apoptotic protein BCL2, have been inconsistent and should be assessed in immunochemotherapy-treated DLBCL patients. The markers were evaluated in a cohort of 117 patients treated with rituximab and chemotherapy. FOXP1 expression could not distinguish between patients with favourable and those with poor outcomes. In contrast, BCL2-negative DLBCL patients had significantly superior survival relative to BCL2-positive patients. Our results indicate that the immunohistochemically defined cell-of-origin classification in DLBCL has a prognostic impact in the immunochemotherapy era, when the identifying algorithms are based on ABC-associated markers. We also propose that BCL2 negativity is predictive of a favourable outcome. Further investigational efforts are, however, warranted to identify the molecular features of DLBCL that could enable individualized cancer therapy in routine patient care.
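The Hans algorithm named in this abstract is usually described in the literature as a short decision tree over the three markers. The sketch below follows that common description; the decision order is not spelled out in the abstract itself, so treat it as an assumption. The function name is hypothetical, and each marker is passed in as already scored positive or negative, so no staining cut-off is assumed here.

```python
def hans_classification(cd10_positive, bcl6_positive, mum1_positive):
    """Classify a DLBCL case as 'GCB' or 'non-GCB' from three marker results.

    Assumed decision order (common description of the Hans algorithm):
    CD10 first, then BCL6, then MUM1/IRF4.
    """
    if cd10_positive:
        return "GCB"
    if not bcl6_positive:
        return "non-GCB"
    # CD10-negative and BCL6-positive: MUM1/IRF4 decides.
    return "non-GCB" if mum1_positive else "GCB"


# Illustrative inputs only, not patient data.
print(hans_classification(True, False, True))    # CD10-positive case
print(hans_classification(False, True, False))   # CD10-, BCL6+, MUM1- case
print(hans_classification(False, True, True))    # CD10-, BCL6+, MUM1+ case
```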
Abstract:
A key trait of Free and Open Source Software (FOSS) development is its distributed nature. Nevertheless, two project-level operations, the fork and the merge of program code, are among the least well understood events in the lifespan of a FOSS project. Some projects have explicitly adopted these operations as the primary means of concurrent development. In this study, we examine the effect of highly distributed software development, as found in the Linux kernel project, on the collection and modelling of software development data. We find that distributed development calls for sophisticated temporal modelling techniques where several versions of the source code tree can exist at once. Attention must be turned towards the methods of quality assurance and peer review that projects employ to manage these parallel source trees. Our analysis indicates that two new metrics, fork rate and merge rate, could be useful for determining the role of distributed version control systems in FOSS projects. The study presents a preliminary data set consisting of version control and mailing list data.
Abstract:
In a max-min LP, the objective is to maximise ω subject to Ax ≤ 1, Cx ≥ ω1, and x ≥ 0 for nonnegative matrices A and C. We present a local algorithm (constant-time distributed algorithm) for approximating max-min LPs. The approximation ratio of our algorithm is the best possible for any local algorithm; there is a matching unconditional lower bound.
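To make the max-min LP formulation concrete, the following sketch evaluates the objective for a given candidate solution: it checks x ≥ 0 and Ax ≤ 1, and returns the largest ω for which Cx ≥ ω1 holds, which is the smallest entry of Cx. This is not the local algorithm from the paper. The function names and the small matrices are illustrative assumptions.

```python
def dot(row, x):
    """Inner product of one matrix row with the vector x."""
    return sum(a * b for a, b in zip(row, x))


def max_min_value(A, C, x, tol=1e-9):
    """Return omega = min_k (Cx)_k if x is feasible (x >= 0 and Ax <= 1), else None."""
    if any(xi < -tol for xi in x):
        return None
    if any(dot(row, x) > 1 + tol for row in A):
        return None
    return min(dot(row, x) for row in C)


# Hypothetical instance with nonnegative matrices A (constraints) and C (objectives).
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
C = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]

print(max_min_value(A, C, [0.5, 0.5, 0.5]))  # feasible, limited by the first objective
print(max_min_value(A, C, [1.0, 0.0, 1.0]))  # feasible, both objectives equal
print(max_min_value(A, C, [1.0, 1.0, 0.0]))  # violates the first constraint
```

Evaluating candidates this way shows what an approximation algorithm is judged on: the ratio between the optimal ω and the ω achieved by the solution it outputs.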