4 results for high-dimensional data, call detail records (CDR), wireless telecommunication industry
in Dalarna University College Electronic Archive
Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association studies (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights into modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to correctly adjust for bias in genetic variance component estimation and to gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit whole-genome data. These whole-genome models were shown to perform well in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance rather than the mean, which validated the idea of variance-controlling genes. The work in the thesis is accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression tool for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
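A locus controls phenotypic variance rather than the mean when genotype groups differ in spread rather than in average. One standard way to test this at a single marker is the Brown-Forsythe test (Levene's test applied to absolute deviations from the group medians). The sketch below computes its F statistic in plain Python. It is an illustration only, not the code of the vGWAS package, whose exact test is not stated here; the function name brown_forsythe is hypothetical, and the phenotypes are assumed to be already split by genotype.

```python
from statistics import mean, median


def brown_forsythe(groups):
    """Brown-Forsythe F statistic for equality of variances across groups.

    groups: list of lists of phenotype values, one list per genotype class
    (hypothetical input layout). Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    # Absolute deviation of every observation from its own group median
    medians = [median(g) for g in groups]
    z = [[abs(x - m) for x in g] for g, m in zip(groups, medians)]
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    # One-way ANOVA on the deviations: between- and within-group sums of squares
    ss_between = sum(len(zg) * (m - z_grand) ** 2 for zg, m in zip(z, z_means))
    ss_within = sum((v - m) ** 2 for zg, m in zip(z, z_means) for v in zg)
    if ss_within == 0:
        raise ValueError("within-group deviations have zero spread")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Two genotype groups with the same median but very different spread
print(brown_forsythe([[9, 10, 11], [0, 10, 20]]))
```

The statistic is compared with an F distribution with (k - 1, N - k) degrees of freedom, for example via scipy.stats.f.sf(F, df_between, df_within); in a genome scan the test is repeated for every marker.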
Abstract:
With the rapid development of the telecommunication industry, the IP Multimedia Subsystem (IMS) could very well be the panacea for most telecom operators. IMS was originally defined by the 3rd Generation Partnership Project (3GPP) as the core network for 3G mobile systems; more recent development has focused on merging fixed-line and wireless networks. This report studies the characteristics of IMS data and proposes an IMS characterization analysis. We captured IMS traffic data from 10,000 users over about 41 hours. Analyzing these characteristics shows that the most important application in the IMS is VoIP calling. We then used a tool designed by Tsinghua University and Ericsson to identify the traffic, and the results were used to build traffic models, from which we draw several explanations and conclusions. The traffic model identifies the session types and the VoIP call types. We introduce the concept of the busy hour, which is important because it shows which period is the peak for VoIP calls; the busy hour is from 10:00 to 11:00 in the morning. We also introduce the connection ratio, which is significant because it can be used to evaluate whether VoIP calls perform well over the IMS network. Comparing our traffic model with other researchers' models, we found differences in both accuracy and the busy hour. From this contrast, we identified the advantages of our traffic models.
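The busy-hour and connection-ratio ideas can be made concrete with a small sketch over call records. The CallRecord type and its fields are hypothetical, and the abstract does not define the connection ratio precisely; here it is assumed to be the share of call attempts that were successfully connected. The busy hour is taken as the clock hour with the most call starts, pooled across all days in the capture.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CallRecord:
    """Hypothetical per-call record extracted from captured IMS traffic."""
    start: datetime   # time the VoIP call was initiated
    connected: bool   # True if the call attempt was successfully set up (assumed field)


def busy_hour(calls):
    """Return (hour, count) for the clock hour with the most call starts.

    Calls from different days are pooled by hour of day (0-23).
    """
    if not calls:
        raise ValueError("no call records")
    counts = Counter(c.start.hour for c in calls)
    return counts.most_common(1)[0]


def connection_ratio(calls):
    """Fraction of call attempts that were connected (assumed definition)."""
    if not calls:
        raise ValueError("no call records")
    return sum(c.connected for c in calls) / len(calls)
```

Pooling by clock hour is what makes a statement such as "the busy hour is 10:00 to 11:00" meaningful for a capture that spans more than one day.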
Abstract:
Good data quality with high complexity is often considered important. Intuitively, the higher the accuracy and complexity of the data, the better the analytic solutions become, provided the increased computing time can be handled. However, for most practical computational problems, high-complexity data means that computing times become too long or that the heuristics used to solve the problem have difficulty reaching good solutions. This is exacerbated as the size of the combinatorial problem increases. Consequently, simplified data are often needed to deal with complex combinatorial problems. In this study we address the question of how the complexity and accuracy of a network affect the quality of heuristic solutions for different sizes of the combinatorial problem. We evaluate this question by applying the commonly used p-median model, which finds the optimal locations in a network of p supply points that serve n demand points. To evaluate this, we vary both the accuracy of the network (the number of nodes) and the size of the combinatorial problem (p). The investigation is conducted as a case study of Dalecarlia, a region in Sweden with an asymmetrically distributed population (15,000 weighted demand points). To locate 5 to 50 supply points, we use the national transport administration's official road network (NVDB). The road network consists of 1.5 million nodes. To find the optimal locations, we start with 500 candidate nodes in the network and increase the number of candidate nodes in steps up to 67,000 (aggregated from the 1.5 million nodes). To find the optimal solution, we use a simulated annealing algorithm with adaptive tuning of the temperature. The results show that increasing the accuracy of the road network yields only a limited improvement in the optimal solutions when the combinatorial problem is simple (low p).
When the combinatorial problem is complex (large p), the improvements from increasing the accuracy of the road network are much larger. The results also show that the choice of the best network accuracy depends on the complexity of the combinatorial problem (varying p).
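The p-median objective and a simulated annealing search over it can be sketched as follows. Assumptions: distances from every demand point to every candidate node are precomputed (in the study they would come from the NVDB road network), the neighborhood move swaps one open facility for a closed candidate, and a plain geometric cooling schedule with a sampled starting temperature stands in for the study's adaptive temperature tuning, whose details the abstract does not give. The function names are hypothetical.

```python
import math
import random


def pmedian_cost(facilities, demand_weights, dist):
    """Total weighted distance from each demand point to its nearest open facility."""
    return sum(w * min(dist[i][j] for j in facilities)
               for i, w in enumerate(demand_weights))


def anneal_pmedian(p, demand_weights, dist, n_iter=5000, cooling=0.999, seed=0):
    """Simulated annealing for the p-median problem (illustrative sketch).

    dist[i][j] is the distance from demand point i to candidate node j.
    Requires 0 < p < number of candidates. Returns (sorted facility indices, cost).
    """
    rng = random.Random(seed)
    candidates = list(range(len(dist[0])))
    if not 0 < p < len(candidates):
        raise ValueError("need 0 < p < number of candidate nodes")
    current = rng.sample(candidates, p)
    cost = pmedian_cost(current, demand_weights, dist)
    best, best_cost = list(current), cost

    def random_swap(sol):
        """Replace one open facility with a randomly chosen closed candidate."""
        closed = [c for c in candidates if c not in sol]
        new = list(sol)
        new[rng.randrange(p)] = rng.choice(closed)
        return new

    # Starting temperature: mean cost increase over a few random swaps
    # (a simple stand-in heuristic, not the study's adaptive scheme).
    samples = [pmedian_cost(random_swap(current), demand_weights, dist) - cost
               for _ in range(20)]
    ups = [d for d in samples if d > 0]
    temp = sum(ups) / len(ups) if ups else 1.0

    for _ in range(n_iter):
        cand = random_swap(current)
        cand_cost = pmedian_cost(cand, demand_weights, dist)
        delta = cand_cost - cost
        # Always accept improvements; accept worse moves with Metropolis probability
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        temp *= cooling  # geometric cooling
    return sorted(best), best_cost
```

In this sketch each cost evaluation takes time proportional to n times p, and the number of possible swaps grows with the number of candidate nodes, which illustrates why refining the network from 500 to 67,000 candidates makes the search harder for a fixed iteration budget.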
Abstract:
Background: The Swedish Maternal Health Care Register (MHCR) is a national quality register that has been collecting pregnancy, delivery, and postpartum data since 1999. A substantial revision of the MHCR resulted in a Web-based version of the register in 2010. Although the MHCR provides data for health care services and research, the validity of its data has not been evaluated. This study investigated the degree of coverage and internal validity of specific variables in the MHCR and identified possible systematic errors. Methods: This cross-sectional observational study compared pregnancy and delivery data in medical records with the corresponding data in the MHCR. The medical record was considered the gold standard. Medical records from nine Swedish hospitals were selected for data extraction. This study compared data from 878 women registered in both the medical records and the MHCR. To evaluate the quality of the initial data extraction, a second data extraction of 150 medical records was performed. Statistical analyses were performed for the degree of coverage, agreement and correlation of data, and sensitivity and specificity. Results: The degree of coverage of the specified variables in the MHCR varied from 90.0% to 100%. The proportion of identical information in the medical records and the MHCR ranged from 71.4% to 99.7%. For more than half of the investigated variables, 95% or more of the information was identical. Sensitivity and specificity were analysed for binary variables. Probable systematic errors were identified for two variables. Conclusions: When data from medical records were compared with data registered in the MHCR, most variables in the MHCR demonstrated a good to very good degree of coverage, agreement, and internal validity. Hence, data from the MHCR may be regarded as reliable for research as well as for evaluating, planning, and decision-making with respect to Swedish maternal health care services.
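The coverage, agreement, sensitivity, and specificity measures can be sketched for a single binary variable, treating the medical record as the gold standard as the study does. Assumptions: coverage is taken as the share of women for whom the register variable is filled in (None marks a missing register entry), and agreement, sensitivity, and specificity are computed over the women with a registered value. The names Validity and validate_binary are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Validity:
    coverage: float     # share of records where the register variable is present
    agreement: float    # share of identical values among registered pairs
    sensitivity: float  # register says "yes" when the medical record says "yes"
    specificity: float  # register says "no" when the medical record says "no"


def validate_binary(gold, register):
    """Compare a binary register variable with the medical record (gold standard).

    gold: list of bools from the medical records.
    register: list of bools or None (missing) from the register, same order.
    """
    if len(gold) != len(register) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    # Keep only women with a registered value; coverage is their share
    pairs = [(bool(g), bool(r)) for g, r in zip(gold, register) if r is not None]
    coverage = len(pairs) / len(gold)
    tp = sum(1 for g, r in pairs if g and r)
    tn = sum(1 for g, r in pairs if not g and not r)
    fp = sum(1 for g, r in pairs if not g and r)
    fn = sum(1 for g, r in pairs if g and not r)
    agreement = (tp + tn) / len(pairs) if pairs else math.nan
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    specificity = tn / (tn + fp) if tn + fp else math.nan
    return Validity(coverage, agreement, sensitivity, specificity)
```

Separating coverage from agreement mirrors the abstract's distinction between whether a variable is registered at all and whether the registered value matches the medical record.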