8 resultados para Statistical factora analysis
em Dalarna University College Electronic Archive
Resumo:
In recent years, it has been observed that software clones and plagiarism are becoming an increased threat for one?s creativity. Clones are the results of copying and using other?s work. According to the Merriam – Webster dictionary, “A clone is one that appears to be a copy of an original form”. It is synonym to duplicate. Clones lead to redundancy of codes, but not all redundant code is a clone.On basis of this background knowledge ,in order to safeguard one?s idea and to avoid intentional code duplication for pretending other?s work as if their owns, software clone detection should be emphasized more. The objective of this paper is to review the methods for clone detection and to apply those methods for finding the extent of plagiarism occurrence among the Swedish Universities in Master level computer science department and to analyze the results.The rest part of the paper, discuss about software plagiarism detection which employs data analysis technique and then statistical analysis of the results.Plagiarism is an act of stealing and passing off the idea?s and words of another person?s as one?s own. Using data analysis technique, samples(Master level computer Science thesis report) were taken from various Swedish universities and processed in Ephorus anti plagiarism software detection. Ephorus gives the percentage of plagiarism for each thesis document, from this results statistical analysis were carried out using Minitab Software.The results gives a very low percentage of Plagiarism extent among the Swedish universities, which concludes that Plagiarism is not a threat to Sweden?s standard of education in computer science.This paper is based on data analysis, intelligence techniques, EPHORUS software plagiarism detection tool and MINITAB statistical software analysis.
Resumo:
The purpose of this paper is to make quantitative and qualitative analysis of foreign citizens who may participate on the Swedish labor market (in text refers to as ‘immigrants’). This research covers the period 1973-2005 and gives prediction figures of immigrant population, age and gender structure, and education attainment in 2010. To cope with data regarding immigrants from different countries, the population was divided into six groups. The main chapter is divided into two parts. The first part specifies division of immigrants into groups by country of origin according to geographical, ethnical, economical and historical criteria. Brief characteristics and geographic position, dynamic and structure description were given for each group; historical review explain rapid changes in immigrant population. Statistical models for description and estimation future population were given. The second part specifies education and qualification level of the immigrants according to international and Swedish standards. Models for estimating age and gender structure, level of education and professional orientation of immigrants in different groups are given. Inferences were made regarding ethnic, gender and education structure of immigrants; the distribution of immigrants among Swedish counties is given. Discussion part presents the results of the research, gives perspectives for the future brief evaluation of the role of immigrants on the Swedish labor market.
Resumo:
In this paper, we study the influence of the National Telecom Business Volume by the data in 2008 that have been published in China Statistical Yearbook of Statistics. We illustrate the procedure of modeling “National Telecom Business Volume” on the following eight variables, GDP, Consumption Levels, Retail Sales of Social Consumer Goods Total Renovation Investment, the Local Telephone Exchange Capacity, Mobile Telephone Exchange Capacity, Mobile Phone End Users, and the Local Telephone End Users. The testing of heteroscedasticity and multicollinearity for model evaluation is included. We also consider AIC and BIC criterion to select independent variables, and conclude the result of the factors which are the optimal regression model for the amount of telecommunications business and the relation between independent variables and dependent variable. Based on the final results, we propose several recommendations about how to improve telecommunication services and promote the economic development.
Resumo:
This study aims to investigate the important indicators that contribute to happiness among Beijing residence. The residents of Beijing were taken as the target population for the survey. A questionnaire was used as the main statistical instrument to collect the data from the residents in Beijing. In so doing the investigation employs Factor analyses and chi-square analyses as the main statistical tools used for the analyses in this research. The study found that Beijing residents gained greater happiness in the family, interpersonal relationships, and health status. The analysis also shows that generally, the residence of Beijing feels happier and also in terms of gender basis, females in Beijing feel happier as compare to their male counterpart. It will find that gender, age and education are statistically significant when dealing with happiness.
Resumo:
Quadratic assignment problems (QAPs) are commonly solved by heuristic methods, where the optimum is sought iteratively. Heuristics are known to provide good solutions but the quality of the solutions, i.e., the confidence interval of the solution is unknown. This paper uses statistical optimum estimation techniques (SOETs) to assess the quality of Genetic algorithm solutions for QAPs. We examine the functioning of different SOETs regarding biasness, coverage rate and length of interval, and then we compare the SOET lower bound with deterministic ones. The commonly used deterministic bounds are confined to only a few algorithms. We show that, the Jackknife estimators have better performance than Weibull estimators, and when the number of heuristic solutions is as large as 100, higher order JK-estimators perform better than lower order ones. Compared with the deterministic bounds, the SOET lower bound performs significantly better than most deterministic lower bounds and is comparable with the best deterministic ones.
Resumo:
Solutions to combinatorial optimization problems, such as problems of locating facilities, frequently rely on heuristics to minimize the objective function. The optimum is sought iteratively and a criterion is needed to decide when the procedure (almost) attains it. Pre-setting the number of iterations dominates in OR applications, which implies that the quality of the solution cannot be ascertained. A small, almost dormant, branch of the literature suggests using statistical principles to estimate the minimum and its bounds as a tool to decide upon stopping and evaluating the quality of the solution. In this paper we examine the functioning of statistical bounds obtained from four different estimators by using simulated annealing on p-median test problems taken from Beasley’s OR-library. We find the Weibull estimator and the 2nd order Jackknife estimator preferable and the requirement of sample size to be about 10 being much less than the current recommendation. However, reliable statistical bounds are found to depend critically on a sample of heuristic solutions of high quality and we give a simple statistic useful for checking the quality. We end the paper with an illustration on using statistical bounds in a problem of locating some 70 distribution centers of the Swedish Post in one Swedish region.
Resumo:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes. The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
Resumo:
Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty resides in the parameter estimation because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles - marginal likelihood, extended likelihood, Bayesian analysis-via simulation studies. Real data on contact wrestling are used for illustration.