994 resultados para statistical software
Resumo:
For various reasons, it is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, etc. with the documents that describe and rely on them. This integration allows readers to both verify and adapt the statements in the documents. Authors can easily reproduce them in the future, and they can present the document's contents in a different medium, e.g. with interactive controls. This paper describes a software framework for authoring and distributing these integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations. The documents are dynamic in that the contents, including figures, tables, etc., can be recalculated each time a view of the document is generated. Our model treats a dynamic document as a master or ``source'' document from which one can generate different views in the form of traditional, derived documents for different audiences. We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, ...), and as a means for distributing, managing and updating the collection. The step from disseminating analyses via a compendium to reproducible research is a small one. By reproducible research, we mean research papers with accompanying software tools that allow the reader to directly reproduce the results and employ the methods that are presented in the research paper. Some of the issues involved in paradigms for the production, distribution and use of such reproducible research are discussed.
Resumo:
Genetic anticipation is defined as a decrease in age of onset or increase in severity as the disorder is transmitted through subsequent generations. Anticipation has been noted in the literature for over a century. Recently, anticipation in several diseases including Huntington's Disease, Myotonic Dystrophy and Fragile X Syndrome were shown to be caused by expansion of triplet repeats. Anticipation effects have also been observed in numerous mental disorders (e.g. Schizophrenia, Bipolar Disorder), cancers (Li-Fraumeni Syndrome, Leukemia) and other complex diseases. ^ Several statistical methods have been applied to determine whether anticipation is a true phenomenon in a particular disorder, including standard statistical tests and newly developed affected parent/affected child pair methods. These methods have been shown to be inappropriate for assessing anticipation for a variety of reasons, including familial correlation and low power. Therefore, we have developed family-based likelihood modeling approaches to model the underlying transmission of the disease gene and penetrance function and hence detect anticipation. These methods can be applied in extended families, thus improving the power to detect anticipation compared with existing methods based only upon parents and children. The first method we have proposed is based on the regressive logistic hazard model. This approach models anticipation by a generational covariate. The second method allows alleles to mutate as they are transmitted from parents to offspring and is appropriate for modeling the known triplet repeat diseases in which the disease alleles can become more deleterious as they are transmitted across generations. ^ To evaluate the new methods, we performed extensive simulation studies for data simulated under different conditions to evaluate the effectiveness of the algorithms to detect genetic anticipation. Results from analysis by the first method yielded empirical power greater than 87% based on the 5% type I error critical value identified in each simulation depending on the method of data generation and current age criteria. Analysis by the second method was not possible due to the current formulation of the software. The application of this method to Huntington's Disease and Li-Fraumeni Syndrome data sets revealed evidence for a generation effect in both cases. ^
Resumo:
In the last decade, a large number of software repositories have been created for different purposes. In this paper we present a survey of the publicly available repositories and classify the most common ones as well as discussing the problems faced by researchers when applying machine learning or statistical techniques to them.
Resumo:
The statistical distributions of different software properties have been thoroughly studied in the past, including software size, complexity and the number of defects. In the case of object-oriented systems, these distributions have been found to obey a power law, a common statistical distribution also found in many other fields. However, we have found that for some statistical properties, the behavior does not entirely follow a power law, but a mixture between a lognormal and a power law distribution. Our study is based on the Qualitas Corpus, a large compendium of diverse Java-based software projects. We have measured the Chidamber and Kemerer metrics suite for every file of every Java project in the corpus. Our results show that the range of high values for the different metrics follows a power law distribution, whereas the rest of the range follows a lognormal distribution. This is a pattern typical of so-called double Pareto distributions, also found in empirical studies for other software properties.
Resumo:
All meta-analyses should include a heterogeneity analysis. Even so, it is not easy to decide whether a set of studies are homogeneous or heterogeneous because of the low statistical power of the statistics used (usually the Q test). Objective: Determine a set of rules enabling SE researchers to find out, based on the characteristics of the experiments to be aggregated, whether or not it is feasible to accurately detect heterogeneity. Method: Evaluate the statistical power of heterogeneity detection methods using a Monte Carlo simulation process. Results: The Q test is not powerful when the meta-analysis contains up to a total of about 200 experimental subjects and the effect size difference is less than 1. Conclusions: The Q test cannot be used as a decision-making criterion for meta-analysis in small sample settings like SE. Random effects models should be used instead of fixed effects models. Caution should be exercised when applying Q test-mediated decomposition into subgroups.
Resumo:
This article shows software that allows determining the statistical behavior of qualitative data originating surveys previously transformed with a Likert’s scale to quantitative data. The main intention is offer to users a useful tool to know statistics' characteristics and forecasts of financial risks in a fast and simple way. Additionally,this paper presents the definition of operational risk. On the other hand, the article explains different techniques to do surveys with a Likert’s scale (Avila, 2008) to know expert’s opinion with the transformation of qualitative data to quantitative data. In addition, this paper will show how is very easy to distinguish an expert’s opinion related to risk, but when users have a lot of surveys and matrices is very difficult to obtain results because is necessary to compare common data. On the other hand, statistical value representative must be extracted from common data to get weight of each risk. In the end, this article exposes the development of “Qualitative Operational Risk Software” or QORS by its acronym, which has been designed to determine the root of risks in organizations and its value at operational risk OpVaR (Jorion, 2008; Chernobai et al, 2008) when input data comes from expert’s opinion and their associated matrices.
Resumo:
En las últimas dos décadas, se ha puesto de relieve la importancia de los procesos de adquisición y difusión del conocimiento dentro de las empresas, y por consiguiente el estudio de estos procesos y la implementación de tecnologías que los faciliten ha sido un tema que ha despertado un creciente interés en la comunidad científica. Con el fin de facilitar y optimizar la adquisición y la difusión del conocimiento, las organizaciones jerárquicas han evolucionado hacia una configuración más plana, con estructuras en red que resulten más ágiles, disminuyendo la dependencia de una autoridad centralizada, y constituyendo organizaciones orientadas a trabajar en equipo. Al mismo tiempo, se ha producido un rápido desarrollo de las herramientas de colaboración Web 2.0, tales como blogs y wikis. Estas herramientas de colaboración se caracterizan por una importante componente social, y pueden alcanzar todo su potencial cuando se despliegan en las estructuras organizacionales planas. La Web 2.0 aparece como un concepto enfrentado al conjunto de tecnologías que existían a finales de los 90s basadas en sitios web, y se basa en la participación de los propios usuarios. Empresas del Fortune 500 –HP, IBM, Xerox, Cisco– las adoptan de inmediato, aunque no hay unanimidad sobre su utilidad real ni sobre cómo medirla. Esto se debe en parte a que no se entienden bien los factores que llevan a los empleados a adoptarlas, lo que ha llevado a fracasos en la implantación debido a la existencia de algunas barreras. Dada esta situación, y ante las ventajas teóricas que tienen estas herramientas de colaboración Web 2.0 para las empresas, los directivos de éstas y la comunidad científica muestran un interés creciente en conocer la respuesta a la pregunta: ¿cuáles son los factores que contribuyen a que los empleados de las empresas adopten estas herramientas Web 2.0 para colaborar? La respuesta a esta pregunta es compleja ya que se trata de herramientas relativamente nuevas en el contexto empresarial mediante las cuales se puede llevar a cabo la gestión del conocimiento en lugar del manejo de la información. El planteamiento que se ha llevado a cabo en este trabajo para dar respuesta a esta pregunta es la aplicación de los modelos de adopción tecnológica, que se basan en las percepciones de los individuos sobre diferentes aspectos relacionados con el uso de la tecnología. Bajo este enfoque, este trabajo tiene como objetivo principal el estudio de los factores que influyen en la adopción de blogs y wikis en empresas, mediante un modelo predictivo, teórico y unificado, de adopción tecnológica, con un planteamiento holístico a partir de la literatura de los modelos de adopción tecnológica y de las particularidades que presentan las herramientas bajo estudio y en el contexto especifico. Este modelo teórico permitirá determinar aquellos factores que predicen la intención de uso de las herramientas y el uso real de las mismas. El trabajo de investigación científica se estructura en cinco partes: introducción al tema de investigación, desarrollo del marco teórico, diseño del trabajo de investigación, análisis empírico, y elaboración de conclusiones. Desde el punto de vista de la estructura de la memoria de la tesis, las cinco partes mencionadas se desarrollan de forma secuencial a lo largo de siete capítulos, correspondiendo la primera parte al capítulo 1, la segunda a los capítulos 2 y 3, la tercera parte a los capítulos 4 y 5, la cuarta parte al capítulo 6, y la quinta y última parte al capítulo 7. El contenido del capítulo 1 se centra en el planteamiento del problema de investigación así como en los objetivos, principal y secundarios, que se pretenden cumplir a lo largo del trabajo. Así mismo, se expondrá el concepto de colaboración y su encaje con las herramientas colaborativas Web 2.0 que se plantean en la investigación y una introducción a los modelos de adopción tecnológica. A continuación se expone la justificación de la investigación, los objetivos de la misma y el plan de trabajo para su elaboración. Una vez introducido el tema de investigación, en el capítulo 2 se lleva a cabo una revisión de la evolución de los principales modelos de adopción tecnológica existentes (IDT, TRA, SCT, TPB, DTPB, C-TAM-TPB, UTAUT, UTAUT2), dando cuenta de sus fundamentos y factores empleados. Sobre la base de los modelos de adopción tecnológica expuestos en el capítulo 2, en el capítulo 3 se estudian los factores que se han expuesto en el capítulo 2 pero adaptados al contexto de las herramientas colaborativas Web 2.0. Con el fin de facilitar la comprensión del modelo final, los factores se agrupan en cuatro tipos: tecnológicos, de control, socio-normativos y otros específicos de las herramientas colaborativas. En el capítulo 4 se lleva a cabo la relación de los factores que son más apropiados para estudiar la adopción de las herramientas colaborativas y se define un modelo que especifica las relaciones entre los diferentes factores. Estas relaciones finalmente se convertirán en hipótesis de trabajo, y que habrá que contrastar mediante el estudio empírico. A lo largo del capítulo 5 se especifican las características del trabajo empírico que se lleva a cabo para contrastar las hipótesis que se habían enunciado en el capítulo 4. La naturaleza de la investigación es de carácter social, de tipo exploratorio, y se basa en un estudio empírico cuantitativo cuyo análisis se llevará a cabo mediante técnicas de análisis multivariante. En este capítulo se describe la construcción de las escalas del instrumento de medida, la metodología de recogida de datos, y posteriormente se presenta un análisis detallado de la población muestral, así como la comprobación de la existencia o no del sesgo atribuible al método de medida, lo que se denomina sesgo de método común (en inglés, Common Method Bias). El contenido del capítulo 6 corresponde al análisis de resultados, aunque previamente se expone la técnica estadística empleada, PLS-SEM, como herramienta de análisis multivariante con capacidad de análisis predictivo, así como la metodología empleada para validar el modelo de medida y el modelo estructural, los requisitos que debe cumplir la muestra, y los umbrales de los parámetros considerados. En la segunda parte del capítulo 6 se lleva a cabo el análisis empírico de los datos correspondientes a las dos muestras, una para blogs y otra para wikis, con el fin de validar las hipótesis de investigación planteadas en el capítulo 4. Finalmente, en el capítulo 7 se revisa el grado de cumplimiento de los objetivos planteados en el capítulo 1 y se presentan las contribuciones teóricas, metodológicas y prácticas derivadas del trabajo realizado. A continuación se exponen las conclusiones generales y detalladas por cada grupo de factores, así como las recomendaciones prácticas que se pueden extraer para orientar la implantación de estas herramientas en situaciones reales. Como parte final del capítulo se incluyen las limitaciones del estudio y se sugiere una serie de posibles líneas de trabajo futuras de interés, junto con los resultados de investigación parciales que se han obtenido durante el tiempo que ha durado la investigación. ABSTRACT In the last two decades, the relevance of knowledge acquisition and dissemination processes has been highlighted and consequently, the study of these processes and the implementation of the technologies that make them possible has generated growing interest in the scientific community. In order to ease and optimize knowledge acquisition and dissemination, hierarchical organizations have evolved to a more horizontal configuration with more agile net structures, decreasing the dependence of a centralized authority, and building team-working oriented organizations. At the same time, Web 2.0 collaboration tools such as blogs and wikis have quickly developed. These collaboration tools are characterized by a strong social component and can reach their full potential when they are deployed in horizontal organization structures. Web 2.0, based on user participation, arises as a concept to challenge the existing technologies of the 90’s which were based on websites. Fortune 500 companies – HP, IBM, Xerox, Cisco- adopted the concept immediately even though there was no unanimity about its real usefulness or how it could be measured. This is partly due to the fact that the factors that make the drivers for employees to adopt these tools are not properly understood, consequently leading to implementation failure due to the existence of certain barriers. Given this situation, and faced with theoretical advantages that these Web 2.0 collaboration tools seem to have for companies, managers and the scientific community are showing an increasing interest in answering the following question: Which factors contribute to the decision of the employees of a company to adopt the Web 2.0 tools for collaborative purposes? The answer is complex since these tools are relatively new in business environments. These tools allow us to move from an information Management approach to Knowledge Management. In order to answer this question, the chosen approach involves the application of technology adoption models, all of them based on the individual’s perception of the different aspects related to technology usage. From this perspective, this thesis’ main objective is to study the factors influencing the adoption of blogs and wikis in a company. This is done by using a unified and theoretical predictive model of technological adoption with a holistic approach that is based on literature of technological adoption models and the particularities that these tools presented under study and in a specific context. This theoretical model will allow us to determine the factors that predict the intended use of these tools and their real usage. The scientific research is structured in five parts: Introduction to the research subject, development of the theoretical framework, research work design, empirical analysis and drawing the final conclusions. This thesis develops the five aforementioned parts sequentially thorough seven chapters; part one (chapter one), part two (chapters two and three), part three (chapters four and five), parte four (chapters six) and finally part five (chapter seven). The first chapter is focused on the research problem statement and the objectives of the thesis, intended to be reached during the project. Likewise, the concept of collaboration and its link with the Web 2.0 collaborative tools is discussed as well as an introduction to the technology adoption models. Finally we explain the planning to carry out the research and get the proposed results. After introducing the research topic, the second chapter carries out a review of the evolution of the main existing technology adoption models (IDT, TRA, SCT, TPB, DTPB, C-TAM-TPB, UTAUT, UTAUT2), highlighting its foundations and factors used. Based on technology adoption models set out in chapter 2, the third chapter deals with the factors which have been discussed previously in chapter 2, but adapted to the context of Web 2.0 collaborative tools under study, blogs and wikis. In order to better understand the final model, the factors are grouped into four types: technological factors, control factors, social-normative factors and other specific factors related to the collaborative tools. The first part of chapter 4 covers the analysis of the factors which are more relevant to study the adoption of collaborative tools, and the second part proceeds with the theoretical model which specifies the relationship between the different factors taken into consideration. These relationships will become specific hypotheses that will be tested by the empirical study. Throughout chapter 5 we cover the characteristics of the empirical study used to test the research hypotheses which were set out in chapter 4. The nature of research is social, exploratory, and it is based on a quantitative empirical study whose analysis is carried out using multivariate analysis techniques. The second part of this chapter includes the description of the scales of the measuring instrument; the methodology for data gathering, the detailed analysis of the sample, and finally the existence of bias attributable to the measurement method, the "Bias Common Method" is checked. The first part of chapter 6 corresponds to the analysis of results. The statistical technique employed (PLS-SEM) is previously explained as a tool of multivariate analysis, capable of carrying out predictive analysis, and as the appropriate methodology used to validate the model in a two-stages analysis, the measurement model and the structural model. Futhermore, it is necessary to check the requirements to be met by the sample and the thresholds of the parameters taken into account. In the second part of chapter 6 an empirical analysis of the data is performed for the two samples, one for blogs and the other for wikis, in order to validate the research hypothesis proposed in chapter 4. Finally, in chapter 7 the fulfillment level of the objectives raised in chapter 1 is reviewed and the theoretical, methodological and practical conclusions derived from the results of the study are presented. Next, we cover the general conclusions, detailing for each group of factors including practical recommendations that can be drawn to guide implementation of these tools in real situations in companies. As a final part of the chapter the limitations of the study are included and a number of potential future researches suggested, along with research partial results which have been obtained thorough the research.
Resumo:
El software se ha convertido en el eje central del mundo actual, una compleja creación humana que influye en la vida, negocios y comunicación de todas las personas pertenecientes a la Sociedad de la Información. El rápido crecimiento experimentado en el ámbito del desarrollo software ha permitido la creación de avanzadas estructuras tecnológicas, denominadas “Sistemas Intensivos Software”, capaces de comunicarse con otros sistemas, dispositivos, sensores y personas. A lo largo de los próximos años los sistemas se enfrentarán a una mayor complejidad, surgida de la necesidad de operar en entornos de grandes dimensiones y de comportamientos no deterministas. Los métodos y herramientas actuales no son lo suficientemente potentes para diseñar, construir,implementar y mantener sistemas intensivos software con estas características, y detener la construcción de sistemas intensivos software o construir sistemas poco flexibles o fiables no es una alternativa real. En el desarrollo de “Sistemas Intensivos Software” pueden llegar a intervenir distintas entidades o compañías software que suelen estar en ubicaciones geográficas distintas y constituidas por grandes equipos de desarrollo, multidisciplinares e incluso multilingües. Debido a la criticidad del resultado de las actividades realizadas de forma independiente en el sistema resultante, éstas se han de controlar y monitorizar para asegurar la correcta integración de todos los elementos del sistema completo. El objetivo de este proyecto es la creación de una herramienta software para dar soporte a la gestión y monitorización de la construcción e integración de sistemas intensivos software, siendo extensible también a proyectos de otra índole. La herramienta resultante se denomina Positioning System, una aplicación web del tipo SPA (Single Page Application) creada con tecnología de última generación como el framework JavaScript AngularJS y tecnología de back-end como SlimPHP. Positioning System provee la funcionalidad necesaria para la creación de proyectos, familias y subfamilias de productos que constituyen los productos software de los proyectos creados, así como la gestión de socios comerciales y gestión de contactos de dichos proyectos. Todas estas funcionalidades son fácilmente monitorizadas y controladas por gráficos estadísticos generados para cada proyecto. ABSTRACT Software has become the backbone of today’s world, a complex human creation that has an important impact in the life, business and communication of all people involved with the Information Society. The quick growth that software development has undergone for last years has enabled the creation of advanced technological structures called “Software Intensive Systems”. They are able to communicate with other systems, devices, sensors and people. Next years, systems will face more complexity. It arises from the need of operating systems of large dimensions with non-deterministic behaviors. Current methods and tools are not powerful enough to design, build, implement and maintain software intensive systems; however stopping the development or developing unreliable and non-flexible systems is not a real alternative. Software Intensive Systems” development may involve different entities or software companies which may be in different geographical locations and may be constituted by large, multidisciplinary and even multilingual development teams. Due to the criticality of the result of each conducted activity, independently in the resulting system, these activities must be controlled and monitored to ensure the proper integration of all the elements within the complete system. The goal of this project is the creation of a software tool to support the management and monitoring of the construction and integration of software intensive systems, being possible to be extended to other kind of projects. The resultant tool is called Positioning System, a web application that follows the SPA (Single Page Application) style. It was created with the latest technologies, such as, the AngularJS framework and SlimPHP. The Positioning System provides the necessary features for the creation of projects, families and subfamilies of products that constitute the software products of the created projects, as well as the management of business partners and contacts of these projects. All these features are easily monitored and controlled by statistical graphs generated for each project.
Resumo:
Trabalho de Projeto apresentado à Escola Superior de Tecnologia do Instituto Politécnico de Castelo Branco para cumprimento dos requisitos necessários à obtenção do grau de Mestre em Desenvolvimento de Software e Sistemas Interactivos, realizada sob a orientação científica do Professor Doutor José Carlos Metrôlho, do Instituto Politécnico de Castelo Branco.
Resumo:
An important aspect in manufacturing design is the distribution of geometrical tolerances so that an assembly functions with given probability, while minimising the manufacturing cost. This requires a complex search over a multidimensional domain, much of which leads to infeasible solutions and which can have many local minima. As well, Monte-Carlo methods are often required to determine the probability that the assembly functions as designed. This paper describes a genetic algorithm for carrying out this search and successfully applies it to two specific mechanical designs, enabling comparisons of a new statistical tolerancing design method with existing methods. (C) 2003 Elsevier Ltd. All rights reserved.
Resumo:
In empirical studies of Evolutionary Algorithms, it is usually desirable to evaluate and compare algorithms using as many different parameter settings and test problems as possible, in border to have a clear and detailed picture of their performance. Unfortunately, the total number of experiments required may be very large, which often makes such research work computationally prohibitive. In this paper, the application of a statistical method called racing is proposed as a general-purpose tool to reduce the computational requirements of large-scale experimental studies in evolutionary algorithms. Experimental results are presented that show that racing typically requires only a small fraction of the cost of an exhaustive experimental study.
Resumo:
This book is aimed primarily at microbiologists who are undertaking research and who require a basic knowledge of statistics to analyse their experimental data. Computer software employing a wide range of data analysis methods is widely available to experimental scientists. The availability of this software, however, makes it essential that investigators understand the basic principles of statistics. Statistical analysis of data can be complex with many different methods of approach, each of which applies in a particular experimental circumstance. Hence, it is possible to apply an incorrect statistical method to data and to draw the wrong conclusions from an experiment. The purpose of this book, which has its origin in a series of articles published in the Society for Applied Microbiology journal ‘The Microbiologist’, is an attempt to present the basic logic of statistics as clearly as possible and therefore, to dispel some of the myths that often surround the subject. The 28 ‘Statnotes’ deal with various topics that are likely to be encountered, including the nature of variables, the comparison of means of two or more groups, non-parametric statistics, analysis of variance, correlating variables, and more complex methods such as multiple linear regression and principal components analysis. In each case, the relevant statistical method is illustrated with examples drawn from experiments in microbiological research. The text incorporates a glossary of the most commonly used statistical terms and there are two appendices designed to aid the investigator in the selection of the most appropriate test.
Resumo:
DEA literature continues apace but software has lagged behind. This session uses suitably selected data to present newly developed software which includes many of the most recent DEA models. The software enables the user to address a variety of issues not frequently found in existing DEA software such as: -Assessments under a variety of possible assumptions of returns to scale including NIRS and NDRS; -Scale elasticity computations; -Numerous Input/Output variables and truly unlimited number of assessment units (DMUs) -Panel data analysis -Analysis of categorical data (multiple categories) -Malmquist Index and its decompositions -Computations of Supper efficiency -Automated removal of super-efficient outliers under user-specified criteria; -Graphical presentation of results -Integrated statistical tests
Resumo:
The accurate identification of T-cell epitopes remains a principal goal of bioinformatics within immunology. As the immunogenicity of peptide epitopes is dependent on their binding to major histocompatibility complex (MHC) molecules, the prediction of binding affinity is a prerequisite to the reliable prediction of epitopes. The iterative self-consistent (ISC) partial-least-squares (PLS)-based additive method is a recently developed bioinformatic approach for predicting class II peptide−MHC binding affinity. The ISC−PLS method overcomes many of the conceptual difficulties inherent in the prediction of class II peptide−MHC affinity, such as the binding of a mixed population of peptide lengths due to the open-ended class II binding site. The method has applications in both the accurate prediction of class II epitopes and the manipulation of affinity for heteroclitic and competitor peptides. The method is applied here to six class II mouse alleles (I-Ab, I-Ad, I-Ak, I-As, I-Ed, and I-Ek) and included peptides up to 25 amino acids in length. A series of regression equations highlighting the quantitative contributions of individual amino acids at each peptide position was established. The initial model for each allele exhibited only moderate predictivity. Once the set of selected peptide subsequences had converged, the final models exhibited a satisfactory predictive power. Convergence was reached between the 4th and 17th iterations, and the leave-one-out cross-validation statistical terms - q2, SEP, and NC - ranged between 0.732 and 0.925, 0.418 and 0.816, and 1 and 6, respectively. The non-cross-validated statistical terms r2 and SEE ranged between 0.98 and 0.995 and 0.089 and 0.180, respectively. The peptides used in this study are available from the AntiJen database (http://www.jenner.ac.uk/AntiJen). The PLS method is available commercially in the SYBYL molecular modeling software package. The resulting models, which can be used for accurate T-cell epitope prediction, will be made freely available online (http://www.jenner.ac.uk/MHCPred).