974 resultados para Data Interpretation, Statistical
Resumo:
This paper is dedicated to modelling of network maintaining based on live example maintaining ATM banking network, where any problems are mean money loss. A full analysis is made in order to estimate valuable and not-valuable parameters based on complex analysis of available data. Correlation analysis helps to estimate provided data and to produce a complex solution of increasing network maintaining effectiveness.
Resumo:
2000 Mathematics Subject Classification: 62P10, 92C40
Resumo:
The representation of serial position in sequences is an important topic in a variety of cognitive areas including the domains of language, memory, and motor control. In the neuropsychological literature, serial position data have often been normalized across different lengths, and an improved procedure for this has recently been reported by Machtynger and Shallice (2009). Effects of length and a U-shaped normalized serial position curve have been criteria for identifying working memory deficits. We present simulations and analyses to illustrate some of the issues that arise when relating serial position data to specific theories. We show that critical distinctions are often difficult to make based on normalized data. We suggest that curves for different lengths are best presented in their raw form and that binomial regression can be used to answer specific questions about the effects of length, position, and linear or nonlinear shape that are critical to making theoretical distinctions. 2010 Psychology Press.
Resumo:
A Vilgbank az 1990-es vek vgn egy nagyszabs, nemzetkzi teljestmny-rtkelsi programot indtott a vz- s szennyvz-szolgltat vllalatok krben. Az International Benchmarking Network for Water and Sanitation Utilities (IBNET) elnevezs kezdemnyezs keretben a szolgltatk egy szabvnyostott krdven informcit adnak meg mkdskrl. Az egyedi, cgszint adatokbl egy adatbzis kszl, mely lehetv teszi a vllalatok mkdsnek sszehasonlt elemzst, amit teljestmny-rtkelsnek (benchmarking) is szoks nevezni. A programrl s eddigi eredmnyeirl angol nyelv informci a www.ib-net.org honlapon tallhat. A felmrst eddig tbb, mint 70 orszgban vgeztk el, kztk Magyarorszgon is. Itthon a REKK kapott megbzst a feladat vgrehajtsra. Az adatgyjtsen tl az adatbzisra alapozva Kzp s Kelet-Eurpa orszgainak vzikzm cgeirl statisztikai elemzst vgeztnk az alap adottsgok s a kltsgek sszefggsnek feltrsra.
Resumo:
A tanulmny arra a feltevsre pl, hogy minl ersebb a bizalomra mltsg szintje egy adott zleti kapcsolatban, annl inkbb igaz, hogy nagy kockzat tevkenysgek mennek vgbe benne. Ilyen esetekben a bizalomra mltsg a kapcsolatban zajl esemnyek, cselekvsek irnytsi eszkzv vlik, s az zleti kapcsolatban megjelenik a cselekvsi hajlandsgknt rtelmezett bizalom. A tanulmny felhvja a figyelmet a bizalom s a bizalomra mltsg fogalmai kztti klnbsgre, szisztematikus klnvlasztsuk fontossgra. Bemutatja az gynevezett diadikus adatelemzs gazdlkodstudomnyi alkalmazst. Empirikus eredmnyei is igazoljk, hogy ezzel a mdszerrel az zleti kapcsolatok trsas jellemzinek (kztk a bizalomnak) s a kzttk lv kapcsolatoknak mlyebb elemzsre nylik lehetsg. ____ The paper rests on the behavioral interpretation of trust, making a clear distinction between trustworthiness (honesty) and trust interpreted as willingness to engage in risky situations with specific partners. The hypothesis tested is that in a business relation marked by high levels of trustworthiness as perceived by the opposite parties, willingness to be involved in risky situations is higher than it is in relations where actors do not believe their partners to be highly trustworthy. Testing this hypothesis clearly calls for dyadic operationalization, measurement, and analysis. The authors present the first economic application of a newly developed statistical technique called dyadic data analysis, which has already been applied in social psychology. It clearly overcomes the problem of single-ended research in business relations analysis and allows a deeper understanding of any dyadic phenomenon, including trust/trustworthiness as a governance mechanism.
Resumo:
With the latest development in computer science, multivariate data analysis methods became increasingly popular among economists. Pattern recognition in complex economic data and empirical model construction can be more straightforward with proper application of modern softwares. However, despite the appealing simplicity of some popular software packages, the interpretation of data analysis results requires strong theoretical knowledge. This book aims at combining the development of both theoretical and applicationrelated data analysis knowledge. The text is designed for advanced level studies and assumes acquaintance with elementary statistical terms. After a brief introduction to selected mathematical concepts, the highlighting of selected model features is followed by a practice-oriented introduction to the interpretation of SPSS1 outputs for the described data analysis methods. Learning of data analysis is usually time-consuming and requires efforts, but with tenacity the learning process can bring about a significant improvement of individual data analysis skills.
Resumo:
This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as histogram binning inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. ^ Researchers in the field contended with such a dilemma for many years, resorting either to hardware approaches that are rather costly with inherent calibration and noise effects; or have developed software techniques based on filtering the binning effect but without successfully preserving the statistical content of the original data. ^ The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. ^ These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. ^ In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seem intricate, the concise nature of the derivations allow for an implementation procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation. ^
Resumo:
This dissertation develops a new figure of merit to measure the similarity (or dissimilarity) of Gaussian distributions through a novel concept that relates the Fisher distance to the percentage of data overlap. The derivations are expanded to provide a generalized mathematical platform for determining an optimal separating boundary of Gaussian distributions in multiple dimensions. Real-world data used for implementation and in carrying out feasibility studies were provided by Beckman-Coulter. It is noted that although the data used is flow cytometric in nature, the mathematics are general in their derivation to include other types of data as long as their statistical behavior approximate Gaussian distributions. ^ Because this new figure of merit is heavily based on the statistical nature of the data, a new filtering technique is introduced to accommodate for the accumulation process involved with histogram data. When data is accumulated into a frequency histogram, the data is inherently smoothed in a linear fashion, since an averaging effect is taking place as the histogram is generated. This new filtering scheme addresses data that is accumulated in the uneven resolution of the channels of the frequency histogram. ^ The qualitative interpretation of flow cytometric data is currently a time consuming and imprecise method for evaluating histogram data. This method offers a broader spectrum of capabilities in the analysis of histograms, since the figure of merit derived in this dissertation integrates within its mathematics both a measure of similarity and the percentage of overlap between the distributions under analysis. ^
Resumo:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^
Resumo:
Flow Cytometry analyzers have become trusted companions due to their ability to perform fast and accurate analyses of human blood. The aim of these analyses is to determine the possible existence of abnormalities in the blood that have been correlated with serious disease states, such as infectious mononucleosis, leukemia, and various cancers. Though these analyzers provide important feedback, it is always desired to improve the accuracy of the results. This is evidenced by the occurrences of misclassifications reported by some users of these devices. It is advantageous to provide a pattern interpretation framework that is able to provide better classification ability than is currently available. Toward this end, the purpose of this dissertation was to establish a feature extraction and pattern classification framework capable of providing improved accuracy for detecting specific hematological abnormalities in flow cytometric blood data. ^ This involved extracting a unique and powerful set of shift-invariant statistical features from the multi-dimensional flow cytometry data and then using these features as inputs to a pattern classification engine composed of an artificial neural network (ANN). The contribution of this method consisted of developing a descriptor matrix that can be used to reliably assess if a donors blood pattern exhibits a clinically abnormal level of variant lymphocytes, which are blood cells that are potentially indicative of disorders such as leukemia and infectious mononucleosis. ^ This study showed that the set of shift-and-rotation-invariant statistical features extracted from the eigensystem of the flow cytometric data pattern performs better than other commonly-used features in this type of disease detection, exhibiting an accuracy of 80.7%, a sensitivity of 72.3%, and a specificity of 89.2%. This performance represents a major improvement for this type of hematological classifier, which has historically been plagued by poor performance, with accuracies as low as 60% in some cases. This research ultimately shows that an improved feature space was developed that can deliver improved performance for the detection of variant lymphocytes in human blood, thus providing significant utility in the realm of suspect flagging algorithms for the detection of blood-related diseases.^
Resumo:
Stable isotope analysis has become a standard ecological tool for elucidating feeding relationships of organisms and determining food web structure and connectivity. There remain important questions concerning rates at which stable isotope values are incorporated into tissues (turnover rates) and the change in isotope value between a tissue and a food source (discrimination values). These gaps in our understanding necessitate experimental studies to adequately interpret field data. Tissue turnover rates and discrimination values vary among species and have been investigated in a broad array of taxa. However, little attention has been paid to ectothermic top predators in this regard. We quantified the turnover rates and discrimination values for three tissues (scutes, red blood cells, and plasma) in American alligators (Alligator mississippiensis). Plasma turned over faster than scutes or red blood cells, but turnover rates of all three tissues were very slow in comparison to those in endothermic species. Alligator 15N discrimination values were surprisingly low in comparison to those of other top predators and varied between experimental and control alligators. The variability of 15N discrimination values highlights the difficulties in using 15N to assign absolute and possibly even relative trophic levels in field studies. Our results suggest that interpreting stable isotope data based on parameter estimates from other species can be problematic and that large ectothermic tetrapod tissues may be characterized by unique stable isotope dynamics relative to species occupying lower trophic levels and endothermic tetrapods.
Resumo:
This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as "histogram binning" inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. Researchers in the field contended with such a dilemma for many years, resorting either to hardware approaches that are rather costly with inherent calibration and noise effects; or have developed software techniques based on filtering the binning effect but without successfully preserving the statistical content of the original data. The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seem intricate, the concise nature of the derivations allow for an implementation procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation.
Resumo:
<p>Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.</p><p>Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.</p>
Resumo:
A certain type of bacterial inclusion, known as a bacterial microcompartment, was recently identified and imaged through cryo-electron tomography. A reconstructed 3D object from single-axis limited angle tilt-series cryo-electron tomography contains missing regions and this problem is known as the missing wedge problem. Due to missing regions on the reconstructed images, analyzing their 3D structures is a challenging problem. The existing methods overcome this problem by aligning and averaging several similar shaped objects. These schemes work well if the objects are symmetric and several objects with almost similar shapes and sizes are available. Since the bacterial inclusions studied here are not symmetric, are deformed, and show a wide range of shapes and sizes, the existing approaches are not appropriate. This research develops new statistical methods for analyzing geometric properties, such as volume, symmetry, aspect ratio, polyhedral structures etc., of these bacterial inclusions in presence of missing data. These methods work with deformed and non-symmetric varied shaped objects and do not necessitate multiple objects for handling the missing wedge problem. The developed methods and contributions include: (a) an improved method for manual image segmentation, (b) a new approach to 'complete' the segmented and reconstructed incomplete 3D images, (c) a polyhedral structural distance model to predict the polyhedral shapes of these microstructures, (d) a new shape descriptor for polyhedral shapes, named as polyhedron profile statistic, and (e) the Bayes classifier, linear discriminant analysis and support vector machine based classifiers for supervised incomplete polyhedral shape classification. Finally, the predicted 3D shapes for these bacterial microstructures belong to the Johnson solids family, and these shapes along with their other geometric properties are important for better understanding of their chemical and biological characteristics.
Resumo:
Dynamic positron emission tomography (PET) imaging can be used to track the distribution of injected radio-labelled molecules over time in vivo. This is a powerful technique, which provides researchers and clinicians the opportunity to study the status of healthy and pathological tissue by examining how it processes substances of interest. Widely used tracers include 18F-uorodeoxyglucose, an analog of glucose, which is used as the radiotracer in over ninety percent of PET scans. This radiotracer provides a way of quantifying the distribution of glucose utilisation in vivo. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue function. As the residue represents the amount of tracer remaining in the tissue, this can be thought of as a survival function; these functions been examined in great detail by the statistics community. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as ow, ux and volume of distribution. This thesis presents a Markov chain formulation of blood tissue exchange and explores how this relates to established compartmental forms. A nonparametric approach to the estimation of the residue is examined and the improvement in this model relative to compartmental model is evaluated using simulations and cross-validation techniques. The reference distribution of the test statistics, generated in comparing the models, is also studied. We explore these models further with simulated studies and an FDG-PET dataset from subjects with gliomas, which has previously been analysed with compartmental modelling. We also consider the performance of a recently proposed mixture modelling technique in this study.