935 results for complex data


Relevance:

30.00%

Publisher:

Abstract:

Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures, since anyone with a database and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure, together with knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be treated as suspect, as probably not indicating the presence of significant variables. A further problem relates to sample size: it is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study [5]. This advice should be taken only as a rough guide, but it does indicate that the variables included should be selected with great care, since inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
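
A minimal sketch of how the two rules of thumb above (at least 5-10 subjects per variable, and R squared above 50%) might be checked before trusting a fitted model; the thresholds, function name and data layout are illustrative, not prescribed by the text:

    import numpy as np
    import statsmodels.api as sm

    def check_regression(X, y, min_ratio=10, min_r2=0.5):
        """Fit OLS and flag the two pitfalls discussed above."""
        n, p = X.shape
        model = sm.OLS(y, sm.add_constant(X)).fit()
        warnings = []
        if n < min_ratio * p:
            warnings.append(f"only {n} subjects for {p} variables "
                            f"(rule of thumb: {min_ratio} per variable)")
        if model.rsquared < min_r2:
            warnings.append(f"R^2 = {model.rsquared:.2f} is below {min_r2}")
        return model, warnings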

Relevance:

30.00%

Publisher:

Abstract:

The key to the correct application of ANOVA is careful experimental design and matching the correct analysis to that design. The following points should therefore be considered before designing any experiment (a sketch of point 5 follows the list):

1. In a single-factor design, ensure that the factor is identified as a 'fixed' or 'random effect' factor.
2. In more complex designs with more than one factor, there may be a mixture of fixed and random effect factors present, so ensure that each factor is clearly identified.
3. Where replicates can be grouped or blocked, the advantages of a randomised blocks design should be considered. There should be evidence, however, that blocking can sufficiently reduce the error variation to counter the loss of DF compared with a randomised design.
4. Where different treatments are applied sequentially to a patient, the advantages of a three-way design in which the different orders of the treatments are included as an 'effect' should be considered.
5. Combining different factors to make a more efficient experiment and to measure possible factor interactions should always be considered.
6. The effect of 'internal replication' should be taken into account in a factorial design when deciding the number of replications to be used. Where possible, each error term of the ANOVA should have at least 15 DF.
7. Consider carefully whether a particular factorial design can be considered a split-plot or a repeated-measures design. If such a design is appropriate, consider how to continue the analysis, bearing in mind the problem of using post hoc tests in this situation.
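
As referenced above, a hedged sketch of point 5: a two-factor factorial analysed with statsmodels, using synthetic data since the text prescribes none; both factors are treated as fixed effects and the interaction term is estimated explicitly.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # Synthetic balanced two-factor factorial: 3 x 4 levels, 5 replicates.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "treatment": np.repeat(["A", "B", "C"], 20),
        "dose": np.tile(["d1", "d2", "d3", "d4"], 15),
        "y": rng.normal(size=60),
    })

    # Fixed-effects model with the treatment x dose interaction (point 5).
    model = ols("y ~ C(treatment) * C(dose)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))  # residual has 48 DF (point 6)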

Relevance:

30.00%

Publisher:

Abstract:

1. The techniques associated with regression, whether linear or non-linear, are some of the most useful statistical procedures that can be applied in clinical studies in optometry.
2. In some cases there may be no scientific model of the relationship between X and Y that can be specified in advance, and the objective may be to provide a ‘curve of best fit’ for predictive purposes. In such cases, fitting a general polynomial-type curve may be the best approach.
3. An investigator may have a specific model in mind that relates Y to X, and the data may provide a test of this hypothesis. Some of these curves can be reduced to a linear regression by transformation, e.g., the exponential and negative exponential decay curves (see the sketch after this list).
4. In some circumstances, e.g., the asymptotic curve or logistic growth law, a more complex process of curve fitting involving non-linear estimation will be required.
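
As referenced in point 3, a hedged sketch contrasting the two routes for an exponential decay y = a*exp(-b*x): linearisation by taking logs versus the direct non-linear estimation of point 4; the data are synthetic and all parameter values are illustrative.

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(1)
    x = np.linspace(0.1, 5.0, 50)
    y = 3.0 * np.exp(-0.8 * x) * rng.lognormal(sigma=0.05, size=x.size)

    # Point 3: reduce to a linear regression of log(y) on x.
    slope, intercept = np.polyfit(x, np.log(y), 1)
    a_lin, b_lin = np.exp(intercept), -slope

    # Point 4: direct non-linear least squares on the original scale.
    (a_nl, b_nl), _ = curve_fit(lambda x, a, b: a * np.exp(-b * x),
                                x, y, p0=(1.0, 1.0))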

Relevance:

30.00%

Publisher:

Abstract:

In this paper we present a novel method for emulating a stochastic (random-output) computer model and show its application to a complex rabies model. The method is evaluated, in terms of both accuracy and computational efficiency, on synthetic data and on the rabies model. We address the issue of experimental design and provide empirical evidence on the effectiveness of utilizing replicate model evaluations compared with a space-filling design. We employ the Mahalanobis error measure to validate the heteroscedastic Gaussian process based emulator predictions for both the mean and (co)variance. The emulator allows efficient screening to identify important model inputs and a better understanding of the complex behaviour of the rabies model.
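
In its standard form the Mahalanobis error measure referred to here is D^2 = (y - mu)^T Sigma^{-1} (y - mu), with mu and Sigma the emulator's predictive mean and covariance at the validation inputs and y the corresponding simulator outputs; a minimal numpy sketch, with all arrays assumed to come from a fitted emulator:

    import numpy as np

    def mahalanobis_error(y, mean, cov):
        """Mahalanobis distance between held-out simulator outputs y
        and the emulator's predictive mean/covariance at those inputs."""
        resid = y - mean
        # Solve rather than explicitly invert, for numerical stability.
        return float(resid @ np.linalg.solve(cov, resid))

For a well-calibrated emulator this statistic should be comparable to its reference distribution, roughly chi-squared with one degree of freedom per validation point.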

Relevance:

30.00%

Publisher:

Abstract:

Recent advances in technology have produced a significant increase in the availability of free sensor data over the Internet. With affordable weather monitoring stations now available to individual meteorology enthusiasts, a reservoir of real-time data such as temperature, rainfall and wind speed can now be obtained for most of the United States and Europe. Despite the abundance of available data, obtaining usable information about the weather in your local neighbourhood requires complex processing that poses several challenges. This paper discusses a collection of technologies and applications that harvest, refine and process this data, culminating in information that has been tailored to the user. In this case we are particularly interested in allowing a user to make direct queries about the weather at any location, even when that location is not directly instrumented, using interpolation methods. We also consider how the uncertainty that the interpolation introduces can be communicated to the user of the system, using UncertML, a developing standard for uncertainty representation.
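
The abstract does not commit to a particular interpolation method; as a hedged illustration, inverse-distance weighting is among the simplest ways to estimate a reading at an uninstrumented location from nearby stations (the function name and power parameter are illustrative):

    import numpy as np

    def idw(query, station_coords, station_values, power=2.0):
        """Inverse-distance-weighted estimate at `query` (e.g. lon/lat),
        given station coordinates (n, 2) and their readings (n,)."""
        d = np.linalg.norm(station_coords - query, axis=1)
        if np.any(d == 0):                  # query coincides with a station
            return float(station_values[d == 0][0])
        w = 1.0 / d**power
        return float(w @ station_values / w.sum())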

Relevance:

30.00%

Publisher:

Abstract:

Tonal, textural and contextual properties are used in the manual photointerpretation of remotely sensed data. This study used these three attributes to produce a lithological map of semi-arid northwest Argentina by semi-automatic computer classification of remotely sensed data. Three different types of satellite data were investigated: LANDSAT MSS, LANDSAT TM and SIR-A imagery.

Supervised classification using tonal features only produced poor results. LANDSAT MSS produced classification accuracies in the range of 40 to 60%, while accuracies of 50 to 70% were achieved using LANDSAT TM data. The addition of SIR-A data increased the classification accuracy. The improvement of TM over MSS arises from the better discrimination of geological materials afforded by the middle-infrared bands of the TM sensor. The maximum likelihood classifier consistently produced classification accuracies 10 to 15% higher than either the minimum distance to means or the decision tree classifier; this improved accuracy was obtained at the cost of greatly increased processing time. A new type of classifier, the spectral shape classifier, which is computationally as fast as the minimum distance to means classifier, is described. However, the results for this classifier were disappointing, being lower in most cases than those of the minimum distance or decision tree procedures.

Because the classification results using only tonal features were unacceptably poor, textural attributes were investigated. Texture is an important attribute used by photogeologists to discriminate lithology. In the case of TM data, texture measures were found to increase the classification accuracy by up to 15%; in the case of the LANDSAT MSS data, however, they did not provide any significant increase in accuracy. For TM data, second-order texture, especially SGLDM-based measures, produced the highest classification accuracy.

Contextual post-processing was found to increase classification accuracy and to improve the visual appearance of classified output by removing the isolated misclassified pixels which tend to clutter classified images. Simple contextual features, such as mode filters, were found to outperform more complex features such as the gravitational filter or minimal-area replacement methods. Generally, the larger the filter, the greater the increase in accuracy. Finally, production rules were used to build a knowledge-based system which used tonal and textural features to identify sedimentary lithologies in each of the two test sites. The knowledge-based system was able to identify six out of ten lithologies correctly.
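
A hedged sketch of the simplest classifier compared above, minimum distance to means: each pixel, represented as a vector of band values, is assigned to the lithology class whose mean spectral vector is nearest; the array shapes are assumptions for illustration.

    import numpy as np

    def min_distance_classify(pixels, class_means):
        """Assign each pixel to the nearest class mean.
        pixels: (n, bands) spectral vectors; class_means: (k, bands)."""
        d = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :],
                           axis=2)              # (n, k) distances
        return d.argmin(axis=1)                 # class index per pixel

The maximum likelihood classifier mentioned above additionally models a covariance matrix per class, which is why it is more accurate but much slower.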

Relevance:

30.00%

Publisher:

Abstract:

This thesis studied the effect of (i) the number of grating components and (ii) parameter randomisation on root-mean-square (r.m.s.) contrast sensitivity and spatial integration. The effectiveness of spatial integration without external spatial noise depended on the number of equally spaced orientation components in the sum of gratings. The critical area marking the saturation of spatial integration was found to decrease as the number of components increased from 1 to 5-6, but to increase again at 8-16 components. The critical area behaved similarly as a function of the number of grating components when stimuli consisted of 3, 6 or 16 components with different orientations and/or phases embedded in spatial noise. Spatial integration thus seemed to depend on the global Fourier structure of the stimulus. Spatial integration was similar for sums of two vertical cosine or sine gratings with various Michelson contrasts in noise. The critical area for a grating sum was found to be a sum of the logarithmic critical areas for the component gratings, weighted by their relative Michelson contrasts.

The human visual system was modelled as a simple image processor in which the visual stimulus is first low-pass filtered by the optical modulation transfer function of the human eye and then high-pass filtered, up to the spatial cut-off frequency determined by the lowest neural sampling density, by the neural modulation transfer function of the visual pathways. Internal noise is then added before signal interpretation occurs in the brain. Detection is mediated by a local, spatially windowed matched filter. The model was extended to include complex stimuli and was found to describe the data successfully. The shape of the spatial integration function was similar for non-randomised and randomised simple and complex gratings. However, orientation and/or phase randomisation reduced r.m.s. contrast sensitivity by a factor of 2. The effect of parameter randomisation on spatial integration was modelled under the assumption that human observers change strategy from cross-correlation (i.e., a matched filter) to auto-correlation detection when uncertainty is introduced into the task. The model described the data accurately.
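
A hedged one-dimensional sketch of the two observer strategies contrasted above: the matched filter cross-correlates the noisy stimulus with a stored template, whereas under parameter uncertainty the observer is assumed to fall back on correlating the stimulus with itself; the grating frequency, noise level and decision variables are illustrative.

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0, 1, 256)
    template = np.sin(2 * np.pi * 4 * x)                       # known grating
    stimulus = template + rng.normal(scale=0.5, size=x.size)   # plus noise

    r_cross = stimulus @ template    # cross-correlation (matched filter)
    r_auto = stimulus @ stimulus     # auto-correlation (uncertain observer)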

Relevance:

30.00%

Publisher:

Abstract:

The Roma population has become a highly debated policy issue in the European Union (EU). The EU acknowledges that this ethnic minority faces extreme poverty and complex social and economic problems: 52% of the Roma population live in extreme poverty and 75% in poverty (Soros Foundation, 2007, p. 8), with a life expectancy at birth about ten years less than that of the majority population. As a result, Romania has received a great deal of policy attention and EU funding, being eligible for 19.7 billion Euros from the EU for 2007-2013. Yet progress is slow, and it is debated whether Romania's government and companies were capable of using these funds (EurActiv.ro, 2012). Analysing three case studies, this research looks at policy implementation in relation to the role of Roma networks in different geographical regions of Romania. It gives insights into how to get things done in complex settings and explains responses to the Roma problem as a 'wicked' policy issue. This longitudinal research was conducted between 2008 and 2011, comprising 86 semi-structured interviews, 15 observations and documentary sources, and using a purposive sample focused on the institutions responsible for implementing social policies for Roma: Public Health Departments, School Inspectorates, City Halls, Prefectures, and NGOs. Respondents included governmental workers, academics, Roma school mediators, Roma health mediators, Roma experts, Roma Councillors, NGO workers, and Roma service users. By triangulating the data collected with various methods and across various categories of respondents, a comprehensive and precise representation of Roma network practices was created. The provisions of the 2001 'Governmental Strategy to Improve the Situation of the Roma Population' facilitated the formation of a Roma network by introducing special jobs in local and central administration. Resources, people, skills and practices varied between counties. In contrast to the communist period, a new Roma elite emerged: social entrepreneurs set the pace of change by creating either closed cliques or open alliances and by using more or less transparent practices. This research deploys the concept of social/institutional entrepreneurs to analyse how key actors influence clique and alliance formation and functioning. Significantly, by contrasting three case studies, it shows that both closed cliques and open alliances can help to achieve public policy network objectives, but that closed cliques can also lead to failure to improve the health and education of Roma people in a given region.

Relevance:

30.00%

Publisher:

Abstract:

We present experimental studies and numerical modeling, based on a combination of the Bidirectional Beam Propagation Method and Finite Element Modeling, that completely describe the wavelength spectra of point-by-point femtosecond laser inscribed fiber Bragg gratings, showing excellent agreement with experiment. We have investigated the dependence of different spectral parameters, such as insertion loss and all dominant cladding and ghost modes and their shape, on the position of the fiber Bragg grating in the core of the fiber. Our model is validated by comparing its predictions with experimental data and allows for predictive modeling of the gratings. We expand our analysis to more complicated structures in which we introduce symmetry breaking; this highlights the importance of centered gratings and how maintaining symmetry contributes to the overall spectral quality of the inscribed Bragg gratings. Finally, the numerical modeling is applied to superstructure gratings, and a comparison with experimental results reveals a capability for dealing with complex grating structures that can be designed with particular wavelength characteristics.
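
The abstract quotes no formulas; as standard background, the resonant wavelength of a fiber Bragg grating follows the Bragg condition lambda_B = 2 * n_eff * Lambda, so a one-line sketch of the wavelength targeted by a given point-by-point inscription period (all values illustrative, not taken from the paper):

    n_eff = 1.4485        # effective core index, illustrative value
    period_um = 0.535     # point-by-point inscription period, micrometres
    print(f"Bragg wavelength ~ {2 * n_eff * period_um:.3f} um")  # ~1.550 um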

Relevance:

30.00%

Publisher:

Abstract:

Exploratory analysis of data seeks to find common patterns, to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means of gaining insight into the complicated processes making up a petroleum system. Typically, linear visualisation methods like principal components analysis, linked plots or brushing are used. These methods cannot be employed directly when dealing with missing data, and they struggle to capture global non-linear structures in the data, although they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialising the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and to fit complex non-linear structure like the Swiss roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation of missing values. Additionally, an extensive benchmark study of the missing-data imputation capabilities of GTM is performed. Further, a novel approach, based on missing data, is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
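
A hedged sketch of the core of GTM in its generic textbook form (not the thesis's specific extensions): a grid of latent points is mapped into data space, and each data point is softly assigned to the grid via posterior responsibilities under an isotropic Gaussian noise model.

    import numpy as np

    def gtm_responsibilities(X, Y, beta):
        """Posterior responsibility of each latent grid point for each
        data point.  X: (n, d) data; Y: (k, d) images of the latent grid
        in data space; beta: inverse noise variance."""
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        log_p = -0.5 * beta * d2
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stabilisation
        R = np.exp(log_p)
        return R / R.sum(axis=1, keepdims=True)

Visualisation then plots each data point at the responsibility-weighted average of the latent grid coordinates.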

Relevance:

30.00%

Publisher:

Abstract:

Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.
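
A hedged sketch of the missing-value replacement described in the final sentence, in a generic GTM formulation: responsibilities are computed from the observed components only, and each missing entry is filled with the responsibility-weighted prediction of the latent centres (names and shapes are illustrative).

    import numpy as np

    def gtm_impute(x, Y, beta):
        """Impute NaN entries of one sample x (d,) using a fitted GTM with
        latent-centre images Y (k, d) and noise precision beta."""
        obs = ~np.isnan(x)
        d2 = ((Y[:, obs] - x[obs]) ** 2).sum(axis=1)  # observed dims only
        r = np.exp(-0.5 * beta * (d2 - d2.min()))     # stabilised weights
        r /= r.sum()
        filled = x.copy()
        filled[~obs] = r @ Y[:, ~obs]                 # posterior-mean fill
        return filled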

Relevance:

30.00%

Publisher:

Abstract:

Most of the gemcitabine (dFdC)-resistant cell lines manifested high NF-κB activity, and NF-κB activity could be induced by dFdC and 5-FU exposure. The chemosensitizing effect on the chemoresistant cell lines of disulfiram (DS), an anti-alcoholism drug and NF-κB inhibitor, in combination with copper (Cu) was examined. The DS/Cu complex significantly enhanced the cytotoxicity of dFdC (resistant cells: 12.2–1085-fold) and completely reversed the dFdC resistance in the resistant cell lines. The dFdC-induced NF-κB activity was markedly inhibited by the DS/Cu complex. The data from this study indicate that DS may be used in the clinic to improve the therapeutic effect of dFdC in breast and colon cancer patients.

Relevance:

30.00%

Publisher:

Abstract:

This dissertation investigates the very important and current problem of modelling human expertise, an issue that arises in any computer system emulating human decision making. It is prominent in Clinical Decision Support Systems (CDSS) because of the complexity of the induction process and, in most cases, the vast number of parameters. Other issues, such as human error and missing or incomplete data, present further challenges. In this thesis, the Galatean Risk Screening Tool (GRiST) is used as an example of modelling clinical expertise and parameter elicitation. The tool is a mental health clinical record management system with a top layer of decision support capabilities. It is currently being deployed by several NHS mental health trusts across the UK. The aim of the research is to investigate the problem of parameter elicitation by inducing the parameters from real clinical data rather than from the human experts who provided the decision model. The induced parameters provide insight both into the data relationships and into how experts make decisions themselves. The outcomes help further the understanding of human decision making and, in particular, help GRiST provide more accurate emulations of risk judgements. Although the algorithms and methods presented in this dissertation are applied to GRiST, they can be adopted for other human knowledge engineering domains.

Relevance:

30.00%

Publisher:

Abstract:

Indicators which summarise the characteristics of spatiotemporal data coverages significantly simplify quality evaluation, decision making and justification processes, by providing a number of quality cues that are easy to manage and by avoiding information overload. Criteria which are commonly prioritised in evaluating spatial data quality and assessing a dataset’s fitness for use include lineage, completeness, logical consistency, positional accuracy, and temporal and attribute accuracy. However, user requirements may go far beyond these broadly accepted spatial quality metrics, to incorporate specific and complex factors which are less easily measured. This paper discusses the results of a study of high-level user requirements in geospatial data selection and data quality evaluation. It reports on the geospatial data quality indicators which were identified as user priorities, and which can potentially be standardised to enable intercomparison of datasets against user requirements. We briefly describe the implications for tools and standards to support the communication and intercomparison of data quality, and the ways in which these can contribute to the generation of a GEO label.
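
Purely as an illustration (the paper defines no schema), the commonly prioritised criteria listed above could be carried as a simple per-dataset record to support standardised intercomparison; every field name here is hypothetical.

    from dataclasses import dataclass

    @dataclass
    class QualityIndicators:
        """Hypothetical summary record for one spatiotemporal dataset."""
        lineage: str                  # provenance description
        completeness: float           # fraction of expected records present
        logical_consistency: bool     # passes topology/schema checks
        positional_accuracy_m: float  # RMS positional error, metres
        temporal_accuracy_s: float    # timestamp error, seconds
        attribute_accuracy: float     # fraction of attributes verified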