5 results for Linguistic homogeneity

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Abstract:

Background & aims: The boundaries between the body composition categories provided by bioelectrical impedance vector analysis (BIVA) are not well defined. In this paper, fuzzy set theory was used to model this uncertainty. Methods: An Italian database of 179 cases aged 18-70 years was randomly divided into a development sample (n = 20) and a testing sample (n = 159). Of the 159 records in the testing sample, 99 had an unequivocal diagnosis. Resistance/height and reactance/height were the input variables of the model. The output variables were the seven body composition categories of vector analysis. For each case, the linguistic model estimated the membership degree of each impedance category. Kappa statistics were used to compare these results with the previously established diagnoses. This required singling out one of the seven categories in the output set of membership degrees. This procedure (the defuzzification rule) took the category with the highest membership degree as the most likely category for the case. Results: The fuzzy model showed a good fit to the development sample. Excellent agreement was achieved between the defuzzified impedance diagnoses and the clinical diagnoses in the testing sample (Kappa = 0.85, p < 0.001). Conclusions: The fuzzy linguistic model was in good agreement with clinical diagnoses. If the whole model output is considered, information on the extent to which each BIVA category is present can better inform clinical practice, with an enlarged nosological framework and diverse therapeutic strategies. (C) 2012 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
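The defuzzification and agreement steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the fuzzy linguistic model itself is not reproduced, so the membership degrees, the category indices 0-6 and the clinical labels below are all hypothetical. The sketch applies the max-membership rule to each case and then computes Cohen's kappa against clinical diagnoses.

```python
from collections import Counter


def defuzzify(memberships):
    """Max-membership rule: index of the category with the highest
    membership degree (ties go to the first such index)."""
    return max(range(len(memberships)), key=lambda k: memberships[k])


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    # Undefined (division by zero) when expected == 1.
    return (observed - expected) / (1 - expected)


# Hypothetical membership degrees over seven categories for three cases;
# in the paper these would come from the fuzzy linguistic model.
memberships = [
    [0.0, 0.1, 0.7, 0.2, 0.0, 0.0, 0.0],
    [0.6, 0.3, 0.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.0, 0.5, 0.3],
]
predicted = [defuzzify(m) for m in memberships]  # [2, 0, 5]
clinical = [2, 0, 4]  # hypothetical clinical diagnoses
kappa = cohen_kappa(predicted, clinical)
```

Here two of three cases agree, and the chance-corrected kappa works out to 4/7. Note that defuzzifying discards the rest of each membership vector, which is exactly the information the conclusions suggest keeping.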

Abstract:

Background: The evaluation of associations between genotypes and diseases in a case-control framework plays an important role in genetic epidemiology. This paper focuses on evaluating the homogeneity of both genotypic and allelic frequencies. The traditional test used to check allelic homogeneity is known to be valid only under Hardy-Weinberg equilibrium, a property that may not hold in practice. Results: We first describe the flaws of the traditional (chi-squared) tests for both allelic and genotypic homogeneity. Besides the known problem of the allelic procedure, we show that whenever these tests are used, an incoherence may arise: sometimes the genotypic homogeneity hypothesis is not rejected, but the allelic hypothesis is. As we argue, this is logically impossible: allelic frequencies are functions of genotypic frequencies, so genotypic homogeneity implies allelic homogeneity. Some recently proposed methods implicitly assume that this incoherence does not occur. In an attempt to correct it, we describe an alternative frequentist approach that is appropriate even when Hardy-Weinberg equilibrium does not hold. We then show that the problem remains and is intrinsic to frequentist procedures. Finally, we introduce the Full Bayesian Significance Test to test both hypotheses and prove that the incoherence cannot happen with these new tests. To illustrate this, all five tests are applied to real and simulated datasets. Using power analysis, we show that the Bayesian method is comparable to the frequentist one and has the advantage of being coherent. Conclusions: Contrary to more traditional approaches, the Full Bayesian Significance Test for association studies provides a simple, coherent and powerful tool for detecting associations.
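The two traditional chi-squared tests that the abstract criticizes can be sketched with the standard library alone. This is an illustrative sketch, not the paper's alternative frequentist test or its Full Bayesian Significance Test. It assumes genotype counts are given as hypothetical (n_AA, n_Aa, n_aa) tuples for cases and controls, and it uses the closed-form chi-squared tail probabilities for 1 and 2 degrees of freedom. The allelic test counts each person's two alleles as independent draws, which is why its validity rests on Hardy-Weinberg equilibrium.

```python
import math


def chi2_stat(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).
    Assumes no row or column total is zero."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def genotypic_test(cases, controls):
    """2x3 genotype table (AA, Aa, aa), 2 degrees of freedom."""
    stat = chi2_stat([list(cases), list(controls)])
    return stat, chi2_sf(stat, 2)


def allelic_test(cases, controls):
    """2x2 allele table built from 2N alleles, 1 degree of freedom.
    Treating alleles as independent presumes Hardy-Weinberg equilibrium."""
    def alleles(g):
        n_hom_major, n_het, n_hom_minor = g
        return [2 * n_hom_major + n_het, n_het + 2 * n_hom_minor]
    stat = chi2_stat([alleles(cases), alleles(controls)])
    return stat, chi2_sf(stat, 1)
```

Running both functions on the same counts and comparing their p-values at a common level is how the incoherence described above would show up: a non-rejection from `genotypic_test` paired with a rejection from `allelic_test`.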

Abstract:

The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon, and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. The texts are first encoded according to some basic rhythmic features of the sentences that can be retrieved automatically. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, a constant-free procedure for model selection. As a by-product, this provides a solution to the problem of optimally choosing the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion as the sample size diverges, we also carry out a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample assembled for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
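The role of the penalty constant can be illustrated with a small sketch. This is not the smallest maximizer criterion, which is not reproduced here. It is a BIC-type selection among a few explicitly listed context trees, scored as log-likelihood minus c(|A| - 1)|T| log n, where c is the constant whose choice the criterion is meant to avoid. The binary encoding, the candidate trees and the function names are hypothetical. Real variable length Markov chain fitting searches over all trees, for example by pruning a maximal context tree, rather than comparing a fixed short list.

```python
import math
from collections import Counter


def context_of(past, tree):
    """Return the context in `tree` that is a suffix of `past`.
    Contexts are tuples with the most recent symbol last."""
    for w in tree:
        if len(w) <= len(past) and past[len(past) - len(w):] == w:
            return w
    raise ValueError("tree does not cover this past")


def log_likelihood(seq, tree, start):
    """Maximized log-likelihood of seq[start:] given a context tree."""
    counts = Counter()
    for i in range(start, len(seq)):
        ctx = context_of(tuple(seq[i - start:i]), tree)
        counts[(ctx, seq[i])] += 1
    totals = Counter()
    for (ctx, _), n in counts.items():
        totals[ctx] += n
    return sum(n * math.log(n / totals[ctx]) for (ctx, _), n in counts.items())


def penalized_score(seq, tree, alphabet_size, c, start):
    """BIC-type score to maximize: log L - c * (|A| - 1) * |T| * log n."""
    n = len(seq) - start
    penalty = c * (alphabet_size - 1) * len(tree) * math.log(n)
    return log_likelihood(seq, tree, start) - penalty


def select_tree(seq, candidates, alphabet_size, c):
    """Best candidate tree; all are scored on the same observations."""
    start = max(len(w) for t in candidates for w in t)
    return max(candidates,
               key=lambda t: penalized_score(seq, t, alphabet_size, c, start))


order0 = frozenset({()})
order1 = frozenset({("0",), ("1",)})
seq = "01" * 50  # hypothetical encoded text
```

For this strictly alternating sequence, `select_tree(seq, [order0, order1], 2, 1.0)` returns the order-1 tree, while a very large constant such as `c = 100.0` returns the order-0 tree. The selected model depends on c, which is the sensitivity a constant-free criterion is designed to remove.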

Abstract:

This work assessed the homogeneity of the climate series from the weather station of the Institute of Astronomy, Geophysics and Atmospheric Sciences (IAG), using various statistical techniques. The record from this target station is one of the longest in Brazil: precipitation observations began in 1933, and temperature and other variables were added in 1936. It is therefore one of the few stations in Brazil with enough data for studies of long-term climate variability and climate change. There is, however, a possibility that its data have been contaminated by artifacts over time. In particular, the instruments were replaced in 1958, and the size of the impact of this intervention has not yet been evaluated. Over time, the station also changed from rural to urban surroundings; this may have affected the homogeneity of the observations and makes the station less representative for climate studies over larger spatial scales. Homogeneity of the target station was assessed on an annual scale with both absolute (single-station) tests and tests relative to the regional climate, for daily precipitation, relative humidity, and maximum (TMax), minimum (TMin), and wet-bulb temperatures. Among these quantities, only precipitation does not exhibit any inhomogeneity. A clear signal of the 1958 instrument change was detected in the TMax and relative humidity data, the latter certainly because of its strong dependence on temperature. This signal is less clear in TMin, which nonetheless presents non-climatic discontinuities around 1953 and around 1970. A significant homogeneity break is found around 1990 for TMax and wet-bulb temperature. The discontinuities detected after 1958 may have been caused by urbanization, as the observed warming trend at the station is considerably greater than that of the regional climate.
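A relative homogeneity test of the kind described above can be sketched as follows. The abstract does not name the specific tests used, so this sketch substitutes one common choice, the single-shift Standard Normal Homogeneity Test (SNHT, Alexandersson). It is applied to a difference series between the target station and a regional reference, which is typical for temperature (a ratio series is more usual for precipitation). The reference series and the function names are hypothetical, and the critical values needed to declare a break significant are tabulated by series length and are not included.

```python
import statistics


def snht(series):
    """Single-shift SNHT. Returns (k, t_max): the most likely break,
    with the new regime starting at index k, and the test statistic.
    Assumes the series is not constant (stdev > 0)."""
    n = len(series)
    mean = statistics.fmean(series)
    sd = statistics.stdev(series)
    z = [(x - mean) / sd for x in series]
    best_k, best_t = 0, float("-inf")
    for k in range(1, n):
        z1 = statistics.fmean(z[:k])  # mean before the candidate break
        z2 = statistics.fmean(z[k:])  # mean after the candidate break
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t


def relative_series(target, reference):
    """Target minus regional reference: removes the shared climate signal,
    leaving station-specific shifts such as an instrument change."""
    return [t - r for t, r in zip(target, reference)]
```

With annual series starting in a given first year, a returned index k corresponds to a break beginning in year first_year + k. The statistic t_max must still be compared with the critical value for that series length before a break is accepted.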

Abstract:

We report a morphology-based approach for the automatic identification of outlier neurons, as well as its application to the NeuroMorpho.org database, with more than 5,000 neurons. Each neuron in a given analysis is represented by a feature vector of 20 measurements, which is then projected into a two-dimensional space by principal component analysis. Bivariate kernel density estimation is then used to obtain the probability density for the group of cells, so that the cells with the highest densities are taken as archetypes, while those with the lowest are classified as outliers. The potential of the methodology is illustrated in several cases involving uniform cell types as well as cell types from specific animal species. The results provide insights into the distribution of cells, yielding single and multivariate clusters, and suggest that outlier cells tend to be more planar and tortuous. The proposed methodology can be used in several situations involving one or more categories of cells, as well as for detecting new categories and possible artifacts.
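The pipeline described above (feature vectors, a two-dimensional PCA projection, then a bivariate kernel density) can be sketched in pure Python. This is an illustrative sketch, not the authors' implementation. Several details are assumptions: the features are z-scored before PCA, the principal components are found by power iteration with deflation, and the density uses a Gaussian product kernel with Scott's-rule bandwidths. The function names are hypothetical.

```python
import math
import random


def standardize(X):
    """Z-score each feature column (assumed preprocessing step)."""
    n, d = len(X), len(X[0])
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) or 1.0
           for c, m in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in X]


def _matvec(C, v):
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in C]


def _top_eigvec(C, iters=1000, seed=0):
    """Leading eigenpair of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(len(C))]
    for _ in range(iters):
        w = _matvec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, _matvec(C, v)))
    return lam, v


def pca_2d(Z):
    """Project centred rows onto the first two principal components."""
    n, d = len(Z), len(Z[0])
    C = [[sum(r[i] * r[j] for r in Z) / (n - 1) for j in range(d)]
         for i in range(d)]
    lam1, v1 = _top_eigvec(C)
    # Deflation: remove the first component before finding the second.
    C2 = [[C[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    _, v2 = _top_eigvec(C2, seed=1)
    return [(sum(a * b for a, b in zip(r, v1)), sum(a * b for a, b in zip(r, v2)))
            for r in Z]


def kde_2d(points):
    """Gaussian product-kernel density evaluated at each point."""
    n = len(points)
    hs = []
    for axis in (0, 1):
        vals = [p[axis] for p in points]
        m = sum(vals) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in vals) / (n - 1)) or 1.0
        hs.append(sd * n ** (-1 / 6))  # Scott's rule for two dimensions
    hx, hy = hs
    norm = 1.0 / (n * 2 * math.pi * hx * hy)
    return [norm * sum(math.exp(-0.5 * (((px - qx) / hx) ** 2 + ((py - qy) / hy) ** 2))
                       for qx, qy in points)
            for px, py in points]


def rank_cells(X):
    """Cell indices from most archetypal (highest density) to most outlying."""
    dens = kde_2d(pca_2d(standardize(X)))
    return sorted(range(len(X)), key=lambda i: dens[i], reverse=True)
```

The first entries of `rank_cells(X)` play the role of archetypes and the last entries are outlier candidates. A threshold on density or rank would still be needed to turn this ordering into a classification.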