917 resultados para structuration of lexical data bases


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Compositional random vectors are fundamental tools in the Bayesian analysis of categorical data.Many of the issues that are discussed with reference to the statistical analysis of compositionaldata have a natural counterpart in the construction of a Bayesian statistical model for categoricaldata.This note builds on the idea of cross-fertilization of the two areas recommended by Aitchison (1986)in his seminal book on compositional data. Particular emphasis is put on the problem of whatparameterization to use

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Precision of released figures is not only an important quality feature of official statistics,it is also essential for a good understanding of the data. In this paper we show a casestudy of how precision could be conveyed if the multivariate nature of data has to betaken into account. In the official release of the Swiss earnings structure survey, the totalsalary is broken down into several wage components. We follow Aitchison's approachfor the analysis of compositional data, which is based on logratios of components. Wefirst present diferent multivariate analyses of the compositional data whereby the wagecomponents are broken down by economic activity classes. Then we propose a numberof ways to assess precision

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study examined the validity and reliability of a sequential "Run-Bike-Run" test (RBR) in age-group triathletes. Eight Olympic distance (OD) specialists (age 30.0 ± 2.0 years, mass 75.6 ± 1.6 kg, run VO2max 63.8 ± 1.9 ml· kg(-1)· min(-1), cycle VO2peak 56.7 ± 5.1 ml· kg(-1)· min(-1)) performed four trials over 10 days. Trial 1 (TRVO2max) was an incremental treadmill running test. Trials 2 and 3 (RBR1 and RBR2) involved: 1) a 7-min run at 15 km· h(-1) (R1) plus a 1-min transition to 2) cycling to fatigue (2 W· kg(-1) body mass then 30 W each 3 min); 3) 10-min cycling at 3 W· kg(-1) (Bsubmax); another 1-min transition and 4) a second 7-min run at 15 km· h(-1) (R2). Trial 4 (TT) was a 30-min cycle - 20-min run time trial. No significant differences in absolute oxygen uptake (VO2), heart rate (HR), or blood lactate concentration ([BLA]) were evidenced between RBR1 and RBR2. For all measured physiological variables, the limits of agreement were similar, and the mean differences were physiologically unimportant, between trials. Low levels of test-retest error (i.e. ICC <0.8, CV<10%) were observed for most (logged) measurements. However [BLA] post R1 (ICC 0.87, CV 25.1%), [BLA] post Bsubmax (ICC 0.99, CV 16.31) and [BLA] post R2 (ICC 0.51, CV 22.9%) were least reliable. These error ranges may help coaches detect real changes in training status over time. Moreover, RBR test variables can be used to predict discipline specific and overall TT performance. Cycle VO2peak, cycle peak power output, and the change between R1 and R2 (deltaR1R2) in [BLA] were most highly related to overall TT distance (r = 0.89, p < 0. 01; r = 0.94, p < 0.02; r = 0.86, p < 0.05, respectively). The percentage of TR VO2max at 15 km· h(-1), and deltaR1R2 HR, were also related to run TT distance (r = -0.83 and 0.86, both p < 0.05).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The statistical analysis of compositional data should be treated using logratios of parts,which are difficult to use correctly in standard statistical packages. For this reason afreeware package, named CoDaPack was created. This software implements most of thebasic statistical methods suitable for compositional data.In this paper we describe the new version of the package that now is calledCoDaPack3D. It is developed in Visual Basic for applications (associated with Excel©),Visual Basic and Open GL, and it is oriented towards users with a minimum knowledgeof computers with the aim at being simple and easy to use.This new version includes new graphical output in 2D and 3D. These outputs could bezoomed and, in 3D, rotated. Also a customization menu is included and outputs couldbe saved in jpeg format. Also this new version includes an interactive help and alldialog windows have been improved in order to facilitate its use.To use CoDaPack one has to access Excel© and introduce the data in a standardspreadsheet. These should be organized as a matrix where Excel© rows correspond tothe observations and columns to the parts. The user executes macros that returnnumerical or graphical results. There are two kinds of numerical results: new variablesand descriptive statistics, and both appear on the same sheet. Graphical output appearsin independent windows. In the present version there are 8 menus, with a total of 38submenus which, after some dialogue, directly call the corresponding macro. Thedialogues ask the user to input variables and further parameters needed, as well aswhere to put these results. The web site http://ima.udg.es/CoDaPack contains thisfreeware package and only Microsoft Excel© under Microsoft Windows© is required torun the software.Kew words: Compositional data Analysis, Software

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This is one of the few studies that have explored the value of baseline symptoms and health-related quality of life (HRQOL) in predicting survival in brain cancer patients. Baseline HRQOL scores (from the EORTC QLQ-C30 and the Brain Cancer Module (BN 20)) were examined in 490 newly diagnosed glioblastoma cancer patients for the relationship with overall survival by using Cox proportional hazards regression models. Refined techniques as the bootstrap re-sampling procedure and the computation of C-indexes and R(2)-coefficients were used to try and validate the model. Classical analysis controlled for major clinical prognostic factors selected cognitive functioning (P=0.0001), global health status (P=0.0055) and social functioning (P<0.0001) as statistically significant prognostic factors of survival. However, several issues question the validity of these findings. C-indexes and R(2)-coefficients, which are measures of the predictive ability of the models, did not exhibit major improvements when adding selected or all HRQOL scores to clinical factors. While classical techniques lead to positive results, more refined analyses suggest that baseline HRQOL scores add relatively little to clinical factors to predict survival. These results may have implications for future use of HRQOL as a prognostic factor in cancer patients.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Given the very large amount of data obtained everyday through population surveys, much of the new research again could use this information instead of collecting new samples. Unfortunately, relevant data are often disseminated into different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which being quite small. We propose a model-based procedure combining a logistic regression with an Expectation-Maximization algorithm. Results show that despite the lack of data, this procedure can perform better than standard matching procedures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completelyabsent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and byMartín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involvedparts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method isintroduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that thetheoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approachhas reasonable properties from a compositional point of view. In particular, it is “natural” in the sense thatit recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in thesame paper a substitution method for missing values on compositional data sets is introduced

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the eighties, John Aitchison (1986) developed a new methodological approach for the statistical analysis of compositional data. This new methodology was implemented in Basic routines grouped under the name CODA and later NEWCODA inMatlab (Aitchison, 1997). After that, several other authors have published extensions to this methodology: Marín-Fernández and others (2000), Barceló-Vidal and others (2001), Pawlowsky-Glahn and Egozcue (2001, 2002) and Egozcue and others (2003). (...)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel densityestimation techniques in the context of compositional data analysis. Indeed, they gavetwo options for the choice of the kernel to be used in the kernel estimator. One ofthese kernels is based on the use the alr transformation on the simplex SD jointly withthe normal distribution on RD-1. However, these authors themselves recognized thatthis method has some deficiencies. A method for overcoming these dificulties based onrecent developments for compositional data analysis and multivariate kernel estimationtheory, combining the ilr transformation with the use of the normal density with a fullbandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares bothmethods in practice, thus exploring the finite-sample behaviour of both estimators

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The 2008 Data Fusion Contest organized by the IEEE Geoscience and Remote Sensing Data Fusion Technical Committee deals with the classification of high-resolution hyperspectral data from an urban area. Unlike in the previous issues of the contest, the goal was not only to identify the best algorithm but also to provide a collaborative effort: The decision fusion of the best individual algorithms was aiming at further improving the classification performances, and the best algorithms were ranked according to their relative contribution to the decision fusion. This paper presents the five awarded algorithms and the conclusions of the contest, stressing the importance of decision fusion, dimension reduction, and supervised classification methods, such as neural networks and support vector machines.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ground-penetrating radar (GPR) has the potential to provide valuable information on hydrological properties of the vadose zone because of their strong sensitivity to soil water content. In particular, recent evidence has suggested that the stochastic inversion of crosshole GPR data within a coupled geophysical-hydrological framework may allow for effective estimation of subsurface van-Genuchten-Mualem (VGM) parameters and their corresponding uncertainties. An important and still unresolved issue, however, is how to best integrate GPR data into a stochastic inversion in order to estimate the VGM parameters and their uncertainties, thus improving hydrological predictions. Recognizing the importance of this issue, the aim of the research presented in this thesis was to first introduce a fully Bayesian inversion called Markov-chain-Monte-carlo (MCMC) strategy to perform the stochastic inversion of steady-state GPR data to estimate the VGM parameters and their uncertainties. Within this study, the choice of the prior parameter probability distributions from which potential model configurations are drawn and tested against observed data was also investigated. Analysis of both synthetic and field data collected at the Eggborough (UK) site indicates that the geophysical data alone contain valuable information regarding the VGM parameters. However, significantly better results are obtained when these data are combined with a realistic, informative prior. A subsequent study explore in detail the dynamic infiltration case, specifically to what extent time-lapse ZOP GPR data, collected during a forced infiltration experiment at the Arrenaes field site (Denmark), can help to quantify VGM parameters and their uncertainties using the MCMC inversion strategy. The findings indicate that the stochastic inversion of time-lapse GPR data does indeed allow for a substantial refinement in the inferred posterior VGM parameter distributions. In turn, this significantly improves knowledge of the hydraulic properties, which are required to predict hydraulic behaviour. Finally, another aspect that needed to be addressed involved the comparison of time-lapse GPR data collected under different infiltration conditions (i.e., natural loading and forced infiltration conditions) to estimate the VGM parameters using the MCMC inversion strategy. The results show that for the synthetic example, considering data collected during a forced infiltration test helps to better refine soil hydraulic properties compared to data collected under natural infiltration conditions. When investigating data collected at the Arrenaes field site, further complications arised due to model error and showed the importance of also including a rigorous analysis of the propagation of model error with time and depth when considering time-lapse data. Although the efforts in this thesis were focused on GPR data, the corresponding findings are likely to have general applicability to other types of geophysical data and field environments. Moreover, the obtained results allow to have confidence for future developments in integration of geophysical data with stochastic inversions to improve the characterization of the unsaturated zone but also reveal important issues linked with stochastic inversions, namely model errors, that should definitely be addressed in future research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In rodents and nonhuman primates subjected to spinal cord lesion, neutralizing the neurite growth inhibitor Nogo-A has been shown to promote regenerative axonal sprouting and functional recovery. The goal of the present report was to re-examine the data on the recovery of the primate manual dexterity using refined behavioral analyses and further statistical assessments, representing secondary outcome measures from the same manual dexterity test. Thirteen adult monkeys were studied; seven received an anti-Nogo-A antibody whereas a control antibody was infused into the other monkeys. Monkeys were trained to perform the modified Brinkman board task requiring opposition of index finger and thumb to grasp food pellets placed in vertically and horizontally oriented slots. Two parameters were quantified before and following spinal cord injury: (i) the standard 'score' as defined by the number of pellets retrieved within 30 s from the two types of slots; (ii) the newly introduced 'contact time' as defined by the duration of digit contact with the food pellet before successful retrieval. After lesion the hand was severely impaired in all monkeys; this was followed by progressive functional recovery. Remarkably, anti-Nogo-A antibody-treated monkeys recovered faster and significantly better than control antibody-treated monkeys, considering both the score for vertical and horizontal slots (Mann-Whitney test: P = 0.05 and 0.035, respectively) and the contact time (P = 0.008 and 0.005, respectively). Detailed analysis of the lesions excluded the possibility that this conclusion may have been caused by differences in lesion properties between the two groups of monkeys.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The increasing volume of data describing humandisease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the@neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system’s architecture is generic enough that it could be adapted to the treatment of other diseases.Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers cliniciansthe tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medicalresearchers gain access to a critical mass of aneurysm related data due to the system’s ability to federate distributed informationsources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access andwork on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand forperforming computationally intensive simulations for treatment planning and research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article reports on the results of the research done towards the fully automatically merging of lexical resources. Our main goal is to show the generality of the proposed approach, which have been previously applied to merge Spanish Subcategorization Frames lexica. In this work we extend and apply the same technique to perform the merging of morphosyntactic lexica encoded in LMF. The experiments showed that the technique is general enough to obtain good results in these two different tasks which is an important step towards performing the merging of lexical resources fully automatically.