973 results for correlated data


Relevance:

30.00%

Publisher:

Abstract:

SARAL/AltiKa GDR-T are analyzed to assess the quality of the significant wave height (SWH) measurements. SARAL along-track SWH plots reveal cases of erroneous data, more or less isolated, not detected by the quality flags. The anomalies are often correlated with strong attenuation of the Ka-band backscatter coefficient, which is sensitive to clouds and rain. A quality test based on the 1 Hz standard deviation is proposed to detect such anomalies. From buoy comparison, it is shown that SARAL SWH is more accurate than Jason-2, particularly at low SWH, and globally does not require any correction. Results are better with open ocean than with coastal buoys. The scatter and the number of outliers are much larger for coastal buoys. SARAL is then compared with Jason-2 and Cryosat-2. The altimeter data are extracted from the global altimeter SWH Ifremer database, including specific corrections to calibrate the various altimeters. The comparison confirms the high quality of SARAL SWH. The 1 Hz standard deviation is much less than for Jason-2 and Cryosat-2, particularly at low SWH. Furthermore, results show that the corrections applied to Jason-2 and to Cryosat-2 in the database are efficient, improving the global agreement between the three altimeters.
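As a rough illustration of a quality test on the 1 Hz standard deviation, the sketch below flags records whose standard deviation exceeds a fixed threshold. The abstract does not give the form of the test or any threshold, so the simple comparison, the function name `flag_swh_anomalies`, and the 1.0 m default are all assumptions made only for illustration.

```python
from typing import List


def flag_swh_anomalies(swh_std_1hz: List[float], max_std_m: float = 1.0) -> List[bool]:
    """Return True for each 1 Hz record whose SWH standard deviation is suspicious.

    max_std_m is a hypothetical threshold in metres, not a value from the study.
    """
    return [std > max_std_m for std in swh_std_1hz]


# Four along-track 1 Hz records; the third has an unusually large spread.
stds = [0.35, 0.40, 2.10, 0.38]
flags = flag_swh_anomalies(stds)
print(flags)
```

In practice the threshold would probably depend on SWH itself, since the abstract notes that the standard deviation behaves differently at low SWH.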

Relevance:

30.00%

Publisher:

Abstract:

This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline. In this thesis, we consider the evaluation pipeline to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. In this thesis we show that historical user interaction data can aid in improving the accuracy or efficiency of each of the steps of the web search evaluation pipeline. As a result of these improvements, the overall efficiency of the entire evaluation pipeline is increased. Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represents the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine’s query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes will pass the offline evaluation step to be rejected after the online evaluation step. As a result, this would allow us to achieve a higher efficiency of the entire evaluation pipeline. Secondly, we state the problem of the optimised scheduling of online experiments. We tackle this problem by considering a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. 
This predictor is trained on a set of online experiments, and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler on the second step of the evaluation pipeline. Consequently, we argue that the efficiency of the evaluation pipeline can be increased. Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft considers both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, i.e. in domains with a grid-based representation, such as image search. Our study using datasets of interleaving experiments performed both in document and image search domains demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that the interleaving experiments can be deployed for a shorter period of time or use a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in the interleaving experiments. Finally, we propose to apply the sequential testing methods to reduce the mean deployment time for the interleaving experiments. We adapt two sequential tests for the interleaving experimentation. We demonstrate that one can achieve a significant decrease in experiment duration by using such sequential testing methods. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. 
Our further experimental study demonstrates that cumulative gains in the online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, and the sequential testing approaches. Overall, the central contributions of this thesis are the proposed approaches to improve the accuracy or efficiency of the steps of the evaluation pipeline: the offline evaluation frameworks for the query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for the efficient online interleaving evaluation, and a sequential testing approach for the online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine. These experiments demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.
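The greedy scheduling idea described in this abstract can be sketched as a priority queue ordered by predicted likelihood of success. This is a minimal sketch, not the thesis implementation: the experiment names, the probabilities, and the function `greedy_schedule` are hypothetical, and the predictor that would produce the probabilities is not modelled.

```python
import heapq
from typing import List, Tuple


def greedy_schedule(experiments: List[Tuple[str, float]]) -> List[str]:
    """Order experiments by descending predicted probability of success.

    Each experiment is a (name, predicted_success_probability) pair.
    """
    # heapq is a min-heap, so probabilities are negated to pop the largest first.
    heap = [(-prob, name) for name, prob in experiments]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order


# Hypothetical evaluation queue with predictor outputs.
queue = [("exp_a", 0.2), ("exp_b", 0.7), ("exp_c", 0.5)]
order = greedy_schedule(queue)
print(order)
```

Ties in predicted probability are broken alphabetically by name here, which is an arbitrary choice.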

Relevance:

30.00%

Publisher:

Abstract:

Ensemble stream modeling and data-cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate uncorrelated noise and to choose the most likely model without overfitting, thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble which has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the F-test is performed during each node's split to select the best attribute. The ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier.
The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
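For reference, the F-measure in its standard form is the weighted harmonic mean of precision and recall. The sketch below computes it from raw counts; the counts are invented for illustration and are not results from this work.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Standard F-beta score from true positives, false positives, false negatives."""
    if tp == 0:
        # No correct detections: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical fire-event detections: 8 correct, 2 false alarms, 2 missed events.
score = f_measure(tp=8, fp=2, fn=2)
print(round(score, 3))
```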

Relevance:

30.00%

Publisher:

Abstract:

PURPOSE: To assess the relationship between short-term and long-term changes in power at different corneal locations relative to the change in central corneal power and the 2-year change in axial elongation relative to baseline in children fitted with orthokeratology contact lenses (OK). METHODS: Thirty-one white European subjects 6 to 12 years of age and with myopia −0.75 to −4.00 DS and astigmatism ≤1.00 DC were fitted with OK. Differences in refractive power 3 and 24 months post-OK in comparison with baseline and relative to the change in central corneal power were determined from corneal topography data in eight different corneal regions (i.e., N[nasal]1, N2, T[temporal]1, T2, I[inferior]1, I2, S[superior]1, S2), and correlated with OK-induced axial length changes at two years relative to baseline. RESULTS: After 2 years of OK lens wear, axial length increased by 0.48±0.18 mm (P0.05). CONCLUSION: The reduction in central corneal power and relative increase in paracentral and pericentral power induced by OK over 2 years were not significantly correlated with concurrent changes in axial length of white European children.

Relevance:

30.00%

Publisher:

Abstract:

Mature berries of Pinot Noir grapevines were sampled across a latitudinal gradient in Europe, from southern Spain to central Germany. Our aim was to study the influence of latitude-dependent environmental factors on the metabolite composition (mainly phenolic compounds) of berry skins. Solar radiation variables were positively correlated with flavonols and flavanonols and, to a lesser extent, with stilbenes and cinnamic acids. The daily means of global and erythematic UV solar radiation over long periods (bud break-veraison, bud break-harvest, and veraison-harvest), and the doses and daily means in shorter development periods (5–10 days before veraison and harvest) were the variables best correlated with the phenolic profile. The ratio between trihydroxylated and monohydroxylated flavonols, which was positively correlated with antioxidant capacity, was the berry skin variable best correlated with those radiation variables. Total flavanols and total anthocyanins did not show any correlation with radiation variables. Air temperature, degree days, rainfall, and aridity indices showed fewer correlations with metabolite contents than radiation. Moreover, the latter correlations were restricted to the period veraison-harvest, where radiation, temperature, and water availability variables were correlated, making it difficult to separate the possible individual effects of each type of variable. The data show that managing environmental factors, in particular global and UV radiation, through cultural practices during specific development periods, can be useful to promote the synthesis of valuable nutraceuticals and metabolites that influence wine quality.

Relevance:

30.00%

Publisher:

Abstract:

The most recent submarine eruption observed offshore the Azores archipelago occurred between 1998 and 2001 along the submarine Serreta ridge (SSR), ~4-5 nautical miles WNW of Terceira Island. This submarine eruption delivered abundant basaltic lava balloons floating at the sea surface and significantly changed the bathymetry around the eruption area. Our work combines bathymetry, volcanic facies cartography, petrography, rock magnetism and geochemistry in order to (1) track the possible vent source at seabed, (2) better constrain the Azores magma source(s) sampled through the Serreta submarine volcanic event, and (3) interpret the data within the small-scale mantle source heterogeneity framework that has been demonstrated for the Azores archipelago. Lava balloons sampled at sea surface display a radiogenic signature, which is also correlated with relatively primitive (low) 4He/3He isotopic ratios. Conversely, SSR lavas are characterized by significantly lower radiogenic 87Sr/86Sr, 206Pb/204Pb and 208Pb/204Pb ratios than the lava balloons and the onshore lavas from Terceira Island. SSR lavas are primitive, but enriched in incompatible trace elements. The apparent decoupling between the enriched incompatible trace element abundances and depleted radiogenic isotope ratios is best explained by binary mixing of a depleted MORB source and a HIMU-type component into magma batches that evolved by similar shallower processes on their way to the surface. The collected data suggest that the freshest samples collected in the SSR may correspond to volcanic products of an unnoticed and more recent eruption than the 1998-2001 episode.
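For readers unfamiliar with binary mixing, the textbook two-component relation for an isotope ratio is given below. It is a generic form, not the calculation performed in this study; the symbols are assumptions introduced here: $f$ is the mass fraction of end-member A, $C_A$ and $C_B$ are the concentrations of the element in each end-member, and $R_A$ and $R_B$ are their isotope ratios (for example 87Sr/86Sr).

```latex
R_{\mathrm{mix}} = \frac{f\, C_A R_A + (1 - f)\, C_B R_B}{f\, C_A + (1 - f)\, C_B},
\qquad 0 \le f \le 1
```

Because the ratio is weighted by concentration, an end-member rich in the element dominates the mixed isotope signature even at a small mass fraction.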

Relevance:

30.00%

Publisher:

Abstract:

Big data are reshaping the way we interact with technology, thus fostering new applications to improve the safety assessment of foods. An extraordinary amount of information is analysed using machine learning approaches aimed at detecting the existence or predicting the likelihood of future risks. Food business operators have to share the results of these analyses when applying to place regulated products on the market, whereas agri-food safety agencies (including the European Food Safety Authority) are exploring new avenues to increase the accuracy of their evaluations by processing Big data. Such an informational endowment brings with it opportunities and risks related to the extraction of meaningful inferences from data. However, conflicting interests and tensions among the involved entities - the industry, food safety agencies, and consumers - hinder the finding of shared methods to steer the processing of Big data in a sound, transparent and trustworthy way. A recent reform in the EU sectoral legislation, the lack of trust and the presence of a considerable number of stakeholders highlight the need for ethical contributions aimed at steering the development and the deployment of Big data applications. Moreover, Artificial Intelligence guidelines and charters published by European Union institutions and Member States have to be discussed in light of applied contexts, including the one at stake. This thesis aims to contribute to these goals by discussing what principles should be put forward when processing Big data in the context of agri-food safety-risk assessment. The research focuses on two intertwined topics - data ownership and data governance - by evaluating how the regulatory framework addresses the challenges raised by Big data analysis in these domains. The outcome of the project is a tentative Roadmap aimed at identifying the principles to be observed when processing Big data in this domain and their possible implementations.

Relevance:

30.00%

Publisher:

Abstract:

In this thesis, new classes of models for multivariate linear regression defined by finite mixtures of seemingly unrelated contaminated normal regression models and seemingly unrelated contaminated normal cluster-weighted models are illustrated. The main difference between these families is that the covariates are treated as fixed in the former class of models and as random in the latter. Thus, in cluster-weighted models the assignment of the data points to the unknown groups of observations also depends on the covariates. These classes provide an extension to mixture-based regression analysis for modelling multivariate and correlated responses in the presence of mild outliers that allows a different vector of regressors to be specified for the prediction of each response. Expectation-conditional maximisation algorithms for the calculation of the maximum likelihood estimate of the model parameters have been derived. As the number of free parameters increases quadratically with the number of responses and covariates, analyses based on the proposed models can become unfeasible in practical applications. These problems have been overcome by introducing constraints on the elements of the covariance matrices according to an approach based on the eigen-decomposition of the covariance matrices. The performance of the new models has been studied by simulations and using real datasets in comparison with other models. In order to gain additional flexibility, mixtures of seemingly unrelated contaminated normal regression models have also been specified so as to allow mixing proportions to be expressed as functions of concomitant covariates. An illustration of the new models with concomitant variables and a study on housing tension in the municipalities of the Emilia-Romagna region based on different types of multivariate linear regression models have been performed.
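As background, the multivariate contaminated normal density in its usual textbook form is a two-component mixture of normals that share a mean and differ only in an inflated covariance. The notation below is an assumption introduced here and is not taken from the thesis: $\phi$ is the multivariate normal density, $\alpha$ is the proportion of typical observations, and $\eta$ is the inflation factor for mild outliers.

```latex
f(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}, \alpha, \eta)
  = \alpha\, \phi(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma})
  + (1 - \alpha)\, \phi(\mathbf{y}; \boldsymbol{\mu}, \eta \boldsymbol{\Sigma}),
\qquad \alpha \in (0, 1), \quad \eta > 1
```

In a regression setting the mean $\boldsymbol{\mu}$ would be replaced by a linear predictor in the covariates, with a separate regressor vector allowed for each response in the seemingly unrelated case.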

Relevance:

20.00%

Publisher:

Abstract:

There is great interindividual variability in the response to GH therapy. Ascertaining genetic factors can improve the accuracy of growth response predictions. Suppressor of cytokine signaling (SOCS)-2 is an intracellular negative regulator of GH receptor (GHR) signaling. The objective of the study was to assess the influence of a SOCS2 polymorphism (rs3782415) and its interactive effect with GHR exon 3 and -202 A/C IGFBP3 (rs2854744) polymorphisms on adult height of patients treated with recombinant human GH (rhGH). Genotypes were correlated with adult height data of 65 Turner syndrome (TS) and 47 GH deficiency (GHD) patients treated with rhGH, by multiple linear regressions. Generalized multifactor dimensionality reduction was used to evaluate gene-gene interactions. Baseline clinical data were indistinguishable among patients with different genotypes. Adult height SD scores of patients with at least one SOCS2 single-nucleotide polymorphism rs3782415-C were 0.7 higher than those homozygous for the T allele (P < .001). SOCS2 (P = .003), GHR-exon 3 (P= .016) and -202 A/C IGFBP3 (P = .013) polymorphisms, together with clinical factors accounted for 58% of the variability in adult height and 82% of the total height SD score gain. Patients harboring any two negative genotypes in these three different loci (homozygosity for SOCS2 T allele; the GHR exon 3 full-length allele and/or the -202C-IGFBP3 allele) were more likely to achieve an adult height at the lower quartile (odds ratio of 13.3; 95% confidence interval of 3.2-54.2, P = .0001). The SOCS2 polymorphism (rs3782415) has an influence on the adult height of children with TS and GHD after long-term rhGH therapy. Polymorphisms located in GHR, IGFBP3, and SOCS2 loci have an influence on the growth outcomes of TS and GHD patients treated with rhGH. The use of these genetic markers could identify among rhGH-treated patients those who are genetically predisposed to have less favorable outcomes.

Relevance:

20.00%

Publisher:

Abstract:

High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates an XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by integrating diverse databases in response to the need for appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions.
IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.

Relevance:

20.00%

Publisher:

Abstract:

This article investigates patterns of performance in grip strength, gait speed and self-rated health, and the relationships between them, considering the variables of gender, age and family income. The study was conducted in a probabilistic sample of community-dwelling elderly people aged 65 and over, participants in a population study on frailty. A total of 689 elderly people without cognitive deficit suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-rated health was assessed using a 5-point scale. Men and the younger elderly individuals scored significantly higher on grip strength and gait speed than women and the oldest individuals did; the richest scored higher than the poorest on grip strength and gait speed; women and men aged over 80 had weaker grip strength and lower gait speed; slow gait speed and low income emerged as risk factors for a worse health evaluation. Lower muscular strength affects the self-rated assessment of health because it results in a reduction in functional capacity, especially in the presence of poverty and a lack of compensatory factors.

Relevance:

20.00%

Publisher:

Abstract:

Obstructive sleep apnea syndrome has a high prevalence among adults. Cephalometry can be a valuable method for evaluating patients with this syndrome. The objective was to correlate cephalometric data with the apnea-hypopnea index. We performed a retrospective and cross-sectional study that analyzed the cephalometric data of patients followed in the Sleep Disorders Outpatient Clinic of the Discipline of Otorhinolaryngology of a university hospital, from June 2007 to May 2012. Ninety-six patients were included, 45 men and 51 women, with a mean age of 50.3 years. A total of 11 patients had snoring, 20 had mild apnea, 26 had moderate apnea, and 39 had severe apnea. The distance from the hyoid bone to the mandibular plane was the only variable that showed a statistically significant correlation with the apnea-hypopnea index. Cephalometric variables are useful tools for the understanding of obstructive sleep apnea syndrome.

Relevance:

20.00%

Publisher:

Abstract:

Our objective was to investigate spinal cord (SC) atrophy in amyotrophic lateral sclerosis (ALS) patients, and to determine whether it correlates with clinical parameters. Forty-three patients with ALS (25 males) and 43 age- and gender-matched healthy controls underwent MRI on a 3T scanner. We used T1-weighted 3D images covering the whole brain and the cervical SC to estimate cervical SC area and eccentricity at C2/C3 level using validated software (SpineSeg). Disease severity was quantified with the ALSFRS-R and ALS Severity scores. SC areas of patients and controls were compared with a Mann-Whitney test. We used linear regression to investigate association between SC area and clinical parameters. Results showed that mean age of patients and disease duration were 53.1 ± 12.2 years and 34.0 ± 29.8 months, respectively. The two groups were significantly different regarding SC areas (67.8 ± 6.8 mm² vs. 59.5 ± 8.4 mm², p < 0.001). Eccentricity values were similar in both groups (p = 0.394). SC areas correlated with disease duration (r = - 0.585, p < 0.001), ALSFRS-R score (r = 0.309, p = 0.044) and ALS Severity scale (r = 0.347, p = 0.022). In conclusion, patients with ALS have SC atrophy, but no flattening. In addition, SC areas correlated with disease duration and functional status. These data suggest that quantitative MRI of the SC may be a useful biomarker in the disease.

Relevance:

20.00%

Publisher:

Abstract:

In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
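A damped exponential correlation (DEC) structure is commonly written as corr(y_i, y_j) = phi1 ** (|t_i - t_j| ** phi2) for i != j, with ones on the diagonal. The sketch below builds that matrix for irregular observation times. It assumes this common parameterisation, which the abstract does not spell out; the function name `dec_correlation` and the example times are hypothetical.

```python
from typing import List


def dec_correlation(times: List[float], phi1: float, phi2: float) -> List[List[float]]:
    """Build a damped exponential correlation matrix for the given observation times.

    phi1 in [0, 1) is the correlation at unit lag; phi2 >= 0 damps or accelerates
    the decay (phi2 = 1 gives a continuous-time AR(1), phi2 = 0 compound symmetry).
    """
    if not 0.0 <= phi1 < 1.0:
        raise ValueError("phi1 must lie in [0, 1)")
    if phi2 < 0.0:
        raise ValueError("phi2 must be non-negative")
    n = len(times)
    return [
        [1.0 if i == j else phi1 ** (abs(times[i] - times[j]) ** phi2) for j in range(n)]
        for i in range(n)
    ]


# Irregularly spaced visits (in arbitrary time units).
corr = dec_correlation([0.0, 1.0, 3.0], phi1=0.5, phi2=1.0)
for row in corr:
    print(row)
```

With phi2 = 1 the correlation between the first and third visits is 0.5 ** 3 = 0.125, so more widely spaced measurements are less correlated.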