258 results for attribute subset selection


Relevance:

20.00% 20.00%

Publisher:

Abstract:

User-generated information such as online reviews has become increasingly significant for customers' decision making. Meanwhile, as the volume of online reviews grows, there is a pressing demand to help users tackle the information overload problem. To extract useful information from an overwhelming number of reviews, considerable work has been proposed, such as review summarization and review selection. In particular, to avoid redundant information, researchers attempt to select a small set of reviews that represents the entire review corpus by preserving its statistical properties (e.g., opinion distribution). However, one significant drawback of existing work is that it only measures the utility of the extracted reviews as a whole, without considering the quality of each individual review. As a result, the set of chosen reviews may contain low-quality ones even if its statistical properties are close to those of the original review corpus, which users do not prefer. In this paper, we propose a review selection method that takes review quality into consideration during the selection process. Specifically, we examine the relationships between product features, based upon a domain ontology, to capture the review characteristics that are then used to select reviews that are of good quality and also preserve the opinion distribution. Our experimental results on real-world review datasets demonstrate that the proposed approach is feasible and effectively improves the performance of review selection.
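
The trade-off between review quality and opinion distribution can be illustrated with a small greedy sketch. This is not the ontology-based method of the paper: the `quality` scores, the `opinions` labels, the L1 distance, and the `quality_weight` parameter are all assumptions made for illustration.

```python
from collections import Counter


def opinion_distribution(reviews):
    """Relative frequency of each opinion label across the given reviews."""
    counts = Counter(label for r in reviews for label in r["opinions"])
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def l1_distance(p, q):
    """L1 distance between two discrete distributions stored as dicts."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def select_reviews(corpus, k, quality_weight=0.5):
    """Greedily pick k reviews, rewarding quality and penalising drift
    from the corpus-wide opinion distribution (hypothetical scoring)."""
    target = opinion_distribution(corpus)
    selected, remaining = [], list(corpus)
    while remaining and len(selected) < k:
        def score(r):
            dist = l1_distance(opinion_distribution(selected + [r]), target)
            return quality_weight * r["quality"] - (1 - quality_weight) * dist
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy corpus: two opinions, each voiced by one good and one poor review.
corpus = [
    {"id": "A", "opinions": ["battery+"], "quality": 0.9},
    {"id": "B", "opinions": ["battery+"], "quality": 0.2},
    {"id": "C", "opinions": ["screen-"], "quality": 0.8},
    {"id": "D", "opinions": ["screen-"], "quality": 0.1},
]
picked = [r["id"] for r in select_reviews(corpus, k=2)]
```

With this toy corpus the sketch keeps one high-quality review per opinion, so the selected pair matches the corpus distribution without admitting the low-quality reviews.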

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Osteoporotic fracture is a major cause of morbidity and mortality worldwide. Low bone mineral density (BMD) is a major predisposing factor to fracture and is known to be highly heritable. Site-, gender-, and age-specific genetic effects on BMD are thought to be significant, but have largely not been considered in the design of genome-wide association studies (GWAS) of BMD to date. We report here a GWAS using a novel study design focusing on women of a specific age (postmenopausal women, age 55-85 years), with either extreme high or low hip BMD (age- and gender-adjusted BMD z-scores of +1.5 to +4.0, n = 1055, or -4.0 to -1.5, n = 900), with replication in cohorts of women drawn from the general population (n = 20,898). The study replicates 21 of 26 known BMD-associated genes. Additionally, we report suggestive evidence for a further six new genetic associations in or around the genes CLCN7, GALNT3, IBSP, LTBP3, RSPO3, and SOX4, with replication in two independent datasets. A novel mouse model with a loss-of-function mutation in GALNT3 is also reported, which has high bone mass, supporting the involvement of this gene in BMD determination. In addition to identifying further genes associated with BMD, this study confirms the efficiency of extreme-truncate selection designs for quantitative trait association studies. © 2011 Duncan et al.
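
The extreme-truncate sampling rule described above can be sketched as a simple filter on z-scores. The thresholds come from the abstract; treating the bounds as inclusive and the function name `extreme_truncate` are assumptions for illustration only.

```python
def extreme_truncate(z_scores, low=1.5, high=4.0):
    """Return indices of subjects in the high and low extreme groups.

    A subject is in the high group when low <= z <= high and in the
    low group when -high <= z <= -low; everyone else is left out.
    """
    high_group = [i for i, z in enumerate(z_scores) if low <= z <= high]
    low_group = [i for i, z in enumerate(z_scores) if -high <= z <= -low]
    return high_group, low_group


# Toy z-scores: index 3 is beyond the upper bound, index 4 is not extreme enough.
z = [0.2, 1.6, -2.0, 4.5, -1.4, 3.9]
high_idx, low_idx = extreme_truncate(z)
```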

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Automatic speech recognition from multiple distant microphones poses significant challenges because of noise and reverberation. The quality of speech acquisition may vary between microphones because of speaker movement and channel distortions. This paper proposes a channel selection approach that selects reliable channels using a selection criterion operating in the short-term modulation spectrum domain. The proposed approach quantifies the relative strength of speech from each microphone and speech obtained from beamforming modulations. The new technique is compared experimentally in real reverberant conditions in terms of perceptual evaluation of speech quality (PESQ) measures and word error rate (WER). An overall improvement in recognition rate is observed using delay-sum and superdirective beamformers, compared to the case when the channel is selected randomly, using circular microphone arrays.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Antigen selection of B cells within the germinal center reaction generally leads to the accumulation of replacement mutations in the complementarity-determining regions (CDRs) of immunoglobulin genes. Studies of mutations in IgE-associated VDJ gene sequences have cast doubt on the role of antigen selection in the evolution of the human IgE response, and it may be that selection for high affinity antibodies is a feature of some but not all allergic diseases. The severity of IgE-mediated anaphylaxis is such that it could result from higher affinity IgE antibodies. We therefore investigated IGHV mutations in IgE-associated sequences derived from ten individuals with a history of anaphylactic reactions to bee or wasp venom or peanut allergens. IgG sequences, which more certainly experience antigen selection, served as a control dataset. A total of 6025 unique IgE and 5396 unique IgG sequences were generated using high throughput 454 pyrosequencing. The proportion of replacement mutations seen in the CDRs of the IgG dataset was significantly higher than that of the IgE dataset, and the IgE sequences showed little evidence of antigen selection. To exclude the possibility that 454 sequencing errors had compromised the analysis, the datasets were rigorously filtered, yielding 90 core IgE sequences and 411 IgG sequences. These sequences were present as both forward and reverse reads, and so were most unlikely to include sequencing errors. The filtered datasets confirmed that antigen selection plays a greater role in the evolution of IgG sequences than of IgE sequences derived from the study participants.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Background: Bloodstream infections resulting from intravascular catheters (catheter-BSI) in critical care increase patients' length of stay, morbidity and mortality, and the management of these infections and their complications has been estimated to cost the NHS annually £19.1–36.2M. Catheter-BSI are thought to be largely preventable using educational interventions, but guidance as to which types of intervention might be most clinically effective is lacking.
Objective: To assess the effectiveness and cost-effectiveness of educational interventions for preventing catheter-BSI in critical care units in England.
Data sources: Sixteen electronic bibliographic databases – including MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, Cumulative Index to Nursing and Allied Health Literature (CINAHL), NHS Economic Evaluation Database (NHS EED), EMBASE and The Cochrane Library databases – were searched from database inception to February 2011, with searches updated in March 2012. Bibliographies of systematic reviews and related papers were screened and experts contacted to identify any additional references.
Review methods: References were screened independently by two reviewers using a priori selection criteria. A descriptive map was created to summarise the characteristics of relevant studies. Further selection criteria developed in consultation with the project Advisory Group were used to prioritise a subset of studies relevant to NHS practice and policy for systematic review. A decision-analytic economic model was developed to investigate the cost-effectiveness of educational interventions for preventing catheter-BSI.
Results: Seventy-four studies were included in the descriptive map, of which 24 were prioritised for systematic review. Studies have predominantly been conducted in the USA, using single-cohort before-and-after study designs. Diverse types of educational intervention appear effective at reducing the incidence density of catheter-BSI (risk ratios statistically significantly < 1.0), but single lectures were not effective. The economic model showed that implementing an educational intervention in critical care units in England would be cost-effective and potentially cost-saving, with incremental cost-effectiveness ratios under worst-case sensitivity analyses of < £5000/quality-adjusted life-year.
Limitations: Low-quality primary studies cannot definitively prove that the planned interventions were responsible for observed changes in catheter-BSI incidence. Poor reporting gave unclear estimates of risk of bias. Some model parameters were sourced from other locations owing to a lack of UK data.
Conclusions: Our results suggest that it would be cost-effective and may be cost-saving for the NHS to implement educational interventions in critical care units. However, more robust primary studies are needed to exclude the possible influence of secular trends on observed reductions in catheter-BSI.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This report describes an artificial neural network (ANN) based automated emergency landing site selection system for unmanned aerial vehicles (UAVs) and general aviation (GA). The system aims to increase the safety of UAV operation by emulating pilot decision making in emergency landing scenarios, using an ANN to select a safe landing site from the available candidates. The ability of an ANN to model complex input relationships makes it well suited to the multicriteria decision making (MCDM) process of emergency landing site selection. The ANN operates by identifying the more favorable of two landing sites when provided with an input vector derived from both landing sites' parameters, the aircraft's current state and wind measurements. The system consists of a feed-forward ANN, a pre-processor class that produces ANN input vectors, and a class in charge of creating a ranking of landing site candidates using the ANN. The system was successfully implemented in C++ using the FANN C++ library and ROS. Results obtained from ANN training and from simulations using landing sites randomly generated by a site detection simulator verify the feasibility of an ANN-based automated emergency landing site selection system.
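
One way a pairwise comparator can be turned into a ranking is a round-robin tally of wins. The abstract does not say how the ranking class works, so the sketch below is an assumption: `toy_prefers` is a hypothetical stand-in for the trained ANN, and the site attributes `length_m` and `distance_m` are invented for illustration.

```python
from itertools import combinations


def rank_sites(sites, prefers):
    """Rank candidate sites by the number of pairwise comparisons they win.

    `sites` maps a site name to its parameters; `prefers(a, b)` returns
    True when site a is judged more favorable than site b.
    """
    wins = {name: 0 for name in sites}
    for a, b in combinations(sites, 2):
        winner = a if prefers(sites[a], sites[b]) else b
        wins[winner] += 1
    return sorted(sites, key=lambda name: wins[name], reverse=True)


def toy_prefers(a, b):
    """Hypothetical comparator: favor longer sites, penalise distance."""
    def utility(s):
        return s["length_m"] - 0.1 * s["distance_m"]
    return utility(a) >= utility(b)


candidates = {
    "field": {"length_m": 400, "distance_m": 1000},
    "road": {"length_m": 600, "distance_m": 5000},
    "beach": {"length_m": 300, "distance_m": 500},
}
ranking = rank_sites(candidates, toy_prefers)
```

A round-robin needs one comparison per pair of candidates, which is acceptable for a handful of sites; a comparison sort would need fewer calls but assumes the comparator is transitive, which a learned model may not guarantee.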

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Travel speed is one of the most critical parameters for road safety; the evidence suggests that increased vehicle speed is associated with higher crash risk and injury severity. Both naturalistic and simulator studies have reported that drivers distracted by a mobile phone select a lower driving speed. Speed decrements have been argued to be a risk compensatory behaviour of distracted drivers. Nonetheless, the extent and circumstances of the speed change among distracted drivers are still not well understood. As such, the primary objective of this study was to investigate patterns of speed variation in relation to contextual factors and distraction. Using the CARRS-Q high-fidelity Advanced Driving Simulator, the speed selection behaviour of 32 drivers aged 18-26 years was examined in two phone conditions: baseline (no phone conversation) and handheld phone operation. The simulator driving route contained five different types of road traffic complexities, including one road section with a horizontal S curve, one horizontal S curve with adjacent traffic, one straight segment of suburban road without traffic, one straight segment of suburban road with traffic interactions, and one road segment in a city environment. Speed deviations from the posted speed limit were analysed using Ward’s Hierarchical Clustering method to identify the effects of road traffic environment and cognitive distraction. The speed deviations along curved road sections formed two different clusters for the two phone conditions, implying that distracted drivers adopt a different strategy for selecting driving speed in a complex driving situation. In particular, distracted drivers selected a lower speed while driving along a horizontal curve.
The speed deviation along the city road segment and other straight road segments grouped into a different cluster, and the deviations were not significantly different across phone conditions, suggesting a negligible effect of distraction on speed selection along these road sections. Future research should focus on developing a risk compensation model to explain the relationship between road traffic complexity and distraction.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Spatial data analysis has become increasingly important in ecology and economics during the last decade. One focus of spatial data analysis is how to select predictors, variance functions and correlation functions. In general, however, the true covariance function is unknown and the working covariance structure is often misspecified. In this paper, we aim to find a good strategy for identifying the best model from the candidate set using model selection criteria. We evaluate the ability of several information criteria (corrected Akaike information criterion, Bayesian information criterion (BIC) and residual information criterion (RIC)) to choose the optimal model when the working correlation function, the working variance function and the working mean function are correct or misspecified. Simulations are carried out for small to moderate sample sizes. Four candidate covariance functions (exponential, Gaussian, Matérn and rational quadratic) are used in the simulation studies. From the simulation results, we find that a misspecified working correlation structure can still capture some spatial correlation information in model fitting. When the sample size is large enough, BIC and RIC perform well even if the working covariance is misspecified. Moreover, the performance of these information criteria is related to the average level of model fitting, which can be indicated by the average adjusted R-squared, and overall RIC performs well.
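
As a rough illustration of selecting among candidate models with information criteria, the sketch below computes the corrected Akaike information criterion (AICc) and BIC from their standard definitions and picks the candidate with the smallest value. RIC is omitted, and the log-likelihoods and parameter counts are invented numbers, not results from the paper.

```python
import math


def aicc(log_lik, k, n):
    """Corrected AIC: -2 logL + 2k + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion: -2 logL + k log(n)."""
    return -2.0 * log_lik + k * math.log(n)


def best_model(candidates, n, criterion):
    """Return the candidate name with the smallest criterion value.

    `candidates` maps a model name to (maximised log-likelihood,
    number of estimated parameters).
    """
    return min(candidates, key=lambda m: criterion(*candidates[m], n))


# Invented fits for two working covariance functions on n = 50 sites.
fits = {"exponential": (-100.0, 3), "matern": (-99.5, 4)}
chosen_by_bic = best_model(fits, n=50, criterion=bic)
chosen_by_aicc = best_model(fits, n=50, criterion=aicc)
```

In this toy case the extra parameter of the second model buys only a small gain in log-likelihood, so both criteria favor the simpler candidate.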

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Selection criteria and misspecification tests for the intra-cluster correlation structure (ICS) in longitudinal data analysis are considered. In particular, the asymptotic distribution of the correlation information criterion (CIC) is derived, and a new method for selecting a working ICS is proposed by standardizing the selection criterion as a p-value. The CIC test is found to be powerful in detecting misspecification of the working ICS, and the standardized CIC test is also shown to perform satisfactorily for working ICS selection. Simulation studies and applications to two real longitudinal datasets illustrate how these criteria and tests might be useful.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A modeling paradigm is proposed for covariate, variance and working correlation structure selection for longitudinal data analysis. Appropriate selection of covariates is pertinent to correct variance modeling and selecting the appropriate covariates and variance function is vital to correlation structure selection. This leads to a stepwise model selection procedure that deploys a combination of different model selection criteria. Although these criteria find a common theoretical root based on approximating the Kullback-Leibler distance, they are designed to address different aspects of model selection and have different merits and limitations. For example, the extended quasi-likelihood information criterion (EQIC) with a covariance penalty performs well for covariate selection even when the working variance function is misspecified, but EQIC contains little information on correlation structures. The proposed model selection strategies are outlined and a Monte Carlo assessment of their finite sample properties is reported. Two longitudinal studies are used for illustration.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Consider a general regression model with an arbitrary and unknown link function and a stochastic selection variable that determines whether the outcome variable is observable or missing. The paper proposes U-statistics that are based on kernel functions as estimators for the directions of the parameter vectors in the link function and the selection equation, and shows that these estimators are consistent and asymptotically normal.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Efficiency of analysis using generalized estimating equations is enhanced when the intracluster correlation structure is accurately modeled. We compare two existing criteria (a quasi-likelihood information criterion and the Rotnitzky-Jewell criterion) for identifying the true correlation structure via simulations with Gaussian or binomial response, covariates varying at cluster or observation level, and exchangeable or AR(1) intracluster correlation structure. Rotnitzky and Jewell's approach performs better when the true intracluster correlation structure is exchangeable, while the quasi-likelihood criterion performs better for an AR(1) structure.
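
For readers unfamiliar with the two working structures compared here, the sketch below builds an exchangeable and a first-order autoregressive (AR(1)) correlation matrix for a cluster of `m` observations. The function names and the plain nested-list representation are choices made for illustration.

```python
def exchangeable(m, rho):
    """m x m matrix with 1 on the diagonal and rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(m)] for i in range(m)]


def ar1(m, rho):
    """m x m matrix whose (i, j) entry is rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(m)] for i in range(m)]


# Three repeated measurements with correlation parameter 0.5.
ex = exchangeable(3, 0.5)
ar = ar1(3, 0.5)
```

Under the exchangeable structure every pair of observations in a cluster is equally correlated, whereas under AR(1) the correlation decays with the gap between measurement occasions.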

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Statistical methods are often used to analyse commercial catch and effort data to provide standardised fishing effort and/or a relative index of fish abundance for input into stock assessment models. Achieving reliable results has proved difficult in Australia's Northern Prawn Fishery (NPF), due to a combination of such factors as the biological characteristics of the animals, some aspects of the fleet dynamics, and the changes in fishing technology. For this set of data, we compared four modelling approaches (linear models, mixed models, generalised estimating equations, and generalised linear models) with respect to the outcomes of the standardised fishing effort or the relative index of abundance. We also varied the number and form of vessel covariates in the models. Within a subset of data from this fishery, modelling correlation structures did not alter the conclusions from simpler statistical models. The random-effects models also yielded similar results. This is because the estimators are all consistent even if the correlation structure is mis-specified, and the data set is very large. However, the standard errors from different models differed, suggesting that different methods have different statistical efficiency. We suggest that there is value in modelling the variance function and the correlation structure, to make valid and efficient statistical inferences and gain insight into the data. We found that fishing power was separable from the indices of prawn abundance only when we offset the impact of vessel characteristics at assumed values from external sources. This may be due to the large degree of confounding within the data, and the extreme temporal changes in certain aspects of individual vessels, the fleet and the fleet dynamics.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The quality of species distribution models (SDMs) relies to a large degree on the quality of the input data, from bioclimatic indices to environmental and habitat descriptors (Austin, 2002). Recent reviews of SDM techniques have sought to optimize predictive performance (e.g. Elith et al., 2006). In general, SDMs employ one of three approaches to variable selection. The simplest approach relies on the expert to select the variables, as in environmental niche models (Nix, 1986) or a generalized linear model without variable selection (Miller and Franklin, 2002). A second approach explicitly incorporates variable selection into model fitting, which allows examination of particular combinations of variables. Examples include generalized linear or additive models with variable selection (Hastie et al. 2002), or classification trees with complexity or model-based pruning (Breiman et al., 1984, Zeileis, 2008). A third approach uses model averaging to summarize the overall contribution of a variable, without considering particular combinations. Examples include neural networks, boosted or bagged regression trees and Maximum Entropy, as compared in Elith et al. 2006. Typically, users of SDMs will either consider a small number of variable sets, via the first approach, or else supply all of the candidate variables (often numbering more than a hundred) to the second or third approaches. Bayesian SDMs exist, with several methods for eliciting and encoding priors on model parameters (see review in Low Choy et al. 2010). However, few methods have been published for informative variable selection; one example is Bayesian trees (O’Leary 2008). Here we report an elicitation protocol that helps make explicit a priori expert judgements on the quality of candidate variables. This protocol can be flexibly applied to any of the three approaches to variable selection described above, Bayesian or otherwise.
We demonstrate how this information can be obtained then used to guide variable selection in classical or machine learning SDMs, or to define priors within Bayesian SDMs.