883 resultados para REGRESSION TREES


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The use of camera traps in wildlife management is an increasingly common practice. A phenomenon which is also becoming more common is for such camera traps to unintentionally film individuals engaged in a variety of activities, ranging from the innocent to the nefarious and including lewd or potentially embarrassing behaviour. It is therefore possible for the use of camera traps to accidentally encroach upon the privacy rights of persons who venture into the area of surveillance. In this chapter we describe the legal framework of privacy in Australia and discuss the potential risk of this sleeping tiger for users of camera traps. We also present the results of a survey of camera trap users to assess the frequency of such unintended captures and the nature of activity being filmed before discussing the practical implications of these laws for camera traps users in this country and make recommendations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Large multisite efforts (e.g., the ENIGMA Consortium), have shown that neuroimaging traits including tract integrity (from DTI fractional anisotropy, FA) and subcortical volumes (from T1-weighted scans) are highly heritable and promising phenotypes for discovering genetic variants associated with brain structure. However, genetic correlations (rg) among measures from these different modalities for mapping the human genome to the brain remain unknown. Discovering these correlations can help map genetic and neuroanatomical pathways implicated in development and inherited risk for disease. We use structural equation models and a twin design to find rg between pairs of phenotypes extracted from DTI and MRI scans. When controlling for intracranial volume, the caudate as well as related measures from the limbic system - hippocampal volume - showed high rg with the cingulum FA. Using an unrelated sample and a Seemingly Unrelated Regression model for bivariate analysis of this connection, we show that a multivariate GWAS approach may be more promising for genetic discovery than a univariate approach applied to each trait separately.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We implemented least absolute shrinkage and selection operator (LASSO) regression to evaluate gene effects in genome-wide association studies (GWAS) of brain images, using an MRI-derived temporal lobe volume measure from 729 subjects scanned as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI). Sparse groups of SNPs in individual genes were selected by LASSO, which identifies efficient sets of variants influencing the data. These SNPs were considered jointly when assessing their association with neuroimaging measures. We discovered 22 genes that passed genome-wide significance for influencing temporal lobe volume. This was a substantially greater number of significant genes compared to those found with standard, univariate GWAS. These top genes are all expressed in the brain and include genes previously related to brain function or neuropsychiatric disorders such as MACROD2, SORCS2, GRIN2B, MAGI2, NPAS3, CLSTN2, GABRG3, NRXN3, PRKAG2, GAS7, RBFOX1, ADARB2, CHD4, and CDH13. The top genes we identified with this method also displayed significant and widespread post hoc effects on voxelwise, tensor-based morphometry (TBM) maps of the temporal lobes. The most significantly associated gene was an autism susceptibility gene known as MACROD2.We were able to successfully replicate the effect of the MACROD2 gene in an independent cohort of 564 young, Australian healthy adult twins and siblings scanned with MRI (mean age: 23.8±2.2 SD years). Our approach powerfully complements univariate techniques in detecting influences of genes on the living brain.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Despite longstanding concern with the dimensionality of the service quality construct as measured by ServQual and IS-ServQual instruments, variations on the IS-ServQual instrument have been enduringly prominent in both academic research and practice in the field of IS. We explain the continuing popularity of the instrument based on the salience of the item set for predicting overall customer satisfaction, suggesting that the preoccupation with the dimensions has been a distraction. The implicit mutual exclusivity of the items suggests a more appropriate conceptualization of IS-ServQual as a formative index. This conceptualization resolves the paradox in IS-ServQual research, that of how an instrument with such well-known and well-documented weaknesses continue to be very influential and widely used by academics and practitioners. A formative conceptualization acknowledges and addresses the criticisms of IS-ServQual, while simultaneously explaining its enduring salience by focusing on the items rather than the “dimensions.” By employing an opportunistic sample and adopting the most recent IS-ServQual instrument published in a leading IS journal (virtually, any valid IS- ServQual sample in combination with a previously tested instrument variant would suffice for study purposes), we demonstrate that when re-specified as both first-order and second-order formatives, IS-ServQual has good model quality metrics and high predictive power on customer satisfaction. We conclude that this formative specification has higher practical use and is more defensible theoretically.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Chemical composition of rainwater changes from sea to inland under the influence of several major factors - topographic location of area, its distance from sea, annual rainfall. A model is developed here to quantify the variation in precipitation chemistry under the influence of inland distance and rainfall amount. Various sites in India categorized as 'urban', 'suburban' and 'rural' have been considered for model development. pH, HCO3, NO3 and Mg do not change much from coast to inland while, SO4 and Ca change is subjected to local emissions. Cl and Na originate solely from sea salinity and are the chemistry parameters in the model. Non-linear multiple regressions performed for the various categories revealed that both rainfall amount and precipitation chemistry obeyed a power law reduction with distance from sea. Cl and Na decrease rapidly for the first 100 km distance from sea, then decrease marginally for the next 100 km, and later stabilize. Regression parameters estimated for different cases were found to be consistent (R-2 similar to 0.8). Variation in one of the parameters accounted for urbanization. Model was validated using data points from the southern peninsular region of the country. Estimates are found to be within 99.9% confidence interval. Finally, this relationship between the three parameters - rainfall amount, coastline distance, and concentration (in terms of Cl and Na) was validated with experiments conducted in a small experimental watershed in the south-west India. Chemistry estimated using the model was in good correlation with observed values with a relative error of similar to 5%. Monthly variation in the chemistry is predicted from a downscaling model and then compared with the observed data. Hence, the model developed for rain chemistry is useful in estimating the concentrations at different spatio-temporal scales and is especially applicable for south-west region of India. (C) 2008 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

1 Species-accumulation curves for woody plants were calculated in three tropical forests, based on fully mapped 50-ha plots in wet, old-growth forest in Peninsular Malaysia, in moist, old-growth forest in central Panama, and in dry, previously logged forest in southern India. A total of 610 000 stems were identified to species and mapped to < Im accuracy. Mean species number and stem number were calculated in quadrats as small as 5 m x 5 m to as large as 1000 m x 500 m, for a variety of stem sizes above 10 mm in diameter. Species-area curves were generated by plotting species number as a function of quadrat size; species-individual curves were generated from the same data, but using stem number as the independent variable rather than area. 2 Species-area curves had different forms for stems of different diameters, but species-individual curves were nearly independent of diameter class. With < 10(4) stems, species-individual curves were concave downward on log-log plots, with curves from different forests diverging, but beyond about 104 stems, the log-log curves became nearly linear, with all three sites having a similar slope. This indicates an asymptotic difference in richness between forests: the Malaysian site had 2.7 times as many species as Panama, which in turn was 3.3 times as rich as India. 3 Other details of the species-accumulation relationship were remarkably similar between the three sites. Rectangular quadrats had 5-27% more species than square quadrats of the same area, with longer and narrower quadrats increasingly diverse. Random samples of stems drawn from the entire 50 ha had 10-30% more species than square quadrats with the same number of stems. At both Pasoh and BCI, but not Mudumalai. species richness was slightly higher among intermediate-sized stems (50-100mm in diameter) than in either smaller or larger sizes, These patterns reflect aggregated distributions of individual species, plus weak density-dependent forces that tend to smooth the species abundance distribution and 'loosen' aggregations as stems grow. 4 The results provide support for the view that within each tree community, many species have their abundance and distribution guided more by random drift than deterministic interactions. The drift model predicts that the species-accumulation curve will have a declining slope on a log-log plot, reaching a slope of O.1 in about 50 ha. No other model of community structure can make such a precise prediction. 5 The results demonstrate that diversity studies based on different stem diameters can be compared by sampling identical numbers of stems. Moreover, they indicate that stem counts < 1000 in tropical forests will underestimate the percentage difference in species richness between two diverse sites. Fortunately, standard diversity indices (Fisher's sc, Shannon-Wiener) captured diversity differences in small stem samples more effectively than raw species richness, but both were sample size dependent. Two nonparametric richness estimators (Chao. jackknife) performed poorly, greatly underestimating true species richness.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Let G = (V, E) be a finite, simple and undirected graph. For S subset of V, let delta(S, G) = {(u, v) is an element of E : u is an element of S and v is an element of V - S} be the edge boundary of S. Given an integer i, 1 <= i <= vertical bar V vertical bar, let the edge isoperimetric value of G at i be defined as b(e)(i, G) = min(S subset of V:vertical bar S vertical bar=i)vertical bar delta(S, G)vertical bar. The edge isoperimetric peak of G is defined as b(e)(G) = max(1 <= j <=vertical bar V vertical bar)b(e)(j, G). Let b(v)(G) denote the vertex isoperimetric peak defined in a corresponding way. The problem of determining a lower bound for the vertex isoperimetric peak in complete t-ary trees was recently considered in [Y. Otachi, K. Yamazaki, A lower bound for the vertex boundary-width of complete k-ary trees, Discrete Mathematics, in press (doi: 10.1016/j.disc.2007.05.014)]. In this paper we provide bounds which improve those in the above cited paper. Our results can be generalized to arbitrary (rooted) trees. The depth d of a tree is the number of nodes on the longest path starting from the root and ending at a leaf. In this paper we show that for a complete binary tree of depth d (denoted as T-d(2)), c(1)d <= b(e) (T-d(2)) <= d and c(2)d <= b(v)(T-d(2)) <= d where c(1), c(2) are constants. For a complete t-ary tree of depth d (denoted as T-d(t)) and d >= c log t where c is a constant, we show that c(1)root td <= b(e)(T-d(t)) <= td and c(2)d/root t <= b(v) (T-d(t)) <= d where c(1), c(2) are constants. At the heart of our proof we have the following theorem which works for an arbitrary rooted tree and not just for a complete t-ary tree. Let T = (V, E, r) be a finite, connected and rooted tree - the root being the vertex r. Define a weight function w : V -> N where the weight w(u) of a vertex u is the number of its successors (including itself) and let the weight index eta(T) be defined as the number of distinct weights in the tree, i.e eta(T) vertical bar{w(u) : u is an element of V}vertical bar. For a positive integer k, let l(k) = vertical bar{i is an element of N : 1 <= i <= vertical bar V vertical bar, b(e)(i, G) <= k}vertical bar. We show that l(k) <= 2(2 eta+k k)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ordinal qualitative data are often collected for phenotypical measurements in plant pathology and other biological sciences. Statistical methods, such as t tests or analysis of variance, are usually used to analyze ordinal data when comparing two groups or multiple groups. However, the underlying assumptions such as normality and homogeneous variances are often violated for qualitative data. To this end, we investigated an alternative methodology, rank regression, for analyzing the ordinal data. The rank-based methods are essentially based on pairwise comparisons and, therefore, can deal with qualitative data naturally. They require neither normality assumption nor data transformation. Apart from robustness against outliers and high efficiency, the rank regression can also incorporate covariate effects in the same way as the ordinary regression. By reanalyzing a data set from a wheat Fusarium crown rot study, we illustrated the use of the rank regression methodology and demonstrated that the rank regression models appear to be more appropriate and sensible for analyzing nonnormal data and data with outliers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Rank-based inference is widely used because of its robustness. This article provides optimal rank-based estimating functions in analysis of clustered data with random cluster effects. The extensive simulation studies carried out to evaluate the performance of the proposed method demonstrate that it is robust to outliers and is highly efficient given the existence of strong cluster correlations. The performance of the proposed method is satisfactory even when the correlation structure is misspecified, or when heteroscedasticity in variance is present. Finally, a real dataset is analyzed for illustration.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a linear quantile regression analysis method for longitudinal data that combines the between- and within-subject estimating functions, which incorporates the correlations between repeated measurements. Therefore, the proposed method results in more efficient parameter estimation relative to the estimating functions based on an independence working model. To reduce computational burdens, the induced smoothing method is introduced to obtain parameter estimates and their variances. Under some regularity conditions, the estimators derived by the induced smoothing method are consistent and have asymptotically normal distributions. A number of simulation studies are carried out to evaluate the performance of the proposed method. The results indicate that the efficiency gain for the proposed method is substantial especially when strong within correlations exist. Finally, a dataset from the audiology growth research is used to illustrate the proposed methodology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For clustered survival data, the traditional Gehan-type estimator is asymptotically equivalent to using only the between-cluster ranks, and the within-cluster ranks are ignored. The contribution of this paper is two fold: - (i) incorporating within-cluster ranks in censored data analysis, and; - (ii) applying the induced smoothing of Brown and Wang (2005, Biometrika) for computational convenience. Asymptotic properties of the resulting estimating functions are given. We also carry out numerical studies to assess the performance of the proposed approach and conclude that the proposed approach can lead to much improved estimators when strong clustering effects exist. A dataset from a litter-matched tumorigenesis experiment is used for illustration.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With growing population and fast urbanization in Australia, it is a challenging task to maintain our water quality. It is essential to develop an appropriate statistical methodology in analyzing water quality data in order to draw valid conclusions and hence provide useful advices in water management. This paper is to develop robust rank-based procedures for analyzing nonnormally distributed data collected over time at different sites. To take account of temporal correlations of the observations within sites, we consider the optimally combined estimating functions proposed by Wang and Zhu (Biometrika, 93:459-464, 2006) which leads to more efficient parameter estimation. Furthermore, we apply the induced smoothing method to reduce the computational burden. Smoothing leads to easy calculation of the parameter estimates and their variance-covariance matrix. Analysis of water quality data from Total Iron and Total Cyanophytes shows the differences between the traditional generalized linear mixed models and rank regression models. Our analysis also demonstrates the advantages of the rank regression models for analyzing nonnormal data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Environmental data usually include measurements, such as water quality data, which fall below detection limits, because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in statistical analysis of such data. However, it is well-known that it is challenging to analyze a data set with detection limits, and we often have to rely on the traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to the water quality data collected at Susquehanna River Basin in United States of America, which dearly demonstrates the advantages of the rank regression models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider rank regression for clustered data analysis and investigate the induced smoothing method for obtaining the asymptotic covariance matrices of the parameter estimators. We prove that the induced estimating functions are asymptotically unbiased and the resulting estimators are strongly consistent and asymptotically normal. The induced smoothing approach provides an effective way for obtaining asymptotic covariance matrices for between- and within-cluster estimators and for a combined estimator to take account of within-cluster correlations. We also carry out extensive simulation studies to assess the performance of different estimators. The proposed methodology is substantially Much faster in computation and more stable in numerical results than the existing methods. We apply the proposed methodology to a dataset from a randomized clinical trial.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

There are numerous load estimation methods available, some of which are captured in various online tools. However, most estimators are subject to large biases statistically, and their associated uncertainties are often not reported. This makes interpretation difficult and the estimation of trends or determination of optimal sampling regimes impossible to assess. In this paper, we first propose two indices for measuring the extent of sampling bias, and then provide steps for obtaining reliable load estimates by minimizing the biases and making use of possible predictive variables. The load estimation procedure can be summarized by the following four steps: - (i) output the flow rates at regular time intervals (e.g. 10 minutes) using a time series model that captures all the peak flows; - (ii) output the predicted flow rates as in (i) at the concentration sampling times, if the corresponding flow rates are not collected; - (iii) establish a predictive model for the concentration data, which incorporates all possible predictor variables and output the predicted concentrations at the regular time intervals as in (i), and; - (iv) obtain the sum of all the products of the predicted flow and the predicted concentration over the regular time intervals to represent an estimate of the load. The key step to this approach is in the development of an appropriate predictive model for concentration. This is achieved using a generalized regression (rating-curve) approach with additional predictors that capture unique features in the flow data, namely the concept of the first flush, the location of the event on the hydrograph (e.g. rise or fall) and cumulative discounted flow. The latter may be thought of as a measure of constituent exhaustion occurring during flood events. The model also has the capacity to accommodate autocorrelation in model errors which are the result of intensive sampling during floods. Incorporating this additional information can significantly improve the predictability of concentration, and ultimately the precision with which the pollutant load is estimated. We also provide a measure of the standard error of the load estimate which incorporates model, spatial and/or temporal errors. This method also has the capacity to incorporate measurement error incurred through the sampling of flow. We illustrate this approach using the concentrations of total suspended sediment (TSS) and nitrogen oxide (NOx) and gauged flow data from the Burdekin River, a catchment delivering to the Great Barrier Reef. The sampling biases for NOx concentrations range from 2 to 10 times indicating severe biases. As we expect, the traditional average and extrapolation methods produce much higher estimates than those when bias in sampling is taken into account.