998 results for 230204 Applied Statistics
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information in large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured, and can take the form of text, categorical values or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. A number of classification algorithms are in practical use. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probabilistic methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
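As an illustration of the example-based family mentioned above, the following is a minimal sketch of a k-nearest-neighbors classifier. It is not taken from the cited works; the function name `knn_predict`, the toy data and the choice of Euclidean distance are assumptions made only for this example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs. Euclidean distance is
    an assumption of this sketch; other metrics could be substituted.
    """
    # Keep the k training examples closest to the query point.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

With an odd k and two classes, the vote cannot tie; for more classes a tie-breaking rule would be needed.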
Abstract:
We present a fast method for finding optimal parameters for a low-resolution (threading) force field intended to distinguish correct from incorrect folds for a given protein sequence. In contrast to other methods, the parameterization uses information from more than 10^7 misfolded structures as well as a set of native sequence-structure pairs. In addition to testing the resulting force field's performance on the protein sequence threading problem, results are shown that characterize the number of parameters necessary for effective structure recognition.
Abstract:
Motivation: Prediction methods for identifying binding peptides could minimize the number of peptides required to be synthesized and assayed, and thereby facilitate the identification of potential T-cell epitopes. We developed a bioinformatic method for the prediction of peptide binding to MHC class II molecules. Results: Experimental binding data and expert knowledge of anchor positions and binding motifs were combined with an evolutionary algorithm (EA) and an artificial neural network (ANN): binding data extraction --> peptide alignment --> ANN training and classification. This method, termed PERUN, was implemented for the prediction of peptides that bind to HLA-DR4(B1*0401). The respective positive predictive values of PERUN predictions of high-, moderate-, low- and zero-affinity binders were assessed as 0.8, 0.7, 0.5 and 0.8 by cross-validation, and 1.0, 0.8, 0.3 and 0.7 by experimental binding. This illustrates the synergy between experimentation and computer modeling, and its application to the identification of potential immunotherapeutic peptides.
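The per-class positive predictive value reported above is the fraction of predictions for a class that turn out to be correct. The sketch below shows that calculation only; it is not the PERUN method, and the function name and the toy labels are hypothetical.

```python
from collections import Counter

def positive_predictive_values(predicted, actual):
    """Return {class: PPV}, where PPV = correct predictions / all predictions of that class."""
    predicted_counts = Counter(predicted)
    # Count only the predictions that match the measured class.
    correct_counts = Counter(p for p, a in zip(predicted, actual) if p == a)
    return {label: correct_counts[label] / n for label, n in predicted_counts.items()}

# Hypothetical affinity classes for five peptides (not data from the paper).
predicted = ["high", "high", "low", "zero", "zero"]
actual = ["high", "low", "low", "zero", "high"]

ppv = positive_predictive_values(predicted, actual)
print(ppv)
```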
Abstract:
We present a review of perceptual image quality metrics and their application to still image compression. The review describes how image quality metrics can be used to guide an image compression scheme and outlines the advantages, disadvantages and limitations of a number of quality metrics. We examine a broad range of metrics ranging from simple mathematical measures to those which incorporate full perceptual models. We highlight some variation in the models for luminance adaptation and the contrast sensitivity function and discuss what appears to be a lack of a general consensus regarding the models which best describe contrast masking and error summation. We identify how the various perceptual components have been incorporated in quality metrics, and identify a number of psychophysical testing techniques that can be used to validate the metrics. We conclude by illustrating some of the issues discussed throughout the paper with a simple demonstration. (C) 1998 Elsevier Science B.V. All rights reserved.
Abstract:
We derive analytical solutions for the three-dimensional time-dependent buckling of a non-Newtonian viscous plate in a less viscous medium. For the plate we assume a power-law rheology. The principal axes of the stretching D_ij in the homogeneously deformed ground state are parallel and orthogonal to the bounding surfaces of the plate in the flat state. In the model formulation the action of the less viscous medium is replaced by equivalent reaction forces. The reaction forces are assumed to be parallel to the normal vector of the deformed plate surfaces. As a consequence, the buckling process is driven by the differences between the in-plane stresses and the out-of-plane stress, and not by the in-plane stresses alone as assumed in previous models. The governing differential equation is essentially an orthotropic plate equation for rate-dependent material, under biaxial pre-stress, supported by a viscous medium. The differential problem is solved by means of Fourier transformation, and the largest growth coefficients and corresponding wavenumbers are evaluated. We discuss in detail fold evolutions for isotropic in-plane stretching (D_11 = D_22), uniaxial plane straining (D_22 = 0) and in-plane flattening (D_11 = -2D_22). Three-dimensional plots illustrate the stages of fold evolution for random initial perturbations or initial embryonic folds with axes non-parallel to the maximum compression axis. For all situations, one dominant set of folds develops normal to D_11, although the dominant wavelength differs from the Biot dominant wavelength except when the plate has a purely Newtonian viscosity. However, in the direction parallel to D_22, there exist infinitely many modes in the vicinity of the dominant wavelength which grow only marginally slower than the one corresponding to the dominant wavelength. This means that, except for very special initial conditions, the appearance of a three-dimensional fold will always be governed by at least two wavelengths. 
The wavelength in the direction parallel to D_11 is the dominant wavelength, and the wavelength(s) in the direction parallel to D_22 is determined essentially by the statistics of the initial state. A comparable sensitivity to the initial geometry does not exist in the classic two-dimensional folding models. In conformity with tradition we have applied Kirchhoff's hypothesis to constrain the cross-sectional rotations of the plate. We investigate the validity of this hypothesis within the framework of Reissner's plate theory. We also include a discussion of the effects of adding elasticity into the constitutive relations and show that there exist critical ratios of the relaxation times of the plate and the embedding medium for which two dominant wavelengths develop, one at ca. 2.5 of the classical Biot dominant wavelength and the other at ca. 0.45 of this wavelength. We propose that herein lies the origin of parasitic folds well known in natural examples.
Abstract:
BACKGROUND. Management of patients with ductal carcinoma in situ (DCIS) is a dilemma, as mastectomy provides nearly a 100% cure rate but at the expense of physical and psychologic morbidity. It would be helpful if we could predict which patients with DCIS are at sufficiently high risk of local recurrence after conservative surgery (CS) alone to warrant postoperative radiotherapy (RT) and which patients are at sufficient risk of local recurrence after CS + RT to warrant mastectomy. The authors reviewed the published studies and identified the factors that may be predictive of local recurrence after management by mastectomy, CS alone, or CS + RT. METHODS. The authors examined patient, tumor, and treatment factors as potential predictors for local recurrence and estimated the risks of recurrence based on a review of published studies. They examined the effects of patient factors (age at diagnosis and family history), tumor factors (subtype of DCIS, grade, tumor size, necrosis, and margins), and treatment (mastectomy, CS alone, and CS + RT). The 95% confidence intervals (CI) of the recurrence rates for each of the studies were calculated for subtype, grade, and necrosis, using the exact binomial; the summary recurrence rate and 95% CI for each treatment category were calculated by quantitative meta-analysis using the fixed and random effects models applied to proportions. RESULTS. Meta-analysis yielded a summary recurrence rate of 22.5% (95% CI = 16.9-28.2) for studies employing CS alone, 8.9% (95% CI = 6.8-11.0) for CS + RT, and 1.4% (95% CI = 0.7-2.1) for studies involving mastectomy alone. These summary figures indicate a clear and statistically significant separation in recurrence rates, and therefore in outcome, between the treatment categories, even though the patients who underwent CS alone were likely to have had smaller, possibly low grade lesions with clear margins. 
The patients with risk factors of presence of necrosis, high grade cytologic features, or comedo subtype were found to derive the greatest improvement in local control with the addition of RT to CS. Local recurrence among patients treated by CS alone is approximately 20%, and one-half of the recurrences are invasive cancers. For most patients, RT reduces the risk of recurrence after CS alone by at least 50%. The differences in local recurrence between CS alone and CS + RT are most apparent for those patients with high grade tumors or DCIS with necrosis, or of the comedo subtype, or DCIS with close or positive surgical margins. CONCLUSIONS. The authors recommend that radiation be added to CS if patients with DCIS who also have the risk factors for local recurrence choose breast conservation over mastectomy. The patients who may be suitable for CS alone outside of a clinical trial may be those who have low grade lesions with little or no necrosis, and with clear surgical margins. Use of the summary statistics when discussing outcomes with patients may help the patient make treatment decisions. Cancer 1999;85:616-28. (C) 1999 American Cancer Society.
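To make the idea of fixed and random effects models applied to proportions concrete, here is a minimal sketch of inverse-variance pooling with a DerSimonian-Laird between-study variance. It is an assumption that this resembles the authors' procedure: the abstract does not name the estimator, and the sketch uses a normal-approximation variance rather than the exact binomial. The function name and the study counts are hypothetical, and every study proportion must lie strictly between 0 and 1.

```python
import math

def pool_proportions(events, totals):
    """Pool study proportions with fixed-effect and DerSimonian-Laird random-effects weights.

    Assumes each proportion is strictly between 0 and 1 (otherwise the
    normal-approximation variance is zero and the weight is undefined).
    """
    p = [e / n for e, n in zip(events, totals)]
    var = [pi * (1 - pi) / n for pi, n in zip(p, totals)]  # within-study variance
    w = [1 / v for v in var]                               # fixed-effect weights
    fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird estimate of between-study variance.
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, p))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(p) - 1)) / c)

    w_re = [1 / (v + tau2) for v in var]                   # random-effects weights
    random_effects = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)

    def ci(estimate, weights):
        half = 1.96 * math.sqrt(1 / sum(weights))          # 95% normal interval
        return (estimate - half, estimate + half)

    return {
        "fixed": fixed, "fixed_ci": ci(fixed, w),
        "random": random_effects, "random_ci": ci(random_effects, w_re),
        "tau2": tau2,
    }

# Hypothetical recurrence counts from three studies (not data from the paper).
result = pool_proportions(events=[12, 30, 9], totals=[60, 120, 50])
print(result)
```

When the studies agree closely, tau2 is estimated as zero and the two summaries coincide; heterogeneity widens the random-effects interval.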
Abstract:
Sausage is a protein sequence threading program with remarkable run-time flexibility. Using different scripts, it can calculate protein sequence-structure alignments, search structure libraries, swap force fields, create models from alignments, convert file formats and analyse results. There are several different force fields which might be classed as knowledge-based, although they do not rely on Boltzmann statistics. Different force fields are used for alignment calculations and subsequent ranking of calculated models.
Abstract:
Background: The purpose of the present paper was to estimate the absolute risk of breast cancer over the remainder of a lifetime in Australian women with different categories of family history. Methods: Age-specific breast cancer incidence rates were adjusted for screening effects, and rates in those with no family history were estimated using the attributable fraction (AF). Relative risks from a published meta-analysis were applied to obtain incidence rates for different categories of family history, and age-specific incidence was converted to cumulative risk of breast cancer. The risk estimates were based upon Australian population statistics and published relative risks. Breast cancer incidence was from New South Wales women for 1996. The AF was calculated using prevalence of a family history of breast cancer from data on Queensland women. The cumulative absolute risk of breast cancer was calculated from decade and mid-decade ages to age 79 years, not adjusted for competing causes of death. Results: Lifetime risk is approximately 8.6% (1 in 12) for the general population and 7.8% (1 in 13) for those without a family history. Women with one relative affected have lifetime risks of 1 in 6-8 and those with two relatives affected have lifetime risks of 1 in 4-6. The cumulative residual lifetime risk decreases with advancing age; by age 60 years all groups with only one relative affected have well above a 90% probability of not developing breast cancer to age 79 years. Conclusions: These Australian risk statistics are useful for public information and in the clinical setting. Risks given here apply to women with average breast cancer risk from other risk factors.
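The conversion from age-specific incidence to cumulative risk described above can be sketched as follows. The formula used here, risk = 1 - exp(-sum of rate x band width), is a standard conversion and is assumed rather than confirmed to be the one the authors applied; like the abstract's estimates, it ignores competing causes of death. The function names and the incidence rates are hypothetical.

```python
import math

def cumulative_risk(rates_per_year, band_width_years, start_band=0):
    """Cumulative risk from the start of `start_band` to the end of the last age band.

    `rates_per_year` holds one incidence rate per woman-year for each
    consecutive age band of equal width.
    """
    cumulative_hazard = sum(r * band_width_years for r in rates_per_year[start_band:])
    return 1.0 - math.exp(-cumulative_hazard)

def one_in_n(risk):
    """Express a risk as the nearest '1 in N' figure."""
    return round(1.0 / risk)

# Hypothetical rates for ten 5-year age bands (not the New South Wales data).
rates = [0.0002, 0.0005, 0.001, 0.0015, 0.002, 0.0022, 0.0025, 0.0027, 0.003, 0.003]

print(cumulative_risk(rates, 5))                 # risk from the first band onward
print(cumulative_risk(rates, 5, start_band=6))   # residual risk from a later age
print(one_in_n(0.086), one_in_n(0.078))
```

Applied to the lifetime risks quoted in the abstract, `one_in_n` reproduces the rounded figures of 1 in 12 and 1 in 13.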
Abstract:
We explore in detail the possibility of generating a pair-coherent state in the nondegenerate parametric oscillator when decoherence is included. Such states are predicted in the transient regime in parametric oscillation where the pump mode is adiabatically eliminated. Two specific signatures are examined to indicate whether the state of interest has been generated: the Schrödinger cat state-like signatures and the fidelity. Solutions in a transient regime reveal interference fringes which are indicative of the formation of a Schrödinger cat state. The fidelity indicates the purity of our prepared state compared with the ideal pair-coherent state.
Abstract:
The evolution of event time and size statistics in two heterogeneous cellular automaton models of earthquake behavior is studied and compared to the evolution of these quantities during observed periods of accelerating seismic energy release prior to large earthquakes. The two automata have different nearest neighbor laws, one of which produces self-organized critical (SOC) behavior (PSD model) and the other of which produces quasi-periodic large events (crack model). In the PSD model, periods of accelerating energy release before large events are rare. In the crack model, many large events are preceded by periods of accelerating energy release. When compared to randomized event catalogs, accelerating energy release before large events occurs more often than random in the crack model but less often than random in the PSD model; it is easier to tell the crack and PSD model results apart from each other than to tell either model apart from a random catalog. The evolution of event sizes during the accelerating energy release sequences in all models is compared to that of observed sequences. The accelerating energy release sequences in the crack model consist of an increase in the rate of events of all sizes, which is consistent with observations from a small number of natural cases but inconsistent with a larger number of cases in which there is an increase in the rate of only moderate-sized events. On average, no increase in the rate of events of any size is seen before large events in the PSD model.
Abstract:
We present a method of estimating HIV incidence rates in epidemic situations from data on age-specific prevalence and changes in the overall prevalence over time. The method is applied to women attending antenatal clinics in Hlabisa, a rural district of KwaZulu/Natal, South Africa, where transmission of HIV is overwhelmingly through heterosexual contact. A model which gives age-specific prevalence rates in the presence of a progressing epidemic is fitted to prevalence data for 1998 using maximum likelihood methods and used to derive the age-specific incidence. Error estimates are obtained using a Monte Carlo procedure. Although the method is quite general, some simplifying assumptions are made concerning the form of the risk function, and sensitivity analyses are performed to explore the importance of these assumptions. The analysis shows that in 1998 the annual incidence of infection per susceptible woman increased from 5.4 per cent (3.3-8.5 per cent; here and elsewhere ranges give 95 per cent confidence limits) at age 15 years to 24.5 per cent (20.6-29.1 per cent) at age 22 years and declined to 1.3 per cent (0.5-2.9 per cent) at age 50 years; standardized to a uniform age distribution, the overall incidence per susceptible woman aged 15 to 59 was 11.4 per cent (10.0-13.1 per cent); per woman in the population it was 8.4 per cent (7.3-9.5 per cent). Standardized to the age distribution of the female population, the average incidence per woman was 9.6 per cent (8.4-11.0 per cent); standardized to the age distribution of women attending antenatal clinics, it was 11.3 per cent (9.8-13.3 per cent). The estimated incidence depends on the values used for the epidemic growth rate and the AIDS related mortality. 
To ensure that, for this population, errors in these two parameters change the age specific estimates of the annual incidence by less than the standard deviation of the estimates of the age specific incidence, the AIDS related mortality should be known to within +/-50 per cent and the epidemic growth rate to within +/-25 per cent, both of which conditions are met. In the absence of cohort studies to measure the incidence of HIV infection directly, useful estimates of the age-specific incidence can be obtained from cross-sectional, age-specific prevalence data and repeat cross-sectional data on the overall prevalence of HIV infection. Several assumptions were made because of the lack of data but sensitivity analyses show that they are unlikely to affect the overall estimates significantly. These estimates are important in assessing the magnitude of the public health problem, for designing vaccine trials and for evaluating the impact of interventions. Copyright (C) 2001 John Wiley & Sons, Ltd.
Abstract:
Centrifuge experiments modeling single-phase flow in prototype porous media typically use the same porous medium and permeant. Then, well-known scaling laws are used to transfer the results to the prototype. More general scaling laws that relax these restrictions are presented. For permeants that are immiscible with an accompanying gas phase, model-prototype (i.e., centrifuge model experiment-target system) scaling is demonstrated. Scaling is shown to be feasible for Miller-similar (or geometrically similar) media. Scalings are presented for a more general class, Lisle-similar media, based on the equivalence mapping of Richards' equation onto itself. Whereas model-prototype scaling of Miller-similar media can be realized easily for arbitrary boundary conditions, Lisle-similarity in a finite length medium generally, but not always, involves a mapping to a moving boundary problem. An exception occurs for redistribution in Lisle-similar porous media, which is shown to map to spatially fixed boundary conditions. Complete model-prototype scalings for this example are derived.
Abstract:
Summary: Plant biologists in fields of ecology, evolution, genetics and breeding frequently use multivariate methods. This paper illustrates Principal Component Analysis (PCA) and Gabriel's biplot as applied to microarray expression data from plant pathology experiments. Availability: An example program in the publicly distributed statistical language R is available from the web site (www.tpp.uq.edu.au) and by e-mail from the contact. Contact: scott.chapman@csiro.au.