820 results for Data classification


Relevance:

30.00% 30.00%

Publisher:

Abstract:

The atomic energy authorities of Canada, the United Kingdom, and the United States have agreed to the public release of certain information on low-power research reactors, including those nuclear properties of uranium of importance to the design and operation of such reactors. The following report is the officially released data.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-04

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Most of the modern developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitute the prediction, or a true classification, where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived, namely prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates (although the methods are not restricted to binary response data).
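
As a rough illustration of the bootstrap idea in this abstract, the Python sketch below refits a one-split tree (a stump) on resampled data and reports how often the refitted tree assigns a new observation to the same class as the tree fitted on the full data. This is a simplified stand-in and not the article's own reliability measures; the function names, the stump learner, and the toy data are all assumptions made for illustration.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split tree on one predictor by minimising training errors.

    Returns (threshold, above), where `above` is the class predicted for x > threshold.
    """
    best = None
    for thr in sorted(set(xs)):
        for above in (0, 1):
            errors = sum(
                1 for x, y in zip(xs, ys) if (above if x > thr else 1 - above) != y
            )
            if best is None or errors < best[0]:
                best = (errors, thr, above)
    return best[1], best[2]


def predict(stump, x):
    """Predict the class of x from a fitted stump."""
    thr, above = stump
    return above if x > thr else 1 - above


def bootstrap_agreement(xs, ys, x_new, n_boot=200, seed=0):
    """Share of bootstrap stumps that agree with the full-data stump at x_new."""
    rng = random.Random(seed)
    reference = predict(fit_stump(xs, ys), x_new)
    n = len(xs)
    agree = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        agree += int(predict(stump, x_new) == reference)
    return reference, agree / n_boot


# Hypothetical toy data: one covariate and a binary presence/absence response.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
ref_far, rel_far = bootstrap_agreement(xs, ys, x_new=9.5)
ref_near, rel_near = bootstrap_agreement(xs, ys, x_new=5.5)
```

Points near the split (such as 5.5 here) would typically show lower agreement than points far from it, which is the kind of localized information that could guide further data collection.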

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exist some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using the iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonically increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to the IRLS algorithm on some simulated and real data sets.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Electricity market price forecasting is a challenging yet very important task for electricity market managers and participants. Due to the complexity and uncertainties in the power grid, electricity prices are highly volatile and normally carry spikes, which may be tens or even hundreds of times higher than the normal price. Such electricity price spikes are very difficult to predict. So far, most of the research on electricity price forecasting is based on normal-range electricity prices. This paper proposes a data mining based electricity price forecast framework, which can predict the normal price as well as the price spikes. The normal price can be predicted by a previously proposed wavelet and neural network based forecast model, while the spikes are forecast using a data mining approach. This paper focuses on spike prediction and explores the reasons for price spikes based on the measurement of a proposed composite supply-demand balance index (SDI) and relative demand index (RDI). These indices are able to reflect the relationship among electricity demand, electricity supply and electricity reserve capacity. The proposed model is based on a mining database including market clearing price, trading hour, electricity demand, electricity supply and reserve. Bayesian classification and similarity searching techniques are used to mine the database to find the internal relationships between electricity price spikes and these proposed indices. The mining results are used to form the price spike forecast model. The proposed model is able to generate the forecast price spike, the level of the spike and the associated forecast confidence level. The model is tested with Queensland electricity market data with promising results. Crown Copyright (C) 2004 Published by Elsevier B.V. All rights reserved.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Beta-turns are important topological motifs for biological recognition of proteins and peptides. Organic molecules that sample the side chain positions of beta-turns, for example benzodiazepines, have shown broad binding capacity to multiple different receptors. Beta-turns have traditionally been classified into various types based on the backbone dihedral angles (phi 2, psi 2, phi 3 and psi 3). Indeed, 57-68% of beta-turns are currently classified into 8 different backbone families (Type I, Type II, Type I', Type II', Type VIII, Type VIa1, Type VIa2 and Type VIb, plus Type IV, which represents unclassified beta-turns). Although this classification of beta-turns has been useful, the resulting beta-turn types are not ideal for the design of beta-turn mimetics, as they do not reflect topological features of the recognition elements, the side chains. To overcome this, we have extracted beta-turns from a data set of non-homologous and high-resolution protein crystal structures. The side chain positions of these turns, as defined by C-alpha-C-beta vectors, have been clustered using the kth nearest neighbor clustering and filtered nearest centroid sorting algorithms. Nine clusters were obtained that cluster 90% of the data, and the average intra-cluster RMSD of the four C-alpha-C-beta vectors is 0.36. The nine clusters therefore represent the topology of the side chain scaffold architecture of the vast majority of beta-turns. The mean structures of the nine clusters are useful for the development of beta-turn mimetics and as biological descriptors for focusing combinatorial chemistry towards biologically relevant topological space.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Background and Purpose - Although the Australian National Subacute and Nonacute Patient (AN-SNAP) Casemix Classification was implemented in 1998, no research has examined how well it predicts length of stay (LOS), discharge destination, and functional improvement in public hospital stroke rehabilitation units in Australia. Methods - 406 consecutive admissions to 3 stroke rehabilitation units in Queensland, Australia were studied. Sociodemographic, clinical, and functional data were collected. General linear modeling and logistic regression were used to assess the ability of AN-SNAP to predict outcomes. Results - AN-SNAP significantly predicted each outcome. There were clear relationships between the outcomes of longer LOS, poorer functional improvement and discharge into care, and the AN-SNAP classes that reflected poorer functional ability and older age. Other predictors included living situation, acute LOS, comorbidity, and stroke type. Conclusions - AN-SNAP is a consistent predictor of LOS, functional change and discharge destination, and has utility in assisting clinicians to set rehabilitation goals and plan discharge.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Systematic protocols that use decision rules or scores are seen to improve consistency and transparency in classifying the conservation status of species. When applying these protocols, assessors are typically required to decide on estimates for attributes that are inherently uncertain. Input data and resulting classifications are usually treated as though they are exact and hence without operator error. We investigated the impact of data interpretation on the consistency of protocols of extinction risk classifications and diagnosed causes of discrepancies when they occurred. We tested three widely used systematic classification protocols employed by the World Conservation Union, NatureServe, and the Florida Fish and Wildlife Conservation Commission. We provided 18 assessors with identical information for 13 different species to infer estimates for each of the required parameters for the three protocols. The threat classification of several of the species varied from low risk to high risk, depending on who did the assessment. This occurred across the three protocols investigated. Assessors tended to agree on their placement of species in the highest (50-70%) and lowest risk categories (20-40%), but there was poor agreement on which species should be placed in the intermediate categories. Furthermore, the correspondence between the three classification methods was unpredictable, with large variation among assessors. These results highlight the importance of peer review and consensus among multiple assessors in species classifications and the need to be cautious with assessments carried out by a single assessor. Greater consistency among assessors requires wide use of training manuals and formal methods for estimating parameters that allow uncertainties to be represented, carried through chains of calculations, and reported transparently.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Risk assessment systems for introduced species are being developed and applied globally, but methods for rigorously evaluating them are still in their infancy. We explore classification and regression tree models as an alternative to the current Australian Weed Risk Assessment system, and demonstrate how the performance of screening tests for unwanted alien species may be quantitatively compared using receiver operating characteristic (ROC) curve analysis. The optimal classification tree model for predicting weediness included just four out of a possible 44 attributes of introduced plants examined, namely: (i) intentional human dispersal of propagules; (ii) evidence of naturalization beyond native range; (iii) evidence of being a weed elsewhere; and (iv) a high level of domestication. Intentional human dispersal of propagules in combination with evidence of naturalization beyond a plant's native range led to the strongest prediction of weediness. A high level of domestication in combination with no evidence of naturalization mitigated the likelihood of an introduced plant becoming a weed resulting from intentional human dispersal of propagules. Unlikely intentional human dispersal of propagules combined with no evidence of being a weed elsewhere led to the lowest predicted probability of weediness. The failure to include intrinsic plant attributes in the model suggests that either these attributes are not useful general predictors of weediness, or data and analysis were inadequate to elucidate the underlying relationship(s). This concurs with the historical pessimism that we will ever be able to accurately predict invasive plants. Given the apparent importance of propagule pressure (the number of individuals of a species released), future attempts at evaluating screening model performance for identifying unwanted plants need to account for propagule pressure when collating and/or analysing datasets.
The classification tree had a cross-validated sensitivity of 93.6% and specificity of 36.7%. Based on the area under the ROC curve, the performance of the classification tree in correctly classifying plants as weeds or non-weeds was slightly inferior (area under ROC curve = 0.83 +/- 0.021 (+/- SE)) to that of the current risk assessment system in use (area under ROC curve = 0.89 +/- 0.018 (+/- SE)), although it requires many fewer questions to be answered.
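
For readers unfamiliar with the metrics quoted in this abstract, the following Python sketch shows how sensitivity, specificity, and the area under the ROC curve are commonly computed. The function names and the toy labels and scores are hypothetical; they are not the study's data, and the abstract's cross-validation and standard errors are not reproduced here.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = weed, 0 = non-weed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: predicted weediness scores for eight plants.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # classify at a 0.5 cut-off

sens, spec = sensitivity_specificity(y_true, y_pred)  # 0.75, 0.75
auc = roc_auc(y_true, scores)                         # 0.9375
```

Sensitivity and specificity depend on the chosen cut-off, whereas the area under the ROC curve summarises performance across all cut-offs, which is why it suits a comparison of two screening systems.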

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is provided that unifies two existing approaches to producing probabilistic outputs in the literature, one based on combining distribution estimates and the other based on combining probabilistic classifiers. We apply both of these to the problem of matching the HI Parkes All Sky Survey radio catalogue with large positional uncertainties to the much denser SuperCOSMOS catalogue with much smaller positional uncertainties. We demonstrate the utility of probabilistic outputs by a controllable completeness and efficiency trade-off and by identifying objects that have high probability of being rare. Finally, possible biasing effects in the output of these classifiers are also highlighted and discussed.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Background & Aims: Steatosis is a frequent histologic finding in chronic hepatitis C (CHC), but it is unclear whether steatosis is an independent predictor for liver fibrosis. We evaluated the association between steatosis and fibrosis and their common correlates in persons with CHC and in subgroup analyses according to hepatitis C virus (HCV) genotype and body mass index. Methods: We conducted a meta-analysis on individual data from 3068 patients with histologically confirmed CHC recruited from 10 clinical centers in Italy, Switzerland, France, Australia, and the United States. Results: Steatosis was present in 1561 patients (50.9%) and fibrosis in 2688 (87.6%). HCV genotype was 1 in 1694 cases (55.2%), 2 in 563 (18.4%), 3 in 669 (21.8%), and 4 in 142 (4.6%). By stepwise logistic regression, steatosis was associated independently with genotype 3, the presence of fibrosis, diabetes, hepatic inflammation, ongoing alcohol abuse, higher body mass index, and older age. Fibrosis was associated independently with inflammatory activity, steatosis, male sex, and older age, whereas HCV genotype 2 was associated with reduced fibrosis. In the subgroup analyses, the association between steatosis and fibrosis invariably was dependent on a simultaneous association between steatosis and hepatic inflammation. Conclusions: In this large and geographically different group of CHC patients, steatosis is confirmed as significantly and independently associated with fibrosis in CHC. Hepatic inflammation may mediate fibrogenesis in patients with liver steatosis. Control of metabolic factors (such as overweight, via lifestyle adjustments) appears important in the management of CHC.
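
The counts and percentages reported in this abstract can be cross-checked with a few lines of Python. This is only an arithmetic consistency check on the figures quoted above; the variable and function names are illustrative.

```python
total_patients = 3068

# Genotype counts as quoted in the abstract.
genotype_counts = {"1": 1694, "2": 563, "3": 669, "4": 142}


def percent(count, total):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


genotype_percent = {g: percent(n, total_patients) for g, n in genotype_counts.items()}
# {"1": 55.2, "2": 18.4, "3": 21.8, "4": 4.6}

steatosis_percent = percent(1561, total_patients)  # 50.9
fibrosis_percent = percent(2688, total_patients)   # 87.6

# The four genotype groups account for every patient in the pooled data set.
genotypes_cover_all = sum(genotype_counts.values()) == total_patients  # True
```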

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Objective. To compare mortality burden estimates based on direct measurement of levels and causes in communities with indirect estimates based on combining health facility cause-specific mortality structures with community measurement of mortality levels. Methods. Data from sentinel vital registration (SVR) with verbal autopsy (VA) were used to determine the cause-specific mortality burden at the community level in two areas of the United Republic of Tanzania. Proportional cause-specific mortality structures from health facilities were applied to counts of deaths obtained by SVR to produce modelled estimates. The burden was expressed in years of life lost. Findings. A total of 2884 deaths were recorded from health facilities and 2167 from SVR/VAs. In the perinatal and neonatal age group, cause-specific mortality rates were dominated by perinatal conditions and stillbirths in both the community and the facility data. The modelled estimates for chronic causes were very similar to those from SVR/VA. Acute febrile illnesses were coded more specifically in the facility data than in the VA. Injuries were more prevalent in the SVR/VA data than in the facility data. Conclusion. In this setting, improved International classification of diseases and health related problems, tenth revision (ICD-10) coding practices and applying facility-based cause structures to counts of deaths from communities, derived from SVR, appear to produce reasonable estimates of the cause-specific mortality burden in those aged 5 years and older determined directly from VA. For the perinatal and neonatal age group, VA appears to be required. Use of this approach in a nationally representative sample of facilities may produce reliable national estimates of the cause-specific mortality burden for leading causes of death in adults.
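
The indirect estimate described in the Methods can be sketched in Python as follows: the share of each cause among facility deaths is multiplied by the community death count. The cause labels and all numbers below are hypothetical placeholders, not figures from the study, and the conversion to years of life lost is not shown.

```python
def modelled_deaths(facility_deaths_by_cause, community_total_deaths):
    """Apply the facility cause-of-death proportions to a community death count."""
    facility_total = sum(facility_deaths_by_cause.values())
    return {
        cause: community_total_deaths * n / facility_total
        for cause, n in facility_deaths_by_cause.items()
    }


# Hypothetical facility deaths by cause (placeholder labels and counts).
facility = {"cause_a": 300, "cause_b": 500, "cause_c": 200}

# Hypothetical total deaths counted in the community for the same age group.
estimates = modelled_deaths(facility, community_total_deaths=400)
# {"cause_a": 120.0, "cause_b": 200.0, "cause_c": 80.0}
```

The modelled counts always sum to the community total, so any disagreement with direct verbal autopsy results reflects differences in the cause structure rather than in the overall mortality level.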

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach, including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, model validation with an independent data set, and assessment of the scale of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of: vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Government agencies responsible for riparian environments are assessing the utility of remote sensing for mapping and monitoring vegetation structural parameters. The objective of this work was to evaluate Ikonos and Landsat-7 ETM+ imagery for mapping structural parameters and species composition of riparian vegetation in Australian tropical savannahs for a section of Keelbottom Creek, Queensland, Australia. Vegetation indices and image texture from Ikonos data were used for estimating leaf area index (R² = 0.13) and canopy percentage foliage cover (R² = 0.86). Pan-sharpened Ikonos data were used to map riparian species composition (overall accuracy = 55 percent) and riparian zone width (accuracy within +/- 3 m). Tree crowns could not be automatically delineated due to the lack of contrast between canopies and adjacent grass cover. The ETM+ imagery was suited for mapping the extent of riparian zones. The results demonstrate the capabilities of high and moderate spatial resolution imagery for mapping properties of riparian zones.