903 results for Data selection
Abstract:
This work proposes a method based on both preprocessing and data mining with the objective of identifying harmonic current sources in residential consumers. In addition, this methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory tests, i.e., real data were acquired from residential loads. The residential system created in the laboratory was fed by a configurable power source, with the loads and the power quality analyzers placed at its output (all measurements were stored in a microcomputer). The data were then submitted to preprocessing based on attribute selection techniques in order to reduce the complexity of identifying the loads. A new database was generated keeping only the selected attributes, and Artificial Neural Networks were then trained to identify the loads. In order to validate the proposed methodology, the loads were fed both under ideal conditions (without harmonics) and with harmonic voltages within pre-established limits. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (procedures for energy delivery employed by Brazilian utilities). The results obtained seek to validate the proposed methodology and furnish a method that can serve as an alternative to conventional methods.
Abstract:
A large number of models have been derived from the two-parameter Weibull distribution and are referred to as Weibull models. They exhibit a wide range of shapes for the density and hazard functions, which makes them suitable for modelling complex failure data sets. The WPP and IWPP plots allow one to determine in a systematic manner whether one or more of these models are suitable for modelling a given data set. This paper deals with this topic.
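As a rough illustration of how such a plot works, the sketch below assumes that WPP refers to the usual Weibull probability plot, in which x = ln t is plotted against y = ln(-ln(1 - F(t))). For a two-parameter Weibull with shape β and scale α this gives the straight line y = β(x - ln α), so a roughly linear plot suggests that the basic model is adequate. The function names and the use of Benard's median-rank approximation for F are assumptions made for this example and are not taken from the paper.

```python
import math


def wpp_points(times):
    """Return (x, y) Weibull-probability-plot coordinates for failure times."""
    ordered = sorted(times)
    n = len(ordered)
    points = []
    for i, t in enumerate(ordered, start=1):
        # Benard's median-rank approximation of F(t_i); an assumed choice.
        f_hat = (i - 0.3) / (n + 0.4)
        points.append((math.log(t), math.log(-math.log(1.0 - f_hat))))
    return points


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weibull_from_wpp(times):
    """Estimate (shape, scale) from the fitted WPP line."""
    slope, intercept = fit_line(wpp_points(times))
    # y = shape * (x - ln(scale))  =>  intercept = -shape * ln(scale)
    return slope, math.exp(-intercept / slope)
```

Marked curvature in the plotted points, rather than a straight line, would be the cue to consider one of the derived Weibull models instead.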
Abstract:
There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet been reported in the literature. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. Feature selection techniques are first described to identify the attributes relevant to the occurrence of spikes. A simple introduction to the classification techniques is given for completeness. Two algorithms, a support vector machine and a probability classifier, are chosen as the spike occurrence predictors and are discussed in detail. Realistic market data are used to test the proposed model with promising results.
Abstract:
A 12-week kayak training programme was evaluated in children who either had or did not have the anthropometric characteristics identified as being unique to senior elite sprint kayakers. Altogether, 234 male and female school children were screened to select 10 children with and 10 children without the identified key anthropometric characteristics. Before and after training, the children completed an all-out 2 min kayak ergometer simulation test; measures of oxygen consumption, plasma lactate and total work accomplished were recorded. In addition, a 500 m time trial was performed at weeks 3 and 12. The coaches were unaware which of the 20 children possessed those anthropometric characteristics deemed to favour development of kayak ability. All children improved in both the 2 min ergometer simulation test and the 500 m time trial. However, boys who were selected according to favourable anthropometric characteristics showed greater improvement than those without such characteristics in the 2 min ergometer test only. In summary, in a small group of children selected according to anthropometric data unique to elite adult kayakers, 12 weeks of intensive kayak training did not influence the rate of improvement of on-water sprint kayak performance.
Abstract:
One consistent functional imaging finding from patients with major depression has been abnormality of the anterior cingulate cortex (ACC). Hypoperfusion has been most commonly reported, but some studies suggest relative hyperperfusion is associated with response to somatic treatments. Despite these indications of the possible importance of the ACC in depression, there have been relatively few cognitive studies of ACC function in patients with major depression. The present study employed a series of reaction time (RT) tasks involving selection with melancholic and nonmelancholic depressed patients, as well as age-matched controls. Fifteen patients with unipolar major depression (7 melancholic, 8 nonmelancholic) and 8 healthy age-matched controls performed a series of response selection tasks (choice RT, spatial Stroop, spatial stimulus-response compatibility (SRC), and a combined Stroop + SRC condition). Reaction time and error data were collected. Melancholic patients were significantly slower than controls on all tasks but were slower than nonmelancholic patients only on the Stroop and Stroop + SRC conditions. Nonmelancholic patients did not differ from the control group on any task. The Stroop task seems crucial in differentiating the two depressive groups; they did not differ on the choice RT or SRC tasks. This may reflect differential task demands: the SRC involved symbolic manipulation that might engage the dorsal ACC and dorsolateral prefrontal cortex (DLPFC) to a greater extent than the primarily inhibitory Stroop task, which may engage the ventral ACC and orbitofrontal cortex (OFC). This might suggest the melancholic group showed a greater ventral ACC-OFC deficit than the nonmelancholic group, while both groups showed a similar dorsal ACC-DLPFC deficit.
Abstract:
Purpose - The purpose of this paper is to verify whether Brazilian companies are adopting environmental requirements in the supplier selection process. Further, this paper intends to analyze whether there is a relation between the level of environmental management maturity and the inclusion of environmental criteria in the companies' selection of suppliers. Design/methodology/approach - A review of mainstream literature on environmental management, traditional criteria in the supplier selection process and the incorporation of environmental requirements in this context. The empirical study's strategy is based on five Brazilian case studies with industrial companies. Data collection combines face-to-face interviews and informal conversations with representatives from the purchasing, environmental management, logistics and other areas, explanations made by e-mail, observation, and the collection of company documents. Findings - Based on the cases, it is concluded that companies still use traditional criteria to select suppliers, such as quality and cost, and do not adopt environmental requirements in the supplier selection process in a uniform manner. The evidence shows that the level of environmental management maturity influences the depth with which companies adopt environmental criteria when selecting suppliers. Thus, a company with more advanced environmental management adopts more formal procedures for selecting environmentally appropriate suppliers than others. Originality/value - This is the first known study to verify whether Brazilian companies are adopting environmental requirements in the supplier selection process.
Abstract:
An analysis of the relationships of the major arthropod groups was undertaken using mitochondrial genome data to examine the hypotheses that Hexapoda is polyphyletic and that Collembola is more closely related to branchiopod crustaceans than insects. We sought to examine the sensitivity of this relationship to outgroup choice, data treatment, gene choice and optimality criteria used in the phylogenetic analysis of mitochondrial genome data. Additionally, we sequenced the mitochondrial genome of an archaeognathan, Nesomachilis australica, to improve taxon selection in the apterygote insects, a group poorly represented in previous mitochondrial phylogenies. The sister group of the Collembola was rarely resolved in our analyses with a significant level of support. The use of different outgroups (myriapods, nematodes, or annelids + mollusks) resulted in many different placements of Collembola. The way in which the dataset was coded for analysis (DNA, DNA with the exclusion of the third codon position, and as amino acids) also had marked effects on tree topology. We found that nodal support was spread evenly throughout the 13 mitochondrial genes and the exclusion of genes resulted in significantly less resolution in the inferred trees. Optimality criteria had a much lesser effect on topology than the preceding factors; parsimony and Bayesian trees for a given data set and treatment were quite similar. We therefore conclude that the relationships of the extant arthropod groups as inferred by mitochondrial genomes are highly vulnerable to outgroup choice, data treatment and gene choice, and no consistent alternative hypothesis of Collembola's relationships is supported. Pending the resolution of these identified problems with the application of mitogenomic data to basal arthropod relationships, it is difficult to justify the rejection of hexapod monophyly, which is well supported on morphological grounds. (c) The Willi Hennig Society 2004.
Abstract:
Fifty-four Large White gilts were used to determine the effect of body composition at selection (145 d of age) on the onset of puberty and subsequent reproductive development until 202 d of age. Gilts were assigned to one of three groups based on their backfat depth at selection: 10 to 12 mm (L), 13 to 15 mm (M), and 16 to 18 mm (F). All of the F gilts, 92% of the M gilts, and 67% of the L gilts reached puberty by slaughter at 202 d of age. Data from a subgroup (first 67% to reach puberty in each group; L = Lp, M = Mp, and F = Fp) were also used. The M (Mp) and F (Fp) gilts reached puberty at 172 d (166 d) and 170 d (166 d) of age, respectively, but the L (Lp) gilts at 184.5 d were 12 d (18 d) older than M (P < .05), Mp (P < .001), F (P < .01), and Fp (P < .001) gilts. The Lp (97.68 kg) and Mp (98.33 kg) gilts were lighter (P < .01) than Fp (108.72 kg) gilts at puberty. There were no differences (P < .05) among the L, M, and F gilts in terms of backfat depth or weight at puberty. The L (Lp) gilts had a mean of 1.16 (1.75) estrous cycles, which was lower than for M (Mp) (P < .01) and F (Fp) (P < .01) gilts, with 1.96 (2.29) and 2.25 (2.33) cycles, respectively. L (Lp) gilts had fewer (P < .05) follicles, 13.14 (12.63), than either M (Mp), 19.08 (18.71), or F (Fp), 18.25 (17.42) gilts. The number of corpora lutea was not influenced (P > .05) by grouping at selection, but Fp gilts had fewer (P < .05) corpora lutea than Mp or Fp gilts. Live weight at slaughter was not influenced (P > .10) by grouping at selection or subgrouping at puberty. The L gilts, with a mean of 18.05 mm of backfat at slaughter, were leaner (P < .05) than the F (21.66 mm) but not (P > .10) the M gilts (19.41 mm). Subgrouping had no effect. Fat deposition and protein deposition were higher (P < .05) in those animals that attained puberty. We conclude that the rate of fat and protein deposition seems to be one of the determinants of puberty attainment.
Abstract:
Cancer/testis antigens (CTAs) are immunogenic proteins with a restricted expression pattern in normal tissues and aberrant expression in different types of tumors, and are considered promising candidates for immunotherapy. We used the alignment between EST sequences and the human genome sequence to identify novel CT genes. By examining the EST tissue composition of known CT clusters, we defined parameters for the selection of 1184 EST clusters corresponding to putative CT genes. The expression pattern of 70 CT gene candidates was evaluated by RT-PCR in 21 normal tissues, 17 tumor cell lines and 160 primary tumors. We were able to identify 4 CT genes expressed in different types of tumors. The presence of antibodies against the protein encoded by 1 of these 4 CT genes (FAM46D) was exclusively detected in plasma samples from cancer patients. Due to its restricted expression pattern and immunogenicity, FAM46D represents a novel target for cancer immunotherapy. (c) 2009 Elsevier Inc. All rights reserved.
Abstract:
Feature selection is one of the important and frequently used techniques in data preprocessing. It can improve the efficiency and the effectiveness of data mining by reducing the dimensions of the feature space and removing irrelevant and redundant information. Feature selection can be viewed as a global optimization problem of finding a minimum set of M relevant features that describes the dataset as well as the original N attributes. In this paper, we apply the adaptive partitioned random search strategy to our feature selection algorithm. Under this search strategy, the partition structure and evaluation function are proposed for the feature selection problem. This algorithm ensures the global optimal solution in theory and avoids complete randomness in search direction. The good property of our algorithm is shown through theoretical analysis.
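To make the optimization view concrete, the sketch below states the problem only; it is not the adaptive partitioned random search described in the abstract. It assumes a hypothetical consistency criterion as the evaluation function (a subset "describes the dataset as well as the original attributes" if no two samples with equal values on the subset carry different labels) and finds a minimum subset by exhaustive enumeration, which is feasible only for small N. All names are illustrative.

```python
from itertools import combinations


def is_consistent(rows, labels, subset):
    """True if samples that agree on `subset` never disagree on the label."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        # setdefault stores the first label seen for this projection.
        if seen.setdefault(key, label) != label:
            return False
    return True


def minimum_subset(rows, labels):
    """Smallest consistent feature subset, found by brute force."""
    n_features = len(rows[0])
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_consistent(rows, labels, subset):
                return subset
    # Even the full feature set is inconsistent: keep every feature.
    return tuple(range(n_features))


# Toy data: the label simply copies feature 1.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 0, 1]
best = minimum_subset(rows, labels)
```

The number of candidate subsets grows as 2^N, which is why a guided search strategy such as the one proposed in the paper is needed for realistic feature counts.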
Abstract:
Whether contemporary human populations are still evolving as a result of natural selection has been hotly debated. For natural selection to cause evolutionary change in a trait, variation in the trait must be correlated with fitness and be genetically heritable, and there must be no genetic constraints to evolution. These conditions have rarely been tested in human populations. In this study, data from a large twin cohort were used to assess whether selection will cause a change among women in a contemporary Western population for three life-history traits: age at menarche, age at first reproduction, and age at menopause. We control for temporal variation in fecundity (the baby boom phenomenon) and differences between women in educational background and religious affiliation. University-educated women have 35% lower fitness than those with fewer than seven years of education, and Roman Catholic women have about 20% higher fitness than those of other religions. Although these differences were significant, education and religion only accounted for 2% and 1% of variance in fitness, respectively. Using structural equation modeling, we reveal significant genetic influences for all three life-history traits, with heritability estimates of 0.50, 0.23, and 0.45, respectively. However, strong genetic covariation with reproductive fitness could only be demonstrated for age at first reproduction, with much weaker covariation for age at menopause and no significant covariation for age at menarche. Selection may, therefore, lead to the evolution of earlier age at first reproduction in this population. We also estimate substantial heritable variation in fitness itself, with approximately 39% of the variance attributable to additive genetic effects, the remainder consisting of unique environmental effects and small effects from education and religion. We discuss mechanisms that could be maintaining such a high heritability for fitness. The most likely explanation is that selection is now acting on different traits from those on which it acted in pre-industrial human populations.
Abstract:
A data warehouse is a data repository which collects and maintains a large amount of data from multiple distributed, autonomous and possibly heterogeneous data sources. Often the data is stored in the form of materialized views in order to provide fast access to the integrated data. One of the most important decisions in designing a data warehouse is the selection of views for materialization. The objective is to select an appropriate set of views that minimizes the total query response time with the constraint that the total maintenance time for these materialized views is within a given bound. This view selection problem is totally different from the view selection problem under the disk space constraint. In this paper the view selection problem under the maintenance time constraint is investigated. Two efficient, heuristic algorithms for the problem are proposed. The key to devising the proposed algorithms is to define good heuristic functions and to reduce the problem to some well-solved optimization problems. As a result, an approximate solution of the known optimization problem will give a feasible solution of the original problem. (C) 2001 Elsevier Science B.V. All rights reserved.
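The shape of the problem can be sketched with a simple greedy heuristic; this is an illustration under strong assumptions, not either of the algorithms proposed in the paper. It assumes each candidate view has an independent, known query-time saving and maintenance time (in practice the benefits of views interact), and it picks views by saving per unit of maintenance time until the bound is reached. The `View` fields and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    name: str
    benefit: float      # hypothetical reduction in total query response time
    maintenance: float  # hypothetical maintenance time, assumed > 0


def select_views(candidates, max_maintenance):
    """Greedily pick views by benefit/maintenance ratio within the bound."""
    ranked = sorted(candidates,
                    key=lambda v: v.benefit / v.maintenance,
                    reverse=True)
    chosen, used = [], 0.0
    for view in ranked:
        # Take the view only if the maintenance bound still holds.
        if used + view.maintenance <= max_maintenance:
            chosen.append(view)
            used += view.maintenance
    return chosen, used


candidates = [
    View("A", benefit=60.0, maintenance=10.0),
    View("B", benefit=100.0, maintenance=25.0),
    View("C", benefit=30.0, maintenance=15.0),
    View("D", benefit=9.0, maintenance=2.0),
]
chosen, used = select_views(candidates, max_maintenance=30.0)
```

With these numbers the heuristic skips view B, the single most beneficial view, because it no longer fits after A and D are taken. Choosing B and D alone would save more query time within the same bound, which shows that a ratio-based greedy choice yields a feasible solution but not necessarily an optimal one.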
Abstract:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic, used in conjunction with a threshold on the size of a cluster, allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so mixtures of factor analyzers are used to effectively reduce the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
Abstract:
Genetic research on risk of alcohol, tobacco or drug dependence must make allowance for the partial overlap of risk-factors for initiation of use, and risk-factors for dependence or other outcomes in users. Except in the extreme cases where genetic and environmental risk-factors for initiation and dependence overlap completely or are uncorrelated, there is no consensus about how best to estimate the magnitude of genetic or environmental correlations between Initiation and Dependence in twin and family data. We explore by computer simulation the biases to estimates of genetic and environmental parameters caused by model misspecification when Initiation can only be defined as a binary variable. For plausible simulated parameter values, the two-stage genetic models that we consider yield estimates of genetic and environmental variances for Dependence that, although biased, are not very discrepant from the true values. However, estimates of genetic (or environmental) correlations between Initiation and Dependence may be seriously biased, and may differ markedly under different two-stage models. Such estimates may have little credibility unless external data favor selection of one particular model. These problems can be avoided if Initiation can be assessed as a multiple-category variable (e.g. never versus early-onset versus later-onset user), with at least two categories measurable in users at risk for dependence. Under these conditions, and under certain distributional assumptions, recovery of simulated genetic and environmental correlations becomes possible. Illustrative application of the model to Australian twin data on smoking confirmed substantial heritability of smoking persistence (42%) with minimal overlap with genetic influences on initiation.
Abstract:
Most sugarcane breeding programs in Australia use large unreplicated trials to evaluate clones in the early stages of selection. Commercial varieties that are replicated provide a method of local control of soil fertility. Although such methods may be useful in detecting broad trends in the field, variation often occurs on a much smaller scale. Methods such as spatial analysis adjust a plot for variability by using information from immediate neighbours. These techniques are routinely used to analyse cereal data in Australia and have resulted in increased accuracy and precision in the estimates of variety effects. In this paper, spatial analyses in which the variability is decomposed into local, natural, and extraneous components are applied to early selection trials in sugarcane. Interplot competition in cane yield and trend in sugar content were substantial in many of the trials, and there were often large differences in the selections between the spatial method and the current method used by the Bureau of Sugar Experiment Stations. A joint modelling approach for tonnes sugar per hectare in response to fertility trends and interplot competition is recommended.