930 results for selection methods


Relevance:

60.00%

Publisher:

Abstract:

A significant proportion of the cost of software development is due to software testing and maintenance. This is in part the result of the inevitable imperfections due to human error, lack of quality during the design and coding of software, and the increasing need to reduce faults to improve customer satisfaction in a competitive marketplace. Given the cost and importance of removing errors, improvements in fault detection and removal can be of significant benefit. The earlier in the development process faults can be found, the less it costs to correct them and the less likely other faults are to develop. This research aims to make the testing process more efficient and effective by identifying those software modules most likely to contain faults, allowing testing efforts to be carefully targeted. This is done with machine learning algorithms that use examples of fault-prone and not fault-prone modules to develop predictive models of quality. In order to learn the numerical mapping between module and classification, a module is represented in terms of software metrics. A difficulty in this sort of problem is sourcing software engineering data of adequate quality. In this work, data is obtained from two sources: the NASA Metrics Data Program and the open-source Eclipse project. Feature selection is applied before learning, and a number of different feature selection methods are compared to find which work best. Two machine learning algorithms are applied to the data - Naive Bayes and the Support Vector Machine - and predictive results are compared to those of previous efforts and found to be superior on selected data sets and comparable on others. In addition, a new classification method is proposed, Rank Sum, in which a ranking abstraction is laid over bin densities for each class, and a classification is determined based on the sum of ranks over features. A novel extension of this method is also described, based on an observed polarising of points by class when rank sum is applied to training data to convert it into 2D rank sum space. SVM is applied to this transformed data to produce models whose parameters can be set according to trade-off curves to obtain a particular performance trade-off.
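
The Rank Sum procedure is only summarised above, so the following Python sketch shows one plausible reading of it, assuming per-feature histograms, ranking of class densities in the bin a test value falls into, and the lowest total rank winning; the bin count and tie handling are assumptions, not details taken from the thesis.

```python
# Illustrative rank-sum style classifier: per-feature, per-class bin densities
# are built from training data; at prediction time the class densities in the
# bin containing the test value are ranked and the ranks summed over features.
import numpy as np


def fit_bins(X, y, n_bins=10):
    """Build per-feature bin edges and per-class bin densities."""
    classes = np.unique(y)
    edges, densities = [], []
    for j in range(X.shape[1]):
        e = np.histogram_bin_edges(X[:, j], bins=n_bins)
        edges.append(e)
        # density of each class over the bins of feature j
        d = np.stack([np.histogram(X[y == c, j], bins=e, density=True)[0]
                      for c in classes])
        densities.append(d)
    return classes, edges, densities


def predict(x, classes, edges, densities):
    """Classify one sample by summing per-feature class ranks."""
    rank_sum = np.zeros(len(classes))
    for j, (e, d) in enumerate(zip(edges, densities)):
        b = np.clip(np.searchsorted(e, x[j]) - 1, 0, d.shape[1] - 1)
        order = np.argsort(-d[:, b])          # rank classes by density (0 = densest)
        ranks = np.empty(len(classes))
        ranks[order] = np.arange(len(classes))
        rank_sum += ranks
    return classes[np.argmin(rank_sum)]
```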

Relevance:

60.00%

Publisher:

Abstract:

Data preprocessing is widely recognized as an important stage in anomaly detection. This paper reviews the data preprocessing techniques used by anomaly-based network intrusion detection systems (NIDS), concentrating on which aspects of the network traffic are analyzed, and what feature construction and selection methods have been used. Motivation for the paper comes from the large impact data preprocessing has on the accuracy and capability of anomaly-based NIDS. The review finds that many NIDS limit their view of network traffic to the TCP/IP packet headers. Time-based statistics can be derived from these headers to detect network scans, network worm behavior, and denial of service attacks. A number of other NIDS perform deeper inspection of request packets to detect attacks against network services and network applications. More recent approaches analyze full service responses to detect attacks targeting clients. The review covers a wide range of NIDS, highlighting which classes of attack are detectable by each of these approaches. Data preprocessing is found to predominantly rely on expert domain knowledge for identifying the most relevant parts of network traffic and for constructing the initial candidate set of traffic features. On the other hand, automated methods have been widely used for feature extraction to reduce data dimensionality, and feature selection to find the most relevant subset of features from this candidate set. The review shows a trend toward deeper packet inspection to construct more relevant features through targeted content parsing. These context-sensitive features are required to detect current attacks.
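
As an illustration of the time-based header statistics the review refers to (features used to flag scanning and worm-like behaviour), the sketch below counts distinct destination hosts and ports per source address within a sliding window; the field names and the 2-second window are assumptions, not details from any particular NIDS.

```python
# Derive simple time-window features from packet-header records.
from collections import defaultdict

WINDOW = 2.0  # seconds (placeholder)


def window_features(packets):
    """packets: iterable of dicts with 'time', 'src', 'dst', 'dport' keys."""
    recent = defaultdict(list)              # src -> [(time, dst, dport), ...]
    features = []
    for p in sorted(packets, key=lambda r: r["time"]):
        hist = recent[p["src"]]
        hist.append((p["time"], p["dst"], p["dport"]))
        # keep only entries still inside the time window
        recent[p["src"]] = hist = [h for h in hist if p["time"] - h[0] <= WINDOW]
        features.append({
            "src": p["src"],
            "distinct_hosts": len({h[1] for h in hist}),
            "distinct_ports": len({h[2] for h in hist}),
        })
    return features
```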

Relevance:

60.00%

Publisher:

Abstract:

Background: Antibiotic overuse is a global public health issue influenced by several factors, some of which are parent-related psychosocial factors that can only be measured using valid and reliable psychosocial measurement instruments. The PAPA scale was developed to measure these factors, and the content validity of this instrument has been assessed. Aim: This study further validated the recently developed instrument in terms of (1) face validity and (2) construct validity, including deciding the number and nature of factors, and item selection. Methods: Questionnaires were self-administered to parents of children between the ages of 0 and 12 years. Parents were recruited by convenience sampling at schools' parental meetings in the Eastern Province, Saudi Arabia. Face validity was assessed with regard to questionnaire clarity and unambiguity. Construct validity and item selection were addressed using exploratory factor analysis. Results: Parallel analysis and exploratory factor analysis using principal axis factoring produced six factors in the developed instrument: knowledge and beliefs, behaviours, sources of information, adherence, awareness about antibiotic resistance, and parents' perception regarding doctors' prescribing behaviours. Reliability was assessed (Cronbach's alpha = 0.78), indicating that the instrument is reliable. Conclusion: The factors produced in this study coincide with the constructs contextually identified in the development phase of other instruments used to study antibiotic use. However, no other study considering perceptions of antibiotic use had gone beyond content validation of such instruments. This study is the first to constructively validate the factors underlying perceptions regarding antibiotic use in any population, and in parents in particular.
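
As a small illustration of the reliability statistic reported above, the sketch below computes Cronbach's alpha from a matrix of item responses; the data layout is hypothetical and the factor-analysis steps (parallel analysis, principal axis factoring) are not reproduced here.

```python
# Cronbach's alpha for k items: alpha = k/(k-1) * (1 - sum(item variances) / var(total score)).
import numpy as np


def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = questionnaire items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```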

Relevance:

60.00%

Publisher:

Abstract:

The high burden of parental concern in children with chronic cough has been well documented. Acute cough in children (lasting less than 2 weeks) also has a significant impact on families, reflected by the number of doctor visits for cough. Currently there is no validated acute cough-specific quality of life (QOL) measure for children. The objective of this study is to develop and validate an acute cough-specific QOL questionnaire (PAC-QOL) for paediatric use. Here we present our data on item selection. Methods: Two independent focus groups were conducted to determine relevant items. Parents discussed the impact of their child's current or previous episodes of acute cough on their child, themselves and their family functioning. Transcripts were analyzed to determine whether discussions had reached an item saturation point. Items were also compared against our previously validated parent-centred children's chronic cough-specific QOL questionnaire (PC-QOL), which was used as a model. The newly developed acute cough-specific QOL questionnaire is designed to assess the frequency of parents' feelings and worry related to their child's acute cough, using a 24-hour time-point reference. Results: Newly identified acute cough-specific items include parental worry around whether or not they should take their child to a doctor or emergency department, and the frequency of seeking assistance from friends and family. Conclusions: While there are similarities between the items identified for acute and chronic cough, there are distinct features. Further data will be collected for item reduction and validation of this children's acute cough-specific QOL questionnaire.

Relevance:

60.00%

Publisher:

Abstract:

Textual document sets have become an important and rapidly growing information source on the Web. Text classification is one of the crucial technologies for information organisation and management, and it has attracted wide attention from researchers in different fields. This paper first introduces the main feature selection methods, implementation algorithms and applications of text classification. However, because there is much noise in the knowledge extracted by current data-mining techniques for text classification, considerable uncertainty arises in the classification process, originating from both knowledge extraction and knowledge usage; more innovative techniques and methods are therefore needed to improve the performance of text classification. Further improving the process of knowledge extraction and the effective utilisation of the extracted knowledge remains a critical and challenging step. A Rough Set decision-making approach is proposed, using Rough Set decision techniques to more precisely classify the textual documents that are difficult to separate with classic text classification methods. The purpose of this paper is to give an overview of existing text classification technologies; to demonstrate the Rough Set concepts and the decision-making approach based on Rough Set theory for building a more reliable and effective text classification framework with higher precision; to set up an innovative evaluation metric named CEI, which is effective for performance assessment in this line of research; and to propose a promising research direction for addressing the challenging problems in text classification, text mining and other related fields.
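
For readers unfamiliar with the Rough Set machinery the paper builds on, the sketch below computes the lower and upper approximations of a target set of documents under the indiscernibility relation induced by a chosen attribute set; the toy document representation is an assumption, not the paper's framework.

```python
# Lower approximation: equivalence classes entirely inside the target set.
# Upper approximation: equivalence classes that overlap the target set.
from collections import defaultdict


def approximations(objects, attrs, target):
    """objects: dict name -> dict of attribute values;
    attrs: attributes defining the indiscernibility relation;
    target: set of object names (e.g. documents labelled with a topic)."""
    blocks = defaultdict(set)
    for name, values in objects.items():
        blocks[tuple(values[a] for a in attrs)].add(name)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:          # block entirely inside the target set
            lower |= block
        if block & target:           # block overlaps the target set
            upper |= block
    return lower, upper              # boundary region = upper - lower
```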

Relevance:

60.00%

Publisher:

Abstract:

Guaranteeing the quality of extracted features that describe relevant knowledge to users or topics is a challenge because of the large number of extracted features. Most popular existing term-based feature selection methods suffer from extracting noisy features that are irrelevant to user needs. One popular alternative is to extract phrases or n-grams to describe the relevant knowledge; however, extracted n-grams and phrases usually contain a lot of noise. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features. It then uses an extended random set to accurately weight n-grams based on their distribution in the documents and the distribution of their terms within n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves performance. Experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms state-of-the-art methods underpinned by Okapi BM25, tf*idf and Rocchio.
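
The extended-random-set weighting itself is not specified in the abstract, so the sketch below is only a simplified stand-in for the general idea of weighting an n-gram by its distribution over documents combined with the distribution of its component terms across n-grams; every formula detail here is an assumption.

```python
# Toy n-gram weighting: document frequency of the n-gram combined with the
# average frequency of its component terms across all extracted n-grams.
from collections import Counter


def weight_ngrams(doc_ngrams):
    """doc_ngrams: list of documents, each a list of n-gram tuples."""
    ngram_df = Counter()                 # document frequency of each n-gram
    term_freq = Counter()                # frequency of terms across n-grams
    for doc in doc_ngrams:
        ngram_df.update(set(doc))
        for ng in doc:
            term_freq.update(ng)
    n_docs = len(doc_ngrams)
    weights = {}
    for ng, df in ngram_df.items():
        term_support = sum(term_freq[t] for t in ng) / len(ng)
        weights[ng] = (df / n_docs) * term_support
    return weights
```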

Relevance:

60.00%

Publisher:

Abstract:

Several websites use a rule-based recommendation system, which generates choices based on a series of questionnaires, to recommend products to users. This approach carries a high risk of customer attrition, and the bottleneck is the questionnaire set. If the questioning process is too long, complex or tedious, users are likely to quit the questionnaire before a product is recommended to them; if it is too short, user intentions cannot be gathered. Commonly used feature selection methods do not provide a satisfactory solution. We propose a novel process combining clustering, decision trees and association rule mining for group-oriented question reduction. The question set is reduced according to common properties that are shared by a specific group of users. When applied to a real-world website, the proposed combined method outperforms methods in which question reduction is performed only by association rule mining or only by observing the distribution within the group.
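
A hedged sketch of the group-oriented reduction idea follows: users are clustered on their full answer vectors and, per cluster, a decision tree predicting the recommended product determines which questions are kept. The association-rule stage of the proposed combined method is omitted, and the cluster count and tree depth are placeholders.

```python
# Cluster users, then keep only the questions each cluster's tree splits on.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier


def reduced_question_sets(answers, products, n_groups=4, max_depth=4):
    """answers: (n_users, n_questions) array; products: recommended item ids."""
    groups = KMeans(n_clusters=n_groups, n_init=10).fit_predict(answers)
    kept = {}
    for g in range(n_groups):
        mask = groups == g
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(answers[mask], np.asarray(products)[mask])
        used = np.unique(tree.tree_.feature)
        kept[g] = sorted(int(q) for q in used if q >= 0)   # negative values mark leaves
    return kept
```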

Relevance:

60.00%

Publisher:

Abstract:

Progress in crop improvement is limited by the ability to identify favourable combinations of genotypes (G) and management practices (M) in relevant target environments (E) given the resources available to search among the myriad of possible combinations. To underpin yield advance we require prediction of phenotype based on genotype. In plant breeding, traditional phenotypic selection methods have involved measuring phenotypic performance of large segregating populations in multi-environment trials and applying rigorous statistical procedures based on quantitative genetic theory to identify superior individuals. Recent developments in the ability to inexpensively and densely map/sequence genomes have facilitated a shift from the level of the individual (genotype) to the level of the genomic region. Molecular breeding strategies using genome wide prediction and genomic selection approaches have developed rapidly. However, their applicability to complex traits remains constrained by gene-gene and gene-environment interactions, which restrict the predictive power of associations of genomic regions with phenotypic responses. Here it is argued that crop ecophysiology and functional whole plant modelling can provide an effective link between molecular and organism scales and enhance molecular breeding by adding value to genetic prediction approaches. A physiological framework that facilitates dissection and modelling of complex traits can inform phenotyping methods for marker/gene detection and underpin prediction of likely phenotypic consequences of trait and genetic variation in target environments. This approach holds considerable promise for more effectively linking genotype to phenotype for complex adaptive traits. Specific examples focused on drought adaptation are presented to highlight the concepts.
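
As a concrete, minimal instance of the genome-wide prediction the abstract refers to (not the authors' physiological framework), the sketch below fits a ridge regression of phenotype on marker scores, an RR-BLUP-style baseline; the marker coding and penalty value are assumptions.

```python
# Ridge regression of phenotype on genome-wide marker scores.
import numpy as np
from sklearn.linear_model import Ridge


def genomic_prediction(markers_train, phenotype_train, markers_new, alpha=1.0):
    """markers_*: (n_lines, n_markers) arrays of 0/1/2 allele counts."""
    model = Ridge(alpha=alpha)
    model.fit(markers_train, phenotype_train)
    # predicted breeding values for untested lines
    return model.predict(markers_new)
```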

Relevance:

60.00%

Publisher:

Abstract:

This thesis describes methods for the reliable identification of hadronically decaying tau leptons in the search for heavy Higgs bosons of the minimal supersymmetric standard model of particle physics (MSSM). The identification of the hadronic tau lepton decays, i.e. tau-jets, is applied to the gg->bbH, H->tautau and gg->tbH+, H+->taunu processes to be searched for in the CMS experiment at the CERN Large Hadron Collider. Of all the event selections applied in these final states, the tau-jet identification is the single most important event selection criterion to separate the tiny Higgs boson signal from a large number of background events. The tau-jet identification is studied with methods based on a signature of a low charged track multiplicity, the containment of the decay products within a narrow cone, an isolated electromagnetic energy deposition, a non-zero tau lepton flight path, the absence of electrons, muons, and neutral hadrons in the decay signature, and a relatively small tau lepton mass compared to the mass of most hadrons. Furthermore, in the H+->taunu channel, helicity correlations are exploited to separate the signal tau jets from those originating from the W->taunu decays. Since many of these identification methods rely on the reconstruction of charged particle tracks, the systematic uncertainties resulting from the mechanical tolerances of the tracking sensor positions are estimated with care. The tau-jet identification and other standard selection methods are applied to the search for the heavy neutral and charged Higgs bosons in the H->tautau and H+->taunu decay channels. For the H+->taunu channel, the tau-jet identification is redone and optimized with a recent and more detailed event simulation than previously in the CMS experiment. Both decay channels are found to be very promising for the discovery of the heavy MSSM Higgs bosons. The Higgs boson(s), whose existence has not yet been experimentally verified, are a part of the standard model and its most popular extensions. They are a manifestation of a mechanism which breaks the electroweak symmetry and generates masses for particles. Since the H->tautau and H+->taunu decay channels are important for the discovery of the Higgs bosons in a large region of the permitted parameter space, the analysis described in this thesis serves as a probe for finding out properties of the microcosm of particles and their interactions in the energy scales beyond the standard model of particle physics.
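
The selection signatures listed above translate naturally into a cut-based filter; the sketch below mirrors some of them (track multiplicity, narrow cone, isolation, visible mass, lepton vetoes) with placeholder thresholds that are not CMS working points.

```python
# Illustrative cut-based tau-jet candidate selection; all thresholds are placeholders.
def is_tau_jet_candidate(jet):
    """jet: dict with pre-computed observables for one reconstructed jet."""
    return (
        jet["n_signal_tracks"] in (1, 3)      # 1- or 3-prong hadronic decay
        and jet["signal_cone_dr"] < 0.07      # decay products contained in a narrow cone
        and jet["isolation_et"] < 1.0         # little energy in the isolation annulus (GeV)
        and jet["visible_mass"] < 1.8         # GeV, roughly the tau mass scale (placeholder)
        and not jet["matches_electron"]       # electron veto
        and not jet["matches_muon"]           # muon veto
    )
```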

Relevance:

60.00%

Publisher:

Abstract:

The problem of assigning customers to satellite channels is considered. Finding an optimal allocation of customers to satellite channels is a difficult combinatorial optimization problem and has been shown to be NP-complete in an earlier study. We propose a genetic algorithm (GA) approach to search for the best or optimal assignment of customers to satellite channels. Various issues related to genetic algorithms, such as solution representation, selection methods, genetic operators and the repair of invalid solutions, are presented. A comparison of this approach with a standard optimization method is presented to show its advantages in terms of computation time.
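
A hedged sketch of the GA ingredients listed above (integer-vector representation, tournament selection, uniform crossover, mutation and repair of invalid solutions) is given below; the cost function and the single-capacity model are placeholders rather than the paper's formulation.

```python
# Minimal GA for assigning customers to channels: gene i = channel of customer i.
import random


def evolve(n_customers, n_channels, capacity, cost, pop_size=50, generations=200):
    def random_solution():
        return [random.randrange(n_channels) for _ in range(n_customers)]

    def repair(sol):
        # move customers off over-full channels onto the least-loaded channel
        load = [sol.count(c) for c in range(n_channels)]
        for i, c in enumerate(sol):
            if load[c] > capacity:
                new_c = min(range(n_channels), key=load.__getitem__)
                load[c] -= 1
                load[new_c] += 1
                sol[i] = new_c
        return sol

    def tournament(pop):
        return min(random.sample(pop, 3), key=cost)   # 3-way tournament selection

    pop = [repair(random_solution()) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            if random.random() < 0.1:                                          # mutation
                child[random.randrange(n_customers)] = random.randrange(n_channels)
            nxt.append(repair(child))
        pop = nxt
    return min(pop, key=cost)
```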

Relevance:

60.00%

Publisher:

Abstract:

The use of reproductive and genetic technologies can increase the efficiency of selective breeding programs for aquaculture species. Four technologies are considered, namely: marker-assisted selection, DNA fingerprinting, in-vitro fertilization, and cryopreservation. Marker-assisted selection can result in greater genetic gain, particularly for traits difficult or expensive to measure, than conventional selection methods, but its application is currently limited by lack of high density linkage maps and by the high cost of genotyping. DNA fingerprinting is most useful for genetic tagging and parentage verification. Both in-vitro fertilization and cryopreservation techniques can increase the accuracy of selection while controlling accumulation of inbreeding in long-term selection programs. Currently, the cost associated with the utilization of reproductive and genetic techniques is possibly the most important factor limiting their use in genetic improvement programs for aquatic species.
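
As a small illustration of the parentage-verification use of DNA fingerprinting mentioned above, the sketch below applies the standard exclusion rule that a candidate parent must share at least one allele with the offspring at every marker locus; the genotype encoding is an assumption, and real programs also allow for genotyping error.

```python
# Parentage exclusion check over multilocus genotypes.
def compatible_parent(offspring, candidate):
    """offspring, candidate: dict locus -> (allele_a, allele_b)."""
    for locus, off_alleles in offspring.items():
        if not set(off_alleles) & set(candidate[locus]):
            return False        # exclusion: no shared allele at this locus
    return True
```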

Relevance:

60.00%

Publisher:

Abstract:

Common carp is one of the most important cultured freshwater fish species in the world. Its production in freshwater areas is the second largest in Europe after rainbow trout. Common carp production in Europe was 146,845 t in 2004 (FAO Fishstat Plus 2006). Common carp production is concentrated mainly in Central and Eastern Europe. In Hungary, common carp has been traditionally cultured in earthen ponds since the late 19th century, following the sharp drop in catches from natural waters due to the regulation of the main river systems. Different production technologies and unintentional selection methods resulted in a wide variety of strains of this species. Just before the intensification of rearing technology and the exchange of stocking materials among fish farms (early sixties), "landraces" of carp were collected from practically all Hungarian fish farms into a live gene bank at the Research Institute for Fisheries, Aquaculture and Irrigation (HAKI) at Szarvas (Bakos and Gorda 1995; Bakos and Gorda 2001). In order to provide highly productive hybrids for production purposes, different strains and crosses between Hungarian landraces were created and tested starting from 1964. During the last 40 years, approximately 150 two-, three-, and four-line hybrids were produced. While developing parental lines, methods of individual selection, inbreeding, backcrossing of lines, gynogenesis and sex reversal were used. This breeding program resulted in three outstanding hybrids: "Szarvas 215 mirror" and "Szarvas P31 scaly" for pond production, and "Szarvas P34 scaly" for angling waters. Besides satisfying the needs of industry, the live gene bank helped to conserve the biological diversity of Hungarian carp landraces. Fifteen Hungarian carp landraces are still maintained today in the gene bank. Through exchange programs, fifteen foreign carp strains were added to the collection from Central and Eastern Europe, as well as Southeast Asia (Bakos and Gorda 2001). Besides developing the methodology to maintain live specimens in the gene bank, the National Carp Breeding Program has been initiated in cooperation with all the key stakeholders in Hungary, namely the National Association of Fish Producers (HOSZ), the National Institute for Agricultural Quality Control (OMMI), and the Research Institute for Fisheries, Aquaculture and Irrigation (HAKI). In addition, methodologies and technologies for broodstock management and carp performance testing have been developed. This National Carp Breeding Program has been implemented successfully since the mid-1990s.

Relevance:

60.00%

Publisher:

Abstract:

Hyper-spectral data allows the construction of more robust statistical models for sampling the material properties than the standard tri-chromatic color representation. However, because of the large dimensionality and complexity of the hyper-spectral data, the extraction of robust features (image descriptors) is not a trivial issue. Thus, to facilitate efficient feature extraction, decorrelation techniques are commonly applied to reduce the dimensionality of the hyper-spectral data with the aim of generating compact and highly discriminative image descriptors. Current methodologies for data decorrelation, such as principal component analysis (PCA), linear discriminant analysis (LDA), wavelet decomposition (WD) or band selection methods, require complex and subjective training procedures; in addition, the compressed spectral information is not directly related to the physical (spectral) characteristics of the analyzed materials. The major objective of this article is to introduce and evaluate a new data decorrelation methodology using an approach that closely emulates human vision. The proposed data decorrelation scheme has been employed to optimally minimize the amount of redundant information contained in the highly correlated hyper-spectral bands and has been comprehensively evaluated in the context of non-ferrous material classification.
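
For context, the sketch below shows only the conventional PCA-based decorrelation baseline named in the abstract, not the proposed vision-inspired scheme; the cube layout and the number of retained components are assumptions.

```python
# PCA decorrelation of a hyper-spectral cube: one spectrum per pixel.
import numpy as np
from sklearn.decomposition import PCA


def decorrelate_cube(cube, n_components=10):
    """cube: (rows, cols, bands) hyper-spectral image."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)                 # flatten to (n_pixels, bands)
    compressed = PCA(n_components=n_components).fit_transform(pixels)
    return compressed.reshape(rows, cols, n_components)
```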

Relevance:

60.00%

Publisher:

Abstract:

The problem of DC link sizing in a stiff voltage-source inverter for an electric drive is described in this paper. The advantages of advanced film capacitor technology over conventional technology for DC link applications are reviewed. Conventional DC link capacitor selection methods are questioned in view of the use of advanced capacitor technology in a stiff voltage-source inverter. For capacitor selection, the maximum ripple RMS current point is shown. A spectrum analysis of the DC link ripple current under modern PWM techniques is presented, and some capacitor selection recommendations are given. The analysis has been aided greatly by computer modelling in PSpice. ©2005 IEEE.
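
To make the selection quantities concrete, the sketch below applies two back-of-envelope relations commonly used when sizing a DC link capacitor, ΔV = I·Δt/C for the ripple/hold-up estimate and P = I_rms²·ESR for the capacitor loss; the numbers are placeholders and the paper's PSpice-based spectrum analysis is far more detailed.

```python
# Rough DC-link sizing arithmetic (illustrative values only).
def min_capacitance(ripple_current_a, pulse_width_s, allowed_ripple_v):
    """Capacitance needed so a current pulse causes at most the allowed voltage ripple."""
    return ripple_current_a * pulse_width_s / allowed_ripple_v


def esr_loss_w(ripple_rms_a, esr_ohm):
    """Power dissipated in the capacitor's equivalent series resistance."""
    return ripple_rms_a ** 2 * esr_ohm


# e.g. 20 A ripple over a 50 microsecond switching interval with 2 V allowed ripple
print(min_capacitance(20, 50e-6, 2.0))   # 5e-4 F = 500 uF
print(esr_loss_w(20, 0.005))             # 2 W for 5 mOhm ESR
```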

Relevance:

60.00%

Publisher:

Abstract:

Computational models of visual cortex, and in particular those based on sparse coding, have enjoyed much recent attention. Despite this currency, the question of how sparse or how over-complete a sparse representation should be, has gone without principled answer. Here, we use Bayesian model-selection methods to address these questions for a sparse-coding model based on a Student-t prior. Having validated our methods on toy data, we find that natural images are indeed best modelled by extremely sparse distributions; although for the Student-t prior, the associated optimal basis size is only modestly over-complete.
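
As a minimal sketch of the model class studied (not the paper's Bayesian model-selection machinery), the code below performs MAP inference of sparse coefficients for a fixed dictionary under a Student-t prior; the step size, degrees of freedom and noise scale are placeholders.

```python
# Gradient descent on the MAP objective: ||x - D a||^2 / (2*sigma^2) + sum_i (nu+1)/2 * log(1 + a_i^2/nu).
import numpy as np


def infer_coefficients(x, D, nu=2.5, noise_var=0.1, lr=0.01, steps=500):
    """x: image patch (d,), D: dictionary (d, k). Returns MAP coefficients a."""
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        residual = x - D @ a
        grad_likelihood = -(D.T @ residual) / noise_var
        grad_prior = (nu + 1) * a / (nu + a ** 2)     # derivative of -log Student-t density
        a -= lr * (grad_likelihood + grad_prior)
    return a
```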