927 results for THRESHOLD SELECTION METHOD


Relevance: 90.00%

Abstract:

This paper presents results of a study examining the methods used to select employees in 579 UK organizations representing a range of different organization sizes and industry sectors. Overall, a smaller proportion of organizations in this sample reported using formalized methods (e.g., assessment centres) than informal methods (e.g., unstructured interviews). The curriculum vitae (CV) was the most commonly used selection method, followed by the traditional triad of application form, interviews, and references. Findings also indicated that the use of different selection methods was similar in both large organizations and small-to-medium-sized enterprises. Differences were found across industry sectors, with the public and voluntary sectors being more likely to use formalized techniques (e.g., application forms rather than CVs and structured rather than unstructured interviews). The results are discussed in relation to their implications, both in terms of practice and future research.

Relevance: 90.00%

Abstract:

In this paper we propose a prototype size selection method for a set of sample graphs. Our first contribution is to show how approximate set coding can be extended from the vector to the graph domain. With this framework in hand, we show how prototype selection can be posed as optimizing the mutual information between two partitioned sets of sample graphs. We show how the resulting method can be used for prototype graph size selection. In our experiments, we apply our method to a real-world dataset and investigate its performance on prototype size selection tasks. © 2012 Springer-Verlag Berlin Heidelberg.

Relevance: 90.00%

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arises in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator ({\em message}) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or a Bayesian variable selection method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.
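
The median-aggregation idea in {\em message} can be sketched in a few lines of numpy. This is a hedged illustration only: the per-subset screen below is a simple correlation filter standing in for the regularized regression or Bayesian variable selection the thesis actually uses, and the 0.5 median-inclusion threshold and top-quartile screen are assumptions of this sketch.

```python
import numpy as np

def message_select(X, y, n_subsets=4, threshold=0.5):
    """Sketch of the 'message' idea: partition rows, screen variables per
    subset, take the median inclusion indicator, refit and average."""
    n, p = X.shape
    parts = np.array_split(np.arange(n), n_subsets)  # sample-space partition
    inclusion = np.zeros((n_subsets, p))
    for k, idx in enumerate(parts):
        Xk, yk = X[idx], y[idx]
        # Stand-in screen: keep the features most correlated with y.
        corr = np.abs(Xk.T @ yk) / (
            np.linalg.norm(Xk, axis=0) * np.linalg.norm(yk) + 1e-12)
        inclusion[k] = corr > np.quantile(corr, 0.75)
    # 'Median' feature inclusion index across subsets.
    selected = np.median(inclusion, axis=0) >= threshold
    coefs = np.zeros(p)
    if selected.any():
        # Refit on the selected features in each subset, then average.
        ests = [np.linalg.lstsq(X[idx][:, selected], y[idx], rcond=None)[0]
                for idx in parts]
        coefs[selected] = np.mean(ests, axis=0)
    return selected, coefs
```

In a real distributed setting each loop iteration would run on its own worker, and only the p-dimensional inclusion vectors and coefficient estimates would be communicated.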

While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named {\em DECO} for distributed variable selection and parameter estimation. In {\em DECO}, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.

For datasets with both large sample sizes and high dimensionality, I propose a new "divide-and-conquer" framework {\em DEME} (DECO-message) by leveraging both the {\em DECO} and the {\em message} algorithms. The new framework first partitions the dataset in the sample space into row cubes using {\em message} and then partitions the feature space of the cubes using {\em DECO}. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthesized via the {\em DECO} and {\em message} algorithms in reverse order to produce the final output. The whole framework is extremely scalable.

Relevance: 80.00%

Abstract:

Search engines have forever changed the way people access and discover knowledge, allowing information about almost any subject to be quickly and easily retrieved within seconds. As increasingly more material becomes available electronically, the influence of search engines on our lives will continue to grow. This presents the problem of how to find what information is contained in each search engine, what bias a search engine may have, and how to select the best search engine for a particular information need. This research introduces a new method, search engine content analysis, to solve the above problem. Search engine content analysis is a new development of the traditional information retrieval field called collection selection, which deals with general information repositories. Current research in collection selection relies on full access to the collections or on estimations of their size, and collection descriptions are often represented as term occurrence statistics. An automatic ontology learning method is developed for search engine content analysis, which trains an ontology with world knowledge of hundreds of different subjects in a multilevel taxonomy. This ontology is then mined to find important classification rules, and these rules are used to perform an extensive analysis of the content of the largest general-purpose Internet search engines in use today. Instead of representing collections as a set of terms, as commonly occurs in collection selection, they are represented as a set of subjects, leading to a more robust representation of information and a decrease in synonymy. The ontology-based method was compared with ReDDE (Relevant Document Distribution Estimation method for resource selection), the current state-of-the-art collection selection method, which relies on collection size estimation, using the standard R-value metric, with encouraging results.
The method was also used to analyse the content of the most popular search engines in use today, including Google and Yahoo. In addition, several specialist search engines, such as PubMed and that of the U.S. Department of Agriculture, were analysed. In conclusion, this research shows that the ontology-based method mitigates the need for collection size estimation.

Relevance: 80.00%

Abstract:

Camera calibration information is required in order for multiple camera networks to deliver more than the sum of many single camera systems. Methods exist for manually calibrating cameras with high accuracy. Manually calibrating networks with many cameras is, however, time consuming, expensive and impractical for networks that undergo frequent change. For this reason, automatic calibration techniques have been vigorously researched in recent years. Fully automatic calibration methods depend on the ability to automatically find point correspondences between overlapping views. In typical camera networks, cameras are placed far apart to maximise coverage. This is referred to as a wide baseline scenario. Finding sufficient correspondences for camera calibration in wide baseline scenarios presents a significant challenge. This thesis focuses on developing more effective and efficient techniques for finding correspondences in uncalibrated, wide baseline, multiple-camera scenarios. The project consists of two major areas of work. The first is the development of more effective and efficient view covariant local feature extractors. The second area involves finding methods to extract scene information using the information contained in a limited set of matched affine features. Several novel affine adaptation techniques for salient features have been developed. A method is presented for efficiently computing the discrete scale space primal sketch of local image features. A scale selection method was implemented that makes use of the primal sketch. The primal sketch-based scale selection method has several advantages over the existing methods. It allows greater freedom in how the scale space is sampled, enables more accurate scale selection, is more effective at combining different functions for spatial position and scale selection, and leads to greater computational efficiency.
Existing affine adaptation methods make use of the second moment matrix to estimate the local affine shape of local image features. In this thesis, it is shown that the Hessian matrix can be used in a similar way to estimate local feature shape. The Hessian matrix is effective for estimating the shape of blob-like structures, but is less effective for corner structures. It is simpler to compute than the second moment matrix, leading to a significant reduction in computational cost. A wide baseline dense correspondence extraction system, called WiDense, is presented in this thesis. It allows the extraction of large numbers of additional accurate correspondences, given only a few initial putative correspondences. It consists of the following algorithms: an affine region alignment algorithm that ensures accurate alignment between matched features; a method for extracting more matches in the vicinity of a matched pair of affine features, using the alignment information contained in the match; and an algorithm for extracting large numbers of highly accurate point correspondences from an aligned pair of feature regions. Experiments show that the correspondences generated by the WiDense system improve the success rate of computing the epipolar geometry of very widely separated views. This new method is successful in many cases where the features produced by the best wide baseline matching algorithms are insufficient for computing the scene geometry.
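
The claim that the Hessian can stand in for the second moment matrix when estimating blob shape can be illustrated with a minimal numpy sketch. This is not the thesis's implementation (which would smooth over a scale space first): it just takes a finite-difference Hessian at a point and reads the blob's elongation off the ratio of eigenvalue magnitudes.

```python
import numpy as np

def hessian_anisotropy(img, y, x):
    """Finite-difference Hessian of image intensity at (y, x); the ratio
    of its eigenvalue magnitudes estimates local blob elongation."""
    Lxx = img[y, x + 1] - 2 * img[y, x] + img[y, x - 1]
    Lyy = img[y + 1, x] - 2 * img[y, x] + img[y - 1, x]
    Lxy = (img[y + 1, x + 1] - img[y + 1, x - 1]
           - img[y - 1, x + 1] + img[y - 1, x - 1]) / 4.0
    H = np.array([[Lxx, Lxy], [Lxy, Lyy]])
    ev = np.abs(np.linalg.eigvalsh(H))  # symmetric-matrix eigenvalues
    return ev.max() / (ev.min() + 1e-12)
```

For an anisotropic Gaussian blob with standard deviations sx and sy, the analytic ratio at the centre is (sx/sy)^2, which the finite-difference estimate approximates.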

Relevance: 80.00%

Abstract:

We consider complexity penalization methods for model selection. These methods aim to choose a model to optimally trade off estimation and approximation errors by minimizing the sum of an empirical risk term and a complexity penalty. It is well known that if we use a bound on the maximal deviation between empirical and true risks as a complexity penalty, then the risk of our choice is no more than the approximation error plus twice the complexity penalty. There are many cases, however, where complexity penalties like this give loose upper bounds on the estimation error. In particular, if we choose a function from a suitably simple convex function class with a strictly convex loss function, then the estimation error (the difference between the risk of the empirical risk minimizer and the minimal risk in the class) approaches zero at a faster rate than the maximal deviation between empirical and true risks. In this paper, we address the question of whether it is possible to design a complexity penalized model selection method for these situations. We show that, provided the sequence of models is ordered by inclusion, in these cases we can use tight upper bounds on estimation error as a complexity penalty. Surprisingly, this is the case even in situations when the difference between the empirical risk and true risk (and indeed the error of any estimate of the approximation error) decreases much more slowly than the complexity penalty. We give an oracle inequality showing that the resulting model selection method chooses a function with risk no more than the approximation error plus a constant times the complexity penalty.
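
The selection rule itself is simple; the paper's contribution lies in which penalty makes it work. A minimal sketch over a nested sequence of polynomial classes, using an illustrative penalty proportional to model complexity rather than the paper's tight estimation-error bounds:

```python
import numpy as np

def penalized_select(degrees, x, y, lam=0.01):
    """Pick the polynomial degree minimizing empirical risk plus a
    complexity penalty. The penalty lam * d is an illustrative stand-in,
    not the paper's bound; degrees are nested (d is contained in d + 1)."""
    best_deg, best_score = None, np.inf
    for d in degrees:
        coef = np.polyfit(x, y, deg=d)
        risk = np.mean((np.polyval(coef, x) - y) ** 2)  # empirical risk
        score = risk + lam * d                          # penalized criterion
        if score < best_score:
            best_deg, best_score = d, score
    return best_deg
```

With data generated from a quadratic, the empirical risk keeps falling slowly beyond degree 2, but the penalty outweighs the improvement and the rule settles on the true degree.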

Relevance: 80.00%

Abstract:

Orthopaedic fracture fixation implants are increasingly being designed using accurate 3D models of long bones based on computed tomography (CT). Unlike CT, magnetic resonance imaging (MRI) does not involve ionising radiation and is therefore a desirable alternative to CT. This study aims to quantify the accuracy of MRI-based 3D models compared to CT-based 3D models of long bones. The femora of five intact cadaver ovine limbs were scanned using a 1.5T MRI scanner and a CT scanner. Image segmentation of the CT and MRI data was performed using a multi-threshold segmentation method. Reference models were generated by digitising the bone surfaces free of soft tissue with a mechanical contact scanner. The MRI- and CT-derived models were validated against the reference models. The results demonstrated that the CT-based models contained an average error of 0.15 mm while the MRI-based models contained an average error of 0.23 mm. Statistical validation shows that there are no significant differences between 3D models based on CT and MRI data. These results indicate that the geometric accuracy of MRI-based 3D models is comparable to that of CT-based models and that MRI is therefore a potential alternative to CT for the generation of 3D models with high geometric accuracy.

Relevance: 80.00%

Abstract:

The Request For Proposal (RFP) under the design‐build (DB) procurement arrangement is a document in which an owner develops his requirements and conveys the project scope to DB contractors. Owners should provide an appropriate level of design in DB RFPs to adequately describe their requirements without compromising the prospects for innovation. This paper examines and compares the different levels of owner‐provided design in DB RFPs through a content analysis of 84 RFPs for public DB projects advertised between 2000 and 2010, with an aggregate contract value of over $5.4 billion. A statistical analysis was also conducted to explore the relationship between the proportion of owner‐provided design and other project information, including project type, advertisement time, project size, contractor selection method, procurement process and contract type. The results show that in the majority (64.8%) of the RFPs, owner‐provided design accounts for less than 10% of the design. The owner‐provided design proportion has a significant association with project type, project size, contractor selection method and contract type. In addition, owners have generally been providing less design in recent years than hitherto. The research findings also provide owners with perspectives from which to determine the appropriate level of owner‐provided design in DB RFPs.

Relevance: 80.00%

Abstract:

3D models of long bones are being utilised in a number of fields, including orthopaedic implant design. Accurate reconstruction of 3D models is of utmost importance for designing accurate implants that allow a good alignment between two bone fragments to be achieved. For this purpose, CT scanners are employed to acquire accurate bone data, exposing an individual to a high amount of ionising radiation. Magnetic resonance imaging (MRI) has been shown to be a potential alternative to computed tomography (CT) for scanning volunteers for 3D reconstruction of long bones, essentially avoiding the high radiation dose from CT. In MR imaging of long bones, the artefacts due to random movements of the skeletal system create challenges for researchers, as they generate inaccuracies in 3D models reconstructed from data sets containing such artefacts. One of the defects observed during an initial study is the lateral shift artefact occurring in the reconstructed 3D models. This artefact is believed to result from volunteers moving the leg between two successive scanning stages (the lower limb has to be scanned in at least five stages due to the limited scanning length of the scanner). As this artefact creates inaccuracies in implants designed using these models, it needs to be corrected before the 3D models are applied to implant design. Therefore, this study aimed to correct the lateral shift artefact using 3D modelling techniques. The femora of five ovine hind limbs were scanned with a 3T MRI scanner using a 3D VIBE-based protocol. The scanning was conducted in two halves, while maintaining a good overlap between them. A lateral shift was generated by moving the limb several millimetres between the two scanning stages. The 3D models were reconstructed using a multi-threshold segmentation method.
The correction of the artefact was achieved by aligning the two halves using the robust iterative closest point (ICP) algorithm, with the help of the overlapping region between the two. The models with the corrected artefact were compared with the reference model generated by CT scanning of the same sample. The results indicate that the correction of the artefact was achieved with an average deviation of 0.32 ± 0.02 mm between the corrected model and the reference model. In comparison, the model obtained from a single MRI scan generated an average error of 0.25 ± 0.02 mm when compared with the reference model. An average deviation of 0.34 ± 0.04 mm was seen when the models generated after the table was moved were compared to the reference models; thus, the movement of the table is also a contributing factor to the motion artefacts.

Relevance: 80.00%

Abstract:

Data structures such as k-D trees and hierarchical k-means trees perform very well in approximate k-nearest-neighbour matching, but are only marginally more effective than linear search when performing exact matching in high-dimensional image descriptor data. This paper presents several improvements to linear search that allow it to outperform existing methods, and recommends two approaches to exact matching. The first method reduces the number of operations by evaluating the distance measure in order of significance of the query dimensions and terminating when the partial distance exceeds the search threshold. This method does not require preprocessing and significantly outperforms existing methods. The second method improves query speed further by presorting the data using a data structure called d-D sort. The order information is used as a priority queue to reduce the time taken to find the exact match and to restrict the range of data searched. The d-D sort structure is very simple to construct, does not require any parameter tuning, requires significantly less time to build than the best-performing tree structure, and data can be added to it relatively efficiently.
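
The first method's early-termination idea can be sketched as follows. The significance ordering used here (dimensions where the query deviates most from the data mean first) is an assumption of this sketch, not necessarily the paper's ordering; any ordering preserves exactness, because the partial squared distance can only grow as dimensions are added.

```python
import numpy as np

def exact_nn_partial(query, data):
    """Exact nearest neighbour by linear search with partial-distance
    early termination: abandon a candidate as soon as its partial squared
    distance exceeds the best full distance found so far."""
    # Heuristic 'significance' ordering of dimensions (an assumption).
    order = np.argsort(-np.abs(query - data.mean(axis=0)))
    best_idx, best_d = -1, np.inf
    for i, row in enumerate(data):
        partial = 0.0
        for d in order:
            partial += (query[d] - row[d]) ** 2
            if partial >= best_d:   # cannot beat the current best: stop
                break
        else:                        # loop completed: new best candidate
            best_idx, best_d = i, partial
    return best_idx, best_d
```

The result always matches brute-force search; only the amount of arithmetic per candidate changes.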

Relevance: 80.00%

Abstract:

In Australia, the building and construction industry is taking significant steps towards enhancing the environmental performance of the built environment. A large number of world-class sustainable buildings have been constructed in recent years, offering researchers and practitioners alike a good opportunity to identify the best practices and real-life experiences in delivering high-performance buildings. A case study of ONE ONE ONE Eagle Street, a 6 Star Green Star office building in Brisbane, was conducted to investigate the best practice in achieving this “world leader” green office building. The study identified a number of key factors relating to the project delivery system, contractor selection method, client’s early commitment, design integration, and communication as major contributors to the successful delivery of this project. Additionally, key environmentally sustainable features and their cost implications were explored through in-depth interviews with the main contractor. The findings of this study shed light on the successful delivery of sustainable buildings and provide practical implications for different stakeholders.

Relevance: 80.00%

Abstract:

Quantitative determination of the modification of primary sediment features by the activity of organisms (i.e., bioturbation) is essential in the geosciences. The methods proposed since the 1960s are mainly based on visual or subjective determinations. The first semiquantitative evaluations of the Bioturbation Index, Ichnofabric Index, or the amount of bioturbation were attempted, in the best cases, using a series of flashcards designed for different situations. More recently, more effective methods have involved analytical and computational techniques such as X-rays, magnetic resonance imaging or computed tomography; these methods are complex and often expensive. This paper presents a compilation of different methods for digital estimation, using Adobe® Photoshop® CS6 software, that form part of the IDIAP (Ichnological Digital Analysis Images Package), an inexpensive alternative to recently proposed methods that is easy to use and especially recommended for core samples. The different methods — the “Similar Pixel Selection Method (SPSM)”, the “Magic Wand Method (MWM)” and the “Color Range Selection Method (CRSM)” — entail advantages and disadvantages depending on the sediment (e.g., composition, color, texture, porosity, etc.) and the ichnological features (size of traces, infilling material, burrow wall, etc.). The IDIAP provides an estimation of the amount of trace fossils produced by a particular ichnotaxon, by a whole ichnocoenosis, or even for a complete ichnofabric. We recommend applying the complete IDIAP to a given case study and then selecting the most appropriate method. The IDIAP was applied to core material recovered during IODP Expedition 339, enabling us, for the first time, to arrive at a quantitative estimation of the discrete trace fossil assemblage in core samples.
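
The pixel-counting idea behind a colour-range selection can be approximated outside Photoshop with a few lines of numpy. This is a rough analogue of the CRSM, not the IDIAP itself; the target colour and tolerance are per-sample choices the analyst must make, just as in the paper.

```python
import numpy as np

def color_range_fraction(image, target, tol):
    """Fraction of pixels whose RGB channels all lie within `tol` of a
    target trace-fill colour: a simple estimate of bioturbated area."""
    diff = np.abs(image.astype(int) - np.asarray(target, dtype=int))
    mask = np.all(diff <= tol, axis=-1)   # pixel matches in every channel
    return float(mask.mean())             # matched pixels / total pixels
```

On a core-sample photograph, the returned fraction corresponds to the percentage of the imaged surface occupied by the selected ichnological features.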

Relevance: 80.00%

Abstract:

This paper introduces a new method to automate the detection of marine species in aerial imagery using a machine learning approach. Our proposed system has a convolutional neural network at its core. We compare this trainable classifier to a handcrafted classifier based on color features, entropy and shape analysis. Experiments demonstrate that the convolutional neural network outperforms the handcrafted solution. We also introduce a negative training example selection method for situations where the original training set consists of a collection of labeled images in which the objects of interest (positive examples) have been marked by a bounding box. We show that picking random rectangles from the background is not necessarily the best way to generate useful negative examples with respect to learning.

Relevance: 80.00%

Abstract:

In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario where no labeled samples from the target domain are provided, a popular approach consists in transforming the data such that the source and target distributions become similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that probability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this manifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both these manifold-based techniques outperform the corresponding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of-the-art results on a standard object recognition benchmark.
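
For reference, the MMD baseline that the paper argues against is straightforward to compute. A biased empirical estimate of the squared MMD with an RBF kernel (the bandwidth gamma is a free parameter, chosen here arbitrarily) is:

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased empirical estimate of squared Maximum Mean Discrepancy
    between samples X and Y under an RBF kernel."""
    def k(A, B):
        # Pairwise squared Euclidean distances, then the RBF kernel.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
```

The estimate is (near) zero when the two samples coincide and grows as the distributions separate, which is what makes it usable as a domain-discrepancy objective.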

Relevance: 80.00%

Abstract:

The Juvenile Wood Initiative (JWI) project has been running successfully since July 2003 under a Research Agreement with FWPA and Letters of Association with the consortium partners STBA (Southern Tree Breeding Association), ArborGen and FPQ (Forestry Plantations Queensland). Over the last five and a half years, JWI scientists in CSIRO, FPQ, and STBA have completed all 12 major milestones and 28 component milestones according to the project schedule. We have made benchmark progress in understanding the genetic control of wood formation and the interrelationships among wood traits. The project has made 15 primary scientific findings, and several results have been adopted by industry, as summarized below. This progress was detailed in 10 technical reports to funding organizations and industry clients. Team scientists produced 16 scientific manuscripts (8 published, 1 in press, 2 submitted, and several others in the process of submission) and 15 conference papers or presentations.

Primary Scientific Findings. The 15 major scientific findings related to wood science, inheritance and the genetic basis of juvenile wood traits are:

1. An optimal method to predict stiffness of standing trees in slash/Caribbean pine is to combine gravimetric basic density from 12 mm increment cores with a standing-tree prediction of MoE using a time-of-flight acoustic tool. This was the most accurate and cheapest way to rank trees for breeding selection for slash/Caribbean hybrid pine, and the method was also recommended for radiata pine.

2. Wood density breeding values were predicted for the first time in the STBA breeding population using a large sample of 7,078 trees (increment cores), and it was estimated that selection of the best 250 trees for deployment will produce wood density gains of 12.4%.

3. Large genetic variation was observed for a suite of wood quality traits, including density, MFA, spiral grain, shrinkage, and acoustic and non-acoustic stiffness (MoE), for clear wood and standing trees. Genetic gains of between 8 and 49% were predicted for these wood quality traits with selection intensities between 1 and 10% for radiata pine.

4. Site had a major effect on the juvenile-mature wood transition age, and the effect of selective breeding for a shorter juvenile wood formation phase was only moderate (about 10% genetic gain with 10% selection intensity, equivalent to about a 2-year reduction of juvenile wood).

5. The study found no usable site by genotype interactions for the wood quality traits of density, MFA and MoE for both radiata and slash/Caribbean pines, suggesting that assessment of wood properties on one or two sites will provide reliable estimates of the genetic worth of individuals for use in future breeding.

6. There were significant and sizable genotype by environment interactions between the mainland and Tasmanian regions, and within Tasmania, for DBH and branch size.

7. Strong genetic correlations between rings for density, MFA and MoE were observed for both radiata and slash/Caribbean pines. This suggests that selection for improved wood properties in the innermost rings would also result in improved wood properties in the subsequent rings, as well as improved average performance of the entire core.

8. Strong genetic correlations between pure-species and hybrid performance for each of the wood quality traits were observed in the hybrid pines. Parental performance can be used to identify the hybrid families most likely to have superior juvenile wood properties in the slash/Caribbean F1 hybrid in southeast Queensland.

9. Large unfavourable genetic correlations between growth and wood quality traits were a prominent feature in radiata pine, indicating that overcoming this unfavourable genetic correlation will be a major technical issue in progressing radiata pine breeding.

10. The project created the first radiata pine 18k cDNA microarray and generated 5,952 radiata pine xylogenesis expressed sequence tags (ESTs), which assembled into 3,304 unigenes.

11. A total of 348 genes were identified as preferentially expressed in earlywood or latewood, while a total of 168 genes were identified as preferentially expressed in either juvenile or mature wood.

12. Juvenile earlywood has a distinct transcriptome relative to other stages of wood development.

13. The project discovered rapid decay of linkage disequilibrium (LD) in radiata pine, with LD decaying to approximately 50% within 1,700 base pairs (within a typical gene). A total of 913 SNPs from sequencing 177,380 base pairs were identified for association genetic studies.

14. 149 SNPs from 44 genes and 255 SNPs from a further 51 genes (95 genes in total) were selected for association analysis with 62 wood traits, and 30 SNPs were shortlisted for their significant association with variation in wood quality traits (density, MFA and MoE), with individual significant SNPs accounting for between 1.9 and 9.7% of the total genetic variation in traits.

15. Index selection using breeding objectives was the most profitable selection method for radiata pine, but in the long term it may not be the most effective in dealing with negative genetic correlations between wood volume and quality traits. A combination of economic and biological approaches may be needed to deal with the strong adverse correlation.