960 results for data sets


Relevance: 60.00%

Abstract:

Longitudinal panel studies of large, random samples of business start-ups captured at the pre-operational stage allow researchers to address core issues for entrepreneurship research, namely, the processes of creation of new business ventures as well as their antecedents and outcomes. Here, we perform a methods-orientated review of all 83 journal articles that have used this type of data set, our purpose being to assist users of current data sets as well as designers of new projects in making the best use of this innovative research approach. Our review reveals a number of methods issues that are largely particular to this type of research. We conclude that amidst exemplary contributions, much of the reviewed research has not adequately managed these methods challenges, nor has it made use of the full potential of this new research approach. Specifically, we identify and suggest remedies for context-specific and interrelated methods challenges relating to sample definition, choice of level of analysis, operationalization and conceptualization, use of longitudinal data and dealing with various types of problematic heterogeneity. In addition, we note that future research can make further strides towards full utilization of the advantages of the research approach through better matching (from either direction) between theories and the phenomena captured in the data, and by addressing some under-explored research questions for which the approach may be particularly fruitful.

Relevance: 60.00%

Abstract:

Technology-mediated collaboration processes have been extensively studied for over a decade. Most applications with collaboration concepts reported in the literature focus on enhancing the efficiency and effectiveness of decision-making processes in objective and well-structured workflows. However, relatively few studies have investigated the application of collaboration schemes to problems of a subjective and unstructured nature. In this paper, we explore a new intelligent collaboration scheme for fashion design which, by nature, relies heavily on human judgment and creativity. Techniques such as multicriteria decision making, fuzzy logic, and artificial neural network (ANN) models are employed. Industrial data sets are used for the analysis. Our experimental results suggest that the proposed scheme exhibits significant improvement over the traditional method in terms of time–cost effectiveness, and a company interview with design professionals has confirmed its effectiveness and significance.

Relevance: 60.00%

Abstract:

A significant proportion of the cost of software development is due to software testing and maintenance. This is in part the result of the inevitable imperfections due to human error, lack of quality during the design and coding of software, and the increasing need to reduce faults to improve customer satisfaction in a competitive marketplace. Given the cost and importance of removing errors, improvements in fault detection and removal can be of significant benefit. The earlier in the development process faults can be found, the less it costs to correct them and the less likely other faults are to develop. This research aims to make the testing process more efficient and effective by identifying those software modules most likely to contain faults, allowing testing efforts to be carefully targeted. This is done with the use of machine learning algorithms which use examples of fault-prone and non-fault-prone modules to develop predictive models of quality. In order to learn the numerical mapping between module and classification, a module is represented in terms of software metrics. A difficulty in this sort of problem is sourcing software engineering data of adequate quality. In this work, data is obtained from two sources, the NASA Metrics Data Program and the open source Eclipse project. Feature selection is applied before learning, and a number of different feature selection methods are compared to find which work best. Two machine learning algorithms are applied to the data (Naive Bayes and the Support Vector Machine), and predictive results are compared to those of previous efforts and found to be superior on selected data sets and comparable on others. In addition, a new classification method is proposed, Rank Sum, in which a ranking abstraction is laid over bin densities for each class, and a classification is determined based on the sum of ranks over features.
A novel extension of this method is also described based on an observed polarising of points by class when rank sum is applied to training data to convert it into 2D rank sum space. SVM is applied to this transformed data to produce models the parameters of which can be set according to trade-off curves to obtain a particular performance trade-off.

Relevance: 60.00%

Abstract:

Optimum Wellness involves the development, refinement and practice of lifestyle choices which resonate with personally meaningful frames of reference. Personal transformations are the means by which our frames of reference are refined across the lifespan. It is through critical reflection, supportive relationships and meaning making of our experiences that we construct and reconstruct our life paths. When individuals are able to be what they are destined to be or reach their higher purpose, then they are able to contribute to the world in positive and meaningful ways. Transformative education facilitates the changes in perspective that enable one to contemplate and travel a path in life that leads to self-actualisation. This thesis argues for an integrated theoretical framework for optimum Wellness Education. It establishes a learner centred approach to Wellness education in the form of an integrated instructional design framework derived from both Wellness and Transformative education constructs. Students’ approaches to learning and their study strategies in a Wellness education context serve to highlight convergences in the manner in which students can experience perspective transformation. As they learn to critically reflect, pursue relationships and adapt their frames of reference to sustain their pursuit of both learning and Wellness goals, strengthening the nexus between instrumental and transformative learning is a strategically important goal for educators. The aim of this exploratory research study was to examine those facets that serve to optimise the learning experiences of students in a Wellness course. This was accomplished through three research issues: 1) What are the relationships between Wellness, approaches to learning and academic success? 2) How are students approaching learning in an undergraduate Wellness subject? Why are students approaching their learning in the ways they do? 
3) What sorts of transformations are students experiencing in their Wellness? How can transformative education be formulated in the context of an undergraduate Wellness subject? Subsequent to a thorough review of the literature pertaining to Wellness education, a mixed method embedded case study design was formulated to explore the research issues. This thesis examines the interrelationships between student, content and context in a one semester university undergraduate unit (a coherent set of learning activities which is assigned a unit code and a credit point value). The experiences of a cohort of 285 undergraduate students in a Wellness course formed the unit of study and seven individual students from a total of sixteen volunteers whose profiles could be constructed from complete data sets were selected for analysis as embedded cases. The introductory level course required participants to engage in a personal project involving a behaviour modification plan for a self-selected, single dimension of Wellness. Students were given access to the Standard Edition Testwell Survey to assess and report their Wellness as a part of their personal projects. To identify relationships among the constructs of Self-Regulated Learning (SRL), Wellness and Student Approaches to Learning (SAL) a blend of quantitative and qualitative methods to collect and analyse data was formulated. Surveys were the primary instruments for acquiring quantitative data. Sources included the Wellness data from Testwell surveys, SAL data from R-SPQ surveys, SRL data from MSLQ surveys and student self-evaluation data from an end of semester survey. Students’ final grades and GPA scores were used as indicators of academic performance. The sources of qualitative data included subject documentation, structured interview transcripts and open-ended responses to survey items. 
Subsequent to a pilot study in which survey reliability and validity were tested in context, amendments to processes for and instruments of data collection were made. Students who adopted meaning-oriented (deep/achieving) approaches tended to assess their Wellness at a higher level, seek effective learning strategies and perform better in formal study. Posttest data in the main study revealed that there were significant positive statistical relationships between academic performance and total wellness scores (rs=.297, n=205, p<.01). Deep (rs=.343, n=137, p<.01) and achieving (rs=.286, n=123, p<.01) approaches to learning also significantly correlated with Wellness whilst surface approaches had negative correlations that were not significant. SRL strategies including metacognitive self-regulation, effort, help-seeking and critical thinking were increasingly correlated with Wellness. Qualitative findings suggest that while all students adopt similar patterns of day to day activities (for example, attending classes, taking notes and working on assignments), the level of care with which these activities are undertaken varies considerably. The dominant motivational trigger for students in this cohort was the personal relevance and associated benefits of the material being learned and practiced. Students were inclined to set goals that had a positive impact on affect and used “sense of happiness” to evaluate their achievement status. Students who had a higher drive to succeed and/or understand tended to have or seek a wider range of strategies. Their goal orientations were generally learning rather than performance based and barriers presented a challenge which could be overcome as opposed to a blockage which prevented progress. Findings from an empirical analysis of the Testwell data suggest that a single third order Wellness construct exists. 
A revision of the instrument is necessary in order to juxtapose it with the chosen six dimensional Wellness model that forms the foundation construct in the course. Further, redevelopment should be sensitive to the Australian context and culture including choice of language, examples and scenarios used in item construction. This study concludes with a heuristic for use in Wellness education. Guided by principles of Transformative education theory and behaviour change theory, and informed by this representative case study, the “CARING” heuristic is proposed as an instructional design tool for Wellness educators seeking to foster transformative learning. Based upon this study, recommendations were made for university educators to provide authentic and personal experiences in Wellness curricula. Emphasis must focus on involving students and teachers in a partnership for implementing Wellness programs both in the curriculum and co-curricularly. The implications of this research for practice are predicated on the willingness of academics to embrace transformative learning at a personal level and a professional one. To explore students’ profiles in detail is not practical; however, teaching students how to guide us in supporting them through the “pain” of learning is a skill which would benefit them and optimise the learning and teaching process. At a theoretical level, this research contributes to an ecological theory of Wellness education as transformational change. By signposting the wider contexts in which learning takes place, it seeks to encourage changing paradigms to ones which harness the energy of each successive contextual layer in which students live. Future research which amplifies the qualities of individuals and groups who are “Well” and seeks the refinement and development of instruments to measure Wellness constructs would be desirable for both theoretical and applied knowledge bases. 
Mixed method Wellness research derived and conducted by teams that incorporate expertise from multiple disciplines such as psychology, anthropology, education, and medicine would enable creative and multi-perspective programs of investigation to be designed and implemented. Congruences and inconsistencies in health promotion and education would provide valuable material for strengthening the nexus between transformational learning and behaviour change theories. Future development of and research on the effectiveness of the CARING heuristic would be valuable in advancing the understanding of pedagogies which advance rather than impede learning as a transformative process. Exploring pedagogical models that marry with transformative education may render solutions to the vexing challenge of teaching and learning in diverse contexts.
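
The rs values quoted in the abstract above are, assuming the conventional reading of the symbol, Spearman rank correlation coefficients. As a minimal sketch of how such a coefficient is computed, the following Python ranks both variables (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names and the sample scores are hypothetical illustrations, not data or code from the thesis.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up wellness scores and grades for five hypothetical students.
wellness = [62, 75, 75, 81, 90]
grades = [4.0, 5.5, 5.0, 6.0, 6.5]
print(round(spearman(wellness, grades), 3))
```

Ranking first makes the statistic depend only on the ordering of the scores, which is why it suits survey totals whose spacing may not be meaningful.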

Relevance: 60.00%

Abstract:

In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scale up to very large data sets. Pyktree is highly modular and well suited for rapid prototyping of novel distance measures and centroid representations. It is easy to install and provides a Python package for library use as well as command line tools.

Relevance: 60.00%

Abstract:

The Australian e-Health Research Centre in collaboration with the Queensland University of Technology's Paediatric Spine Research Group is developing software for visualisation and manipulation of large three-dimensional (3D) medical image data sets. The software allows the extraction of anatomical data from individual patients for use in preoperative planning. State-of-the-art computer technology makes it possible to slice through the image dataset at any angle, or manipulate 3D representations of the data instantly. Although the software was initially developed to support planning for scoliosis surgery, it can be applied to any dataset whether obtained from computed tomography, magnetic resonance imaging or any other imaging modality.

Relevance: 60.00%

Abstract:

This paper presents a fault diagnosis method based on an adaptive neuro-fuzzy inference system (ANFIS) in combination with decision trees. Classification and regression tree (CART), which is one of the decision tree methods, is used as a feature selection procedure to select pertinent features from the data set. The crisp rules obtained from the decision tree are then converted to fuzzy if-then rules that are employed to identify the structure of the ANFIS classifier. A hybrid of the back-propagation and least-squares algorithms is used to tune the parameters of the membership functions. In order to evaluate the proposed algorithm, data sets obtained from vibration signals and current signals of induction motors are used. The results indicate that the CART–ANFIS model has potential for fault diagnosis of induction motors.

Relevance: 60.00%

Abstract:

Background: Phylogeographic reconstruction of some bacterial populations is hindered by low diversity coupled with high levels of lateral gene transfer. A comparison of recombination levels and diversity at seven housekeeping genes for eleven bacterial species, most of which are commonly cited as having high levels of lateral gene transfer, shows that the relative contribution of homologous recombination versus mutation for Burkholderia pseudomallei is over two times higher than for Streptococcus pneumoniae and is thus the highest value yet reported in bacteria. Despite the potential for homologous recombination to increase diversity, B. pseudomallei exhibits a relative lack of diversity at these loci. In these situations, whole genome genotyping of orthologous shared single nucleotide polymorphism loci, discovered using next generation sequencing technologies, can provide very large data sets capable of estimating core phylogenetic relationships. We compared and searched 43 whole genome sequences of B. pseudomallei and its closest relatives for single nucleotide polymorphisms in orthologous shared regions to use in phylogenetic reconstruction. Results: Bayesian phylogenetic analyses of >14,000 single nucleotide polymorphisms yielded completely resolved trees for these 43 strains with high levels of statistical support. These results enable a better understanding of a separate analysis of population differentiation among >1,700 B. pseudomallei isolates as defined by sequence data from seven housekeeping genes. We analyzed this larger data set for population structure and allele sharing that can be attributed to lateral gene transfer. Our results suggest that despite an almost panmictic population, we can detect two distinct populations of B. pseudomallei that conform to biogeographic patterns found in many plant and animal species: that is, separation along Wallace's Line, a biogeographic boundary between Southeast Asia and Australia. 
Conclusion: We describe an Australian origin for B. pseudomallei, characterized by a single introduction event into Southeast Asia during a recent glacial period, and variable levels of lateral gene transfer within populations. These patterns provide insights into mechanisms of genetic diversification in B. pseudomallei and its closest relatives, and provide a framework for integrating the traditionally separate fields of population genetics and phylogenetics for other bacterial species with high levels of lateral gene transfer.

Relevance: 60.00%

Abstract:

Background: The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes. Results: In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets with a large difference in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). Conclusions: The present method provides a new way of recovering the phylogeny of large dsDNA viruses and parvoviruses, and also some insights into the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses with a much smaller genome size.

Relevance: 60.00%

Abstract:

The Electrocardiogram (ECG) is an important bio-signal representing the sum total of millions of cardiac cell depolarization potentials. It contains important insight into the state of health and nature of the disease afflicting the heart. Heart rate variability (HRV) refers to the regulation of the sinoatrial node, the natural pacemaker of the heart by the sympathetic and parasympathetic branches of the autonomic nervous system. The HRV signal can be used as a base signal to observe the heart's functioning. These signals are non-linear and non-stationary in nature. So, higher order spectral (HOS) analysis, which is more suitable for non-linear systems and is robust to noise, was used. An automated intelligent system for the identification of cardiac health is very useful in healthcare technology. In this work, we have extracted seven features from the heart rate signals using HOS and fed them to a support vector machine (SVM) for classification. Our performance evaluation protocol uses 330 subjects consisting of five different kinds of cardiac disease conditions. We demonstrate a sensitivity of 90% for the classifier with a specificity of 87.93%. Our system is ready to run on larger data sets.
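
The sensitivity and specificity figures quoted above are standard confusion-matrix ratios. As a minimal sketch, assuming the five-condition problem is collapsed to a binary diseased (1) versus normal (0) labelling, they could be computed as follows. The function name and the label vectors are hypothetical and are not the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = diseased."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    sensitivity = tp / (tp + fn)  # share of diseased subjects detected
    specificity = tn / (tn + fp)  # share of normal subjects cleared
    return sensitivity, specificity


# Made-up labels for six hypothetical subjects.
truth = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)
```

Reporting both numbers matters because a classifier can reach high sensitivity simply by flagging every subject, at the cost of specificity.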

Relevance: 60.00%

Abstract:

Analysis of either footprints or footwear impressions which have been recovered from a crime scene is a well known and well accepted part of forensic investigation. When this evidence is obtained by investigating officers, comparative analysis to a suspect’s evidence may be undertaken. This can be done either by the detectives or, in some cases, by podiatrists with experience in forensic analysis. Frequently asked questions of a podiatrist include: “What additional information should be collected from a suspect (for the purposes of comparison), and how should it be collected?” This paper explores the answers to these and related questions based on 20 years of practical experience in the field of crime scene analysis as it relates to podiatry and forensics. Elements of normal and abnormal foot function are explored and used to explain the high degree of variability in wear patterns produced by the interaction of the foot and footwear. Based on this understanding, the potential for identifying unique features of the user and correlating this to footwear evidence becomes apparent. Standard protocols adopted by podiatrists allow for more precise, reliable, and valid results to be obtained from their analysis. Complex data sets are now being obtained by investigating officers and, in collaboration with the podiatrist, higher quality conclusions are being achieved. This presentation details the results of investigations which have used standard protocols to collect and analyse footwear and suspects of recent major crimes.

Relevance: 60.00%

Abstract:

Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.
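
To make the update rule concrete, here is a minimal sketch of batch exponentiated gradient descent on the probability simplex: each coordinate is multiplied by the exponential of its negative scaled gradient, and the vector is then renormalised. The toy quadratic objective, the step size and the function names are assumptions chosen for illustration. They stand in for the dual objectives described above and are not taken from the paper.

```python
import math


def eg_step(u, grad, eta):
    """One exponentiated gradient update; the result stays on the simplex."""
    unnormalized = [ui * math.exp(-eta * gi) for ui, gi in zip(u, grad)]
    total = sum(unnormalized)
    return [v / total for v in unnormalized]


def minimize_on_simplex(gradient, dim, eta=0.5, steps=500):
    """Run batch EG from the uniform distribution for a fixed number of steps."""
    u = [1.0 / dim] * dim
    for _ in range(steps):
        u = eg_step(u, gradient(u), eta)
    return u


# Toy convex objective (an assumption): Q(u) = 0.5 * ||u - target||^2,
# whose minimiser over the simplex is `target` itself.
target = [0.7, 0.2, 0.1]
solution = minimize_on_simplex(
    lambda u: [ui - ti for ui, ti in zip(u, target)], dim=3
)
print([round(s, 4) for s in solution])
```

Because the update is multiplicative and followed by normalisation, the iterate never leaves the simplex, so no separate projection step is needed.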

Relevance: 60.00%

Abstract:

Online learning algorithms have recently risen to prominence due to their strong theoretical guarantees and an increasing number of practical applications for large-scale data analysis problems. In this paper, we analyze a class of online learning algorithms based on fixed potentials and nonlinearized losses, which yields algorithms with implicit update rules. We show how to efficiently compute these updates, and we prove regret bounds for the algorithms. We apply our formulation to several special cases where our approach has benefits over existing online learning methods. In particular, we provide improved algorithms and bounds for the online metric learning problem, and show improved robustness for online linear prediction problems. Results over a variety of data sets demonstrate the advantages of our framework.
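
An implicit update chooses the next weight vector by minimising the distance to the current one plus the step size times the loss evaluated at the new point, rather than at a linearisation around the old point. As a minimal sketch, assume a squared Euclidean potential and a squared loss 0.5 * (w·x - y)^2, a special case picked here because the minimiser has a closed form. The function names are hypothetical, and the paper's general setting is not reproduced.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def implicit_update(w, x, y, eta):
    """Solve argmin_v 0.5*||v - w||^2 + eta*0.5*(v.x - y)^2 in closed form.

    Setting the gradient to zero gives v = w - alpha*x with
    alpha = eta * (w.x - y) / (1 + eta * ||x||^2).
    """
    residual = dot(w, x) - y
    alpha = eta * residual / (1.0 + eta * dot(x, x))
    return [wi - alpha * xi for wi, xi in zip(w, x)]


def explicit_update(w, x, y, eta):
    """Ordinary gradient step on the same loss, shown for comparison."""
    residual = dot(w, x) - y
    return [wi - eta * residual * xi for wi, xi in zip(w, x)]


w0, x0, y0 = [0.0, 0.0], [1.0, 2.0], 5.0
for eta in (0.1, 10.0):
    print(eta, implicit_update(w0, x0, y0, eta), explicit_update(w0, x0, y0, eta))
```

In this special case the implicit step shrinks the residual by the factor 1 / (1 + eta * ||x||^2), so it cannot overshoot the target however large the step size is. The explicit step with a large eta can overshoot badly, which gives one way to see the robustness benefit mentioned above.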

Relevance: 60.00%

Abstract:

This paper presents a method of spatial sampling based on stratification by Local Moran’s I_i calculated using auxiliary information. The sampling technique is compared to other design-based approaches including simple random sampling, systematic sampling on a regular grid, conditional Latin Hypercube sampling and stratified sampling based on auxiliary information, and is illustrated using two different spatial data sets. Each of the samples for the two data sets is interpolated using regression kriging to form a geostatistical map for their respective areas. The proposed technique is shown to be competitive in reproducing specific areas of interest with high accuracy.
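
As a minimal sketch of the statistic used for stratification, the following computes Local Moran's I for each location as I_i = (z_i / m2) * sum_j w_ij z_j, where z_i is the deviation from the mean and m2 is the mean squared deviation. It assumes row-standardised weights (each neighbour of i weighted 1/k_i) and division by n rather than n - 1, both of which are common conventions rather than details stated in the abstract. The neighbour lists and values are made up.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I per location, using row-standardised neighbour weights.

    values:    auxiliary variable, one number per location
    neighbors: neighbors[i] is the list of indices adjacent to location i
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # mean squared deviation
    result = []
    for i in range(n):
        nbrs = neighbors[i]
        # Spatial lag: average deviation among the neighbours of i.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] / m2 * lag)
    return result


# Four cells in a row: two low values next to two high values.
values = [1.0, 1.0, 5.0, 5.0]
neighbors = [[1], [0, 2], [1, 3], [2]]
print(local_morans_i(values, neighbors))
```

Locations with large positive values sit inside clusters of similar values, while values near zero or below mark boundaries and outliers. Strata could then be formed by binning these scores, which is a choice left open here.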

Relevance: 60.00%

Abstract:

In this paper, we propose a search-based approach to join two tables in the absence of clean join attributes. Non-structured documents from the web are used to express the correlations between a given query and a reference list. To implement this approach, a major challenge is how to efficiently determine how many times, and where, each clean reference from the reference list is approximately mentioned in the retrieved documents. We formalize the Approximate Membership Localization (AML) problem and propose an efficient partial pruning algorithm to solve it. A study using real-world data sets demonstrates the effectiveness of our search-based approach and the efficiency of our AML algorithm.