41 results for causal proximity

in Deakin Research Online - Australia


Software reuse is an important topic because of its potential to increase product quality and decrease cost. Although more and more people recognize that nontechnical as well as technical issues are important to the success of software reuse, it is still uncertain which factors directly affect that success. In this paper, we applied a causal discovery algorithm to the software reuse survey data [2]. An ensemble strategy is incorporated to locate a probable causal model structure for software reuse and to find all the factors that directly affect the success of reuse. Our discovery results reinforce some conclusions of Morisio et al. and yield new findings that might significantly improve the odds of a reuse project succeeding.


Efficiently inducing precise causal models that accurately reflect given data sets is the ultimate goal of causal discovery. The algorithms proposed by Dai et al. have demonstrated the ability of the Minimum Message Length (MML) principle to discover Linear Causal Models from training data. To further improve efficiency, this paper incorporates Hoeffding bounds into the learning process. At each step of causal discovery, if a small number of data items is enough to distinguish the better model from the rest, the computational cost is reduced by ignoring the remaining data items. Experiments with data sets from related benchmark models indicate that the new algorithm learns faster than previous work while preserving discovery accuracy.
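The abstract does not give the exact test used. As a rough illustration, the standard Hoeffding bound states that for n independent observations of a variable whose values lie in an interval of width R, the sample mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta. The Python sketch below assumes each candidate model can be scored per data item (for instance a per-item message-length cost, lower being better); the function names, the `diff_range` bound and the default delta are hypothetical choices, not taken from the paper. Checking the bound after every item is also a simplification, since repeated looks raise the overall error rate.

```python
import math
from typing import Optional, Sequence, Tuple


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability at least 1 - delta, the mean of n
    independent observations of a variable confined to an interval of width
    value_range lies within epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def choose_model(
    cost_a: Sequence[float],
    cost_b: Sequence[float],
    diff_range: float,
    delta: float = 1e-3,
) -> Tuple[Optional[str], int]:
    """Compare two candidate models item by item (lower cost is better).

    cost_a[i] and cost_b[i] are hypothetical per-item costs of the two models;
    diff_range is an assumed width of the interval containing every
    difference cost_a[i] - cost_b[i]. Returns the winner ("A" or "B") and the
    number of items inspected, or (None, len(cost_a)) if still undecided.
    """
    total = 0.0
    for n, (ca, cb) in enumerate(zip(cost_a, cost_b), start=1):
        total += ca - cb
        mean_diff = total / n
        # Stop early once the observed gap exceeds the Hoeffding epsilon.
        if abs(mean_diff) > hoeffding_epsilon(diff_range, delta, n):
            return ("B" if mean_diff > 0 else "A"), n
    return None, len(cost_a)
```

For example, `choose_model([2.0] * 100, [1.0] * 100, diff_range=2.0)` picks model B after inspecting 14 items instead of all 100.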


This paper presents an ensemble MML approach for the discovery of causal models. The component learners are based on the MML causal induction methods. Six different ensemble causal induction (CI) algorithms are proposed. Our experimental results reveal that (1) the ensemble MML causal induction approach achieves better learning accuracy and correctness than any single learner; (2) among all the ensemble CI algorithms examined, the weighted voting without seeding algorithm outperforms the rest; and (3) the ensemble CI algorithms appear to alleviate the local minimum problem. The only drawback of this method is that it increases the time complexity by a factor of δ, where δ is the ensemble size.
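The abstract does not define its weighting scheme. One simple reading of weighted voting, sketched below under that assumption, treats each component learner's output as a set of directed edges with a weight (for example, one derived from its message length) and keeps an edge when its weighted support exceeds a fraction of the total weight. The names `weighted_edge_vote` and `threshold` are illustrative only. Edge-wise voting can in principle produce a cycle, so a real combiner would need an extra acyclicity check.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def weighted_edge_vote(
    models: List[Tuple[Set[Edge], float]], threshold: float = 0.5
) -> Set[Edge]:
    """Combine component structures by weighted voting on directed edges.

    models holds (edge_set, weight) pairs, one per component learner.
    An edge is kept when its share of the total weight exceeds threshold.
    """
    total_weight = sum(weight for _, weight in models)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    support: Dict[Edge, float] = defaultdict(float)
    for edges, weight in models:
        for edge in edges:
            support[edge] += weight
    return {edge for edge, s in support.items() if s / total_weight > threshold}
```

With three component structures weighted 0.5, 0.3 and 0.2, an edge proposed only by the first needs more than half the total weight, so at the default threshold it is dropped, while an edge proposed by all three is kept.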


OBJECTIVE: To clarify relationships between body mass index (BMI) and self-esteem in young children at a population level. To assess whether low self-esteem precedes or follows development of overweight/obesity in children. DESIGN: Prospective cohort study in elementary schools throughout Victoria, Australia. Child BMI and self-esteem were measured in 1997 and 2000. SUBJECTS: Random sample of 1,157 children who were in the first 4 y of elementary school (aged 5-10 y) at baseline. MEASURES: BMI was calculated from measured height and weight, then transformed to z-scores. Children were classified as nonoverweight, overweight or obese based on international cut-points. Low child self-esteem was defined as a score below the 15th percentile on the self-esteem subscale of the parent-reported Child Health Questionnaire. RESULTS: Overweight/obese children had lower median self-esteem scores than nonoverweight children at both timepoints, especially at follow-up. After accounting for baseline self-esteem, higher baseline BMI z-score predicted poorer self-esteem at follow-up (P=0.008). After accounting for baseline BMI z-score, poorer baseline self-esteem did not predict higher BMI z-score at follow-up. While nonoverweight children with low baseline self-esteem were more likely to develop overweight/obesity (OR=2.1, 95% CI=1.2, 3.6), this accounted for only a small proportion of the incidence of overweight. CONCLUSIONS: Our data show an increasingly strong association between lower self-esteem and higher body mass across the elementary school years. Overweight/obesity precedes low self-esteem in many children, suggesting a causal relationship. This indicates that prevention and management strategies for childhood overweight/obesity need to begin early to minimise the impact on self-esteem.


Determining the causal structure of a domain is a key task in the area of Data Mining and Knowledge Discovery. The algorithm proposed by Wallace et al. [15] has demonstrated its strong ability to discover Linear Causal Models from given data sets. However, some experiments showed that this algorithm had difficulty discovering linear relations with small deviation, and that it occasionally gives a negative message length, which should not be possible. In this paper, a more efficient and precise MML encoding scheme is proposed to describe the model structure and the nodes in a Linear Causal Model. Estimators for the different parameters are also derived. Empirical results show that the new algorithm outperforms the previous MML-based algorithm in both speed and precision.


Discovering a precise causal structure that accurately reflects the given data is one of the most essential tasks in data mining and machine learning. One successful causal discovery approach is the information-theoretic approach based on the Minimum Message Length (MML) principle [19]. This paper presents an improved version of the MML discovery algorithm together with further experimental results. We introduce a new encoding scheme for measuring the cost of describing the causal structure. Stirling's approximation is also applied to further reduce the computational cost, so the algorithm runs more efficiently. The experimental results of the current version of the discovery system show that: (1) the current version can discover what the previous system discovered; (2) the current system can discover more complicated causal models with a large number of variables; (3) the new version has lower time complexity than the previous version.
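Stirling's approximation, ln n! ≈ n ln n - n + (1/2) ln(2πn), replaces an O(n) sum of logarithms with a constant-time formula. The sketch below applies it to a hypothetical structure-cost term, the number of bits needed to name one of the k! total orderings of k variables; whether the paper uses it for exactly this term is an assumption. `math.lgamma(n + 1)` gives the exact value for comparison.

```python
import math


def log_factorial_stirling(n: int) -> float:
    """Stirling's approximation to ln(n!) in nits:
    n ln n - n + 0.5 ln(2 pi n)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        return 0.0  # ln(0!) = ln(1!) = 0
    return n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)


def log_factorial_exact(n: int) -> float:
    """Exact ln(n!) via the log-gamma function, for comparison."""
    return math.lgamma(n + 1)


def ordering_cost_bits(k: int) -> float:
    """Hypothetical structure-cost term: bits needed to name one of the k!
    total orderings of k variables, using the Stirling approximation."""
    return log_factorial_stirling(k) / math.log(2.0)
```

The error of this form of the approximation lies between 1/(12n + 1) and 1/(12n) nits, so it is already below 0.01 nits for n ≥ 10 and shrinks as the number of variables grows.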


Efficiently inducing precise causal models that accurately reflect given data sets is the ultimate goal of causal discovery. The algorithm proposed by Wallace et al. [10] has demonstrated its ability to discover Linear Causal Models from data. To explore ways of improving efficiency, this research examines three different encoding schemes and four search strategies. The experimental results reveal that (1) the specifying-parents encoding method is the best of the three encoding methods examined; and (2) in the discovery of linear causal models, local hill climbing works very well compared with more sophisticated methods such as Markov Chain Monte Carlo (MCMC), Genetic Algorithms (GA) and parallel MCMC search.
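Local hill climbing over structures can be sketched as a greedy search that repeatedly applies the single edge addition, deletion or reversal that most lowers a cost, skips moves that would create a cycle, and stops when no move improves. This operator set is a common textbook choice and an assumption here; the paper's exact neighborhood is not stated. `score` is a hypothetical caller-supplied cost standing in for an MML-style message length.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List, Sequence, Tuple

Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def is_acyclic(nodes: Sequence[str], edges: Iterable[Edge]) -> bool:
    """Kahn's algorithm: the graph is a DAG iff every node can be removed
    in topological order."""
    indegree: Dict[str, int] = {v: 0 for v in nodes}
    children: Dict[str, List[str]] = {v: [] for v in nodes}
    for parent, child in edges:
        indegree[child] += 1
        children[parent].append(child)
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return removed == len(nodes)


def neighbors(nodes: Sequence[str], g: Graph) -> Iterator[Graph]:
    """All graphs one edge addition, deletion or reversal away from g."""
    for a, b in permutations(nodes, 2):
        if (a, b) in g:
            yield g - {(a, b)}               # delete a -> b
            yield (g - {(a, b)}) | {(b, a)}  # reverse to b -> a
        elif (b, a) not in g:
            yield g | {(a, b)}               # add a -> b


def hill_climb(
    nodes: Sequence[str],
    score: Callable[[Graph], float],
    start: Graph = frozenset(),
) -> Tuple[Graph, float]:
    """Greedy local search: take the best improving move until none is left.
    score is a caller-supplied cost to minimize (e.g. a message length)."""
    current, current_score = start, score(start)
    while True:
        best, best_score = None, current_score
        for cand in neighbors(nodes, current):
            if not is_acyclic(nodes, cand):
                continue
            s = score(cand)
            if s < best_score:
                best, best_score = cand, s
        if best is None:
            return current, current_score
        current, current_score = best, best_score
```

With a toy cost such as `lambda g: len(g ^ target)`, which counts edges that differ from a known target graph, the search recovers the target from the empty graph; a real run would plug in a data-dependent message length instead.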


Previous approaches to discovering sequential patterns have mainly focused on single-sequence data. In the real world, however, some sequential patterns are hidden in multi-sequential event data. Knowledge discovery guided by user-specified constraints, templates or skeletons has received wide attention because it is more efficient and avoids the tedious selection of useful patterns from mass-produced results. This paper presents a novel pattern in correlated multi-sequential event data, which we call a sequential causal pattern, together with an approach for mining it. A group of skeletons of sequential causal patterns, which may be specified by the user or generated by the program, are verified or mined by embedding them into the mining engine. Experiments applying this method to discover the occurrence regularities of a crop pest in a region show that it successfully mines sequential causal patterns with user-specified skeletons in multi-sequential event data.


This paper presents an examination report on the performance of the improved MML-based causal model discovery algorithm. We first describe our improvement to the causal discovery algorithm, which introduces a new encoding scheme for measuring the cost of describing the causal structure. Stirling's approximation is also applied to further reduce the computational cost, so the algorithm runs more efficiently. This is followed by a detailed report on the performance of our improved discovery algorithm. The experimental results of the current version of the discovery system show that: (1) the current version can discover what the previous system discovered; (2) the current system can discover more complicated causal networks with a large number of variables; (3) the new version has lower time complexity than the previous version.


One common drawback of algorithms for learning Linear Causal Models is that they cannot deal with incomplete data sets. This is unfortunate, since many real problems involve missing data or even hidden variables. In this paper, based on multiple imputation, we propose a three-step process to learn linear causal models from incomplete data sets. Experimental results indicate that this algorithm is better than the single imputation method (EM algorithm) and the simple listwise deletion method, and that at lower missing rates it can even find better models than the greedy learning algorithm MLGS applied to the complete data set. In addition, the method is amenable to parallel or distributed processing, which is an important characteristic for data mining in large data sets.
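The abstract does not list the three steps. A common reading of multiple imputation is: (1) create m completed data sets by stochastic imputation, (2) learn on each, and (3) combine the results. The sketch below follows that outline with two stand-ins that are assumptions, not the paper's choices: random hot-deck imputation (each missing value, marked `None`, is replaced by a randomly drawn observed value from the same column) and Rubin's rules for pooling a scalar estimate such as a single path coefficient. `fit` is a hypothetical learner returning an (estimate, variance) pair. Hot-deck draws ignore the other variables and so tend to weaken dependencies; a model-based imputer would suit causal discovery better.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence, Tuple

Row = List[Optional[float]]
Learner = Callable[[List[List[float]]], Tuple[float, float]]


def hot_deck_impute(data: Sequence[Row], rng: random.Random) -> List[List[float]]:
    """Replace each missing cell (None) with a randomly drawn observed value
    from the same column. Raises IndexError if a column has no observed values."""
    n_cols = len(data[0])
    observed = [[row[j] for row in data if row[j] is not None] for j in range(n_cols)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(row)]
        for row in data
    ]


def rubin_pool(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Rubin's rules: pooled estimate and total variance W + (1 + 1/m) B."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance, m >= 2
    return q_bar, within + (1.0 + 1.0 / m) * between


def multiple_imputation(
    data: Sequence[Row], fit: Learner, m: int = 5, seed: int = 0
) -> Tuple[float, float]:
    """Step 1: impute m times; step 2: fit each completed data set;
    step 3: pool the m (estimate, variance) pairs."""
    rng = random.Random(seed)
    results = [fit(hot_deck_impute(data, rng)) for _ in range(m)]
    estimates = [est for est, _ in results]
    variances = [var for _, var in results]
    return rubin_pool(estimates, variances)
```

The m fitting runs in step 2 are independent of one another, which is consistent with the abstract's remark that the method lends itself to parallel or distributed processing.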


One major difficulty in applying linear causal models is that they are not easily adapted to discrete data. This is unfortunate, since most real problems involve both continuous and discrete variables. In this paper, we consider a class of graphical models that allow both continuous and discrete variables, and propose a parameter estimation method together with a structure discovery algorithm based on Minimum Message Length and that parameter estimation. Experimental results demonstrate the potential of this method for practical application.


Determining the causal relations among attributes in a domain is a key task in data mining and knowledge discovery. In this paper, we applied a causal discovery algorithm to the business traveler expenditure survey data [1]. A general class of causal models is adopted to discover the causal relationships among continuous and discrete variables, so that all the factors that directly affect travelers' expense patterns can be detected. Our discovery results reinforce some conclusions of the rough set analysis and yield new findings that might significantly improve the understanding of business travelers' expenditure behavior.


Issue addressed: Walking for transport can contribute significantly to health-enhancing physical activity. We examined the prevalence and duration of walking to and from school, together with perceived influences on doing so, among parents of primary school children. Methods: Questionnaires were completed by parents from four primary schools (one government and three private) located in south-east Queensland (n=559; 40% response rate). Results: Eighteen per cent of parents reported walking for at least 10 minutes during journeys to school. Significantly greater proportions of parents with only one car in their household, with a child who attended a government school, with no driver’s licence, who had less than 11 years of education, and who lived within two kilometres of the school walked for at least 10 minutes during the school journey. The factors parents perceived as most strongly influencing walking to school were: being physically active; safety concerns for the child walking alone; not having to park; walking being the child’s preferred option; too much motor vehicle traffic; and their child’s age and level of road sense. Conclusions: Despite the overall low prevalence of walking to school by parents, health-enhancing benefits may be achieved even when other modes of transport are used in conjunction with walking.