220 results for Survival data
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time variant, noisy, and high dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been put into practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbours (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
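As an illustration of the example-based methods mentioned above, a minimal k-nearest-neighbours classifier can be sketched as follows; the training points and labels are invented toy data, not code or data from any of the cited works:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    dists = sorted((math.dist(x, query), label) for x, label in train)
    top_labels = [label for _, label in dists[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Toy two-class data (illustrative values only)
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]
print(knn_predict(train, (1.1, 0.9)))  # -> A
print(knn_predict(train, (4.1, 4.1)))  # -> B
```

The method stores all training examples and defers computation to query time, which is why the survey literature groups it with "example-based" rather than model-based classifiers.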
Abstract:
There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet appeared in the literature. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. Feature selection techniques are first described to identify the attributes relevant to the occurrence of spikes. A brief introduction to the classification techniques is given for completeness. Two algorithms, the support vector machine and a probability classifier, are chosen as the spike occurrence predictors and are discussed in detail. Realistic market data are used to test the proposed model, with promising results.
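The abstract does not specify the probability classifier used; as an illustrative stand-in for that family of methods, a minimal Gaussian naive Bayes sketch for spike/no-spike classification is shown below. The features (demand level, reserve margin) and the training values are hypothetical:

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Fit per-class feature means/variances and log-priors (Gaussian naive Bayes)."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / n + 1e-9
                 for col, m in zip(zip(*rows), means)]
        model[c] = (math.log(n / len(X)), means, vars_)
    return model

def predict_gnb(model, x):
    """Return the class with the highest posterior log-probability."""
    def log_post(c):
        log_prior, means, vars_ = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, vars_))
    return max(model, key=log_post)

# Hypothetical features: (demand level, reserve margin); 1 = spike, 0 = no spike
X = [(0.9, 0.1), (0.95, 0.05), (0.5, 0.4), (0.4, 0.5), (0.6, 0.35)]
y = [1, 1, 0, 0, 0]
model = fit_gnb(X, y)
print(predict_gnb(model, (0.92, 0.08)))  # high demand, thin reserve -> 1
```

The "naive" independence assumption keeps the model cheap to fit, which matters when spike events are rare and retraining is frequent.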
Abstract:
Background: This study used household survey data on the prevalence of child, parent and family variables to establish potential targets for a population-level intervention to strengthen parenting skills in the community. The goals of the intervention include decreasing child conduct problems, increasing parental self-efficacy and use of positive parenting strategies, decreasing coercive parenting, and increasing help-seeking, social support and participation in positive parenting programmes. Methods: A total of 4010 parents with a child under the age of 12 years completed a statewide telephone survey on parenting. Results: One in three parents reported that their child had a behavioural or emotional problem in the previous 6 months. Furthermore, 9% of children aged 2–12 years met criteria for oppositional defiant disorder. Parents who reported their child's behaviour to be difficult were more likely to perceive parenting as a negative experience (i.e. demanding, stressful and depressing). The parents with the greatest difficulties were mothers without partners and those with low levels of confidence in their parenting roles. About 20% of parents reported being stressed and 5% reported being depressed in the 2 weeks prior to the survey. Parents with personal adjustment problems had lower levels of parenting confidence and found their child more difficult to manage. Only one in four parents had participated in a parent education programme. Conclusions: Implications for the setting of population-level goals and targets for strengthening parenting skills are discussed.
Abstract:
This paper reviews the ecological status of the mahogany glider and describes its distribution, habitat and abundance, life history and threats to it. Three serial surveys of Brisbane residents provide data on respondents' knowledge of the mahogany glider. The results provide information about the attitudes of respondents to the mahogany glider, to its conservation and to relevant public policies, and about variations in these factors as participants' knowledge of the mahogany glider changes. Similarly, data are provided and analysed on respondents' willingness to pay to conserve the mahogany glider. Population viability analysis is applied to estimate the habitat area required for a minimum viable population of the mahogany glider to ensure at least a 95% probability of its survival for 100 years. Places are identified in Queensland where the requisite minimum area of critical habitat can be conserved. Using the survey results as a basis, the likely willingness of groups of Australians to pay for the conservation of the mahogany glider is estimated, and consequently their willingness to pay for the minimum required area of its habitat. Methods for estimating the cost of protecting this habitat are outlined. Australia-wide benefits seem to exceed the costs. Establishing a national park containing the minimum viable population of the mahogany glider is an appealing management option. This would also be beneficial in conserving other endangered wildlife species. Therefore, additional economic benefits beyond those estimated on account of the mahogany glider itself can be obtained.
Abstract:
This paper discusses a multi-layer feedforward (MLF) neural network incident detection model that was developed and evaluated using field data. In contrast to published neural network incident detection models which relied on simulated or limited field data for model development and testing, the model described in this paper was trained and tested on a real-world data set of 100 incidents. The model uses speed, flow and occupancy data measured at dual stations, averaged across all lanes and only from time interval t. The off-line performance of the model is reported under both incident and non-incident conditions. The incident detection performance of the model is reported based on a validation-test data set of 40 incidents that were independent of the 60 incidents used for training. The false alarm rates of the model are evaluated based on non-incident data that were collected from a freeway section which was video-taped for a period of 33 days. A comparative evaluation between the neural network model and the incident detection model in operation on Melbourne's freeways is also presented. The results of the comparative performance evaluation clearly demonstrate the substantial improvement in incident detection performance obtained by the neural network model. The paper also presents additional results that demonstrate how improvements in model performance can be achieved using variable decision thresholds. Finally, the model's fault-tolerance under conditions of corrupt or missing data is investigated and the impact of loop detector failure/malfunction on the performance of the trained model is evaluated and discussed. The results presented in this paper provide a comprehensive evaluation of the developed model and confirm that neural network models can provide fast and reliable incident detection on freeways. (C) 1997 Elsevier Science Ltd. All rights reserved.
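The variable-decision-threshold evaluation described above can be sketched as a simple sweep over a detector's output scores, trading detection rate against false alarm rate; the scores and interval labels below are invented for illustration:

```python
def evaluate_thresholds(scores, labels, thresholds):
    """For each threshold, compute the detection rate (over incident intervals)
    and false alarm rate (over non-incident intervals) of a score-based detector."""
    results = {}
    for t in thresholds:
        alarms = [s >= t for s in scores]
        tp = sum(a and l for a, l in zip(alarms, labels))
        fp = sum(a and not l for a, l in zip(alarms, labels))
        pos = sum(labels)
        neg = len(labels) - pos
        results[t] = (tp / pos, fp / neg)
    return results

# Hypothetical network outputs: higher score = more incident-like
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = incident interval
for t, (dr, far) in sorted(evaluate_thresholds(scores, labels, [0.25, 0.5, 0.75]).items()):
    print(f"threshold {t:.2f}: detection rate {dr:.2f}, false alarm rate {far:.2f}")
```

Raising the threshold suppresses false alarms at the cost of missed incidents, which is the trade-off the operator tunes.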
Abstract:
Multifrequency bioimpedance analysis has the potential to provide a non-invasive technique for determining body composition in live cattle. A bioimpedance meter developed for use in clinical medicine was adapted and evaluated in 2 experiments using a total of 31 cattle. Prediction equations were obtained for total body water, extracellular body water, intracellular body water, carcass water and carcass protein. There were strong correlations between the results obtained through chemical markers and bioimpedance analysis when determined in cattle that had a wide range of liveweights and conditions. The r² values obtained were 0.87 and 0.91 for total body water and extracellular body water respectively. Bioimpedance also correlated with carcass water, measured by chemical analysis (r² = 0.72), but less well with carcass protein (r² = 0.46). These correlations were improved by the inclusion of liveweight and sex as variables in multiple regression analysis. However, the resultant equations were poor predictors of protein and water content in the carcasses of a group of small underfed beef cattle that had a narrow range of liveweights. In this case, although there was no statistical difference between the predicted and measured values overall, bioimpedance analysis did not detect the differences in carcass protein between the 2 groups that were apparent following chemical analysis. Further work is required to determine the sensitivity of the technique in small underfed cattle, and its potential use in heavier well-fed cattle close to slaughter weight.
Abstract:
Multi-frequency bioimpedance analysis (MFBIA) was used to determine the impedance, reactance and resistance of 103 lamb carcasses (17.1-34.2 kg) immediately after slaughter and evisceration. Carcasses were halved, frozen and one half subsequently homogenized and analysed for water, crude protein and fat content. Three measures of carcass length were obtained. Diagonal length between the electrodes (right side biceps femoris to left side of neck) explained a greater proportion of the variance in water mass than did estimates of spinal length, and was selected for use in the index L²/Z to predict the mass of chemical components in the carcass. Use of impedance (Z) measured at the characteristic frequency (Z_c) instead of at 50 kHz (Z_50) did not improve the power of the model to predict the mass of water, protein or fat in the carcass. While L²/Z_50 explained a significant proportion of the variation in the masses of body water (r² = 0.64), protein (r² = 0.34) and fat (r² = 0.35), its inclusion in multivariate indices offered little or no increase in predictive capacity when hot carcass weight (HCW) and a measure of rib fat-depth (GR) were present in the model. Optimized equations were able to account for 65-90% of the variance observed in the weight of chemical components in the carcass. It is concluded that single-frequency impedance data do not provide better prediction of carcass composition than can be obtained from measures of HCW and GR. Indices of intracellular water mass derived from impedance at zero frequency and the characteristic frequency explained a similar proportion of the variance in carcass protein mass as did the index L²/Z_50.
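Prediction equations of the kind described above are built by regressing a chemical component on the impedance index; a minimal ordinary-least-squares sketch with hypothetical index and water-mass values (the r² figures reported above come from the study's own data, not these):

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x (single predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

def r_squared(x, y, a, b):
    """Proportion of variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: impedance index L²/Z_50 (cm²/Ω) vs carcass water mass (kg)
index = [30.0, 35.0, 40.0, 45.0, 50.0]
water = [10.1, 11.8, 13.2, 15.1, 16.4]
a, b = fit_line(index, water)
print(round(r_squared(index, water, a, b), 3))
```

Adding further predictors such as HCW and GR turns this into the multivariate regression the abstract compares against.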
Abstract:
OBJECTIVE- To assess the relationship between clinical course after acute myocardial infarction (AMI) and diabetes treatment. RESEARCH DESIGN AND METHODS- Retrospective analysis of data from all patients aged 25-64 years admitted to hospitals in Perth, Australia, between 1985 and 1993 with AMI diagnosed according to the International Classification of Diseases (9th revision) criteria was conducted. Short- (28-day) and long-term survival and complications in diabetic and nondiabetic patients were compared. For diabetic patients, 28-day survival, dysrhythmias, heart block, and pulmonary edema were treated as outcomes, and factors related to each were assessed using multiple logistic regression. Diabetes treatment was added to the model to assess its significance. Long-term survival was compared by means of a Cox proportional hazards model. RESULTS- Of 5,715 patients, 745 (12.9%) were diabetic. Mortality at 28 days was 12.0 and 28.1% for nondiabetic and diabetic patients, respectively (P < 0.001); there were no significant drug effects in the diabetic group. Ventricular fibrillation in diabetic patients taking glibenclamide (11.8%) was similar to that of nondiabetic patients (11.0%) but was lower than that for those patients taking either gliclazide (18.0%; 0.1 > P > 0.05) or insulin (22.8%; P < 0.05). There were no other treatment-related differences in acute complications. Long-term survival in diabetic patients was reduced in those taking digitalis and/or diuretics but type of diabetes treatment at discharge had no significant association with outcome. CONCLUSIONS- These results do not suggest that ischemic heart disease should influence the choice of diabetes treatment regimen in general or of sulfonylurea drug in particular.
Abstract:
The World Health Organization (WHO) MONICA Project is a 10-year study monitoring trends and determinants of cardiovascular disease in geographically defined populations. Data were collected from over 100 000 randomly selected participants in two risk factor surveys conducted approximately 5 years apart in 38 populations using standardized protocols. The net effects of changes in the risk factor levels were estimated using risk scores derived from longitudinal studies in the Nordic countries. The prevalence of cigarette smoking decreased among men in most populations, but the trends for women varied. The prevalence of hypertension declined in two-thirds of the populations. Changes in the prevalence of raised total cholesterol were small but highly correlated between the genders (r = 0.8). The prevalence of obesity increased in three-quarters of the populations for men and in more than half of the populations for women. In almost half of the populations there were statistically significant declines in the estimated coronary risk for both men and women, although for Beijing the risk score increased significantly for both genders. The net effect of the changes in the risk factor levels in the 1980s in most of the study populations of the WHO MONICA Project is that the rates of coronary disease are predicted to decline in the 1990s.
Abstract:
The performance of three analytical methods for multiple-frequency bioelectrical impedance analysis (MFBIA) data was assessed. The methods were the established method of Cole and Cole, the newly proposed method of Siconolfi and co-workers, and a modification of this procedure. Method performance was assessed from the adequacy of the curve-fitting techniques, as judged by the correlation coefficient and standard error of the estimate (SEE), and from the accuracy of the different methods in determining the theoretical values of impedance parameters describing a set of model electrical circuits. The experimental data were well fitted by all curve-fitting procedures (r = 0.9 with SEE 0.3 to 3.5%, or better, for most circuit-procedure combinations). Cole-Cole modelling provided the most accurate estimates of circuit impedance values, generally within 1-2% of the theoretical values, followed by the Siconolfi procedure using a sixth-order polynomial regression (1-6% variation). None of the methods, however, accurately estimated circuit parameters when the measured impedances were low (<20 Ω), reflecting the electronic limits of the impedance meter used. These data suggest that Cole-Cole modelling remains the preferred method for the analysis of MFBIA data.
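Cole-Cole modelling interpolates tissue impedance between the zero-frequency resistance R0 and the infinite-frequency resistance Rinf. A minimal sketch evaluating the Cole impedance model at a few frequencies is shown below; the circuit values (R0 = 700 Ω, Rinf = 400 Ω, fc = 50 kHz) are assumed for illustration:

```python
def cole_impedance(f, r0, rinf, fc, alpha=1.0):
    """Cole model impedance at frequency f (Hz): Z tends to r0 as f -> 0
    and to rinf as f -> infinity, with dispersion exponent alpha."""
    return rinf + (r0 - rinf) / (1 + (1j * f / fc) ** alpha)

# Assumed circuit values: R0 = 700 Ω, Rinf = 400 Ω, fc = 50 kHz
for f in (0.0, 50e3, 5e6):
    z = cole_impedance(f, 700.0, 400.0, 50e3)
    print(f"{f:>9.0f} Hz: |Z| = {abs(z):6.1f} Ω")
```

Fitting r0, rinf, fc and alpha to measured spectra is what the curve-fitting comparison in the abstract evaluates.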
Abstract:
The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data should be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases with the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. We show in this paper that near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
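The partition-then-filter idea can be sketched as a grid-based join on minimum bounding rectangles (MBRs): multi-assignment places each object in every cell it overlaps, each cell becomes an independent task a worker could process, and the duplicate candidate pairs that multi-assignment produces are removed afterwards. The objects and cell size below are illustrative, not from the paper:

```python
def mbr_intersects(a, b):
    """Filter step: do two minimum bounding rectangles (x1, y1, x2, y2) overlap?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def grid_cells(mbr, cell):
    """All grid cells an MBR overlaps (multi-assignment causes duplication)."""
    x1, y1, x2, y2 = mbr
    return {(cx, cy)
            for cx in range(int(x1 // cell), int(x2 // cell) + 1)
            for cy in range(int(y1 // cell), int(y2 // cell) + 1)}

def partitioned_join(r, s, cell=10.0):
    """Cell-wise MBR join; each cell is an independent task for a worker.
    Collecting results in a set removes duplicate pairs from multi-assignment."""
    buckets = {}
    for name, mbr in r:
        for c in grid_cells(mbr, cell):
            buckets.setdefault(c, ([], []))[0].append((name, mbr))
    for name, mbr in s:
        for c in grid_cells(mbr, cell):
            buckets.setdefault(c, ([], []))[1].append((name, mbr))
    out = set()
    for rs, ss in buckets.values():
        for rn, rm in rs:
            for sn, sm in ss:
                if mbr_intersects(rm, sm):
                    out.add((rn, sn))  # a refine step on exact geometry would go here
    return out

r = [("r1", (0, 0, 12, 5)), ("r2", (30, 30, 35, 35))]
s = [("s1", (11, 4, 20, 9)), ("s2", (50, 50, 60, 60))]
print(partitioned_join(r, s))  # -> {('r1', 's1')}
```

Choosing cells that preserve spatial locality keeps most candidate pairs inside one task, which is the point the abstract makes about task decomposition.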
Abstract:
Physiological and kinematic data were collected from elite under-19 rugby union players to provide a greater understanding of the physical demands of rugby union. Heart rate, blood lactate and time-motion analysis data were collected from 24 players (mean ± s(x̄): body mass 88.7 ± 9.9 kg, height 185 ± 7 cm, age 18.4 ± 0.5 years) during six competitive premiership fixtures. Six players were chosen at random from each of four groups: props and locks, back row forwards, inside backs, outside backs. Heart rate records were classified based on the percentage of time spent in four zones (>95%, 85-95%, 75-84%, <75% HRmax). Blood lactate concentration was measured periodically throughout each match, with movements being classified as standing, walking, jogging, cruising, sprinting, utility, rucking/mauling and scrummaging. The heart rate data indicated that props and locks (58.4%) and back row forwards (56.2%) spent significantly more time in high exertion (85-95% HRmax) than inside backs (40.5%) and outside backs (33.9%) (P < 0.001). Inside backs (36.5%) and outside backs (38.5%) spent significantly more time in moderate exertion (75-84% HRmax) than props and locks (22.6%) and back row forwards (19.8%) (P < 0.05). Outside backs (20.1%) spent significantly more time in low exertion (<75% HRmax) than props and locks (5.8%) and back row forwards (5.6%) (P < 0.05). Mean blood lactate concentration did not differ significantly between groups (range: 4.67 mmol·l⁻¹ for outside backs to 7.22 mmol·l⁻¹ for back row forwards; P < 0.05). The motion analysis data indicated that outside backs (5750 m) covered a significantly greater total distance than either props and locks or back row forwards (4400 and 4080 m, respectively; P < 0.05).
Inside backs and outside backs covered significantly greater distances walking (1740 and 1780 m, respectively; P < 0.001), in utility movements (417 and 475 m, respectively; P < 0.001) and sprinting (208 and 340 m, respectively; P < 0.001) than either props and locks or back row forwards (walking: 1000 and 991 m; utility movements: 106 and 154 m; sprinting: 72 and 94 m, respectively). Outside backs covered a significantly greater distance sprinting than inside backs (208 and 340 m, respectively; P < 0.001). Forwards maintained a higher level of exertion than backs, due to more constant motion and a large involvement in static high-intensity activities. A mean blood lactate concentration of 4.8-7.2 mmol·l⁻¹ indicated a need for 'lactate tolerance' training to improve hydrogen ion buffering and facilitate removal following high-intensity efforts. Furthermore, the large distances (4.2-5.6 km) covered during, and intermittent nature of, match-play indicated a need for sound aerobic conditioning in all groups (particularly backs) to minimize fatigue and facilitate recovery between high-intensity efforts.
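The four-zone heart rate classification used in the study can be sketched as a simple binning of samples by percent of HRmax; the sample values and HRmax below are hypothetical:

```python
def hr_zone(hr, hr_max):
    """Assign a heart rate sample to one of the four exertion zones."""
    pct = 100.0 * hr / hr_max
    if pct > 95:
        return ">95% HRmax"
    if pct >= 85:
        return "85-95% HRmax (high exertion)"
    if pct >= 75:
        return "75-84% HRmax (moderate exertion)"
    return "<75% HRmax (low exertion)"

def time_in_zones(samples, hr_max):
    """Percent of samples (equal-interval heart rate records) in each zone."""
    counts = {}
    for hr in samples:
        z = hr_zone(hr, hr_max)
        counts[z] = counts.get(z, 0) + 1
    return {z: 100.0 * n / len(samples) for z, n in counts.items()}

# Hypothetical match record for one player, HRmax = 200 beats/min
samples = [198, 190, 185, 172, 160, 150, 182, 176]
print(time_in_zones(samples, 200))
```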
Abstract:
Background and Purpose-Few community-based studies have examined the long-term risk of recurrent stroke after an acute first-ever stroke. This study aimed to determine the absolute and relative risks of a first recurrent stroke over the first 5 years after a first-ever stroke and the predictors of such recurrence in a population-based series of people with first-ever stroke in Perth, Western Australia. Methods-Between February 1989 and August 1990, all people with a suspected acute stroke or transient ischemic attack of the brain who were resident in a geographically defined region of Perth, Western Australia, with a population of 138 708 people, were registered prospectively and assessed according to standardized diagnostic criteria. Patients were followed up prospectively at 4 months, 12 months, and 5 years after the index event. Results-Three hundred seventy patients with a first-ever stroke were registered, of whom 351 survived >2 days. Data were available for 98% of the cohort at 5 years, by which time 199 patients (58%) had died and 52 (15%) had experienced a recurrent stroke, 12 (23%) of which were fatal within 28 days. The 5-year cumulative risk of first recurrent stroke was 22.5% (95% confidence limits [CL], 16.8%, 28.1%). The risk of recurrent stroke was greatest in the first 6 months after stroke, at 8.8% (95% CL, 5.4%, 12.1%). After adjustment for age and sex, the prognostic factors for recurrent stroke were advanced, but not extreme, age (75 to 84 years) (hazard ratio [HR], 2.6; 95% CL, 1.1, 6.2), hemorrhagic index stroke (HR, 2.1; 95% CL, 0.98, 4.4), and diabetes mellitus (HR, 2.1; 95% CL, 0.95, 4.4). Conclusions-Approximately 1 in 6 survivors (15%) of a first-ever stroke experience a recurrent stroke over the next 5 years, of which 25% are fatal within 28 days. The pathological subtype of the recurrent stroke is the same as that of the index stroke in 88% of cases. 
The predictors of first recurrent stroke in this study were advanced age, hemorrhagic index stroke, and diabetes mellitus, but numbers of recurrent events were modest. Because the risk of recurrent stroke is highest (8.8%) in the first 6 months after stroke, strategies for secondary prevention should be initiated as soon as possible after the index event.
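A cumulative risk over follow-up, of the kind reported above, is typically estimated with a Kaplan-Meier product-limit estimator so that subjects lost to follow-up are censored rather than dropped. A minimal sketch on invented follow-up data (tie handling is simplified):

```python
def cumulative_risk(times, events, horizon):
    """Kaplan-Meier cumulative incidence 1 - S(t) at `horizon`.

    `times` are follow-up times; `events` flags whether the event of interest
    was observed (1) or the subject was censored (0) at that time."""
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    surv = 1.0
    for t, e in order:
        if t > horizon:
            break
        if e:
            surv *= 1 - 1 / n_at_risk
        n_at_risk -= 1
    return 1 - surv

# Hypothetical follow-up (years): recurrences at 0.5, 2 and 4; others censored
times = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0]
events = [1, 0, 1, 0, 1, 0, 0, 0]
print(round(cumulative_risk(times, events, 5.0), 3))  # -> 0.453
```

Because censored subjects leave the risk set without triggering a factor, the cumulative estimate exceeds the crude proportion of observed events, which is why the abstract's 5-year cumulative risk (22.5%) is higher than the raw 15% of survivors with an observed recurrence.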