979 results for Association rule mining, Redundant association ruled, Closed itemsets, Generator, Certainty factor
Abstract:
Publication suspended Aug. 1897-July 1899, inclusive
Abstract:
Frequent itemset mining is well explored for various data types, and its computational complexity is well understood. Methods exist that deal effectively with its computational problems. This paper presents another approach to further improving the performance of frequent itemset computation. A series of observations led us to devise data pre-processing methods such that the final step of the Partition algorithm, where a combination of all local candidate sets must be processed, is executed on substantially smaller input data. The paper reports results from several experiments that confirmed our general and formally presented observations.
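For orientation, the two phases of the Partition algorithm mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the function names, the brute-force local enumeration, and the equal-size split are assumptions made for brevity, and the pre-processing proposed by the authors is not shown.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(part, min_frac):
    """Return itemsets whose support inside one partition is >= min_frac.

    Brute-force enumeration (hypothetical helper); a real implementation
    would use a level-wise search instead.
    """
    counts = Counter()
    for t in part:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= min_frac * len(part)}


def partition_mine(transactions, min_frac, n_parts):
    """Two-phase Partition-style mining over a list of item sets."""
    if not transactions:
        return {}
    size = ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Phase 1: the union of locally frequent itemsets forms the global
    # candidate set; a globally frequent itemset must be locally frequent
    # in at least one partition.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_frac)

    # Phase 2 (the final step the abstract refers to): one pass over the
    # whole database to count every candidate and keep the frequent ones.
    n = len(transactions)
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c: k for c, k in counts.items() if k >= min_frac * n}
```

The cost of phase 2 grows with both the number of candidates and the size of the data scanned, which is why shrinking its input, as the abstract describes, can pay off.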
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured, and can take the form of text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that is large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is used mainly for predictive modelling. A number of classification algorithms are in practical use. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probabilistic methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).
Abstract:
Instantaneous outbursts in underground coal mines have occurred in at least 16 countries, involving both methane (CH4) and carbon dioxide (CO2). The precise mechanisms of an instantaneous outburst are still unresolved but must consider the effects of stress, gas content and physico-mechanical properties of the coal. Other factors such as mining methods (e.g., development heading into the coal seam) and geological features (e.g., coal seam disruptions from faulting) can combine to exacerbate the problem. Prediction techniques continue to be unreliable, and unexpected outburst incidents resulting in fatalities are a major concern for underground coal operations. Gas content thresholds of 9 m³/t for CH4 and 6 m³/t for CO2 are used in the Sydney Basin to indicate outburst-prone conditions, but are reviewed on an individual mine basis and in mixed gas situations. Data on the sorption behaviour of Bowen Basin coals from Australia have provided an explanation for the conflicting results obtained by coal face desorption indices used for outburst-proneness assessment. A key factor appears to be the different desorption rates displayed by banded coals, which is supported by both laboratory and mine-site investigations. Dull coal bands with high fusinite and semifusinite contents tend to display rapid desorption from solid coal, for a given pressure drop. The opposite is true for bright coal bands with high vitrinite contents and dull coal bands with high inertodetrinite contents. Consequently, when face samples of dull, fusinite- or semifusinite-rich coal of small particle size are taken for desorption testing, much gas has already escaped and low readings result. The converse applies for samples taken from coal bands with high vitrinite and/or inertodetrinite contents. In terms of outburst potential, it is the bright, vitrinite-rich and the dull, inertodetrinite-rich sections of a coal seam that appear to be more outburst-prone. This is due to the ability of the solid coal to retain gas, even after pressure reduction, creating a gas content gradient across the coal face sufficient to initiate an outburst. Once the particle size of the coal is reduced, rapid gas desorption can then take place. (C) 1998 Elsevier Science.
Abstract:
The principle of using induction rules based on spatial environmental data to model a soil map has previously been demonstrated. Whilst the general pattern of classes of large spatial extent and those with close association with geology were delineated, small classes and the detailed spatial pattern of the map were less well rendered. Here we examine several strategies to improve the quality of the soil map models generated by rule induction. Terrain attributes that are better suited to landscape description at a resolution of 250 m are introduced as predictors of soil type. A map sampling strategy is developed. Classification error is reduced by using boosting rather than cross validation to improve the model. Further, the benefit of incorporating the local spatial context for each environmental variable into the rule induction is examined. The best model was achieved by sampling in proportion to the spatial extent of the mapped classes, boosting the decision trees, and using spatial contextual information extracted from the environmental variables.
Abstract:
Background: Increasing age and cholesterol levels, male gender, and family history of early coronary heart disease (CHD) are associated with early onset of CHD in familial hypercholesterolemia (FH). Objective: Assess subclinical atherosclerosis by computed tomography coronary angiography (CTCA) and its association with clinical and laboratory parameters in asymptomatic FH subjects. Methods: 102 FH subjects (36% male, 45 ± 13 years, LDL-c 280 ± 54 mg/dL) and 35 controls (40% male, 46 ± 12 years, LDL-c 103 ± 18 mg/dL) underwent CTCA. Plaques were divided into calcified, mixed and non-calcified; luminal stenosis was characterized as >50% obstruction. Results: FH subjects had a greater atherosclerotic burden, represented by a higher number of patients with plaques (48% vs. 14%, p = 0.0005) and stenosis (19% vs. 3%, p = 0.015), more segments with plaques (2.05 ± 2.85 vs. 0.43 ± 1.33, p = 0.0016) and higher calcium scores (55 ± 129 vs. 38 ± 140, p = 0.0028). After multivariate analysis, determinants of plaque presence were increasing age (OR = 2.06, for age change of 10 years, CI95%: 1.38-3.07, p < 0.001) and total cholesterol (OR = 1.86, for cholesterol change by 1 standard deviation, CI95%: 1.09-3.15, p = 0.027). Coronary calcium score was associated with the presence of stenosis (OR = 1.54, CI95%: 1.27-1.86, p < 0.001, for doubling the calcium score). Male gender was directly associated with the presence of non-calcified plaques (OR = 15.45, CI95%: 1.72-138.23, p = 0.014) and inversely with calcified plaques (OR = 0.21, CI95%: 0.05-0.84, p = 0.027). Family history of early CHD was associated with the presence of mixed plaques (OR = 4.90, CI95%: 1.32-18.21, p = 0.018). Conclusions: Patients with FH had an increased burden of coronary atherosclerosis by CTCA. The burden of atherosclerosis and individual plaque subtypes differed with the presence of other associated risk factors, with age and cholesterol being most important. A coronary calcium score of zero ruled out obstructive disease in this higher risk population. (C) 2010 Elsevier Ireland Ltd. All rights reserved.
Abstract:
Systemic lupus erythematosus (SLE) is an autoimmune disorder of the connective tissue with a wide and heterogeneous spectrum of manifestations, with renal and neurological involvement usually related to worse prognosis. SLE more frequently affects females of reproductive age, and a high prevalence and renal manifestation seem to be associated with non-European ethnicity. The present study aims to investigate candidate loci for SLE predisposition and to evaluate the influence of ethnic ancestry on disease risk and on the clinical phenotypic heterogeneity of lupus at onset. Samples from 111 patients and 345 controls from the city of Belem, located in the Northern Region of Brazil, were investigated for polymorphisms in HLA-G, HLA-C, SLC11A1, MTHFR, CASP8 and 15 KIR genes, in addition to 89 Amerindian samples genotyped for SLC11A1. We also investigated 48 insertion/deletion ancestry markers to characterize individual African, European and Amerindian ancestry proportions in the samples. Predisposition to SLE was associated with the GTGT deletion at the SLC11A1 3'UTR, presence of the KIR2DS2+/KIR2DS5+/KIR3DS1+ profile, an increased number of stimulatory KIR genes, and European and Amerindian ancestries. The ancestry analysis ruled out ethnic differences between controls and patients as the source of the observed associations. Moreover, African ancestry was associated with renal manifestations. Lupus (2011) 20, 265-273.
Abstract:
Epidemiological studies suggest that ovarian cancer is an endocrine-related tumour, and progesterone exposure specifically may decrease the risk of ovarian cancer. To assess whether the progesterone receptor (PR) exon 4 valine to leucine amino acid variant is associated with specific tumour characteristics or with overall risk of ovarian cancer, we examined 551 cases of epithelial ovarian cancer and 298 unaffected controls for the underlying G→T nucleotide substitution polymorphism. Stratification of the ovarian cancer cases according to tumour behaviour (low malignant potential or invasive), histology, grade or stage failed to reveal any heterogeneity with respect to the genotype defined by the PR exon 4 polymorphism. Furthermore, the genotype distribution did not differ significantly between ovarian cancer cases and unaffected controls. Compared with the GG genotype, the age-adjusted odds ratio (95% confidence interval) for risk of ovarian cancer was 0.78 (0.57-1.08) for the GT genotype, and 1.39 (0.47-4.14) for the TT genotype. In conclusion, the PR exon 4 codon 660 leucine variant encoded by the T allele does not appear to be associated with ovarian tumour behaviour, histology, stage or grade. This variant is also not associated with an increased risk of ovarian cancer, and is unlikely to be associated with a large decrease in ovarian cancer risk, although we cannot rule out a moderate inverse association between the GT genotype and ovarian cancer.
Abstract:
Patterns of association of digenean families and their mollusc and vertebrate hosts are assessed by way of a new database containing information on over 1000 species of digeneans for life-cycles and over 5000 species from fishes. Analysis of the distribution of digenean families in molluscs suggests that the group was associated primitively with gastropods and that infections of polychaetes, bivalves and scaphopods are all the results of host-switching. For the vertebrates, infections of agnathans and chondrichthyans are apparently the result of host-switching from teleosts. For digenean families the ratio of orders of fishes infected to superfamilies of molluscs infected ranges from 0.5 (Mesometridae) to 16 (Bivesiculidae) and has a mean of 5.6. Individual patterns of host association of 13 digenean families and superfamilies are reviewed. Two, Bucephalidae and Sanguinicolidae, are exceptional in infecting a range of first intermediate hosts qualitatively as broad as their range of definitive hosts. No well-studied taxon shows narrower association with vertebrate than with mollusc clades. The range of definitive hosts of digeneans is characteristically defined by eco-physiological similarity rather than phylogenetic relationship. The range of associations of digenean families with mollusc taxa is generally much narrower. These data are considered in the light of ideas about the significance of different forms of host association. If Manter's Second Rule (the longer the association with a host group, the more pronounced the specificity exhibited by the parasite group) is invoked, then the data may suggest that the Digenea first parasitised molluscs before adopting vertebrate hosts. This interpretation is consistent with most previous ideas about the evolution of the Digenea but contrary to current interpretations based on the monophyly of the Neodermata. The basis of Manter's Second Rule is, however, considered too flimsy for this interpretation to be robust. Problems of the inference of the evolution of patterns of parasitism in the Neodermata are discussed and considered so intractable that the truth may be presently unknowable. (C) 2001 Australian Society for Parasitology Inc. Published by Elsevier Science Ltd. All rights reserved.
Abstract:
This study reexamined the association between speech rate and memory span in children from kindergarten to sixth grade (N = 152) in order to potentially account for the inconsistencies within the published literature on this topic. Some of the inconsistencies in past research may reflect the different methods adopted in assessing speech rate. In particular, repeating word triples may itself involve memory demands, contaminating the correlation between speech rate and memory span in younger children. Analyses using composite speech rate and memory span measures showed that speech rate for word triples shared variance with memory span that was independent of speech rate for single words. Moreover, speech rate for word triples was largely redundant with age in explaining additional variation in memory span once the effects of speech rate for single words were controlled. (C) 2002 Elsevier Science.
Abstract:
Almost all individuals (182) belonging to an Amazonian riverine population (Portuchuelo, RO, Brazil) were investigated to ascertain data on epidemiological aspects of malaria. Thirteen genetic blood polymorphisms were investigated (ABO, MNSs, Rh, Kell, and Duffy systems, haptoglobins, hemoglobins, and the enzymes glucose-6-phosphate dehydrogenase, glyoxalase, phosphoglucomutase, carbonic anhydrase, red cell acid phosphatase, and esterase D). The results indicated that the Duffy system is associated with susceptibility to malaria, as observed in other endemic areas. Moreover, the results also suggested that the EsD and Rh loci may be significantly associated with resistance to malaria. If statistical type II errors and sample stratification could be ruled out, hypotheses on the existence of a causal mechanism or an unknown closely linked locus involved in susceptibility to malaria infection may explain the present findings.
Abstract:
Background. Urotensin II (UII) is a potent vasoconstrictor peptide, which signals through a G-protein coupled receptor (GPCR) known as GPR14 or urotensin receptor (UTR). UII exerts a broad spectrum of actions in several systems such as vascular cells, heart muscle and pancreas, where it inhibits insulin release. Objective. Given the reported role of UII in insulin secretion, we have performed a genetic association analysis of the UTS2 gene and flanking regions with biochemical parameters related to insulin resistance (fasting glucose, glucose 2 hours after a glucose overload, fasting insulin and insulin resistance estimated as HOMA). Results and Conclusions. We have identified several polymorphisms associated with the analysed clinical traits, not only at the UTS2 gene, but also in the PER3 gene, located upstream from UTS2. Our results are compatible with a role for UII in glucose homeostasis and diabetes, although we cannot rule out the possibility that the PER3 gene may underlie the reported associations.