775 results for Data Mining, Rough Sets, Multi-Dimension, Association Rules, Constraint


Relevance: 100.00%

Abstract:

Ensemble stream modeling and data-cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate uncorrelated noise and to choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we assume the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing. There are two reasons for this: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the F-test is performed during each node's split to select the best attribute. The ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier.
The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
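
The F-measure mentioned in this abstract is conventionally the (weighted) harmonic mean of precision and recall. The sketch below computes it from confusion counts; the function name, the `beta` parameter, and the example counts are illustrative assumptions and do not come from the paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from confusion counts (hypothetical helper).

    tp: fire events correctly flagged, fp: false alarms, fn: missed events.
    beta=1 gives the plain harmonic mean of precision and recall.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(f_measure(tp=6, fp=2, fn=6))
```

A low score driven by low precision indicates many false alarms relative to detected events, which is presumably why the abstract ties the measure to the false alarm rate.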

Relevance: 100.00%

Abstract:

Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but only a few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without considering the correlation among them. To overcome this issue, we propose a new framework to cluster multivariate data streams based on their evolving behavior over time, exploring the correlations among their attributes by computing the fractal dimension. Experimental results with climate data streams show that the clusters' quality and compactness can be improved compared to the competing method, suggesting that attribute correlations cannot be ignored. In fact, the clusters' compactness is 7 to 25 times better using our method. Our framework also proves to be a useful tool to assist meteorologists in understanding climate behavior over a period of time.
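
The abstract does not say which fractal dimension estimator is used. As a minimal sketch, the code below assumes a box-counting estimate of the correlation fractal dimension: the slope of log(sum of squared cell occupancies) against log(cell side). The function name and grid levels are hypothetical. Strongly correlated attributes give a value well below the number of attributes.

```python
import math
from collections import Counter


def correlation_fractal_dimension(points, levels=(2, 4, 8, 16)):
    """Box-counting estimate of the correlation fractal dimension (assumed method).

    points: list of equal-length numeric tuples, one tuple per observation.
    levels: number of grid cells per axis at each resolution.
    """
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    # Avoid division by zero for constant attributes.
    span = [(hi[d] - lo[d]) or 1.0 for d in range(dims)]

    xs, ys = [], []
    for g in levels:
        # Count how many points fall in each grid cell at this resolution.
        cells = Counter(
            tuple(min(int((p[d] - lo[d]) / span[d] * g), g - 1) for d in range(dims))
            for p in points
        )
        xs.append(math.log(1.0 / g))  # log of the cell side
        ys.append(math.log(sum(c * c for c in cells.values())))

    # Least-squares slope of ys against xs.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Two perfectly correlated attributes lie on a line (dimension near 1).
    line = [(i / 1023, i / 1023) for i in range(1024)]
    # Two independent attributes fill the plane (dimension near 2).
    grid = [(i / 31, j / 31) for i in range(32) for j in range(32)]
    print(correlation_fractal_dimension(line))
    print(correlation_fractal_dimension(grid))
```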

Relevance: 100.00%

Abstract:

This is the fourth Association for Learning Technology (ALT) Annual Survey. As in previous years, the survey was advertised predominantly to ALT Members but at the same time promoted publicly, and responses were collected between December 2017 and January 2018. The ALT Annual Survey contains a common core of questions asked in all annual surveys. This year the survey was supplemented with additional questions specifically aimed at gaining feedback on the Certified Member of ALT (CMALT) framework and at identifying other priorities for 2018.

Relevance: 100.00%

Abstract:

This is the fifth Association for Learning Technology (ALT) Annual Survey. As in previous years, the survey was advertised predominantly to ALT Members but at the same time promoted publicly. The ALT Annual Survey contains a common core of questions asked in all annual surveys.

Relevance: 100.00%

Abstract:

Background: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Although communication has a central role in interdisciplinary collaboration and miscommunication can have a negative impact on research processes, to the best of our knowledge no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted a qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology, looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal.
Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

Relevance: 100.00%

Abstract:

We re-mapped the soils of the Murray-Darling Basin (MDB) in 1995-1998 with a minimum of new fieldwork, making the most out of existing data. We collated existing digital soil maps and used inductive spatial modelling to predict soil types from those maps combined with environmental predictor variables. Lithology, Landsat Multi Spectral Scanner (Landsat MSS), the 9-s digital elevation model (DEM) of Australia and derived terrain attributes, all gridded to 250-m pixels, were the predictor variables. Because the basin-wide datasets were very large, data mining software was used for modelling. Rule induction by data mining was also used to define the spatial domain of extrapolation for the extension of soil-landscape models from existing soil maps. Procedures to estimate the uncertainty associated with the predictions and quality of information for the new soil-landforms map of the MDB are described. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevance: 100.00%

Abstract:

This paper describes the construction of Australia-wide soil property predictions from a compiled national soils point database. The properties considered include pH, organic carbon, total phosphorus, total nitrogen, thickness, texture, and clay content. Many of these soil properties are used directly in environmental process modelling, including global climate change models. Models are constructed at the 250-m resolution using decision trees. These relate the soil property to the environment through a suite of environmental predictors at the locations where measurements are observed. These models are then used to extend predictions to the continental extent by applying the derived rules to the exhaustively available environmental predictors. The methodology and performance are described in detail for pH and summarized for other properties. Environmental variables are found to be important predictors, even at the 250-m resolution at which they are available here, as they can describe the broad changes in soil property.

Relevance: 100.00%

Abstract:

Geospatial clustering must be designed to take into account the special features of geoinformation and the peculiar nature of geographical environments in order to successfully derive geospatially interesting global concentrations and localized excesses. This paper examines families of geospatial clustering recently proposed in the data mining community and identifies several features and issues especially important to geospatial clustering in data-rich environments.

Relevance: 100.00%

Abstract:

Background & Aims: EPIC-3 is a prospective, international study that has demonstrated the efficacy of PEG-IFN alfa-2b plus weight-based ribavirin in patients with chronic hepatitis C and significant fibrosis who previously failed any interferon-alfa/ribavirin therapy. The aim of the present study was to assess FibroTest (FT), a validated non-invasive marker of fibrosis in treatment-naive patients, as a possible alternative to biopsy as the baseline predictor of subsequent early virologic response (EVR) and sustained virologic response (SVR) in previously treated patients. Methods: Of 2312 patients enrolled, 1459 had an available baseline FT, biopsy, and complete data. Uni- (UV) and multi-variable (MV) analyses were performed using FT and biopsy. Results: Baseline characteristics were similar to those in the overall population; METAVIR stage: 28% F2, 29% F3, and 43% F4, previous relapsers 29%, previous PEG-IFN regimen 41%, high baseline viral load (BVL) 64%. 506 patients (35%) had undetectable HCV-RNA at TW12 (TW12neg), with 58% achieving SVR. The accuracy of FT was similar to that in naive patients: AUROC curve for the diagnosis of F4 vs F2 = 0.80 (p<0.00001). Five baseline factors were associated (p<0.001) with SVR in UV and MV analyses (odds ratio: UV/MV): fibrosis stage estimated using FT (4.5/5.9) or biopsy (1.5/1.6), genotype 2/3 (4.5/5.1), BVL (1.5/1.3), prior relapse (1.6/1.6), previous treatment with non-PEG-IFN (2.6/2.0). These same factors were associated (p <= 0.001) with EVR. Among TW12neg patients, two independent factors remained highly predictive of SVR by MV analysis (p <= 0.001): genotype 2/3 (odds ratio = 2.9), fibrosis estimated with FT (4.3) or by biopsy (1.5). Conclusions: FibroTest at baseline is a possible non-invasive alternative to biopsy for the prediction of EVR at 12 weeks and SVR in patients with previous failures and advanced fibrosis who are retreated with PEG-IFN alfa-2b and ribavirin. (C) 2010 European Association for the Study of the Liver.
Published by Elsevier B.V. All rights reserved.

Relevance: 100.00%

Abstract:

Suicidal behaviours are one of the most important contributors to the global burden of disease among women, but little is known about prevalence and modifiable risk factors in low and middle income countries. We use data from the WHO multi-country study on women's health and domestic violence against women to examine the prevalence of suicidal thoughts and attempts, and relationships between suicide attempts and mental health status, child sexual abuse, partner violence and other variables. Population representative cross-sectional household surveys were conducted from 2000-2003 in 13 provincial (more rural) and city (urban) sites in Brazil, Ethiopia, Japan, Namibia, Peru, Samoa, Serbia, Thailand and Tanzania. A total of 20,967 women aged 15-49 years participated. Prevalence of lifetime suicide attempts, lifetime suicidal thoughts, and suicidal thoughts in the past four weeks was calculated, and multivariate logistic regression models were fit to examine factors associated with suicide attempts in each site. Prevalence of lifetime suicide attempts ranged from 0.8% (Tanzania) to 12.0% (Peru city); lifetime thoughts of suicide from 7.2% (Tanzania province) to 29.0% (Peru province); and thoughts in the past four weeks from 1.9% (Serbia) to 13.6% (Peru province). Between 25% and 50% of women with suicidal thoughts in the past four weeks had also visited a health worker in that time. The most consistent risk factors for suicide attempts after adjusting for probable common mental health disorders were: intimate partner violence, non-partner physical violence, ever being divorced, separated or widowed, childhood sexual abuse and having a mother who had experienced intimate partner violence. Mental health policies and services must recognise the consistent relationship between violence and suicidality in women in low and middle income countries.
Training health sector workers to recognize and respond to the consequences of violence may substantially reduce the health burden associated with suicidal behaviour. (C) 2011 Elsevier Ltd. All rights reserved.

Relevance: 100.00%

Abstract:

A mixture model incorporating long-term survivors has been adopted in the field of biostatistics where some individuals may never experience the failure event under study. The surviving fractions may be considered as cured. In most applications, the survival times are assumed to be independent. However, when the survival data are obtained from a multi-centre clinical trial, it is conceivable that the environmental conditions and facilities shared within a clinic affect the proportion cured as well as the failure risk for the uncured individuals. This necessitates a long-term survivor mixture model with random effects. In this paper, the long-term survivor mixture model is extended for the analysis of multivariate failure time data using the generalized linear mixed model (GLMM) approach. The proposed model is applied to analyse a numerical data set from a multi-centre clinical trial of carcinoma as an illustration. Some simulation experiments are performed to assess the applicability of the model based on the average biases of the estimates formed. Copyright (C) 2001 John Wiley & Sons, Ltd.
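
For orientation, the basic long-term survivor (cure) mixture model that this abstract builds on is commonly written as below. The symbols are illustrative assumptions rather than the paper's notation: $\pi$ is the cured proportion and $S_u(t)$ is the survival function of the uncured individuals. The random-effects extension described in the abstract is not shown.

```latex
S(t) = \pi + (1 - \pi)\, S_u(t), \qquad \lim_{t \to \infty} S(t) = \pi
```

The limit holds because $S_u(t) \to 0$ for a proper survival function, so the population survival curve levels off at the cured fraction.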

Relevance: 100.00%

Abstract:

This paper presents an integrated system that helps both retail companies and electricity consumers define the best retail contracts and tariffs. This integrated system is composed of a Decision Support System (DSS) based on a Consumer Characterization Framework (CCF). The CCF is based on data mining techniques, applied to obtain useful knowledge about electricity consumers from large amounts of consumption data. This knowledge is acquired following an innovative and systematic approach able to identify different consumer classes, each represented by a load profile, and to characterize them using decision trees. The framework generates inputs for the knowledge base and the database of the DSS. The rule sets derived from the decision trees are integrated in the knowledge base of the DSS. The load profiles, together with the information about contracts and electricity prices, form the database of the DSS. The DSS is able to classify different consumers, present their load profiles, and test different electricity tariffs and contracts. The final outputs of the DSS are a comparative economic analysis of different contracts and advice about the most economical contract for each consumer class. The presentation of the DSS is completed with an application example using a real database of consumers from the Portuguese distribution company.

Relevance: 100.00%

Abstract:

We present a generator for single top-quark production via flavour-changing neutral currents. The MEtop event generator allows for Next-to-Leading-Order direct top production pp -> t and Leading-Order production of several other single top processes. A few packages with definite sets of dimension-six operators are available. We discuss how to improve the bounds on the effective operators and how well new physics can be probed with each set of independent dimension-six operators.

Relevance: 100.00%

Abstract:

Project work carried out to obtain the degree of Master in Informatics and Computer Engineering

Relevance: 100.00%

Abstract:

OBJECTIVE: To evaluate the predictive value of genetic polymorphisms in the context of BCG immunotherapy outcome and create a predictive profile that may allow discriminating the risk of recurrence. MATERIAL AND METHODS: In a dataset of 204 patients treated with BCG, we evaluated 42 genetic polymorphisms in 38 genes involved in the BCG mechanism of action, using Sequenom MassARRAY technology. Stepwise multivariate Cox Regression was used for data mining. RESULTS: In agreement with previous studies, we observed that gender, age, tumor multiplicity and treatment scheme were associated with BCG failure. Using stepwise multivariate Cox Regression analysis, we propose the first predictive profile of BCG immunotherapy outcome and a risk score based on polymorphisms in immune system molecules (SNPs in TNFA-1031T/C (rs1799964), IL2RA rs2104286 T/C, IL17A-197G/A (rs2275913), IL17RA-809A/G (rs4819554), IL18R1 rs3771171 T/C, ICAM1 K469E (rs5498), FASL-844T/C (rs763110) and TRAILR1-397T/G (rs79037040)) in association with clinicopathological variables. This risk score allows the categorization of patients into risk groups: patients within the Low Risk group have a 90% chance of successful treatment, whereas patients in the High Risk group have a 75% chance of recurrence after BCG treatment. CONCLUSION: We have established the first predictive score of BCG immunotherapy outcome combining clinicopathological characteristics and a panel of genetic polymorphisms. Further studies using an independent cohort are warranted. Moreover, the inclusion of other biomarkers may help to improve the proposed model.