955 results for Random Subspace Method


Relevance:

100.00%

Publisher:

Abstract:

Many learning problems require handling high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the large number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. There is thus a clear need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium- and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
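
The abstract does not say which discretization or selection criteria the authors use. As a rough illustration of the general idea only, the sketch below assumes equal-width binning followed by a variance-based ranking of the discretized features; both choices, and all function names, are assumptions made for this example.

```python
from statistics import pvariance

def discretize_equal_width(column, n_bins=4):
    """Map each value of a feature column to an equal-width bin index in [0, n_bins)."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant feature: everything falls in bin 0
        return [0] * len(column)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in column]

def select_by_variance(columns, k):
    """Return the indices of the k columns whose (discretized) values vary the most."""
    ranked = sorted(range(len(columns)), key=lambda j: pvariance(columns[j]), reverse=True)
    return sorted(ranked[:k])

# Toy data: 5 instances, 3 features (feature 1 is constant, hence uninformative)
X = [
    [1.0, 5.0, 10.0],
    [9.0, 5.0, 20.0],
    [5.0, 5.0, 30.0],
    [2.0, 5.0, 40.0],
    [8.0, 5.0, 50.0],
]
columns = [list(col) for col in zip(*X)]
binned = [discretize_equal_width(col, n_bins=4) for col in columns]
kept = select_by_variance(binned, k=2)
print(binned, kept)
```

Neither step uses class labels, which is what makes the pipeline unsupervised; storing small bin indices instead of floating-point values is also where a memory saving would come from.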

Relevance:

90.00%

Publisher:

Abstract:

An efficient Lanczos subspace method has been devised for calculating state-to-state reaction probabilities. The method recasts the time-independent wave packet Lippmann-Schwinger equation [Kouri et al., Chem. Phys. Lett. 203, 166 (1993)] inside a tridiagonal (Lanczos) representation in which the action of the causal Green's operator is effected easily with a QR algorithm. The method is designed to yield all state-to-state reaction probabilities from a given reactant-channel wave packet using a single Lanczos subspace; the spectral properties of the tridiagonal Hamiltonian allow calculations to be undertaken at arbitrary energies within the spectral range of the initial wave packet. The method is applied to the H + O2 system (J = 0), and the results indicate that the approach is accurate and stable. © 2002 American Institute of Physics.
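
To make the "tridiagonal (Lanczos) representation" concrete, here is a minimal sketch of the textbook Lanczos recursion for a small real symmetric matrix. It is not the paper's wave-packet formulation: the matrix, the starting vector and the full reorthogonalization step are assumptions chosen for illustration.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, v0, m):
    """Run up to m Lanczos steps on a symmetric matrix A.

    Returns (alphas, betas, basis): alphas is the diagonal and betas the
    off-diagonal of the tridiagonal matrix T that represents A in the
    Krylov subspace spanned by the orthonormal vectors in `basis`.
    """
    norm = math.sqrt(dot(v0, v0))
    q = [x / norm for x in v0]
    basis, alphas, betas = [q], [], []
    q_prev, beta = [0.0] * len(v0), 0.0
    for _ in range(m):
        w = matvec(A, q)
        alpha = dot(w, q)
        # Three-term recurrence: remove components along q and q_prev
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        # Full reorthogonalization keeps the basis orthonormal in floating point
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if len(alphas) == m or beta < 1e-12:   # finished, or invariant subspace found
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
        basis.append(q)
    return alphas, betas, basis

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
alphas, betas, basis = lanczos(A, [1.0, 1.0, 1.0], m=3)
print(alphas, betas)
```

When the recursion runs for the full dimension, T is similar to A, so the two matrices share trace and eigenvalues. In practice a much smaller subspace is used, and the tridiagonal matrix is then handled with cheap dense routines such as QR.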

Relevance:

90.00%

Publisher:

Abstract:

Alternative sampling procedures are compared to the pure random search method. It is shown that the efficiency of the algorithm can be improved with respect to the expected number of steps to reach an epsilon-neighborhood of the optimal point.
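
For reference, the baseline that the alternative procedures are compared against can be sketched as follows. This is a minimal version of pure random search with uniform sampling in a box; the objective function, bounds and epsilon are assumptions for the example, and the known optimum is used only to count the steps needed to reach the epsilon-neighborhood.

```python
import random

def pure_random_search(f, bounds, x_opt, eps, rng, max_steps=100_000):
    """Sample uniformly in the box `bounds` until a point lands within eps of x_opt.

    Returns (best_point, steps). Knowing x_opt is realistic only for test
    problems; here it serves to measure the number of steps taken.
    """
    best, best_val = None, float("inf")
    for step in range(1, max_steps + 1):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        val = f(x)
        if val < best_val:              # keep the best point seen so far
            best, best_val = x, val
        dist = sum((a - b) ** 2 for a, b in zip(x, x_opt)) ** 0.5
        if dist <= eps:
            return best, step
    return best, max_steps

def sphere(x):
    return sum(v * v for v in x)

best, steps = pure_random_search(
    sphere, bounds=[(-1.0, 1.0), (-1.0, 1.0)], x_opt=[0.0, 0.0],
    eps=0.1, rng=random.Random(0),
)
print(best, steps)
```

Each sample in this toy setup hits the neighborhood with probability equal to the area ratio (about 0.8%), so the step count is geometrically distributed. Averaging `steps` over many seeds estimates the expected number of steps that the abstract uses as its efficiency measure.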

Relevance:

90.00%

Publisher:

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting and can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can deal with large-scale data sets without rigorous data preprocessing, and it has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high ratio of variables to observations, Random Forests complemented the established logistic regression. This study suggests that Random Forests is recommended for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
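
The abstract does not describe how the noise level is measured. One plausible reading, sketched below purely as an assumption, is to add variables known to be pure noise, take the largest importance any of them receives as the cut-off, and keep only the real predictors that score above it. The variable names and importance scores are hypothetical and are not taken from the thesis.

```python
def select_above_noise(importances, noise_names):
    """Keep predictors whose importance exceeds that of every known-noise variable."""
    cutoff = max(importances[name] for name in noise_names)
    selected = [name for name, score in importances.items()
                if name not in noise_names and score > cutoff]
    return cutoff, sorted(selected, key=lambda n: importances[n], reverse=True)

# Hypothetical importance scores (e.g. mean decrease in accuracy); illustrative only
importances = {
    "smoking": 0.120, "age": 0.045, "marker_a": 0.004, "marker_b": 0.001,
    "noise_1": 0.003, "noise_2": 0.005, "noise_3": 0.002,
}
cutoff, selected = select_above_noise(importances, {"noise_1", "noise_2", "noise_3"})
print(cutoff, selected)
```

In a real analysis the scores would come from a fitted forest, for example from the `importance` output of the R randomForest package. The cut-off could then be relaxed or tightened in light of positive-control results, as the abstract suggests.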

Relevance:

90.00%

Publisher:

Abstract:

A quick new method is described for the quantification of absolute nannofossil proportions in deep-sea sediments. This method (SMS) combines Spiking a sample with Microbeads and Spraying it on a cover slide. It is suitable for both scanning electron microscope (SEM) and light microscope (LM) analyses. Repeated preparation and counting of the same sample (30 times) revealed a standard deviation of 10.5%. The application of tracer microbeads with different diameters and densities revealed no statistically significant differences between counts. The SMS method yielded coccolith numbers that are not statistically significantly different from values obtained with the filtration method. However, coccolith counts obtained by the random settling method are three times higher than the values obtained by the SMS and filtration methods.

Relevance:

90.00%

Publisher:

Abstract:

Machine Learning techniques are very useful because they make it possible to maximize the use of information in real time. The Random Forests method can be counted among the most recent and best-performing Machine Learning techniques. Exploiting the characteristics and potential of this method, this doctoral thesis addresses two different case studies, from which two different forecasting models were developed. The first case study focused on the main rivers of the Emilia-Romagna region, which are characterized by very short response times. The choice of these rivers was not accidental: in recent years, several flood events, largely of the “flash flood” type, have occurred in these basins. The second case study concerns the main sections of the Po river, where the propagation time of the flood wave is longer than in the watercourses of the first case study. Starting from a large amount of data, the first step was to select and define the input data according to the objectives to be achieved, for both case studies. For the model of the Emilia-Romagna rivers, only observed data were considered; for the Po river basin, by contrast, the observed data were supplemented with forecast data from the Mike11 NAM/HD modelling chain. Exploiting one of the main features of the Random Forests method, a probability of occurrence was estimated: this aspect is fundamental both in the technical phase and in the decision-making phase of any civil protection intervention. Data processing and development were carried out in the R environment. At the end of the validation phase, the encouraging results obtained made it possible to integrate the model developed in the first case study into the FEWS operational architecture.

Relevance:

80.00%

Publisher:

Abstract:

INTRODUCTION: Open access publishing is becoming increasingly popular within the biomedical sciences. SciELO, the Scientific Electronic Library Online, is a digital library covering a selected collection of Brazilian scientific journals, many of which provide open access to full-text articles. This library includes a number of dental journals, some of which may include reports of clinical trials in English, Portuguese and/or Spanish. Thus, SciELO could play an important role as a source of evidence for dental healthcare interventions, especially if it yields a sizeable number of high-quality reports. OBJECTIVE: The aim of this study was to identify reports of clinical trials by handsearching dental journals that are accessible through SciELO, and to assess the overall quality of these reports. MATERIAL AND METHODS: Electronic versions of six Brazilian dental journals indexed in SciELO were handsearched at www.scielo.br in September 2008. Reports of clinical trials were identified and classified as controlled clinical trials (CCTs: prospective, experimental studies comparing 2 or more healthcare interventions in human beings) or randomized controlled trials (RCTs: a random allocation method is clearly reported), according to Cochrane eligibility criteria. Criteria to assess methodological quality included: method of randomization, concealment of treatment allocation, blinded outcome assessment, handling of withdrawals and losses, and whether an intention-to-treat analysis had been carried out. RESULTS: The search retrieved 33 CCTs and 43 RCTs. A majority of the reports provided no description of either the method of randomization (75.3%) or concealment of the allocation sequence (84.2%). Participants and outcome assessors were reported as blinded in only 31.2% of the reports. Withdrawals and losses were clearly described in only 6.5% of the reports, and none mentioned an intention-to-treat analysis or any similar procedure. CONCLUSIONS: The results of this study indicate that a substantial number of reports of trials and systematic reviews are available in the dental journals listed in SciELO, and that these could provide valuable evidence for clinical decision making. However, the quality of a number of these reports is of some concern, and the conduct and reporting of these trials could be improved if authors adhered to internationally accepted guidelines, e.g. the CONSORT statement.

Relevance:

80.00%

Publisher:

Abstract:

Hospitals nowadays collect vast amounts of data related to patient records. These data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data on inpatient hospitalization, from 2000 to 2013, were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available during the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, in which six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which achieved a high coefficient of determination (0.81). This model was then interpreted using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers.
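
For readers unfamiliar with the reported metric, the coefficient of determination compares a model's squared error with that of always predicting the mean. The sketch below uses the standard definition with made-up numbers; the study's data are not in the abstract, and whether it computed R² in exactly this way is an assumption.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

# Toy lengths of stay in days (illustrative values only)
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
score = r_squared(y_true, y_pred)

# A model that always predicts the mean scores exactly 0
baseline = r_squared(y_true, [5.0, 5.0, 5.0, 5.0])
print(score, baseline)
```

Read this way, a value of 0.81 means the model's squared error is 19% of the error made by a constant mean prediction on the same data.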

Relevance:

80.00%

Publisher:

Abstract:

Maize (Zea mays) and guinea corn (Sorghum bicolor) are major food items in Plateau State, Nigeria. A multistage sampling technique was used to select the markets and stores/warehouses used for this study; samples were collected by simple random sampling from different sampling points within designated areas. A total of 18 representative samples were collected and analyzed for the following mycotoxins: aflatoxins (aflatoxin B1, AFB1; aflatoxin B2, AFB2; aflatoxin G1, AFG1; and aflatoxin G2, AFG2), fumonisins (fumonisin B1, FB1; and fumonisin B2, FB2) and cyclopiazonic acid (CPA). Of the 12 samples analyzed for aflatoxins, AFB1 was detected in 5, AFB2 in 1, AFG1 in 1 and AFG2 in 6. The highest concentrations of AFB1 and AFG2 were found in maize samples from Pankshin market. Only maize samples from Mangu market were contaminated with AFB2, and they also harboured the lowest concentration of AFG2. AFG1 contamination occurred only in guinea corn from Shendam market. FB1 was detected in all 18 samples analyzed. The mycotoxin CPA was not detected in any of the samples. Aflatoxin levels in the analyzed samples were regarded as safe based on the Nigerian and European Union maximum permissible level of 4 µg/kg. With the exception of two samples, FB1 levels in the analyzed maize samples were within the European Union maximum permissible levels of 1,000 to 3,000 µg/kg. The health and food safety implications of these results for the human and animal population are further discussed.

Relevance:

80.00%

Publisher:

Abstract:

INTRODUCTION/OBJECTIVES: Detection rates for adenoma and early colorectal cancer (CRC) are insufficient due to low compliance with invasive screening procedures such as colonoscopy. Available non-invasive screening tests unfortunately have low sensitivity and specificity. Therefore, there is a large unmet need for a cost-effective, reliable and non-invasive test to screen for early neoplastic and pre-neoplastic lesions. AIMS & METHODS: The objective is to develop a screening test able to detect early CRCs and adenomas. This test is based on a nucleic acid multi-gene assay performed on peripheral blood mononuclear cells (PBMCs). A colonoscopy-controlled feasibility study was conducted on 179 subjects. The first 92 subjects were used as a training set to generate a statistically significant signature. Colonoscopy revealed 21 subjects with CRC, 30 with adenoma bigger than 1 cm and 41 with no neoplastic or inflammatory lesions. The second group of 48 subjects (controls, CRC and polyps) was used as a test set and will be kept blinded for the entire data analysis. To determine the organ and disease specificity, 38 subjects were used: 24 with inflammatory bowel disease (IBD) and 14 with cancers other than CRC (OC). Blood samples were taken from each patient on the day of the colonoscopy and PBMCs were purified. Total RNA was extracted following standard procedures. Multiplex RT-qPCR was applied to 92 different candidate biomarkers. Different univariate and multivariate statistical methods were applied to these candidates, and among them 60 biomarkers with significant p-values (<0.01) were selected. These biomarkers are involved in several different biological functions, such as cellular movement, cell signaling and interaction, tissue and cellular development, cancer, and cell growth and proliferation. Two distinct biomarker signatures are used to separate patients without lesions from those with cancer or with adenoma, named COLOX CRC and COLOX POL respectively. COLOX performance was validated using a random resampling method (bootstrap). RESULTS: COLOX CRC and POL tests successfully separate patients without lesions from those with CRC (Se 67%, Sp 93%, AUC 0.87) and from those with adenoma bigger than 1 cm (Se 63%, Sp 83%, AUC 0.77), respectively. 6/24 patients in the IBD group and 1/14 patients in the OC group have a positive COLOX CRC. CONCLUSION: The two COLOX tests demonstrated a high sensitivity and specificity to detect the presence of CRCs and adenomas bigger than 1 cm. A prospective, multicenter, pivotal study is underway in order to confirm these promising results in a larger cohort.
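
The abstract names bootstrap resampling but gives no details of the validation protocol. The sketch below shows one common variant, assumed here for illustration: subjects are resampled with replacement, sensitivity is recomputed on each resample, and a percentile interval is read off. The toy labels and predictions are invented and are not the study's data.

```python
import random

def sensitivity_specificity(labels, predictions):
    """Return (Se, Sp) for binary data coded as 1 = lesion, 0 = no lesion."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def bootstrap_sensitivity_interval(labels, predictions, n_resamples, rng, level=0.95):
    """Percentile bootstrap interval for sensitivity."""
    pairs = list(zip(labels, predictions))
    estimates = []
    while len(estimates) < n_resamples:
        sample = [rng.choice(pairs) for _ in pairs]   # resample subjects with replacement
        ys = [y for y, _ in sample]
        ps = [p for _, p in sample]
        if 0 < sum(ys) < len(ys):                     # skip resamples missing a class
            estimates.append(sensitivity_specificity(ys, ps)[0])
    estimates.sort()
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 + level) / 2 * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]

# Toy data (not from the study): 10 subjects with a lesion, 20 without
labels = [1] * 10 + [0] * 20
predictions = [1] * 7 + [0] * 3 + [0] * 18 + [1] * 2
se, sp = sensitivity_specificity(labels, predictions)
lo, hi = bootstrap_sensitivity_interval(labels, predictions, 1000, random.Random(42))
print(se, sp, lo, hi)
```

With only a few dozen positive cases, an interval of this kind tends to be wide. That is one reason a larger confirmatory cohort matters for estimates such as Se 67%.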

Relevance:

80.00%

Publisher:

Abstract:

Objectives: To identify factors that correlate with insulin values and to examine their independent associations among adolescents. Methods: A cross-sectional population-based study was conducted among adolescents aged 12-16.9 years. A multi-stage stratified cluster random sampling method was employed. Anthropometric measurements and a nutritional survey were performed, and fasting blood samples for insulin were obtained. Statistics: Multiple linear regression. Results: 379 adolescents were included. Mean age was 14.08 ± 1.30 years. Factors associated with higher fasting insulin levels were puberty [4.55 (95% CI 0.42 to 8.69)], abdominal obesity [6.11 (95% CI 3.93 to 8.29)] and being born small for gestational age (SGA) [7.45 (95% CI 2.47 to 12.44)]. A negative association was observed between the regular intake of olive oil at home and insulin values [-4.14 (95% CI -7.31 to -0.98)]. Conclusions: Abdominal obesity and SGA were factors associated with higher fasting insulin values. In contrast, the regular intake of olive oil at home was an independent protective factor.

Relevance:

80.00%

Publisher:

Abstract:

The multiscale finite-volume (MSFV) method is designed to reduce the computational cost of elliptic and parabolic problems with highly heterogeneous anisotropic coefficients. The reduction is achieved by splitting the original global problem into a set of local problems (with approximate local boundary conditions) coupled by a coarse global problem. It has been shown recently that the numerical errors in MSFV results can be reduced systematically with an iterative procedure that provides a conservative velocity field after any iteration step. The iterative MSFV (i-MSFV) method can be obtained with an improved (smoothed) multiscale solution to enhance the localization conditions, with a Krylov subspace method [e.g., the generalized-minimal-residual (GMRES) algorithm] preconditioned by the MSFV system, or with a combination of both. In a multiphase-flow system, a balance between accuracy and computational efficiency should be achieved by finding a minimum number of i-MSFV iterations (on pressure), which is necessary to achieve the desired accuracy in the saturation solution. In this work, we extend the i-MSFV method to sequential implicit simulation of time-dependent problems. To control the error of the coupled saturation/pressure system, we analyze the transport error caused by an approximate velocity field. We then propose an error-control strategy on the basis of the residual of the pressure equation. At the beginning of simulation, the pressure solution is iterated until a specified accuracy is achieved. To minimize the number of iterations in a multiphase-flow problem, the solution at the previous timestep is used to improve the localization assumption at the current timestep. Additional iterations are used only when the residual becomes larger than a specified threshold value. Numerical results show that only a few iterations on average are necessary to improve the MSFV results significantly, even for very challenging problems. Therefore, the proposed adaptive strategy yields efficient and accurate simulation of multiphase flow in heterogeneous porous media.
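
The control logic described above (iterate to a set accuracy at the start, reuse the previous timestep's solution, and iterate further only when the residual exceeds a threshold) can be sketched independently of MSFV. In the example below a plain Jacobi iteration on a tiny linear system stands in for the i-MSFV pressure solve. The matrix, right-hand sides and tolerance are assumptions for illustration and say nothing about the paper's actual solver.

```python
def residual_norm(A, x, b):
    """Max-norm of the residual b - A x."""
    return max(abs(bi - sum(a * xj for a, xj in zip(row, x))) for row, bi in zip(A, b))

def jacobi_step(A, x, b):
    """One Jacobi sweep (a simple stand-in for one pressure iteration)."""
    return [
        (bi - sum(a * xj for j, (a, xj) in enumerate(zip(row, x)) if j != i)) / row[i]
        for i, (row, bi) in enumerate(zip(A, b))
    ]

def solve_adaptive(A, x_start, b, tol, max_iters=500):
    """Iterate only while the residual exceeds tol; return (solution, iterations used)."""
    x, iters = list(x_start), 0
    while residual_norm(A, x, b) > tol and iters < max_iters:
        x = jacobi_step(A, x, b)
        iters += 1
    return x, iters

A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
x = [0.0, 0.0, 0.0]                      # cold start at the first timestep
iterations_per_step = []
for step in range(5):
    b = [1.0 + 0.01 * step, 2.0, 3.0]    # slowly varying right-hand side
    x, iters = solve_adaptive(A, x, b, tol=1e-6)   # warm start from previous x
    iterations_per_step.append(iters)
print(iterations_per_step)
```

The first timestep pays for the cold start. Later timesteps begin close to the solution and need fewer sweeps, which is the mechanism behind the abstract's remark that only a few iterations are needed on average.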

Relevance:

80.00%

Publisher:

Abstract:

BACKGROUND: The objective is to develop a cost-effective, reliable and non-invasive screening test able to detect early CRCs and adenomas. The test is based on a nucleic acid multigene assay performed on peripheral blood mononuclear cells (PBMCs). METHODS: A colonoscopy-controlled study was conducted on 179 subjects. 92 subjects (21 CRC, 30 adenoma >1 cm and 41 controls) were used as a training set to generate a signature. Another 48 subjects, kept blinded (controls, CRC and polyps), were used as a test set. To determine organ and disease specificity, 38 subjects were used: 24 with inflammatory bowel disease (IBD) and 14 with other cancers (OC). Blood samples were taken and PBMCs were purified. After RNA extraction, multiplex RT-qPCR was applied to 92 different candidate biomarkers. After different univariate and multivariate analyses, 60 biomarkers with significant p-values (<0.01) were selected. Two distinct biomarker signatures are used to separate patients without lesions from those with CRC or with adenoma, named COLOX CRC and COLOX POL. COLOX performance was validated using a random resampling method (bootstrap). RESULTS: COLOX CRC and POL tests successfully separate patients without lesions from those with CRC (Se 67%, Sp 93%, AUC 0.87), and from those with adenoma >1 cm (Se 63%, Sp 83%, AUC 0.77). 6/24 patients in the IBD group and 1/14 patients in the OC group have a positive COLOX CRC. CONCLUSION: The two COLOX tests demonstrated a high Se and Sp to detect the presence of CRCs and adenomas >1 cm. A prospective, multicenter, pivotal study is underway in order to confirm these promising results in a larger cohort.

Relevance:

80.00%

Publisher:

Abstract:

This thesis presents two graphical user interfaces for the project DigiQ - Fusion of Digital and Visual Print Quality, a project for computationally modeling the subjective human experience of print quality by measuring the image with certain metrics. After the user interfaces are presented, methods are described for reducing the computation time of several of the metrics and of the image registration process required to compute them, and details of their performance are given. The weighted sample method for the image registration process significantly decreased the calculation times at the cost of some error. The random sampling method for the metrics greatly reduced calculation time while maintaining excellent accuracy, but worked with only two of the metrics.
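
The thesis's metrics are not specified in the abstract. As a generic illustration of why random sampling can cut computation time with little loss of accuracy, the sketch below estimates a simple statistic (mean pixel value) from a random subset of a synthetic image. The image, the statistic and the sample size are all assumptions made for the example.

```python
import random

def sampled_mean(pixels, n_samples, rng):
    """Estimate the mean pixel value from a random subset instead of the full image."""
    return sum(rng.sample(pixels, n_samples)) / n_samples

# Synthetic 100x100 grayscale "image" flattened to a list (illustrative data only)
pixels = [(x * 7 + y * 13) % 256 for y in range(100) for x in range(100)]
exact = sum(pixels) / len(pixels)
estimate = sampled_mean(pixels, 2000, random.Random(0))
print(exact, estimate)
```

Here only 20% of the pixels are visited. The approach suits statistics that average over pixels and is less suited to metrics that depend on spatial structure, which may be why it worked for only some of the metrics.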

Relevance:

80.00%

Publisher:

Abstract:

The etiology of functional dyspepsia is not known. The objective of the present study was to determine the characteristics of functional dyspepsia in Western Turkey. We divided 900 patients with functional dyspepsia into three subgroups according to symptoms: ulcer-like (UL), 321 (35.6%); motility disorder-like (ML), 281 (31.2%); and the combination (C) of these symptoms, 298 (33.1%). All patients underwent endoscopic evaluation, with two biopsies taken from the cardia and corpus, and four from the antrum of the stomach. All biopsy samples were examined histologically for Helicobacter pylori (Hp) density, chronic inflammation, activity, intestinal metaplasia, atrophy, and the presence of lymphoid aggregates. One antral biopsy was used for the rapid urease test. Tissue cagA status was determined by PCR from an antral biopsy specimen by a random sampling method. We also determined the serum levels of tumor necrosis factor-alpha (TNF-alpha) and gastrin by the same method. Data were analyzed statistically by the Kolmogorov-Smirnov test and by analysis of variance. Hp and cagA positivity was significantly higher in the UL subgroup than in the others. The patients in the ML subgroup had the lowest Hp and cagA positivity and Hp density. The ML subgroup also showed the lowest level of Hp-induced inflammation among all subgroups. The serum levels of TNF-alpha and gastrin did not differ between groups. Our findings show a poor association of Hp with the ML subgroup of functional dyspepsia, but a stronger association with the UL and C subgroups.