17 results for "Gender classification model"

in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (USP Digital Library of Intellectual Production)

Particle Competition and Cooperation in Networks for Semi-Supervised Learning

Relevance: 80.00%

Abstract:

Semi-supervised learning is one of the important topics in machine learning; it is concerned with pattern classification when only a small subset of the data is labeled. In this paper, a new network-based (or graph-based) semi-supervised classification model is proposed. It employs a combined random-greedy walk of particles, with competition and cooperation mechanisms, to propagate class labels to the whole network. Due to the competition mechanism, the proposed model spreads labels locally, i.e., each particle only visits the portion of nodes potentially belonging to it, and it is not allowed to visit nodes definitely occupied by particles of other classes. In this way, a "divide-and-conquer" effect is naturally embedded in the model. As a result, the proposed model can achieve a good classification rate while exhibiting a low order of computational complexity in comparison to other network-based semi-supervised algorithms. Computer simulations carried out on synthetic and real-world data sets provide a numerical quantification of the method's performance.
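To make the competition-and-cooperation mechanism concrete, here is a minimal Python sketch under simplifying assumptions of our own: one particle per labeled node, a per-node vector of class domination levels, a greedy move that picks neighbours in proportion to the particle's class domination, and a particle that only stays on a node its class dominates. Particles of the same class cooperate implicitly by reinforcing the same domination levels. The function name `particle_competition` and the parameters `p_greedy`, `delta`, `iterations` and `seed` are hypothetical, and any distance-to-home terms the published model may use are omitted; this is not the authors' implementation.

```python
import random

def particle_competition(adj, labels, n_classes, p_greedy=0.7, delta=0.1,
                         iterations=1000, seed=0):
    """Label every node of `adj` (node -> list of neighbours) from seed `labels`.

    Simplified sketch (assumption): n_classes must be >= 2.
    """
    rng = random.Random(seed)
    # Domination levels: fixed one-hot for labeled nodes, uniform for unlabeled ones.
    dom = {
        node: ([1.0 if c == labels[node] else 0.0 for c in range(n_classes)]
               if node in labels else [1.0 / n_classes] * n_classes)
        for node in adj
    }
    # One particle per labeled node, stored as [class, strength, position].
    particles = [[cls, 1.0, node] for node, cls in labels.items()]
    for _ in range(iterations):
        for particle in particles:
            cls, strength, pos = particle
            neighbours = adj[pos]
            if not neighbours:
                continue
            weights = [dom[v][cls] for v in neighbours]
            if rng.random() < p_greedy and sum(weights) > 0:
                # Greedy (preferential) move: favour nodes this class already dominates.
                target = rng.choices(neighbours, weights=weights)[0]
            else:
                # Random move: any neighbour with equal probability.
                target = rng.choice(neighbours)
            if target not in labels:
                # Competition: take domination from the other classes, give it to this one.
                levels = dom[target]
                gained = 0.0
                for c in range(n_classes):
                    if c != cls:
                        loss = min(levels[c], delta * strength / (n_classes - 1))
                        levels[c] -= loss
                        gained += loss
                levels[cls] += gained
            # The particle's strength tracks its class's domination of the visited node.
            particle[1] = dom[target][cls]
            # It only stays on the node if its class now dominates there.
            if dom[target][cls] == max(dom[target]):
                particle[2] = target
    # Each node takes the class with the highest domination level.
    return {node: max(range(n_classes), key=lambda c: dom[node][c]) for node in adj}
```

For example, on two triangles joined by a single edge with one labeled node in each, `particle_competition(adj, {0: 0, 5: 1}, n_classes=2)` returns a class for every node; labeled nodes always keep their own class because their domination levels are never updated.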

Network-based stochastic semisupervised learning

Relevance: 80.00%

Abstract:

Semisupervised learning is a machine learning approach that is able to employ both labeled and unlabeled samples in the training process. In this paper, we propose a semisupervised data classification model based on a combined random-preferential walk of particles in a network (graph) constructed from the input dataset. The particles of the same class cooperate among themselves, while the particles of different classes compete with each other to propagate class labels to the whole network. A rigorous model definition is provided via a nonlinear stochastic dynamical system and a mathematical analysis of its behavior is carried out. A numerical validation presented in this paper confirms the theoretical predictions. An interesting feature brought by the competitive-cooperative mechanism is that the proposed model can achieve good classification rates while exhibiting low computational complexity order in comparison to other network-based semisupervised algorithms. Computer simulations conducted on synthetic and real-world datasets reveal the effectiveness of the model.
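The abstract does not say how the network is built from the input dataset. A common choice, assumed here and not confirmed by the abstract, is a symmetric k-nearest-neighbour graph over Euclidean distance; the hypothetical helper `knn_graph` below sketches it in Python.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as a dict: node index -> set of neighbours."""
    adj = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        # Sort the other points by Euclidean distance to p (ties broken by index).
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)  # keep the edge if either endpoint selects the other
    return adj
```

Symmetrising the edges keeps the graph undirected, which suits walk-based label propagation where particles can move in both directions along an edge.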

On the impact of disproportional samples in credit scoring models: An application to a Brazilian bank data

Relevance: 80.00%

Abstract:

Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information-theoretic measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. Also, as a case study, the methodology is illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results so far reveal that there is no statistically significant difference in terms of predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is a strong difference between the distributions of the estimated default probabilities from these two statistical modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples. (C) 2012 Elsevier Ltd. All rights reserved.
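Several of the measures listed above follow directly from a 2x2 confusion matrix. The sketch below, with the hypothetical function name `classification_measures`, computes sensitivity, specificity, predictive values, accuracy, the Matthews correlation coefficient (assumed here as the "correlation coefficient"; the abstract does not name one) and the mutual information in bits between true and predicted class.

```python
import math

def classification_measures(tp, fp, fn, tn):
    """Binary classification measures from confusion-matrix counts."""
    n = tp + fp + fn + tn
    measures = {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "accuracy": (tp + tn) / n,
    }
    # Matthews correlation coefficient; 0.0 by convention if a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    measures["mcc"] = (tp * tn - fp * fn) / denom if denom else 0.0
    # Mutual information (bits) between the true and the predicted class.
    joint = {(1, 1): tp, (0, 1): fp, (1, 0): fn, (0, 0): tn}  # keys: (true, predicted)
    true_total = {1: tp + fn, 0: fp + tn}
    pred_total = {1: tp + fp, 0: fn + tn}
    mi = 0.0
    for (t, p), count in joint.items():
        if count:
            mi += count / n * math.log2(count * n / (true_total[t] * pred_total[p]))
    measures["mutual_information"] = mi
    return measures
```

A perfect classifier on a balanced sample gives 1 bit of mutual information, while a classifier whose predictions are independent of the true class gives 0 bits even though its accuracy is 50%.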

Reliability, Validity and Classification Accuracy of the South Oaks Gambling Screen in a Brazilian Sample

Relevance: 40.00%

Abstract:

The purpose of this study was to examine the reliability, validity and classification accuracy of the South Oaks Gambling Screen (SOGS) in a sample of the Brazilian population. Participants were drawn from three sources: 71 men and women from the general population interviewed at a metropolitan train station; 116 men and women encountered at a bingo venue; and 54 men and women undergoing treatment for gambling. The SOGS and a DSM-IV-based instrument were administered by trained researchers. The internal consistency of the SOGS was 0.75 (Cronbach's alpha), and construct validity was good. ANOVA demonstrated a significant difference among groups (F(2, 238) = 221.3, P < 0.001). The SOGS items and DSM-IV symptoms were highly correlated (r = 0.854, P < 0.01). The SOGS also presented satisfactory psychometric properties: sensitivity (100%), specificity (74.7%), positive predictive rate (60.7%), negative predictive rate (100%) and misclassification rate (0.18). However, a cut-off score of eight improved classification accuracy and reduced the rate of false positives: sensitivity (95.4%), specificity (89.8%), positive predictive rate (78.5%), negative predictive rate (98%) and misclassification rate (0.09). Thus, the SOGS was found to be reliable and valid in the Brazilian population.
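The internal-consistency figure reported above is Cronbach's alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of total score), where k is the number of items. A minimal Python sketch, with a hypothetical `cronbach_alpha` helper that takes one row per respondent and one column per item; for dichotomous (0/1) items such as SOGS answers it coincides with KR-20.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (respondents) of item scores."""
    k = len(scores[0])  # number of items
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    # Population or sample variance both work, as long as they are used consistently.
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

For example, the toy answers `[[1, 1, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]` (four respondents, three items; not data from the study) give an alpha of 0.75.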

Classification and regression tree (CART) model to predict pulmonary tuberculosis in hospitalized patients

Relevance: 40.00%

Abstract:

Background: Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms for diagnosing TB makes the decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs without increasing the risk of hospital transmission. We present a model for predicting pulmonary TB in hospitalized patients in a high-prevalence area, in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods: Cross-sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, and positive and negative predictive values were used to evaluate the performance of the model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results: We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear), and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions: The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for predicting a TB diagnosis was the chest radiograph result. Prospective validation is still necessary, but our model offers an alternative aid for deciding whether to isolate patients with clinical suspicion of TB in tertiary health facilities in countries with limited resources.
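The abstract does not describe how the CART model chooses its splits. In the standard CART formulation a classification tree is grown by repeatedly choosing the split that minimises weighted Gini impurity; the Python sketch below, with hypothetical helpers `gini` and `best_split`, shows that single step for one numeric predictor only (no recursion, pruning or missing-value handling).

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (weighted impurity, threshold) of the best 'x <= threshold' split."""
    best = (float("inf"), None)
    n = len(xs)
    # Candidate thresholds: every distinct value except the largest.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[0]:
            best = (impurity, threshold)
    return best
```

A full tree applies `best_split` to every predictor at each node, keeps the overall best, and recurses on the two resulting subsets until a stopping rule is met.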

"Evaluating the model of classification and valuation of disabilities used in Brazil and defining the elaboration and adoption of a unique model for all the country": Brazilian Interministerial Workgroup Task

Relevance: 40.00%

Abstract:

The President of Brazil established an Interministerial Work Group to “evaluate the model of classification and valuation of disabilities used in Brazil and to define the elaboration and adoption of a unique model for all the country”. Eight Ministries and/or Secretariats participated in the discussion over a period of 10 months. They concluded that the proposed model should be based on the United Nations Convention on the Rights of Persons with Disabilities, the International Classification of Functioning, Disability and Health, and the ‘support theory’, and they organized a list of recommendations and necessary actions for a Classification, Evaluation and Certification Network with national coverage.

The Ecological Basis for Biogeographic Classification: an Example in Orchid Bees (Apidae: Euglossini)

Relevance: 30.00%

Abstract:

Biogeography has been difficult to apply as a methodological approach because organismic biology is incomplete at levels where the process of formulating comparisons and analogies is complex. The study of insect biogeography became necessary because insects possess numerous evolutionary traits and play an important role as pollinators. Among insects, the euglossine bees, or orchid bees, attract interest because the study of their biology allows us to explain important steps in the evolution of social behavior and many other adaptive tradeoffs. We analyzed the distribution of morphological characteristics in Colombian orchid bees from an ecological perspective. The aim of this study was to observe the distribution of these attributes on a regional basis. Data corresponding to Colombian euglossine species were ordered with a correspondence analysis followed by hierarchical clustering. Then, based on community properties, we compared the resulting hierarchical model with the collection localities in order to identify a biogeographic classification pattern. From this analysis, we derived a model that classifies the territory of Colombia into 11 biogeographic units or natural clusters. Ecological assumptions consistent with the derived classification levels suggest that species characteristics associated with flight performance, nectar uptake, and social behavior are the factors that produced the current geographical structure.

Gender characterization in a large series of Brazilian patients with spondyloarthritis

Relevance: 30.00%

Abstract:

An increasing number of women have been diagnosed with spondyloarthritis (SpA) in recent decades. While a few studies have analyzed gender as a prognostic factor of the disease, none has addressed this matter with a large number of patients in South America, a peculiar region due to its genetic heterogeneity. The aim of the present study was to analyze the influence of gender on disease patterns in a large cohort of Brazilian patients with SpA. A prospective study was carried out involving 1,505 patients [1,090 males (72.4%) and 415 females (27.6%)] classified as SpA according to the European Spondylarthropathy Study Group criteria, who attended 29 reference centers for rheumatology in Brazil. Clinical and demographic variables were recorded and the following disease indices were administered: Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), Bath Ankylosing Spondylitis Functional Index (BASFI), Bath Ankylosing Spondylitis Radiologic Index (BASRI), Maastricht Ankylosing Spondylitis Enthesitis Score (MASES), and Ankylosing Spondylitis Quality of Life (ASQoL). Ankylosing spondylitis (AS) was the most frequent disease in the group (65.4%), followed by psoriatic arthritis (18.4%), undifferentiated SpA (6.7%), reactive arthritis (3.3%), arthritis associated with inflammatory bowel disease (3.2%), and juvenile SpA (2.9%). The male-to-female ratio was 2.6:1 for the whole group and 3.6:1 for AS. The females were older (p<0.001) and reported shorter disease duration (p=0.002) than the male patients. Female gender was positively associated with peripheral SpA (p<0.001), upper limb arthritis (p<0.001), dactylitis (p=0.011), psoriasis (p<0.001), nail involvement (p<0.001), and family history of SpA (p=0.045), and negatively associated with pure axial involvement (p<0.001), lumbar inflammatory pain (p=0.042), radiographic sacroiliitis (p<0.001), and positive HLA-B27 (p=0.001). The number of painful (p<0.001) and swollen (p=0.006) joints was significantly higher in female patients, who also had higher BASDAI (p<0.001), BASFI (p=0.073, trend), MASES (p=0.019), ASQoL (p=0.014), and patient's global assessment (p=0.003) scores, whereas the use of nonsteroidal anti-inflammatory drugs (p<0.001) and biological agents (p=0.003) was less frequent among female patients. Moreover, BASRI values were significantly lower in females (p<0.001). Female patients comprised one third of SpA patients in this large cohort and exhibited more significant peripheral involvement and less functional disability, despite higher values in disease indices.

Complex network classification using partially self-avoiding deterministic walks

Relevance: 30.00%

Abstract:

Complex networks have attracted increasing interest from various fields of science. It has been demonstrated that each complex network model presents specific topological structures which characterize its connectivity and dynamics. Complex network classification relies on the use of representative measurements that describe topological structures. Although there are a large number of measurements, most of them are correlated. To overcome this limitation, this paper presents a new measurement for complex network classification based on partially self-avoiding walks. We validate the measurement on a data set composed of 40,000 complex networks of four well-known models. Our results indicate that the proposed measurement improves the correct classification of networks compared with traditional measurements. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4737515]
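A partially self-avoiding deterministic walk (often called a tourist walk) moves deterministically to a neighbour not visited in the last mu steps and eventually falls into a cycle, so each start node yields a transient length and an attractor period. The Python sketch below assumes a simple movement rule (go to the eligible neighbour with the highest degree, breaking ties by the smaller node id), which may differ from the rule used in the paper; `tourist_walk` and `walk_signature` are hypothetical names, and the joint (transient, period) histogram is one plausible way to turn walks into a network measurement.

```python
from collections import Counter, deque

def tourist_walk(adj, start, mu):
    """Return (transient, period) of a deterministic walk avoiding its last mu nodes.

    adj maps node -> list of neighbours; mu >= 1 is the memory size, and the
    memory window includes the current node. Assumed rule: move to the eligible
    neighbour with the highest degree, ties broken by the smaller node id.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    memory = deque([start], maxlen=mu)
    first_seen = {tuple(memory): 0}
    step = 0
    while True:
        current = memory[-1]
        candidates = [v for v in adj[current] if v not in memory]
        if not candidates:
            return step, 0  # trapped: every neighbour lies inside the memory window
        nxt = min(candidates, key=lambda v: (-degree[v], v))
        memory.append(nxt)
        step += 1
        state = tuple(memory)  # the memory window fully determines the rest of the walk
        if state in first_seen:
            return first_seen[state], step - first_seen[state]
        first_seen[state] = step

def walk_signature(adj, mu):
    """Joint histogram of (transient, period) over all starting nodes."""
    return Counter(tourist_walk(adj, v, mu) for v in adj)
```

On a 4-node cycle starting at node 0, memory mu = 1 makes the walker bounce between two nodes (period 2), while mu = 2 forbids immediate backtracking and the walker goes around the whole cycle (period 4).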

Four-gene expression model predictive of lymph node metastases in oral squamous cell carcinoma

Relevance: 30.00%

Abstract:

Background. Prior knowledge of cervical lymph node involvement may be crucial for choosing the best treatment strategy in oral squamous cell carcinoma (OSCC). Here we propose a set of four genes whose mRNA expression in the primary tumor predicts nodal status in OSCC, excluding tumors of the tongue. Material and methods. We identified differentially expressed genes in OSCC with and without compromised lymph nodes using Differential Display RT-PCR. Known genes were chosen to be validated by means of Northern blotting or real-time RT-PCR (qRT-PCR). We then constructed a Nodal Index (NI) using discriminant analysis in a learning set of 35 patients, which was further validated in a second independent group of 20 patients. Results. Of the 63 differentially expressed known genes identified by comparing three lymph node-positive (pN+) and three lymph node-negative (pN0) primary tumors, 23 were analyzed by Northern analysis or RT-PCR in 49 primary tumors. Six genes confirmed as differentially expressed were used to construct an NI, as the best set predictive of lymph node status, with the final result including four genes. The NI correctly classified 32 of 35 patients in the learning group (88.6%; p = 0.009). Casein kinase 1alpha1 and scavenger receptor class B, member 2 were found to be upregulated in the pN+ group, in contrast to small proline-rich protein 2B and Ras-GTPase activating protein SH3 domain-binding protein 2, which were upregulated in the pN0 group. We further validated our NI in an independent set of 20 primary tumors (11 pN0 and nine pN+), with an accuracy of 80.0% (p = 0.012). Conclusions. The NI was an independent predictor of compromised lymph nodes, taking into consideration tumor size and histological grade. The genes identified here that make up our "Nodal Index" model are predictive of lymph node metastasis in OSCC.

Multivariate analyses of UV-Vis absorption spectral data from cachaca wood extracts: a model to classify aged Brazilian cachacas according to the wood species used

Relevance: 30.00%

Abstract:

Multivariate analyses of UV-Vis spectral data from cachaca wood extracts provide a simple and robust model to classify aged Brazilian cachacas according to the wood species used in the maturation barrels. The model is based on inspection of 93 extracts of oak and different Brazilian wood species, obtained using a non-aged cachaca as the extraction solvent. Application of PCA (Principal Components Analysis) and HCA (Hierarchical Cluster Analysis) leads to the identification of 6 clusters of cachaca wood extracts (amburana, amendoim, balsamo, castanheira, jatoba, and oak). LDA (Linear Discriminant Analysis) affords classification of 10 different wood species used in the cachaca extracts (amburana, amendoim, balsamo, cabreuva-parda, canela-sassafras, castanheira, jatoba, jequitiba-rosa, louro-canela, and oak) with an accuracy ranging from 80% (amendoim and castanheira) to 100% (balsamo and jequitiba-rosa). The methodology provides a low-cost alternative to methods based on liquid chromatography and mass spectrometry for classifying cachacas aged in barrels made of different wood species.
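Hierarchical cluster analysis, as used above, merges the closest groups step by step. Below is a minimal Python sketch of agglomerative clustering with average linkage and Euclidean distance; both choices are assumptions since the abstract does not state them, and the 2-D points in the example stand in for, say, the first two principal-component scores of each extract. `average_linkage` is a hypothetical name.

```python
import math

def average_linkage(points, n_clusters):
    """Merge clusters by smallest average pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def distance(a, b):
        # Average Euclidean distance over all cross-cluster pairs of points.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)
```

Stopping at a fixed number of clusters is a simplification; in practice the full merge history (the dendrogram) is usually inspected to choose where to cut.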

Independent predictors and a prognostic model for surgical outcome in refractory frontal lobe epilepsy

Relevance: 30.00%

Abstract:

Purpose: Refractory frontal lobe epilepsy (FLE) remains one of the most challenging surgically remediable epilepsy syndromes. Nevertheless, independent predictors and predictive models of postsurgical seizure outcome remain poorly explored in FLE. Methods: We retrospectively analyzed data from 70 consecutive patients with refractory FLE who underwent surgical treatment at our center from July 1994 to December 2006. Univariate results were entered into logistic regression models and Cox proportional hazards regression to identify independent risk factors for poor surgical results and to construct predictive models for surgical outcome in FLE. Results: Of the 70 patients who underwent surgery, 45 (64%) had a favorable outcome and 37 (47%) became seizure free. Independent risk factors for poor surgical outcome, expressed as hazard ratios (H.R.), were epilepsy duration (H.R. = 4.2; 95% C.I. = 1.5-11.7; p = 0.006), ictal EEG recruiting rhythm (H.R. = 2.9; 95% C.I. = 1.1-7.7; p = 0.033), normal MRI (H.R. = 4.8; 95% C.I. = 1.4-16.6; p = 0.012), and MRI with a lesion involving eloquent cortex (H.R. = 3.8; 95% C.I. = 1.2-12.0; p = 0.021). Based on these variables, we used logistic regression to construct a model that correctly predicted long-term surgical outcome in up to 80% of patients. Conclusion: Among independent risk factors for postsurgical seizure outcome, epilepsy duration is a potentially modifiable factor that could affect surgical outcome in FLE. Early diagnosis, presence of an MRI lesion not involving eloquent cortex, and ictal EEG without a recruiting rhythm independently predicted a favorable outcome in this series. (C) 2011 Elsevier B.V. All rights reserved.

Use of alcohol and other drugs among Brazilian college students: effects of gender and age

Relevance: 30.00%

Abstract:

Objective: To assess the frequency of drug use among Brazilian college students and its relationship to gender and age. Methods: A nationwide sample of 12,721 college students completed a questionnaire concerning the use of drugs and other behaviors. The Alcohol, Smoking and Substance Involvement Screening Test (ASSIST-WHO) criteria were used to assess hazardous drug use. A multivariate logistic regression model tested the associations of ASSIST-WHO scores with gender and age. The same analyses were carried out for drug use in the last 30 days. Results: After controlling for other sociodemographic, academic and administrative variables, men were found to be more likely than women to use, and to engage in hazardous use of, anabolic androgenic steroids across all age ranges. Conversely, women older than 34 years were more likely to use, and to engage in hazardous use of, amphetamines. Conclusions: These findings are consistent with results reported for the general Brazilian population. Therefore, they should be taken into consideration when developing strategies aimed at the prevention of drug use and the early identification of drug abuse among college students.

Decreasing the number of false positives in sequence classification

Relevance: 30.00%

Abstract:

Background: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. To decide whether a given probability is sufficient, the most common approach is Bayesian binary classification, in which the probability of the model characterizing the sequence family of interest is compared with that of an alternative probability model. A null model can serve as this alternative model. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions, including the uniform distribution, the genomic distribution, the family-specific distribution and the target sequence distribution. This paper presents a study evaluating the impact of the choice of null model on the final classification results. In particular, we are interested in minimizing the number of false predictions in a classification, a crucial issue for reducing the costs of biological validation. Results: In all tests using random sequences, the target null model produced the lowest number of false positives. The study was performed on DNA sequences, using GC content as the measure of compositional bias, but the results should also be valid for protein sequences. To broaden the applicability of the results, the study was performed using randomly generated sequences. Previous studies were performed on amino acid sequences, using only one probabilistic model (HMM) and a specific benchmark, and lack more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results. Conclusions: Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the uniform model presents a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies. In these cases the target model is more dependable for biological validation due to its higher specificity.
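The comparison described above is a log-odds score: log P(seq | family model) - log P(seq | null model). The Python sketch below assumes position-independent residue distributions for both the family model and the null models, whereas HMMER, SAM and INFERNAL use profile models for the family; `log_odds`, `target_null`, `UNIFORM` and the GC-rich `GC_RICH` stand-in model are hypothetical, as is the acceptance threshold of 0 bits. It reproduces the qualitative point of the abstract: a compositionally biased sequence scores positively against the uniform null but not against its own target composition.

```python
import math
from collections import Counter

# Hypothetical position-independent models over DNA residues.
UNIFORM = {base: 0.25 for base in "ACGT"}
GC_RICH = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}  # stand-in "family" model

def log_odds(seq, model, null):
    """Bits in favour of `model` over `null` (assumed threshold: > 0 means accept)."""
    return sum(math.log2(model[base] / null[base]) for base in seq)

def target_null(seq):
    """Null model built from the residue composition of the target sequence itself."""
    counts = Counter(seq)
    return {base: count / len(seq) for base, count in counts.items()}
```

For the extreme-GC sequence `"GGGCCC"`, `log_odds` against `UNIFORM` is about +4.07 bits (accepted), while against `target_null("GGGCCC")` it is about -1.93 bits (rejected), illustrating why the target null model yields fewer false positives for compositionally biased candidates.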

Predicting smear negative pulmonary tuberculosis with classification trees and logistic regression: a cross-sectional study

Relevance: 30.00%

Abstract:

Background: Smear-negative pulmonary tuberculosis (SNPT) accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods: The study enrolled 551 patients with clinical-radiological suspicion of SNPT in Rio de Janeiro, Brazil. The original data were divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used to construct logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operating characteristic curve, sensitivity, and specificity were used to evaluate the models' performance in both the generation and validation samples. Results: It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion: The results suggest that these models might be useful as screening tools for estimating the risk of SNPT, optimizing the use of more expensive tests, and avoiding the costs of unnecessary anti-tuberculosis treatment. These models might be cost-effective tools in a health care network with a hierarchical distribution of scarce resources.
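Both tuberculosis studies above report the area under the ROC curve. One standard way to compute it, sketched here in Python with a hypothetical `roc_auc` helper, is the Mann-Whitney formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

The double loop is quadratic in the sample size, which is fine for cohorts of a few hundred patients; a rank-based version gives the same value in O(n log n).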
