926 results for Item sets
Abstract:
This work presents a method to analyze characteristics of a set of genes that may influence a certain anomaly, such as a particular type of cancer. A measure is proposed with the objective of diagnosing individuals with respect to the anomaly under study, and some characteristics of the genes are analyzed. Maximum likelihood equations are presented for the general case and for particular cases.
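The abstract does not reproduce the equations themselves; purely as a generic illustration of the maximum likelihood machinery involved (notation assumed here, not taken from the paper), for a parametric model with density $f(x;\theta)$ the estimates solve the score equations derived from the log-likelihood:

```latex
% Generic maximum-likelihood setup (illustrative only; the paper's
% specific model for gene characteristics is not given in the abstract).
\ell(\theta) = \sum_{i=1}^{n} \log f(x_i;\theta),
\qquad
\frac{\partial \ell(\theta)}{\partial \theta_j} = 0,
\quad j = 1,\dots,p.
```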
Abstract:
This project aims to develop methods for data classification in a data warehouse for decision-making purposes. A further goal is the reduction of an attribute set in a data warehouse, such that the reduced set preserves the properties of the original one. With a reduced set, the computational cost of processing is lower, attributes that are irrelevant to certain situations can be identified, and patterns that support decision making can be recognized in the database. To achieve these objectives, the Rough Sets algorithm will be implemented. We chose PostgreSQL as our database management system because it is efficient, well consolidated and open source (freely distributed).
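The attribute-reduction idea from rough set theory can be sketched briefly: an attribute subset that induces the same indiscernibility partition as the full attribute set preserves the classification properties of the data. The toy decision table and exhaustive search below are illustrative assumptions, not the project's actual implementation (which targets PostgreSQL):

```python
from itertools import combinations

def partition(table, attrs):
    """Indiscernibility partition: group record indices by their
    values on the given attributes."""
    blocks = {}
    for i, row in enumerate(table):
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return frozenset(frozenset(b) for b in blocks.values())

def min_reducts(table, attrs):
    """Return the minimum-size attribute subsets that induce the same
    partition as the full attribute set (exhaustive search; fine for
    illustration, far too slow for a real data warehouse)."""
    full = partition(table, attrs)
    for k in range(1, len(attrs) + 1):
        found = [s for s in combinations(attrs, k)
                 if partition(table, s) == full]
        if found:
            return found
    return [tuple(attrs)]

# Toy table with hypothetical attributes, purely illustrative.
table = [
    {"region": "N", "segment": "retail", "churn": "yes"},
    {"region": "N", "segment": "corp",   "churn": "no"},
    {"region": "S", "segment": "retail", "churn": "yes"},
    {"region": "S", "segment": "corp",   "churn": "no"},
]
print(min_reducts(table, ["region", "segment", "churn"]))
# [('region', 'segment'), ('region', 'churn')]
```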
Abstract:
In this paper, some aspects of chaotic behavior and minimality in the theory of planar piecewise smooth vector fields are treated. The occurrence of non-deterministic chaos is observed and the concept of orientable minimality is introduced. Some relations between minimality and orientable minimality are investigated, and the existence of new kinds of non-trivial minimal sets in chaotic systems is observed. The approach is geometrical and involves the standard techniques of non-smooth systems.
Abstract:
The role played by the attainable set of a differential inclusion in the study of dynamic control systems and fuzzy differential equations is widely acknowledged. Estimating the attainable set is rather complicated compared with applying numerical methods to differential equations. This article addresses an alternative approach, based on an optimal control tool, to obtain a description of the attainable sets of differential inclusions. In particular, we obtain an exact delineation of the attainable set for a large class of nonlinear differential inclusions.
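For reference, a standard formulation of the object in question (notation assumed here, not taken from the article): the attainable set at time $t$ collects the endpoints of all absolutely continuous solutions of the inclusion,

```latex
\dot{x}(s) \in F\bigl(s, x(s)\bigr) \ \text{a.e. on } [0,t],
\qquad x(0) \in X_0,
\qquad
\mathcal{A}(t) = \bigl\{\, x(t) : x(\cdot) \text{ solves the inclusion on } [0,t] \,\bigr\}.
```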
Abstract:
This study evaluated the effect of item inversion on the construct validity and reliability of psychometric scales and proposed a theoretical framework for evaluating the psychometric properties of data gathered with psychometric instruments. For this purpose, we used the Maslach Burnout Inventory (MBI), the most widely used psychometric inventory for measuring burnout in different professional contexts (students, teachers, police officers, doctors, nurses, etc.). The version used was the MBI-Student Survey (MBI-SS), which comprises three key dimensions: Exhaustion, Cynicism and Professional Efficacy. The first two dimensions, which have positively formulated items, are moderately to strongly positively correlated with each other and show moderate to strong negative correlations with the third dimension, which has negatively formulated items. We tested the hypothesis that, in college students, formulating the third dimension of burnout as Inefficacy (reversing the negatively worded items of the Efficacy dimension) improves the correlation of the third dimension with the other two, its internal consistency, and the overall construct validity and reliability of the MBI-SS. Confirmatory factor analysis, estimated by maximum likelihood, revealed adequate factorial fit for both forms: the MBI-SS (with Efficacy) and the MBI-SSi (with Inefficacy). Both forms also showed adequate convergent and discriminant validity; however, reliability and convergent validity were higher for the MBI-SSi, and the (positive) correlations between the three factors were stronger in the MBI-SSi than in the MBI-SS. The results show that positively rewording the third dimension of the MBI-SS improves its validity and reliability. We therefore propose that the third dimension of the MBI-SS be named Professional Inefficacy and that its items be positively worded.
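Item reversal itself is mechanical; a minimal sketch (the column names and the 0-6 frequency scale assumed below are for illustration only):

```python
import pandas as pd

def reverse_code(item: pd.Series, low: int = 0, high: int = 6) -> pd.Series:
    """Reverse-code a Likert-type item: response r becomes low + high - r,
    so an Efficacy item is re-oriented as an Inefficacy item."""
    return low + high - item

# Hypothetical responses to two efficacy items on a 0-6 frequency scale.
df = pd.DataFrame({"eff1": [6, 5, 2], "eff2": [4, 6, 1]})
df[["ineff1", "ineff2"]] = df[["eff1", "eff2"]].apply(reverse_code)
```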
Abstract:
Empirical phylogeographic studies have progressively sampled greater numbers of loci over time, in part motivated by theoretical papers showing that estimates of key demographic parameters improve as the number of loci increases. Recently, next-generation sequencing has been applied to questions about organismal history, with the promise of revolutionizing the field. However, no systematic assessment of how phylogeographic data sets have changed over time with respect to overall size and information content has been performed. Here, we quantify the changing nature of these genetic data sets over the past 20 years, focusing on papers published in Molecular Ecology. We found that the number of independent loci, the total number of alleles sampled and the total number of single nucleotide polymorphisms (SNPs) per data set have increased over time, with particularly dramatic increases within the past 5 years. Interestingly, uniparentally inherited organellar markers (e.g. animal mitochondrial and plant chloroplast DNA) continue to represent an important component of phylogeographic data. Single-species studies (cf. comparative studies) that focus on vertebrates (particularly fish and, to some extent, birds) represent the gold standard of phylogeographic data collection. Based on the current trajectory seen in our survey data, forecast modelling indicates that the median number of SNPs per data set for studies published by the end of the year 2016 may approach approximately 20,000. This survey provides baseline information for understanding the evolution of phylogeographic data sets and underscores the fact that development of analytical methods for handling very large genetic data sets will be critical for facilitating growth of the field.
Abstract:
Factors influencing the location decisions of offices include traffic, accessibility, employment conditions, economic prospects and land-use policies. Hence tools for supporting real-estate managers and urban planners in such multidimensional decisions may be useful. Accordingly, the objective of this study is to develop a GIS-based tool to support firms seeking office accommodation within a given regional or national study area. The tool relies on a matching approach: in a first step, a firm's characteristics (demand) on the one hand, and environmental conditions and available office spaces (supply) on the other, are analyzed separately, after which a match is sought. That is, a suitability score is obtained for every firm and every available office space by applying value judgments (satisfaction, utility, etc.). These judgments focus on location aspects and encode expert knowledge about the location decisions of firms and organizations with respect to office accommodation, as acquired from a group of real-estate advisers; this knowledge is stored in decision tables, which constitute the core of the model. Apart from delineating choice sets for any firm seeking a location, the tool supports two additional types of queries. Firstly, it supports the more generic problem of optimally allocating firms to a set of vacant locations. Secondly, it allows users to find firms that meet the characteristics of any given location. Moreover, as a GIS-based tool, its results can be visualized using GIS features, which in turn facilitates several types of analyses.
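A minimal sketch of the decision-table matching idea (the attributes, rules and scores below are invented for illustration; the actual tables encode the real-estate advisers' knowledge):

```python
# Each decision-table rule pairs a firm-side condition with an
# office-side condition and a partial suitability score; the match
# score for a firm/office pair is the sum over applicable rules.
RULES = [
    (lambda f: f["sector"] == "finance", lambda o: o["cbd"],              2.0),
    (lambda f: f["size"] > 50,           lambda o: o["parking"],          1.0),
    (lambda f: True,                     lambda o: o["transit_min"] <= 10, 1.5),
]

def suitability(firm: dict, office: dict) -> float:
    """Sum the scores of all rules whose firm-side and office-side
    conditions both hold."""
    return sum(s for cf, co, s in RULES if cf(firm) and co(office))

firm = {"sector": "finance", "size": 80}
offices = [
    {"id": "A", "cbd": True,  "parking": False, "transit_min": 5},
    {"id": "B", "cbd": False, "parking": True,  "transit_min": 20},
]
# Rank the available office spaces for this firm (its choice set).
ranked = sorted(offices, key=lambda o: suitability(firm, o), reverse=True)
```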
Abstract:
Background: Large gene expression studies, such as those conducted using DNA arrays, often provide millions of different pieces of data. To address the problem of analyzing such data, we describe a statistical method, which we have called ‘gene shaving’. The method identifies subsets of genes with coherent expression patterns and large variation across conditions. Gene shaving differs from hierarchical clustering and other widely used methods for analyzing gene expression studies in that genes may belong to more than one cluster, and the clustering may be supervised by an outcome measure. The technique can be ‘unsupervised’, that is, the genes and samples are treated as unlabeled, or partially or fully supervised by using known properties of the genes or samples to assist in finding meaningful groupings. Results: We illustrate the use of the gene shaving method to analyze gene expression measurements made on samples from patients with diffuse large B-cell lymphoma. The method identifies a small cluster of genes whose expression is highly predictive of survival. Conclusions: The gene shaving method is a potentially useful tool for exploration of gene expression data and identification of interesting clusters of genes worth further investigation.
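A minimal sketch of the shaving iteration described above (a paraphrase of the published idea; the gap-statistic choice of cluster size, supervision by an outcome measure, and the orthogonalization step between clusters are omitted):

```python
import numpy as np

def shave(X, genes, frac=0.10, min_genes=2):
    """Gene shaving: repeatedly drop the fraction `frac` of genes least
    aligned with the leading principal component of the current block,
    yielding a nested sequence of candidate gene clusters.

    X     : genes x samples expression matrix
    genes : gene labels, len(genes) == X.shape[0]
    """
    idx = np.arange(X.shape[0])
    clusters = [[genes[i] for i in idx]]
    while len(idx) > min_genes:
        block = X[idx] - X[idx].mean(axis=1, keepdims=True)
        # Leading right-singular vector = first principal component
        # of the current gene block across samples.
        _, _, vt = np.linalg.svd(block, full_matrices=False)
        align = np.abs(block @ vt[0])            # alignment of each gene
        keep = max(min_genes, int(len(idx) * (1 - frac)))
        idx = idx[np.argsort(align)[-keep:]]     # shave the least aligned
        clusters.append([genes[i] for i in idx])
    return clusters
```

In the published method, the cluster size along this nested sequence is then chosen with a gap statistic, the data are orthogonalized with respect to the cluster's average gene, and the process repeats to find further clusters.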
Abstract:
Hundreds of terabytes of CMS (Compact Muon Solenoid) data are accumulated for storage day by day at the University of Nebraska-Lincoln, one of the eight US CMS Tier-2 sites. Managing these data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This important task is currently done manually and requires a large amount of time. The overall objective of this study was to develop a methodology that helps identify the data sets to delete when storage space is required. CMS data are stored using HDFS (Hadoop Distributed File System), whose logs record file access operations. Hadoop MapReduce was used to feed the information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression, which is used in this thesis to build a classifier. The time needed to classify data sets with this method depends on the size of the input HDFS log file, since the Hadoop MapReduce algorithms used here have O(n) complexity. The SVM methodology produces a list of data sets for deletion, together with their sizes. It was compared with a heuristic called Retention Cost, computed from the size of a data set and the time since its last access, which helps decide how useful a data set is. The accuracy of both approaches was measured as the percentage of data sets predicted for deletion that were accessed at a later time. The SVM methodology proved more accurate than the Retention Cost heuristic and could be applied to similar problems involving other large data sets.
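A minimal sketch of the classification step (the per-data-set features shown, aggregated from the HDFS access logs, are assumptions for illustration; the thesis' actual feature set and MapReduce pipeline are not reproduced here):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical features per data set, aggregated from HDFS logs:
# [accesses in the last 30 days, days since last access, size in TB]
X_train = np.array([
    [120,   1, 3.2],
    [  0, 400, 8.0],
    [  5,  90, 1.1],
    [300,   2, 0.5],
])
y_train = np.array([0, 1, 1, 0])  # 1 = candidate for deletion, 0 = retain

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

# Score new data sets; those predicted 1 go on the deletion list, to be
# compared against the Retention Cost heuristic (computed from data set
# size and time since last access).
X_new = np.array([[2, 250, 6.4]])
deletion_candidates = clf.predict(X_new)
```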