917 resultados para structuration of lexical data bases


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data centre is a centralized repository,either physical or virtual,for the storage,management and dissemination of data and information organized around a particular body and nerve centre of the present IT revolution.Data centre are expected to serve uniinterruptedly round the year enabling them to perform their functions,it consumes enormous energy in the present scenario.Tremendous growth in the demand from IT Industry made it customary to develop newer technologies for the better operation of data centre.Energy conservation activities in data centre mainly concentrate on the air conditioning system since it is the major mechanical sub-system which consumes considerable share of the total power consumption of the data centre.The data centre energy matrix is best represented by power utilization efficiency(PUE),which is defined as the ratio of the total facility power to the IT equipment power.Its value will be greater than one and a large value of PUE indicates that the sub-systems draw more power from the facility and the performance of the data will be poor from the stand point of energy conservation. PUE values of 1.4 to 1.6 are acievable by proper design and management techniques.Optimizing the air conditioning systems brings enormous opportunity in bringing down the PUE value.The air conditioning system can be optimized by two approaches namely,thermal management and air flow management.thermal management systems are now introduced by some companies but they are highly sophisticated and costly and do not catch much attention in the thumb rules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work identifies the importance of plenum pressure on the performance of the data centre. The present methodology followed in the industry considers the pressure drop across the tile as a dependant variable, but it is shown in this work that this is the only one independent variable that is responsible for the entire flow dynamics in the data centre, and any design or assessment procedure must consider the pressure difference across the tile as the primary independent variable. This concept is further explained by the studies on the effect of dampers on the flow characteristics. The dampers have found to introduce an additional pressure drop there by reducing the effective pressure drop across the tile. The effect of damper is to change the flow both in quantitative and qualitative aspects. But the effect of damper on the flow in the quantitative aspect is only considered while using the damper as an aid for capacity control. Results from the present study suggest that the use of dampers must be avoided in data centre and well designed tiles which give required flow rates must be used in the appropriate locations. In the present study the effect of hot air recirculation is studied with suitable assumptions. It identifies that, the pressure drop across the tile is a dominant parameter which governs the recirculation. The rack suction pressure of the hardware along with the pressure drop across the tile determines the point of recirculation in the cold aisle. The positioning of hardware in the racks play an important role in controlling the recirculation point. The present study is thus helpful in the design of data centre air flow, based on the theory of jets. The air flow can be modelled both quantitatively and qualitatively based on the results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While most data analysis and decision support tools use numerical aspects of the data, Conceptual Information Systems focus on their conceptual structure. This paper discusses how both approaches can be combined.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The application of compositional data analysis through log ratio trans- formations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alter- natives (IIA). IIA states that the odds ratio in this case the ratio of shares is invariant to the addition or deletion of outcomes to the problem. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate using the nested logit model that does not embody IIA and an associated zero replacement procedure and compare its performance with that of the more usual approach of using the multinomial logit model. Our comparisons exploit a data set that com- bines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The statistical analysis of compositional data is commonly used in geological studies. As is well-known, compositions should be treated using logratios of parts, which are difficult to use correctly in standard statistical packages. In this paper we describe the new features of our freeware package, named CoDaPack, which implements most of the basic statistical methods suitable for compositional data. An example using real data is presented to illustrate the use of the package

Relevância:

100.00% 100.00%

Publicador:

Resumo:

First application of compositional data analysis techniques to Australian election data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Several eco-toxicological studies have shown that insectivorous mammals, due to their feeding habits, easily accumulate high amounts of pollutants in relation to other mammal species. To assess the bio-accumulation levels of toxic metals and their in°uence on essential metals, we quantified the concentration of 19 elements (Ca, K, Fe, B, P, S, Na, Al, Zn, Ba, Rb, Sr, Cu, Mn, Hg, Cd, Mo, Cr and Pb) in bones of 105 greater white-toothed shrews (Crocidura russula) from a polluted (Ebro Delta) and a control (Medas Islands) area. Since chemical contents of a bio-indicator are mainly compositional data, conventional statistical analyses currently used in eco-toxicology can give misleading results. Therefore, to improve the interpretation of the data obtained, we used statistical techniques for compositional data analysis to define groups of metals and to evaluate the relationships between them, from an inter-population viewpoint. Hypothesis testing on the adequate balance-coordinates allow us to confirm intuition based hypothesis and some previous results. The main statistical goal was to test equal means of balance-coordinates for the two defined populations. After checking normality, one-way ANOVA or Mann-Whitney tests were carried out for the inter-group balances

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods

Relevância:

100.00% 100.00%

Publicador:

Resumo:

En la presente comunicación se describe la creación y aplicación de una base de datos educativa en el campo de la zoología, conjunta entre la Universidad Complutense de Madrid y la Universidad Nacional de Tucumán (Argentina).El material depositado en los laboratorios de prácticas de los departamentos de nuestras facultades constituye un importante recurso educativo, cuyo uso está normalmente limitado a los profesores y alumnos del propio departamento. La creación de bases de datos (banco de imágenes) a partir del citado material y su aplicación para la docencia representa un aprovechamiento importante de los recursos de nuestras universidades que se ve potenciado gracias a la colaboración entre distintas universidades.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Las bases de datos educativas son una fuente de información especializada para la investigación científica. Juegan un papel imprescindible para la conexión de producciones, novedades y resultados, y la difusión de los mismos en todo el mundo. Sin embargo, las bases de datos de mayor alcance y volumen de documentación funcionan con un acceso restringido a usuarios y/o organizaciones que pagan una cuota de suscripción. Asimismo, no todas las bases de datos cuentan con sistemas para depurar la información objeto de búsqueda. Cabe añadir, que algunas áreas documentales tienen una presencia muy limitada en las mismas, como es el caso de la documentación sobre evaluación de la transferencia de los aprendizajes; el objetivo de esta comunicación es presentar el análisis realizado de la documentación disponible sobre esta temática en las principales bases de datos educativas.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

L'increment de bases de dades que cada vegada contenen imatges més difícils i amb un nombre més elevat de categories, està forçant el desenvolupament de tècniques de representació d'imatges que siguin discriminatives quan es vol treballar amb múltiples classes i d'algorismes que siguin eficients en l'aprenentatge i classificació. Aquesta tesi explora el problema de classificar les imatges segons l'objecte que contenen quan es disposa d'un gran nombre de categories. Primerament s'investiga com un sistema híbrid format per un model generatiu i un model discriminatiu pot beneficiar la tasca de classificació d'imatges on el nivell d'anotació humà sigui mínim. Per aquesta tasca introduïm un nou vocabulari utilitzant una representació densa de descriptors color-SIFT, i desprès s'investiga com els diferents paràmetres afecten la classificació final. Tot seguit es proposa un mètode par tal d'incorporar informació espacial amb el sistema híbrid, mostrant que la informació de context es de gran ajuda per la classificació d'imatges. Desprès introduïm un nou descriptor de forma que representa la imatge segons la seva forma local i la seva forma espacial, tot junt amb un kernel que incorpora aquesta informació espacial en forma piramidal. La forma es representada per un vector compacte obtenint un descriptor molt adequat per ésser utilitzat amb algorismes d'aprenentatge amb kernels. Els experiments realitzats postren que aquesta informació de forma te uns resultats semblants (i a vegades millors) als descriptors basats en aparença. També s'investiga com diferents característiques es poden combinar per ésser utilitzades en la classificació d'imatges i es mostra com el descriptor de forma proposat juntament amb un descriptor d'aparença millora substancialment la classificació. Finalment es descriu un algoritme que detecta les regions d'interès automàticament durant l'entrenament i la classificació. Això proporciona un mètode per inhibir el fons de la imatge i afegeix invariança a la posició dels objectes dins les imatges. S'ensenya que la forma i l'aparença sobre aquesta regió d'interès i utilitzant els classificadors random forests millora la classificació i el temps computacional. Es comparen els postres resultats amb resultats de la literatura utilitzant les mateixes bases de dades que els autors Aixa com els mateixos protocols d'aprenentatge i classificació. Es veu com totes les innovacions introduïdes incrementen la classificació final de les imatges.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Data Protection Regulation proposed by the European Commission contains important elements to facilitate and secure personal data flows within the Single Market. A harmonised level of protection of individual data is an important objective and all stakeholders have generally welcomed this basic principle. However, when putting the regulation proposal in the complex context in which it is to be implemented, some important issues are revealed. The proposal dictates how data is to be used, regardless of the operational context. It is generally thought to have been influenced by concerns over social networking. This approach implies protection of data rather than protection of privacy and can hardly lead to more flexible instruments for global data flows.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe a new methodology for comparing satellite radiation budget data with a numerical weather prediction (NWP) model. This is applied to data from the Geostationary Earth Radiation Budget (GERB) instrument on Meteosat-8. The methodology brings together, in near-real time, GERB broadband shortwave and longwave fluxes with simulations based on analyses produced by the Met Office global NWP model. Results for the period May 2003 to February 2005 illustrate the progressive improvements in the data products as various initial problems were resolved. In most areas the comparisons reveal systematic errors in the model's representation of surface properties and clouds, which are discussed elsewhere. However, for clear-sky regions over the oceans the model simulations are believed to be sufficiently accurate to allow the quality of the GERB fluxes themselves to be assessed and any changes in time of the performance of the instrument to be identified. Using model and radiosonde profiles of temperature and humidity as input to a single-column version of the model's radiation code, we conduct sensitivity experiments which provide estimates of the expected model errors over the ocean of about ±5–10 W m−2 in clear-sky outgoing longwave radiation (OLR) and ±0.01 in clear-sky albedo. For the more recent data the differences between the observed and modeled OLR and albedo are well within these error estimates. The close agreement between the observed and modeled values, particularly for the most recent period, illustrates the value of the methodology. It also contributes to the validation of the GERB products and increases confidence in the quality of the data, prior to their release.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Site-specific management requires accurate knowledge of the spatial variation in a range of soil properties within fields. This involves considerable sampling effort, which is costly. Ancillary data, such as crop yield, elevation and apparent electrical conductivity (ECa) of the soil, can provide insight into the spatial variation of some soil properties. A multivariate classification with spatial constraint imposed by the variogram was used to classify data from two arable crop fields. The yield data comprised 5 years of crop yield, and the ancillary data 3 years of yield data, elevation and ECa. Information on soil chemical and physical properties was provided by intensive surveys of the soil. Multivariate variograms computed from these data were used to constrain sites spatially within classes to increase their contiguity. The constrained classifications resulted in coherent classes, and those based on the ancillary data were similar to those from the soil properties. The ancillary data seemed to identify areas in the field where the soil is reasonably homogeneous. The results of targeted sampling showed that these classes could be used as a basis for management and to guide future sampling of the soil.