978 results for Core data set


Relevance: 90.00%

Abstract:

The main goal of CleanEx is to provide access to public gene expression data via unique gene names. A second objective is to represent heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-data set comparisons. A consistent and up-to-date gene nomenclature is achieved by associating each single experiment with a permanent target identifier consisting of a physical description of the targeted RNA population or the hybridization reagent used. These targets are then mapped at regular intervals to the growing and evolving catalogues of human genes and genes from model organisms. The completely automatic mapping procedure relies partly on external genome information resources such as UniGene and RefSeq. The central part of CleanEx is a weekly built gene index containing cross-references to all public expression data already incorporated into the system. In addition, the expression target database of CleanEx provides gene mapping and quality control information for various types of experimental resource, such as cDNA clones or Affymetrix probe sets. The web-based query interfaces offer access to individual entries via text string searches or quantitative expression criteria. CleanEx is accessible at: http://www.cleanex.isb-sib.ch/.


US Geological Survey (USGS) based elevation data are the most commonly used data source for highway hydraulic analysis; however, given the vertical accuracy of USGS-based elevation data, they may be too "coarse" to adequately describe surface profiles of watershed areas or drainage patterns. Additionally, hydraulic design requires delineation of much smaller drainage areas (watersheds) than other hydrologic applications, such as environmental, ecological, and water resource management. This research study investigated whether higher-resolution LIDAR-based surface models would provide better delineation of watersheds and drainage patterns than surface models created from standard USGS-based elevation data. Differences in runoff values were the metric used to compare the data sets. The two data sets were compared for a pilot study area along the Iowa 1 corridor between Iowa City and Mount Vernon. Given the limited breadth of the analysis corridor, areas of particular emphasis were the location of drainage area boundaries and flow patterns parallel to and intersecting the road cross section. Traditional highway hydrology does not appear to be significantly impacted, or benefited, by the increased terrain detail that LIDAR provided for the study area. In fact, hydrologic outputs, such as streams and watersheds, may be too sensitive to the increased horizontal resolution and/or errors in the data set. However, a true comparison of LIDAR and USGS-based data sets of equal size and encompassing entire drainage areas could not be performed in this study. Differences may also result in areas with much steeper slopes or significant changes in terrain. LIDAR may provide valuable detail in areas of modified terrain, such as roads. Better representations of channel and terrain detail in the vicinity of the roadway may be useful in modeling problem drainage areas and evaluating structural surety during and after significant storm events. 
Furthermore, LIDAR may be used to verify the intended/expected drainage patterns at newly constructed highways. LIDAR will likely provide the greatest benefit for highway projects in flood plains and areas with relatively flat terrain where slight changes in terrain may have a significant impact on drainage patterns.


Many questions in evolutionary biology require an estimate of divergence times but, for groups with a sparse fossil record, such estimates rely heavily on molecular dating methods. The accuracy of these methods depends on both an adequate underlying model and the appropriate implementation of fossil evidence as calibration points. We explore the effect of these in Poaceae (grasses), a diverse plant lineage with a very limited fossil record, focusing particularly on dating the early divergences in the group. We show that molecular dating based on a data set of plastid markers is strongly dependent on the model assumptions. In particular, an acceleration of evolutionary rates at the base of Poaceae followed by a deceleration in the descendants strongly biases methods that assume an autocorrelation of rates. This problem can be circumvented by using markers that have lower rate variation, and we show that phylogenetic markers extracted from complete nuclear genomes can be a useful complement to the more commonly used plastid markers. However, estimates of divergence times remain strongly affected by different implementations of fossil calibration points. Analyses calibrated with only macrofossils lead to estimates for the age of core Poaceae of ∼51-55 Ma, but the inclusion of microfossil evidence pushes this age to 74-82 Ma and leads to lower estimated evolutionary rates in grasses. These results emphasize the importance of considering markers from multiple genomes and alternative fossil placements when addressing evolutionary issues that depend on ages estimated for important groups.


Background: Nowadays, combining different sources of information to improve the available biological knowledge is a challenge in bioinformatics. Among the most powerful methods for integrating heterogeneous data types are kernel-based methods. Kernel-based data integration approaches consist of two basic steps: first, a suitable kernel is chosen for each data set; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results: We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables belonging to each data set. In particular, for each input variable or linear combination of input variables, we can represent the local direction of maximum growth, which allows us to identify the samples with higher or lower values of the variables analyzed. Conclusions: The integration of different data sets and the simultaneous representation of samples and variables together give a better understanding of the underlying biology.
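The two-step recipe described in this abstract (choose a kernel per data set, then combine) can be sketched as follows. This is a minimal illustration, not the authors' exact setup: the two synthetic data sources, the RBF kernel choice, and the unweighted kernel sum are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
# Two hypothetical data sources measured on the same 30 samples
expr = rng.normal(size=(30, 50))   # e.g. expression-like measurements
clin = rng.normal(size=(30, 5))    # e.g. clinical-like measurements

# Step 1: choose a kernel for each data set
K1 = rbf_kernel(expr, gamma=1.0 / expr.shape[1])
K2 = rbf_kernel(clin, gamma=1.0 / clin.shape[1])

# Step 2: combine the kernels (here, an unweighted sum) and reduce
# dimensionality with kernel PCA on the precomputed combined kernel
K = K1 + K2
kpca = KernelPCA(n_components=2, kernel="precomputed")
scores = kpca.fit_transform(K)   # (30, 2) sample coordinates for plotting
```

In practice the individual kernels are often normalized or given data-set-specific weights before summing; the paper's overlay of input-variable directions on the plot is not reproduced here.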


Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels.
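Rejection ABC, the core idea behind the approach evaluated above, is easy to sketch on a toy problem. The model below (inferring the mean of a normal distribution from a sample mean) is an illustrative stand-in for the paper's polyploid demographic models; the prior, tolerance, and simulation counts are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "observed" data with unknown mean (true value 2.0, illustrative only)
obs = rng.normal(loc=2.0, scale=1.0, size=100)
obs_stat = obs.mean()   # summary statistic of the observed data

# Rejection ABC: draw parameters from the prior, simulate data,
# and keep the draws whose summary statistic is close to the observed one
n_sims, tol = 20_000, 0.1
theta = rng.uniform(-5.0, 5.0, size=n_sims)                  # prior draws
# Simulated sample mean of 100 points ~ Normal(theta, 1/sqrt(100))
sim_stat = theta + rng.normal(0.0, 1.0 / np.sqrt(100), size=n_sims)
accepted = theta[np.abs(sim_stat - obs_stat) < tol]          # approx. posterior

posterior_mean = accepted.mean()
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is what makes the method attractive when alleles cannot be assigned to homeologous chromosomes.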


The paper deals with the development and application of a methodology for automatic mapping of pollution/contamination data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve this problem. The automatic tuning of isotropic and anisotropic GRNN models using a cross-validation procedure is presented. Results are compared with the k-nearest-neighbours interpolation algorithm using an independent validation data set. The quality of mapping is controlled by analysis of the raw data and of the residuals using variography. Maps of the probability of exceeding a given decision level and "thick" isoline visualization of the uncertainties are presented as examples of decision-oriented mapping. A real case study is based on the mapping of radioactively contaminated territories.
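A GRNN is, in effect, Gaussian-kernel-weighted regression (the Nadaraya-Watson estimator), so the automatic isotropic tuning described above can be sketched as a leave-one-out search over kernel widths. The synthetic "contamination" surface and the candidate widths below are assumptions made for illustration, not the paper's data or grid.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN prediction = Gaussian kernel-weighted average of training values."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / (w.sum(axis=1) + 1e-300)  # guard against underflow

rng = np.random.default_rng(1)
# Hypothetical contamination surface sampled at 200 random 2-D locations
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.sin(4 * X[:, 0]) + np.cos(4 * X[:, 1]) + rng.normal(0, 0.1, 200)

# Automatic tuning: pick the isotropic sigma minimising leave-one-out error
best_sigma, best_err = None, np.inf
for sigma in (0.01, 0.05, 0.1, 0.3, 1.0):
    errs = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        pred = grnn_predict(X[mask], y[mask], X[i:i + 1], sigma)[0]
        errs.append((pred - y[i]) ** 2)
    err = float(np.mean(errs))
    if err < best_err:
        best_sigma, best_err = sigma, err
```

An anisotropic model would replace the single sigma with one width per coordinate, tuned by the same cross-validation loop.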


Panel data can be arranged into a matrix in two ways, called 'long' and 'wide' formats (LF and WF). The two formats suggest two alternative model approaches for analyzing panel data: (i) univariate regression with varying intercept; and (ii) multivariate regression with latent variables (a particular case of structural equation model, SEM). The present paper compares the two approaches, showing in which circumstances they yield equivalent (in some cases, even numerically equal) results. We show that the univariate approach gives results equivalent to the multivariate approach when restrictions of time invariance (in the paper, the TI assumption) are imposed on the parameters of the multivariate model. It is shown that the restrictions implicit in the univariate approach can be assessed by chi-square difference testing of two nested multivariate models. In addition, common tests encountered in the econometric analysis of panel data, such as the Hausman test, are shown to have an equivalent representation as chi-square difference tests. Commonalities and differences between the univariate and multivariate approaches are illustrated using an empirical panel data set of firms' profitability as well as simulated panel data.
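The long/wide distinction can be made concrete with a toy panel; the firms, years, and profit values below are made up for illustration.

```python
import pandas as pd

# Toy panel: 3 firms observed in 2 years, long format (one row per firm-year)
long = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "C", "C"],
    "year": [2001, 2002] * 3,
    "profit": [1.0, 1.5, 0.8, 0.9, 2.1, 2.4],
})

# Long -> wide: one row per firm, one column per year
wide = long.pivot(index="firm", columns="year", values="profit")

# Wide -> long again (round trip)
back = wide.reset_index().melt(id_vars="firm", value_name="profit")
```

The long format feeds naturally into a univariate regression with a firm-varying intercept, while the wide format treats each year's profit as a separate variable, the shape an SEM-style multivariate regression expects.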


Dissolved organic matter (DOM) is a complex mixture of organic compounds, ubiquitous in marine and freshwater systems. Fluorescence spectroscopy, by means of Excitation-Emission Matrices (EEM), has become an indispensable tool to study DOM sources, transport and fate in aquatic ecosystems. However, the statistical treatment of large and heterogeneous EEM data sets still represents an important challenge for biogeochemists. Recently, Self-Organising Maps (SOM) have been proposed as a tool to explore patterns in large EEM data sets. SOM is a pattern recognition method which clusters input EEMs and reduces their dimensionality without relying on any assumption about the data structure. In this paper, we show how SOM, coupled with a correlation analysis of the component planes, can be used both to explore patterns among samples and to identify individual fluorescence components. We analysed a large and heterogeneous EEM data set, including samples from a river catchment collected under a range of hydrological conditions, along a 60-km downstream gradient, and under the influence of different degrees of anthropogenic impact. According to our results, chemical industry effluents appeared to have unique and distinctive spectral characteristics. On the other hand, river samples collected under flash flood conditions showed homogeneous EEM shapes. The correlation analysis of the component planes suggested the presence of four fluorescence components, consistent with DOM components previously described in the literature. A remarkable strength of this methodology was that outlier samples appeared naturally integrated in the analysis. We conclude that SOM coupled with a correlation analysis procedure is a promising tool for studying large and heterogeneous EEM data sets.
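A minimal SOM of the kind the abstract applies to EEMs can be sketched in a few lines. The "EEM-like" synthetic spectra, the 4x4 grid, and the decay schedules below are illustrative assumptions; real applications use dedicated SOM implementations and much larger maps.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical EEM-like samples: 99 spectra flattened to 64 values, drawn
# from three synthetic source types (illustrative, not real EEM data)
centers = rng.normal(size=(3, 64))
X = np.vstack([c + 0.1 * rng.normal(size=(33, 64)) for c in centers])

# A minimal 4x4 SOM trained with the classic online update rule
grid = np.array([(i, j) for i in range(4) for j in range(4)], float)
W = rng.normal(scale=0.1, size=(16, 64))
for t in range(2000):
    x = X[rng.integers(len(X))]
    bmu = np.argmin(((W - x) ** 2).sum(1))        # best-matching unit
    lr = 0.5 * np.exp(-t / 1000)                  # decaying learning rate
    sig = 2.0 * np.exp(-t / 1000)                 # shrinking neighbourhood
    h = np.exp(-((grid - grid[bmu]) ** 2).sum(1) / (2 * sig ** 2))
    W += lr * h[:, None] * (x - W)                # pull units toward the sample

# Map every sample to its best-matching unit and measure quantization error
bmus = np.array([np.argmin(((W - x) ** 2).sum(1)) for x in X])
qe = np.mean([np.sqrt(((W - x) ** 2).sum(1)).min() for x in X])
```

Samples from the same synthetic source end up on neighbouring units, which is the property the paper exploits: component planes (one weight dimension viewed across the grid) can then be correlated to identify co-varying fluorescence regions.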


This study contributes to the discussion on entrepreneurship, the strategic process and the significance of growth in small firms, and further to the discussion on the entrepreneur as a learner, as a seizer of opportunities and, on the other hand, as an individual who cherishes his or her way of life. The entrepreneur's learning can be linked to the discussion on learning in organisations, understood here as the learning of the entrepreneur as an individual and as an opportunity for innovation and renewal in the entrepreneurial process. The aim of the study is to examine the ways and outcomes of the entrepreneur's learning and its role and significance in the strategic business process, in which the entrepreneur's values and the culture of a small tourism and family business guide the entrepreneur's way of operating. A further aim is to outline how the entrepreneur's learning and competence manifest themselves in changes and in stakeholder relationships. The approach of the study is hermeneutic. The entrepreneur is approached as a whole individual, from a perspective in which the entrepreneurial process is understood as human and creative activity and as continuous change. The entrepreneur is seen as the definer of, and the best expert on, his or her own process. The empirical material is qualitative. Material collected through entrepreneur interviews in 1998-99 offered an opportunity to pilot the analysis. To broaden the narrow perspective of that material, additional material was collected through conversations with four entrepreneurs in 2003. This new material, gathered from tourism entrepreneurs in eastern Finland, forms the core material of the study. Perspectives arising from earlier entrepreneurship research have guided the examination of the empirical material. The focus is on the entrepreneur as a learner in the entrepreneurial process and at the turning points of his or her entrepreneurial history, and on how learning is reflected in the entrepreneur's values, in the culture of the tourism business and in stakeholder cooperation. The results of the study show that, for these entrepreneurs, continuous change is everyday work. The entrepreneurs are in command of their own learning and entrepreneurial processes. 
They set their own goals for the business and understand business realities sufficiently well; risk management in particular is characteristic of them. The success of the firms is shown by the fact that they have been able to grow their operations in terms of quantitative measures, or at least the entrepreneurs' own goals have been reached. The entrepreneurs' capacity for learning and renewal is evidenced by the very existence of the firms and by their continuous struggle for existence through influencing stakeholders. Although these entrepreneurs cannot be defined by the concept 'entrepreneur' in the Anglo-Saxon, Schumpeterian sense, it can nevertheless be argued that they can be understood as entrepreneurs in the sense of the Finnish concept 'yrittäjyys': courageous, curious, independent, learning and continuously renewing individuals who constantly create new value through their business activity. Learning manifests itself in the process in that the entrepreneurs seek new information from their social networks and from customers' behaviour and reactions, observe their environment closely, comparing other firms in the field from the perspective of their own firm, and cooperate with other entrepreneurs. They question their own activities and those of their firm, and draw up plans for expanding the business or adding operations, but also for discontinuing operations. Learning and renewal are visible in visions in which the entrepreneurs prepare to change the business so that it meets the demands of each situation. The entrepreneurs are nevertheless to be understood as individuals who anticipate events rather than merely adapt to them. Anticipation can be seen as the driving force of change, so that the entrepreneurs push their firms forward rather than pull them along. The entrepreneurs are the best possible capability and competence resource in their own firms, one that also appears to be passed on to the next generation in these firms through practical example and tacit knowledge. 
Each entrepreneur has developed a personal survival strategy or mode of operation in which the learning that has taken place over the entrepreneur's entire entrepreneurial history and within the entrepreneurial process can be seen to have crystallised into the entrepreneur's competence. These survival strategies reveal the entrepreneur's own conception of values, the culture of the family business, the way of relating to the environment and the entrepreneur's own, personal way of operating as an entrepreneur.


Estimation of the dimensions of fluvial geobodies from core data is a notoriously difficult problem in reservoir modeling. To try to improve such estimates and, hence, reduce uncertainty in geomodels, data on dunes, unit bars, cross-bar channels, and compound bars and their associated deposits are presented herein from the sand-bed braided South Saskatchewan River, Canada. These data are used to test models that relate the scale of the formative bed forms to the dimensions of the preserved deposits and, therefore, provide an insight as to how such deposits may be preserved over geologic time. The preservation of bed-form geometry is quantified by comparing the alluvial architecture above and below the maximum erosion depth of the modern channel deposits. This comparison shows that there is no significant difference in the mean set thickness of dune cross-strata above and below the basal erosion surface of the contemporary channel, thus suggesting that dimensional relationships between dune deposits and the formative bed-form dimensions are likely to be valid for both recent and older deposits. The data show that estimates of mean bankfull flow depth derived from dune, unit bar, and cross-bar channel deposits are all very similar. Thus, the use of all these metrics together can provide a useful check that all components and scales of the alluvial architecture have been identified correctly when building reservoir models. The data also highlight several practical issues with identifying and applying data relating to cross-strata. For example, the deposits of unit bars were found to be severely truncated in length and width, with only approximately 10% of the mean bar-form length remaining, making identification in section difficult. For similar reasons, the deposits of compound bars were found to be especially difficult to recognize, and hence, estimates of channel depth based on this method may be problematic. 
Where only core data are available (i.e., no outcrop data exist), formative flow depths are suggested to be best reconstructed using cross-strata formed by dunes. However, theoretical relationships between the distribution of set thicknesses and formative dune height are found to result in slight overestimates of the latter and, hence, of the mean bankfull flow depths derived from these measurements. This article illustrates that the preservation of fluvial cross-strata and, thus, the paleohydraulic inferences that can be drawn from them, are a function of the ratio of the size and migration rate of bed forms and the time scale of aggradation and channel migration. These factors must thus be considered when deciding on appropriate length:thickness ratios for the purposes of object-based modeling in reservoir characterization.
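The comparison and reconstruction workflow described above can be sketched as follows. The set-thickness samples are made up, and the scaling ratios (dune height as a multiple of mean set thickness, bankfull depth as a multiple of dune height) are placeholders of the kind published scaling relations provide, not values taken from this article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Illustrative set-thickness samples in metres (made up, not the
# South Saskatchewan measurements)
above = rng.gamma(shape=4.0, scale=0.05, size=80)  # sets above erosion surface
below = rng.gamma(shape=4.0, scale=0.05, size=60)  # sets below it

# Same style of test as "no significant difference in mean set thickness"
t, p = stats.ttest_ind(above, below, equal_var=False)

# Reconstruction chain with ASSUMED placeholder ratios, for illustration only:
# mean dune height as a multiple of mean set thickness, then bankfull depth
# as a range of multiples of dune height
mean_set = above.mean()
dune_height = 2.9 * mean_set          # assumed set-to-height ratio
depth_range = (6 * dune_height, 10 * dune_height)  # assumed depth multiples
```

The point of the article is precisely that such ratios depend on preservation, so the multipliers above should be taken from an appropriate published relation for the deposit in question rather than from this sketch.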



The purpose of this thesis is to study the factors that explain bilateral fiber trade flows. This is done by analyzing bilateral trade flows during 1990-2006. It is also studied whether there are differences between fiber types. The thesis uses a gravity model approach to study the trade flows. The gravity model is mostly used to study aggregate trade data between trading countries; in this thesis the gravity model is applied to single fibers and then estimated on a panel data set. The regression results show clearly that there are benefits in studying different fibers separately: the effects differ considerably from each other. Furthermore, the thesis supports the existence of Linder's effect in certain fiber types.
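A log-linearised gravity regression of the kind used in the thesis can be sketched on synthetic data; the GDPs, distances, trade values, and elasticities below are simulated, not real fiber-trade data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200  # hypothetical country pairs
# Synthetic GDPs and distances (illustrative, not real data)
gdp_i = np.exp(rng.normal(10.0, 1.0, n))
gdp_j = np.exp(rng.normal(10.0, 1.0, n))
dist = np.exp(rng.normal(7.0, 0.5, n))
# Gravity model: trade ~ GDP_i^0.9 * GDP_j^0.8 / dist^1.1, with noise
trade = gdp_i ** 0.9 * gdp_j ** 0.8 / dist ** 1.1 * np.exp(rng.normal(0, 0.2, n))

# Log-linearise and estimate the elasticities by least squares
X = np.column_stack([np.ones(n), np.log(gdp_i), np.log(gdp_j), np.log(dist)])
beta, *_ = np.linalg.lstsq(X, np.log(trade), rcond=None)
# beta[1:] recovers approximately (0.9, 0.8, -1.1)
```

Running such a regression fiber by fiber, rather than on aggregate flows, is what lets the coefficients differ across fiber types, which is the thesis's main point.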


When laboratory intercomparison exercises are conducted, there is no a priori dependence of the concentration of a certain compound determined in one laboratory on that determined by another. The same applies when comparing different methodologies. An existing data set of total mercury readings in fish muscle samples from a Brazilian intercomparison exercise was used to show that correlation analysis is the most effective statistical tool in this kind of experiment. Problems associated with alternative statistical tools, such as mean or paired t-test comparison and regression analysis, are discussed.
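The contrast the abstract draws can be illustrated on made-up paired readings: correlation measures whether two laboratories track each other across samples, while a paired t-test reacts to any constant bias even when agreement is otherwise excellent. All numbers below are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Hypothetical total-mercury readings (mg/kg) for 20 fish muscle samples;
# lab B carries a small constant bias but tracks lab A closely (made up)
true_hg = rng.uniform(0.1, 1.0, 20)
lab_a = true_hg + rng.normal(0, 0.02, 20)
lab_b = true_hg + 0.05 + rng.normal(0, 0.02, 20)

r, _ = stats.pearsonr(lab_a, lab_b)   # near 1: the labs agree sample by sample
t, p = stats.ttest_rel(lab_a, lab_b)  # small p: flags the constant bias
```

Here the paired t-test "fails" the comparison despite near-perfect agreement in ranking and spacing, which is the kind of pitfall the paper discusses.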


Recent years have produced great advances in instrumentation technology. The amount of available data has been increasing due to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without proper analysis. This has been one of the reasons for the ever-growing success of multivariate handling of such data. Industrial data are commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This places certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches in the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures, or partial least squares), but there are also other methods that should be considered. The more advanced methods include multi-block modeling and nonlinear modeling. In this thesis it is shown that the results of data analysis vary according to the modeling approach used, making the selection of the modeling approach dependent on the purpose of the model. If the model is intended to provide accurate predictions, the approach should differ from the case where the purpose of modeling is mostly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply. In this way the methods and the results can be compared and an approach selected that is suitable for the intended purpose. Differences in data analysis methods are compared with data from different fields of industry in this thesis. In the first two papers, the multi-block method is considered for data originating from the oil and fertilizer industries. The results are compared to those from PLS and priority PLS. 
The third paper considers the applicability of multivariate models to process control for a reactive crystallization process. In the fourth paper, nonlinear modeling is examined with a data set from the oil industry. The response has a nonlinear relation to the descriptor matrix, and the results are compared between linear modeling, polynomial PLS and nonlinear modeling using nonlinear score vectors.