931 results for large data sets


Relevance:

90.00% 90.00%

Publisher:

Abstract:

A large number of models have been derived from the two-parameter Weibull distribution and are referred to as Weibull models. They exhibit a wide range of shapes for the density and hazard functions, which makes them suitable for modelling complex failure data sets. The WPP and IWPP plots allow one to determine in a systematic manner whether one or more of these models are suitable for modelling a given data set. This paper deals with this topic.
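As a hedged illustration of the linearising transformation that underlies Weibull probability plotting: for the two-parameter Weibull distribution F(t) = 1 - exp(-(t/alpha)^beta), plotting y = ln(-ln(1 - F(t))) against x = ln t gives a straight line with slope beta and intercept -beta ln(alpha). The sketch below is not taken from the paper. The sample size, seed, parameter values, and the use of Bernard's median-rank approximation for F are all assumptions made for illustration.

```python
import math
import random


def wpp_points(times):
    """Return (x, y) coordinates of a Weibull probability plot."""
    ordered = sorted(times)
    n = len(ordered)
    xs, ys = [], []
    for i, t in enumerate(ordered, start=1):
        f = (i - 0.3) / (n + 0.4)  # Bernard's median-rank estimate of F(t) (assumed)
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - f)))
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical failure data drawn from a Weibull with scale 100 and shape 2.
rng = random.Random(0)
times = [rng.weibullvariate(100.0, 2.0) for _ in range(2000)]

xs, ys = wpp_points(times)
slope, intercept = fit_line(xs, ys)
shape_hat = slope                          # slope estimates the shape beta
scale_hat = math.exp(-intercept / slope)   # intercept = -beta * ln(alpha)
print(f"shape ~ {shape_hat:.2f}, scale ~ {scale_hat:.1f}")
```

If the plotted points depart clearly from a straight line, a plain two-parameter Weibull is a poor fit, which is the kind of visual check the abstract refers to.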

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In natural estuaries, contaminant transport is driven by turbulent momentum mixing. Scalar dispersion can rarely be predicted accurately because of a lack of fundamental understanding of the turbulence structure in estuaries. Herein, detailed turbulence field measurements were conducted continuously at high frequency for up to 50 hours per investigation in a small subtropical estuary with semi-diurnal tides. Acoustic Doppler velocimetry was deemed the most appropriate measurement technique for such small estuarine systems with shallow water depths (less than 0.5 m at low tides), and a thorough post-processing technique was applied. The estuarine flow is always a fluctuating process. The bulk flow parameters fluctuated with periods comparable to tidal cycles and other large-scale processes, but turbulence properties depended upon the instantaneous local flow properties. They were little affected by the flow history, but their structure and temporal variability were influenced by a variety of mechanisms. This resulted in behaviour that deviated from that of an equilibrium turbulent boundary layer induced by velocity shear only. A striking feature of the data sets is the large fluctuations in all turbulence characteristics during the tidal cycle. This feature was rarely documented, but an important difference between the data sets used in this study and earlier reported measurements is that the present data were collected continuously at high frequency during relatively long periods. The findings shed new light on the fluctuating nature of momentum exchange coefficients and integral time and length scales. These turbulent properties should not be assumed constant.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands of) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance either because the rule is tested on tissue samples that were used in the first instance to select the genes being used in the rule, or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
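The difference between cross-validation that is internal and external to gene selection can be sketched on pure-noise data, where the honest error rate is about 50%. This is a minimal illustration and not the authors' procedure. The nearest-centroid classifier, the mean-difference gene ranking, the data dimensions (40 samples, 500 genes, 10 selected genes), and the seed are all assumptions. Only the 10-fold setting follows the abstract's recommendation.

```python
import random


def top_genes(X, y, k):
    """Rank genes by absolute difference in class means; return the top k indices."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((abs(sum(a) / len(a) - sum(b) / len(b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def centroid_predict(X_train, y_train, genes, x):
    """Nearest-centroid rule restricted to the chosen genes."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        dist = 0.0
        for g in genes:
            mean = sum(r[g] for r in rows) / len(rows)
            dist += (x[g] - mean) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_error(X, y, k, folds, external):
    """Cross-validated error; gene selection is redone per fold only if external."""
    n = len(X)
    genes_all = top_genes(X, y, k)  # selection on ALL samples: the biased variant
    errors = 0
    for f in range(folds):
        train_idx = [i for i in range(n) if i % folds != f]
        test_idx = [i for i in range(n) if i % folds == f]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        genes = top_genes(X_tr, y_tr, k) if external else genes_all
        for i in test_idx:
            if centroid_predict(X_tr, y_tr, genes, X[i]) != y[i]:
                errors += 1
    return errors / n


# Pure noise: class labels carry no information about the expression values.
rng = random.Random(1)
n_samples, n_genes = 40, 500
X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
y = [0] * (n_samples // 2) + [1] * (n_samples // 2)

internal_err = cv_error(X, y, k=10, folds=10, external=False)
external_err = cv_error(X, y, k=10, folds=10, external=True)
print(f"internal CV error: {internal_err:.2f}, external CV error: {external_err:.2f}")
```

Because the internal variant chooses genes using the samples it later tests on, it tends to report an error well below chance even though the data contain no signal.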

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Two major factors are likely to impact the utilisation of remotely sensed data in the near future: (1) an increase in the number and availability of commercial and non-commercial image data sets with a range of spatial, spectral and temporal dimensions, and (2) increased access to image display and analysis software through GIS. A framework was developed to provide an objective approach to selecting remotely sensed data sets for specific environmental monitoring problems. Preliminary applications of the framework have provided successful approaches for monitoring disturbed and restored wetlands in southern California.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

We tested the effects of four data characteristics on the results of reserve selection algorithms. The data characteristics were nestedness of features (land types in this case), rarity of features, size variation of sites (potential reserves) and size of data sets (numbers of sites and features). We manipulated data sets to produce three levels, with replication, of each of these data characteristics while holding the other three characteristics constant. We then used an optimizing algorithm and three heuristic algorithms to select sites to solve several reservation problems. We measured efficiency as the number or total area of selected sites, indicating the relative cost of a reserve system. Higher nestedness increased the efficiency of all algorithms (reduced the total cost of new reserves). Higher rarity reduced the efficiency of all algorithms (increased the total cost of new reserves). More variation in site size increased the efficiency of all algorithms expressed in terms of total area of selected sites. We measured the suboptimality of heuristic algorithms as the percentage increase of their results over optimal (minimum possible) results. Suboptimality is a measure of the reliability of heuristics as indicative costing analyses. Higher rarity reduced the suboptimality of heuristics (increased their reliability) and there is some evidence that more size variation did the same for the total area of selected sites. We discuss the implications of these results for the use of reserve selection algorithms as indicative and real-world planning tools.
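The suboptimality measure described above (percentage increase of a heuristic result over the optimal result) can be illustrated with a toy reservation problem. The sketch below is only an assumption-laden example. The greedy richness heuristic and the brute-force search stand in for the paper's unnamed heuristic and optimizing algorithms, cost is measured as the number of sites, and the site and feature names are hypothetical.

```python
from itertools import combinations


def greedy_select(sites, features):
    """Heuristic: repeatedly pick the site adding the most unrepresented features."""
    remaining = set(features)
    chosen = []
    while remaining:
        best = max(sorted(sites), key=lambda s: len(sites[s] & remaining))
        if not sites[best] & remaining:
            raise ValueError("some features occur in no site")
        chosen.append(best)
        remaining -= sites[best]
    return chosen


def optimal_select(sites, features):
    """Exhaustive search for the smallest set of sites representing every feature."""
    names = sorted(sites)
    target = set(features)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if set().union(*(sites[s] for s in combo)) >= target:
                return list(combo)
    raise ValueError("some features occur in no site")


def suboptimality(heuristic_cost, optimal_cost):
    """Percentage increase of the heuristic result over the optimal result."""
    return 100.0 * (heuristic_cost - optimal_cost) / optimal_cost


# Hypothetical data: three candidate sites and six land types.
sites = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 5},
    "C": {3, 4, 6},
}
features = {1, 2, 3, 4, 5, 6}

greedy = greedy_select(sites, features)    # ['A', 'B', 'C']
optimal = optimal_select(sites, features)  # ['B', 'C']
print(suboptimality(len(greedy), len(optimal)))  # 50.0
```

Here the heuristic is drawn to the richest site first and ends up needing three sites where two suffice, giving a suboptimality of 50%.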

Relevance:

90.00% 90.00%

Publisher:

Abstract:

We re-mapped the soils of the Murray-Darling Basin (MDB) in 1995-1998 with a minimum of new fieldwork, making the most of existing data. We collated existing digital soil maps and used inductive spatial modelling to predict soil types from those maps combined with environmental predictor variables. Lithology, Landsat Multi Spectral Scanner (Landsat MSS), the 9-s digital elevation model (DEM) of Australia and derived terrain attributes, all gridded to 250-m pixels, were the predictor variables. Because the basin-wide datasets were very large, data mining software was used for modelling. Rule induction by data mining was also used to define the spatial domain of extrapolation for the extension of soil-landscape models from existing soil maps. Procedures to estimate the uncertainty associated with the predictions and quality of information for the new soil-landforms map of the MDB are described. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The success of plant reproduction depends on pollen-pistil interactions occurring at the stigma/style. These interactions vary depending on the stigma type: wet or dry. Tobacco (Nicotiana tabacum) represents a model of wet stigma, and its stigmas/styles express genes to accomplish the appropriate functions. For a large-scale study of gene expression during tobacco pistil development and preparation for pollination, we generated 11,216 high-quality expressed sequence tags (ESTs) from stigmas/styles and created the TOBEST database. These ESTs were assembled into 6,177 clusters, of which 52.1% are pistil transcripts/genes of unknown function. The 21 clusters with the highest number of ESTs (putative higher expression levels) correspond to genes associated with defense mechanisms or pollen-pistil interactions. The database analysis unraveled tobacco sequences homologous to the Arabidopsis (Arabidopsis thaliana) genes involved in specifying pistil identity or determining normal pistil morphology and function. Additionally, 782 independent clusters were examined by macroarray, revealing 46 stigma/style preferentially expressed genes. Real-time reverse transcription-polymerase chain reaction experiments validated the pistil-preferential expression for nine out of 10 genes tested. A search for these 46 genes in the Arabidopsis pistil data sets demonstrated that only 11 sequences, with putative equivalent molecular functions, are expressed in this dry stigma species. The reverse search for the Arabidopsis pistil genes in the TOBEST exposed a partial overlap between these dry and wet stigma transcriptomes. The TOBEST represents the most extensive survey of gene expression in the stigmas/styles of wet stigma plants, and our results indicate that wet and dry stigmas/styles express common as well as distinct genes in preparation for the pollination process.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Avicennia marina is an important mangrove species with a wide geographical and climatic distribution, which suggests that large amounts of genetic diversity are available for conservation and breeding programs. In this study we compare the informativeness of AFLPs and SSRs for assessing genetic diversity within and among individuals, populations and subspecies of A. marina in Australia. Our comparison utilized three SSR loci and three AFLP primer sets that were known to be polymorphic, and could be run in a single analysis on a capillary electrophoresis system, using different-colored fluorescent dyes. A total of 120 individuals representing six populations and three subspecies were sampled. At the locus level, SSRs were considerably more variable than AFLPs, with a total of 52 alleles and an average heterozygosity of 0.78. Average heterozygosity for AFLPs was 0.193, but all of the 918 bands scored were polymorphic. Thus, AFLPs were considerably more efficient at revealing polymorphic loci than SSRs despite lower average heterozygosities. SSRs detected more genetic differentiation between populations (19 vs 9%) and subspecies (35 vs 11%) than AFLPs. Principal co-ordinate analysis revealed congruent patterns of genetic relationships at the individual, population and subspecific levels for both data sets. Mantel testing confirmed congruence between AFLP and SSR genetic distances among, but not within, population comparisons, indicating that the markers were segregating independently but that evolutionary groups (populations and subspecies) were similar. Three genetic criteria of importance for defining priorities for ex situ collections or in situ conservation programs (number of alleles, number of locally common alleles and number of private alleles) were correlated between the AFLP and SSR data sets. The congruence between AFLP and SSR data sets suggests that either method, or a combination, is applicable to expanded genetic studies of mangroves.
The codominant nature of SSRs makes them ideal for further population-based investigations, such as mating-system analyses, for which the dominant AFLP markers are less well suited. AFLPs may be particularly useful for monitoring propagation programs and identifying duplicates within collections, since a single PCR assay can reveal many loci at once.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Background. Based on the well-described excess of schizophrenia births in winter and spring, we hypothesised that individuals with schizophrenia (a) would be more likely to be born during periods of decreased perinatal sunshine, and (b) those born during periods of less sunshine would have an earlier age of first registration. Methods. We undertook an ecological analysis of long-term trends in perinatal sunshine duration and schizophrenia birth rates based on two mental health registers (Queensland, Australia, n = 6630; The Netherlands, n = 24,474). For each of the 480 months between 1931 and 1970, the agreement between slopes of the trends in psychosis and long-term sunshine duration series was assessed. Age at first registration was assessed by quartiles of long-term trends in perinatal sunshine duration. Males and females were assessed separately. Results. Both the Dutch and Australian data showed a statistically significant association between falling long-term trends in sunshine duration around the time of birth and rising schizophrenia birth rates for males only. In both the Dutch and Australian data there were significant associations between earlier age of first registration and reduced long-term trends in sunshine duration around the time of birth for both males and females. Conclusions. A measure of long-term trends in perinatal sunshine duration was associated with two epidemiological features of schizophrenia in two separate data sets. Exposures related to sunshine duration warrant further consideration in schizophrenia research. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Complete small subunit ribosomal RNA gene (ssrDNA) and partial (D1-D3) large subunit ribosomal RNA gene (lsrDNA) sequences were used to estimate the phylogeny of the Digenea via maximum parsimony and Bayesian inference. Here we contribute 80 new ssrDNA and 124 new lsrDNA sequences. Fully complementary data sets of the two genes were assembled from newly generated and previously published sequences and comprised 163 digenean taxa representing 77 nominal families and seven aspidogastrean outgroup taxa representing three families. Analyses were conducted on the genes independently as well as combined and separate analyses including only the higher plagiorchiidan taxa were performed using a reduced-taxon alignment including additional characters that could not be otherwise unambiguously aligned. The combined data analyses yielded the most strongly supported results and differences between the two methods of analysis were primarily in their degree of resolution. The Bayesian analysis including all taxa and characters, and incorporating a model of nucleotide substitution (general-time-reversible with among-site rate heterogeneity), was considered the best estimate of the phylogeny and was used to evaluate their classification and evolution. In broad terms, the Digenea forms a dichotomy that is split between a lineage leading to the Brachylaimoidea, Diplostomoidea and Schistosomatoidea (collectively the Diplostomida nomen novum (nom. nov.)) and the remainder of the Digenea (the Plagiorchiida), in which the Bivesiculata nom. nov. and Transversotremata nom. nov. form the two most basal lineages, followed by the Hemiurata. The remainder of the Plagiorchiida forms a large number of independent lineages leading to the crown clade Xiphidiata nom. nov. that comprises the Allocreadioidea, Gorgoderoidea, Microphalloidea and Plagiorchioidea, which are united by the presence of a penetrating stylet in their cercariae. 
Although a majority of families and, to a lesser degree, superfamilies are supported as currently defined, the traditional divisions of the Echinostomida, Plagiorchiida and Strigeida were found to comprise non-natural assemblages. Therefore, the membership of established higher taxa is emended, new taxa are erected, and a revised, phylogenetically based classification is proposed and discussed in light of ontogeny, morphology and taxonomic history. (C) 2003 Australian Society for Parasitology Inc. Published by Elsevier Science Ltd. All rights reserved.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Seismic recordings of IRIS/IDA/GSN station CMLA and of several temporary stations in the Azores archipelago are processed with P and S receiver function (PRF and SRF) techniques. Contrary to regional seismic tomography, these methods provide estimates of the absolute velocities and of the Vp/Vs ratio up to a depth of ~300 km. Joint inversion of PRFs and SRFs for a few data sets consistently reveals a division of the subsurface medium into four zones with a distinctly different Vp/Vs ratio: the crust ~20 km thick with a ratio of ~1.9 in the lower crust, the high-Vs mantle lid with a strongly reduced Vp/Vs velocity ratio relative to the standard 1.8, the low-velocity zone (LVZ) with a velocity ratio of ~2.0, and the underlying upper-mantle layer with a standard velocity ratio. Our estimates of crustal thickness greatly exceed previous estimates (~10 km). The base of the high-Vs lid (the Gutenberg discontinuity) is at a depth of ~80 km. The LVZ with a reduction of S velocity of ~15% relative to the standard (IASP91) model is terminated at a depth of ~200 km. The average thickness of the mantle transition zone (TZ) is evaluated from the time difference between the S410p and SKS660p seismic phases that are robustly detected in the S and SKS receiver functions. This thickness is practically similar to the standard IASP91 value of 250 km and is characteristic of a large region of the North Atlantic outside the Azores plateau. Our data are indicative of a reduction of the S-wave velocity of several percent relative to the standard velocity in a depth interval from 460 to 500 km. This reduction is found in the immediate vicinity of the Azores, in the region sampled by the PRFs, but, as evidenced by SRFs, it is missing at a distance of a few hundred kilometers from the islands. We speculate that this anomaly may correspond to the source of a plume which generated the Azores hotspot.
Previously, a low S velocity in this depth range was found with SRF techniques beneath a few other hotspots.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Copyright © 2013 Springer Netherlands.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Project work submitted for the degree of Master in Informatics and Computer Engineering

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Catastrophic events, such as wars and terrorist attacks, tornadoes and hurricanes, earthquakes, tsunamis, floods and landslides, are always accompanied by a large number of casualties. The size distribution of these casualties has separately been shown to follow approximate power law (PL) distributions. In this paper, we analyze the statistical distributions of the number of victims of catastrophic phenomena, in particular terrorism, and find double PL behavior. This means that the data sets are better approximated by two PLs than by a single one. We plot the PL parameters corresponding to several events and observe an interesting pattern in the charts, where the lines that connect each pair of points defining the double PLs are almost parallel to each other. A complementary data analysis is performed by means of the computation of the entropy. The results reveal relationships hidden in the data that may trigger a future comprehensive explanation of these types of phenomena.
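One simple way to check whether two power laws describe a data set better than one is to fit straight lines on a log-log plot, since a PL y = C x^(-a) becomes log y = log C - a log x. The sketch below is an illustrative assumption rather than the authors' fitting method. It scans candidate breakpoints, fits an ordinary least-squares line to each side, and compares the total squared error with that of a single line. The synthetic points (slopes -0.5 and -2.0, break at log x = 1.5) are hypothetical.

```python
def ols(xs, ys):
    """Least-squares line through (xs, ys): returns slope, intercept, squared error."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def fit_double_pl(log_x, log_y, min_pts=3):
    """Scan breakpoints; return (total_sse, split, left_fit, right_fit) with least error."""
    best = None
    for split in range(min_pts, len(log_x) - min_pts + 1):
        left = ols(log_x[:split], log_y[:split])
        right = ols(log_x[split:], log_y[split:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, split, left, right)
    return best


# Hypothetical log-log data: slope -0.5 up to log x = 1.5, slope -2.0 beyond it.
log_x = [0.25 * i for i in range(13)]
log_y = [-0.5 * x if x <= 1.5 else -0.75 - 2.0 * (x - 1.5) for x in log_x]

one_slope, _, one_sse = ols(log_x, log_y)
two_sse, split, left_fit, right_fit = fit_double_pl(log_x, log_y)
slope_left, slope_right = left_fit[0], right_fit[0]
print(f"single PL exponent {one_slope:.2f}, squared error {one_sse:.3f}")
print(f"double PL exponents {slope_left:.2f} and {slope_right:.2f}, squared error {two_sse:.3f}")
```

With real casualty data, log_x and log_y would come from the logarithms of event size and of rank or relative frequency. A markedly smaller error for the two-segment fit is what "double PL behavior" would look like under these assumptions.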

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Managing the physical and compute infrastructure of a large data center is an embodiment of a Cyber-Physical System (CPS). The physical parameters of the data center (such as power, temperature, pressure, humidity) are tightly coupled with computations, even more so in upcoming data centers, where the location of workloads can vary substantially due, for example, to workloads being moved in a cloud infrastructure hosted in the data center. In this paper, we describe a data collection and distribution architecture that enables gathering physical parameters of a large data center at a very high temporal and spatial resolution of the sensor measurements. We think this is an important characteristic to enable more accurate heat-flow models of the data center and, with them, opportunities to optimize energy consumption. Having a high-resolution picture of the data center conditions also enables minimizing local hotspots, performing more accurate predictive maintenance (pending failures in cooling and other infrastructure equipment can be more promptly detected) and more accurate billing. We detail this architecture and define the structure of the underlying messaging system that is used to collect and distribute the data. Finally, we show the results of a preliminary study of a typical data center radio environment.