992 resultados para clustered data
Resumo:
Most of existing motorway traffic safety studies using disaggregate traffic flow data aim at developing models for identifying real-time traffic risks by comparing pre-crash and non-crash conditions. One of serious shortcomings in those studies is that non-crash conditions are arbitrarily selected and hence, not representative, i.e. selected non-crash data might not be the right data comparable with pre-crash data; the non-crash/pre-crash ratio is arbitrarily decided and neglects the abundance of non-crash over pre-crash conditions; etc. Here, we present a methodology for developing a real-time MotorwaY Traffic Risk Identification Model (MyTRIM) using individual vehicle data, meteorological data, and crash data. Non-crash data are clustered into groups called traffic regimes. Thereafter, pre-crash data are classified into regimes to match with relevant non-crash data. Among totally eight traffic regimes obtained, four highly risky regimes were identified; three regime-based Risk Identification Models (RIM) with sufficient pre-crash data were developed. MyTRIM memorizes the latest risk evolution identified by RIM to predict near future risks. Traffic practitioners can decide MyTRIM’s memory size based on the trade-off between detection and false alarm rates. Decreasing the memory size from 5 to 1 precipitates the increase of detection rate from 65.0% to 100.0% and of false alarm rate from 0.21% to 3.68%. Moreover, critical factors in differentiating pre-crash and non-crash conditions are recognized and usable for developing preventive measures. MyTRIM can be used by practitioners in real-time as an independent tool to make online decision or integrated with existing traffic management systems.
Resumo:
Spatial data are now prevalent in a wide range of fields including environmental and health science. This has led to the development of a range of approaches for analysing patterns in these data. In this paper, we compare several Bayesian hierarchical models for analysing point-based data based on the discretization of the study region, resulting in grid-based spatial data. The approaches considered include two parametric models and a semiparametric model. We highlight the methodology and computation for each approach. Two simulation studies are undertaken to compare the performance of these models for various structures of simulated point-based data which resemble environmental data. A case study of a real dataset is also conducted to demonstrate a practical application of the modelling approaches. Goodness-of-fit statistics are computed to compare estimates of the intensity functions. The deviance information criterion is also considered as an alternative model evaluation criterion. The results suggest that the adaptive Gaussian Markov random field model performs well for highly sparse point-based data where there are large variations or clustering across the space; whereas the discretized log Gaussian Cox process produces good fit in dense and clustered point-based data. One should generally consider the nature and structure of the point-based data in order to choose the appropriate method in modelling a discretized spatial point-based data.
Resumo:
A computationally efficient agglomerative clustering algorithm based on multilevel theory is presented. Here, the data set is divided randomly into a number of partitions. The samples of each such partition are clustered separately using hierarchical agglomerative clustering algorithm to form sub-clusters. These are merged at higher levels to get the final classification. This algorithm leads to the same classification as that of hierarchical agglomerative clustering algorithm when the clusters are well separated. The advantages of this algorithm are short run time and small storage requirement. It is observed that the savings, in storage space and computation time, increase nonlinearly with the sample size.
Resumo:
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
Resumo:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires which leads to delay in execution and significantly high energy consumption.In this paper, we propose a new instruction scheduling algorithm that exploits scheduling slacks of instructions and communication slacks of data values together to achieve better energy-performance trade-offs for clustered architectures with heterogeneous interconnect. Our instruction scheduling algorithm achieves 35% and 40% reduction in communication energy, whereas the overall energy-delay product improves by 4.5% and 6.5% respectively for 2 cluster and 4 cluster machines with marginal increase (1.6% and 1.1%) in execution time. Our test bed uses the Trimaran compiler infrastructure.
Resumo:
This paper describes an algorithm for constructing the solid model (boundary representation) from pout data measured from the faces of the object. The poznt data is assumed to be clustered for each face. This algorithm does not require any compuiier model of the part to exist and does not require any topological infarmation about the part to be input by the user. The property that a convex solid can be constructed uniquely from geometric input alone is utilized in the current work. Any object can be represented a5 a combznatzon of convex solids. The proposed algorithm attempts to construct convex polyhedra from the given input. The polyhedra so obtained are then checked against the input data for containment and those polyhedra, that satisfy this check, are combined (using boolean union operation) to realise the solid model. Results of implementation are presented.
Resumo:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires having high load capacitance which leads to delay in execution and significantly high energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects to improves the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains on average 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure. (C) 2012 Elsevier Inc. All rights reserved.
Resumo:
A principal hypothesis for the evolution of leks (rare and intensely competitive territorial aggregations) is that leks result from females preferring to mate with clustered males. This hypothesis predicts more female visits and higher mating success per male on larger leks. Evidence for and against this hypothesis has been presented by different studies, primarily of individual populations, but its generality has not yet been formally investigated. We took a meta-analytical approach towards formally examining the generality of such a female bias in lekking species. Using available published data and using female visits as an index of female mating bias, we estimated the shape of the relationship between lek size and total female visits to a lek, female visits per lekking male and, where available, per capita male mating success. Individual analyses showed that female visits generally increased with lek size across the majority of taxa surveyed; the meta-analysis indicated that this relationship with lek size was disproportionately positive. The findings from analysing per capita female visits were mixed, with an increase with lek size detected in half of the species, which were, however, widely distributed taxonomically. Taken together, these findings suggest that a female bias for clustered males may be a general process across lekking species. Nevertheless, the substantial variation seen in these relationships implies that other processes are also important. Analyses of per capita copulation success suggested that, more generally, increased per capita mating benefits may be an important selective factor in lek maintenance.
Resumo:
To better understand the floral diversity and the phylogeny of the Swertiinae (Gentianaceae), the internal transcribed spacer (ITS) region of nrDNA for 17 species and one outgroup was sequenced. Our data suggest that corolla type, gland shape, and corolla appendage are poorly correlated with the ITS phylogeny. The genus Swertia s.l. and the rotate group previously recognized based on the corolla types and gland shapes are polyphyletic. Four genera with simple protruding glands and three taxa with corolla appendages are not clustered as the monophyletic groups. Four separate clades, corresponding to the four sections, were identified in Swertia s.1. Lomatogoniopsis with the simple protruding gland type of the tubular group closely related to Lomatogonium of the rotate group. The deeply lobing corolla and concave foveae may be ancestral in the Swertiinae, while the tubular corolla and the protruding glands may have undergone convergent evolution.
Resumo:
We study the impact of heterogeneity of nodes, in terms of their energy, in wireless sensor networks that are hierarchically clustered. In these networks some of the nodes become cluster heads, aggregate the data of their cluster members and transmit it to the sink. We assume that a percentage of the population of sensor nodes is equipped with additional energy resources-this is a source of heterogeneity which may result from the initial setting or as the operation of the network evolves. We also assume that the sensors are randomly (uniformly) distributed and are not mobile, the coordinates of the sink and the dimensions of the sensor field are known. We show that the behavior of such sensor networks becomes very unstable once the first node dies, especially in the presence of node heterogeneity. Classical clustering protocols assume that all the nodes are equipped with the same amount of energy and as a result, they can not take full advantage of the presence of node heterogeneity. We propose SEP, a heterogeneous-aware protocol to prolong the time interval before the death of the first node (we refer to as stability period), which is crucial for many applications where the feedback from the sensor network must be reliable. SEP is based on weighted election probabilities of each node to become cluster head according to the remaining energy in each node. We show by simulation that SEP always prolongs the stability period compared to (and that the average throughput is greater than) the one obtained using current clustering protocols. We conclude by studying the sensitivity of our SEP protocol to heterogeneity parameters capturing energy imbalance in the network. We found that SEP yields longer stability region for higher values of extra energy brought by more powerful nodes.
Resumo:
Training data for supervised learning neural networks can be clustered such that the input/output pairs in each cluster are redundant. Redundant training data can adversely affect training time. In this paper we apply two clustering algorithms, ART2 -A and the Generalized Equality Classifier, to identify training data clusters and thus reduce the training data and training time. The approach is demonstrated for a high dimensional nonlinear continuous time mapping. The demonstration shows six-fold decrease in training time at little or no loss of accuracy in the handling of evaluation data.
Resumo:
In studies of radiation-induced DNA fragmentation and repair, analytical models may provide rapid and easy-to-use methods to test simple hypotheses regarding the breakage and rejoining mechanisms involved. The random breakage model, according to which lesions are distributed uniformly and independently of each other along the DNA, has been the model most used to describe spatial distribution of radiation-induced DNA damage. Recently several mechanistic approaches have been proposed that model clustered damage to DNA. In general, such approaches focus on the study of initial radiation-induced DNA damage and repair, without considering the effects of additional (unwanted and unavoidable) fragmentation that may take place during the experimental procedures. While most approaches, including measurement of total DNA mass below a specified value, allow for the occurrence of background experimental damage by means of simple subtractive procedures, a more detailed analysis of DNA fragmentation necessitates a more accurate treatment. We have developed a new, relatively simple model of DNA breakage and the resulting rejoining kinetics of broken fragments. Initial radiation-induced DNA damage is simulated using a clustered breakage approach, with three free parameters: the number of independently located clusters, each containing several DNA double-strand breaks (DSBs), the average number of DSBs within a cluster (multiplicity of the cluster), and the maximum allowed radius within which DSBs belonging to the same cluster are distributed. Random breakage is simulated as a special case of the DSB clustering procedure. When the model is applied to the analysis of DNA fragmentation as measured with pulsed-field gel electrophoresis (PFGE), the hypothesis that DSBs in proximity rejoin at a different rate from that of sparse isolated breaks can be tested, since the kinetics of rejoining of fragments of varying size may be followed by means of computer simulations. The problem of how to account for background damage from experimental handling is also carefully considered. We have shown that the conventional procedure of subtracting the background damage from the experimental data may lead to erroneous conclusions during the analysis of both initial fragmentation and DSB rejoining. Despite its relative simplicity, the method presented allows both the quantitative and qualitative description of radiation-induced DNA fragmentation and subsequent rejoining of double-stranded DNA fragments. (C) 2004 by Radiation Research Society.
Resumo:
Flooding is a major hazard in both rural and urban areas worldwide, but it is in urban areas that the impacts are most severe. An investigation of the ability of high resolution TerraSAR-X data to detect flooded regions in urban areas is described. An important application for this would be the calibration and validation of the flood extent predicted by an urban flood inundation model. To date, research on such models has been hampered by lack of suitable distributed validation data. The study uses a 3m resolution TerraSAR-X image of a 1-in-150 year flood near Tewkesbury, UK, in 2007, for which contemporaneous aerial photography exists for validation. The DLR SETES SAR simulator was used in conjunction with airborne LiDAR data to estimate regions of the TerraSAR-X image in which water would not be visible due to radar shadow or layover caused by buildings and taller vegetation, and these regions were masked out in the flood detection process. A semi-automatic algorithm for the detection of floodwater was developed, based on a hybrid approach. Flooding in rural areas adjacent to the urban areas was detected using an active contour model (snake) region-growing algorithm seeded using the un-flooded river channel network, which was applied to the TerraSAR-X image fused with the LiDAR DTM to ensure the smooth variation of heights along the reach. A simpler region-growing approach was used in the urban areas, which was initialized using knowledge of the flood waterline in the rural areas. Seed pixels having low backscatter were identified in the urban areas using supervised classification based on training areas for water taken from the rural flood, and non-water taken from the higher urban areas. Seed pixels were required to have heights less than a spatially-varying height threshold determined from nearby rural waterline heights. Seed pixels were clustered into urban flood regions based on their close proximity, rather than requiring that all pixels in the region should have low backscatter. This approach was taken because it appeared that urban water backscatter values were corrupted in some pixels, perhaps due to contributions from side-lobes of strong reflectors nearby. The TerraSAR-X urban flood extent was validated using the flood extent visible in the aerial photos. It turned out that 76% of the urban water pixels visible to TerraSAR-X were correctly detected, with an associated false positive rate of 25%. If all urban water pixels were considered, including those in shadow and layover regions, these figures fell to 58% and 19% respectively. These findings indicate that TerraSAR-X is capable of providing useful data for the calibration and validation of urban flood inundation models.
Resumo:
FoxC, FoxF, FoxL1 and FoxQ1 genes have been shown to be clustered in some animal genomes, with mesendodermal expression hypothesised as a selective force maintaining cluster integrity. Hypotheses are, however, constrained by a lack of data from the Lophotrochozoa. Here we characterise members of the FoxC, FoxF, FoxL1 and FoxQ1 families from the annelid Capitella teleta and the molluscs Lottia gigantea and Patella vulgata. We cloned FoxC, FoxF, FoxL1 and FoxQ1 genes from C. teleta, and FoxC, FoxF and FoxL1 genes from P. vulgata, and established their expression during development. We also examined their genomic organisation in C. teleta and L. gigantea, and investigated local syntenic relationships. Our results show mesodermal and anterior gut expression is a common feature of these genes in lophotrochozoans. In L. gigantea FoxC, FoxF and FoxL1 are closely linked, while in C. teleta Ct-foxC and Ct-foxL1 are closely linked, with Ct-foxF and Ct-foxQ1 on different scaffolds. Adjacent to these genes there is limited evidence of local synteny. This demonstrates conservation of genomic organisation and expression of these genes can be traced in all three bilaterian Superphyla. These data are evaluated against competing theories for the long-term maintenance of gene clusters.
Resumo:
The performance of flood inundation models is often assessed using satellite observed data; however these data have inherent uncertainty. In this study we assess the impact of this uncertainty when calibrating a flood inundation model (LISFLOOD-FP) for a flood event in December 2006 on the River Dee, North Wales, UK. The flood extent is delineated from an ERS-2 SAR image of the event using an active contour model (snake), and water levels at the flood margin calculated through intersection of the shoreline vector with LiDAR topographic data. Gauged water levels are used to create a reference water surface slope for comparison with the satellite-derived water levels. Residuals between the satellite observed data points and those from the reference line are spatially clustered into groups of similar values. We show that model calibration achieved using pattern matching of observed and predicted flood extent is negatively influenced by this spatial dependency in the data. By contrast, model calibration using water elevations produces realistic calibrated optimum friction parameters even when spatial dependency is present. To test the impact of removing spatial dependency a new method of evaluating flood inundation model performance is developed by using multiple random subsamples of the water surface elevation data points. By testing for spatial dependency using Moran’s I, multiple subsamples of water elevations that have no significant spatial dependency are selected. The model is then calibrated against these data and the results averaged. This gives a near identical result to calibration using spatially dependent data, but has the advantage of being a statistically robust assessment of model performance in which we can have more confidence. Moreover, by using the variations found in the subsamples of the observed data it is possible to assess the effects of observational uncertainty on the assessment of flooding risk.