912 results for hierarchical clustering techniques
Abstract:
Satellite remote sensing has proved to be an effective support for the timely detection and monitoring of marine oil pollution, which is mainly due to illegal ship discharges. In this context, we have developed a new methodology for optical oil spill detection that, for the first time, uses MODIS L2 and MERIS L1B top-of-atmosphere (TOA) reflectance imagery in a highly automated way. The main idea was to combine the wide swaths and short revisit times of optical sensors with SAR observations, which are generally used in oil spill monitoring. This arises from the need to overcome the reduced coverage and long revisit time of SAR over the monitoring area. This is now feasible because MODIS and MERIS have a higher spatial resolution than older sensors (250-300 m vs. 1 km), which allows the identification of smaller spills resulting from illicit discharges at sea. The procedure for obtaining identifiable spills in optical reflectance images involves three steps: removal of natural oceanic and atmospheric variability, to enhance the oil-water contrast; image clustering, whose purpose is to segment any oil spill possibly present in the image; and, finally, the application of a set of criteria to eliminate features that look like spills (look-alikes). The final result is a classification of candidate oil spill regions by means of a score based on these criteria.
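The abstract does not name the clustering algorithm or the exact variability-removal step. A minimal sketch of the first two stages, assuming that natural variability can be approximated by subtracting a local median background and that segmentation is a two-class k-means on the residual reflectance (both are assumptions; `remove_background` and `two_class_kmeans` are hypothetical names), might look like this:

```python
from statistics import median


def remove_background(img, radius=1):
    """Subtract a local median (window of (2*radius+1)^2 pixels, clipped at the
    borders) from every pixel. A crude stand-in for removing natural oceanic and
    atmospheric variability; the real procedure is not given in the abstract."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [img[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = img[r][c] - median(window)
    return out


def two_class_kmeans(values, iters=20):
    """Plain 1-D k-means with k=2 (background vs. candidate); returns (labels, centroids)."""
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
                  for v in values]
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids


# Toy 5x5 reflectance scene: a smooth gradient plus one anomalous pixel.
scene = [[0.10 + 0.01 * c for c in range(5)] for _ in range(5)]
scene[2][2] += 0.2
residual = remove_background(scene)
labels, centroids = two_class_kmeans([v for row in residual for v in row])
```

Which cluster corresponds to oil depends on the sign of the oil-water contrast in a given scene; the look-alike criteria and the scoring stage are not modeled here.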
Abstract:
Bioinformatics, in the last few decades, has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, knowing as much as possible about its coding regions becomes crucial. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called "The Bologna Annotation Resource Plus" (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences and is characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles, which can be used to calculate reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+-based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Notably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. BAR+ is currently freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.
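A minimal sketch of threshold-based, non-hierarchical clustering of an all-against-all alignment, assuming the stringent metric can be expressed as a minimum sequence identity and a minimum alignment coverage (the 40%/90% defaults are illustrative assumptions, not values stated in this abstract) and that clusters are connected components, found here with union-find. The `transferable_terms` majority rule is a hypothetical placeholder for BAR+'s statistical validation, and the term names are made up:

```python
from collections import Counter


def cluster_sequences(ids, hits, min_identity=40.0, min_coverage=90.0):
    """Connected components of the graph whose edges are alignment hits
    passing both thresholds (illustrative values, not taken from the abstract).

    hits: iterable of (query, target, percent_identity, percent_coverage)."""
    parent = {s: s for s in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for query, target, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            root_q, root_t = find(query), find(target)
            if root_q != root_t:
                parent[root_q] = root_t

    clusters = {}
    for s in ids:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())


def transferable_terms(cluster, annotations, min_fraction=0.5):
    """Hypothetical stand-in for BAR+'s statistical validation: keep the terms
    carried by at least `min_fraction` of the annotated cluster members."""
    annotated = [annotations[s] for s in cluster if s in annotations]
    if not annotated:
        return set()
    counts = Counter(term for terms in annotated for term in set(terms))
    return {t for t, n in counts.items() if n / len(annotated) >= min_fraction}


ids = ["P1", "P2", "P3", "P4"]
hits = [("P1", "P2", 55.0, 95.0), ("P2", "P3", 42.0, 91.0), ("P3", "P4", 35.0, 99.0)]
clusters = cluster_sequences(ids, hits)  # the P3-P4 hit fails the identity threshold
annotations = {"P1": {"term_A"}, "P2": {"term_A", "term_B"}, "P3": {"term_A"}}
terms = transferable_terms(clusters[0], annotations)
```

Clustering by connected components means no tree is built, which matches the "non-hierarchical" description; the unannotated member of a cluster (here none) would receive the validated terms.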
Abstract:
In recent years, Deep Learning techniques have been shown to perform well on a large variety of problems in both Computer Vision and Natural Language Processing, reaching and often surpassing the state of the art on many tasks. The rise of deep learning is also revolutionizing the entire field of Machine Learning and Pattern Recognition, pushing forward the concepts of automatic feature extraction and unsupervised learning in general. However, despite its strong success in both science and business, deep learning has its own limitations. It is often questioned whether such techniques are merely brute-force statistical approaches and whether they can only work in the context of High Performance Computing with huge amounts of data. Another important question is whether they are really biologically inspired, as claimed in certain cases, and whether they can scale well in terms of "intelligence". The dissertation attempts to answer these key questions in the context of Computer Vision and, in particular, Object Recognition, a task that has been heavily revolutionized by recent advances in the field. In practice, these answers are based on an exhaustive comparison of two very different deep learning techniques on this task: Convolutional Neural Networks (CNN) and Hierarchical Temporal Memory (HTM). They represent two different approaches and points of view within the broad umbrella of deep learning, and are well suited to pointing out the strengths and weaknesses of each. The CNN is considered one of the most classic and powerful supervised methods used today in machine learning and pattern recognition, especially in object recognition. CNNs are well received and accepted by the scientific community and are already deployed in large corporations such as Google and Facebook for solving face recognition and image auto-tagging problems.
HTM, on the other hand, is known as a new emerging paradigm and a mainly unsupervised method that is more biologically inspired. It tries to draw more insights from the computational neuroscience community in order to incorporate into the learning process concepts such as time, context and attention, which are typical of the human brain. Finally, the thesis aims to show that in certain cases, with a smaller quantity of data, HTM can outperform CNN.
Abstract:
For smart applications, nodes in wireless multimedia sensor networks (WMSNs) have to make decisions based on sensed scalar physical measurements. A routing protocol must provide multimedia delivery with quality-level support and be energy-efficient for large-scale networks. With this goal in mind, this paper proposes a smart Multi-hop hierarchical routing protocol for Efficient VIdeo communication (MEVI). MEVI combines an opportunistic scheme to create clusters, a cross-layer solution to select routes based on network conditions, and a smart solution to trigger multimedia transmission according to sensed data. Simulations were conducted to show the benefits of MEVI compared with the well-known Low-Energy Adaptive Clustering Hierarchy (LEACH) protocol. This paper includes an analysis of the signaling overhead, energy efficiency, and video quality.
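MEVI is evaluated against LEACH. For reference, LEACH's randomized cluster-head election can be sketched with its standard threshold T(n) = p / (1 - p·(r mod 1/p)), where p is the desired fraction of cluster heads and r the round number; this is the baseline's rule, not MEVI's opportunistic scheme, which the abstract does not specify. The function names are hypothetical:

```python
import random


def leach_threshold(p, r, eligible):
    """LEACH threshold T(n) = p / (1 - p * (r mod 1/p)) for eligible nodes, else 0."""
    if not eligible:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (r % period))


def elect_cluster_heads(nodes, p, r, last_head_round, rng=random):
    """Return the nodes that elect themselves cluster head in round r.

    last_head_round maps node -> last round it served as head; a node is
    eligible only if it has not been head within the last 1/p rounds."""
    period = round(1 / p)
    heads = set()
    for n in nodes:
        last = last_head_round.get(n)
        eligible = last is None or r - last >= period
        if rng.random() < leach_threshold(p, r, eligible):
            heads.add(n)
            last_head_round[n] = r
    return heads


# 100 nodes, 5% desired cluster heads: over one 20-round epoch every node serves once.
rng = random.Random(0)
last_head_round = {}
for r in range(20):
    elect_cluster_heads(range(100), 0.05, r, last_head_round, rng)
```

The threshold grows each round so that, by the last round of an epoch, every node that has not yet served is elected with probability 1, which rotates the energy-hungry head role across the network.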
Abstract:
In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, with Bayesian models in terms of fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting, and is more computationally efficient than other Bayesian approaches. One of the contributions of this work is the further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar, albeit less conclusive in comparing the approaches. The success of the spectral basis with binary data, and similar results with count data, suggest that it may be generally useful in spatial models and in more complicated hierarchical models.
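To make the spectral basis idea concrete, here is a sketch of the representation only (not the Bayesian fitting): the latent surface is a sum of sine/cosine basis functions with Gaussian coefficients whose variances follow a spectral density, and binary risk is obtained through a logistic link. The Matérn-type density, the frequency lattice and all parameter values are assumptions for illustration:

```python
import math
import random


def spectral_surface(coords, n_freq=4, range_param=0.3, nu=1.5, rng=random):
    """Random smooth surface f(s) = sum_k sqrt(S(w_k)) * (a_k cos(w_k.s) + b_k sin(w_k.s))
    with a_k, b_k ~ N(0, 1) and an unnormalised Matern-type spectral density S.
    Frequencies lie on a half-plane lattice; the zero frequency is left to the intercept."""
    freqs = [(2 * math.pi * i, 2 * math.pi * j)
             for i in range(n_freq) for j in range(-n_freq + 1, n_freq)
             if (i, j) > (0, 0)]  # i > 0, or i == 0 and j > 0
    # 2-D Matern-type spectral weights: (1 + rho^2 |w|^2)^-(nu + 1), high frequencies damped.
    weights = [(1 + range_param ** 2 * (wx * wx + wy * wy)) ** -(nu + 1)
               for wx, wy in freqs]
    coefs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in freqs]
    surface = []
    for x, y in coords:
        total = 0.0
        for (wx, wy), s, (a, b) in zip(freqs, weights, coefs):
            phase = wx * x + wy * y
            total += math.sqrt(s) * (a * math.cos(phase) + b * math.sin(phase))
        surface.append(total)
    return surface


def risk(surface, intercept):
    """Logistic link: P(y = 1 | s) = 1 / (1 + exp(-(intercept + f(s))))."""
    return [1 / (1 + math.exp(-(intercept + f))) for f in surface]


rng = random.Random(1)
coords = [(i / 10, j / 10) for i in range(10) for j in range(10)]
p = risk(spectral_surface(coords, rng=rng), intercept=-2.0)
y = [1 if rng.random() < pi else 0 for pi in p]  # simulated binary outcomes
```

Because the basis is fixed and the coefficients are independent a priori, inference reduces to learning a modest number of coefficients, which is one reason such representations can be computationally attractive.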
Abstract:
Functional neuroimaging techniques enable investigations into the neural basis of human cognition, emotions, and behaviors. In practice, applications of functional magnetic resonance imaging (fMRI) have provided novel insights into the neuropathophysiology of major psychiatric, neurological, and substance abuse disorders, as well as into the neural responses to their treatments. Modern activation studies often compare localized task-induced changes in brain activity between experimental groups. One may also extend voxel-level analyses by simultaneously considering the ensemble of voxels constituting an anatomically defined region of interest (ROI) or by considering means or quantiles of the ROI. In this work we present a Bayesian extension of voxel-level analyses that offers several notable benefits. First, it combines whole-brain voxel-by-voxel modeling and ROI analyses within a unified framework. Second, an unstructured variance/covariance for regional mean parameters allows for the study of inter-regional functional connectivity, provided enough subjects are available to allow for accurate estimation. Finally, an exchangeable correlation structure within regions allows for the consideration of intra-regional functional connectivity. We estimate our model using Markov chain Monte Carlo (MCMC) techniques implemented via Gibbs sampling, which, despite the high-throughput nature of the data, run quickly (in less than 30 minutes). We apply our Bayesian hierarchical model to two novel fMRI data sets: one considering inhibitory control in cocaine-dependent men and the second considering verbal memory in subjects at high risk for Alzheimer’s disease. The unifying hierarchical model presented in this manuscript is shown to enhance the interpretation of these data sets.
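The exchangeable within-region structure means that every pair of voxels in a region shares a single correlation ρ. A small sketch (hypothetical helper names, not the authors' sampler) builds that covariance, σ²((1 − ρ)I + ρJ), and checks its well-known eigenstructure, which is what restricts the admissible values of ρ:

```python
def exchangeable_cov(n, sigma2, rho):
    """Covariance sigma2 * ((1 - rho) * I + rho * J) for n voxels in one region."""
    return [[sigma2 * (1.0 if i == j else rho) for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def is_positive_definite_rho(n, rho):
    """The exchangeable correlation is positive definite iff -1/(n-1) < rho < 1,
    because its eigenvalues are 1 + (n-1)*rho (once) and 1 - rho (n-1 times)."""
    if n < 2:
        return True
    return -1.0 / (n - 1) < rho < 1.0


sigma = exchangeable_cov(4, sigma2=2.0, rho=0.3)
along_mean = matvec(sigma, [1, 1, 1, 1])  # eigenvalue sigma2 * (1 + 3 * rho), about 3.8
contrast = matvec(sigma, [1, -1, 0, 0])   # eigenvalue sigma2 * (1 - rho), about 1.4
```

The all-ones direction carries the shared regional signal, while any within-region contrast sees only the reduced variance σ²(1 − ρ).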
Abstract:
As the performance gap between microprocessors and memory continues to increase, main memory accesses incur long latencies that become a factor limiting system performance. Previous studies show that main memory access streams contain significant locality and that SDRAM devices provide parallelism through multiple banks and channels. This locality and parallelism have not been thoroughly exploited by conventional memory controllers. In this thesis, SDRAM address mapping techniques and memory access reordering mechanisms are studied and applied to memory controller design with the goal of reducing observed main memory access latency. The proposed bit-reversal address mapping attempts to distribute main memory accesses evenly in the SDRAM address space to enable bank parallelism. As accesses to different banks are interleaved, their latencies are partially hidden and therefore reduced. By taking cache conflict misses into account, bit-reversal address mapping is able to direct potential row conflicts to different banks, further improving performance. The proposed burst scheduling is a novel access reordering mechanism that creates bursts by clustering accesses directed to the same row of the same bank. Subject to a threshold, reads are allowed to preempt writes, and qualified writes are piggybacked at the end of the bursts. A sophisticated access scheduler selects accesses based on priorities and interleaves them to maximize SDRAM data bus utilization. Consequently, burst scheduling reduces the row conflict rate, increasing and exploiting the available row locality. Using modified SimpleScalar and M5 simulators, both techniques are evaluated and compared with existing academic and industrial solutions. With SPEC CPU2000 benchmarks, bit-reversal reduces execution time by 14% on average over traditional page interleaving address mapping.
Burst scheduling also achieves a 15% reduction in execution time over conventional bank in-order scheduling. Working together, bit-reversal and burst scheduling achieve a 19% speedup across the simulated benchmarks.
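The two mechanisms described in this abstract can be illustrated with a simplified model. The abstract does not give the exact bit layout, so the bit-reversal mapping below is a hypothetical reading (reverse the page-number bits above the column offset before extracting the bank index), and the burst former keeps only the grouping-by-row and write-piggybacking ideas; all names and field widths are assumptions:

```python
from dataclasses import dataclass


def reverse_bits(value, width):
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def page_interleave(addr, col_bits, bank_bits):
    """Conventional layout [row | bank | column]; returns (bank, row)."""
    page = addr >> col_bits
    return page & ((1 << bank_bits) - 1), page >> bank_bits


def bit_reversal(addr, col_bits, bank_bits, page_bits):
    """Hypothetical bit-reversal mapping: reverse the page-number bits before
    splitting them, so high-order address bits (which tend to distinguish
    addresses that conflict in the cache) select the bank. Returns (bank, row)."""
    page = reverse_bits(addr >> col_bits, page_bits)
    return page & ((1 << bank_bits) - 1), page >> bank_bits


@dataclass
class Access:
    tag: str
    bank: int
    row: int
    is_write: bool


def form_bursts(queue):
    """Group pending accesses by (bank, row) in order of first arrival; inside each
    burst reads go first and writes are piggybacked at the end. Read-preemption
    thresholds, priorities and cross-bank interleaving are omitted."""
    bursts = {}
    for acc in queue:
        bursts.setdefault((acc.bank, acc.row), []).append(acc)
    ordered = []
    for accesses in bursts.values():
        ordered += [a for a in accesses if not a.is_write]
        ordered += [a for a in accesses if a.is_write]
    return ordered


# 4 column bits, 2 bank bits, 6 page bits: two addresses differing only in bit 9.
conventional = [page_interleave(a, 4, 2) for a in (0, 1 << 9)]  # same bank, different rows
reversed_map = [bit_reversal(a, 4, 2, 6) for a in (0, 1 << 9)]  # different banks
queue = [Access("w1", 0, 5, True), Access("r1", 1, 2, False),
         Access("r2", 0, 5, False), Access("w2", 1, 2, True)]
schedule = [a.tag for a in form_bursts(queue)]
```

In this toy layout the two addresses collide in one bank under page interleaving (a row conflict), whereas the reversed mapping, which is still a one-to-one mapping of pages, sends them to different banks.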
Abstract:
Prevotella nigrescens, Prevotella intermedia and Porphyromonas gingivalis are oral pathogens from the family Bacteroidaceae that are regularly isolated from cases of gingivitis and periodontitis. In this study, the phylogenetic variability of these three bacterial species was investigated by comparing 16S rRNA (rrs) gene sequences of a set of epidemiologically and geographically diverse isolates. For each of the three species, the rrs gene sequences of 11 clinical isolates, as well as of the corresponding type strains, were determined. Comparison of all rrs sequences obtained with those of closely related species revealed a clear clustering of species, with little intraspecies variability but a clear difference in the rrs gene with respect to the next related taxon. The results indicate that the three species form stable, homogeneous genetic groups, which favours rrs-based species identification of these oral pathogens. This is especially useful given the 7% sequence divergence between Prevotella intermedia and Prevotella nigrescens, since phenotypic distinction between the two Prevotella species is inconsistent or involves techniques not applicable in routine identification.
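The abstract reports a clear species-level clustering of rrs sequences but not the tree-building method. As an illustration of one common choice, the sketch below computes p-distances on pre-aligned sequences and groups them with UPGMA (average-linkage hierarchical clustering); the sequences are toy fragments, not data from the study:

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def upgma(seqs):
    """Average-linkage (UPGMA) clustering of aligned sequences.
    Returns the merges as (cluster_a, cluster_b, average distance)."""
    leaf = {}
    for a, b in combinations(seqs, 2):
        leaf[a, b] = leaf[b, a] = p_distance(seqs[a], seqs[b])

    clusters = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        best = None
        # Cluster distance = mean of all pairwise leaf distances (average linkage).
        for x, y in combinations(clusters, 2):
            avg = sum(leaf[a, b] for a in x for b in y) / (len(x) * len(y))
            if best is None or avg < best[2]:
                best = (x, y, avg)
        x, y, _ = best
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
        merges.append(best)
    return merges


# Toy aligned fragments (not real rrs data): two "species" of two isolates each.
seqs = {"Pi1": "ACGTACGTAC", "Pi2": "ACGTACGTAT",
        "Pn1": "TCGAACCTAC", "Pn2": "TCGAACCTAT"}
merges = upgma(seqs)
```

Within-species pairs merge first at a distance of 0.1, and the two species join only at about 0.35, which is the kind of gap between intra- and interspecies divergence the abstract describes.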
Abstract:
OBJECTIVES In dental research, multiple-site observations within patients, or observations taken at various time intervals, are commonplace. These clustered observations are not independent, and statistical analysis should be amended accordingly. This study aimed to assess whether adjustment for clustering effects was undertaken during statistical analysis in five specialty dental journals. METHODS Thirty recent consecutive issues of Orthodontics (OJ), Periodontology (PJ), Endodontology (EJ), Maxillofacial (MJ) and Paediatric Dentistry (PDJ) journals were hand-searched. Articles requiring adjustment for clustering effects were identified, and the statistical techniques used were scrutinized. RESULTS Of 559 studies considered to have inherent clustering effects, adjustment for this was made in the statistical analysis in 223 (39.1%). Studies published in the Periodontology specialty accounted for clustering effects in the statistical analysis more often than articles published in other journals (OJ vs. PJ: OR=0.21, 95% CI: 0.12, 0.37, p<0.001; MJ vs. PJ: OR=0.02, 95% CI: 0.00, 0.07, p<0.001; PDJ vs. PJ: OR=0.14, 95% CI: 0.07, 0.28, p<0.001; EJ vs. PJ: OR=0.11, 95% CI: 0.06, 0.22, p<0.001). A positive correlation was found between increasing prevalence of clustering effects in individual specialty journals and correct statistical handling of clustering (r=0.89). CONCLUSIONS The majority of the studies examined in the five dental specialty journals (60.9%) failed to account for clustering effects in statistical analysis where indicated, raising the possibility of inappropriate decreases in p-values and the risk of incorrect inferences.
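Why ignoring clustering shrinks p-values can be shown with the Kish design effect, DEFF = 1 + (m − 1)·ICC for clusters of size m: the variance of an estimate is inflated by DEFF, so a naive analysis overstates the effective sample size. A sketch with hypothetical numbers (30 patients with 4 sites each, ICC = 0.5, naive z = 2.5):

```python
import math


def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_obs, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    return n_obs / design_effect(cluster_size, icc)


def two_sided_p(z):
    """Two-sided p-value of a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical study: 30 patients x 4 sites, ICC = 0.5, naive z = 2.5.
deff = design_effect(4, 0.5)                     # 2.5
n_eff = effective_sample_size(120, 4, 0.5)       # 48.0 instead of 120
p_naive = two_sided_p(2.5)                       # about 0.012, ignoring clustering
p_adjusted = two_sided_p(2.5 / math.sqrt(deff))  # about 0.11, SE inflated by sqrt(deff)
```

In this example the same data move from "significant" to "not significant" once the within-patient correlation is respected, which is the inferential risk the conclusions point to.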
Abstract:
AIMS: Duchenne muscular dystrophy (DMD) is a muscle disease with serious cardiac complications. Changes in Ca(2+) homeostasis and oxidative stress were recently associated with cardiac deterioration, but the cellular pathophysiological mechanisms remain elusive. We investigated whether the activity of ryanodine receptor (RyR) Ca(2+) release channels is affected, whether changes in function are cause or consequence, and which post-translational modifications drive disease progression. METHODS AND RESULTS: Electrophysiological, imaging, and biochemical techniques were used to study RyRs in cardiomyocytes from mdx mice, an animal model of DMD. Young mdx mice show no changes in cardiac performance, but such changes appear after ∼8 months. Nevertheless, myocytes from mdx pups exhibited exaggerated Ca(2+) responses to mechanical stress and 'hypersensitive' excitation-contraction coupling (ECC), hallmarks of increased RyR Ca(2+) sensitivity. Both were normalized by antioxidants and by inhibitors of NAD(P)H oxidase and CaMKII, but not by NO synthase or PKA antagonists. Sarcoplasmic reticulum Ca(2+) load and leak were unchanged in young mdx mice. However, by the age of 4-5 months and in senescence, leak was increased and load was reduced, indicating disease progression. By this age, all pharmacological interventions listed above normalized Ca(2+) signals and corrected changes in ECC, Ca(2+) load, and leak. CONCLUSION: Our findings suggest that increased RyR Ca(2+) sensitivity precedes and presumably drives the progression of dystrophic cardiomyopathy, with oxidative stress initiating its development. RyR oxidation followed by phosphorylation, first by CaMKII and later by PKA, synergistically contributes to cardiac deterioration.
Abstract:
We consider the problem of fitting a union of subspaces to a collection of data points drawn from one or more subspaces and corrupted by noise and/or gross errors. We pose this problem as a non-convex optimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean and self-expressive dictionary plus a matrix of noise and/or gross errors. By self-expressive we mean a dictionary whose atoms can be expressed as linear combinations of themselves with low-rank coefficients. In the case of noisy data, our key contribution is to show that this non-convex matrix decomposition problem can be solved in closed form from the SVD of the noisy data matrix. The solution involves a novel polynomial thresholding operator on the singular values of the data matrix, which requires minimal shrinkage. For one subspace, a particular case of our framework leads to classical PCA, which requires no shrinkage. For multiple subspaces, the low-rank coefficients obtained by our framework can be used to construct a data affinity matrix from which the clustering of the data according to the subspaces can be obtained by spectral clustering. In the case of data corrupted by gross errors, we solve the problem using an alternating minimization approach, which combines our polynomial thresholding operator with the more traditional shrinkage-thresholding operator. Experiments on motion segmentation and face clustering show that our framework performs on par with state-of-the-art techniques at a reduced computational cost.
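The final clustering step, spectral clustering of an affinity matrix, can be sketched for the two-group case. This assumes the affinity W (for instance built from the absolute values of the low-rank coefficients) is already available, symmetric, nonnegative and free of isolated points; it uses power iteration on the normalized affinity and splits points by the sign of the second eigenvector, a simplification of general k-way spectral clustering rather than the authors' implementation:

```python
import math


def two_way_spectral_split(W, iters=500):
    """Split points into two groups by the sign of the second leading eigenvector
    of the normalised affinity M = D^-1/2 W D^-1/2 (D = diagonal degree matrix)."""
    n = len(W)
    deg = [sum(row) for row in W]
    inv_sqrt = [1 / math.sqrt(d) for d in deg]
    M = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    # The leading eigenvector of M (eigenvalue 1) is proportional to sqrt(degree).
    top = [math.sqrt(d) for d in deg]
    norm = math.sqrt(sum(t * t for t in top))
    top = [t / norm for t in top]
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        # Iterate with M + I so every eigenvalue is non-negative ...
        v = [sum(M[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        # ... and project out `top` so the iteration converges to the second eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    return [0 if a >= 0 else 1 for a in v]


# Two groups of three points: strong within-group, weak cross-group affinity.
W = [[1.0 if (i < 3) == (j < 3) else 0.05 for j in range(6)] for i in range(6)]
labels = two_way_spectral_split(W)
```

Power iteration keeps the sketch dependency-free; it converges well only when the second eigenvalue is clearly separated from the rest, as it is when the subspaces yield a nearly block-diagonal affinity.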
Abstract:
Randomised controlled trials (RCTs) of psychotherapeutic interventions assume that treatments use specific techniques that are responsible for changes in the client's symptoms. This assumption also holds true for meta-analyses, where evidence for specific interventions and techniques is compiled. However, it has also been argued that different treatments share important techniques and that an emerging consensus about useful treatment strategies is leading to greater integration of treatments. This makes assumptions about the effectiveness of the specific ingredients of interventions questionable if shared (common) techniques are used more often in interventions than unique techniques. This study investigated the use of unique and shared techniques in RCTs of cognitive-behavioural therapy (CBT) and short-term psychodynamic psychotherapy (STPP). Psychotherapeutic techniques were coded from 42 masked treatment descriptions of RCTs in the field of depression (1979-2010). CBT techniques were often used in studies identified as either CBT or STPP. However, STPP techniques were only used in STPP-identified studies. Empirical clustering of treatment descriptions did not confirm the original distinction between CBT and STPP, but instead showed substantial heterogeneity within both approaches. Extraction of psychotherapeutic techniques from treatment descriptions is feasible and could be used as a content-based approach to classify treatments in systematic reviews and meta-analyses.
Abstract:
This study aims to assess the skill of several climate field reconstruction (CFR) techniques in reconstructing past precipitation over continental Europe and the Mediterranean at seasonal time scales over the last two millennia from proxy records. A number of pseudoproxy experiments are performed within the virtual reality of a regional paleoclimate simulation at 45 km resolution to analyse different aspects of reconstruction skill. Canonical Correlation Analysis (CCA), two versions of an Analog Method (AM) and Bayesian hierarchical modeling (BHM) are applied to reconstruct precipitation from a synthetic network of pseudoproxies that are contaminated with various types of noise. The skill of the derived reconstructions is assessed through comparison with precipitation simulated by the regional climate model. Unlike BHM, CCA systematically underestimates the variance. The AM can be adjusted to overcome this shortcoming, presenting an intermediate behaviour between the two aforementioned techniques. However, a trade-off between reconstruction-target correlations and reconstructed variance is the drawback of all CFR techniques. CCA (BHM) presents the largest (lowest) skill in preserving the temporal evolution, whereas the AM can be tuned to reproduce better correlation at the expense of losing variance. BHM has been shown to perform well for temperature, but it relies heavily on prescribed spatial correlation lengths; this assumption is valid for temperature but hardly warranted for precipitation. In general, none of the methods outperforms the others. All experiments agree that a dense and regularly distributed proxy network is required to reconstruct precipitation accurately, reflecting its high spatial and temporal variability. This is especially true in summer, when localised convective precipitation events cause a particularly short decorrelation distance from the proxy location.
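One common form of the Analog Method picks, for each target period, the simulated state whose proxy values are closest to the observed proxies and uses its precipitation field as the reconstruction. The abstract does not describe its two AM versions, so this sketch is an assumption (the function name is hypothetical); averaging over k > 1 analogs is shown as one plausible tuning knob, since averaging smooths the reconstructed field and thus reduces its variance:

```python
import math


def analog_reconstruction(target_proxies, pool_proxies, pool_fields, k=1):
    """Reconstruct a field by averaging the fields of the k pool states whose
    proxy vectors are closest (Euclidean distance) to the target proxies."""
    order = sorted(range(len(pool_proxies)),
                   key=lambda i: math.dist(target_proxies, pool_proxies[i]))
    chosen = order[:k]
    n_grid = len(pool_fields[0])
    return [sum(pool_fields[i][g] for i in chosen) / k for g in range(n_grid)]


pool_proxies = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]       # proxy values of simulated states
pool_fields = [[10.0, 20.0], [12.0, 22.0], [30.0, 40.0]]  # precipitation at 2 grid cells
single = analog_reconstruction([0.9, 1.2], pool_proxies, pool_fields, k=1)
averaged = analog_reconstruction([0.9, 1.2], pool_proxies, pool_fields, k=2)
```

With k = 1 the reconstruction is always a physically consistent simulated field and keeps its full variance; larger k blends several analogs, which can raise correlation with the target at the cost of damped variance.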
Abstract:
Point Distribution Models (PDM) are among the most popular shape description techniques, and their usefulness has been demonstrated in a wide variety of medical imaging applications. However, to adequately characterize the underlying modeled population it is essential to have a representative number of training samples, which is not always possible. This problem is especially relevant as the complexity of the modeled structure increases, the modeling of ensembles of multiple 3D organs being one of the most challenging cases. In this paper, we introduce a new GEneralized Multi-resolution PDM (GEM-PDM) in the context of multi-organ analysis that is able to efficiently characterize the different inter-object relations, as well as the particular locality of each object separately. Importantly, unlike previous approaches, the configuration of the algorithm is automated thanks to a new agglomerative landmark clustering method proposed here, which also allows us to identify smaller anatomically significant regions within organs. The significant advantage of the GEM-PDM method over two previous approaches (PDM and hierarchical PDM) in terms of shape modeling accuracy and robustness to noise has been verified for two different databases of sets of multiple organs: six subcortical brain structures and seven abdominal organs. Finally, we propose the integration of the new shape modeling framework into an active shape-model-based segmentation algorithm. The resulting algorithm, named GEMA, provides better overall performance than the two classical approaches tested, ASM and hierarchical ASM, when applied to the segmentation of 3D brain MRI.
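For reference, the classical PDM that GEM-PDM generalizes can be written as follows (standard formulation, not the GEM-PDM equations themselves), for N aligned training shapes with L landmarks in 3D, each stacked into a vector x_i:

```latex
\begin{aligned}
\bar{\mathbf{x}} &= \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i,
\qquad \mathbf{x}_i \in \mathbb{R}^{3L},\\
\mathbf{S} &= \frac{1}{N-1}\sum_{i=1}^{N}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^{\top},
\qquad \mathbf{S}\boldsymbol{\phi}_k = \lambda_k\boldsymbol{\phi}_k,\\
\mathbf{x} &\approx \bar{\mathbf{x}} + \boldsymbol{\Phi}\mathbf{b},
\qquad \mathbf{b} = \boldsymbol{\Phi}^{\top}(\mathbf{x}-\bar{\mathbf{x}}),
\qquad |b_k| \le 3\sqrt{\lambda_k}.
\end{aligned}
```

Here Φ = [φ_1 … φ_t] holds the t leading eigenvectors, and the bound of three standard deviations per mode is a common convention. Because S has rank at most N − 1, a PDM trained on N shapes can represent at most N − 1 modes of variation, which is why small training sets limit how well complex multi-organ ensembles can be characterized.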
Abstract:
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a noninvasive technique for quantitative assessment of the integrity of the blood-brain barrier (BBB) and blood-spinal cord barrier (BSCB) in the presence of central nervous system pathologies. However, the results of DCE-MRI show substantial variability. This high variability can be caused by a number of factors, including inaccurate T1 estimation, insufficient temporal resolution and poor contrast-to-noise ratio. The goal of my thesis work is to develop improved methods that reduce the variability of DCE-MRI results. To obtain fast and accurate T1 maps, the Look-Locker acquisition technique was implemented with a novel, truly centric k-space segmentation scheme. In addition, an original multi-step curve fitting procedure was developed to increase the accuracy of T1 estimation. A view-sharing acquisition method was implemented to increase temporal resolution, and a novel normalization method was introduced to reduce image artifacts. Finally, a new clustering algorithm was developed to reduce apparent noise in the DCE-MRI data. The performance of these proposed methods was verified by simulations and phantom studies. As part of this work, the proposed techniques were applied to an in vivo DCE-MRI study of experimental spinal cord injury (SCI). These methods have shown robust results and allow quantitative assessment of regions with very low vascular permeability. In conclusion, the improved DCE-MRI acquisition and analysis methods developed in this thesis can improve the accuracy of DCE-MRI results.
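The abstract does not give its multi-step fitting procedure. For orientation, here is a sketch of the standard three-parameter Look-Locker fit such procedures build on, S(t) = A − B·exp(−t/T1*), with the usual correction T1 = T1*·(B/A − 1). The grid-search strategy, the function name and all parameter values are assumptions:

```python
import math


def fit_look_locker(times, signal, t1_star_grid):
    """Fit S(t) = A - B * exp(-t / T1*) by grid search over T1*, solving for A and B
    with closed-form linear least squares at each grid point.
    Returns (A, B, T1*, T1) with the Look-Locker correction T1 = T1* * (B / A - 1)."""
    n = len(times)
    sy = sum(signal)
    best = None
    for t1s in t1_star_grid:
        e = [math.exp(-t / t1s) for t in times]
        se = sum(e)
        see = sum(x * x for x in e)
        sey = sum(x * y for x, y in zip(e, signal))
        det = n * see - se * se
        if det == 0:
            continue
        a = (see * sy - se * sey) / det  # intercept A
        b = (se * sy - n * sey) / det    # B (the fitted slope on exp(-t/T1*) is -B)
        rss = sum((y - (a - b * x)) ** 2 for x, y in zip(e, signal))
        if best is None or rss < best[0]:
            best = (rss, a, b, t1s)
    _, a, b, t1s = best
    return a, b, t1s, t1s * (b / a - 1)


# Synthetic noise-free recovery curve (times in ms): A = 1, B = 2.5, T1* = 800 ms.
times = [100 * k for k in range(1, 31)]
signal = [1.0 - 2.5 * math.exp(-t / 800) for t in times]
A, B, t1_star, t1 = fit_look_locker(times, signal, range(500, 1101, 10))
```

Separating the one nonlinear parameter (T1*) from the two linear ones keeps the fit robust without an iterative optimizer; with these synthetic values the correction gives T1 = 800 × (2.5 − 1) = 1200 ms.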