901 results for Subfractals, Subfractal Coding, Model Analysis, Digital Imaging, Pattern Recognition


Relevance:

100.00%

Abstract:

Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example, predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including MacOS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (such as predicting motif occurrence in sequences) and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. Useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++, and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot
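
The decoding task mentioned in the abstract above is commonly solved with the Viterbi algorithm. The following is a minimal Python sketch of that algorithm, not MAMOT's own implementation. The function name `viterbi`, the two-state model (`background`, `motif`) and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def _log(p):
    """Logarithm that maps probability 0 to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a sequence of observations."""
    # best[t][s] = log-probability of the best path that ends in state s at position t.
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximises the path score.
            prev, score = max(
                ((r, best[t - 1][r] + _log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + _log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: a GC-rich "motif" state inside a uniform background.
states = ["background", "motif"]
start_p = {"background": 0.9, "motif": 0.1}
trans_p = {
    "background": {"background": 0.9, "motif": 0.1},
    "motif": {"background": 0.2, "motif": 0.8},
}
emit_p = {
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "motif": {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05},
}
path = viterbi("ATGCGCGCAT", states, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow on long sequences, which is why the sketch adds log-probabilities instead of multiplying probabilities.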

Relevance:

100.00%

Abstract:

This paper presents general problems and approaches for spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most machine learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in high-dimensional geo-feature spaces, that is, when the dimension of the space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of the models concerns taking into account real-space constraints such as geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve the modelling of environmental phenomena by taking geo-manifolds into account. An important part of the study deals with the analysis of relevant variables and model inputs. This problem is approached using different nonlinear feature selection/feature extraction tools.
To demonstrate the application of machine learning algorithms, several interesting case studies are considered: digital soil mapping using SVM; automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides); assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of the spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.

Relevance:

100.00%

Abstract:

The project consists of the study and evaluation of the different alternatives available on the market in order to analyse and develop a set of components that form a framework for simplifying and speeding up the development of the presentation layer of thin-client applications for a given Framework, developed on the J2EE platform and based on the Model-View-Controller design pattern.

Relevance:

100.00%

Abstract:

We present a method for segmenting white matter tracts from high angular resolution diffusion MR images by representing the data in a 5-dimensional space of position and orientation. Whereas crossing fiber tracts cannot be separated in 3D position space, they clearly disentangle in 5D position-orientation space. The segmentation is done using a 5D level set method applied to hyper-surfaces evolving in 5D position-orientation space. In this paper we present a methodology for constructing the position-orientation space. We then show how to implement the standard level set method in such a non-Euclidean high-dimensional space. The level set theory is in principle defined for N dimensions, but there are several practical implementation details to consider, such as mean curvature. Finally, we show results from a synthetic model and a few preliminary results on real data of a human brain acquired by high angular resolution diffusion MRI.

Relevance:

100.00%

Abstract:

At CoDaWork'03 we presented work on the analysis of archaeological glass compositional data. Such data typically consist of geochemical compositions involving 10-12 variables and approximate completely compositional data if the main component, silica, is included. We suggested that what has been termed 'crude' principal component analysis (PCA) of standardized data often identified interpretable pattern in the data more readily than analyses based on log-ratio transformed data (LRA). The fundamental problem is that, in LRA, minor oxides with high relative variation, that may not be structure carrying, can dominate an analysis and obscure pattern associated with variables present at higher absolute levels. We investigate this further using sub-compositional data relating to archaeological glasses found on Israeli sites. A simple model for glass-making is that it is based on a 'recipe' consisting of two 'ingredients', sand and a source of soda. Our analysis focuses on the sub-composition of components associated with the sand source. A 'crude' PCA of standardized data shows two clear compositional groups that can be interpreted in terms of different recipes being used at different periods, reflected in absolute differences in the composition. LRA analysis can be undertaken either by normalizing the data or defining a 'residual'. In either case, after some 'tuning', these groups are recovered. The results from the normalized LRA are differently interpreted as showing that the source of sand used to make the glass differed. These results are complementary. One relates to the recipe used. The other relates to the composition (and presumed sources) of one of the ingredients. It seems to be axiomatic in some expositions of LRA that statistical analysis of compositional data should focus on relative variation via the use of ratios. Our analysis suggests that absolute differences can also be informative.
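
The contrast drawn in the abstract above, between standardized data and log-ratio transformed data, can be illustrated with a small Python sketch. It assumes the centred log-ratio (clr) transform as the log-ratio transform; the abstract does not say which transform was used. The sample values are hypothetical and are not taken from the study.

```python
import math


def variance(values):
    """Sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def standardize(column):
    """Z-score one variable: subtract the mean, divide by the standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(variance(column))
    return [(v - mean) / sd for v in column]


def clr(composition):
    """Centred log-ratio transform of one composition (all parts must be > 0)."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical three-part sub-compositions (percent): two parts present at
# higher absolute levels and one minor part with high relative variation.
samples = [
    [2.50, 0.40, 0.010],
    [2.60, 0.42, 0.030],
    [3.10, 0.55, 0.012],
    [3.20, 0.57, 0.028],
]

# Variance of each part after the clr transform (applied sample by sample).
clr_rows = [clr(row) for row in samples]
clr_var = [variance([row[j] for row in clr_rows]) for j in range(3)]

# Variance of each part after standardization (applied variable by variable).
z_var = [variance(standardize([row[j] for row in samples])) for j in range(3)]
```

In this toy data every standardized variable has unit variance, whereas after the clr transform the minor part carries the largest variance. That is the mechanism by which a minor oxide can dominate a log-ratio analysis.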

Relevance:

100.00%

Abstract:

The specimen distribution pattern of a species can be used to characterise a population of interest and also provides area-specific guidance for pest management and control. In the municipality of Dracena, in the state of São Paulo, we analysed 5,889 Lutzomyia longipalpis specimens collected from the peridomiciles of 14 houses in a sector where American visceral leishmaniasis (AVL) is transmitted to humans and dogs. The goal was to analyse the dispersion and a theoretical fitting of the species occurrence probability. From January-December 2005, samples were collected once per week using CDC light traps that operated for 12-h periods. Each collection was considered a sub-sample and was evaluated monthly. The standardised Morisita index was used as a measure of dispersion. Adherence tests were performed for the log-series distribution. The number of traps was used to adjust the octave plots. The quantity of Lu. longipalpis in the sector was highly aggregated for each month of the year, adhering to a log-series distribution for 11 of the 12 months analysed. A sex-stratified analysis demonstrated a pattern of aggregated dispersion adjusted for each month of the year. The classes and frequencies of the traps in octaves can be employed as indicators for entomological surveillance and AVL control.
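
As a rough illustration of the dispersion measure named in the abstract above, the following Python sketch computes the basic (unstandardised) Morisita index from counts per sampling unit. The standardised version used in the study additionally rescales this value using chi-square critical values, which is not implemented here. The function name `morisita_index` and the example counts are hypothetical.

```python
def morisita_index(counts):
    """Morisita's index of dispersion for counts per sampling unit (e.g. per trap).

    Values near 1 suggest a random pattern, values above 1 an aggregated
    pattern, and values below 1 a uniform pattern.
    """
    n = len(counts)
    total = sum(counts)
    if n < 2 or total < 2:
        raise ValueError("need at least two sampling units and two individuals")
    sum_sq = sum(c * c for c in counts)
    return n * (sum_sq - total) / (total * total - total)


# Hypothetical counts: all specimens in one trap versus an even spread.
aggregated = morisita_index([0, 0, 10, 0])
even = morisita_index([5, 5, 5, 5])
```

With all individuals in a single unit the index equals the number of units, its maximum; an even spread yields a value below 1.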

Relevance:

100.00%

Abstract:

Evaluation of segmentation methods is a crucial aspect of image processing, especially in the medical imaging field, where small differences between segmented regions in the anatomy can be of paramount importance. Usually, segmentation evaluation is based on a measure that depends on the number of segmented voxels inside and outside of some reference regions that are called gold standards. Although some other measures have also been used, in this work we propose a set of new similarity measures, based on different features, such as the location and intensity values of the misclassified voxels, and the connectivity and the boundaries of the segmented data. Using the multidimensional information provided by these measures, we propose a new evaluation method whose results are visualized by applying a Principal Component Analysis to the data, obtaining a simplified graphical method to compare different segmentation results. We have carried out an intensive study using several classic segmentation methods applied to a set of simulated MRI data of the brain with several noise and RF inhomogeneity levels, and also to real data, showing that the new measures proposed here and the results that we have obtained from the multidimensional evaluation improve the robustness of the evaluation and provide a better understanding of the differences between segmentation methods.
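
The conventional voxel-count evaluation that the abstract above contrasts with its new measures can be sketched in Python as follows. The abstract does not name a specific measure, so the Dice and Jaccard coefficients are assumed here as two widely used examples; this is not one of the new measures proposed in the paper. The function name and voxel coordinates are hypothetical.

```python
def overlap_measures(segmented, reference):
    """Dice and Jaccard overlap between two collections of voxel coordinates."""
    seg, ref = set(segmented), set(reference)
    if not seg and not ref:
        raise ValueError("both regions are empty")
    inter = len(seg & ref)          # voxels inside both regions
    union = len(seg | ref)          # voxels inside at least one region
    dice = 2 * inter / (len(seg) + len(ref))
    jaccard = inter / union
    return dice, jaccard


# Hypothetical regions: three voxels each, two of them shared.
segmented = [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
gold_standard = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
dice, jaccard = overlap_measures(segmented, gold_standard)
```

Both coefficients depend only on how many voxels fall inside or outside the reference region, so they ignore where the misclassified voxels are and what intensity they have.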

Relevance:

100.00%

Abstract:

BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and the number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
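
To make the idea of coding ambiguous bases with IUPAC symbols concrete, here is a minimal Python sketch. It assumes that per-base probabilities are already available and uses a simple rule: take the smallest set of most probable bases whose cumulative probability reaches a threshold. Both the rule and the threshold of 0.95 are assumptions for illustration and are not the algorithm described in the paper; only the IUPAC symbol table is standard.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def iupac_call(probs, threshold=0.95):
    """Return the IUPAC symbol for the smallest set of most probable bases
    whose cumulative probability reaches the (assumed) threshold."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    chosen, cumulative = [], 0.0
    for base in ranked:
        chosen.append(base)
        cumulative += probs[base]
        if cumulative >= threshold:
            break
    return IUPAC[frozenset(chosen)]


# Hypothetical probabilities for three sequencing cycles.
confident = iupac_call({"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01})
purine = iupac_call({"A": 0.50, "C": 0.01, "G": 0.48, "T": 0.01})
unknown = iupac_call({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})
```

A confident call stays a single base, a call split between A and G becomes `R`, and a completely uninformative call becomes `N`.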

Relevance:

100.00%

Abstract:

Canonical correspondence analysis and redundancy analysis are two methods of constrained ordination regularly used in the analysis of ecological data when several response variables (for example, species abundances) are related linearly to several explanatory variables (for example, environmental variables, spatial positions of samples). In this report I demonstrate the advantages of the fuzzy coding of explanatory variables: first, nonlinear relationships can be diagnosed; second, more variance in the responses can be explained; and third, in the presence of categorical explanatory variables (for example, years, regions) the interpretation of the resulting triplot ordination is unified because all explanatory variables are measured at a categorical level.
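
Fuzzy coding of a continuous explanatory variable, as discussed in the abstract above, can be sketched in Python with triangular membership functions. The choice of three categories, the hinge points `low`, `mid` and `high`, and the function name `fuzzy_code` are assumptions for illustration; the abstract does not specify the coding scheme.

```python
def fuzzy_code(x, low, mid, high):
    """Code a continuous value into three fuzzy categories (low, medium, high).

    Uses triangular membership functions with hinge points low < mid < high.
    The three memberships are non-negative and always sum to 1.
    """
    if not low < mid < high:
        raise ValueError("hinge points must satisfy low < mid < high")
    if x <= low:
        return (1.0, 0.0, 0.0)
    if x >= high:
        return (0.0, 0.0, 1.0)
    if x <= mid:
        w = (x - low) / (mid - low)
        return (1.0 - w, w, 0.0)
    w = (x - mid) / (high - mid)
    return (0.0, 1.0 - w, w)


# Hypothetical environmental variable with hinges at 0, 5 and 10.
codes = [fuzzy_code(x, 0.0, 5.0, 10.0) for x in (2.5, 5.0, 7.5)]
```

Replacing one continuous column with these three membership columns lets a linear ordination method pick up a nonlinear response, while the original value remains recoverable from the memberships.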

Relevance:

100.00%

Abstract:

Closely related species may be very difficult to distinguish morphologically, yet sometimes morphology is the only reasonable possibility for taxonomic classification. Here we present learning-vector-quantization artificial neural networks as a powerful tool to classify specimens on the basis of geometric morphometric shape measurements. As an example, we trained a neural network to distinguish between field and root voles from Procrustes-transformed landmark coordinates on the dorsal side of the skull, which is so similar in these two species that the human eye cannot make this distinction. Properly trained neural networks misclassified only 3% of specimens. Therefore, we conclude that the capacity of learning-vector-quantization neural networks to analyse spatial coordinates makes them a powerful tool among the range of pattern recognition procedures available for exploiting the information content of geometric morphometrics.

Relevance:

100.00%

Abstract:

This paper presents a dynamic choice model in the attribute space considering rational consumers that discount the future. In light of the evidence of several state-dependence patterns, the model is further extended by considering a utility function that allows for the different types of behavior described in the literature: pure inertia, pure variety seeking and hybrid. The model presents a stationary consumption pattern that can be inertial, where the consumer only buys one product, or a variety-seeking one, where the consumer buys several products simultaneously. Under the inverted-U marginal utility assumption, the consumer behaves inertially among the existing brands for several periods, and eventually, once the stationary levels are approached, the consumer turns to a variety-seeking behavior. An empirical analysis is run using a scanner database for fabric softener and significant evidence of hybrid behavior for most attributes is found, which supports the functional form considered in the theory.

Relevance:

100.00%

Abstract:

Given $n$ independent replicates of a jointly distributed pair $(X,Y)\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk is small. We investigate the possibility of empirically assessing the {\em complexity} of each model class, that is, the actual difficulty of the estimation problem within each class. The estimated complexities are in turn used to define an adaptive model selection procedure, which is based on complexity penalized empirical risk. The available data are divided into two parts. The first is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error. An estimate is chosen from the list of candidates in order to minimize the sum of class complexity and empirical risk. A distinguishing feature of the approach is that the complexity of each model class is assessed empirically, based on the size of its empirical cover. Finite sample performance bounds are established for the estimates, and these bounds are applied to several non-parametric estimation problems. The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand. In addition, it is shown that the estimate can be consistent, and even possess near optimal rates of convergence, when each model class has an infinite VC or pseudo dimension. For regression estimation with squared loss we modify our estimate to achieve a faster rate of convergence.

Relevance:

100.00%

Abstract:

Although increasing our knowledge of the properties of networks of cities is essential, these properties can be measured at the city level, and must be assessed by analyzing actor networks. The present volume focuses less on individual characteristics and more on the interactions of actors and institutions that create functional territories in which the structure of existing links constrains emerging links. Rather than basing explanations on external factors, the goal is to determine the extent to which network properties reflect spatial distributions and create local synergies at the meso level that are incorporated into global networks at the macro level where different geographical scales occur. The paper introduces a way to use graph structure to identify empirically relevant groups and levels that explain dynamics. It defines what could be called "multi-level", "multi-scale", or "multidimensional" networks in the context of urban geography. It explains how the convergence of the network multi-territoriality paradigm, collaboratively formulated and manipulated by geographers and computer scientists, produced the SPANGEO project, which is presented in this volume.

Relevance:

100.00%

Abstract:

The theory of small-world networks as initiated by Watts and Strogatz (1998) has yielded new insights in spatial analysis as well as systems theory. The theory's concepts and methods are particularly relevant to geography, where spatial interaction is mainstream and where interactions can be described and studied using large numbers of exchanges or similarity matrices. Networks are organized through direct links or by indirect paths, inducing topological proximities that simultaneously involve spatial, social, cultural or organizational dimensions. Network synergies build over similarities and are fed by complementarities between or inside cities, with the two effects potentially amplifying each other according to the "preferential attachment" hypothesis that has been explored in a number of different scientific fields (Barabási, Albert 1999; Barabási A-L 2002; Newman M, Watts D, Barabási A-L). In fact, according to Barabási and Albert (1999), the high level of hierarchy observed in "scale-free networks" results from "preferential attachment", which characterizes the development of networks: new connections appear preferentially close to nodes that already have the largest number of connections because in this way, the improvement in the network accessibility of the new connection will likely be greater. However, at the same time, network regions gathering dense and numerous weak links (Granovetter, 1985) or network entities acting as bridges between several components (Burt 2005) offer a higher capacity for urban communities to benefit from opportunities and create future synergies. Several methodologies have been suggested to identify such denser and more coherent regions (also called communities or clusters) in terms of links (Watts, Strogatz 1998; Watts 1999; Barabási, Albert 1999; Barabási 2002; Auber 2003; Newman 2006).
These communities not only possess a high level of dependency among their member entities but also show a low level of "vulnerability", allowing for numerous redundancies (Burt 2000; Burt 2005). The SPANGEO project 2005-2008 (SPAtial Networks in GEOgraphy), gathering a team of geographers and computer scientists, has included empirical studies to survey concepts and measures developed in other related fields, such as physics, sociology and communication science. The relevance and potential interpretation of weighted or non-weighted measures on edges and nodes were examined and analyzed at different scales (intra-urban, inter-urban or both). New classification and clustering schemes based on the relative local density of subgraphs were developed. The present article describes how these notions and methods contribute on a conceptual level, in terms of measures, delineations, explanatory analyses and visualization of geographical phenomena.
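
The preferential attachment mechanism described in the abstract above can be sketched in Python as a simple growth process. This is a simplified illustration in the spirit of the Barabási and Albert model, in which each new node adds exactly one link; it is not code from the SPANGEO project. The function names, the network size and the seed are hypothetical.

```python
import random


def preferential_attachment(n_nodes, seed=0):
    """Grow a network in which each new node links to one existing node,
    chosen with probability proportional to that node's current degree."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Every node appears in this list once per incident edge, so a uniform
    # draw from it selects nodes in proportion to their degree.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


def degrees(edges):
    """Count the number of links attached to each node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


edges = preferential_attachment(200)
degree_by_node = degrees(edges)
```

Because well-connected nodes are drawn more often, early nodes tend to accumulate links and become hubs, which is the source of the strong hierarchy the abstract attributes to scale-free networks.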

Relevance:

100.00%

Abstract:

The reliable and objective assessment of chronic disease state has been and still is a very significant challenge in clinical medicine. An essential feature of human behavior related to the health status, the functional capacity, and the quality of life is the physical activity during daily life. A common way to assess physical activity is to measure the quantity of body movement. Since human activity is controlled by various factors both extrinsic and intrinsic to the body, quantitative parameters only provide a partial assessment and do not allow for a clear distinction between normal and abnormal activity. In this paper, we propose a methodology for the analysis of human activity pattern based on the definition of different physical activity time series with the appropriate analysis methods. The temporal pattern of postures, movements, and transitions between postures was quantified using fractal analysis and symbolic dynamics statistics. The derived nonlinear metrics were able to discriminate patterns of daily activity generated from healthy and chronic pain states.