854 results for data gathering algorithm
Abstract:
Self-organizing maps (Kohonen 1997) are a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm uses the Euclidean metric in the adaptation of the model vectors, which in theory renders the whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem: 1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data. 2. Whether a modification of the conventional approach, replacing vectorial sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering), can converge to an adequate solution. Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data are pathological. Moreover, the conventional approach converges faster to a solution when the data are "well-behaved". Key words: Self Organizing Map; Artificial Neural Networks; Compositional Data
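The abstract names the two simplex operators that replace vector addition and scalar multiplication but does not spell them out. A minimal sketch of perturbation, powering and a simplex-style SOM adaptation step (the function names and the specific update form are our own illustration, not the authors' code) could look like:

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so its parts sum to 1 (the simplex representation)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturbation(x, y):
    """Simplex analogue of vector addition: component-wise product, re-closed."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def powering(x, alpha):
    """Simplex analogue of scalar multiplication: component-wise power, re-closed."""
    return closure(np.asarray(x, dtype=float) ** alpha)

def som_update_simplex(model_vector, sample, learning_rate):
    """One SOM adaptation step written with simplex operators instead of the
    Euclidean sum and scaling: the model vector is moved towards the sample by
    perturbing it with the powered 'difference' (sample perturbed by the
    inverse of the model vector).  Illustrative only."""
    difference = perturbation(sample, powering(model_vector, -1.0))
    return perturbation(model_vector, powering(difference, learning_rate))
```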
Abstract:
The multiscale finite volume (MsFV) method has been developed to efficiently solve large heterogeneous problems (elliptic or parabolic); it is usually employed for pressure equations and delivers conservative flux fields to be used in transport problems. The method essentially relies on the hypothesis that the (fine-scale) problem can be reasonably described by a set of local solutions coupled by a conservative global (coarse-scale) problem. In most cases, the boundary conditions assigned for the local problems are satisfactory and the approximate conservative fluxes provided by the method are accurate. In numerically challenging cases, however, a more accurate localization is required to obtain a good approximation of the fine-scale solution. In this paper we develop a procedure to iteratively improve the boundary conditions of the local problems. The algorithm relies on the data structure of the MsFV method and employs a Krylov-subspace projection method to obtain an unconditionally stable scheme and accelerate convergence. Two variants are considered: in the first, only the MsFV operator is used; in the second, the MsFV operator is combined in a two-step method with an operator derived from the problem solved to construct the conservative flux field. The resulting iterative MsFV algorithms allow arbitrary reduction of the solution error without compromising the construction of a conservative flux field, which is guaranteed at any iteration. Since it converges to the exact solution, the method can be regarded as a linear solver. In this context, the schemes proposed here can be viewed as preconditioned versions of the Generalized Minimal Residual method (GMRES), with the peculiar characteristic that the residual on the coarse grid is zero at every iteration (thus conservative fluxes can be obtained).
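The MsFV operator itself is problem-specific and not reproduced in the abstract. As a rough illustration of the general idea of using a multiscale (coarse-grid) operator as a preconditioner inside GMRES, the sketch below builds an additive two-level preconditioner (coarse correction plus Jacobi smoothing) for a small model problem; the 1-D Laplacian test matrix, the block size and the additive combination are assumptions made purely for illustration.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres

# Small 1-D Poisson-type system standing in for a fine-scale pressure equation.
n, block = 64, 8
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.random.default_rng(0).random(n)

# Piecewise-constant restriction onto a coarse grid of n // block cells.
R = np.zeros((n // block, n))
for i in range(n // block):
    R[i, i * block:(i + 1) * block] = 1.0
A_coarse = R @ A.toarray() @ R.T

def two_level(r):
    """Additive two-level preconditioner: coarse-grid correction plus a Jacobi
    smoother on the fine grid -- a crude stand-in for an MsFV-like operator."""
    coarse = R.T @ np.linalg.solve(A_coarse, R @ r)
    fine = r / A.diagonal()
    return coarse + fine

M = LinearOperator((n, n), matvec=two_level)
x, info = gmres(A, b, M=M)   # info == 0 signals convergence
```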
Abstract:
This paper proposes a parallel architecture for estimating the motion of an underwater robot. It is well known that image processing requires a huge amount of computation, mainly at the low level, where the algorithms deal with large volumes of data. In a motion estimation algorithm, correspondences between two images have to be solved at the low level. In underwater imaging, normalised correlation can be a solution in the presence of non-uniform illumination. Due to its regular processing scheme, a parallel implementation of the correspondence problem is an adequate approach to reduce computation time. Taking into consideration the complexity of the normalised correlation criterion, a new approach based on the parallel organisation of the processors in the architecture is proposed.
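The abstract does not give the correlation criterion in closed form. A minimal serial sketch of normalised-correlation block matching, i.e. the per-block kernel that such an architecture would execute in parallel (the block size, search range and the zero-mean variant of the criterion are our assumptions), might be:

```python
import numpy as np

def normalised_correlation(patch, candidate):
    """Zero-mean normalised correlation between two equally sized image patches;
    robust to uniform illumination changes."""
    p = patch - patch.mean()
    c = candidate - candidate.mean()
    denom = np.sqrt((p ** 2).sum() * (c ** 2).sum())
    return (p * c).sum() / denom if denom > 0 else 0.0

def match_block(prev_img, next_img, top, left, size, search):
    """Find, in next_img, the displacement of the size x size block located at
    (top, left) in prev_img by exhaustively scoring a (2*search+1)^2 window."""
    block = prev_img[top:top + size, left:left + size]
    best_score, best_disp = -1.0, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and y + size <= next_img.shape[0] and 0 <= x and x + size <= next_img.shape[1]:
                score = normalised_correlation(block, next_img[y:y + size, x:x + size])
                if score > best_score:
                    best_score, best_disp = score, (dy, dx)
    return best_disp, best_score
```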
Abstract:
Our purpose is to provide a set-theoretical framework for clustering fuzzy relational data based on the cardinality of the fuzzy subsets that represent objects and their complements, without applying any crisp property. From this perspective we define a family of fuzzy similarity indexes, which includes a set of fuzzy indexes introduced by Tolias et al., and we analyze under which conditions a fuzzy proximity relation is defined. Following an original idea due to S. Miyamoto, we evaluate the similarity between objects and features by means of the same mathematical procedure. Joining these concepts and methods, we establish an algorithm for clustering fuzzy relational data. Finally, we present an example to clarify the whole process.
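The family of similarity indexes defined in the paper is not reproduced in the abstract. As an illustration of a cardinality-based fuzzy similarity index, the sketch below uses sigma-count cardinalities in a fuzzy Jaccard-type coefficient; this is only one plausible member of such a family, not the authors' definition.

```python
import numpy as np

def fuzzy_cardinality(mu):
    """Sigma-count cardinality of a fuzzy subset, given its membership vector."""
    return float(np.sum(mu))

def fuzzy_similarity(mu_a, mu_b):
    """Cardinality-based similarity of two fuzzy subsets over the same universe:
    |A intersect B| / |A union B|, with min/max as intersection/union."""
    inter = fuzzy_cardinality(np.minimum(mu_a, mu_b))
    union = fuzzy_cardinality(np.maximum(mu_a, mu_b))
    return inter / union if union > 0 else 1.0
```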
Abstract:
Given the very large amount of data obtained every day through population surveys, much new research could use this information instead of collecting new samples. Unfortunately, the relevant data are often scattered across different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which is quite small. We propose a model-based procedure combining a logistic regression with an Expectation-Maximization algorithm. Results show that despite the lack of data, this procedure can perform better than standard matching procedures.
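The article's exact model is not given in the abstract. The sketch below shows one common way to couple a logistic regression with an EM loop when the variable to be fused is binary, observed only in the small file and missing in the large one; the function name, the binary assumption and the fixed number of iterations are ours, not the authors'.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def em_fuse(X_small, y_small, X_large, n_iter=20):
    """EM-style fusion sketch: y is observed only in the small file and treated
    as missing in the large one.  E-step imputes soft labels for the large file;
    M-step refits a weighted logistic regression on both files."""
    model = LogisticRegression().fit(X_small, y_small)
    for _ in range(n_iter):
        p = model.predict_proba(X_large)[:, 1]           # E-step: P(y = 1 | x)
        X = np.vstack([X_small, X_large, X_large])        # M-step: pooled data with
        y = np.concatenate([y_small,                       # fractionally weighted labels
                            np.ones(len(X_large)),
                            np.zeros(len(X_large))])
        w = np.concatenate([np.ones(len(y_small)), p, 1.0 - p])
        model = LogisticRegression().fit(X, y, sample_weight=w)
    return model
```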
Abstract:
Retinoblastoma (Rb) is a tumour arising from the retinal progenitor cells of the photoreceptors. It is the most frequent malignant paediatric tumour, with a birth incidence estimated between 1/15,000 and 1/20,000. The vast majority of children with Rb are diagnosed before the age of 4 years, i.e. the time needed for the differentiation and maturation of the photoreceptors and therefore for the disappearance of the cell of origin of Rb. Patient survival, ocular salvage and visual prognosis remain excellent provided that treatment is not delayed. In its non-hereditary form (60%), Rb is always unilateral and sporadic. Hereditary Rb, with autosomal dominant transmission (40%), occurs in all forms, familial (10%) or sporadic (30%), whether the involvement is unilateral or bilateral. Most causal mutations are unique and randomly distributed over the whole RB1 gene, with no predisposing region. The detection of these mutations is costly and time-consuming, while offering a relatively low detection rate, especially in sporadic unilateral Rb cases. In order to identify patients at real risk of developing Rb, and to reduce the number of examinations under anaesthesia required to screen for the disease in at-risk individuals, we developed a sensitive, fast, efficient and inexpensive strategy based on intragenic haplotype analysis. This algorithm takes into account (a) intratumoral loss of heterozygosity of the RB1 gene, (b) the preferential paternal origin of new germline mutations, and (c) an a priori risk derived from Vogel's empirical data. From January 1994 to December 2006, we compared the occurrence of new Rb cases among the siblings and offspring of affected patients with the number of new cases expected according to our algorithm. 134 families were studied. Molecular analysis was performed in 570 individuals, including 99 patients younger than 4 years and therefore at risk of developing Rb. In this cohort we observed one new case of Rb, whereas the cumulated a posteriori risk calculated by our algorithm predicted 1.77 new cases. In this study we were able to validate our algorithm predicting the recurrence of Rb in first-degree relatives of affected patients. This tool should greatly facilitate genetic counselling as well as the follow-up of patients at risk of developing Rb, especially when direct sequencing of the RB1 gene is unavailable or uninformative. - Purpose: Most RB1 mutations are unique and distributed throughout the RB1 gene. Their detection can be time-consuming and the yield especially low in cases of conservatively-treated sporadic unilateral retinoblastoma (Rb) patients. In order to identify patients with true risk of developing Rb, and to reduce the number of unnecessary examinations under anesthesia in all other cases, we developed a universal sensitive, efficient and cost-effective strategy based on intragenic haplotype analysis. Methods: This algorithm allows the calculation of the a posteriori risk of developing Rb and takes into account (a) RB1 loss of heterozygosity in tumors, (b) preferential paternal origin of new germline mutations, (c) a priori risk derived from empirical data by Vogel, and (d) disease penetrance of 90% in most cases.
We report the occurrence of Rb in first-degree relatives of patients with sporadic Rb who visited the Jules Gonin Eye Hospital, Lausanne, Switzerland, from January 1994 to December 2006, compared to the number of new Rb cases expected according to our algorithm. Results: A total of 134 families with sporadic Rb were enrolled; testing was performed in 570 individuals, and 99 patients younger than 4 years old were identified. We observed one new case of Rb. Using our algorithm, the cumulated total a posteriori risk of recurrence was 1.77. Conclusions: This is the first time that linkage analysis has been validated to monitor the risk of recurrence in sporadic Rb. This should be a useful tool in genetic counseling, especially when direct RB1 mutation screening yields a negative result or is unavailable.
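The abstract lists the ingredients of the risk calculation (intragenic haplotype evidence, paternal origin, Vogel's a priori risk, 90% penetrance) and the cumulated figure of 1.77 expected cases, but not the formula itself. The toy sketch below only illustrates how per-child a posteriori risks could be composed and summed into an expected number of new cases; the multiplicative form and the haplotype factor are placeholders, not the published algorithm.

```python
def posterior_risk(prior_mutation_risk, haplotype_factor, penetrance=0.90):
    """Toy composition of the listed ingredients: an a priori germline mutation
    risk (Vogel's empirical data), a factor summarising the intragenic haplotype
    evidence (LOH, parental origin of the mutant allele), and 90% penetrance.
    Placeholder logic, not the published algorithm."""
    return prior_mutation_risk * haplotype_factor * penetrance

def expected_new_cases(per_child_risks):
    """Expected number of new Rb cases in a cohort of at-risk children: the sum
    of their individual a posteriori risks (the study reports 1.77 over 99 children)."""
    return sum(per_child_risks)
```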
Abstract:
The 2008 Data Fusion Contest organized by the IEEE Geoscience and Remote Sensing Data Fusion Technical Committee deals with the classification of high-resolution hyperspectral data from an urban area. Unlike in the previous issues of the contest, the goal was not only to identify the best algorithm but also to provide a collaborative effort: The decision fusion of the best individual algorithms was aiming at further improving the classification performances, and the best algorithms were ranked according to their relative contribution to the decision fusion. This paper presents the five awarded algorithms and the conclusions of the contest, stressing the importance of decision fusion, dimension reduction, and supervised classification methods, such as neural networks and support vector machines.
Abstract:
BACKGROUND & AIMS Hy's Law, which states that hepatocellular drug-induced liver injury (DILI) with jaundice indicates a serious reaction, is used widely to determine risk for acute liver failure (ALF). We aimed to optimize the definition of Hy's Law and to develop a model for predicting ALF in patients with DILI. METHODS We collected data from 771 patients with DILI (805 episodes) from the Spanish DILI registry, from April 1994 through August 2012. We analyzed data collected at DILI recognition and at the time of peak levels of alanine aminotransferase (ALT) and total bilirubin (TBL). RESULTS Of the 771 patients with DILI, 32 developed ALF. Hepatocellular injury, female sex, high levels of TBL, and a high ratio of aspartate aminotransferase (AST):ALT were independent risk factors for ALF. We compared 3 ways to use Hy's Law to predict which patients would develop ALF; all included TBL greater than 2-fold the upper limit of normal (× ULN) and either ALT level greater than 3 × ULN, a ratio (R) value (ALT × ULN divided by alkaline phosphatase × ULN) of 5 or greater, or a new ratio (nR) value (ALT or AST, whichever produced the higher × ULN value, divided by alkaline phosphatase × ULN) of 5 or greater. At recognition of DILI, the R- and nR-based models identified patients who developed ALF with 67% and 63% specificity, respectively, whereas use of only ALT level identified them with 44% specificity. However, the level of ALT and the nR model each identified patients who developed ALF with 90% sensitivity, whereas the R criterion identified them with 83% sensitivity. An equal number of patients who did and did not develop ALF had alkaline phosphatase levels greater than 2 × ULN. An algorithm based on AST level greater than 17.3 × ULN, TBL greater than 6.6 × ULN, and AST:ALT greater than 1.5 identified patients who developed ALF with 82% specificity and 80% sensitivity. CONCLUSIONS When applied at DILI recognition, the nR criterion for Hy's Law provides the best balance of sensitivity and specificity, whereas our new composite algorithm provides additional specificity in predicting the ultimate development of ALF.
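The thresholds quoted in the abstract translate directly into simple decision rules. The sketch below encodes the R and nR criteria and the composite algorithm; the variable names, and the assumption that laboratory values are supplied as multiples of their upper limit of normal (× ULN), are ours.

```python
def r_value(alt_xuln, alp_xuln):
    """R value: ALT (in multiples of ULN) divided by alkaline phosphatase (xULN)."""
    return alt_xuln / alp_xuln

def nr_value(alt_xuln, ast_xuln, alp_xuln):
    """nR value: whichever of ALT or AST gives the larger xULN, divided by ALP xULN."""
    return max(alt_xuln, ast_xuln) / alp_xuln

def nr_hys_law(tbl_xuln, alt_xuln, ast_xuln, alp_xuln):
    """nR-based Hy's Law criterion from the abstract: TBL > 2 xULN and nR >= 5."""
    return tbl_xuln > 2 and nr_value(alt_xuln, ast_xuln, alp_xuln) >= 5

def composite_alf_rule(ast_xuln, tbl_xuln, ast_alt_ratio):
    """Composite algorithm from the abstract: AST > 17.3 xULN, TBL > 6.6 xULN and
    AST:ALT > 1.5 (reported 82% specificity, 80% sensitivity for ALF)."""
    return ast_xuln > 17.3 and tbl_xuln > 6.6 and ast_alt_ratio > 1.5
```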
Abstract:
Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as "functional data" and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First of all, we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory: clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for differences in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithms. The quality of the obtained results is assessed by means of several indices.
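The two metrics are described only qualitatively in the abstract. One plausible construction, sketched below, works on the centred log-ratio (clr) transform of the smoothed trajectories; both the use of clr coordinates and the specific "remove the mean level" definition of the shape-only metric are our assumptions, not the thesis's definitions.

```python
import numpy as np

def clr(composition):
    """Centred log-ratio transform of a composition (strictly positive parts)."""
    logx = np.log(np.asarray(composition, dtype=float))
    return logx - logx.mean()

def level_and_shape_distance(traj_a, traj_b):
    """Distance between two smoothed compositional trajectories (rows = time
    points) accounting for both level and shape: root mean squared clr distance."""
    A = np.apply_along_axis(clr, 1, traj_a)
    B = np.apply_along_axis(clr, 1, traj_b)
    return np.sqrt(np.mean(np.sum((A - B) ** 2, axis=1)))

def shape_only_distance(traj_a, traj_b):
    """Shape-only variant: the same distance after removing each trajectory's
    mean clr level, so two parallel trajectories are at distance zero."""
    A = np.apply_along_axis(clr, 1, traj_a)
    B = np.apply_along_axis(clr, 1, traj_b)
    return np.sqrt(np.mean(np.sum(((A - A.mean(axis=0)) - (B - B.mean(axis=0))) ** 2, axis=1)))
```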
Abstract:
Objective: The Agency for Healthcare Research and Quality (AHRQ) developed Patient Safety Indicators (PSIs) for use with ICD-9-CM data. Many countries have adopted ICD-10 for coding hospital diagnoses. We conducted this study to develop an internationally harmonized ICD-10 coding algorithm for the AHRQ PSIs. Methods: The AHRQ PSI Version 2.1 has been translated into ICD-10-AM (Australian Modification), and PSI Version 3.0a has been independently translated into ICD-10-GM (German Modification). We converted these two country-specific coding algorithms into ICD-10-WHO (World Health Organization version) and combined them to form one master list. Members of an international expert panel-including physicians, professional medical coders, disease classification specialists, health services researchers, epidemiologists, and users of the PSI-independently evaluated this master list and rated each code as either "include," "exclude," or "uncertain," following the AHRQ PSI definitions. After summarizing the independent rating results, we held a face-to-face meeting to discuss codes for which there was no unanimous consensus and newly proposed codes. A modified Delphi method was employed to generate a final ICD-10 WHO coding list. Results: Of 20 PSIs, 15 that were based mainly on diagnosis codes were selected for translation. At the meeting, panelists discussed 794 codes for which consensus had not been achieved and 2,541 additional codes that were proposed by individual panelists for consideration prior to the meeting. Three documents were generated: a PSI ICD-10-WHO version-coding list, a list of issues for consideration on certain AHRQ PSIs and ICD-9-CM codes, and a recommendation to WHO to improve specification of some disease classifications. Conclusion: An ICD-10-WHO PSI coding list has been developed and structured in a manner similar to the AHRQ manual. Although face validity of the list has been ensured through a rigorous expert panel assessment, its true validity and applicability should be assessed internationally.
Abstract:
High-resolution tomographic imaging of the shallow subsurface is becoming increasingly important for a wide range of environmental, hydrological and engineering applications. Because of their superior resolution power, their sensitivity to pertinent petrophysical parameters, and their far-reaching complementarity, both seismic and georadar crosshole imaging are of particular importance. To date, corresponding methods have largely relied on asymptotic, ray-based approaches, which only account for a very small part of the observed wavefields, inherently suffer from limited resolution, and in complex environments may prove to be inadequate. These problems can potentially be alleviated through waveform inversion. We have developed an acoustic waveform inversion approach for crosshole seismic data whose kernel is based on a finite-difference time-domain (FDTD) solution of the 2-D acoustic wave equations. This algorithm is tested on and applied to synthetic data from seismic velocity models of increasing complexity and realism, and the results are compared to those obtained using state-of-the-art ray-based traveltime tomography. Regardless of the heterogeneity of the underlying models, the waveform inversion approach has the potential to reliably resolve both the geometry and the acoustic properties of features smaller than half a dominant wavelength. Our results do, however, also indicate that, within their inherent resolution limits, ray-based approaches provide an effective and efficient means to obtain satisfactory tomographic reconstructions of the seismic velocity structure in the presence of mild to moderate heterogeneity and in the absence of strong scattering. Conversely, the excess effort of waveform inversion provides the greatest benefits for the most heterogeneous, and arguably most realistic, environments where multiple scattering effects tend to be prevalent and ray-based methods lose most of their effectiveness.
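The FDTD kernel at the heart of the inversion is not reproduced in the abstract. A minimal sketch of a second-order 2-D constant-density acoustic time-stepping loop, of the kind such an inversion engine relies on, could look like the following; the grid handling, source scaling and rigid boundaries are simplifying assumptions.

```python
import numpy as np

def acoustic_fdtd_2d(velocity, source, src_pos, dx, dt, n_steps):
    """Minimal 2-D constant-density acoustic FDTD kernel, 2nd order in time and
    space.  velocity: 2-D array of wave speeds; source: time series injected at
    grid index src_pos; boundaries are treated as rigid for simplicity."""
    ny, nx = velocity.shape
    p_prev = np.zeros((ny, nx))
    p_curr = np.zeros((ny, nx))
    c2dt2 = (velocity * dt) ** 2
    for it in range(n_steps):
        lap = np.zeros((ny, nx))
        lap[1:-1, 1:-1] = (p_curr[2:, 1:-1] + p_curr[:-2, 1:-1] +
                           p_curr[1:-1, 2:] + p_curr[1:-1, :-2] -
                           4.0 * p_curr[1:-1, 1:-1]) / dx ** 2
        p_next = 2.0 * p_curr - p_prev + c2dt2 * lap
        p_next[src_pos] += (dt ** 2) * source[it]     # inject the source term
        p_prev, p_curr = p_curr, p_next
    return p_curr
```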
Abstract:
BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
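The information-content score used for sub-tag selection is not spelled out in the abstract. The sketch below shows one simple way such a score could be built from per-position base probabilities and used to trim uncertain bases towards the read ends; the per-base cost and minimum length are illustrative parameters, not the authors' values.

```python
import numpy as np

def positional_information(base_probs):
    """Shannon information content (bits) of one read position, given the
    posterior probabilities of the four bases: 2 minus the entropy."""
    p = np.clip(np.asarray(base_probs, dtype=float), 1e-12, None)
    p = p / p.sum()
    return 2.0 + float(np.sum(p * np.log2(p)))

def best_subtag_length(read_probs, min_length=20, per_base_cost=1.0):
    """Pick a prefix sub-tag by trimming uncertain bases at the end of the read:
    maximise cumulative information minus a per-base cost over all prefix lengths
    of at least min_length.  A simplified stand-in for the paper's score."""
    info = np.array([positional_information(p) for p in read_probs])
    scores = np.cumsum(info) - per_base_cost * np.arange(1, len(info) + 1)
    return int(np.argmax(scores[min_length - 1:])) + min_length
```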
Abstract:
PURPOSE: Most RB1 mutations are unique and distributed throughout the RB1 gene. Their detection can be time-consuming and the yield especially low in cases of conservatively-treated sporadic unilateral retinoblastoma (Rb) patients. In order to identify patients with true risk of developing Rb, and to reduce the number of unnecessary examinations under anesthesia in all other cases, we developed a universal sensitive, efficient and cost-effective strategy based on intragenic haplotype analysis. METHODS: This algorithm allows the calculation of the a posteriori risk of developing Rb and takes into account (a) RB1 loss of heterozygosity in tumors, (b) preferential paternal origin of new germline mutations, (c) a priori risk derived from empirical data by Vogel, and (d) disease penetrance of 90% in most cases. We report the occurrence of Rb in first degree relatives of patients with sporadic Rb who visited the Jules Gonin Eye Hospital, Lausanne, Switzerland, from January 1994 to December 2006 compared to expected new cases of Rb using our algorithm. RESULTS: A total of 134 families with sporadic Rb were enrolled; testing was performed in 570 individuals and 99 patients younger than 4 years old were identified. We observed one new case of Rb. Using our algorithm, the cumulated total a posteriori risk of recurrence was 1.77. CONCLUSIONS: This is the first time that linkage analysis has been validated to monitor the risk of recurrence in sporadic Rb. This should be a useful tool in genetic counseling, especially when direct RB1 screening for mutations leaves a negative result or is unavailable.
Abstract:
This thesis proposes a set of adaptive broadcast solutions and an adaptive data replication solution to support the deployment of P2P applications. P2P applications are an emerging type of distributed application running on top of P2P networks; typical examples are video streaming and file sharing. While interesting because they are fully distributed, P2P applications suffer from several deployment problems due to the nature of the environment in which they run. Indeed, defining an application on top of a P2P network often means defining an application in which peers contribute resources in exchange for their ability to use the P2P application. For example, in a P2P file-sharing application, while the user is downloading some file, the P2P application is in parallel serving that file to other users. Such peers may have limited hardware resources, e.g., CPU, bandwidth and memory, or the end-user may decide a priori to limit the resources dedicated to the P2P application. In addition, a P2P network is typically immersed in an unreliable environment, where communication links and processes are subject to message losses and crashes, respectively. To support P2P applications, this thesis proposes a set of services that address some underlying constraints related to the nature of P2P networks. The proposed services include a set of adaptive broadcast solutions and an adaptive data replication solution that can be used as the basis of several P2P applications. Our data replication solution makes it possible to increase availability and to reduce communication overhead. The broadcast solutions aim at providing a communication substrate encapsulating one of the key communication paradigms used by P2P applications: broadcast. Our broadcast solutions typically aim at offering reliability and scalability to some upper layer, be it an end-to-end P2P application or another system-level layer, such as a data replication layer. Our contributions are organized in a protocol stack made of three layers. In each layer, we propose a set of adaptive protocols that address specific constraints imposed by the environment. Each protocol is evaluated through a set of simulations. The adaptiveness of our solutions relies on the fact that they take the constraints of the underlying system into account in a proactive manner. To model these constraints, we define an environment approximation algorithm that allows us to obtain an approximated view of the system or part of it. This approximated view includes the topology and the reliability of the components, expressed in probabilistic terms. To adapt to the underlying system constraints, the proposed broadcast solutions route messages through tree overlays so as to maximize broadcast reliability. Here, broadcast reliability is expressed as a function of the reliability of the selected paths and of the use of available resources. These resources are modeled in terms of quotas of messages reflecting the receiving and sending capacities of each node. To allow deployment in a large-scale system, we take the available memory at processes into account by limiting the view they have to maintain about the system. Using this partial view, we propose three scalable broadcast algorithms, which are based on a propagation overlay that tends towards the global tree overlay and adapts to some constraints of the underlying system.
At a higher level, this thesis also proposes a data replication solution that is adaptive both in terms of replica placement and in terms of request routing. At the routing level, this solution takes the unreliability of the environment into account in order to maximize the reliable delivery of requests. At the replica placement level, the dynamically changing origin and frequency of read/write requests are analyzed in order to define a set of replicas that minimizes communication cost.
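The abstract states that broadcast reliability is expressed as a function of the reliability of the selected paths. As a toy illustration of that idea only (independent link failures and a simple average over destinations are our simplifying assumptions, not the thesis's model), consider:

```python
def path_reliability(link_reliabilities):
    """Reliability of one delivery path: the product of its link reliabilities,
    assuming links fail independently."""
    r = 1.0
    for p in link_reliabilities:
        r *= p
    return r

def broadcast_reliability(paths):
    """Toy measure of the reliability of a broadcast tree: the average probability
    that each destination receives the message, where `paths` maps a destination
    to the link reliabilities along its root-to-node path."""
    return sum(path_reliability(links) for links in paths.values()) / len(paths)
```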
Abstract:
INTRODUCTION: Optimal identification of subtle cognitive impairment in the primary care setting requires a very brief tool combining (a) patients' subjective impairments, (b) cognitive testing, and (c) information from informants. The present study developed a new, very quick and easily administered case-finding tool combining these assessments ('BrainCheck') and tested the feasibility and validity of this instrument in two independent studies. METHODS: We developed a case-finding tool comprising patient-directed (a) questions about memory and depression and (b) clock drawing, and (c) the informant-directed 7-item version of the Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE). Feasibility study: 52 general practitioners rated the feasibility and acceptance of the patient-directed tool. Validation study: An independent group of 288 Memory Clinic patients (mean ± SD age = 76.6 ± 7.9, education = 12.0 ± 2.6; 53.8% female) with diagnoses of mild cognitive impairment (n = 80), probable Alzheimer's disease (n = 185), or major depression (n = 23) and 126 demographically matched, cognitively healthy volunteer participants (age = 75.2 ± 8.8, education = 12.5 ± 2.7; 40% female) took part. All patient and healthy control participants were administered the patient-directed tool, and informants of 113 patient and 70 healthy control participants completed the very short IQCODE. RESULTS: Feasibility study: General practitioners rated the patient-directed tool as highly feasible and acceptable. Validation study: A Classification and Regression Tree analysis generated an algorithm to categorize patient-directed data, which resulted in a correct classification rate (CCR) of 81.2% (sensitivity = 83.0%, specificity = 79.4%). Critically, the CCR of the combined patient- and informant-directed instruments (BrainCheck) reached nearly 90% (that is, 89.4%; sensitivity = 97.4%, specificity = 81.6%). CONCLUSION: A new and very brief instrument for general practitioners, 'BrainCheck', combined three sources of information deemed critical for effective case-finding (that is, patients' subjective impairments, cognitive testing, informant information) and resulted in a nearly 90% CCR. Thus, it provides a very efficient and valid tool to aid general practitioners in deciding whether patients with suspected cognitive impairments should be further evaluated or not ('watchful waiting').
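For reference, the correct classification rate (CCR), sensitivity and specificity reported for BrainCheck are standard functions of a 2 x 2 confusion matrix (patients vs. healthy controls); a small helper computing them could be:

```python
def classification_metrics(tp, fn, tn, fp):
    """CCR, sensitivity and specificity from the cells of a 2x2 confusion matrix:
    tp/fn count patients correctly/incorrectly classified, tn/fp likewise for
    healthy controls."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ccr = (tp + tn) / (tp + fn + tn + fp)
    return ccr, sensitivity, specificity
```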