845 results for Sign Data LMS algorithm.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
In this paper, we propose a Loss Tolerant Reliable (LTR) data transport mechanism for dynamic Event Sensing (LTRES) in WSNs. In LTRES, a reliable event sensing requirement at the transport layer is dynamically determined by the sink. A distributed source rate adaptation mechanism is designed, incorporating a loss-rate-based lightweight congestion control mechanism, to regulate the data traffic injected into the network so that the reliability requirement can be satisfied. An equation-based fair rate control algorithm is used to improve the fairness among the LTRES flows sharing the congested path. The performance evaluations show that LTRES can provide LTR data transport service for multiple events with short convergence time, low loss rate and high overall bandwidth utilization.
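The abstract does not give the specific rate equation LTRES uses; purely as a hedged illustration of equation-based rate control, the Python sketch below assumes the widely used TCP-friendly (TFRC) throughput equation to map a measured loss rate and round-trip time to a fair sending rate. The function name and default parameters are illustrative.

```python
import math

def tcp_friendly_rate(packet_size, rtt, loss_rate, b=1, t_rto=None):
    """Fair sending rate (bytes/s) from the TFRC throughput equation.

    Hypothetical illustration of equation-based rate control; the
    specific equation used by LTRES is not given in the abstract.
    """
    if loss_rate <= 0:
        return float("inf")   # no observed loss: rate limited elsewhere
    if t_rto is None:
        t_rto = 4 * rtt       # common approximation of the retransmit timeout
    denom = (rtt * math.sqrt(2 * b * loss_rate / 3)
             + t_rto * 3 * math.sqrt(3 * b * loss_rate / 8)
             * loss_rate * (1 + 32 * loss_rate ** 2))
    return packet_size / denom

# Example: 1 kB packets, 100 ms RTT, 1% loss event rate
print(tcp_friendly_rate(1000, 0.1, 0.01))
```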
Abstract:
Active machine learning algorithms are used when large numbers of unlabeled examples are available and obtaining labels for them is costly (e.g. requiring consultation of a human expert). Many conventional active learning algorithms focus on refining the decision boundary at the expense of exploring new regions that the current hypothesis misclassifies. We propose a new active learning algorithm that balances such exploration with refinement of the decision boundary by dynamically adjusting the probability of exploring at each step. Our experimental results demonstrate improved performance on data sets that require extensive exploration while remaining competitive on data sets that do not. Our algorithm also shows significant tolerance to noise.
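A minimal sketch of the exploration/refinement balance described above, assuming uncertainty sampling for boundary refinement and a simple decaying exploration schedule; the schedule, the synthetic data, and all names below are illustrative stand-ins rather than the paper's actual adjustment rule.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def query_index(model, X_pool, explore_prob):
    """Choose the next pool example to send to the human labeller.

    With probability `explore_prob`, explore a random unlabeled point
    (a region the current hypothesis may misclassify); otherwise refine
    the boundary by uncertainty sampling (probability closest to 0.5).
    """
    if rng.random() < explore_prob:
        return int(rng.integers(len(X_pool)))           # exploration
    margin = np.abs(model.predict_proba(X_pool)[:, 1] - 0.5)
    return int(np.argmin(margin))                       # boundary refinement

# Tiny demonstration with synthetic binary data.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
model = LogisticRegression().fit(X[labeled], y[labeled])

for step in range(20):
    explore_prob = 1.0 / (1.0 + step)                   # assumed schedule
    i = query_index(model, X, explore_prob)
    labeled.append(i)                                   # "query the oracle"
    model.fit(X[labeled], y[labeled])
```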
Abstract:
Hundreds of terabytes of CMS (Compact Muon Solenoid) data are being accumulated for storage day by day at the University of Nebraska-Lincoln, which is one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This is an important task that is currently done manually, and it requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when there is a requirement for storage space. CMS data is stored using HDFS (Hadoop Distributed File System). HDFS logs give information regarding file access operations. Hadoop MapReduce was used to feed the information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression, which is used in this thesis to develop a classifier. The time elapsed in data set classification by this method depends on the size of the input HDFS log file, since the algorithmic complexities of the Hadoop MapReduce algorithms used here are O(n). The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost, which was calculated using the size of the data set and the time since its last access to help decide how useful a data set is. The accuracies of both were compared by calculating the percentage of data sets predicted for deletion that were accessed at a later point in time. Our methodology using SVMs proved to be more accurate than the Retention Cost heuristic. This methodology could be used to solve similar problems involving other large data sets.
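A hedged sketch of the two approaches being compared: an SVM classifier on log-derived features and a retention-cost style score. The feature set, labels and cost formula below are assumptions for illustration only; the thesis's actual MapReduce feature extraction and formula are not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical per-dataset features extracted from HDFS access logs:
# [accesses in last 90 days, days since last access, size in TB]
X = np.array([[120, 2, 1.5], [0, 300, 4.0], [5, 45, 0.8], [1, 200, 2.2]])
y = np.array([0, 1, 0, 1])   # 1 = safe to delete, 0 = keep (assumed labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)

# Retention-cost style heuristic for comparison: larger and longer-unused
# data sets score higher (the exact formula in the thesis is not given).
def retention_cost(size_tb, days_since_access):
    return size_tb * days_since_access

print(clf.predict([[0, 250, 3.0]]))    # SVM deletion decision
print(retention_cost(3.0, 250))        # heuristic score
```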
Abstract:
Background: A current challenge in gene annotation is to define gene function in the context of a network of relationships instead of using single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the several components of this network interact with each other and keep their functions stable. In general, however, there are not sufficient data to accurately recover the GNs from their expression levels, leading to the curse of dimensionality, in which the number of variables is higher than the number of samples. One way to mitigate this problem is to integrate biological data instead of using only the expression profiles in the inference process. Nowadays, the use of several types of biological information in inference methods has increased significantly in order to better recover the connections between genes and reduce false positives. What makes this strategy so interesting is the possibility of confirming the known connections through the included biological data, and the possibility of discovering new relationships between genes when the expression data are observed. Although several works on data integration have increased the performance of network inference methods, the real contribution of each type of added biological information to the obtained improvement is not clear. Methods: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain obtained by using biological information and expression profiles together. We also evaluated and compared the gain of adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from using prior biological information in the inference of GNs for a eukaryotic organism (P. falciparum). Our results indicate that information based on direct interaction can produce a higher gain than data about a less specific relationship, such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving the inference. We also compared the gain in the inference of the global network and of only the hubs. The results indicate that the use of biological information can improve the identification of the most connected proteins.
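As a rough illustration of integrating prior biological evidence with expression data (not the paper's actual inference algorithm), the sketch below blends an expression-based similarity with a binary prior-evidence flag; the weighting scheme, the toy data and all names are assumptions.

```python
import numpy as np

def integrated_score(expr_a, expr_b, prior_evidence, alpha=0.5):
    """Score a candidate gene-gene connection.

    Combines an expression-based similarity (absolute Pearson correlation)
    with binary prior evidence (e.g. a known PPI, a Rosetta-stone fusion or
    a shared KEGG pathway). The convex weight `alpha` is an assumption; the
    paper's actual integration rule is not described in the abstract.
    """
    r = np.corrcoef(expr_a, expr_b)[0, 1]
    return alpha * abs(r) + (1 - alpha) * float(prior_evidence)

g1 = np.array([0.2, 1.1, 0.9, 1.8, 2.0])
g2 = np.array([0.1, 1.0, 1.2, 1.7, 2.1])
print(integrated_score(g1, g2, prior_evidence=1))
```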
Abstract:
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA associated with Mycobacterium tuberculosis. This problem arises within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that can be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design-related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree induction algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy of binding of a drug candidate with a flexible receptor.
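The hyper-heuristic that automatically designs the tree-induction algorithm is not reproduced here; the hedged sketch below only shows the kind of baseline being compared against, using scikit-learn's entropy-based decision tree as a stand-in for C4.5 on synthetic stand-in data, together with a simple comprehensibility proxy (number of leaves).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in for a docking data set: features would be energy terms of a
# binding conformation, the target a discretised free energy of binding.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# A C4.5-style baseline (entropy splits); the automatically designed
# algorithms of the paper would replace or recompose these components.
baseline = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5,
                                  random_state=0)
print(cross_val_score(baseline, X, y, cv=5).mean())   # accuracy estimate

# Comprehensibility proxy often reported alongside accuracy:
print(baseline.fit(X, y).get_n_leaves(), "leaves")
```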
Abstract:
Background/purpose: Gallstones and cholelithiasis are being increasingly diagnosed in children owing to the widespread use of ultrasonography. The treatment of choice is cholecystectomy, and routine intraoperative cholangiography is recommended to explore the common bile duct. The objectives of this study were to describe our experience with the management of gallstone disease in childhood over the last 18 years and to propose an algorithm to guide the approach to cholelithiasis in children based on clinical and ultrasonographic findings. Methods: The data for this study were obtained by reviewing the records of all patients with gallstone disease treated between January 1994 and October 2011. The patients were divided into the following 5 groups based on their symptoms: group 1, asymptomatic; group 2, nonbiliary obstructive symptoms; group 3, acute cholecystitis symptoms; group 4, a history of biliary obstructive symptoms that had completely resolved by the time of surgery; and group 5, ongoing biliary obstructive symptoms. Patients were treated according to an algorithm based on their clinical, ultrasonographic, and endoscopic retrograde cholangiopancreatography (ERCP) findings. Results: A total of 223 patients were diagnosed with cholelithiasis, and comorbidities were present in 177 patients (79.3%). The most common comorbidities were hemolytic disorders in 139 patients (62.3%) and previous bariatric surgery in 16 (7.1%). Although symptoms were present in 134 patients (60.0%), cholecystectomy was performed for all patients with cholelithiasis, even if they were asymptomatic; the surgery was laparoscopic in 204 patients and open in 19. Fifty-six patients (25.1%) presented with complications as the first sign of cholelithiasis (e.g., pancreatitis, choledocholithiasis, or acute calculous cholecystitis). Intraoperative cholangiography was indicated in 15 children, and it was positive in only 1 (0.4%), for whom ERCP was necessary to extract the stone after a laparoscopic cholecystectomy (LC). Preoperative ERCP was performed in 11 patients to extract the stones, and a hepaticojejunostomy was indicated in 2 patients. There were no injuries to the hepatic artery or common bile duct in our series. Conclusions: Based on our experience, we can propose an algorithm to guide the approach to cholelithiasis in the pediatric population. The final conclusion is that LC results in limited postoperative complications in children with gallstones. When a diagnosis of choledocholithiasis or dilation of the choledochus is made, ERCP is necessary if obstructive symptoms persist either before or after an LC. Intraoperative cholangiography and laparoscopic common bile duct exploration are not mandatory. Published by Elsevier Inc.
Abstract:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspect weighting was proposed in the literature. However, SCAD may fail and halt under certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters that the user must set. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity that is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed on several artificial and real data sets.
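Since ASCAD is stated to share the time complexity of fuzzy c-means, a minimal sketch of the plain fuzzy c-means alternating updates is given below for orientation; the aspect-weight update that distinguishes SCAD/ASCAD is deliberately omitted, and the data are synthetic.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=50, rng=np.random.default_rng(0)):
    """Plain fuzzy c-means by alternating updates.

    ASCAD adds a further update that re-estimates one weight per aspect
    (attribute subset) inside the distance; that step is omitted here.
    """
    n = len(X)
    U = rng.dirichlet(np.ones(c), size=n)          # memberships, rows sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]   # cluster prototypes
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))             # standard membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, V

X = np.vstack([np.random.default_rng(1).normal(0, 1, (30, 4)),
               np.random.default_rng(2).normal(4, 1, (30, 4))])
U, V = fuzzy_c_means(X)
print(U.argmax(axis=1))    # hard assignment of each point
```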
Abstract:
In this paper, we propose nonlinear elliptical models for correlated data with heteroscedastic and/or autoregressive structures. Our aim is to extend the models proposed by Russo et al. [22] by considering a more sophisticated scale structure to deal with variations in data dispersion and/or possible autocorrelation among measurements taken on the same experimental unit. Moreover, to avoid the possible influence of outlying observations, or to take into account the non-normal symmetric tails of the data, we assume elliptical contours for the joint distribution of random effects and errors, which allows us to attribute different weights to the observations. We propose an iterative algorithm to obtain the maximum-likelihood estimates of the parameters and derive the local influence curvatures for some specific perturbation schemes. The motivation for this work comes from a pharmacokinetic indomethacin data set, which was previously analysed by Bocheng and Xuping [1] under normality.
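A hedged toy example of the general idea (robust maximum likelihood under a heavy-tailed elliptical error law), not the authors' model or the indomethacin data: a simple exponential-decay curve is fitted with Student-t errors via numerical likelihood maximisation, and the parametrisation is an assumption.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t as student_t

# Toy pharmacokinetic-style data (not the indomethacin data of the paper).
time = np.array([0.25, 0.5, 1, 2, 4, 6, 8])
conc = np.array([2.1, 1.6, 1.1, 0.55, 0.25, 0.14, 0.09])

def neg_loglik(params):
    """Negative log-likelihood of y = A*exp(-k*t) with Student-t errors.

    The t distribution is one member of the elliptical family; it
    downweights outlying observations relative to a normal model.
    """
    A, k, log_sigma, log_nu = params
    resid = conc - A * np.exp(-k * time)
    sigma, nu = np.exp(log_sigma), np.exp(log_nu) + 2.0
    return -np.sum(student_t.logpdf(resid, df=nu, scale=sigma))

fit = minimize(neg_loglik, x0=[2.0, 0.5, np.log(0.1), np.log(5.0)],
               method="Nelder-Mead")
print(fit.x[:2])   # estimated (A, k)
```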
Abstract:
Background: A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to analyse gene regulatory interactions using the Boolean network model and time-series data. The Boolean network considered is restricted in the sense that only a subset of all possible Boolean functions is allowed. We explore some mathematical properties of the restricted Boolean networks in order to avoid a full search approach. The problem is modeled as a Constraint Satisfaction Problem (CSP), and CSP techniques are used to solve it. Results: We applied the proposed algorithm to two data sets. First, we used an artificial data set obtained from a model of the budding yeast cell cycle. The second data set is derived from experiments performed using HeLa cells. The results show that some interactions can be fully or, at least, partially determined under the Boolean model considered. Conclusions: The proposed algorithm can be used as a first step for the detection of gene/protein interactions. It is able to infer gene relationships from time-series data of gene expression, and this inference process can be aided by available a priori knowledge.
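A hedged sketch of the underlying consistency check: for a target gene, enumerate regulator pairs and a restricted family of Boolean functions and keep those that explain every observed transition. The restricted family (AND/OR of possibly negated inputs), the toy series and all names are assumptions, and plain enumeration is used here instead of a CSP solver.

```python
from itertools import combinations, product

# Binarised time series: rows are consecutive time points, columns genes.
series = [
    (0, 0, 1),
    (0, 1, 1),
    (1, 1, 0),
    (1, 0, 0),
]

# Restricted function family (an assumption for illustration): AND/OR of
# possibly negated inputs, instead of all 2**(2**k) Boolean functions.
FUNCS = {
    "AND": lambda a, b: a and b,
    "OR":  lambda a, b: a or b,
}

def consistent_rules(target, n_genes):
    """Regulator pairs/functions that explain every observed transition."""
    rules = []
    for (i, j), name, (ni, nj) in product(
            combinations(range(n_genes), 2), FUNCS, product([0, 1], repeat=2)):
        f = FUNCS[name]
        ok = all(
            f(series[t][i] ^ ni, series[t][j] ^ nj) == series[t + 1][target]
            for t in range(len(series) - 1))
        if ok:
            rules.append((i, j, name, ni, nj))   # ni/nj flag negated inputs
    return rules

print(consistent_rules(target=2, n_genes=3))
```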
Abstract:
The main objective of this work is to present an efficient method for phasor estimation based on a compact Genetic Algorithm (cGA) implemented in a Field Programmable Gate Array (FPGA). To validate the proposed method, an Electrical Power System (EPS) simulated with the Alternative Transients Program (ATP) provides the data to be used by the cGA; these data are as close as possible to the actual data provided by an EPS. Real-life situations such as islanding, sudden load increase and permanent faults were considered. The implementation aims to take advantage of the inherent parallelism of Genetic Algorithms in a compact and optimized way, making them an attractive option for practical applications in real-time estimation concerning Phasor Measurement Units (PMUs).
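A software-only sketch of a compact GA applied to phasor (amplitude and phase) estimation; the paper's implementation is in FPGA hardware, and the bit widths, virtual population size, signal parameters and fitness below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
FS, F0, N_SAMP = 1000.0, 60.0, 64                 # assumed sampling setup
t = np.arange(N_SAMP) / FS
signal = 100 * np.cos(2 * np.pi * F0 * t + 0.8)   # synthetic "EPS" samples

N_BITS, POP = 16, 100                             # 8 bits amplitude + 8 bits phase

def decode(bits):
    amp = int("".join(map(str, bits[:8])), 2) / 255 * 200            # 0..200
    phase = int("".join(map(str, bits[8:])), 2) / 255 * 2 * np.pi    # 0..2*pi
    return amp, phase

def fitness(bits):
    amp, phase = decode(bits)
    return -np.sum((signal - amp * np.cos(2 * np.pi * F0 * t + phase)) ** 2)

p = np.full(N_BITS, 0.5)                          # cGA probability vector
for _ in range(5000):
    a = (rng.random(N_BITS) < p).astype(int)      # sample two individuals
    b = (rng.random(N_BITS) < p).astype(int)
    winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
    p += (winner - loser) / POP                   # shift vector toward winner
    p = np.clip(p, 1 / N_BITS, 1 - 1 / N_BITS)

print(decode((p > 0.5).astype(int)))              # estimated (amplitude, phase)
```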
Abstract:
The quality of temperature and humidity retrievals from the infrared SEVIRI sensors on the geostationary Meteosat Second Generation (MSG) satellites is assessed by means of a one-dimensional variational (1D-Var) algorithm. The study is performed with the aim of improving the spatial and temporal resolution of available observations to feed analysis systems designed for high-resolution regional-scale numerical weather prediction (NWP) models. The non-hydrostatic forecast model COSMO (COnsortium for Small scale MOdelling) in the ARPA-SIM operational configuration is used to provide background fields. Only clear-sky observations over sea are processed. An optimised 1D-Var set-up comprising the two water vapour channels and the three window channels is selected. It maximises the reduction of errors in the model backgrounds while ensuring ease of operational implementation through accurate bias correction procedures and correct radiative transfer simulations. The 1D-Var retrieval quality is first quantified in relative terms, employing statistics to estimate the reduction in the background model errors. Additionally, the absolute retrieval accuracy is assessed by comparing the analysis with independent radiosonde and satellite observations. The inclusion of satellite data brings a substantial reduction in the warm and dry biases present in the forecast model. Moreover, it is shown that the retrieval profiles generated by the 1D-Var are well correlated with the radiosonde measurements. Subsequently, the 1D-Var technique is applied to two three-dimensional case studies: a false-alarm case that occurred in Friuli-Venezia Giulia on 8 July 2004 and a heavy-precipitation case that occurred in the Emilia-Romagna region between 9 and 12 April 2005. The impact of satellite data for these two events is evaluated in terms of increments in the integrated water vapour and saturation water vapour over the column, in the 2-metre temperature and specific humidity, and in the surface temperature. To improve the 1D-Var technique, a method to calculate flow-dependent model error covariance matrices is also assessed. The approach employs members of an ensemble forecast system generated by perturbing physical parameterisation schemes inside the model. The improved set-up applied to the case of 8 July 2004 shows a substantially neutral impact.
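A minimal sketch of the standard 1D-Var cost function being minimised, with a toy state vector, a linearised observation operator and diagonal error covariances standing in for the real radiative transfer model and SEVIRI channel statistics.

```python
import numpy as np
from scipy.optimize import minimize

# Toy 1D-Var: the state x is a short temperature profile, y the observed
# brightness temperatures. H, B and R are illustrative stand-ins for the
# real observation operator and error covariance matrices.
x_b = np.array([285.0, 278.0, 265.0])          # background profile
B = np.diag([2.0, 2.0, 3.0]) ** 2              # background error covariance
H = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3]])                # linearised observation operator
R = np.diag([0.5, 0.5]) ** 2                   # observation error covariance
y = np.array([283.5, 274.0])                   # observed channels

B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)

def cost(x):
    """Standard variational cost: background term + observation term."""
    dxb = x - x_b
    dy = y - H @ x
    return 0.5 * dxb @ B_inv @ dxb + 0.5 * dy @ R_inv @ dy

analysis = minimize(cost, x_b).x
print(analysis)                                # retrieved profile
```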
Abstract:
The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions, with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in every field where multivariate dependence is of great interest, yet their use in clustering had not previously been investigated. The first part of this work contains a review of the literature on clustering methods, copula functions and microarray experiments. The attention focuses on the K-means (Hartigan, 1975; Hartigan and Wong, 1979), hierarchical (Everitt, 1974) and model-based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques, whose performance is compared. Then, the probabilistic interpretation of Sklar's theorem (Sklar, 1959), estimation methods for copulas such as the Inference for Margins (Joe and Xu, 1996), and the Archimedean and elliptical copula families are presented. Finally, applications of clustering methods and copulas to genetic and microarray experiments are highlighted. The second part contains the original contribution proposed. A simulation study is performed in order to evaluate the performance of the K-means and hierarchical bottom-up clustering methods in identifying clusters according to the dependence structure of the data-generating process. Different simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter), and the results are evaluated by means of different measures of performance. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions ('CoClust' in brief) is proposed. The basic idea, the iterative procedure of the CoClust and a description of the written R functions with their output are given. The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the value of the dependence parameter and the degree of overlap of the margins) and is compared with the performance of model-based clustering by using different measures of performance, such as the percentage of correctly identified numbers of clusters and the percentage of non-rejections of H0 on the dependence parameter. It is shown that the CoClust algorithm overcomes all observed limits of the other investigated clustering techniques and is able to identify clusters according to the dependence structure of the data, independently of the degree of overlap of the margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log-likelihood function of the copula and can virtually account for any possible dependence relationship between observations. Many peculiar characteristics of the CoClust are shown, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001), both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.
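A small Python sketch (the dissertation's implementation is written in R) of the two-step copula estimation idea mentioned above: margins first, then the dependence parameter on pseudo-observations. This is not the CoClust algorithm itself, and the data are synthetic.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)

# Toy expression-like data with dependence between the two margins.
z = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=200)
x, y = np.exp(z[:, 0]), 3 + 2 * z[:, 1]          # distinct margins

# Two-step fit of a Gaussian copula in the spirit of Inference for Margins:
# 1) handle the margins (here via empirical ranks -> pseudo-observations),
u = rankdata(x) / (len(x) + 1)
v = rankdata(y) / (len(y) + 1)

# 2) estimate the copula dependence parameter on the normal scores.
rho_hat = np.corrcoef(norm.ppf(u), norm.ppf(v))[0, 1]
print(rho_hat)   # a CoClust-style procedure would compare copula
                 # log-likelihoods across candidate groupings of rows
```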
Abstract:
Since the end of the 19th century, geodesy has contributed greatly to the knowledge of regional tectonics and fault movement through its ability to measure, at sub-centimetre precision, the relative positions of points on the Earth's surface. Nowadays, the systematic analysis of geodetic measurements in actively deforming regions therefore represents one of the most important tools in the study of crustal deformation over different temporal scales [e.g., Dixon, 1991]. This dissertation focuses on motion that can be observed geodetically with classical terrestrial position measurements, particularly triangulation and leveling observations. The work is divided into two sections: an overview of the principal methods for estimating the long-term accumulation of elastic strain from terrestrial observations, and an overview of the principal methods for rigorously inverting surface coseismic deformation fields for source geometry, with tests on synthetic deformation data sets and applications in two different tectonically active regions of the Italian peninsula. For the analysis of the long-term accumulation of elastic strain, triangulation data were available from a geodetic network across the Messina Straits area (southern Italy) for the period 1971-2004. From the resulting angle changes, the shear strain rates as well as the orientation of the principal axes of the strain rate tensor were estimated. The computed average annual shear strain rates for the period between 1971 and 2004 are γ̇1 = 113.89 ± 54.96 nanostrain/yr and γ̇2 = -23.38 ± 48.71 nanostrain/yr, with the orientation of the most extensional strain (θ) at N140.80° ± 19.55°E. These results suggest that the first-order strain field of the area is dominated by extension in the direction perpendicular to the trend of the Straits, supporting the hypothesis that the Messina Straits could represent an area of concentrated active deformation. The orientation of θ agrees well with GPS deformation estimates calculated over a shorter time interval, is consistent with previous preliminary GPS estimates [D'Agostino and Selvaggi, 2004; Serpelloni et al., 2005], and is also similar to the direction of the 1908 (MW 7.1) earthquake slip vector [e.g., Boschi et al., 1989; Valensise and Pantosti, 1992; Pino et al., 2000; Amoruso et al., 2002]. Thus, the measured strain rate can be attributed to active extension across the Messina Straits, corresponding to a relative extension rate ranging between < 1 mm/yr and ~ 2 mm/yr within the portion of the Straits covered by the triangulation network. These results are consistent with the hypothesis that the Messina Straits is an important active geological boundary between the Sicilian and Calabrian domains and support previous preliminary GPS-based estimates of strain rates across the Straits, which show that the active deformation is distributed over a wider area. Finally, preliminary dislocation modelling has shown that, although the current geodetic measurements do not resolve the geometry of the dislocation models, they solve well for the rate of interseismic strain accumulation across the Messina Straits and give useful information about the locking depth of the shear zone. Geodetic data, namely triangulation and leveling measurements of the 1976 Friuli (NE Italy) earthquake, were available for the inversion of coseismic source parameters.
From the observed angle and elevation changes, the source parameters of the seismic sequence were estimated in a joint inversion using an algorithm called simulated annealing. The computed optimal uniform-slip elastic dislocation model consists of a 30° north-dipping, shallow (depth 1.30 ± 0.75 km) fault plane with an azimuth of 273°, accommodating reverse dextral slip of about 1.8 m. The hypocentral location and inferred fault plane of the main event are thus consistent with the activation of Periadriatic overthrusts or other related thrust faults such as the Gemona-Kobarid thrust. The geodetic data set therefore excludes the source solution of Aoudia et al. [2000], Peruzza et al. [2002] and Poli et al. [2002], which considers the Susans-Tricesimo thrust as the source of the May 6 event. The best-fit source model is more consistent with the solution of Pondrelli et al. [2001], which proposed the activation of other thrusts located further to the north of the Susans-Tricesimo thrust, probably on Periadriatic-related thrust faults. The main characteristics of the leveling and triangulation data are fit by the optimal single-fault model; that is, these results are consistent with a first-order rupture process characterized by a progressive rupture of a single fault system. A single uniform-slip fault model, however, does not seem to reproduce some minor complexities of the observations, and some residual signals not modelled by the optimal single-fault-plane solution were observed. In particular, the single-fault-plane model does not reproduce some minor features of the leveling deformation field along route 36 south of the main uplift peak; that is, a second fault seems to be necessary to reproduce these residual signals. By assuming movements along a mapped thrust located southward of the inferred optimal single-plane solution, the residual signal was successfully modelled. In summary, the inversion results presented in this thesis are consistent with the activation of some Periadriatic-related thrusts for the main events of the sequence, and with a minor role of the southward thrust systems of the middle Tagliamento plain.
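A hedged sketch of the simulated-annealing inversion loop, with a placeholder forward model (a Gaussian-shaped uplift profile) standing in for an elastic dislocation (Okada-type) model; all parameter names, starting values and cooling settings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(params, x):
    """Stand-in forward model: predicted uplift along a leveling line.

    A real inversion would use an elastic dislocation (Okada-type) model;
    this Gaussian-shaped uplift profile is only a placeholder so that the
    annealing loop below is runnable.
    """
    depth, slip, x0 = params
    return slip * np.exp(-((x - x0) ** 2) / (2 * depth ** 2))

x_obs = np.linspace(-10, 10, 40)                        # km along the route
d_obs = forward([1.3, 1.8, 2.0], x_obs) + rng.normal(0, 0.02, x_obs.size)

def misfit(params):
    return np.sum((d_obs - forward(params, x_obs)) ** 2)

# Simulated annealing: random perturbations are accepted with a temperature-
# dependent probability, so early iterations can escape local minima.
current = np.array([3.0, 1.0, 0.0])
best, best_cost, T = current.copy(), misfit(current), 1.0
for k in range(20000):
    cand = current + rng.normal(0, 0.1, 3)
    dE = misfit(cand) - misfit(current)
    if dE < 0 or rng.random() < np.exp(-dE / T):
        current = cand
        if misfit(current) < best_cost:
            best, best_cost = current.copy(), misfit(current)
    T = max(1e-3, T * 0.9997)                           # cooling schedule

print(best)   # recovered (depth, slip, x0)
```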
Abstract:
Precipitation retrieval over high latitudes, particularly snowfall retrieval over ice and snow using satellite-based passive microwave spectrometers, is currently an unsolved problem. The challenge results from the large variability of microwave emissivity spectra for snow and ice surfaces, which can mimic, to some degree, the spectral characteristics of snowfall. This work focuses on the investigation of a new snowfall detection algorithm specific to high-latitude regions, based on a combination of active and passive sensors able to discriminate between snowing and non-snowing areas. The space-borne Cloud Profiling Radar (on CloudSat), the Advanced Microwave Sounding Units A and B (on NOAA-16) and the infrared spectroradiometer MODIS (on Aqua) were co-located for 365 days, from 1 October 2006 to 30 September 2007. CloudSat products have been used as truth to calibrate and validate all the proposed algorithms. The methodological approach followed can be summarised in two steps. In a first step, an empirical search for a threshold, aimed at discriminating the no-snow case, was performed, following Kongoli et al. [2003]. Since this single-channel approach did not produce adequate results, a more statistically sound approach was attempted. Two different techniques, which allow the probability above and below a brightness temperature (BT) threshold to be computed, were applied to the available data. The first technique is based on a logistic distribution to represent the probability of snow given the predictors. The second technique, termed the Bayesian Multivariate Binary Predictor (BMBP), is a fully Bayesian technique that requires no hypothesis on the shape of the probabilistic model (such as, for instance, the logistic) and only requires the estimation of the BT thresholds. The results obtained show that both proposed methods are able to discriminate snowing and non-snowing conditions over the polar regions with a probability of correct detection larger than 0.5, highlighting the importance of a multispectral approach.
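A hedged sketch of the first probabilistic approach: a logistic model of the probability of snow given brightness temperatures, trained on synthetic data standing in for the co-located radiometer/CloudSat table. The channel values, threshold and class labels are assumptions, and the BMBP technique is not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical training table: passive microwave brightness temperatures (K)
# for three channels as predictors, a CloudSat-derived snowfall flag as truth.
bt = rng.normal(loc=[240.0, 250.0, 255.0], scale=8.0, size=(500, 3))
snow = (bt[:, 0] + 0.5 * bt[:, 1] < 362 + rng.normal(0, 3, 500)).astype(int)

# Logistic model of P(snow | BT), i.e. the first of the two probabilistic
# approaches described above; the BMBP alternative is not shown here.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(bt, snow)

p_snow = model.predict_proba([[235.0, 248.0, 252.0]])[0, 1]
print(p_snow, p_snow > 0.5)   # flag snowfall when P(snow) exceeds 0.5
```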