880 results for Weak Greedy Algorithms


Relevance: 20.00%

Abstract:

We present new algorithms for M-estimators of multivariate scatter and location and for symmetrized M-estimators of multivariate scatter. The new algorithms are considerably faster than the fixed-point and related algorithms currently in use. The main idea is to utilize a second-order Taylor expansion of the target functional and to devise a partial Newton-Raphson procedure. In connection with symmetrized M-estimators, we initially work with incomplete U-statistics to accelerate our procedures.
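The abstract does not spell out the procedure; as a point of reference, here is a minimal sketch (assuming centered data) of the classical fixed-point iteration for Tyler's M-estimator of scatter, the kind of currently used algorithm the paper sets out to beat. The partial Newton-Raphson refinement itself is not reproduced here.

```python
import numpy as np

def tyler_scatter(X, max_iter=100, tol=1e-8):
    """Classical fixed-point iteration for Tyler's M-estimator of scatter.

    Illustrative baseline only, not the paper's accelerated algorithm.
    Assumes the rows of X are already centered.
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(max_iter):
        inv = np.linalg.inv(sigma)
        # Mahalanobis-type weights x_i' Sigma^{-1} x_i for each row.
        w = np.einsum('ij,jk,ik->i', X, inv, X)
        new = (p / n) * (X / w[:, None]).T @ X
        new *= p / np.trace(new)          # fix the scale (trace = p)
        if np.linalg.norm(new - sigma, ord='fro') < tol:
            return new
        sigma = new
    return sigma
```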

Relevance: 20.00%

Abstract:

The interaction of a comet with the solar wind passes through various stages as the comet's activity varies along its orbit. For a comet like 67P/Churyumov–Gerasimenko, the target comet of ESA's Rosetta mission, the resulting features include the formation of a Mach cone, the bow shock, and, close to perihelion, even a diamagnetic cavity. There are different approaches to simulating this complex interplay between the solar wind and the comet's extended neutral gas coma, including magnetohydrodynamic (MHD) and hybrid-type models. The former treats the plasma as fluids (one fluid in basic single-fluid MHD), whereas the latter treats the ions as individual particles under the influence of the local electric and magnetic fields. The electrons are treated as a charge-neutralizing fluid in both cases. Given the different approaches, the two models yield different results, in particular for a low-production-rate comet. In this paper we show that these differences can be reduced by using a multifluid instead of a single-fluid MHD model and by increasing the resolution of the hybrid model. We show that some major features obtained with a hybrid-type approach, such as the gyration of the cometary heavy ions and the formation of the Mach cone, can be partially reproduced with the multifluid-type model.
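To illustrate the particle side of the comparison: hybrid codes advance each ion in the local E and B fields with an integrator such as the standard Boris push, which is what lets them resolve heavy-ion gyration that single-fluid MHD averages out. The sketch below is generic and not from the paper; the field values and ion species are hypothetical.

```python
import numpy as np

Q, M = 1.602e-19, 18 * 1.661e-27      # charge and mass of a water-group ion
E = np.zeros(3)                        # local electric field (V/m), zero here
B = np.array([0.0, 0.0, 5e-9])         # 5 nT field, solar-wind-like strength

def boris_push(v, dt):
    """Advance one ion velocity by dt with the standard Boris rotation."""
    vm = v + 0.5 * dt * (Q / M) * E               # half electric kick
    t = 0.5 * dt * (Q / M) * B                    # rotation vector
    s = 2.0 * t / (1.0 + t @ t)
    vp = vm + np.cross(vm + np.cross(vm, t), s)   # magnetic rotation
    return vp + 0.5 * dt * (Q / M) * E            # second half kick

v = np.array([4e5, 0.0, 0.0])   # ion picked up at solar-wind speed
for _ in range(1000):
    v = boris_push(v, dt=0.5)   # speed stays constant; the direction gyrates
```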

Relevance: 20.00%

Abstract:

This paper presents a shallow dialogue analysis model aimed at human-human dialogues in the context of staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, and decision tree classifiers for discourse markers. A rule-based approach is proposed for resolving cross-modal references to meeting documents. The methods are trained and evaluated on a common data set and annotation format. The integration of the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.
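As an illustration of the first component, a maximum entropy dialogue-act classifier can be sketched as multinomial logistic regression over transcript features. The labels and bag-of-words features below are hypothetical stand-ins, not the paper's actual feature set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy utterances with hypothetical dialogue-act labels; the real feature
# set (lexical, positional, prosodic cues) is richer than bag-of-words.
utterances = ["could you send the slides", "i agree with that",
              "no that is wrong", "let us move to item two"]
acts = ["question", "agreement", "disagreement", "topic-shift"]

# Multinomial logistic regression is the maximum entropy classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(utterances, acts)
print(model.predict(["do you agree with the plan"]))
```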

Relevance: 20.00%

Abstract:

BACKGROUND Lung clearance index (LCI), a marker of ventilation inhomogeneity, is elevated early in children with cystic fibrosis (CF). However, in infants with CF, LCI values are found to be normal, although structural lung abnormalities are often detectable. We hypothesized that this discrepancy is due to inadequate algorithms in the available software package. AIM Our aim was to challenge the validity of these software algorithms. METHODS We compared multiple breath washout (MBW) results of the current software algorithms (automatic mode) to refined algorithms (manual mode) in 17 asymptomatic infants with CF and 24 matched healthy term-born infants. The main difference between the two analysis methods lies in the calculation of the molar mass differences that the system uses to define the completion of the measurement. RESULTS In infants with CF, the refined manual mode revealed clearly elevated LCI above 9 in 8 out of 35 measurements (23%), all of which showed LCI values below 8.3 using the automatic mode (paired t-test comparing the means, P < 0.001). Healthy infants showed normal LCI values using both analysis methods (n = 47, paired t-test, P = 0.79). The most relevant reason for falsely normal LCI values in infants with CF using the automatic mode was premature recognition of the end-of-test during the washout. CONCLUSION We recommend the manual mode for the analysis of MBW outcomes in infants in order to obtain more accurate results. This will allow appropriate use of infant lung function results for clinical and scientific purposes.
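For illustration, the end-of-test criterion at stake can be sketched as follows. The 1/40th-of-starting-concentration rule is the conventional MBW completion criterion; the function below is an assumption-laden sketch, not the vendor software or the authors' refined algorithm.

```python
def lci_from_washout(et_conc, exp_vol, frc, stable_breaths=3):
    """Compute lung clearance index from breath-by-breath washout data.

    et_conc : end-tidal tracer concentration per breath
    exp_vol : expired volume per breath (L)
    frc     : functional residual capacity (L)

    End of test is the first breath where the tracer has stayed below
    1/40th of its starting concentration for `stable_breaths` breaths in
    a row; requiring a sustained drop guards against the premature
    end-of-test calls described for the automatic analysis.
    """
    target = et_conc[0] / 40.0
    below = 0
    for i, c in enumerate(et_conc):
        below = below + 1 if c <= target else 0
        if below == stable_breaths:
            cumulative_expired = sum(exp_vol[:i + 1])
            return cumulative_expired / frc   # LCI = lung turnovers needed
    raise ValueError("washout never reached 1/40th of starting concentration")
```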

Relevance: 20.00%

Abstract:

BACKGROUND: HIV surveillance requires monitoring of new HIV diagnoses and differentiation of incident and older infections. In 2008, Switzerland implemented a system for monitoring incident HIV infections based on the results of a line immunoassay (Inno-Lia) mandatorily conducted for HIV confirmation and type differentiation (HIV-1, HIV-2) of all newly diagnosed patients. Based on this system, we assessed the proportion of incident HIV infections among newly diagnosed cases in Switzerland during 2008-2013. METHODS AND RESULTS: Inno-Lia antibody reaction patterns recorded in anonymous HIV notifications to the federal health authority were classified by 10 published algorithms into incident (up to 12 months) or older infections. From these data, annual incident infection estimates were obtained in two ways: (i) based on the diagnostic performance of the algorithms, using the relationship 'incident = true incident + false incident'; and (ii) based on the window periods of the algorithms, using the relationship 'Prevalence = Incidence × Duration'. From 2008 to 2013, 3,851 HIV notifications were received. Adult HIV-1 infections amounted to 3,809 cases, of which 3,636 (95.5%) contained Inno-Lia data. Incident infection totals calculated with the performance-based and window-based methods were similar, amounting on average to 1,755 (95% confidence interval, 1,588-1,923) and 1,790 cases (95% CI, 1,679-1,900), respectively. More than half of these were among men who have sex with men. Both methods showed a continuous decline of annual incident infections during 2008-2013, totaling -59.5% and -50.2%, respectively. The decline of incident infections continued even in 2012, when a 15% increase in HIV notifications was observed; this increase was entirely due to older infections. Overall declines during 2008-2013 were of similar extent among the major transmission groups. CONCLUSIONS: Inno-Lia-based incident HIV-1 infection surveillance proved useful and reliable. It represents a free, additional public health benefit of the use of this relatively costly test for HIV confirmation and type differentiation.
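A worked sketch of the second (window-based) method, using the stated relationship Prevalence = Incidence × Duration. The counts and window length below are hypothetical, for illustration only.

```python
def annual_incident_infections(n_classified_recent, n_diagnosed, window_months):
    """Window-period estimate via Prevalence = Incidence x Duration.

    The fraction of new diagnoses classified as 'recent' estimates the
    prevalence of the recent-infection state among diagnoses; dividing
    by the algorithm's window period (as a fraction of a year) converts
    it into an annual incident-infection count. Hypothetical numbers.
    """
    prevalence = n_classified_recent / n_diagnosed
    incidence_rate = prevalence / (window_months / 12.0)
    return incidence_rate * n_diagnosed

# e.g. 180 of 600 diagnoses classified as incident by an 8-month window:
print(annual_incident_infections(180, 600, window_months=8))  # -> 270.0
```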

Relevance: 20.00%

Abstract:

Aim The usual hypothesis about the relationship between niche breadth and range size posits that species with the capacity to use a wider range of resources or to tolerate a greater range of environmental conditions should be more widespread. In plants, broader niches are often hypothesized to be due to pronounced phenotypic plasticity, and more plastic species are therefore predicted to be more common. We examined the relationship between the magnitude of phenotypic plasticity in five functional traits, mainly related to leaves, and several measures of abundance in 105 Central European grassland species. We further tested whether mean values of traits, rather than their plasticity, better explain the commonness of species, possibly because they are pre-adapted to exploiting the most common resources. Location Central Europe. Methods In a multispecies experiment with 105 species we measured leaf thickness, leaf greenness, specific leaf area, leaf dry matter content and plant height, and the plasticity of these traits in response to fertilization, waterlogging and shading. For the same species we also obtained five measures of commonness, ranging from plot-level abundance to range size in Europe. We then examined whether these measures of commonness were associated with the magnitude of phenotypic plasticity, expressed as composite plasticity of all traits across the experimental treatments. We further estimated the relative importance of trait plasticity and trait means for abundance and geographical range size. Results More abundant species were less plastic. This negative relationship was fairly consistent across several spatial scales of commonness, but it was weak. Indeed, compared with trait means, plasticity was relatively unimportant for explaining differences in species commonness. Main conclusions Our results do not indicate that larger phenotypic plasticity of leaf morphological traits enhances species abundance. Furthermore, possession of a particular trait value, rather than of trait plasticity, is a more important determinant of species commonness.
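A minimal sketch of how such a composite plasticity index might be computed. The exact index is an assumption here, since the abstract does not give its formula; mean absolute relative trait change across treatments is one common choice.

```python
import numpy as np

def composite_plasticity(control, treatments):
    """Mean absolute relative trait change across treatments and traits.

    control    : (n_species, n_traits) trait means under control conditions
    treatments : (n_treatments, n_species, n_traits) trait means under
                 fertilization, waterlogging, shading, ...
    Returns one composite plasticity value per species. This index is an
    illustrative assumption, not necessarily the paper's definition.
    """
    rel_change = np.abs(treatments - control) / np.abs(control)
    return rel_change.mean(axis=(0, 2))

# The plasticity-commonness relationship could then be tested with e.g.
#   from scipy.stats import spearmanr
#   rho, p = spearmanr(composite_plasticity(ctrl, trt), abundance)
```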

Relevance: 20.00%

Abstract:

Academic and industrial research in the late 1990s brought about an exponential explosion of DNA sequence data. Automated expert systems are being created to help biologists extract patterns, trends and links from this ever-deepening ocean of information. Two such systems, aimed at retrieving and subsequently utilizing phylogenetically relevant information, were developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process.

Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (one that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. The second chapter of this dissertation describes an intelligent compromise algorithm that relies on a flexible "beam" search principle from the Artificial Intelligence domain and uses pre-computed local topology reliability information to continuously adjust the beam search space.

However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question, either method might prove superior. Therefore, it is difficult (even for an expert) to tell a priori which phylogenetic reconstruction method should be chosen for any particular data set: distance-based, ML, or perhaps maximum parsimony (MP).

A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically "difficult" data set, more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen. However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (a potentially costly mistake, both in terms of computational expense and in terms of reconstruction accuracy).

Chapter III of this dissertation details a phylogenetic reconstruction expert system that selects the proper method automatically. It uses a classifier (a decision-tree-inducing algorithm) to map a new data set to the proper phylogenetic reconstruction method.
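A generic skeleton of the beam-search idea described in Chapter II. Here `extend` and `score` are placeholders, and the continuous beam-width adjustment from reliability information is represented only by a fixed parameter.

```python
def beam_search_topologies(taxa, extend, score, beam_width):
    """Stepwise-addition tree search keeping only the best partial trees.

    extend(tree, taxon) must yield every topology obtained by attaching
    `taxon` to `tree`; score(tree) returns a value to minimize (e.g. a
    least-squares fit to the distance matrix). Both are placeholders.
    The dissertation additionally adjusts `beam_width` on the fly from
    precomputed local-topology reliability, which is not sketched here.
    """
    beam = [tuple(taxa[:3])]              # the single unrooted 3-taxon tree
    for taxon in taxa[3:]:
        candidates = [t for tree in beam for t in extend(tree, taxon)]
        candidates.sort(key=score)
        beam = candidates[:beam_width]    # prune to the beam
    return min(beam, key=score)
```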

Relevance: 20.00%

Abstract:

The goal of this paper is to revisit the influential work of Mauro [1995], focusing on the strength of his results under weak identification. He finds a negative impact of corruption on investment and economic growth that appears to be robust to endogeneity when using two-stage least squares (2SLS). Since the publication of Mauro [1995], much of the literature on 2SLS methods has revealed the dangers of estimation, and thus inference, under weak identification. We reproduce the original results of Mauro [1995] with a high level of confidence and show that the instrument used in the original work is in fact 'weak' as defined by Staiger and Stock [1997]. We therefore update the analysis using a test statistic robust to weak instruments. Our results suggest that under Mauro's original model there is a high probability that the parameters of interest are locally almost unidentified in multivariate specifications. To address this problem, we also investigate other instruments commonly used in the corruption literature and obtain similar results.
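The Staiger and Stock [1997] diagnostic referred to above can be sketched as a first-stage F test; the variable names below are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def first_stage_F(endog_regressor, instruments):
    """First-stage F statistic for instrument strength.

    Regresses the endogenous regressor (here, corruption) on the
    instrument(s); by the Staiger-Stock rule of thumb, F below roughly
    10 signals weak identification, making 2SLS inference unreliable.
    """
    Z = sm.add_constant(np.asarray(instruments))
    fit = sm.OLS(endog_regressor, Z).fit()
    return fit.fvalue

# e.g. corruption instrumented by an ethnolinguistic fractionalization
# index, as in Mauro [1995] (hypothetical arrays):
#   F = first_stage_F(corruption, elf_index)
#   weak = F < 10
```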

Relevance: 20.00%

Abstract:

Background. Diabetes places a significant burden on the health care system. Reduction in hemoglobin A1c (HbA1c) levels reduces the risk of complications; however, little is known about the impact of disease management programs on medical costs for patients with diabetes. In 2001, economic costs associated with diabetes totaled $100 billion, and indirect costs totaled $54 billion.

Objective. To compare outcomes of nurse case management using treatment algorithms with conventional primary care for glycemic control and cardiovascular risk factors in type 2 diabetic patients in a low-income Mexican American community-based setting, and to compare the cost effectiveness of the two programs. Patient compliance was also assessed.

Research design and methods. An observational group comparison to evaluate a treatment intervention for type 2 diabetes management was implemented at three outpatient health facilities in San Antonio, Texas. All eligible type 2 diabetic patients attending the clinics during 1994-1996 became part of the study. Data were obtained from the study database, medical records, hospital accounting, and pharmacy cost lists, and entered into a computerized database. Three groups were compared: a Community Clinic Nurse Case Manager (CC-TA) following treatment algorithms, a University Clinic Nurse Case Manager (UC-TA) following treatment algorithms, and Primary Care Physicians (PCP) following conventional care practices at a Family Practice Clinic. The algorithms provided a disease management model specifically for hyperglycemia, dyslipidemia, hypertension, and microalbuminuria that progressively moved the patient toward ideal goals through adjustments in medication, self-monitoring of blood glucose, meal planning, and reinforcement of diet and exercise. Cost effectiveness of hemoglobin A1c final endpoints was compared.

Results. There were 358 patients analyzed: 106 in the CC-TA group, 170 in the UC-TA group, and 82 in the PCP group. Change in hemoglobin A1c (HbA1c) was the primary outcome measured. HbA1c results at baseline, 6 and 12 months were, for CC-TA, 10.4%, 7.1% and 7.3%; for UC-TA, 10.5%, 7.1% and 7.2%; and for PCP, 10.0%, 8.5% and 8.7%. Mean patient compliance was 81%. Levels of cost effectiveness were significantly different between clinics.

Conclusion. Nurse case management with treatment algorithms significantly improved glycemic control in patients with type 2 diabetes and was more cost effective.
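For illustration of the cost-effectiveness comparison, a simple cost-per-unit-HbA1c-reduction ratio can be computed from the reported 12-month outcomes. The dollar figures below are hypothetical, since the abstract reports no costs.

```python
def cost_effectiveness_ratio(cost_per_patient, hba1c_baseline, hba1c_end):
    """Cost per one-percentage-point reduction in HbA1c.

    The abstract reports that cost effectiveness differed between
    clinics but gives no formula or costs; the figures used below are
    hypothetical, purely to illustrate the comparison.
    """
    return cost_per_patient / (hba1c_baseline - hba1c_end)

# Nurse case management (CC-TA): 10.4% -> 7.3% at 12 months
# Conventional care (PCP):       10.0% -> 8.7% at 12 months
print(cost_effectiveness_ratio(1500.0, 10.4, 7.3))  # hypothetical $/point
print(cost_effectiveness_ratio(1200.0, 10.0, 8.7))  # hypothetical $/point
```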

Relevance: 20.00%

Abstract:

A problem with practical application of Varian's Weak Axiom of Cost Minimization (WACM) is that an observed violation may be due to random variation in the output quantities produced by firms rather than to inefficiency on the part of the firm. In this paper, unlike in Varian (1985), the output rather than the input quantities are treated as random, and an alternative statistical test of the violation of WACM is proposed. We assume that there is no technical inefficiency and provide a test of the hypothesis that an observed violation of WACM is merely due to random variations in the output levels of the firms being compared. We suggest an intuitive approach for specifying the value of the variance of the noise term that is needed for the test. The paper includes an illustrative example utilizing a data set relating to a number of U.S. airlines.
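The deterministic axiom that the paper's statistical test relaxes can be sketched directly: firm i is consistent with cost minimization if no firm producing at least as much output is cheaper at i's own input prices.

```python
import numpy as np

def wacm_violations(W, X, y):
    """Deterministic check of Varian's Weak Axiom of Cost Minimization.

    W : (n, k) input price vectors, one row per firm/observation
    X : (n, k) input quantity vectors
    y : (n,)  output levels

    Firm i violates WACM if some firm j produces at least as much output
    while costing less at i's own prices: w_i . x_j < w_i . x_i.
    The paper's contribution is a statistical version of this check that
    allows noise in y; this sketch is the noise-free axiom only.
    """
    costs = W @ X.T                 # costs[i, j] = w_i . x_j
    own = np.diag(costs)
    violations = []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[j] >= y[i] and costs[i, j] < own[i]:
                violations.append((i, j))
    return violations
```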

Relevance: 20.00%

Abstract:

Digital terrain models (DTMs) typically contain large numbers of postings, from hundreds of thousands to billions. Many algorithms that run on DTMs require topological knowledge of the postings, such as finding nearest neighbors or finding the posting closest to a chosen location. If the postings are arranged irregularly, topological information is costly to compute and to store. This paper offers a practical approach to organizing and searching irregularly-spaced data sets by presenting a collection of efficient algorithms (O(N), O(lg N)) that compute important topological relationships with only a simple supporting data structure. These relationships include finding the postings within a window, locating the posting nearest a point of interest, finding the neighborhood of postings nearest a point of interest, and ordering the neighborhood counter-clockwise. These algorithms depend only on two sorted arrays of two-element tuples, each holding a planimetric coordinate and an integer identification number indicating which posting the coordinate belongs to. There is one array for each planimetric coordinate (eastings and northings). These two arrays cost minimal overhead to create and store, yet permit the data to remain arranged irregularly.
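A sketch of the window query supported by the two sorted arrays just described, using binary search on each coordinate array and intersecting the resulting ID sets; the data layout follows the abstract, but the code itself is illustrative.

```python
import bisect

def build_index(postings):
    """postings: list of (easting, northing) pairs. Returns the two
    sorted arrays of (coordinate, posting_id) tuples described above."""
    east = sorted((e, i) for i, (e, n) in enumerate(postings))
    north = sorted((n, i) for i, (e, n) in enumerate(postings))
    return east, north

def window_query(east, north, e_min, e_max, n_min, n_max):
    """IDs of postings inside the window, via O(lg N) binary searches
    and one set intersection; the postings themselves stay irregular."""
    lo = bisect.bisect_left(east, (e_min, -1))
    hi = bisect.bisect_right(east, (e_max, float('inf')))
    in_e = {i for _, i in east[lo:hi]}
    lo = bisect.bisect_left(north, (n_min, -1))
    hi = bisect.bisect_right(north, (n_max, float('inf')))
    in_n = {i for _, i in north[lo:hi]}
    return in_e & in_n
```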

Relevance: 20.00%

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms for complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting, can account for non-additive interactions between these input variables, and can be used for variable selection without being adversely affected by collinearities.

Random Forests can handle large-scale data sets without rigorous data preprocessing, and it has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhances the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments.

When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is recommended for such high-dimensionality data: one can use Random Forests to select the important variables and then use logistic regression, or Random Forests itself, to estimate the effect sizes of the predictors and to classify new observations.

We also found that mean decrease in accuracy is a more reliable variable ranking measure than mean decrease in Gini.
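One possible reading of the noise-level cut-off idea, sketched with scikit-learn: append synthetic noise probes, fit the forest, and keep only predictors whose importance exceeds the strongest probe. The number of probes and other details are assumptions, though permutation importance matches the thesis's preference for mean decrease in accuracy over Gini.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def select_above_noise(X, y, n_probes=20, random_state=0):
    """Keep predictors whose importance beats every synthetic noise probe.

    Illustrative interpretation of the thesis's cut-off idea, not its
    exact recipe: pure-noise columns provide a data-driven threshold.
    """
    rng = np.random.default_rng(random_state)
    probes = rng.standard_normal((X.shape[0], n_probes))
    Xa = np.hstack([X, probes])
    rf = RandomForestClassifier(n_estimators=500, random_state=random_state)
    rf.fit(Xa, y)
    # Permutation importance approximates mean decrease in accuracy.
    imp = permutation_importance(rf, Xa, y, n_repeats=10,
                                 random_state=random_state).importances_mean
    cutoff = imp[X.shape[1]:].max()       # strongest noise probe
    return np.flatnonzero(imp[:X.shape[1]] > cutoff)
```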

Relevance: 20.00%

Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and will give false-positive results. Although this problem can be effectively dealt with through several approaches, such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects by several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for the millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in big data sets where the number of feature SNPs far exceeds the number of observations.

In this study, we took two steps to achieve this goal. First, we selected 1000 SNPs through an effective filter method; then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method wrapped inside different search algorithms, to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to look at the relationship between each SNP and disease from another point of view.

In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search of small subsets, with one SNP, two SNPs, or three-SNP subsets based on the best 100 composite 2-SNP combinations, can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.

Our results also indicate that HMSS, as a criterion to evaluate the classification ability of a function, can be used on imbalanced data without modifying the original data set, unlike classification accuracy. Our four studies suggest that the Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and its ability to detect the target status is superior to that of traditional LDA in this study.

From our results, the best test HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.

A further genome-wide association study through chi-square tests shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. In the WTCCC data, only two significant SNPs associated with CAD are detected. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.

Although our classification methods can achieve high accuracy in this study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or an efficient computing system, neither of which was available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability; SNPs with good discriminant power are not necessarily causal markers for the disease.
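The HMSS criterion used throughout can be sketched directly; the toy counts below illustrate why it suits imbalanced case-control data better than raw accuracy.

```python
def hmss(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity (HMSS).

    Unlike raw accuracy, HMSS collapses to zero for a classifier that
    always predicts the majority class, so it can be used on imbalanced
    case-control data without resampling the original data set.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    if sens + spec == 0:
        return 0.0
    return 2 * sens * spec / (sens + spec)

# A classifier calling everyone 'control' on a 9:1 data set:
print(hmss(tp=0, fn=10, tn=90, fp=0))   # 0.0, despite 90% accuracy
```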