Biblioteca Digital

877 resultados para Graph Algorithms

New Algorithms for M-Estimation of Multivariate Scatter and Location

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present new algorithms for M-estimators of multivariate scatter and location and for symmetrized M-estimators of multivariate scatter. The new algorithms are considerably faster than currently used fixed-point and related algorithms. The main idea is to utilize a second order Taylor expansion of the target functional and to devise a partial Newton-Raphson procedure. In connection with symmetrized M-estimators we work with incomplete U-statistics to accelerate our procedures initially.

Apparatus vs. Graph: New Models and Interfaces for Text

Relevância:

20.00% 20.00%

Publicador:

MASCG: Multi-Atlas Segmentation Constrained Graph method for accurate segmentation of hip CT images

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper addresses the issue of fully automatic segmentation of a hip CT image with the goal to preserve the joint structure for clinical applications in hip disease diagnosis and treatment. For this purpose, we propose a Multi-Atlas Segmentation Constrained Graph (MASCG) method. The MASCG method uses multi-atlas based mesh fusion results to initialize a bone sheetness based multi-label graph cut for an accurate hip CT segmentation which has the inherent advantage of automatic separation of the pelvic region from the bilateral proximal femoral regions. We then introduce a graph cut constrained graph search algorithm to further improve the segmentation accuracy around the bilateral hip joint regions. Taking manual segmentation as the ground truth, we evaluated the present approach on 30 hip CT images (60 hips) with a 15-fold cross validation. When the present approach was compared to manual segmentation, an average surface distance error of 0.30 mm, 0.29 mm, and 0.30 mm was found for the pelvis, the left proximal femur, and the right proximal femur, respectively. A further look at the bilateral hip joint regions demonstrated an average surface distance error of 0.16 mm, 0.21 mm and 0.20 mm for the acetabulum, the left femoral head, and the right femoral head, respectively.

Shallow Dialogue Processing Using Machine Learning Algorithms (or not)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a shallow dialogue analysis model, aimed at human-human dialogues in the context of staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, or decision tree classifiers for discourse markers. A rule-based approach is proposed for solving cross-modal references to meeting documents. The methods are trained and evaluated thanks to a common data set and annotation format. The integration of the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.

False normal Lung Clearance Index in infants with cystic fibrosis due to software algorithms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND Lung clearance index (LCI), a marker of ventilation inhomogeneity, is elevated early in children with cystic fibrosis (CF). However, in infants with CF, LCI values are found to be normal, although structural lung abnormalities are often detectable. We hypothesized that this discrepancy is due to inadequate algorithms of the available software package. AIM Our aim was to challenge the validity of these software algorithms. METHODS We compared multiple breath washout (MBW) results of current software algorithms (automatic modus) to refined algorithms (manual modus) in 17 asymptomatic infants with CF, and 24 matched healthy term-born infants. The main difference between these two analysis methods lies in the calculation of the molar mass differences that the system uses to define the completion of the measurement. RESULTS In infants with CF the refined manual modus revealed clearly elevated LCI above 9 in 8 out of 35 measurements (23%), all showing LCI values below 8.3 using the automatic modus (paired t-test comparing the means, P < 0.001). Healthy infants showed normal LCI values using both analysis methods (n = 47, paired t-test, P = 0.79). The most relevant reason for false normal LCI values in infants with CF using the automatic modus was the incorrect recognition of the end-of-test too early during the washout. CONCLUSION We recommend the use of the manual modus for the analysis of MBW outcomes in infants in order to obtain more accurate results. This will allow appropriate use of infant lung function results for clinical and scientific purposes.

Decreasing Proportion of Recent Infections among Newly Diagnosed HIV-1 Cases in Switzerland, 2008 to 2013 Based on Line-Immunoassay-Based Algorithms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: HIV surveillance requires monitoring of new HIV diagnoses and differentiation of incident and older infections. In 2008, Switzerland implemented a system for monitoring incident HIV infections based on the results of a line immunoassay (Inno-Lia) mandatorily conducted for HIV confirmation and type differentiation (HIV-1, HIV-2) of all newly diagnosed patients. Based on this system, we assessed the proportion of incident HIV infection among newly diagnosed cases in Switzerland during 2008-2013. METHODS AND RESULTS: Inno-Lia antibody reaction patterns recorded in anonymous HIV notifications to the federal health authority were classified by 10 published algorithms into incident (up to 12 months) or older infections. Utilizing these data, annual incident infection estimates were obtained in two ways, (i) based on the diagnostic performance of the algorithms and utilizing the relationship 'incident = true incident + false incident', (ii) based on the window-periods of the algorithms and utilizing the relationship 'Prevalence = Incidence x Duration'. From 2008-2013, 3'851 HIV notifications were received. Adult HIV-1 infections amounted to 3'809 cases, and 3'636 of them (95.5%) contained Inno-Lia data. Incident infection totals calculated were similar for the performance- and window-based methods, amounting on average to 1'755 (95% confidence interval, 1588-1923) and 1'790 cases (95% CI, 1679-1900), respectively. More than half of these were among men who had sex with men. Both methods showed a continuous decline of annual incident infections 2008-2013, totaling -59.5% and -50.2%, respectively. The decline of incident infections continued even in 2012, when a 15% increase in HIV notifications had been observed. This increase was entirely due to older infections. Overall declines 2008-2013 were of similar extent among the major transmission groups. CONCLUSIONS: Inno-Lia based incident HIV-1 infection surveillance proved useful and reliable. It represents a free, additional public health benefit of the use of this relatively costly test for HIV confirmation and type differentiation.

PSHARE: Stata module to compute and graph percentile shares

Relevância:

20.00% 20.00%

Publicador:

Resumo:

-pshare- computes and graphs percentile shares from individual level data. Percentile shares are often used in inequality research to study the distribution of income or wealth. They are defined as differences between Lorenz ordinates of the outcome variable. Technically, the observations are sorted in increasing order of the outcome variable and the specified percentiles are computed from the running sum of the outcomes. Percentile shares are then computed as differences between percentiles, divided by total outcome. pshare requires moremata to be installed on the system; see ssc describe moremata.

New algorithms for automated phylogenetic reconstruction using artificial intelligence and data mining techniques

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Academic and industrial research in the late 90s have brought about an exponential explosion of DNA sequence data. Automated expert systems are being created to help biologists to extract patterns, trends and links from this ever-deepening ocean of information. Two such systems aimed on retrieving and subsequently utilizing phylogenetically relevant information have been developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process. ^ Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between the fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. An intelligent compromise algorithm that relies on a flexible “beam” search principle from the Artificial Intelligence domain and uses the pre-computed local topology reliability information to adjust the beam search space continuously is described in the second chapter of this dissertation. ^ However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question either method might prove to be superior. Therefore, it is difficult (even for an expert) to tell a priori which phylogenetic reconstruction method—distance-based, ML or maybe maximum parsimony (MP)—should be chosen for any particular data set. ^ A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically “difficult” data set more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen. However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (potentially a costly mistake, both in terms of computational expenses and in terms of reconstruction accuracy.) ^ Chapter III of this dissertation details a phylogenetic reconstruction expert system that selects a superior proper method automatically. It uses a classifier (a Decision Tree-inducing algorithm) to map a new data set to the proper phylogenetic reconstruction method. ^

Outcomes and cost effectiveness of treating type 2 diabetes with a nurse case manager following treatment algorithms versus primary care physicians

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background. Diabetes places a significant burden on the health care system. Reduction in blood glucose levels (HbA1c) reduces the risk of complications; however, little is known about the impact of disease management programs on medical costs for patients with diabetes. In 2001, economic costs associated with diabetes totaled $100 billion, and indirect costs totaled $54 billion. ^ Objective. To compare outcomes of nurse case management by treatment algorithms with conventional primary care for glycemic control and cardiovascular risk factors in type 2 diabetic patients in a low-income Mexican American community-based setting, and to compare the cost effectiveness of the two programs. Patient compliance was also assessed. ^ Research design and methods. An observational group-comparison to evaluate a treatment intervention for type 2 diabetes management was implemented at three out-patient health facilities in San Antonio, Texas. All eligible type 2 diabetic patients attending the clinics during 1994–1996 became part of the study. Data were obtained from the study database, medical records, hospital accounting, and pharmacy cost lists, and entered into a computerized database. Three groups were compared: a Community Clinic Nurse Case Manager (CC-TA) following treatment algorithms, a University Clinic Nurse Case Manager (UC-TA) following treatment algorithms, and Primary Care Physicians (PCP) following conventional care practices at a Family Practice Clinic. The algorithms provided a disease management model specifically for hyperglycemia, dyslipidemia, hypertension, and microalbuminuria that progressively moved the patient toward ideal goals through adjustments in medication, self-monitoring of blood glucose, meal planning, and reinforcement of diet and exercise. Cost effectiveness of hemoglobin AI, final endpoints was compared. ^ Results. There were 358 patients analyzed: 106 patients in CC-TA, 170 patients in UC-TA, and 82 patients in PCP groups. Change in hemoglobin A1c (HbA1c) was the primary outcome measured. HbA1c results were presented at baseline, 6 and 12 months for CC-TA (10.4%, 7.1%, 7.3%), UC-TA (10.5%, 7.1%, 7.2%), and PCP (10.0%, 8.5%, 8.7%). Mean patient compliance was 81%. Levels of cost effectiveness were significantly different between clinics. ^ Conclusion. Nurse case management with treatment algorithms significantly improved glycemic control in patients with type 2 diabetes, and was more cost effective. ^

Fast Algorithms Using Minimal Data Structures for Common Topological Relationships in Large, Irregularly-spaced Topographic Data Sets

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Digital terrain models (DTM) typically contain large numbers of postings, from hundreds of thousands to billions. Many algorithms that run on DTMs require topological knowledge of the postings, such as finding nearest neighbors, finding the posting closest to a chosen location, etc. If the postings are arranged irregu- larly, topological information is costly to compute and to store. This paper offers a practical approach to organizing and searching irregularly-space data sets by presenting a collection of efficient algorithms (O(N),O(lgN)) that compute important topological relationships with only a simple supporting data structure. These relationships include finding the postings within a window, locating the posting nearest a point of interest, finding the neighborhood of postings nearest a point of interest, and ordering the neighborhood counter-clockwise. These algorithms depend only on two sorted arrays of two-element tuples, holding a planimetric coordinate and an integer identification number indicating which posting the coordinate belongs to. There is one array for each planimetric coordinate (eastings and northings). These two arrays cost minimal overhead to create and store but permit the data to remain arranged irregularly.

TIME SERIES ANALYSIS AS INPUT FOR PREDICTIVE MODELING: PREDICTING CARDIAC ARREST IN A PEDIATRIC INTENSIVE CARE UNIT

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU" lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one value per variable paradigm and is widely employed in a host of clinical models and tools. These are often represented by a number present in a given cell of a table. Clinical latent features derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data. These are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood that a representation of the time series data elements is produced that is able to distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU" provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are unfeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model. Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled: "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit" presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% by including the trend analysis. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (ie, without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.

A benchmark dataset for the validation of MERIS and MODIS ocean colour turbidity and PAR attenuation algorithms using autonomous buoy data

Relevância:

20.00% 20.00%

Publicador:

Time series of ZOOSCAN images and results of zooplankton samples at Villefranche from 1966 to 2003

Relevância:

20.00% 20.00%

Publicador:

Resumo:

ZooScan with ZooProcess and Plankton Identifier (PkID) software is an integrated analysis system for acquisition and classification of digital zooplankton images from preserved zooplankton samples. Zooplankton samples are digitized by the ZooScan and processed by ZooProcess and PkID in order to detect, enumerate, measure and classify the digitized objects. Here we present a semi-automatic approach that entails automated classification of images followed by manual validation, which allows rapid and accurate classification of zooplankton and abiotic objects. We demonstrate this approach with a biweekly zooplankton time series from the Bay of Villefranche-sur-mer, France. The classification approach proposed here provides a practical compromise between a fully automatic method with varying degrees of bias and a manual but accurate classification of zooplankton. We also evaluate the appropriate number of images to include in digital learning sets and compare the accuracy of six classification algorithms. We evaluate the accuracy of the ZooScan for automated measurements of body size and present relationships between machine measures of size and C and N content of selected zooplankton taxa. We demonstrate that the ZooScan system can produce useful measures of zooplankton abundance, biomass and size spectra, for a variety of ecological studies.

The CoastColour Round Robin datasets: a database to evaluate the performance of algorithms for the retrieval of water quality parameters in coastal waters

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The CoastColour project Round Robin (CCRR) project (http://www.coastcolour.org) funded by the European Space Agency (ESA) was designed to bring together a variety of reference datasets and to use these to test algorithms and assess their accuracy for retrieving water quality parameters. This information was then developed to help end-users of remote sensing products to select the most accurate algorithms for their coastal region. To facilitate this, an inter-comparison of the performance of algorithms for the retrieval of in-water properties over coastal waters was carried out. The comparison used three types of datasets on which ocean colour algorithms were tested. The description and comparison of the three datasets are the focus of this paper, and include the Medium Resolution Imaging Spectrometer (MERIS) Level 2 match-ups, in situ reflectance measurements and data generated by a radiative transfer model (HydroLight). The datasets mainly consisted of 6,484 marine reflectance associated with various geometrical (sensor viewing and solar angles) and sky conditions and water constituents: Total Suspended Matter (TSM) and Chlorophyll-a (CHL) concentrations, and the absorption of Coloured Dissolved Organic Matter (CDOM). Inherent optical properties were also provided in the simulated datasets (5,000 simulations) and from 3,054 match-up locations. The distributions of reflectance at selected MERIS bands and band ratios, CHL and TSM as a function of reflectance, from the three datasets are compared. Match-up and in situ sites where deviations occur are identified. The distribution of the three reflectance datasets are also compared to the simulated and in situ reflectances used previously by the International Ocean Colour Coordinating Group (IOCCG, 2006) for algorithm testing, showing a clear extension of the CCRR data which covers more turbid waters.

Building the View Graph of a Category by Exploiting Image Realism

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a weakly supervised method to arrange images of a given category based on the relative pose between the camera and the object in the scene. Relative poses are points on a sphere centered at the object in a given canonical pose, which we call object viewpoints. Our method builds a graph on this sphere by assigning images with similar viewpoint to the same node and by connecting nodes if they are related by a small rotation. The key idea is to exploit a large unlabeled dataset to validate the likelihood of dominant 3D planes of the object geometry. A number of 3D plane hypotheses are evaluated by applying small 3D rotations to each hypothesis and by measuring how well the deformed images match other images in the dataset. Correct hypotheses will result in deformed images that correspond to plausible views of the object, and thus will likely match well other images in the same category. The identified 3D planes are then used to compute affinities between images related by a change of viewpoint. We then use the affinities to build a view graph via a greedy method and the maximum spanning tree.

«
1
2
...
51
52
53
54
55
56
57
58
59
»