10 results for standard batch algorithms
at Duke University
Abstract:
With the popularization of GPS-enabled devices such as mobile phones, location data are becoming available at an unprecedented scale. The locations may be collected from many different sources such as vehicles moving around a city, user check-ins in social networks, and geo-tagged micro-blogging photos or messages. Besides the longitude and latitude, each location record may also have a timestamp and additional information such as the name of the location. Time-ordered sequences of these locations form trajectories, which together contain useful high-level information about people's movement patterns.
The first part of this thesis focuses on a few geometric problems motivated by the matching and clustering of trajectories. We first give a new algorithm for computing a matching between a pair of curves under existing models such as dynamic time warping (DTW). The algorithm is more efficient than standard dynamic programming algorithms both theoretically and practically. We then propose a new matching model for trajectories that avoids the drawbacks of existing models. For trajectory clustering, we present an algorithm that computes clusters of subtrajectories, which correspond to common movement patterns. We also consider trajectories of check-ins, and propose a statistical generative model, which identifies check-in clusters as well as the transition patterns between the clusters.
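For context on the matching models mentioned above, here is a minimal sketch (not the thesis's faster algorithm) of the standard O(nm) dynamic program for the DTW cost of two point sequences; the Euclidean local cost and the function name are illustrative choices, not taken from the thesis.

    import math

    def dtw_distance(P, Q):
        """Standard O(n*m) dynamic program for the DTW cost of two point
        sequences. Euclidean local cost is an illustrative choice."""
        n, m = len(P), len(Q)
        INF = float("inf")
        # dp[i][j] = DTW cost of matching the prefixes P[:i] and Q[:j]
        dp = [[INF] * (m + 1) for _ in range(n + 1)]
        dp[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = math.dist(P[i - 1], Q[j - 1])
                # extend an optimal matching of one of the three smaller prefixes
                dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
        return dp[n][m]

    print(dtw_distance([(0, 0), (1, 0), (2, 0)], [(0, 1), (2, 1)]))

The thesis's contribution is precisely to improve on this quadratic baseline, both in theory and in practice.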
The second part of the thesis considers the problem of covering shortest paths in a road network, motivated by an EV charging station placement problem. More specifically, a subset of vertices in the road network is selected for placing charging stations so that every shortest path contains enough charging stations and can be traveled by an EV without draining the battery. We first introduce a general technique for the geometric set cover problem. This technique leads to near-linear-time approximation algorithms, which are the state of the art for this problem in either running time or approximation ratio. We then use this technique to develop a near-linear-time algorithm for this shortest-path cover problem.
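The abstract does not spell out the near-linear-time technique, so as orientation only, here is the textbook greedy heuristic for set cover (a ln n approximation) that faster geometric algorithms are typically compared against; all names in the sketch are illustrative.

    def greedy_set_cover(universe, sets):
        """Classic greedy set cover: repeatedly pick the candidate set that
        covers the most still-uncovered elements. Shown as the generic
        baseline only, not the thesis's near-linear-time method."""
        uncovered = set(universe)
        chosen = []
        while uncovered:
            # pick the set with maximum marginal coverage
            best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
            if not sets[best] & uncovered:
                raise ValueError("universe cannot be covered")
            chosen.append(best)
            uncovered -= sets[best]
        return chosen

    # toy example: cover {1..5} with three candidate "station" sets
    print(greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {3, 4}, {4, 5}]))

Exploiting geometric structure is what allows the thesis to improve on the running time of such generic approaches.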
Abstract:
PURPOSE: Mammography is known to be one of the most difficult radiographic exams to interpret. Mammography has important limitations, including the superposition of normal tissue that can obscure a mass, chance alignment of normal tissue to mimic a true lesion, and the inability to derive volumetric information. It has been shown that stereomammography can overcome these deficiencies by showing that layers of normal tissue lie at different depths. If standard stereomammography (i.e., a single stereoscopic pair consisting of two projection images) can significantly improve lesion detection, how will multiview stereoscopy (MVS), where many projection images are used, compare to mammography? The aim of this study was to assess the relative performance of MVS compared to mammography for breast mass detection. METHODS: The MVS image sets consisted of the 25 raw projection images acquired over an arc of approximately 45 degrees using a Siemens prototype breast tomosynthesis system. The mammograms were acquired using a commercial Siemens FFDM system. The raw data were taken from both of these systems for 27 cases, and realistic simulated mass lesions were added to duplicates of the 27 images at the same local contrast. The images with lesions (27 mammography and 27 MVS) and the images without lesions (27 mammography and 27 MVS) were then postprocessed to provide comparable and representative image appearance across the two modalities. All 108 image sets were shown to five full-time breast imaging radiologists in random order on a state-of-the-art stereoscopic display. The observers were asked to give a confidence rating for each image (0 for lesion definitely not present, 100 for lesion definitely present). The ratings were then compiled and processed using ROC and variance analysis. RESULTS: The mean AUC for the five observers was 0.614 +/- 0.055 for mammography and 0.778 +/- 0.052 for multiview stereoscopy. The difference of 0.164 +/- 0.065 was statistically significant with a p-value of 0.0148. CONCLUSIONS: The differences in the AUCs and the p-value suggest that multiview stereoscopy has a statistically significant advantage over mammography in the detection of simulated breast masses. This highlights the dominance of anatomical noise compared to quantum noise for breast mass detection. It also shows that a significant improvement in lesion detection can be achieved with MVS without any of the artifacts associated with tomosynthesis.
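As a hedged illustration of the ROC analysis (not the study's actual software), a per-reader AUC can be computed directly from the 0-100 confidence ratings using the Mann-Whitney interpretation: the probability that a randomly chosen lesion-present image receives a higher rating than a lesion-absent one, with ties counted as one half. The ratings below are invented for illustration.

    def auc_from_ratings(present, absent):
        """Empirical AUC from confidence ratings via the Mann-Whitney statistic.

        present: ratings given to lesion-present images
        absent:  ratings given to lesion-absent images
        Ties contribute 1/2. Illustrative only."""
        wins = 0.0
        for p in present:
            for a in absent:
                if p > a:
                    wins += 1.0
                elif p == a:
                    wins += 0.5
        return wins / (len(present) * len(absent))

    print(auc_from_ratings([80, 65, 90, 55], [30, 65, 20, 50]))

Averaging such per-reader AUCs and testing the paired difference is, in spirit, what the reported 0.614 vs. 0.778 comparison does.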
Abstract:
BACKGROUND: Writing plays a central role in the communication of scientific ideas and is therefore a key aspect of researcher education, ultimately determining the success and long-term sustainability of researchers' careers. Despite the growing popularity of e-learning, we are not aware of any existing study comparing on-line vs. traditional classroom-based methods for teaching scientific writing. METHODS: Forty-eight participants from medical, nursing, and physiotherapy backgrounds in the US and Brazil were randomly assigned to two groups (n = 24 per group): an on-line writing workshop group (on-line group), in which participants used virtual communication, Google Docs, and standard writing templates, and a standard writing guidance group (standard group), in which participants received standard instruction without the aid of virtual communication and writing templates. Two outcomes were evaluated: manuscript quality, assessed as the primary outcome using scores obtained on a six-subgroup scale, and satisfaction, measured on a Likert scale. To control for observer variability, inter-observer reliability was assessed using Fleiss's kappa. A post-hoc analysis comparing rates of communication between mentors and participants was performed. Nonparametric tests were used to assess intervention efficacy. RESULTS: Excellent inter-observer reliability was found among the three reviewers, with an Intraclass Correlation Coefficient (ICC) agreement = 0.931882 and ICC consistency = 0.932485. The on-line group had better overall manuscript quality (p = 0.0017; SSQSavg score 75.3 +/- 14.21, ranging from 37 to 94) than the standard group (47.27 +/- 14.64, ranging from 20 to 72). Participant satisfaction was higher in the on-line group (4.3 +/- 0.73) than in the standard group (3.09 +/- 1.11) (p = 0.001). The standard group also had fewer communication events than the on-line group (0.91 +/- 0.81 vs. 2.05 +/- 1.23; p = 0.0219). CONCLUSION: Our protocol for on-line scientific writing instruction is better than standard face-to-face instruction in terms of writing quality and student satisfaction. Future studies should evaluate the protocol's efficacy in larger longitudinal cohorts involving participants from different language backgrounds.
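A minimal sketch of the kind of nonparametric group comparison described above, assuming a Mann-Whitney U test on the quality scores; the numbers below are invented for illustration and are not the study's data.

    from scipy.stats import mannwhitneyu

    # Hypothetical quality scores for illustration only
    online_scores = [75, 82, 68, 90, 77, 85]
    standard_scores = [48, 55, 40, 61, 52, 45]

    # Two-sided Mann-Whitney U test comparing the two independent groups
    stat, p_value = mannwhitneyu(online_scores, standard_scores, alternative="two-sided")
    print(f"U = {stat}, p = {p_value:.4f}")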
Abstract:
BACKGROUND: With the globalization of clinical trials, a growing emphasis has been placed on the standardization of the workflow in order to ensure the reproducibility and reliability of the overall trial. Despite the importance of workflow evaluation, to our knowledge no previous studies have attempted to adapt existing modeling languages to standardize the representation of clinical trials. Unified Modeling Language (UML) is a computational language that can be used to model operational workflow, and a UML profile can be developed to standardize UML models within a given domain. This paper's objective is to develop a UML profile to extend the UML Activity Diagram schema into the clinical trials domain, defining a standard representation for clinical trial workflow diagrams in UML. METHODS: Two Brazilian clinical trial sites in rheumatology and oncology were examined to model their workflow and collect time-motion data. UML modeling was conducted in Eclipse, and a UML profile was developed to incorporate information used in discrete event simulation software. RESULTS: Ethnographic observation revealed bottlenecks in workflow: these included tasks requiring full commitment of CRCs, transferring notes from paper to computers, deviations from standard operating procedures, and conflicts between different IT systems. Time-motion analysis revealed that nurses' activities took up the most time in the workflow and contained a high frequency of shorter duration activities. Administrative assistants performed more activities near the beginning and end of the workflow. Overall, clinical trial tasks had a greater frequency than clinic routines or other general activities. CONCLUSIONS: This paper describes a method for modeling clinical trial workflow in UML and standardizing these workflow diagrams through a UML profile. In the increasingly global environment of clinical trials, the standardization of workflow modeling is a necessary precursor to conducting a comparative analysis of international clinical trials workflows.
Abstract:
Forward stimulated Brillouin scattering (FSBS) is observed in a standard 2-km-long highly nonlinear fiber. FSBS arising from multiple radially guided acoustic resonances is observed at frequencies up to the gigahertz range. The tight confinement of the optical and acoustic fields enhances the interaction and results in a large gain coefficient of 34.7 W(-1) at a frequency of 933.8 MHz. We also find that the profiles on the anti-Stokes side of the pump beam have asymmetric lineshapes; we show that this asymmetry is due to interference between FSBS and the optical Kerr effect. The measured FSBS resonance linewidths are found to increase linearly with the acoustic frequency. Based on this scaling, we conclude that the dominant contribution to the linewidth is surface damping due to the fiber jacket and structural nonuniformities along the fiber.
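As a hedged aside (a common schematic picture, not necessarily the paper's analysis), interference of this kind is often modeled by adding the frequency-flat Kerr contribution to the Lorentzian FSBS response, which makes the measured sideband profile Fano-like and asymmetric:

    % Schematic interference model (illustrative assumption, not from the paper)
    \gamma(\Omega) \propto \gamma_{\mathrm{K}}
        + \frac{g_0\,\Gamma/2}{\Omega_m - \Omega - i\,\Gamma/2},
    \qquad
    S(\Omega) \propto \left|\gamma(\Omega)\right|^{2}

Here gamma_K is the real Kerr term, Omega_m an acoustic resonance frequency (e.g., 2*pi x 933.8 MHz), and Gamma the resonance linewidth; the real background adds a dispersive cross term to the otherwise symmetric Lorentzian, skewing the lineshape.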
Abstract:
Proteins are essential components of cells and are crucial for catalyzing reactions, signaling, recognition, motility, recycling, and structural stability. This diversity of function suggests that nature is only scratching the surface of protein functional space. Protein function is determined by structure, which in turn is determined predominantly by amino acid sequence. Protein design aims to explore protein sequence and conformational space to design novel proteins with new or improved function. The vast number of possible protein sequences makes exploring the space a challenging problem.
Computational structure-based protein design (CSPD) allows for the rational design of proteins. Because of the large search space, CSPD methods must balance search accuracy and modeling simplifications. We have developed algorithms that allow for the accurate and efficient search of protein conformational space. Specifically, we focus on algorithms that maintain provability, account for protein flexibility, and use ensemble-based rankings. We present several novel algorithms for incorporating improved flexibility into CSPD with continuous rotamers. We applied these algorithms to two biomedically important design problems. We designed peptide inhibitors of the cystic fibrosis agonist CAL that were able to restore function of the vital cystic fibrosis protein CFTR. We also designed improved HIV antibodies and nanobodies to combat HIV infections.
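As background for the provable-search claim (my gloss; the abstract itself gives no formulas), rotamer-based design algorithms in this setting typically rely on pruning criteria such as classical dead-end elimination (DEE): a rotamer r at residue position i (written i_r) can be discarded whenever some competing rotamer i_t at the same position satisfies

    % Classical DEE pruning criterion (background, not quoted from the abstract)
    E(i_r) + \sum_{j \ne i} \min_{s} E(i_r, j_s)
        \;>\; E(i_t) + \sum_{j \ne i} \max_{s} E(i_t, j_s)

where E(i_r) is the self energy of rotamer i_r and E(i_r, j_s) is its pairwise energy with rotamer j_s at another residue; in that case i_r provably cannot be part of the minimum-energy conformation. Continuous-rotamer and ensemble-based extensions of the kind the abstract describes replace these fixed energies with provable bounds over continuous side-chain flexibility, keeping the search provable while adding flexibility.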
Abstract:
© 2005-2012 IEEE. Within industrial automation systems, three-dimensional (3-D) vision provides very useful feedback information for the autonomous operation of various manufacturing equipment (e.g., industrial robots, material handling devices, assembly systems, and machine tools). The hardware performance of contemporary 3-D scanning devices is suitable for online utilization. However, the bottleneck is the lack of real-time algorithms for the recognition of geometric primitives (e.g., planes and natural quadrics) in a scanned point cloud. One of the most important and most frequently encountered geometric primitives in various engineering tasks is the plane. In this paper, we propose a new fast one-pass algorithm for the recognition (segmentation and fitting) of planar segments from a point cloud. To effectively segment planar regions, we exploit the orthogonality of certain wavelets to polynomial functions, as well as their sensitivity to abrupt changes. After segmentation of the planar regions, we estimate the parameters of the corresponding planes using standard fitting procedures. For point cloud structuring, a z-buffer algorithm with a mesh-triangle representation in barycentric coordinates is employed. The proposed recognition method is tested and experimentally validated in several real-world case studies.
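For the "standard fitting procedures" step, a least-squares plane fit to one segmented point set can be written as an SVD of the centered points; this is a generic sketch, not necessarily the authors' implementation.

    import numpy as np

    def fit_plane(points):
        """Least-squares plane fit: returns (centroid, unit normal).

        points: (N, 3) array of 3-D points from one planar segment.
        The normal is the right singular vector associated with the
        smallest singular value of the centered point set."""
        P = np.asarray(points, dtype=float)
        centroid = P.mean(axis=0)
        # SVD of the centered points; rows of Vt are the principal directions
        _, _, Vt = np.linalg.svd(P - centroid, full_matrices=False)
        normal = Vt[-1]                      # direction of least variance
        return centroid, normal / np.linalg.norm(normal)

    c, n = fit_plane([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0.01]])
    print(c, n)

The fitted centroid and normal would give the plane parameters for each segment produced by the segmentation step.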
Abstract:
Scheduling a set of jobs over a collection of machines to optimize a certain quality-of-service measure is one of the most important research topics in both computer science theory and practice. In this thesis, we design algorithms that optimize the flow-time (or delay) of jobs for scheduling problems that arise in a wide range of applications. We consider the classical model of unrelated machine scheduling and resolve several long-standing open problems; we introduce new models that capture the novel algorithmic challenges of scheduling jobs in data centers or large clusters; we study the effect of selfish behavior in distributed and decentralized environments; and we design algorithms that strive to balance energy consumption and performance.
The technically interesting aspect of our work is the surprising connections we establish between approximation and online algorithms, economics, game theory, and queuing theory. It is the interplay of ideas from these different areas that lies at the heart of most of the algorithms presented in this thesis.
The main contributions of the thesis can be placed in one of the following categories.
1. Classical Unrelated Machine Scheduling: We give the first polylogarithmic approximation algorithms for minimizing the average flow-time and minimizing the maximum flow-time in the offline setting. In the online and non-clairvoyant setting, we design the first non-clairvoyant algorithm for minimizing the weighted flow-time in the resource augmentation model. Our work introduces the iterated rounding technique for offline flow-time optimization and gives the first framework for analyzing non-clairvoyant algorithms for unrelated machines.
2. Polytope Scheduling Problem: To capture the multidimensional nature of the scheduling problems that arise in practice, we introduce the Polytope Scheduling Problem (PSP). The PSP generalizes almost all classical scheduling models, and also captures hitherto unstudied scheduling problems such as routing multi-commodity flows, routing multicast (video-on-demand) trees, and multi-dimensional resource allocation. We design several competitive algorithms for the PSP and its variants for the objectives of minimizing flow-time and completion time. Our work establishes many interesting connections between scheduling and market equilibrium concepts, between fairness and non-clairvoyant scheduling, and between the queuing-theoretic notion of stability and resource augmentation analysis.
3. Energy Efficient Scheduling: We give the first non-clairvoyant algorithm for minimizing the total flow-time + energy in the online and resource augmentation model for the most general setting of unrelated machines.
4. Selfish Scheduling: We study the effect of selfish behavior in scheduling and routing problems. We define a fairness index for scheduling policies called bounded stretch, and show that for the objective of minimizing the average (weighted) completion time, policies with small stretch lead to equilibrium outcomes with a small price of anarchy. Our work gives the first linear/convex programming duality-based framework for bounding the price of anarchy for general equilibrium concepts such as coarse correlated equilibrium.
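To make the flow-time objective in the abstract above concrete, here is a tiny single-machine illustration (unrelated to the thesis's algorithms) that computes total flow time, i.e., the sum over jobs of completion time minus release time, under the shortest-remaining-processing-time (SRPT) rule; all names are illustrative.

    def total_flow_time_srpt(jobs):
        """jobs: list of (release_time, processing_time). Returns total flow
        time (sum of completion - release) under SRPT on a single machine."""
        t = 0.0
        remaining = {i: p for i, (r, p) in enumerate(jobs)}
        completion = {}
        while remaining:
            avail = [i for i in remaining if jobs[i][0] <= t]
            if not avail:                                 # idle until next release
                t = min(jobs[i][0] for i in remaining)
                continue
            i = min(avail, key=lambda k: remaining[k])    # shortest remaining work
            # run job i until it finishes or the next release arrives
            next_rel = min((jobs[k][0] for k in remaining if jobs[k][0] > t),
                           default=float("inf"))
            run = min(remaining[i], next_rel - t)
            t += run
            remaining[i] -= run
            if remaining[i] == 0:
                completion[i] = t
                del remaining[i]
        return sum(completion[i] - jobs[i][0] for i in completion)

    print(total_flow_time_srpt([(0, 3), (1, 1), (2, 4)]))  # prints 11.0

Weighted, multi-machine, non-clairvoyant, and energy-aware variants of this objective are what the results above address.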
Abstract:
Determination of copy number variants (CNVs) inferred from genome-wide single nucleotide polymorphism arrays has shown increasing utility in genetic variant disease associations. Several CNV detection methods are available, but differences in CNV call thresholds and characteristics exist. We evaluated the relative performance of seven methods: circular binary segmentation, CNVFinder, cnvPartition, gain and loss of DNA, Nexus algorithms, PennCNV and QuantiSNP. Tested data included real and simulated Illumina HumanHap550 data from the Singapore cohort study of the risk factors for Myopia (SCORM) and simulated data from Affymetrix 6.0 and platform-independent distributions. The normalized singleton ratio (NSR) is proposed as a metric for parameter optimization before enacting full analysis. We used 10 SCORM samples to optimize parameter settings for each method and then evaluated method performance at optimal parameters using 100 SCORM samples. Statistical power, false positive rates, and receiver operating characteristic (ROC) curve residuals were evaluated by simulation studies. Optimal parameters, as determined by NSR and ROC curve residuals, were consistent across datasets. QuantiSNP outperformed the other methods based on ROC curve residuals over most datasets. Nexus Rank and SNPRank have low specificity and high power. Nexus Rank calls oversized CNVs. PennCNV detects one of the smallest numbers of CNVs.
Abstract:
PURPOSE: X-ray computed tomography (CT) is widely used, both clinically and preclinically, for fast, high-resolution anatomic imaging; however, compelling opportunities exist to expand its use in functional imaging applications. For instance, spectral information combined with nanoparticle contrast agents enables quantification of tissue perfusion levels, while temporal information details cardiac and respiratory dynamics. The authors propose and demonstrate a projection acquisition and reconstruction strategy for 5D CT (3D + dual energy + time), which recovers spectral and temporal information without substantially increasing radiation dose or sampling time relative to anatomic imaging protocols. METHODS: The authors approach the 5D reconstruction problem within the framework of low-rank and sparse matrix decomposition. Unlike previous work on rank-sparsity constrained CT reconstruction, the authors establish an explicit rank-sparse signal model to describe the spectral and temporal dimensions. The spectral dimension is represented as a well-sampled time- and energy-averaged image plus regularly undersampled principal components describing the spectral contrast. The temporal dimension is represented as the same time- and energy-averaged reconstruction plus contiguous, spatially sparse, and irregularly sampled temporal contrast images. Using a nonlinear, image-domain filtration approach that the authors refer to as rank-sparse kernel regression, they transfer image structure from the well-sampled time- and energy-averaged reconstruction to the spectral and temporal contrast images. This regularization strategy strictly constrains the reconstruction problem while approximately separating the temporal and spectral dimensions. Separability results in a highly compressed representation for the 5D data in which projections are shared between the temporal and spectral reconstruction subproblems, enabling substantial undersampling. The authors solve the 5D reconstruction problem using the split Bregman method and GPU-based implementations of backprojection, reprojection, and kernel regression. Using a preclinical mouse model, the authors apply the proposed algorithm to study myocardial injury following radiation treatment of breast cancer. RESULTS: Quantitative 5D simulations are performed using the MOBY mouse phantom. Twenty data sets (ten cardiac phases, two energies) are reconstructed with 88 μm isotropic voxels from 450 total projections acquired over a single 360° rotation. In vivo 5D myocardial injury data sets acquired in two mice injected with gold and iodine nanoparticles are also reconstructed with 20 data sets per mouse using the same acquisition parameters (dose: ∼60 mGy). For both the simulations and the in vivo data, the reconstruction quality is sufficient to perform material decomposition into gold and iodine maps to localize the extent of myocardial injury (gold accumulation) and to measure cardiac functional metrics (vascular iodine). Their 5D CT imaging protocol represents a 95% reduction in radiation dose per cardiac phase and energy and a 40-fold decrease in projection sampling time relative to their standard imaging protocol. CONCLUSIONS: Their 5D CT data acquisition and reconstruction protocol efficiently exploits the rank-sparse nature of spectral and temporal CT data to provide high-fidelity reconstruction results without increased radiation dose or sampling time.
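For orientation, the generic low-rank-plus-sparse (robust PCA) decomposition on which such reconstructions build can be written as below; the authors' 5D model imposes additional, explicit spectral and temporal structure beyond this generic form.

    % Generic low-rank + sparse decomposition (background, not the authors' exact model)
    \min_{L,\,S}\; \|L\|_{*} + \lambda \|S\|_{1}
    \quad \text{subject to} \quad X = L + S

Here X collects the image frames (e.g., one vectorized volume per energy and cardiac phase) as columns, the nuclear norm on L favors a component shared across time and energy, the l1 norm on S favors sparse spectral and temporal contrast changes, and lambda trades the two off; split Bregman iterations (closely related to ADMM) are a standard way to solve such composite objectives.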