181 resultados para Datavetenskap (datalogi)


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Solutions to combinatorial optimization problems, such as problems of locating facilities, frequently rely on heuristics to minimize the objective function. The optimum is sought iteratively and a criterion is needed to decide when the procedure (almost) attains it. Pre-setting the number of iterations dominates in OR applications, which implies that the quality of the solution cannot be ascertained. A small, almost dormant, branch of the literature suggests using statistical principles to estimate the minimum and its bounds as a tool to decide upon stopping and evaluating the quality of the solution. In this paper we examine the functioning of statistical bounds obtained from four different estimators by using simulated annealing on p-median test problems taken from Beasley’s OR-library. We find the Weibull estimator and the 2nd order Jackknife estimator preferable and the requirement of sample size to be about 10 being much less than the current recommendation. However, reliable statistical bounds are found to depend critically on a sample of heuristic solutions of high quality and we give a simple statistic useful for checking the quality. We end the paper with an illustration on using statistical bounds in a problem of locating some 70 distribution centers of the Swedish Post in one Swedish region. 

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Detta arbete har genomförts i samarbete med Försvarsmakten och behandlar vilka möjligheter som finns för forensiska undersökningar av e-boksläsaren Amazon Kindle. I arbetets litteraturstudie beskrivs hur tidigare forskning inom ämnet är kraftigt begränsad. Arbetet syftar därför till att besvara hur data kan extraheras från en Kindle, vilka data av forensiskt intresse en Kindle kan innehålla, var denna information lagras och om detta skiljer sig åt mellan olika modeller och firmware-versioner samt om det är nog att undersöka endast den del av minnet som är tillgänglig för användaren eller om ytterligare privilegier för att komma åt hela minnesarean bör införskaffas. För att göra detta fylls tre olika modeller av Kindles med information. Därefter tas avbilder på dem, dels på endast användarpartitionen och dels på dess fullständiga minnesarea efter att en privilegie-eskalering har utförts. Inhämtad data analyseras och resultatet presenteras. Resultatet visar att information av forensiskt intresse så som anteckningar, besökta webbsidor och dokument kan återfinnas, varför det finns ett värde i att utföra forensiska undersökningar på Amazon Kindles. Skillnader råder mellan vilken information som kan återfinnas och var den lagras på de olika enheterna. Enheterna har fyra partitioner varav endast en kan kommas åt utan privilegie-eskalering, varför det finns en fördel med att inhämta avbilder av hela minnesarean. Utöver ovanstående presenteras en metod för att förbipassera en enhets kodlås och därigenom få fullständig åtkomst till den även om den är låst.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The p-median problem is often used to locate P service facilities in a geographically distributed population. Important for the performance of such a model is the distance measure. Distance measure can vary if the accuracy of the road network varies. The rst aim in this study is to analyze how the optimal location solutions vary, using the p-median model, when the road network is alternated. It is hard to nd an exact optimal solution for p-median problems. Therefore, in this study two heuristic solutions are applied, simulating annealing and a classic heuristic. The secondary aim is to compare the optimal location solutions using dierent algorithms for large p-median problem. The investigation is conducted by the means of a case study in a rural region with an asymmetrically distributed population, Dalecarlia. The study shows that the use of more accurate road networks gives better solutions for optimal location, regardless what algorithm that is used and regardless how many service facilities that is optimized for. It is also shown that the simulated annealing algorithm not just is much faster than the classic heuristic used here, but also in most cases gives better location solutions.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we propose a new method for solving large scale p-median problem instances based on real data. We compare different approaches in terms of runtime, memory footprint and quality of solutions obtained. In order to test the different methods on real data, we introduce a new benchmark for the p-median problem based on real Swedish data. Because of the size of the problem addressed, up to 1938 candidate nodes, a number of algorithms, both exact and heuristic, are considered. We also propose an improved hybrid version of a genetic algorithm called impGA. Experiments show that impGA behaves as well as other methods for the standard set of medium-size problems taken from Beasley’s benchmark, but produces comparatively good results in terms of quality, runtime and memory footprint on our specific benchmark based on real Swedish data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A challenge for the clinical management of advanced Parkinson’s disease (PD) patients is the emergence of fluctuations in motor performance, which represents a significant source of disability during activities of daily living of the patients. There is a lack of objective measurement of treatment effects for in-clinic and at-home use that can provide an overview of the treatment response. The objective of this paper was to develop a method for objective quantification of advanced PD motor symptoms related to off episodes and peak dose dyskinesia, using spiral data gathered by a touch screen telemetry device. More specifically, the aim was to objectively characterize motor symptoms (bradykinesia and dyskinesia), to help in automating the process of visual interpretation of movement anomalies in spirals as rated by movement disorder specialists. Digitized upper limb movement data of 65 advanced PD patients and 10 healthy (HE) subjects were recorded as they performed spiral drawing tasks on a touch screen device in their home environment settings. Several spatiotemporal features were extracted from the time series and used as inputs to machine learning methods. The methods were validated against ratings on animated spirals scored by four movement disorder specialists who visually assessed a set of kinematic features and the motor symptom. The ability of the method to discriminate between PD patients and HE subjects and the test-retest reliability of the computed scores were also evaluated. Computed scores correlated well with mean visual ratings of individual kinematic features. The best performing classifier (Multilayer Perceptron) classified the motor symptom (bradykinesia or dyskinesia) with an accuracy of 84% and area under the receiver operating characteristics curve of 0.86 in relation to visual classifications of the raters. In addition, the method provided high discriminating power when distinguishing between PD patients and HE subjects as well as had good test-retest reliability. This study demonstrated the potential of using digital spiral analysis for objective quantification of PD-specific and/or treatment-induced motor symptoms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The national railway administrations in Scandinavia, Germany, and Austria mainly resort to manual inspections to control vegetation growth along railway embankments. Manually inspecting railways is slow and time consuming. A more worrying aspect concerns the fact that human observers are often unable to estimate the true cover of vegetation on railway embankments. Further human observers often tend to disagree with each other when more than one observer is engaged for inspection. Lack of proper techniques to identify the true cover of vegetation even result in the excess usage of herbicides; seriously harming the environment and threating the ecology. Hence work in this study has investigated aspects relevant to human variationand agreement to be able to report better inspection routines. This was studied by mainly carrying out two separate yet relevant investigations.First, thirteen observers were separately asked to estimate the vegetation cover in nine imagesacquired (in nadir view) over the railway tracks. All such estimates were compared relatively and an analysis of variance resulted in a significant difference on the observers’ cover estimates (p<0.05). Bearing in difference between the observers, a second follow-up field-study on the railway tracks was initiated and properly investigated. Two railway segments (strata) representingdifferent levels of vegetationwere carefully selected. Five sample plots (each covering an area of one-by-one meter) were randomizedfrom each stratumalong the rails from the aforementioned segments and ten images were acquired in nadir view. Further three observers (with knowledge in the railway maintenance domain) were separately asked to estimate the plant cover by visually examining theplots. Again an analysis of variance resulted in a significant difference on the observers’ cover estimates (p<0.05) confirming the result from the first investigation.The differences in observations are compared against a computer vision algorithm which detects the "true" cover of vegetation in a given image. The true cover is defined as the amount of greenish pixels in each image as detected by the computer vision algorithm. Results achieved through comparison strongly indicate that inconsistency is prevalent among the estimates reported by the observers. Hence, an automated approach reporting the use of computer vision is suggested, thus transferring the manual inspections into objective monitored inspections

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Market research is often conducted through conventional methods such as surveys, focus groups and interviews. But the drawbacks of these methods are that they can be costly and timeconsuming. This study develops a new method, based on a combination of standard techniques like sentiment analysis and normalisation, to conduct market research in a manner that is free and quick. The method can be used in many application-areas, but this study focuses mainly on the veganism market to identify vegan food preferences in the form of a profile. Several food words are identified, along with their distribution between positive and negative sentiments in the profile. Surprisingly, non-vegan foods such as cheese, cake, milk, pizza and chicken dominate the profile, indicating that there is a significant market for vegan-suitable alternatives for such foods. Meanwhile, vegan-suitable foods such as coconut, potato, blueberries, kale and tofu also make strong appearances in the profile. Validation is performed by using the method on Volkswagen vehicle data to identify positive and negative sentiment across five car models. Some results were found to be consistent with sales figures and expert reviews, while others were inconsistent. The reliability of the method is therefore questionable, so the results should be used with caution.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Unplanned hospital readmissions increase health and medical care costs and indicate lower the lower quality of the healthcare services. Hence, predicting patients at risk to be readmitted is of interest. Using administrative data of patients being treated in the medical centers and hospitals in the Dalarna County, Sweden, during 2008 – 2016 two risk prediction models of hospital readmission are built. The first model relies on the logistic regression (LR) approach, predicts correctly 2,648 out of 3,392 observed readmission in the test dataset, reaching a c-statistics of 0.69. The second model is built using random forests (RF) algorithm; correctly predicts 2,183 readmission (out of 3,366) and 13,198 non-readmission events (out of 18,982). The discriminating ability of the best performing RF model (c-statistic 0.60) is comparable to that of the logistic model. Although the discriminating ability of both LR and RF risk prediction models is relatively modest, still these models are capable to identify patients running high risk of hospital readmission. These patients can then be targeted with specific interventions, in order to prevent the readmission, improve patients’ quality of life and reduce health and medical care costs.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Low Level Virtual Machine (LLVM) on moderni koko ohjelman elinkaaren optimointeihin keskittyvä kääntäjäarkkitehtuuri. Java-virtuaalikone on puolestaan suosittu korkean tason virtuaalikone, johon monien ohjelmointikielten toteutus nykyään perustuu. Tutkielmassa esitellään alun perin suorituskykyisen C- ja C++-kääntäjän toteuttamiseksi luotu LLVM-järjestelmä ja arvioidaan, miten hyvin LLVM-infrastruktuuri tukee Java-virtuaalikoneen toteuttamista. Tämän lisäksi tutkielmassa pohditaan, miten dynaamisten kielten usein tarvitsemaa suoritusaikaista ja lähdekieliriippuvaista optimointia voidaan tukea lähdekieliriippumattomassa LLVM-järjestelmässä. Lopuksi tutkielmassa esitellään kehitysehdotelma yleisen roskienkeruuinfrastruktuurin toteuttamiseksi LLVM:ssä, mikä tukisi dynaamista muistia automaattisesti hallitsevien kielten, kuten Javan ja sen virtuaalikoneen toteuttamista.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Metabolism is the cellular subsystem responsible for generation of energy from nutrients and production of building blocks for larger macromolecules. Computational and statistical modeling of metabolism is vital to many disciplines including bioengineering, the study of diseases, drug target identification, and understanding the evolution of metabolism. In this thesis, we propose efficient computational methods for metabolic modeling. The techniques presented are targeted particularly at the analysis of large metabolic models encompassing the whole metabolism of one or several organisms. We concentrate on three major themes of metabolic modeling: metabolic pathway analysis, metabolic reconstruction and the study of evolution of metabolism. In the first part of this thesis, we study metabolic pathway analysis. We propose a novel modeling framework called gapless modeling to study biochemically viable metabolic networks and pathways. In addition, we investigate the utilization of atom-level information on metabolism to improve the quality of pathway analyses. We describe efficient algorithms for discovering both gapless and atom-level metabolic pathways, and conduct experiments with large-scale metabolic networks. The presented gapless approach offers a compromise in terms of complexity and feasibility between the previous graph-theoretic and stoichiometric approaches to metabolic modeling. Gapless pathway analysis shows that microbial metabolic networks are not as robust to random damage as suggested by previous studies. Furthermore the amino acid biosynthesis pathways of the fungal species Trichoderma reesei discovered from atom-level data are shown to closely correspond to those of Saccharomyces cerevisiae. In the second part, we propose computational methods for metabolic reconstruction in the gapless modeling framework. We study the task of reconstructing a metabolic network that does not suffer from connectivity problems. Such problems often limit the usability of reconstructed models, and typically require a significant amount of manual postprocessing. We formulate gapless metabolic reconstruction as an optimization problem and propose an efficient divide-and-conquer strategy to solve it with real-world instances. We also describe computational techniques for solving problems stemming from ambiguities in metabolite naming. These techniques have been implemented in a web-based sofware ReMatch intended for reconstruction of models for 13C metabolic flux analysis. In the third part, we extend our scope from single to multiple metabolic networks and propose an algorithm for inferring gapless metabolic networks of ancestral species from phylogenetic data. Experimenting with 16 fungal species, we show that the method is able to generate results that are easily interpretable and that provide hypotheses about the evolution of metabolism.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Large-scale chromosome rearrangements such as copy number variants (CNVs) and inversions encompass a considerable proportion of the genetic variation between human individuals. In a number of cases, they have been closely linked with various inheritable diseases. Single-nucleotide polymorphisms (SNPs) are another large part of the genetic variance between individuals. They are also typically abundant and their measuring is straightforward and cheap. This thesis presents computational means of using SNPs to detect the presence of inversions and deletions, a particular variety of CNVs. Technically, the inversion-detection algorithm detects the suppressed recombination rate between inverted and non-inverted haplotype populations whereas the deletion-detection algorithm uses the EM-algorithm to estimate the haplotype frequencies of a window with and without a deletion haplotype. As a contribution to population biology, a coalescent simulator for simulating inversion polymorphisms has been developed. Coalescent simulation is a backward-in-time method of modelling population ancestry. Technically, the simulator also models multiple crossovers by using the Counting model as the chiasma interference model. Finally, this thesis includes an experimental section. The aforementioned methods were tested on synthetic data to evaluate their power and specificity. They were also applied to the HapMap Phase II and Phase III data sets, yielding a number of candidates for previously unknown inversions, deletions and also correctly detecting known such rearrangements.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Topic detection and tracking (TDT) is an area of information retrieval research the focus of which revolves around news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting something new, previously unreported, tracking the development of a previously reported event, and grouping together news that discuss the same event. The performance of the traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems. It has been difficult to make the distinction between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. Upon comparing documents for event-based similarity, we look not only at matching terms, but also how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system. The news reflect changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experiment results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30\% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4\%. The anchoring the text to a time-line based on the temporal expressions gave a further 10\% increase the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices from all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-pace decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements decision-making sets for knowledge discovery and data mining tools and methods, and I study resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated to the existing network operations infrastructure.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.