836 results for Probabilistic latent semantic analysis (PLSA)
Abstract:
Pectobacterium atrosepticum is a Gram-negative bacterium that causes blackleg and soft rot of potato. The optimum growth temperature of P. atrosepticum is relatively low, and the bacterium is common in temperate regions. Blackleg spreads mainly via seed potatoes and is therefore a problem particularly in seed potato production. The genome of P. atrosepticum strain SCRI1043 has been published, and the strain is studied as a model organism for understanding the pathogenesis of soft rot and blackleg. This opportunistic pathogen can live latently in the host plant for months without causing visible symptoms. Under favourable conditions the bacteria begin to multiply and to produce enzymes that degrade plant tissue. The macerating plant material provides nutrients for bacterial growth and enables colonization of the host plant. The role of cell-wall-degrading enzymes in pathogenesis is well known, but little is known about the symptomless phase and the early stages of disease. The genome of the bacterium encodes many toxins, adhesins, hemolysins and other proteins that may play a role in pathogenesis. In this work, proteomics and microarray analysis were used to study the secreted proteins and gene expression of P. atrosepticum. Proteins that are secreted out of the bacterial cell are likely to function in pathogenesis, because they are in direct contact with the host plant. The analyses were performed under conditions resembling the intercellular space of the plant: low pH, low nutrient levels and low temperature. The effect of the presence of the host plant on protein production and gene expression was studied by adding potato extract to the growth medium. The study identified many already known and potentially pathogenesis-related proteins of P. atrosepticum. Potato extract increased the expression of genes encoding a recently identified protein secretion pathway (type VI secretion, T6SS). In addition, the bacterium was observed to secrete several T6SS-related proteins into growth medium supplemented with potato extract. The significance of the T6SS for bacteria is still unclear, and conflicting results have been published on its contribution to pathogenesis. Understanding soft rot and blackleg at the molecular level lays the foundation for applied research aimed at disease control. This study adds to the knowledge of the plant-pathogen interaction and can in the future be exploited, for example, in diagnostics, in breeding resistant potato cultivars, or in improving cultivation and storage conditions.
Abstract:
The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4,000 farms worldwide used over 6,000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. The increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper spends monitoring animals often decreases. This has created a need for systems that automatically monitor the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems, especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it requires experience to be conducted properly, it is labour-intensive as an on-farm method, and the results are subjective. A four-balance system for measuring the leg load distribution of dairy cows during milking, in order to detect lameness, was developed and set up at the University of Helsinki research farm Suitia. The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking were calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data was divided into two parts, and 5,074 measurements from 37 cows were used to train the model. The model was evaluated for its ability to detect lameness in the validation dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.
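A minimal sketch of how a PNN of the kind described above could classify a milking as coming from a sound or a lame cow: each training sample acts as a Gaussian kernel, and the test sample goes to the class with the largest average kernel activation. The feature layout (per-leg mean loads plus a kick count) and all names are hypothetical illustrations, not the thesis's actual model.

```python
import numpy as np

def pnn_classify(X_train, y_train, X_test, sigma=0.5):
    """Probabilistic neural network (Parzen-window) classifier.

    Each training sample acts as a Gaussian kernel; a test sample is
    assigned to the class whose summed kernel activation is largest.
    """
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        # Squared Euclidean distance from x to every training pattern
        d2 = np.sum((X_train - x) ** 2, axis=1)
        activations = np.exp(-d2 / (2.0 * sigma ** 2))
        # Average activation per class approximates the class density
        scores = [activations[y_train == c].mean() for c in classes]
        preds.append(classes[np.argmax(scores)])
    return np.array(preds)

# Hypothetical usage: rows are per-milking feature vectors, e.g. the
# mean load on each of the four legs plus the number of kicks.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)          # 0 = sound, 1 = lame (toy labels)
print(pnn_classify(X[:150], y[:150], X[150:]))
```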
Abstract:
The prevalence of latent autoimmune diabetes in adults (LADA) among patients diagnosed with type 2 diabetes mellitus (T2DM) ranges from 7 to 10% (1). Patients with LADA present at a younger age and have a lower BMI but poorer glycemic control, which may increase the risk of complications (2). However, a recent analysis of the Collaborative Atorvastatin Diabetes Study (CARDS) demonstrated no difference in macrovascular or microvascular events between patients with LADA and T2DM, although neuropathy was not assessed (3). Previous studies quantifying neuropathy in patients with LADA are limited. In this study, we aimed to accurately quantify neuropathy in subjects with LADA compared with matched patients with T2DM.
Abstract:
Topic detection and tracking (TDT) is an area of information retrieval research that focuses on news events. The problems TDT deals with include segmenting news text into cohesive stories, detecting something new and previously unreported, tracking the development of a previously reported event, and grouping together news stories that discuss the same event. The performance of traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems; in particular, it has been difficult to make the distinction between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text and use them to anchor the rest of the terms onto the time-line. When comparing documents for event-based similarity, we look not only at matching terms but also at how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system. The news stream reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the stream. We put forward two strategies for rebuilding the topic representations and report experiment results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
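A hedged sketch of the class-wise similarity idea described above: each document maps a semantic class to a sparse term vector, each class contributes its own similarity score, and the scores are combined with weights. The class names, weights and the cosine measure below are illustrative assumptions, not the thesis's exact formulation; in the framework itself, a class such as LOCATIONS could instead use a measure informed by a geographical taxonomy.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse term-weight dicts."""
    shared = set(u) & set(v)
    num = sum(u[t] * v[t] for t in shared)
    den = math.sqrt(sum(w * w for w in u.values())) * \
          math.sqrt(sum(w * w for w in v.values()))
    return num / den if den else 0.0

def class_wise_similarity(doc1, doc2, weights):
    """Weighted combination of per-class document similarities."""
    total = sum(weights.values())
    return sum(w * cosine(doc1.get(c, {}), doc2.get(c, {}))
               for c, w in weights.items()) / total

# Hypothetical example with four semantic classes.
d1 = {"TERMS": {"strike": 2, "port": 1}, "LOCATIONS": {"helsinki": 1}}
d2 = {"TERMS": {"strike": 1, "dock": 1}, "LOCATIONS": {"helsinki": 2}}
w  = {"TERMS": 1.0, "PERSONS": 0.5, "ORGANIZATIONS": 0.5, "LOCATIONS": 1.0}
print(class_wise_similarity(d1, d2, w))
```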
Abstract:
Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices all around the network. The data are monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-paced decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements that decision making sets for knowledge discovery and data mining tools and methods, and I study the resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally, I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated into the existing network operations infrastructure.
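As a rough illustration of the frequent-set machinery that methods like Comprehensive Log Compression build on, the sketch below mines frequent combinations of field=value items from log entries with a level-wise (Apriori-style) search; frequent combinations can then serve as summaries covering large portions of the log. The alarm-log fields are invented, and the actual CLC and QLC algorithms are not reproduced here.

```python
from collections import Counter

def frequent_sets(rows, min_support):
    """Level-wise (Apriori-style) frequent-set mining over log entries.

    Each row is a frozenset of field=value items; a set is frequent if
    it is contained in at least min_support rows.
    """
    counts = Counter(item for row in rows for item in row)
    current = {frozenset([i]) for i, c in counts.items() if c >= min_support}
    result = set(current)
    k = 2
    while current:
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Keep candidates whose support clears the threshold
        current = {c for c in candidates
                   if sum(c <= row for row in rows) >= min_support}
        result |= current
        k += 1
    return result

# Hypothetical alarm log: each entry as a set of field=value items.
log = [frozenset({"sev=major", "node=bsc1", "type=link"}),
       frozenset({"sev=major", "node=bsc1", "type=link"}),
       frozenset({"sev=minor", "node=bsc2", "type=power"}),
       frozenset({"sev=major", "node=bsc1", "type=power"})]
for s in sorted(frequent_sets(log, 2), key=len):
    print(set(s))
```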
Abstract:
Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. There are various ways to apply the principle in practice. One theoretically valid way is to use the normalized maximum likelihood (NML) criterion; due to computational difficulties, however, this approach has not been used very often. This thesis presents efficient floating-point algorithms that make it possible to compute the NML for multinomial, Naive Bayes and Bayesian forest models. None of the presented algorithms rely on asymptotic analysis, and for the first two model classes we also discuss how to compute exact rational-number solutions.
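For the multinomial case, the flavour of such an algorithm can be shown compactly: the sketch below computes the NML parametric complexity with the linear-time recurrence of Kontkanen and Myllymäki (2007) and uses it to obtain an NML code length. This is a simplified illustration, not the thesis's full algorithm suite.

```python
import math

def multinomial_nml_complexity(K, n):
    """Parametric complexity C(K, n) of the K-symbol multinomial model,
    via the linear-time recurrence:
        C(1, n) = 1
        C(2, n) = sum_k  binom(n, k) (k/n)^k ((n-k)/n)^(n-k)
        C(K, n) = C(K-1, n) + n/(K-2) * C(K-2, n),  K >= 3
    """
    c1 = 1.0
    c2 = sum(math.comb(n, k)
             * (k / n) ** k * ((n - k) / n) ** (n - k)
             for k in range(n + 1))
    if K == 1:
        return c1
    prev, cur = c1, c2
    for j in range(3, K + 1):
        prev, cur = cur, cur + n / (j - 2) * prev
    return cur

def nml_code_length(counts):
    """NML code length (in nats) of a sequence with the given symbol counts:
    minus the maximized log-likelihood plus the log normalizer."""
    n = sum(counts)
    loglik = sum(c * math.log(c / n) for c in counts if c > 0)
    return -loglik + math.log(multinomial_nml_complexity(len(counts), n))

print(nml_code_length([7, 2, 1]))   # toy example: n = 10, K = 3
```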
Abstract:
What can the statistical structure of natural images teach us about the human brain? Even though the visual cortex is one of the most studied parts of the brain, surprisingly little is known about how exactly images are processed to leave us with a coherent percept of the world around us, so that we can recognize a friend or drive on a crowded street without any effort. By constructing probabilistic models of natural images, the goal of this thesis is to understand the structure of the stimulus that is the raison d'être of the visual system. Following the hypothesis that the optimal processing has to be matched to the structure of that stimulus, we attempt to derive computational principles, features that the visual system should compute, and properties that cells in the visual system should have. Starting from machine learning techniques such as principal component analysis and independent component analysis, we construct a variety of statistical models to discover structure in natural images that can be linked to receptive field properties of neurons in primary visual cortex, such as simple and complex cells. We show that by representing images with phase-invariant, complex-cell-like units, a better statistical description of the visual environment is obtained than with linear simple-cell units, and that complex cell pooling can be learned by estimating both layers of a two-layer model of natural images. We investigate how a simplified model of the processing in the retina, where adaptation and contrast normalization take place, is connected to the natural stimulus statistics. Analyzing the effect that retinal gain control has on later cortical processing, we propose a novel method to perform gain control in a data-driven way. Finally, we show how models like those presented here can be extended to capture whole visual scenes rather than just small image patches. By using a Markov random field approach we can model images of arbitrary size, while still being able to estimate the model parameters from the data.
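A compressed sketch of the starting point mentioned above: PCA whitening of image-patch data followed by FastICA, which on natural-image patches yields localised, oriented, Gabor-like filters resembling simple-cell receptive fields. This illustrates only the first, linear stage; the two-layer and Markov random field models of the thesis are not shown, and the random data below stands in for real patches.

```python
import numpy as np

def pca_whiten(X, n_components):
    """PCA whitening: decorrelate patch data and equalise variances."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    vals, vecs = np.linalg.eigh(cov)
    top = np.argsort(vals)[::-1][:n_components]
    W = vecs[:, top] / np.sqrt(vals[top])
    return Xc @ W

def fastica(Z, n_iter=200, seed=0):
    """Symmetric FastICA with the tanh nonlinearity on whitened data Z."""
    rng = np.random.default_rng(seed)
    n = Z.shape[1]
    W = rng.normal(size=(n, n))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w per component
        W_new = G.T @ Z / len(Z) - np.diag((1 - G**2).mean(axis=0)) @ W
        # Symmetric decorrelation keeps the components orthogonal
        U, _, Vt = np.linalg.svd(W_new)
        W = U @ Vt
    return W

# Hypothetical usage: X would hold e.g. 16x16 image patches as rows.
X = np.random.default_rng(1).normal(size=(5000, 256))
Z = pca_whiten(X, 64)
filters = fastica(Z)           # rows: estimated independent components
```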
Abstract:
Information exchange (IE) is a critical component of the complex collaborative medication process in residential aged care facilities (RACFs). Designing information and communication technology (ICT) to support complex processes requires a profound understanding of the IE that underpins their execution. Little existing research investigates the complexity of IE in RACFs and its impact on ICT design. The aim of this study was thus to undertake an in-depth exploration of the IE process involved in medication management and to identify its implications for the design of ICT. The study was undertaken at a large metropolitan facility in NSW, Australia. A total of three focus groups, eleven interviews and two observation sessions were conducted between July and August 2010. Process modelling was undertaken by translating the qualitative data via in-depth iterative inductive analysis. The findings highlight the complexity and collaborative nature of IE in RACF medication management. The resulting models emphasize the need to: a) deal with temporal complexity; b) rely on an interdependent set of coordinative artefacts; and c) use synchronous communication channels for coordination. Taken together, these are crucial aspects of the IE process in RACF medication management that need to be catered for when designing ICT in this critical area. This study provides important new evidence of the advantages of viewing a process as part of a system, rather than as segregated tasks, as a means of identifying the latent requirements of ICT design that can support complex collaborative processes like medication management in RACFs. © 2012 IEEE.
Abstract:
The Kachchh region of Gujarat, India bore the brunt of a disastrous earthquake of magnitude Mw = 7.6 that occurred on January 26, 2001. The major cause of failure of various structures, including earthen dams, was noted to be the presence of liquefiable alluvium in the foundation soil. Results of back-analysis of the failures of the Chang, Tappar, Kaswati and Rudramata earth dams using the pseudo-static limit equilibrium approach, presented in this paper, confirm that the presence of a liquefiable layer contributed to lower factors of safety, leading to the base type of failure that was also observed in the field. Following the earthquake, the earth dams have been rehabilitated by the concerned authority, and it is imperative that the reconstructed sections of the earth dams be reanalyzed. It is also increasingly realized that, in view of the large-scale investment made, risk assessment of dams through probabilistic analysis is necessary. In this study, it is demonstrated that the probabilistic approach, when used in conjunction with the deterministic approach, helps in providing a rational solution for quantifying the safety of the dam and in estimating the risk associated with the dam construction. (C) 2007 Elsevier B.V. All rights reserved.
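To illustrate how a horizontal seismic coefficient enters a pseudo-static limit equilibrium check, the sketch below computes the factor of safety of a single planar slip surface; the numbers are invented, and the paper's back-analyses of course search over complete slip surfaces rather than one plane.

```python
import math

def pseudo_static_fos(W, alpha_deg, c, phi_deg, L, kh, u=0.0):
    """Pseudo-static factor of safety of a planar slip surface.

    W        weight of the sliding mass per unit width (kN/m)
    alpha    inclination of the slip plane (degrees)
    c, phi   effective cohesion (kPa) and friction angle (degrees)
    L        length of the slip plane (m)
    kh       horizontal seismic coefficient
    u        average pore pressure on the plane (kPa)

    The inertia force kh*W adds to the driving force and reduces the
    normal force on the plane, lowering the factor of safety; a
    liquefiable layer enters through degraded c and phi.
    """
    a = math.radians(alpha_deg)
    phi = math.radians(phi_deg)
    normal = W * math.cos(a) - kh * W * math.sin(a) - u * L
    resisting = c * L + normal * math.tan(phi)
    driving = W * math.sin(a) + kh * W * math.cos(a)
    return resisting / driving

# Toy numbers: the same slip plane with and without seismic loading.
print(pseudo_static_fos(W=1200, alpha_deg=20, c=10, phi_deg=28, L=30, kh=0.0))
print(pseudo_static_fos(W=1200, alpha_deg=20, c=10, phi_deg=28, L=30, kh=0.15))
```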
Abstract:
A straightforward computation of the list of words (the 'tail words' of the list) that are distributionally most similar to a given word (the 'head word' of the list) leads to the question: how semantically similar to the head word are the tail words, that is, how similar are their meanings to its meaning? And can we do better? The experiment was done on the nearly 18,000 most frequent nouns in a Finnish newsgroup corpus. These nouns are considered distributionally similar to the extent that they occur in the same direct dependency relations with the same nouns, adjectives and verbs. The similarity of their computational representations is quantified with the information radius. The semantic classification of head-tail pairs is intuitive: some tail words seem semantically similar to the head word, some do not. Each such pair is also associated with a number of further distributional variables. Individually, their overlap across the semantic classes is large, but the trained classification-tree models have some success in using combinations of them to predict the semantic class. The training data consists of a random sample of 400 head-tail pairs with the tail word ranked among the 20 distributionally most similar to the head word, excluding names. The models are then tested on a random sample of another 100 such pairs. The best success rates range from 70% to 92% of the test pairs, where a success means that the model predicted my intuitive semantic class of the pair. This seems somewhat promising when distributional similarity is used to capture semantically similar words. The analysis also includes a general discussion of several different similarity formulas, arranged in three groups: those that apply to sets with graded membership, those that apply to the members of a vector space, and those that apply to probability mass functions.
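The information radius used above can be stated compactly: the sketch below computes it for two probability mass functions as the total KL divergence to their average (the Jensen-Shannon form), which is the definition commonly used in distributional similarity work. The toy distributions are invented.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits (0 log 0 = 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def information_radius(p, q):
    """Information radius (Jensen-Shannon divergence) of two
    probability mass functions: the total divergence to their average.
    Symmetric, always finite, and bounded by 2 bits in this form.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m) + kl(q, m)

# Toy distributions of two nouns over the same dependency contexts.
p = [0.5, 0.3, 0.2, 0.0]
q = [0.4, 0.1, 0.3, 0.2]
print(information_radius(p, q))
```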
Abstract:
In the present study, results of reliability analyses of four selected rehabilitated earth dam sections, i.e., Chang, Tapar, Rudramata, and Kaswati, under pseudo-static loading conditions are presented. Using response surface methodology in combination with the first-order reliability method (FORM) and numerical analysis, the reliability index (beta) values are obtained, and the results are interpreted in conjunction with conventional factor of safety values. The influence of considering variability in the input soil shear strength parameters, the horizontal seismic coefficient (alpha(h)), and the location of the reservoir full level on the stability assessment of the earth dam sections is discussed in a probabilistic framework. A comparison of the results with those obtained from another method of reliability analysis, viz., Monte Carlo simulation combined with the limit equilibrium approach, provided a basis for discussing the stability of earth dams in probabilistic terms, and the results of the analysis suggest that the considered earth dam sections are reliable and can be expected to perform satisfactorily.
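As a rough companion to the FORM-based analysis, the sketch below estimates a failure probability by plain Monte Carlo simulation, the comparison method mentioned above, and converts it to a reliability index via beta = -Phi^{-1}(pf). The limit state function and parameter values are invented placeholders, not the dam sections' actual response surfaces.

```python
import numpy as np
from statistics import NormalDist

def mc_reliability(g, means, stds, n=200_000, seed=0):
    """Monte Carlo reliability analysis of a performance function g(X).

    Failure is g(X) < 0. Input variables are taken as independent
    normals here, a common first assumption for soil shear-strength
    parameters and the seismic coefficient.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(means, stds, size=(n, len(means)))
    pf = np.mean(g(X) < 0.0)
    beta = -NormalDist().inv_cdf(pf) if 0 < pf < 1 else float("inf")
    return pf, beta

# Hypothetical limit state: factor of safety minus one, with the FoS
# as a simple linear proxy in cohesion c, tan(phi) and kh.
def g(X):
    c, tanphi, kh = X[:, 0], X[:, 1], X[:, 2]
    fos = 0.02 * c + 2.2 * tanphi - 1.5 * kh
    return fos - 1.0

pf, beta = mc_reliability(g, means=[20.0, 0.55, 0.1], stds=[4.0, 0.06, 0.03])
print(f"pf = {pf:.4f}, beta = {beta:.2f}")
```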
Abstract:
The problem of time-variant reliability analysis of existing structures subjected to stationary random dynamic excitations is considered. The study assumes that samples of the dynamic response of the structure, under the action of external excitations, have been measured at a set of sparse points on the structure. The utilization of these measurements in updating reliability models postulated prior to making any measurements is considered. This is achieved by using dynamic state estimation methods that combine results from Markov process theory and Bayes' theorem. The uncertainties present in the measurements, as well as in the postulated model for the structural behaviour, are accounted for. The samples of external excitations are taken to emanate from known stochastic models, and allowance is made for the ability (or lack of it) to measure the applied excitations. The future reliability of the structure is modeled using the expected structural response conditioned on all the measurements made. This expected response is shown to have a time-varying mean and a random component that can be treated as weakly stationary. For linear systems, an approximate analytical solution to the problem of reliability model updating is obtained by combining the theories of the discrete Kalman filter and level crossing statistics. For nonlinear systems, the problem is tackled by combining particle filtering strategies with data-based extreme value analysis. In all these studies, the governing stochastic differential equations are discretized using the strong forms of Ito-Taylor discretization schemes. The possibility of using conditional simulation strategies, when applied external actions are measured, is also considered. The proposed procedures are exemplified by considering the reliability analysis of a few low-dimensional dynamical systems based on synthetically generated measurement data. The performance of the procedures developed is also assessed based on a limited amount of pertinent Monte Carlo simulations. (C) 2010 Elsevier Ltd. All rights reserved.
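A minimal sketch of the discrete Kalman filter cycle that underlies the linear-system case above: the state estimate is propagated through the dynamics and then corrected by a sparse, noisy measurement. The toy oscillator and noise levels are invented; the paper's systems and Ito-Taylor discretizations are richer.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict-update cycle of the discrete Kalman filter.

    x, P : state estimate and covariance from the previous step
    z    : new (sparse, noisy) measurement vector
    F, H : state-transition and measurement matrices
    Q, R : process- and measurement-noise covariances
    """
    # Predict through the system dynamics
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the measurement
    S = H @ P_pred @ H.T + R                     # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy oscillator state [displacement, velocity], with only the
# displacement measured, as when response is sampled at sparse points.
dt = 0.01
F = np.array([[1.0, dt], [-4.0 * dt, 1.0]])     # discretised dynamics
H = np.array([[1.0, 0.0]])
Q = 1e-6 * np.eye(2)
R = np.array([[1e-3]])
x, P = np.zeros(2), np.eye(2)
for z in [0.05, 0.08, 0.06]:                     # synthetic measurements
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
print(x)
```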
Abstract:
The methods of secondary wood processing are assumed to evolve over time and to affect the requirements set for the wood material and its suppliers. The study aimed at analysing the industrial operating modes applied by joinery and furniture manufacturers as sawnwood users. Industrial operating mode was defined as a pattern of important decisions and actions taken by a company, describing the company's level of adjustment in the late-industrial transition. A non-probabilistic sample of 127 companies was interviewed, including companies from Denmark, Germany, the Netherlands, and Finland. Fifty-two of the firms were furniture manufacturers and the other 75 produced windows and doors. Variables related to business philosophy, production operations, and supplier choice criteria were measured and used as the basis for a customer typology; variables related to wood usage and perceived sawmill performance were measured and used to profile the customer types. Factor analysis was used to determine the latent dimensions of industrial operating mode. Canonical correlation analysis was applied in developing the final basis for classifying the observations. Non-hierarchical cluster analysis was employed to build a five-group typology of secondary wood processing firms, ranging from traditional mass producers to late-industrial flexible manufacturers. There is a clear connection between the amount of late-industrial elements in a company and the share of special and customised sawnwood it uses. Those joinery and furniture manufacturers that are more late-industrial are also likely to use more component-type wood material and to appreciate customer-oriented technical precision. The results show that the change is towards the use of late-industrial sawnwood materials and late-industrial supplier relationships.
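As a sketch of the non-hierarchical clustering step used to build the typology, the code below runs plain k-means on company scores over latent operating-mode dimensions. The data, the two-dimensional scores and the choice of k-means are illustrative assumptions; the study's exact clustering variant is not specified in the abstract.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means, a classic non-hierarchical clustering method."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each company to its nearest cluster centre
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its assigned companies
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical: 127 firms scored on two canonical dimensions, grouped
# into five operating-mode types.
X = np.random.default_rng(2).normal(size=(127, 2))
labels, centers = kmeans(X, k=5)
print(np.bincount(labels))      # companies per cluster
```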
Abstract:
The conventional Cornell source-based approach to probabilistic seismic-hazard assessment (PSHA) has been employed all around the world, and many studies rely on the use of computer packages such as FRISK (McGuire, FRISK: a computer program for seismic risk analysis. Open-File Report 78-1007, United States Geological Survey, Department of Interior, Washington, 1978) and SEISRISK III (Bender and Perkins, SEISRISK III: a computer program for seismic hazard estimation. Bulletin 1772, United States Geological Survey, Department of Interior, Washington, 1987). A "black-box" syndrome may result if the user of the software does not have another simple and robust PSHA method with which to make comparisons. An alternative method for PSHA, namely the direct amplitude-based (DAB) approach, has been developed as a heuristic and efficient method enabling users to undertake their own sanity checks on outputs from computer packages. This paper experiments with the application of the DAB approach for three cities in China, Iran, and India, respectively, and compares the results with documented results computed by the source-based approach. Several insights regarding the procedure of conducting PSHA have also been obtained, which could be useful for future seismic-hazard studies.
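A minimal sketch of the conventional source-based hazard computation that the DAB approach is meant to sanity-check: annual exceedance rates are accumulated over sources and magnitude-distance scenarios under a lognormal ground-motion model. The ground-motion prediction equation and source numbers below are invented for illustration; the DAB procedure itself is not reproduced here.

```python
import math
from statistics import NormalDist

def exceedance_rate(a, sources, gmpe_median, sigma_ln=0.6):
    """Annual rate of exceeding ground-motion level `a` (Cornell-style).

    Each source is (annual_rate, [(prob, magnitude, distance_km), ...]);
    the ground-motion model gives a median PGA with lognormal scatter.
    """
    N = NormalDist()
    rate = 0.0
    for nu, scenarios in sources:
        for p, m, r in scenarios:
            median = gmpe_median(m, r)
            # P(PGA > a | m, r) under the lognormal ground-motion model
            p_exc = 1.0 - N.cdf(math.log(a / median) / sigma_ln)
            rate += nu * p * p_exc
    return rate

# Invented attenuation form, for illustration only (PGA in g).
def gmpe_median(m, r):
    return math.exp(0.8 * m - 1.2 * math.log(r + 10.0) - 3.0)

sources = [(0.05, [(0.6, 6.0, 30.0), (0.4, 7.0, 30.0)]),
           (0.02, [(1.0, 6.5, 15.0)])]
for a in (0.05, 0.1, 0.2, 0.4):           # hazard-curve points
    print(a, exceedance_rate(a, sources, gmpe_median))
```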
Abstract:
Context-sensitive points-to analysis is critical for several program optimizations. However, as the number of contexts grows exponentially, the storage requirements of the analysis increase tremendously for large programs, making the analysis non-scalable. We propose a scalable, flow-insensitive, context-sensitive, inclusion-based points-to analysis that uses a specially designed multi-dimensional Bloom filter to store the points-to information. Two key observations motivate our proposal: (i) points-to information (between pointer and object and between pointer and pointer) is sparse, and (ii) moving from an exact to an approximate representation of points-to information only leads to reduced precision without affecting the correctness of the (may-points-to) analysis. By using an approximate representation, a multi-dimensional Bloom filter can significantly reduce the memory requirements with a probabilistic bound on the loss in precision. Experimental evaluation on SPEC 2000 benchmarks and two large open-source programs reveals that, with an average storage requirement of 4 MB, our approach achieves almost the same precision (98.6%) as the exact implementation. By increasing the average memory to 27 MB, it achieves precision up to 99.7% for these benchmarks. Using Mod/Ref analysis as the client, we find that the client analysis is not affected that often, even when there is some loss of precision in the points-to representation. We find that the NoModRef percentage is within 2% of the exact analysis while requiring 4 MB (maximum 15 MB) of memory and less than 4 minutes on average for the points-to analysis. Another major advantage of our technique is that it allows precision to be traded off for the memory usage of the analysis.
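A hedged sketch of the core idea: a Bloom filter stores points-to facts approximately, so a query can return a false positive (slightly over-approximating the points-to set, hence reduced precision) but never a false negative, which keeps a may-points-to analysis correct. The paper's multi-dimensional design is not reproduced; this is a plain filter over (pointer, context, object) triples with invented names.

```python
import hashlib

class BloomFilter:
    """A Bloom filter over (pointer, context, object) triples.

    False positives only: a query may wrongly report a points-to
    relation (lost precision), but never misses one that was inserted,
    which keeps a may-points-to analysis sound.
    """
    def __init__(self, n_bits=1 << 20, n_hashes=4):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, key):
        # Derive n_hashes independent bit positions from one keyed hash
        for i in range(self.n_hashes):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

# Hypothetical use: record that pointer p may point to object heap_42
# in calling context main->foo.
pts = BloomFilter()
pts.add(("p", "main->foo", "heap_42"))
print(("p", "main->foo", "heap_42") in pts)   # True
print(("q", "main->bar", "heap_7") in pts)    # almost surely False
```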