18 results for Statistical Language Model
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
This Master's thesis concerns the study of spectral images from the viewpoint of a statistical image model. The first part of the thesis examines how the distributions of the statistical parameters affect colors and highlights under different illumination conditions. It was observed that the relationships between the statistical parameters do not depend on the illumination conditions, but do depend on how noise-free the image is. It also emerged that high kurtosis may be caused by color saturation. In addition, a texture synthesis algorithm based on the statistical spectral model was developed; it achieved good results when the dependencies between the statistical parameters held. In the second part of the work, various spectral images were studied using independent component analysis (ICA). The following ICA algorithms were examined: JADE, fixed-point ICA and moment-based ICA. The studies emphasized the quality of the separation. The best separation was achieved with the JADE algorithm, although the differences between the algorithms were not significant. The algorithm divided the image into two independent components, either highlighted and non-highlighted or chromatic and achromatic. Finally, the relationship of kurtosis to image properties such as highlights and color saturation is discussed. The last part of the thesis proposes possible topics for further research.
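To make the second part concrete, a minimal sketch of the separation step is given below, using scikit-learn's FastICA, which implements the fixed-point ICA mentioned above (JADE and moment-based ICA are not available in scikit-learn). The synthetic 31-band image, its size and the choice of two components are assumptions made purely for illustration.

```python
# A minimal sketch: separating a spectral image into independent
# components with fixed-point ICA (FastICA). The random "spectral image"
# is an assumption; the thesis used real spectral images.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Fake spectral image: 64x64 pixels, 31 spectral bands.
height, width, bands = 64, 64, 31
image = rng.random((height, width, bands))

# Treat each pixel spectrum as one observation (n_samples x n_features).
X = image.reshape(-1, bands)

# Extract two independent components, mirroring the thesis's split into
# e.g. chromatic vs. achromatic (or highlighted vs. non-highlighted).
ica = FastICA(n_components=2, random_state=0)
components = ica.fit_transform(X)          # shape: (n_pixels, 2)

# Back to image form, one plane per independent component.
planes = components.reshape(height, width, 2)
print(planes.shape)
```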
Abstract:
Construction of multiple sequence alignments is a fundamental task in bioinformatics. Multiple sequence alignments are used as a prerequisite in many bioinformatics methods, and consequently the quality of such methods can depend critically on the quality of the alignment. However, automatic construction of a multiple sequence alignment for a set of remotely related sequences does not always provide biologically relevant alignments. Therefore, there is a need for an objective approach to evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In the profile hidden Markov model, the symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of the parameter space and cause an overfitting problem. These two research problems are both related to conservation. We have developed statistical measures for quantifying the conservation of multiple sequence alignments. Two types of methods are considered: those identifying conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was exploited in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of the emission probability estimation method proposed for profile hidden Markov models. The predicted alignment quality scores correlated highly with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. Comparison of the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model decreased dramatically, while the same level of accuracy was maintained. To conclude, we have shown that conservation can be successfully used in the statistical model for alignment quality assessment and in the estimation of emission probabilities in profile hidden Markov models.
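As an illustration of a positional conservation score, the sketch below computes a normalized Shannon-entropy score for a single alignment column. The abstract does not specify the thesis's actual measures, so this standard score is only a stand-in under that assumption.

```python
# A minimal sketch of a positional conservation score for one alignment
# column: 1 - normalized Shannon entropy, so 1 means fully conserved.
import math
from collections import Counter

def positional_conservation(column: str, alphabet_size: int = 20) -> float:
    """Return a score in [0, 1]; 1 means a fully conserved column."""
    residues = [c for c in column if c != '-']   # ignore gaps
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    max_entropy = math.log2(alphabet_size)
    return 1.0 - entropy / max_entropy

# One column of a toy protein alignment (rows = sequences).
print(positional_conservation("LLLLIL-L"))   # highly conserved
print(positional_conservation("LAKGWTE-"))   # poorly conserved
```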
Abstract:
This thesis focuses on statistical analysis methods and proposes the use of Bayesian inference to extract the information contained in experimental data by estimating the parameters of an Ebola model. The model is a system of differential equations describing the behavior and dynamics of Ebola. Two data sets (onset and death data) were used together to estimate the parameters, which had not been done in earlier work (Chowell, 2004). To be able to use both data sets, a new version of the model was built. The model parameters were estimated and then used to calculate the basic reproduction number and to study the disease-free equilibrium. The parameter estimates were useful for determining how well the model fits the data and how good the estimates were, in terms of the information they provided about the possible relationship between variables. The solution showed that the Ebola model fits the observed onset data at 98.95% and the observed death data at 93.6%. Since Bayesian inference cannot be performed analytically, the Markov chain Monte Carlo approach was used to generate samples from the posterior distribution over the parameters. The samples were used to check the accuracy of the model and other characteristics of the target posteriors.
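The sketch below illustrates the estimation approach under stated assumptions: a basic SIR model and synthetic onset data stand in for the thesis's Ebola model and the real onset/death data, and a random-walk Metropolis sampler stands in for whatever MCMC variant was actually used. It also shows how the basic reproduction number can be read off the posterior samples.

```python
# A minimal sketch: random-walk Metropolis sampling of epidemic-model
# parameters from case data. The SIR model and synthetic data below are
# assumptions standing in for the thesis's model and observations.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

def onsets(theta, t):
    beta, gamma = theta
    y = odeint(sir, [0.999, 0.001, 0.0], t, args=(beta, gamma))
    return beta * y[:, 0] * y[:, 1]           # incidence rate ~ new onsets

def log_posterior(theta, t, data, sigma=0.005):
    if np.any(theta <= 0):                    # flat prior on positive values
        return -np.inf
    resid = data - onsets(theta, t)
    return -0.5 * np.sum((resid / sigma) ** 2)

rng = np.random.default_rng(1)
t = np.linspace(0, 60, 61)
true_theta = np.array([0.35, 0.1])
data = onsets(true_theta, t) + rng.normal(0, 0.002, t.size)

# Random-walk Metropolis.
theta = np.array([0.5, 0.2])
lp = log_posterior(theta, t, data)
samples = []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.01, 2)
    lp_prop = log_posterior(prop, t, data)
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)

samples = np.array(samples)[1000:]            # drop burn-in
print("posterior mean:", samples.mean(axis=0))
# Basic reproduction number R0 = beta / gamma from the posterior samples:
print("basic reproduction number:", (samples[:, 0] / samples[:, 1]).mean())
```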
Abstract:
Biotechnology has been recognized as a key strategic technology for industrial growth. The industry is heavily dependent on basic research. Finland continues to rank in the top 10 of Europe's most innovative countries in terms of tax policy, education system, infrastructure and the number of patents issued. Despite these excellent statistical results, the output of this innovativeness is below an acceptable level. Research on the issues hindering output creation has already been done, and the identifiable weaknesses in Finland's national innovation system are the non-existent growth of entrepreneurship and the missing internationalization. Finland is shown to have all the enablers of the innovation policy tools, but lacks the incentives and rewards to push the enablers, such as knowledge and human capital, forward. Science parks are the biggest operators among the research institutes in the Finnish science and technology system. They exist to speed up the commercialization of biotechnology innovations, which usually involve technological uncertainty, technical inexperience, business inexperience and high technology costs. Managing innovation only internally is a rather dated approach; the current trend is towards an open innovation model with strong triple-helix linkages. The evident problems in innovation management within the biotechnology industry are examined through a case study, including analysis of semi-structured interviews with biotechnology and business experts from the Turku School of Economics. The results of the interviews supported the theoretical implications as well as the conclusions derived from the pilot survey, which focused on companies inside the Turku Science Park network. One major issue that Finland's national innovation system struggles with is that it is technology-driven, not business-pulled. Another problem is the university evaluation scale, which focuses on the number of graduates and other short-term factors when it should put more emphasis on long-term cooperation success, such as triple-helix connections with interaction and knowledge distribution. The results of this thesis indicate that structural changes are indeed required in Finland's national innovation system and innovation policy in order to generate successful biotechnology companies and innovation output. There is a lack of joint output and measures of success, a lack of people with experience, a lack of language skills, a lack of business knowledge and a lack of growth companies.
Abstract:
In this thesis, a model for managing product data in a product transfer project was created for ABB Machines. The model was then applied to an ongoing product transfer project during its planning phase. Detailed information about the demands and challenges of product transfer projects was acquired by analyzing previous product transfer projects in the participating organizations. This analysis and the ABB Gate Model were then used as the basis for creating the model for managing product data in a product transfer project. The created model shows the main tasks in each phase of the project, their sub-tasks and their relatedness at a general level. Furthermore, the model emphasizes the need for a detailed analysis of the situation during the project planning phase. The model was applied to the ongoing project in two main areas: manufacturing instructions and production item data. The results showed that the greatest challenge for the product transfer project in these areas is the current state of the product data. Based on the findings, process and resource proposals were given for both the ongoing product transfer project and BU Machines. For manufacturing instructions, it is necessary to create detailed process instructions in the receiving organization's own language for each department, so that the manufacturing instructions can be used as training material during the training in the sending organization. For production item data, the English version of the bill of materials needs to be fully in English. In addition, it must be ensured that the bill of materials is updated and these changes implemented before the training in the sending organization begins.
Abstract:
In the very volatile high-technology industry, it is of utmost importance to forecast customer demand accurately. However, statistical forecasting of sales, especially in the heavily competitive electronics product business, has always been a challenging task due to very high variation in demand and the very short life cycles of products. The purpose of this thesis is to determine whether statistical methods can be applied to forecasting sales of short-life-cycle electronics products, and to provide a feasible framework for implementing statistical forecasting in the environment of the case company. Two different approaches have been developed, one for short- and medium-term and one for long-term forecasting horizons. Both are based on decomposition models, but they differ in their interpretation of the model residuals. For long-term horizons, the residuals are assumed to represent white noise, whereas for short- and medium-term horizons the residuals are modeled using statistical forecasting methods. Both approaches are implemented in Matlab. The modeling results have shown that different markets exhibit different demand patterns, and therefore different analytical approaches are appropriate for modeling demand in these markets. Moreover, the outcomes imply that statistical forecasting cannot be handled separately from judgmental forecasting, but should be perceived only as a basis for judgmental forecasting activities. Based on the modeling results, recommendations are developed for further deployment of statistical methods in the sales forecasting of the case company.
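A minimal sketch of the two-horizon setup might look as follows in Python (the thesis itself used Matlab): a decomposition model is fitted first, and for the short/medium-term horizon the residuals are modeled further instead of being treated as white noise. The synthetic monthly series and the ARIMA(1,0,1) residual model are assumptions for illustration.

```python
# A minimal sketch: decompose a sales series, then model the residuals
# for the short/medium-term horizon. Synthetic data stands in for the
# case company's sales.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
idx = pd.date_range("2015-01-01", periods=60, freq="MS")
trend = np.linspace(100, 140, 60)
season = 10 * np.sin(2 * np.pi * np.arange(60) / 12)
sales = pd.Series(trend + season + rng.normal(0, 3, 60), index=idx)

# Decomposition shared by both horizons.
parts = seasonal_decompose(sales, model="additive", period=12)

# Long-term horizon: residuals treated as white noise (ignored).
# Short/medium-term horizon: model the residuals, e.g. with an ARIMA.
resid = parts.resid.dropna()
resid_model = ARIMA(resid, order=(1, 0, 1)).fit()
print(resid_model.forecast(steps=6))
```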
Abstract:
The optimal design of a heat exchanger system is based on given model parameters together with given standard ranges for machine design variables. The goals were to minimize the Life Cycle Cost (LCC) function, which represents the price of the saved energy, and to maximize the momentary heat recovery output with the given constraints satisfied, while taking the uncertainty in the models into account; both were successfully met. The Non-dominated Sorting Genetic Algorithm II (NSGA-II) for the design optimization of the system is presented and implemented in the Matlab environment. Markov chain Monte Carlo (MCMC) methods are also used to take the uncertainty in the models into account. The results show that the price of saved energy can be optimized. A wet heat exchanger is found to be more efficient and beneficial than a dry heat exchanger, even though its construction is expensive (160 EUR/m2) compared to that of a dry heat exchanger (50 EUR/m2). It was found that a longer lifetime favors higher CAPEX and lower OPEX, and vice versa; the effect of the uncertainty in the models was identified in a simplified case of minimizing the area of a dry heat exchanger.
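The heart of NSGA-II is non-dominated sorting of candidate designs against competing objectives. The sketch below shows only that filtering step for the two objectives named above, LCC versus heat recovery; the toy cost and recovery functions and the random designs are assumptions, not the thesis's model.

```python
# A minimal sketch of the bi-objective setting: minimize life cycle cost
# (LCC) while maximizing heat recovery, keeping only non-dominated
# designs. The toy objectives below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(3)
area = rng.uniform(10, 200, 500)              # design variable: area in m2

cost_per_m2 = 160.0                           # wet heat exchanger (EUR/m2)
lcc = cost_per_m2 * area                      # toy LCC: capex only
recovery = 1.0 - np.exp(-area / 50.0)         # toy heat recovery output

def pareto_front(costs, gains):
    """Keep designs that no other design beats on both objectives."""
    keep = []
    for i in range(len(costs)):
        dominated = np.any((costs < costs[i]) & (gains > gains[i]))
        if not dominated:
            keep.append(i)
    return np.array(keep)

front = pareto_front(lcc, recovery)
print(f"{front.size} non-dominated designs out of {area.size}")
```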
Abstract:
The purpose of this comparative study is to profile second language learners by exploring the factors which have an impact on their learning. The subjects come from two different countries: one group comes from Milwaukee, US, and the other from Turku, Finland. The subjects have attended bilingual classes from elementary school to senior high school in their respective countries. In the United States, the subjects (N = 57) started in one elementary school, from where they moved on to two high schools in the district. The Finnish subjects (N = 39) attended the same school from elementary to high school. The longitudinal study was conducted during 1994-2004 and combines qualitative and quantitative research methods. A pilot study carried out in 1990-1991 preceded the two subsequent studies that form the core material of this research. The theoretical part of the study focuses first on language policies in the United States and Finland: special emphasis is given to the history, development and current state of bilingual education, and to the factors that have affected policy-making in the provision of language instruction. Current language learning theories and models form the theoretical foundation of the research and underpin the empirical studies. Cognitively labeled theories are at the forefront, but sociocultural theory and the ecological approach are also accounted for. The research methods consist of questionnaires, compositions and interviews. A combination of statistical methods and content analysis was used in the analysis. The attitude of the bilingual learners toward L1 and L2 was generally positive: the subjects enjoyed learning through two languages and were motivated to learn both. Knowledge of L1 and parental support, along with early literacy in L1, facilitated the learning of L2. This was particularly evident in the American subject group. The American subjects' L2 learning was affected by the learners' attitudes toward the L1 culture and its speakers. Furthermore, the negative attitudes taken by L1 speakers toward L2 speakers and the lack of opportunities to engage in activities in the L1 culture affected the American subjects' learning of L2, English. The research showed that many American L2 learners were isolated from the L1 culture and were even afraid to use English in everyday communication situations. In light of the research results, a politically neutral linguistic environment, which the Finnish subjects inhabited, was seen to be more favorable for learning. The Finnish subjects were learning L2, English, in a neutral zone where their own attitudes and motivation dictated their learning. The role of L2 as a means of international communication in Finland, as opposed to a means of exercising linguistic power, provided a neutral atmosphere for learning English. In both the American and Finnish groups, the learning of other languages was facilitated when the learner had a good foundation in their L1 and the learning of L1 and L2 were in balance. Learning was also fostered when the learners drew positive experiences from their surroundings and were provided with opportunities to engage in activities where L2 was used.
Abstract:
The article describes some concrete problems that were encountered when writing a two-level model of Mari morphology. Mari is an agglutinative Finno-Ugric language spoken in Russia by about 600,000 people. The work was begun in the 1980s on the basis of K. Koskenniemi's Two-Level Morphology (1983), but in the latest stage R. Beesley's and L. Karttunen's Finite State Morphology (2003) was used. Many of the problems described in the article concern the inexplicitness of the rules in Mari grammars and the lack of information about the exact distribution of some suffixes, e.g. enclitics. Mari grammars usually give complete paradigms for a few unproblematic verb stems, whereas the difficult or unclear forms of certain verbs are only superficially discussed. Another example of phenomena that are poorly described in grammars is the way suffixes with an initial sibilant combine with stems ending in a sibilant. The help of informants and searches in electronic corpora were used to overcome such difficulties in the development of the two-level model of Mari. Variation in the order of plural markers, case suffixes and possessive suffixes is a typical feature of Mari. The morphotactic rules constructed for Mari declensional forms tend to be recursive, and their productivity must be limited by some technical device, such as filters. In the present model, certain plural markers were treated like nouns. The positional and functional versatility of the possessive suffixes can be regarded as the most challenging phenomenon in attempts to formalize Mari morphology. The Cyrillic orthography used in the model also caused problems. For instance, a Cyrillic letter may represent a sequence of two sounds, the first being part of the word stem while the other belongs to a suffix. In some cases, letters for voiced consonants are also generalized to represent voiceless consonants. Such orthographical conventions distance a morphological model based on orthography from the actual (morpho)phonological processes in the language.
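The sibilant-meets-sibilant problem mentioned above can be illustrated with a toy rule. The sketch below is written in Python rather than in the finite-state formalisms actually used, and the stems, the suffix and the epenthesis rule are invented for illustration only.

```python
# A toy sketch of a morphophonological alternation of the kind the
# article discusses: a suffix-initial sibilant meeting a stem-final
# sibilant. The rule and all forms are hypothetical, not real Mari.
import re

def realize(stem: str, suffix: str) -> str:
    form = stem + "+" + suffix                # '+' marks the morpheme boundary
    # Toy rule: insert 'e' between a stem-final and suffix-initial sibilant.
    form = re.sub(r"([sšz])\+([sšz])", r"\1e\2", form)
    return form.replace("+", "")

for stem in ["kol", "vaš"]:                   # hypothetical stems
    print(stem, "->", realize(stem, "še"))    # hypothetical suffix '-še'
```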
Abstract:
The identifiability of the parameters of a heat exchanger model without phase change was studied in this Master's thesis using synthetic data. A fast, two-step Markov chain Monte Carlo (MCMC) method was tested on a couple of case studies and on a heat exchanger model. The two-step MCMC method worked well and decreased the computation time compared to the traditional MCMC method. The effect of the measurement accuracy of certain control variables on the identifiability of the parameters was also studied; the accuracy used did not seem to have a notable effect on identifiability. The use of the posterior distribution of the parameters across different heat exchanger geometries was studied as well. It would be computationally most efficient to use the same posterior distribution for different geometries in the optimization of heat exchanger networks. According to the results, this is possible when the frontal surface areas are the same across geometries. In the other cases, the same posterior distribution can still be used for optimization, but it will give a wider predictive distribution as a result. For condensing surface heat exchangers, the numerical stability of the simulation model was studied, and as a result a stable algorithm was developed.
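A hedged sketch of such a two-step scheme: a short pilot chain first maps out the posterior, and its sample covariance then shapes the proposal of the longer production chain. The correlated Gaussian target below is an assumption standing in for the heat exchanger model's posterior; the thesis's actual two-step method may differ in its details.

```python
# A minimal sketch of a two-step MCMC run: a pilot chain estimates the
# posterior's shape, which then tunes the production chain's proposal.
# The Gaussian target is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(4)
target_cov = np.array([[1.0, 0.9], [0.9, 1.0]])   # correlated 2-D posterior
target_prec = np.linalg.inv(target_cov)

def log_post(x):
    return -0.5 * x @ target_prec @ x

def metropolis(x0, prop_cov, n):
    x = np.array(x0, float)
    lp = log_post(x)
    chain = np.empty((n, len(x)))
    for i in range(n):
        prop = rng.multivariate_normal(x, prop_cov)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Step 1: crude pilot chain with an isotropic proposal.
pilot = metropolis([3.0, -3.0], 0.5 * np.eye(2), 2000)
# Step 2: production chain with a proposal shaped by the pilot run.
tuned_cov = 2.4**2 / 2 * np.cov(pilot[500:].T)    # Haario-style scaling
main = metropolis(pilot[-1], tuned_cov, 10000)
print("posterior covariance estimate:\n", np.cov(main[2000:].T))
```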
Abstract:
The Fed model is a widely used market valuation model. It is often used in market analysis of the S&P 500 index as a shorthand measure of the attractiveness of equity and as a timing device for allocating funds between equities and bonds. The Fed model assumes a fixed relationship between the bond yield and the earnings yield, and this relationship is often assumed to hold in market valuation. In this paper, we test the Fed model from a historical perspective on the European markets; the markets of the United States are also included for comparison. The purpose of the tests is to determine whether the Fed model and its underlying assumptions hold in different markets. The tests are made on time-series data ranging from 1973 to the end of 2008. The statistical methods used are regression analysis, cointegration analysis and Granger causality. The empirical results do not give strong support to the Fed model. The underlying relationships assumed by the Fed model are statistically not valid in most of the markets examined, and therefore the model is generally not valid for valuation purposes. The results vary between markets, which gives reason to question the general use of the Fed model in different market conditions and in different markets.
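The three statistical methods named above can be sketched on synthetic data as follows; the random-walk series are assumptions standing in for the 1973-2008 market data, and the tests shown (OLS regression, Engle-Granger cointegration, Granger causality) are standard implementations rather than the paper's exact procedures.

```python
# A minimal sketch of the toolkit: regression, cointegration and
# Granger-causality tests between bond yield and earnings yield.
# The synthetic series are built to co-move, purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint, grangercausalitytests

rng = np.random.default_rng(5)
n = 400
bond_yield = np.cumsum(rng.normal(0, 0.1, n)) + 5.0
earnings_yield = bond_yield + rng.normal(0, 0.5, n)

# Regression of earnings yield on bond yield (the Fed model relation).
ols = sm.OLS(earnings_yield, sm.add_constant(bond_yield)).fit()
print("slope:", ols.params[1])

# Engle-Granger cointegration test: small p-value -> cointegrated.
_, pvalue, _ = coint(earnings_yield, bond_yield)
print("cointegration p-value:", pvalue)

# Granger causality: does bond yield help predict earnings yield?
data = pd.DataFrame({"ey": earnings_yield, "by": bond_yield})
grangercausalitytests(data[["ey", "by"]], maxlag=2)
```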
Abstract:
This thesis takes as its starting point a questioning of the effectiveness of the EU's conditionality policy regarding minority rights. Based on the rationalist theoretical model, the External Incentives Model of Governance, this hypothesis-testing thesis aims to explain whether the temporal distance to potential EU membership affects the level of legislation on minority language rights. The measurement of the level of legislation on minority language rights is limited to non-discrimination, the use of minority languages in official contexts, and the linguistic rights of minorities in education. Methodologically, a comparative approach is used both with respect to the time frame of the study, which extends from 2003 to 2010, and with respect to the selection of states. On the basis of the "most similar systems" design, the states are categorized into three groups according to their temporal distance from potential EU membership. The hypothesis tested is the following: the shorter the temporal distance to potential EU membership, the greater the likelihood that the states' level of legislation in the three areas studied has developed to a high level. The study shows that the hypothesis is only partly confirmed. The results concerning non-discrimination show that the correlation between temporal distance and the level of legislation increased markedly during the period studied. This correlation strengthened only between the category of states furthest in time from potential EU membership and the two categories closer and closest to potential EU membership. The results concerning the use of minority languages in official contexts and minorities' linguistic rights in education show no correlation and almost no correlation, respectively, between temporal distance and the development of legislation between 2003 and 2010.
Abstract:
Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Survey-based event history data pose a number of challenges for statistical analysis, including survey errors due to sampling, non-response, attrition and measurement. This study deals with non-response, attrition and measurement errors in event history data and the bias they cause in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include whether a design-based or a model-based approach is taken, which subset of the data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility of using combined longitudinal survey-register data: the Finnish subset of the European Community Household Panel (FI ECHP) survey for waves 1-5 was linked at the person level with longitudinal register data. Unemployment spells were used as the study variables of interest. Lastly, a simulation study was conducted in order to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey-register data can be used to analyse and compare the non-response and attrition processes, test the type of missingness mechanism and estimate the size of the bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon, classification error in reported spell outcomes, was also found in the data. Neither the Missing At Random (MAR) assumption about the non-response and attrition mechanisms nor the classical assumptions about measurement errors turned out to be valid. Measurement errors in both spell durations and spell outcomes were found to cause bias in estimates from event history models, with low measurement accuracy affecting the estimates of the baseline hazard most. The design-based estimates based on data from respondents to all waves of interest, weighted by the last-wave weights, displayed the largest bias. Using all the available data, including the spells of attriters up to the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to the design weights reduces the bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazards model estimators. The study discusses the implications of the results for survey organisations collecting event history data, for researchers using surveys for event history analysis, and for researchers developing methods to correct for non-sampling biases in event history data.
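A simplified sketch of the IPCW idea examined in the simulation study: model each respondent's probability of staying in the panel, invert it, and use the result to weight a Kaplan-Meier estimator. The synthetic spells, the logistic dropout model and the equal design weights are all assumptions; the study itself worked with the FI ECHP survey-register data.

```python
# A simplified sketch of Inverse Probability of Censoring Weighting:
# attrition depends on a covariate (age), so the Kaplan-Meier estimator
# is weighted by the inverse estimated probability of staying observed.
import numpy as np
from lifelines import KaplanMeierFitter
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 1000
age = rng.uniform(20, 60, n)                     # covariate driving attrition
duration = rng.exponential(6, n)                 # unemployment spell, months

# Attrition (dependent censoring) more likely for younger respondents.
p_drop = 1 / (1 + np.exp(0.08 * (age - 40)))
censored = rng.random(n) < p_drop
observed = ~censored
# Dropouts are only followed for part of their spell.
obs_time = np.where(observed, duration, duration * rng.random(n))

# IPCW: model the probability of staying in the panel, invert it,
# and multiply into the design weights (here all equal to 1).
stay_model = LogisticRegression().fit(age.reshape(-1, 1), observed)
p_stay = stay_model.predict_proba(age.reshape(-1, 1))[:, 1]
weights = 1.0 / p_stay

kmf = KaplanMeierFitter()
kmf.fit(obs_time, event_observed=observed, weights=weights)
print(kmf.median_survival_time_)
```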
Abstract:
The capabilities, and thus the design complexity, of VLSI-based embedded systems have increased tremendously in recent years, riding the wave of Moore's law. Time-to-market requirements are also shrinking, imposing challenges on designers, who in turn seek to adopt new design methods to increase their productivity. In response to these pressures, modern-day systems have moved towards on-chip multiprocessing technologies, and new on-chip multiprocessing architectures have emerged to exploit the tremendous advances in fabrication technology. Platform-based design is a possible solution to these challenges. The principle behind the approach is to separate the functionality of an application from the organization and communication architecture of the hardware platform at several levels of abstraction. The existing design methodologies for platform-based design do not provide full automation at every level of the design process, and sometimes the co-design of platform-based systems leads to sub-optimal systems. In addition, the design productivity gap in multiprocessor systems remains a key challenge under existing design methodologies. This thesis addresses these challenges and discusses the creation of a development framework for platform-based system design in the context of the SegBus platform, a distributed communication architecture. The research aims to provide automated procedures for platform design and application mapping. Structural verification support is also featured, ensuring correct-by-design platforms. The solution is based on a model-based process: both the platform and the application are modeled using the Unified Modeling Language. The thesis develops a Domain Specific Language to support platform modeling based on a corresponding UML profile, and Object Constraint Language constraints are used to support structurally correct platform construction. An emulator is introduced to allow performance estimation of the solution that is as accurate as possible at high abstraction levels. VHDL code is automatically generated in the form of "snippets" to be employed in the arbiter modules of the platform, as required by the application. The resulting framework is applied in building an actual design solution for an MP3 stereo audio decoder application.
Abstract:
With the shift towards many-core computer architectures, dataflow programming has been proposed as one potential solution for producing software that scales to a varying number of processor cores. Programming for parallel architectures is considered difficult, as the currently popular programming languages are inherently sequential and introducing parallelism is typically left to the programmer. Dataflow, however, is inherently parallel: it describes an application as a directed graph, where nodes represent calculations and edges represent data dependencies in the form of queues. These queues are the only allowed communication between the nodes, making the dependencies between the nodes explicit, and thereby also the parallelism. Once a node has sufficient inputs available, it can, independently of any other node, perform calculations, consume inputs and produce outputs. Dataflow models have existed for several decades and have become popular for describing signal processing applications, as the graph representation is a very natural one within this field; digital filters are typically described with boxes and arrows in textbooks as well. Dataflow is also becoming more interesting in other domains, and in principle any application working on an information stream fits the dataflow paradigm. Such applications include network protocols, cryptography and multimedia applications. As an example, the MPEG group standardized a dataflow language called RVC-CAL to be used within reconfigurable video coding. Describing a video coder as a dataflow network instead of in a conventional programming language makes the coder more readable, as it describes how the video data flows through the different coding tools. While dataflow provides an intuitive representation for many applications, it also introduces some new problems that need to be solved in order for dataflow to be more widely used. The explicit parallelism of a dataflow program is descriptive and enables improved utilization of the available processing units; however, the independence of the nodes also implies that some kind of scheduling is required. The need for efficient scheduling becomes even more evident when the number of nodes is larger than the number of processing units and several nodes run concurrently on one processor core. There exist several dataflow models of computation, with different trade-offs between expressiveness and analyzability. These vary from rather restricted but statically schedulable models, with minimal scheduling overhead, to dynamic models where each firing requires a firing rule to be evaluated. The model used in this work, namely RVC-CAL, is a very expressive language, and in the general case it requires dynamic scheduling; however, the strong encapsulation of dataflow nodes enables analysis, and the scheduling overhead can be reduced by using quasi-static, or piecewise static, scheduling techniques. The scheduling problem is concerned with finding the few scheduling decisions that must be made at run time, while most decisions are pre-calculated. The result is then an, as small as possible, set of static schedules that are dynamically scheduled. To identify these dynamic decisions and to find the concrete schedules, this thesis shows how quasi-static scheduling can be represented as a model checking problem. This involves identifying the relevant information needed to generate a minimal but complete model for model checking.
The model must describe everything that may affect the scheduling of the application while omitting everything else, in order to avoid state space explosion. This kind of simplification is necessary to make the state space analysis feasible. For the model checker to find the actual schedules, a set of scheduling strategies is defined which is able to produce quasi-static schedulers for a wide range of applications. The results of this work show that actor composition with quasi-static scheduling can be used to transform dataflow programs to fit many different computer architectures with different types and numbers of cores. This, in turn, enables dataflow to provide a more platform-independent representation, as one application can be fitted to a specific processor architecture without changing the actual program representation; instead, the program representation is optimized by the development tools, in the context of design space exploration, to fit the target platform. This work focuses on representing the dataflow scheduling problem as a model checking problem and is implemented as part of a compiler infrastructure. The thesis also presents experimental results as evidence of the usefulness of the approach.
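To illustrate the core idea of treating scheduling as state-space exploration, the sketch below searches the state space of a toy two-actor dataflow graph for a firing sequence that returns the queue to its initial state, i.e. a static schedule. The graph, token rates and bounded-queue firing rule are invented for illustration; the thesis operates on RVC-CAL networks with a real model checker.

```python
# A minimal sketch: a state is the queue contents, and a static schedule
# is a non-empty firing path that returns to the initial state.
from collections import deque

# Graph: producer -> queue -> consumer. The producer emits 2 tokens per
# firing, the consumer absorbs 3. State = number of tokens in the queue.
ACTORS = {"producer": +2, "consumer": -3}
CAPACITY = 6

def enabled(state, actor):
    new = state + ACTORS[actor]
    return 0 <= new <= CAPACITY          # firing rule: queue bounds hold

def find_static_schedule(initial=0):
    """BFS for a firing sequence that returns to the initial state."""
    queue = deque([(initial, [])])
    seen = set()
    while queue:
        state, path = queue.popleft()
        for actor in ACTORS:
            if enabled(state, actor):
                nxt = state + ACTORS[actor]
                if nxt == initial and path:
                    return path + [actor]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [actor]))
    return None

print(find_static_schedule())   # a balanced (3 x producer, 2 x consumer) cycle
```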