939 results for Statistical Language Model
Abstract:
The article describes some concrete problems that were encountered when writing a two-level model of Mari morphology. Mari is an agglutinative Finno-Ugric language spoken in Russia by about 600,000 people. The work was begun in the 1980s on the basis of K. Koskenniemi’s Two-Level Morphology (1983), but in the latest stage R. Beesley’s and L. Karttunen’s Finite State Morphology (2003) was used. Many of the problems described in the article concern the inexplicitness of the rules in Mari grammars and the lack of information about the exact distribution of some suffixes, e.g. enclitics. The Mari grammars usually give complete paradigms for a few unproblematic verb stems, whereas the difficult or unclear forms of certain verbs are only superficially discussed. Another example of phenomena that are poorly described in grammars is the way suffixes with an initial sibilant combine with stems ending in a sibilant. The help of informants and searches of electronic corpora were used to overcome such difficulties in the development of the two-level model of Mari. Variation in the order of plural markers, case suffixes and possessive suffixes is a typical feature of Mari. The morphotactic rules constructed for Mari declensional forms tend to be recursive, and their productivity must be limited by some technical device, such as filters. In the present model, certain plural markers were treated like nouns. The positional and functional versatility of the possessive suffixes can be regarded as the most challenging phenomenon in attempts to formalize Mari morphology. Cyrillic orthography, which was used in the model, also caused problems. For instance, a Cyrillic letter may represent a sequence of two sounds, the first being part of the word stem while the other belongs to a suffix. In some cases, letters for voiced consonants are also generalized to represent voiceless consonants. 
Such orthographical conventions distance a morphological model based on orthography from the actual (morpho)phonological processes in the language.
Abstract:
Purpose: The purpose of the present study was to evaluate the retinal toxicity of a single dose of intravitreal docosahexaenoic acid (DHA) in rabbit eyes over a short-term period. Methods: Sixteen New Zealand albino rabbits were selected for this pre-clinical study. Six concentrations of DHA (Brudy Laboratories, Barcelona, Spain) were prepared: 10 mg/50 µl, 5 mg/50 µl, 2.5 mg/50 µl, 50 µg/50 µl, 25 µg/50 µl, and 5 µg/50 µl. Each concentration was injected intravitreally in the right eye of two rabbits. As a control, the vehicle solution was injected in one eye of four animals. Retinal safety was studied by slit-lamp examination and electroretinography. All the rabbits were euthanized one week after the intravitreal injection of DHA, and the eyeballs were processed for morphologic and morphometric histological examination by light microscopy. At the same time, aqueous and vitreous humor samples were taken to quantify the concentration of omega-3 acids by gas chromatography. Statistical analysis was performed with SPSS 21.0. Results: Slit-lamp examination revealed a marked inflammatory reaction in the anterior chamber of the rabbits injected with the higher concentrations of DHA (10 mg/50 µl, 5 mg/50 µl, and 2.5 mg/50 µl). Lower concentrations showed no inflammation. Electroretinography and histological studies showed no significant difference between control and DHA-injected groups except for the group injected with 50 µg/50 µl. Conclusions: Our results indicate that administration of intravitreal DHA is safe in the albino rabbit model up to the maximum tolerated dose of 25 µg/50 µl. Further studies should be performed in order to evaluate the effect of intravitreal injection of DHA as a treatment, alone or in combination, of different retinal diseases.
Abstract:
The identifiability of the parameters of a heat exchanger model without phase change was studied in this Master’s thesis using synthetic data. A fast, two-step Markov chain Monte Carlo (MCMC) method was tested with a couple of case studies and a heat exchanger model. The two-step MCMC method worked well and decreased the computation time compared to the traditional MCMC method. The effect of the measurement accuracy of certain control variables on the identifiability of the parameters was also studied. The accuracy used did not seem to have a notable effect on the identifiability of the parameters. The use of the posterior distribution of the parameters in different heat exchanger geometries was studied. It would be computationally most efficient to use the same posterior distribution among different geometries in the optimisation of heat exchanger networks. According to the results, this was possible when the frontal surface areas were the same among the different geometries. In the other cases the same posterior distribution can be used for optimisation too, but that will give a wider predictive distribution as a result. For condensing surface heat exchangers, the numerical stability of the simulation model was studied. As a result, a stable algorithm was developed.
Abstract:
The Fed model is a widely used market valuation model. It is often applied only to the S&P 500 index, as a shorthand measure of the attractiveness of equity and as a timing device for allocating funds between equity and bonds. The Fed model assumes a fixed relationship between bond yield and earnings yield. This relationship is often assumed to hold in market valuation. In this paper we test the Fed model from a historical perspective on the European markets. The markets of the United States are also included for comparison. The purpose of the tests is to determine whether the Fed model and its underlying assumptions hold in different markets. The tests are made on time-series data ranging from 1973 to the end of 2008. The statistical methods used are regression analysis, cointegration analysis and Granger causality. The empirical results do not give strong support for the Fed model. The underlying relationships assumed by the Fed model are not statistically valid in most of the markets examined, and therefore the model is not generally valid for valuation purposes. The results vary between the different markets, which gives reason to doubt the general use of the Fed model in different market conditions and in different markets.
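The comparison at the heart of the Fed model can be sketched in a few lines. This is only an illustration of the earnings-yield versus bond-yield relationship described in the abstract, not the paper's test procedure; the function names and the input figures are hypothetical.

```python
def earnings_yield(earnings_per_share: float, price: float) -> float:
    """Earnings yield (E/P) of an equity index, as a fraction."""
    return earnings_per_share / price


def fed_model_gap(earnings_per_share: float, price: float, bond_yield: float) -> float:
    """Earnings yield minus government bond yield.

    Under the Fed model the two yields are assumed to move together,
    so a positive gap is read as equities looking cheap relative to
    bonds and a negative gap as equities looking expensive.
    """
    return earnings_yield(earnings_per_share, price) - bond_yield


# Hypothetical inputs: index earnings of 5 per unit, index level 100,
# long-term bond yield of 4 %.
gap = fed_model_gap(5.0, 100.0, 0.04)
print(f"yield gap: {gap:.4f}")
```

The empirical question the paper raises is whether this gap actually reverts toward a fixed level, which is what cointegration and Granger causality tests are suited to examine.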
Abstract:
A study of the spatial variability of soil resistance to penetration (RSP) was conducted for the 0.0-0.1 m, 0.1-0.2 m and 0.2-0.3 m depth layers using univariate statistical methods, i.e., traditional geostatistics, producing a thematic map by ordinary kriging for each layer. The RSP in the 0.2-0.3 m layer was also analyzed through a spatial linear model (SLM) that used the 0.0-0.1 m and 0.1-0.2 m layers as covariates, yielding an estimation model and a thematic map by universal kriging. The thematic maps of the RSP in the 0.2-0.3 m layer constructed by the two methods were compared using accuracy measures obtained from the error matrix and the confusion matrix. The thematic maps are similar. All maps showed that the RSP is higher in the northern region.
Abstract:
This thesis takes as its starting point a questioning of the effectiveness of the EU's conditionality policy regarding minority rights. Based on the rationalist theoretical model known as the External Incentives Model of Governance, this hypothesis-testing thesis aims to explain whether the temporal distance to potential EU membership affects the level of legislation on minority language rights. The measurement of the level of legislation on minority language rights is limited to non-discrimination, the use of minority languages in official contexts, and minorities' linguistic rights in education. Methodologically, a comparative approach is used both with respect to the time frame of the study, which spans 2003 to 2010, and with respect to the selection of states. On the basis of the "most similar systems" design, the states are categorized into three groups according to their temporal distance from potential EU membership. The hypothesis tested is the following: the shorter the temporal distance to potential EU membership, the greater the likelihood that the states' level of legislation in the three areas studied has developed to a high level. The study shows that the hypothesis is only partially confirmed. The results for non-discrimination show that the relationship between temporal distance and the level of legislation increased markedly during the period studied. This relationship has strengthened only between the category of states furthest in time from potential EU membership and the two categories that are closer and closest, respectively, to potential EU membership. The results for the use of minority languages in official contexts and for minorities' linguistic rights in education show no relationship and almost no relationship, respectively, between temporal distance and the development of legislation between 2003 and 2010.
Abstract:
Reliable predictions of the remaining life of civil or mechanical structures subjected to fatigue damage are very difficult to make. In general, fatigue damage is extremely sensitive to random variations in material mechanical properties, environment and loading. These variations may induce large dispersions when the structural fatigue life has to be predicted. Wirsching (1970) mentions dispersions of the order of 30 to 70% of the mean calculated life. The present paper introduces a model to estimate the fatigue damage dispersion based on known statistical distributions of the fatigue parameters (material properties and loading). The model is developed by expanding the set of equations that describe fatigue damage for crack initiation into a Taylor series.
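The abstract does not give the paper's equations, so the following is only the generic first-order form that such a Taylor-series approach usually takes. The symbols are assumptions introduced for illustration: D is the fatigue damage, g is the damage function, and X_1, ..., X_n are the random fatigue parameters with means \mu_i.

```latex
% First-order Taylor expansion of the damage about the parameter means
D = g(X_1,\dots,X_n) \approx g(\boldsymbol{\mu})
  + \sum_{i=1}^{n} \left.\frac{\partial g}{\partial x_i}\right|_{\boldsymbol{\mu}} (X_i - \mu_i)

% Resulting approximations of the mean and the dispersion of the damage
\mathrm{E}[D] \approx g(\boldsymbol{\mu}), \qquad
\mathrm{Var}[D] \approx \sum_{i=1}^{n}\sum_{j=1}^{n}
  \left.\frac{\partial g}{\partial x_i}\right|_{\boldsymbol{\mu}}
  \left.\frac{\partial g}{\partial x_j}\right|_{\boldsymbol{\mu}}
  \mathrm{Cov}(X_i, X_j)
```

For independent parameters the double sum reduces to \sum_i (\partial g/\partial x_i)^2 \sigma_i^2, which shows directly which parameter contributes most to the dispersion of the predicted damage.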
Abstract:
Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Survey-based event history data pose a number of challenges for statistical analysis. These challenges include survey errors due to sampling, non-response, attrition and measurement. This study deals with non-response, attrition and measurement errors in event history data and the bias they cause in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include whether a design-based or a model-based approach is taken, which subset of data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility of using combined longitudinal survey register data. The Finnish subset of the European Community Household Panel (FI ECHP) survey for waves 1–5 was linked at person level with longitudinal register data. Unemployment spells were used as the study variables of interest. Lastly, a simulation study was conducted in order to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey register data can be used to analyse and compare the non-response and attrition processes, test the missingness mechanism type and estimate the size of the bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon, classification error in reported spell outcomes, was also found in the data. 
Neither the Missing At Random (MAR) assumption about non-response and attrition mechanisms, nor the classical assumptions about measurement errors, turned out to be valid. Both measurement errors in spell durations and spell outcomes were found to cause bias in estimates from event history models. Low measurement accuracy affected the estimates of baseline hazard most. The design-based estimates based on data from respondents to all waves of interest and weighted by the last wave weights displayed the largest bias. Using all the available data, including the spells by attriters until the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to design weights reduces bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazard model estimators. The study discusses implications of the results for survey organisations collecting event history data, researchers using surveys for event history analysis, and researchers who develop methods to correct for non-sampling biases in event history data.
Abstract:
The capabilities, and thus the design complexity, of VLSI-based embedded systems have increased tremendously in recent years, riding the wave of Moore’s law. Time-to-market requirements are also shrinking, posing challenges for designers, who in turn seek to adopt new design methods to increase their productivity. As an answer to these new pressures, modern systems have moved towards on-chip multiprocessing technologies. New architectures have emerged in on-chip multiprocessing in order to utilize the tremendous advances in fabrication technology. Platform-based design is a possible solution for addressing these challenges. The principle behind the approach is to separate the functionality of an application from the organization and communication architecture of the hardware platform at several levels of abstraction. The existing design methodologies for the platform-based design approach do not provide full automation at every level of the design process, and sometimes the co-design of platform-based systems leads to sub-optimal systems. In addition, the design productivity gap in multiprocessor systems remains a key challenge due to existing design methodologies. This thesis addresses the aforementioned challenges and discusses the creation of a development framework for platform-based system design in the context of the SegBus platform, a distributed communication architecture. This research aims to provide automated procedures for platform design and application mapping. Structural verification support is also featured, thus ensuring correct-by-design platforms. The solution is based on a model-based process. Both the platform and the application are modeled using the Unified Modeling Language. This thesis develops a Domain Specific Language to support platform modeling based on a corresponding UML profile. Object Constraint Language constraints are used to support structurally correct platform construction. 
An emulator is also introduced to allow performance estimation of the solution that is as accurate as possible at high abstraction levels. VHDL code is automatically generated in the form of “snippets” to be employed in the arbiter modules of the platform, as required by the application. The resulting framework is applied in building an actual design solution for an MP3 stereo audio decoder application.
Abstract:
With the shift towards many-core computer architectures, dataflow programming has been proposed as one potential solution for producing software that scales to a varying number of processor cores. Programming for parallel architectures is considered difficult, as the currently popular programming languages are inherently sequential and introducing parallelism is typically left to the programmer. Dataflow, however, is inherently parallel, describing an application as a directed graph in which nodes represent calculations and edges represent data dependencies in the form of queues. These queues are the only allowed communication between the nodes, making the dependencies between the nodes explicit and thereby also the parallelism. Once a node has sufficient inputs available, it can, independently of any other node, perform calculations, consume inputs, and produce outputs. Dataflow models have existed for several decades and have become popular for describing signal processing applications, as the graph representation is a very natural representation within this field. Digital filters are typically described with boxes and arrows, also in textbooks. Dataflow is also becoming more interesting in other domains, and in principle any application working on an information stream fits the dataflow paradigm. Such applications include, among others, network protocols, cryptography, and multimedia applications. As an example, the MPEG group standardized a dataflow language called RVC-CAL to be used within reconfigurable video coding. Describing a video coder as a dataflow network instead of with conventional programming languages makes the coder more readable, as it describes how the video data flows through the different coding tools. While dataflow provides an intuitive representation for many applications, it also introduces some new problems that need to be solved in order for dataflow to be more widely used. 
The explicit parallelism of a dataflow program is descriptive and enables improved utilization of the available processing units; however, the independent nodes also imply that some kind of scheduling is required. The need for efficient scheduling becomes even more evident when the number of nodes is larger than the number of processing units and several nodes are running concurrently on one processor core. There exist several dataflow models of computation, with different trade-offs between expressiveness and analyzability. These vary from rather restricted but statically schedulable models, with minimal scheduling overhead, to dynamic models where each firing requires a firing rule to be evaluated. The model used in this work, namely RVC-CAL, is a very expressive language, and in the general case it requires dynamic scheduling; however, the strong encapsulation of dataflow nodes enables analysis, and the scheduling overhead can be reduced by using quasi-static, or piecewise static, scheduling techniques. The scheduling problem is concerned with finding the few scheduling decisions that must be made at run-time, while most decisions are pre-calculated. The result is then a set of static schedules, as small as possible, that are dynamically scheduled. To identify these dynamic decisions and to find the concrete schedules, this thesis shows how quasi-static scheduling can be represented as a model checking problem. This involves identifying the relevant information to generate a minimal but complete model to be used for model checking. The model must describe everything that may affect scheduling of the application while omitting everything else in order to avoid state space explosion. This kind of simplification is necessary to make the state space analysis feasible. For the model checker to find the actual schedules, a set of scheduling strategies is defined that can produce quasi-static schedulers for a wide range of applications. 
The results of this work show that actor composition with quasi-static scheduling can be used to transform dataflow programs to fit many different computer architectures with different types and numbers of cores. This, in turn, enables dataflow to provide a more platform-independent representation, as one application can be fitted to a specific processor architecture without changing the actual program representation. Instead, the program representation is optimized by the development tools, in the context of design space exploration, to fit the target platform. This work focuses on representing the dataflow scheduling problem as a model checking problem and is implemented as part of a compiler infrastructure. The thesis also presents experimental results as evidence of the usefulness of the approach.
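The firing behaviour described in this abstract (nodes connected only by queues, each node firing once its inputs are available) can be sketched as follows. This is a minimal illustration in Python, not RVC-CAL and not the thesis's quasi-static scheduler: the `Actor` class, the `run` function and the two-node pipeline are hypothetical, and the scheduler shown is the naive fully dynamic kind whose run-time overhead quasi-static scheduling aims to reduce.

```python
from collections import deque


class Actor:
    """A dataflow node that communicates only through FIFO queues."""

    def __init__(self, name, func, inputs, outputs):
        self.name = name
        self.func = func
        self.inputs = inputs    # list of deques read by this actor
        self.outputs = outputs  # list of deques written by this actor

    def can_fire(self):
        # Firing rule: at least one token on every input queue.
        return all(len(q) > 0 for q in self.inputs)

    def fire(self):
        # Consume one token per input, compute, and emit the result
        # on every output queue.
        args = [q.popleft() for q in self.inputs]
        result = self.func(*args)
        for q in self.outputs:
            q.append(result)


def run(actors):
    """Dynamic scheduling: evaluate firing rules until no actor is enabled."""
    fired = True
    while fired:
        fired = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                fired = True


# Hypothetical pipeline: source -> double -> add_one -> sink
source, middle, sink = deque([1, 2, 3]), deque(), deque()
double = Actor("double", lambda x: 2 * x, [source], [middle])
add_one = Actor("add_one", lambda x: x + 1, [middle], [sink])
run([double, add_one])
print(list(sink))  # [3, 5, 7]
```

Every pass of `run` re-evaluates each firing rule. For a pipeline like this one the order "double, then add_one" is known in advance, which is the kind of decision a quasi-static schedule would fix at compile time.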
Abstract:
In this research, the effectiveness of Naive Bayes and Gaussian Mixture Model classifiers for segmenting exudates in retinal images is studied, and the results are evaluated with metrics commonly used in medical imaging. In addition, a color variation analysis of retinal images is carried out to find out how effectively retinal images can be segmented using only the color information of the pixels.
Abstract:
This research concerns statistical methods that help increase the demand forecasting accuracy of company X’s forecasting model. The current forecasting process was analyzed in detail. As a result, a graphical scheme of the logical algorithm was developed. Based on the analysis of the algorithm and of the forecasting errors, all potential directions for future improvement of the model's accuracy were gathered into a complete list. Three improvement directions were chosen for further practical research; on their basis, three test models were created and verified. The novelty of this work lies in the methodological approach of the original analysis of the model, which identified its critical points, as well as in the uniqueness of the developed test models. The results of the study formed the basis of a grant from the Government of St. Petersburg.
Abstract:
An interesting fact about language cognition is that stimulation involving incongruence in the merge operation between verb and complement has often been related to a negative event-related potential (ERP) of augmented amplitude and latency of ca. 400 ms - the N400. Using an automatic ERP latency and amplitude estimator to facilitate the recognition of waves with a low signal-to-noise ratio, the objective of the present study was to study the N400 statistically in 24 volunteers. Stimulation consisted of 80 experimental sentences (40 congruous and 40 incongruous), generated in Brazilian Portuguese, involving two distinct local verb-argument combinations (nominal object and pronominal object series). For each volunteer, the EEG was simultaneously acquired at 20 derivations, topographically localized according to the 10-20 International System. A computerized routine for automatic N400-peak marking (based on the ascendant zero-cross of the first waveform derivative) was applied to the estimated individual ERP waveform for congruous and incongruous sentences in both series for all ERP topographic derivations. Peak-to-peak N400 amplitude was significantly augmented (P < 0.05; one-sided Wilcoxon signed-rank test) due to incongruence in derivations F3, T3, C3, Cz, T5, P3, Pz, and P4 for nominal object series and in P3, Pz and P4 for pronominal object series. The results also indicated high inter-individual variability in ERP waveforms, suggesting that the usual procedure of grand averaging might not be considered a generally adequate approach. Hence, signal processing statistical techniques should be applied in neurolinguistic ERP studies allowing waveform analysis with low signal-to-noise ratio.
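The peak-marking criterion mentioned above (the ascending zero-crossing of the first derivative) can be illustrated on a sampled waveform. This is a sketch under assumptions, not the study's estimator: the derivative is approximated by first differences, no smoothing or latency window is applied, and the function name and sample values are hypothetical.

```python
def ascending_zero_crossings(signal):
    """Indices where the first difference goes from negative to non-negative.

    Such a crossing marks a local minimum of the waveform, i.e. a
    candidate negative-going peak such as the N400.
    """
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    return [
        i + 1
        for i in range(len(diffs) - 1)
        if diffs[i] < 0 and diffs[i + 1] >= 0
    ]


# Hypothetical waveform samples (arbitrary units) with one trough.
waveform = [0.0, -1.0, -3.0, -2.0, 0.0, 1.0]
print(ascending_zero_crossings(waveform))  # [2]
```

On real EEG data, candidates would normally be restricted to a latency window around 400 ms and the waveform would be filtered first, since noise produces many spurious crossings.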
Abstract:
Software is a key component in many of our devices and products that we use every day. Most customers demand not only that their devices should function as expected but also that the software should be of high quality, reliable, fault tolerant, efficient, etc. In short, it is not enough that a calculator gives the correct result of a calculation, we want the result instantly, in the right form, with minimal use of battery, etc. One of the key aspects for succeeding in today's industry is delivering high quality. In most software development projects, high-quality software is achieved by rigorous testing and good quality assurance practices. However, today, customers are asking for these high quality software products at an ever-increasing pace. This leaves the companies with less time for development. Software testing is an expensive activity, because it requires much manual work. Testing, debugging, and verification are estimated to consume 50 to 75 per cent of the total development cost of complex software projects. Further, the most expensive software defects are those which have to be fixed after the product is released. One of the main challenges in software development is reducing the associated cost and time of software testing without sacrificing the quality of the developed software. It is often not enough to only demonstrate that a piece of software is functioning correctly. Usually, many other aspects of the software, such as performance, security, scalability, usability, etc., need also to be verified. Testing these aspects of the software is traditionally referred to as nonfunctional testing. One of the major challenges with non-functional testing is that it is usually carried out at the end of the software development process when most of the functionality is implemented. This is due to the fact that non-functional aspects, such as performance or security, apply to the software as a whole. In this thesis, we study the use of model-based testing. 
We present approaches to automatically generate tests from behavioral models for solving some of these challenges. We show that model-based testing is applicable not only to functional testing but also to non-functional testing. In its simplest form, performance testing is performed by executing multiple test sequences at once while observing the software in terms of responsiveness and stability, rather than the output. The main contribution of the thesis is a coherent model-based testing approach for testing functional and performance-related issues in software systems. We show how we go from system models, expressed in the Unified Modeling Language, to test cases and back to models again. The system requirements are traced throughout the entire testing process. Requirements traceability facilitates finding faults in the design and implementation of the software. In the research field of model-based testing, many newly proposed approaches suffer from poor or missing tool support. Therefore, the second contribution of this thesis is proper tool support for the proposed approach that is integrated with leading industry tools. We offer independent tools, tools that are integrated with other industry-leading tools, and complete tool-chains when necessary. Many model-based testing approaches proposed by the research community suffer from poor empirical validation in an industrial context. In order to demonstrate the applicability of our proposed approach, we apply our research to several systems, including industrial ones.
Abstract:
Mass transfer kinetics in osmotic dehydration is usually modeled by Fick's law, empirical models and probabilistic models. The aim of this study was to determine the applicability of the Peleg model for investigating mass transfer during osmotic dehydration of mackerel (Scomber japonicus) slices at different temperatures. Osmotic dehydration was performed on mackerel slices by cooking-infusion in solutions with glycerol and salt (a_w = 0.64) at different temperatures: 50, 70, and 90 °C. The Peleg rate constant K1 (h (g/g dm)^-1) varied with temperature from 0.761 to 0.396 for water loss, from 5.260 to 2.947 for salt gain, and from 0.854 to 0.566 for glycerol intake. In all cases, it followed the Arrhenius relationship (R²>0.86). The Ea values (kJ/mol) obtained were 16.14, 14.21, and 10.12 for water, salt, and glycerol, respectively. The statistical parameters that qualify the goodness of fit (R²>0.91 and RMSE<0.086) indicate promising applicability of the Peleg model.
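The two relationships named in this abstract can be sketched numerically. The Peleg equation is used here in its standard form M(t) = M0 ± t / (K1 + K2·t). The Arrhenius step treats 1/K1 as the rate and uses only the two end values quoted for water loss, assuming (this is not stated in the abstract) that 0.761 belongs to 50 °C and 0.396 to 90 °C. The study itself fitted three temperatures, so this two-point estimate is only a rough consistency check; all function names are hypothetical.

```python
import math

R = 8.314  # universal gas constant, J/(mol K)


def peleg(t, m0, k1, k2, gain=True):
    """Peleg model: content at time t, M(t) = M0 +/- t / (K1 + K2*t)."""
    delta = t / (k1 + k2 * t)
    return m0 + delta if gain else m0 - delta


def arrhenius_ea_two_point(k1_low, t_low_c, k1_high, t_high_c):
    """Activation energy in kJ/mol from K1 at two temperatures (deg C).

    Assumes 1/K1 = A * exp(-Ea / (R*T)), so that
    ln(K1_low / K1_high) = (Ea / R) * (1/T_low - 1/T_high).
    """
    t_low = t_low_c + 273.15
    t_high = t_high_c + 273.15
    ln_ratio = math.log(k1_low / k1_high)
    return R * ln_ratio / (1.0 / t_low - 1.0 / t_high) / 1000.0


# End values quoted for water loss, assumed to correspond to 50 and 90 deg C.
ea_water = arrhenius_ea_two_point(0.761, 50.0, 0.396, 90.0)
print(f"Ea (water, two-point estimate): {ea_water:.1f} kJ/mol")
```

The two-point value comes out at roughly 16 kJ/mol, in line with the 16.14 kJ/mol reported for water from the full fit.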