937 results for: Algorithmic pairs trading, statistical arbitrage, Kalman filter, mean reversion.
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical models like the translation model, the language model and a decoder. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target language before subjecting them to training. This paper deals with techniques that can be adopted for improving the alignment model of SMT. Incorporating parts-of-speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous when setting up the alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.
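As a hedged illustration of the alignment-pruning idea described above (not the paper's code), the sketch below drops candidate word alignments whose part-of-speech tags are incompatible; the tag set and the compatibility table are our assumptions.

```python
# Illustrative sketch: pruning word-alignment candidates by POS
# compatibility. The tags and COMPATIBLE table are assumptions.
COMPATIBLE = {("NOUN", "NOUN"), ("NOUN", "PROPN"),
              ("VERB", "VERB"), ("ADJ", "ADJ")}

def prune_alignments(src_tagged, tgt_tagged, candidates):
    """Keep only candidate links (i, j) whose POS tags are compatible.

    src_tagged/tgt_tagged: lists of (word, pos) tuples;
    candidates: iterable of (src_index, tgt_index) pairs.
    """
    kept = []
    for i, j in candidates:
        if (src_tagged[i][1], tgt_tagged[j][1]) in COMPATIBLE:
            kept.append((i, j))
    return kept
```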
Abstract:
In this paper, we study the relationship between the failure rate and the mean residual life of doubly truncated random variables. Accordingly, we develop characterizations for the exponential, Pareto II and beta distributions. Further, we generalize the identities for the Pearson and the exponential family of distributions given respectively in Nair and Sankaran (1991) and Consul (1995). Applications of these measures in the context of length-biased models are also explored.
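For orientation, the classical (untruncated) identity that results of this kind generalize links the failure rate and the mean residual life; this is a standard fact, not a formula quoted from the paper.

```latex
% For a lifetime X with density f and survival function \bar F,
% the failure rate and mean residual life are
%   h(t) = f(t)/\bar F(t), \qquad m(t) = E[X - t \mid X > t],
% and each determines the other through
h(t) = \frac{1 + m'(t)}{m(t)}.
% Example: a constant MRL m(t) = 1/\lambda forces h(t) = \lambda,
% which characterizes the exponential distribution.
```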
Abstract:
Futures trading in commodities has three specific economic functions, viz. price discovery, hedging and reduction in volatility. Natural rubber possesses all the specifications required for futures trading. Commodity futures trading in India gained momentum after the start of national-level commodity exchanges in 2003. The success of futures trading depends upon effective price risk management, price discovery and reduced volatility, which in turn depend upon the volume of trading. In the case of the rubber futures market, the volume of trading depends upon the extent of participation by market players such as growers, dealers, manufacturers, rubber marketing co-operative societies and Rubber Producers Societies (RPS). The extent of participation by market players has a direct bearing on their awareness level and their perception of futures trading. In the light of the above facts and the review of literature available on the rubber futures market, a study on the rubber futures market is necessary to fill the research gap, with specific focus on (1) the awareness and perception of rubber futures market participants, viz. (i) rubber growers, (ii) dealers, (iii) rubber product manufacturers, and (iv) rubber marketing co-operative societies and Rubber Producers Societies (RPS), about futures trading, and (2) whether the rubber futures market is fulfilling the economic functions of a futures market, viz. hedging, reduction in volatility and price discovery. The study is confined to growers, dealers, rubber goods manufacturers, rubber marketing co-operative societies and RPS in Kerala. In order to achieve the stated objectives, the study utilized secondary data for the period from 2003 to 2013 from different published sources such as bulletins, newsletters and circulars from NMCE, the Reserve Bank of India (RBI), the Warehousing Corporation and traders. The primary data required for this study were collected from rubber growers, rubber dealers, RPS and rubber marketing co-operative societies, and rubber goods manufacturers in Kerala. Data pertaining to the awareness and perception of futures trading, participation in futures trading, use of spot and futures prices, and sources of price information by dealers, farmers, manufacturers and co-operative societies were also collected. Statistical tools used for analysis include percentages, standard deviation, the Chi-square test, the Mann-Whitney U test, the Kruskal-Wallis test, the Augmented Dickey-Fuller test statistic, the t-statistic, the Granger causality test, the F-statistic, the Johansen cointegration test, the Trace statistic and the Max-Eigen statistic. The study found that 71.5 per cent of the total hedges are effective and 28.5 per cent are ineffective for the period under study. This implies that the futures market in rubber reduced the impact of price risks by approximately 71.5 per cent. Further, it is observed that on 54.4 per cent of occasions the futures market exercised a stabilizing effect on the spot market, and on 45.6 per cent of occasions futures trading exercised a destabilizing effect on the spot market. This implies that the elasticity of expectation of the futures market in rubber has a predominantly stabilizing effect on spot prices. The market, as a whole, exhibits a bias in favour of long hedges. Spot price volatility of rubber during the futures suspension period is higher than in the pre-suspension and post-suspension periods. There is bi-directional (pair-wise) causality between the spot price and the futures price of rubber.
From the results on hedging efficiency, spot price volatility and price discovery, it can be concluded that the rubber futures market fulfils all the economic functions expected of a commodity futures market. Thus, in India, the future of rubber futures is bright.
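As a hedged sketch of the time-series toolkit named above (ADF, Granger causality, Johansen cointegration), the snippet below runs the corresponding statsmodels tests on synthetic series; the data, lag order and deterministic terms are placeholders, not the study's specification.

```python
# Sketch (not the study's code) of the unit-root, causality and
# cointegration tests, on synthetic random-walk spot/futures series.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, grangercausalitytests
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(0)
df = pd.DataFrame({"spot": rng.normal(size=200).cumsum(),
                   "futures": rng.normal(size=200).cumsum()})

# Augmented Dickey-Fuller: test each price series for a unit root
adf_stat, pvalue = adfuller(df["spot"])[:2]

# Granger causality both ways (does column 2 help predict column 1?)
grangercausalitytests(df[["spot", "futures"]], maxlag=4)
grangercausalitytests(df[["futures", "spot"]], maxlag=4)

# Johansen cointegration: trace and max-eigenvalue statistics
jres = coint_johansen(df, det_order=0, k_ar_diff=1)
print(jres.lr1, jres.lr2)  # trace statistics, max-eigen statistics
```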
Abstract:
In the field of structural dynamics, computer-aided model validation techniques are now widely used, whereby experimental modal data are used to correct a numerical model for further analyses. However, the validated model represents only the dynamic behaviour of the structure that was actually tested. In reality there are many factors that inevitably lead to varying modal test results: changing environmental conditions during a test, slightly different test setups, a test on a nominally identical but different structure (e.g. from series production), etc. Before a stochastic simulation can be carried out, a number of assumptions must be made for the random variables involved. Consequently, an inverse method is needed that makes it possible to identify a stochastic model from experimental modal data. This thesis describes the development of a parameter-based approach for identifying stochastic simulation models in the field of structural dynamics. The method developed relies on first-order sensitivities, with which parameter means and covariances of the numerical model can be determined from stochastic experimental modal data.
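A minimal numerical sketch of the first-order idea, under assumed notation: if modal outputs respond linearly to parameters through a sensitivity matrix S, then Cov(modal) ≈ S Cov(θ) Sᵀ, and a pseudo-inverse of S recovers the parameter covariance from the experimentally observed modal covariance. The matrices below are invented for illustration, not taken from the thesis.

```python
# Sketch of first-order covariance identification (assumed notation):
# y ≈ y0 + S (theta - theta0)  =>  Cov(y) ≈ S Cov(theta) S^T.
import numpy as np

S = np.array([[2.0, 0.5],
              [0.3, 1.5],
              [1.0, 1.0]])               # sensitivity matrix dy/dtheta (assumed)
cov_modal = np.diag([0.04, 0.09, 0.01])  # covariance of measured modal data

S_pinv = np.linalg.pinv(S)
cov_theta = S_pinv @ cov_modal @ S_pinv.T  # identified parameter covariance
print(cov_theta)
```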
Abstract:
Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
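A minimal EM sketch for one member of this model family, an aspect-model decomposition P(x, y) = Σ_z P(z) P(x|z) P(y|z) fitted to a toy count matrix; the variable names and data are ours, and the improved overfitting-resistant EM variants are not reproduced.

```python
# EM for a shared-aspect mixture over co-occurrence counts N[x, y].
import numpy as np

rng = np.random.default_rng(0)
N = rng.poisson(1.0, size=(20, 30)).astype(float)  # toy co-occurrence counts
K = 4                                              # number of aspects

p_z = np.full(K, 1.0 / K)
p_x_z = rng.dirichlet(np.ones(20), size=K)         # P(x|z), shape (K, 20)
p_y_z = rng.dirichlet(np.ones(30), size=K)         # P(y|z), shape (K, 30)

for _ in range(100):
    # E-step: posterior responsibilities P(z | x, y)
    joint = p_z[:, None, None] * p_x_z[:, :, None] * p_y_z[:, None, :]
    post = joint / joint.sum(axis=0, keepdims=True)
    # M-step: re-estimate parameters from count-weighted responsibilities
    nz = post * N[None, :, :]
    denom = nz.sum(axis=(1, 2))
    p_z = denom / N.sum()
    p_x_z = nz.sum(axis=2) / denom[:, None]
    p_y_z = nz.sum(axis=1) / denom[:, None]
```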
Abstract:
The preceding two editions of CoDaWork included talks on the possible consideration of densities as infinite compositions: Egozcue and Díaz-Barrero (2003) extended the Euclidean structure of the simplex to a Hilbert space structure of the set of densities within a bounded interval, and van den Boogaart (2005) generalized this to the set of densities bounded by an arbitrary reference density. From the many variations of the Hilbert structures available, we work with three cases. For bounded variables, a basis derived from Legendre polynomials is used. For variables with a lower bound, we standardize them with respect to an exponential distribution and express their densities as coordinates in a basis derived from Laguerre polynomials. Finally, for unbounded variables, a normal distribution is used as reference, and coordinates are obtained with respect to a Hermite-polynomials-based basis. To get the coordinates, several approaches can be considered. A numerical accuracy problem occurs if one estimates the coordinates directly by using discretized scalar products. Thus we propose to use a weighted linear regression approach, where all k-order polynomials are used as predictor variables and weights are proportional to the reference density. Finally, for the case of 2-order Hermite polynomials (normal reference) and 1-order Laguerre polynomials (exponential), one can also derive the coordinates from their relationships to the classical mean and variance. Apart from these theoretical issues, this contribution focuses on the application of this theory to two main problems in sedimentary geology: the comparison of several grain size distributions, and the comparison among different rocks of the empirical distribution of a property measured on a batch of individual grains from the same rock or sediment, like their composition.
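A sketch of the weighted-regression coordinate estimate for the normal-reference case, under our reading of the approach: regress the log-density on Hermite polynomials with weights proportional to the reference density. The grid, polynomial degree and the use of the log-density as response are illustrative assumptions, not the authors' exact procedure.

```python
# Weighted-regression coordinates w.r.t. a Hermite (He) basis, with a
# standard normal reference density; all choices here are illustrative.
import numpy as np
from numpy.polynomial.hermite_e import hermevander
from scipy.stats import norm

x = np.linspace(-4, 4, 401)              # evaluation grid
f = norm.pdf(x, loc=0.3, scale=1.2)      # density to be represented
ref = norm.pdf(x)                        # standard normal reference

X = hermevander(x, 3)                    # columns He_0 ... He_3
w = np.sqrt(ref)                         # weights ~ reference density
coef, *_ = np.linalg.lstsq(X * w[:, None], np.log(f) * w, rcond=None)
print(coef)                              # estimated coordinates
```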
Abstract:
Large scale image mosaicing methods are in great demand among scientists who study different aspects of the seabed, and have been fostered by impressive advances in the capabilities of underwater robots in gathering optical data from the seafloor. Cost and weight constraints mean that low-cost Remotely operated vehicles (ROVs) usually have a very limited number of sensors. When a low-cost robot carries out a seafloor survey using a down-looking camera, it usually follows a predetermined trajectory that provides several non time-consecutive overlapping image pairs. Finding these pairs (a process known as topology estimation) is indispensable to obtaining globally consistent mosaics and accurate trajectory estimates, which are necessary for a global view of the surveyed area, especially when optical sensors are the only data source. This thesis presents a set of consistent methods aimed at creating large area image mosaics from optical data obtained during surveys with low-cost underwater vehicles. First, a global alignment method developed within a Feature-based image mosaicing (FIM) framework, where nonlinear minimisation is substituted by two linear steps, is discussed. Then, a simple four-point mosaic rectifying method is proposed to reduce distortions that might occur due to lens distortions, error accumulation and the difficulties of optical imaging in an underwater medium. The topology estimation problem is addressed by means of an augmented state and extended Kalman filter combined framework, aimed at minimising the total number of matching attempts and simultaneously obtaining the best possible trajectory. Potential image pairs are predicted by taking into account the uncertainty in the trajectory. The contribution of matching an image pair is investigated using information theory principles. Lastly, a different solution to the topology estimation problem is proposed in a bundle adjustment framework. Innovative aspects include the use of a fast image similarity criterion combined with a Minimum spanning tree (MST) solution, to obtain a tentative topology. This topology is improved by attempting image matching with the pairs for which there is the most overlap evidence. Unlike previous approaches for large-area mosaicing, our framework is able to deal naturally with cases where time-consecutive images cannot be matched successfully, such as completely unordered sets. Finally, the efficiency of the proposed methods is discussed and a comparison made with other state-of-the-art approaches, using a series of challenging datasets in underwater scenarios.
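As a hedged sketch of the MST step for the tentative topology (not the thesis code): given pairwise image-similarity scores, span the image set with the most similar pairs by building a minimum spanning tree over dissimilarities; the similarity scores below are synthetic.

```python
# MST over image dissimilarities: the most similar pairs form the
# initial (tentative) matching topology.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(1)
sim = rng.random((6, 6))
sim = (sim + sim.T) / 2                     # toy symmetric similarity scores
np.fill_diagonal(sim, 0.0)

dissim = np.where(sim > 0, 1.0 - sim, 0.0)  # MST minimises total edge weight
mst = minimum_spanning_tree(dissim).toarray()
edges = np.argwhere(mst > 0)                # image pairs to match first
print(edges)
```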
Abstract:
The behavior of the Asian summer monsoon is documented and compared using the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis (ERA) and the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) Reanalysis. In terms of seasonal mean climatologies the results suggest that, in several respects, the ERA is superior to the NCEP-NCAR Reanalysis. The overall better simulation of the precipitation and hence the diabatic heating field over the monsoon domain in ERA means that the analyzed circulation is probably nearer reality. In terms of interannual variability, inconsistencies in the definition of weak and strong monsoon years based on typical monsoon indices such as All-India Rainfall (AIR) anomalies and the large-scale wind shear based dynamical monsoon index (DMI) still exist. Two dominant modes of interannual variability have been identified that together explain nearly 50% of the variance. Individually, they have many features in common with the composite flow patterns associated with weak and strong monsoons, when defined in terms of regional AIR anomalies and the large-scale DMI. The reanalyses also show a common dominant mode of intraseasonal variability that describes the latitudinal displacement of the tropical convergence zone from its oceanic-to-continental regime and essentially captures the low-frequency active/break cycles of the monsoon. The relationship between interannual and intraseasonal variability has been investigated by considering the probability density function (PDF) of the principal component of the dominant intraseasonal mode. Based on the DMI, there is an indication that in years with a weaker monsoon circulation, the PDF is skewed toward negative values (i.e., break conditions). Similarly, the PDFs for El Niño and La Niña years suggest that El Niño predisposes the system to more break spells, although the sample size may limit the statistical significance of the results.
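Dominant modes of this kind are conventionally extracted by EOF (principal component) analysis; the generic SVD-based sketch below shows the mechanics, without the paper's data or preprocessing.

```python
# Generic EOF/PCA sketch on a synthetic anomaly field.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 500))     # (years, grid points) anomalies
X -= X.mean(axis=0)                    # remove the time mean

U, s, Vt = np.linalg.svd(X, full_matrices=False)
var_frac = s**2 / np.sum(s**2)         # explained-variance fraction per mode
pcs = U * s                            # principal-component time series
eofs = Vt                              # spatial patterns
print(var_frac[:2].sum())              # variance carried by the two leading modes
```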
Abstract:
The Robert-Asselin time filter is widely used in numerical models of weather and climate. It successfully suppresses the spurious computational mode associated with the leapfrog time-stepping scheme. Unfortunately, it also weakly suppresses the physical mode and severely degrades the numerical accuracy. These two concomitant problems are shown to occur because the filter does not conserve the mean state, averaged over the three time slices on which it operates. The author proposes a simple modification to the Robert-Asselin filter, which does conserve the three-time-level mean state. When used in conjunction with the leapfrog scheme, the modification vastly reduces the impacts on the physical mode and increases the numerical accuracy for amplitude errors by two orders, yielding third-order accuracy. The modified filter could easily be incorporated into existing general circulation models of the atmosphere and ocean. In principle, it should deliver more faithful simulations at almost no additional computational expense. Alternatively, it may permit the use of longer time steps with no loss of accuracy, reducing the computational expense of a given simulation.
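A small sketch of the filter and the modification on the test problem dx/dt = iωx (whose exact solution has constant amplitude), under our reading of the scheme: the filter displacement is shared between the middle and newest time levels with a weight α, where α = 1 recovers the classic Robert-Asselin filter and α = 0.5 conserves the three-time-level mean. The parameter values are illustrative, not taken from the paper.

```python
# Leapfrog with the Robert-Asselin (RA) filter and its mean-conserving
# modification; exact amplitude of the oscillation equation is 1.
import numpy as np

def leapfrog_filtered(omega=1.0, dt=0.2, nsteps=500, nu=0.2, alpha=1.0):
    x_prev = 1.0 + 0.0j                  # filtered value at step n-1
    x_curr = np.exp(1j * omega * dt)     # exact value at step n
    for _ in range(nsteps):
        x_next = x_prev + 2.0 * dt * 1j * omega * x_curr  # leapfrog step
        d = 0.5 * nu * (x_prev - 2.0 * x_curr + x_next)   # filter displacement
        x_prev = x_curr + alpha * d              # filter the middle time level
        x_curr = x_next + (alpha - 1.0) * d      # and, if alpha < 1, the newest
    return abs(x_curr)                   # exact answer: 1.0

print("RA      :", leapfrog_filtered(alpha=1.0))   # physical mode visibly damped
print("modified:", leapfrog_filtered(alpha=0.53))  # amplitude much closer to 1
```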
Abstract:
The Representative Soil Sampling Scheme of England and Wales has recorded information on the soil of agricultural land in England and Wales since 1969. It is a valuable source of information about the soil in the context of monitoring for sustainable agricultural development. Changes in soil nutrient status and pH were examined over the period 1971-2001. Several methods of statistical analysis were applied to data from the surveys during this period. The main focus here is on the data for 1971, 1981, 1991 and 2001. In general, the examination of change over time shows that levels of potassium in the soil have increased, those of magnesium have remained fairly constant, those of phosphorus have declined, and pH has changed little. Future sampling needs have been assessed in the context of monitoring, to determine the mean at a given level of confidence and tolerable error and to detect change in the mean over time at these same levels over periods of 5 and 10 years. The results of a non-hierarchical multivariate classification suggest that England and Wales could be stratified to optimize future sampling and analysis. To monitor soil quality and health more generally than for agriculture, more of the country should be sampled and a wider range of properties recorded.
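The sampling-needs assessment mentioned above rests on the standard sample-size calculation for estimating a mean to a tolerable error at a given confidence level; this is our reconstruction of the generic formula, not the paper's exact procedure.

```latex
n = \left( \frac{z_{1-\alpha/2}\,\sigma}{e} \right)^{2}
% with \sigma the property's standard deviation and e the tolerable error.
% Example: 95% confidence (z = 1.96), \sigma = 10 mg kg^{-1},
% e = 2 mg kg^{-1} gives n = (1.96 \times 10 / 2)^2 \approx 97 samples.
```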
Abstract:
A physically motivated statistical model is used to diagnose variability and trends in wintertime (October-March) Global Precipitation Climatology Project (GPCP) pentad (5-day mean) precipitation. Quasi-geostrophic theory suggests that extratropical precipitation amounts should depend multiplicatively on the pressure gradient, saturation specific humidity, and the meridional temperature gradient. This physical insight has been used to guide the development of a suitable statistical model for precipitation using a mixture of generalized linear models: a logistic model for the binary occurrence of precipitation and a Gamma distribution model for the wet day precipitation amount. The statistical model allows for the investigation of the role of each factor in determining variations and long-term trends. Saturation specific humidity q_s has a generally negative effect on global precipitation occurrence and on the tropical wet pentad precipitation amount, but has a positive relationship with the pentad precipitation amount at mid- and high latitudes. The North Atlantic Oscillation, a proxy for the meridional temperature gradient, is also found to have a statistically significant positive effect on precipitation over much of the Atlantic region. Residual time trends in wet pentad precipitation are extremely sensitive to the choice of the wet pentad threshold because of increasing trends in low-amplitude precipitation pentads; too low a choice of threshold can lead to a spurious decreasing trend in wet pentad precipitation amounts. However, for not too small thresholds, it is found that the meridional temperature gradient is an important factor for explaining part of the long-term trend in Atlantic precipitation.
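A minimal sketch of the two-part model described above, using statsmodels: a logistic GLM for occurrence and a Gamma GLM with log link for wet-day amounts. The covariates and data are synthetic placeholders for the physical predictors (pressure gradient, q_s, temperature gradient), not the paper's dataset.

```python
# Two-part GLM: Binomial occurrence model + Gamma (log link) amount model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
X = sm.add_constant(rng.standard_normal((n, 3)))   # placeholder predictors
occ = rng.integers(0, 2, size=n)                   # wet/dry indicator
amt = rng.gamma(2.0, 1.5, size=n)                  # precipitation amounts

occ_fit = sm.GLM(occ, X, family=sm.families.Binomial()).fit()
wet = occ == 1
amt_fit = sm.GLM(amt[wet], X[wet],
                 family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(occ_fit.params, amt_fit.params)
```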
Abstract:
An extensive statistical downscaling study is done to relate large-scale climate information from a general circulation model (GCM) to local-scale river flows in SW France for 51 gauging stations ranging from nival (snow-dominated) to pluvial (rainfall-dominated) river systems. This study helps to select the appropriate statistical method at a given spatial and temporal scale to downscale hydrology for future climate change impact assessment of hydrological resources. The four proposed statistical downscaling models use large-scale predictors (derived from climate model outputs or reanalysis data) that characterize precipitation and evaporation processes in the hydrological cycle to estimate summary flow statistics. The four statistical models used are generalized linear (GLM) and additive (GAM) models, aggregated boosted trees (ABT) and multi-layer perceptron neural networks (ANN). These four models were each applied at two different spatial scales, namely at that of a single flow-gauging station (local downscaling) and that of a group of flow-gauging stations having the same hydrological behaviour (regional downscaling). For each statistical model and each spatial resolution, three temporal resolutions were considered, namely the daily mean flows, the summary statistics of fortnightly flows and a daily integrated approach. The results show that flow sensitivity to atmospheric factors is significantly different between nival and pluvial hydrological systems, which are mainly influenced, respectively, by shortwave solar radiation and atmospheric temperature. The non-linear models (i.e. GAM, ABT and ANN) performed better than the linear GLM when simulating fortnightly flow percentiles. The aggregated boosted trees method showed higher and less variable R2 values when downscaling the hydrological variability in both nival and pluvial regimes. Based on the CNRM-CM3 GCM and scenarios A2 and A1B, future relative changes of fortnightly median flows were projected using the regional downscaling approach. The results suggest a global decrease of flow in both pluvial and nival regimes, especially in spring, summer and autumn, whichever scenario is considered. The discussion considers the performance of each statistical method for downscaling flow at different spatial and temporal scales as well as the relationship between atmospheric processes and flow variability.
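As a hedged stand-in for the boosted-trees step, the sketch below fits scikit-learn's gradient boosting to synthetic predictors and scores R2; it shows the shape of the downscaling regression, not the study's models, predictors or flow data.

```python
# Gradient-boosted regression as a stand-in for aggregated boosted trees.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.standard_normal((600, 5))                # large-scale predictors (toy)
y = X[:, 0] ** 2 + X[:, 1] + 0.3 * rng.standard_normal(600)  # flow statistic

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("R2:", r2_score(y_te, model.predict(X_te)))
```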
Abstract:
Background: We report an analysis of a protein network of functionally linked proteins, identified from a phylogenetic statistical analysis of complete eukaryotic genomes. Phylogenetic methods identify pairs of proteins that co-evolve on a phylogenetic tree, and have been shown to have a high probability of correctly identifying known functional links. Results: The eukaryotic correlated evolution network we derive displays the familiar power law scaling of connectivity. We introduce the use of explicit phylogenetic methods to reconstruct the ancestral presence or absence of proteins at the interior nodes of a phylogeny of eukaryote species. We find that the connectivity distribution of proteins at the point they arise on the tree and join the network follows a power law, as does the connectivity distribution of proteins at the time they are lost from the network. Proteins resident in the network acquire connections over time, but we find no evidence that 'preferential attachment' - the phenomenon of newly acquired connections in the network being more likely to be made to proteins with large numbers of connections - influences the network structure. We derive a 'variable rate of attachment' model in which proteins vary in their propensity to form network interactions independently of how many connections they have or of the total number of connections in the network, and show how this model can produce apparent power-law scaling without preferential attachment. Conclusion: A few simple rules can explain the topological structure and evolutionary changes to protein-interaction networks: most change is concentrated in satellite proteins of low connectivity and small phenotypic effect, and proteins differ in their propensity to form attachments. Given these rules of assembly, power law scaled networks naturally emerge from simple principles of selection, yielding protein interaction networks that retain a high degree of robustness on short time scales and evolvability on longer evolutionary time scales.
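A toy simulation of the 'variable rate of attachment' idea: endpoints of each new edge are chosen in proportion to a fixed per-node propensity, never to current degree, yet a heavy-tailed propensity distribution yields a heavy-tailed degree distribution. The propensity law and network sizes are our assumptions, not the paper's fitted model.

```python
# Attachment by fixed propensity (no degree dependence) still produces
# a long-tailed degree distribution when propensities are heavy-tailed.
import numpy as np

rng = np.random.default_rng(5)
n_nodes, n_edges = 2000, 6000
propensity = rng.pareto(2.0, n_nodes) + 1.0   # assumed heavy-tailed law
p = propensity / propensity.sum()

degree = np.zeros(n_nodes, dtype=int)
for _ in range(n_edges):
    u, v = rng.choice(n_nodes, size=2, replace=False, p=p)
    degree[u] += 1
    degree[v] += 1                            # endpoints never chosen by degree

print(np.percentile(degree, [50, 90, 99, 100]))  # long right tail
```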
Abstract:
An important element of the developing field of proteomics is to understand protein-protein interactions and other functional links amongst genes. Across-species correlation methods for detecting functional links work on the premise that functionally linked proteins will tend to show a common pattern of presence and absence across a range of genomes. We describe a maximum likelihood statistical model for predicting functional gene linkages. The method detects independent instances of the correlated gain or loss of pairs of proteins on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. We show, in a dataset of 10,551 protein pairs, that the phylogenetic method improves on across-species analyses by up to 35% at identifying known functionally linked proteins. The method shows that protein pairs with at least two to three correlated events of gain or loss are almost certainly functionally linked. Contingent evolution, in which one gene's presence or absence depends upon the presence of another, can also be detected phylogenetically, and may identify genes whose functional significance depends upon their interaction with other genes. Incorporating phylogenetic information improves the prediction of functional linkages. The improvement derives from having a lower rate of false positives and from detecting trends that across-species analyses miss. Phylogenetic methods can easily be incorporated into the screening of large-scale bioinformatics datasets to identify sets of protein links and to characterise gene networks.
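The comparison of correlated versus independent gain/loss underlying such approaches is typically cast as a likelihood ratio test; the summary below is the standard form of that test (as in Pagel's discrete framework), not a formula quoted from the paper.

```latex
\Lambda = 2\left(\ln L_{\text{dependent}} - \ln L_{\text{independent}}\right)
% compared against a \chi^2 distribution with degrees of freedom equal to
% the difference in the number of transition-rate parameters
% (4 in the standard two-trait discrete setting).
```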