931 resultados para Topological data analysis


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Analyzing functional data often leads to finding common factors, for which functional principal component analysis proves to be a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L-2 approximation. However, the eigenfunctions are not always directed towards an interesting and interpretable direction in the context of functional data and thus could obscure the underlying structure. To overcome such difficulty, an alternative to functional principal component analysis is proposed that produces directed components which may be more informative and easier to interpret. These structural components are similar to principal components, but are adapted to situations in which the domain of the function may be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completelyabsent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and byMartín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involvedparts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method isintroduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that thetheoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approachhas reasonable properties from a compositional point of view. In particular, it is “natural” in the sense thatit recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in thesame paper a substitution method for missing values on compositional data sets is introduced

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel densityestimation techniques in the context of compositional data analysis. Indeed, they gavetwo options for the choice of the kernel to be used in the kernel estimator. One ofthese kernels is based on the use the alr transformation on the simplex SD jointly withthe normal distribution on RD-1. However, these authors themselves recognized thatthis method has some deficiencies. A method for overcoming these dificulties based onrecent developments for compositional data analysis and multivariate kernel estimationtheory, combining the ilr transformation with the use of the normal density with a fullbandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares bothmethods in practice, thus exploring the finite-sample behaviour of both estimators

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The quantitative estimation of Sea Surface Temperatures from fossils assemblages is afundamental issue in palaeoclimatic and paleooceanographic investigations. TheModern Analogue Technique, a widely adopted method based on direct comparison offossil assemblages with modern coretop samples, was revised with the aim ofconforming it to compositional data analysis. The new CODAMAT method wasdeveloped by adopting the Aitchison metric as distance measure. Modern coretopdatasets are characterised by a large amount of zeros. The zero replacement was carriedout by adopting a Bayesian approach to the zero replacement, based on a posteriorestimation of the parameter of the multinomial distribution. The number of modernanalogues from which reconstructing the SST was determined by means of a multipleapproach by considering the Proxies correlation matrix, Standardized Residual Sum ofSquares and Mean Squared Distance. This new CODAMAT method was applied to theplanktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea.Kew words: Modern analogues, Aitchison distance, Proxies correlation matrix,Standardized Residual Sum of Squares

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Pounamu (NZ jade), or nephrite, is a protected mineral in its natural form following thetransfer of ownership back to Ngai Tahu under the Ngai Tahu (Pounamu Vesting) Act 1997.Any theft of nephrite is prosecutable under the Crimes Act 1961. Scientific evidence isessential in cases where origin is disputed. A robust method for discrimination of thismaterial through the use of elemental analysis and compositional data analysis is required.Initial studies have characterised the variability within a given nephrite source. This hasincluded investigation of both in situ outcrops and alluvial material. Methods for thediscrimination of two geographically close nephrite sources are being developed.Key Words: forensic, jade, nephrite, laser ablation, inductively coupled plasma massspectrometry, multivariate analysis, elemental analysis, compositional data analysis

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Hydrogeological research usually includes some statistical studies devised to elucidate mean background state, characterise relationships among different hydrochemical parameters, and show the influence of human activities. These goals are achieved either by means of a statistical approach or by mixing modelsbetween end-members. Compositional data analysis has proved to be effective with the first approach, but there is no commonly accepted solution to the end-member problem in a compositional framework.We present here a possible solution based on factor analysis of compositions illustrated with a case study.We find two factors on the compositional bi-plot fitting two non-centered orthogonal axes to the most representative variables. Each one of these axes defines a subcomposition, grouping those variables thatlay nearest to it. With each subcomposition a log-contrast is computed and rewritten as an equilibrium equation. These two factors can be interpreted as the isometric log-ratio coordinates (ilr) of three hiddencomponents, that can be plotted in a ternary diagram. These hidden components might be interpreted as end-members.We have analysed 14 molarities in 31 sampling stations all along the Llobregat River and its tributaries, with a monthly measure during two years. We have obtained a bi-plot with a 57% of explained totalvariance, from which we have extracted two factors: factor G, reflecting geological background enhanced by potash mining; and factor A, essentially controlled by urban and/or farming wastewater. Graphicalrepresentation of these two factors allows us to identify three extreme samples, corresponding to pristine waters, potash mining influence and urban sewage influence. To confirm this, we have available analysisof diffused and widespread point sources identified in the area: springs, potash mining lixiviates, sewage, and fertilisers. Each one of these sources shows a clear link with one of the extreme samples, exceptfertilisers due to the heterogeneity of their composition.This approach is a useful tool to distinguish end-members, and characterise them, an issue generally difficult to solve. It is worth note that the end-member composition cannot be fully estimated but only characterised through log-ratio relationships among components. Moreover, the influence of each endmember in a given sample must be evaluated in relative terms of the other samples. These limitations areintrinsic to the relative nature of compositional data

Relevância:

90.00% 90.00%

Publicador:

Resumo:

General Introduction This thesis can be divided into two main parts :the first one, corresponding to the first three chapters, studies Rules of Origin (RoOs) in Preferential Trade Agreements (PTAs); the second part -the fourth chapter- is concerned with Anti-Dumping (AD) measures. Despite wide-ranging preferential access granted to developing countries by industrial ones under North-South Trade Agreements -whether reciprocal, like the Europe Agreements (EAs) or NAFTA, or not, such as the GSP, AGOA, or EBA-, it has been claimed that the benefits from improved market access keep falling short of the full potential benefits. RoOs are largely regarded as a primary cause of the under-utilization of improved market access of PTAs. RoOs are the rules that determine the eligibility of goods to preferential treatment. Their economic justification is to prevent trade deflection, i.e. to prevent non-preferred exporters from using the tariff preferences. However, they are complex, cost raising and cumbersome, and can be manipulated by organised special interest groups. As a result, RoOs can restrain trade beyond what it is needed to prevent trade deflection and hence restrict market access in a statistically significant and quantitatively large proportion. Part l In order to further our understanding of the effects of RoOs in PTAs, the first chapter, written with Pr. Olivier Cadot, Celine Carrère and Pr. Jaime de Melo, describes and evaluates the RoOs governing EU and US PTAs. It draws on utilization-rate data for Mexican exports to the US in 2001 and on similar data for ACP exports to the EU in 2002. The paper makes two contributions. First, we construct an R-index of restrictiveness of RoOs along the lines first proposed by Estevadeordal (2000) for NAFTA, modifying it and extending it for the EU's single-list (SL). This synthetic R-index is then used to compare Roos under NAFTA and PANEURO. The two main findings of the chapter are as follows. First, it shows, in the case of PANEURO, that the R-index is useful to summarize how countries are differently affected by the same set of RoOs because of their different export baskets to the EU. Second, it is shown that the Rindex is a relatively reliable statistic in the sense that, subject to caveats, after controlling for the extent of tariff preference at the tariff-line level, it accounts for differences in utilization rates at the tariff line level. Finally, together with utilization rates, the index can be used to estimate total compliance costs of RoOs. The second chapter proposes a reform of preferential Roos with the aim of making them more transparent and less discriminatory. Such a reform would make preferential blocs more "cross-compatible" and would therefore facilitate cumulation. It would also contribute to move regionalism toward more openness and hence to make it more compatible with the multilateral trading system. It focuses on NAFTA, one of the most restrictive FTAs (see Estevadeordal and Suominen 2006), and proposes a way forward that is close in spirit to what the EU Commission is considering for the PANEURO system. In a nutshell, the idea is to replace the current array of RoOs by a single instrument- Maximum Foreign Content (MFC). An MFC is a conceptually clear and transparent instrument, like a tariff. Therefore changing all instruments into an MFC would bring improved transparency pretty much like the "tariffication" of NTBs. The methodology for this exercise is as follows: In step 1, I estimate the relationship between utilization rates, tariff preferences and RoOs. In step 2, I retrieve the estimates and invert the relationship to get a simulated MFC that gives, line by line, the same utilization rate as the old array of Roos. In step 3, I calculate the trade-weighted average of the simulated MFC across all lines to get an overall equivalent of the current system and explore the possibility of setting this unique instrument at a uniform rate across lines. This would have two advantages. First, like a uniform tariff, a uniform MFC would make it difficult for lobbies to manipulate the instrument at the margin. This argument is standard in the political-economy literature and has been used time and again in support of reductions in the variance of tariffs (together with standard welfare considerations). Second, uniformity across lines is the only way to eliminate the indirect source of discrimination alluded to earlier. Only if two countries face uniform RoOs and tariff preference will they face uniform incentives irrespective of their initial export structure. The result of this exercise is striking: the average simulated MFC is 25% of good value, a very low (i.e. restrictive) level, confirming Estevadeordal and Suominen's critical assessment of NAFTA's RoOs. Adopting a uniform MFC would imply a relaxation from the benchmark level for sectors like chemicals or textiles & apparel, and a stiffening for wood products, papers and base metals. Overall, however, the changes are not drastic, suggesting perhaps only moderate resistance to change from special interests. The third chapter of the thesis considers whether Europe Agreements of the EU, with the current sets of RoOs, could be the potential model for future EU-centered PTAs. First, I have studied and coded at the six-digit level of the Harmonised System (HS) .both the old RoOs -used before 1997- and the "Single list" Roos -used since 1997. Second, using a Constant Elasticity Transformation function where CEEC exporters smoothly mix sales between the EU and the rest of the world by comparing producer prices on each market, I have estimated the trade effects of the EU RoOs. The estimates suggest that much of the market access conferred by the EAs -outside sensitive sectors- was undone by the cost-raising effects of RoOs. The chapter also contains an analysis of the evolution of the CEECs' trade with the EU from post-communism to accession. Part II The last chapter of the thesis is concerned with anti-dumping, another trade-policy instrument having the effect of reducing market access. In 1995, the Uruguay Round introduced in the Anti-Dumping Agreement (ADA) a mandatory "sunset-review" clause (Article 11.3 ADA) under which anti-dumping measures should be reviewed no later than five years from their imposition and terminated unless there was a serious risk of resumption of injurious dumping. The last chapter, written with Pr. Olivier Cadot and Pr. Jaime de Melo, uses a new database on Anti-Dumping (AD) measures worldwide to assess whether the sunset-review agreement had any effect. The question we address is whether the WTO Agreement succeeded in imposing the discipline of a five-year cycle on AD measures and, ultimately, in curbing their length. Two methods are used; count data analysis and survival analysis. First, using Poisson and Negative Binomial regressions, the count of AD measures' revocations is regressed on (inter alia) the count of "initiations" lagged five years. The analysis yields a coefficient on measures' initiations lagged five years that is larger and more precisely estimated after the agreement than before, suggesting some effect. However the coefficient estimate is nowhere near the value that would give a one-for-one relationship between initiations and revocations after five years. We also find that (i) if the agreement affected EU AD practices, the effect went the wrong way, the five-year cycle being quantitatively weaker after the agreement than before; (ii) the agreement had no visible effect on the United States except for aone-time peak in 2000, suggesting a mopping-up of old cases. Second, the survival analysis of AD measures around the world suggests a shortening of their expected lifetime after the agreement, and this shortening effect (a downward shift in the survival function postagreement) was larger and more significant for measures targeted at WTO members than for those targeted at non-members (for which WTO disciplines do not bind), suggesting that compliance was de jure. A difference-in-differences Cox regression confirms this diagnosis: controlling for the countries imposing the measures, for the investigated countries and for the products' sector, we find a larger increase in the hazard rate of AD measures covered by the Agreement than for other measures.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Modern methods of compositional data analysis are not well known in biomedical research.Moreover, there appear to be few mathematical and statistical researchersworking on compositional biomedical problems. Like the earth and environmental sciences,biomedicine has many problems in which the relevant scienti c information isencoded in the relative abundance of key species or categories. I introduce three problemsin cancer research in which analysis of compositions plays an important role. Theproblems involve 1) the classi cation of serum proteomic pro les for early detection oflung cancer, 2) inference of the relative amounts of di erent tissue types in a diagnostictumor biopsy, and 3) the subcellular localization of the BRCA1 protein, and it'srole in breast cancer patient prognosis. For each of these problems I outline a partialsolution. However, none of these problems is \solved". I attempt to identify areas inwhich additional statistical development is needed with the hope of encouraging morecompositional data analysts to become involved in biomedical research

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The aim of this talk is to convince the reader that there are a lot of interesting statisticalproblems in presentday life science data analysis which seem ultimately connected withcompositional statistics.Key words: SAGE, cDNA microarrays, (1D-)NMR, virus quasispecies

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Recently, kernel-based Machine Learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing etc. The paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of environmental and pollution continuous information (pollution of soil by radionuclides), mapping with auxiliary information (climatic data from Aral Sea region). The promising developments, such as automatic emergency hot spot detection and monitoring network optimization are discussed as well.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Until recently, the hard X-ray, phase-sensitive imaging technique called grating interferometry was thought to provide information only in real space. However, by utilizing an alternative approach to data analysis we demonstrated that the angular resolved ultra-small angle X-ray scattering distribution can be retrieved from experimental data. Thus, reciprocal space information is accessible by grating interferometry in addition to real space. Naturally, the quality of the retrieved data strongly depends on the performance of the employed analysis procedure, which involves deconvolution of periodic and noisy data in this context. The aim of this article is to compare several deconvolution algorithms to retrieve the ultra-small angle X-ray scattering distribution in grating interferometry. We quantitatively compare the performance of three deconvolution procedures (i.e., Wiener, iterative Wiener and Lucy-Richardson) in case of realistically modeled, noisy and periodic input data. The simulations showed that the algorithm of Lucy-Richardson is the more reliable and more efficient as a function of the characteristics of the signals in the given context. The availability of a reliable data analysis procedure is essential for future developments in grating interferometry.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The coverage and volume of geo-referenced datasets are extensive and incessantly¦growing. The systematic capture of geo-referenced information generates large volumes¦of spatio-temporal data to be analyzed. Clustering and visualization play a key¦role in the exploratory data analysis and the extraction of knowledge embedded in¦these data. However, new challenges in visualization and clustering are posed when¦dealing with the special characteristics of this data. For instance, its complex structures,¦large quantity of samples, variables involved in a temporal context, high dimensionality¦and large variability in cluster shapes.¦The central aim of my thesis is to propose new algorithms and methodologies for¦clustering and visualization, in order to assist the knowledge extraction from spatiotemporal¦geo-referenced data, thus improving making decision processes.¦I present two original algorithms, one for clustering: the Fuzzy Growing Hierarchical¦Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis:¦the Tree-structured Self-organizing Maps Component Planes. In addition, I present¦methodologies that combined with FGHSON and the Tree-structured SOM Component¦Planes allow the integration of space and time seamlessly and simultaneously in¦order to extract knowledge embedded in a temporal context.¦The originality of the FGHSON lies in its capability to reflect the underlying structure¦of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of¦clusters is crucial when data include complex structures with large variability of cluster¦shapes, variances, densities and number of clusters. The most important characteristics¦of the FGHSON include: (1) It does not require an a-priori setup of the number¦of clusters. (2) The algorithm executes several self-organizing processes in parallel.¦Hence, when dealing with large datasets the processes can be distributed reducing the¦computational cost. (3) Only three parameters are necessary to set up the algorithm.¦In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm¦lies in its ability to create a structure that allows the visual exploratory data analysis¦of large high-dimensional datasets. This algorithm creates a hierarchical structure¦of Self-Organizing Map Component Planes, arranging similar variables' projections in¦the same branches of the tree. Hence, similarities on variables' behavior can be easily¦detected (e.g. local correlations, maximal and minimal values and outliers).¦Both FGHSON and the Tree-structured SOM Component Planes were applied in¦several agroecological problems proving to be very efficient in the exploratory analysis¦and clustering of spatio-temporal datasets.¦In this thesis I also tested three soft competitive learning algorithms. Two of them¦well-known non supervised soft competitive algorithms, namely the Self-Organizing¦Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); and the¦third was our original contribution, the FGHSON. Although the algorithms presented¦here have been used in several areas, to my knowledge there is not any work applying¦and comparing the performance of those techniques when dealing with spatiotemporal¦geospatial data, as it is presented in this thesis.¦I propose original methodologies to explore spatio-temporal geo-referenced datasets¦through time. Our approach uses time windows to capture temporal similarities and¦variations by using the FGHSON clustering algorithm. The developed methodologies¦are used in two case studies. In the first, the objective was to find similar agroecozones¦through time and in the second one it was to find similar environmental patterns¦shifted in time.¦Several results presented in this thesis have led to new contributions to agroecological¦knowledge, for instance, in sugar cane, and blackberry production.¦Finally, in the framework of this thesis we developed several software tools: (1)¦a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called¦BIS (Bio-inspired Identification of Similar agroecozones) an interactive graphical user¦interface tool which integrates the FGHSON algorithm with Google Earth in order to¦show zones with similar agroecological characteristics.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The use of synthetic combinatorial peptide libraries in positional scanning format (PS-SCL) has emerged recently as an alternative approach for the identification of peptides recognized by T lymphocytes. The choice of both the PS-SCL used for screening experiments and the method used for data analysis are crucial for implementing this approach. With this aim, we tested the recognition of different PS-SCL by a tyrosinase 368-376-specific CTL clone and analyzed the data obtained with a recently developed biometric data analysis based on a model of independent and additive contribution of individual amino acids to peptide antigen recognition. Mixtures defined with amino acids present at the corresponding positions in the native sequence were among the most active for all of the libraries. Somewhat surprisingly, a higher number of native amino acids were identifiable by using amidated COOH-terminal rather than free COOH-terminal PS-SCL. Also, our data clearly indicate that when using PS-SCL longer than optimal, frame shifts occur frequently and should be taken into account. Biometric analysis of the data obtained with the amidated COOH-terminal nonapeptide library allowed the identification of the native ligand as the sequence with the highest score in a public human protein database. However, the adequacy of the PS-SCL data for the identification for the peptide ligand varied depending on the PS-SCL used. Altogether these results provide insight into the potential of PS-SCL for the identification of CTL-defined tumor-derived antigenic sequences and may significantly implement our ability to interpret the results of these analyses.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Quantitative information from magnetic resonance imaging (MRI) may substantiate clinical findings and provide additional insight into the mechanism of clinical interventions in therapeutic stroke trials. The PERFORM study is exploring the efficacy of terutroban versus aspirin for secondary prevention in patients with a history of ischemic stroke. We report on the design of an exploratory longitudinal MRI follow-up study that was performed in a subgroup of the PERFORM trial. An international multi-centre longitudinal follow-up MRI study was designed for different MR systems employing safety and efficacy readouts: new T2 lesions, new DWI lesions, whole brain volume change, hippocampal volume change, changes in tissue microstructure as depicted by mean diffusivity and fractional anisotropy, vessel patency on MR angiography, and the presence of and development of new microbleeds. A total of 1,056 patients (men and women ≥ 55 years) were included. The data analysis included 3D reformation, image registration of different contrasts, tissue segmentation, and automated lesion detection. This large international multi-centre study demonstrates how new MRI readouts can be used to provide key information on the evolution of cerebral tissue lesions and within the macrovasculature after atherothrombotic stroke in a large sample of patients.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The present study proposes a modification in one of the most frequently applied effect size procedures in single-case data analysis the percent of nonoverlapping data. In contrast to other techniques, the calculus and interpretation of this procedure is straightforward and it can be easily complemented by visual inspection of the graphed data. Although the percent of nonoverlapping data has been found to perform reasonably well in N = 1 data, the magnitude of effect estimates it yields can be distorted by trend and autocorrelation. Therefore, the data correction procedure focuses on removing the baseline trend from data prior to estimating the change produced in the behavior due to intervention. A simulation study is carried out in order to compare the original and the modified procedures in several experimental conditions. The results suggest that the new proposal is unaffected by trend and autocorrelation and can be used in case of unstable baselines and sequentially related measurements.