101 resultados para data hiding
Resumo:
Simpson's paradox, also known as amalgamation or aggregation paradox, appears whendealing with proportions. Proportions are by construction parts of a whole, which canbe interpreted as compositions assuming they only carry relative information. TheAitchison inner product space structure of the simplex, the sample space of compositions, explains the appearance of the paradox, given that amalgamation is a nonlinearoperation within that structure. Here we propose to use balances, which are specificelements of this structure, to analyse situations where the paradox might appear. Withthe proposed approach we obtain that the centre of the tables analysed is a naturalway to compare them, which avoids by construction the possibility of a paradox.Key words: Aitchison geometry, geometric mean, orthogonal projection
Resumo:
Our purpose is to provide a set-theoretical frame to clustering fuzzy relational data basically based on cardinality of the fuzzy subsets that represent objects and their complementaries, without applying any crisp property. From this perspective we define a family of fuzzy similarity indexes which includes a set of fuzzy indexes introduced by Tolias et al, and we analyze under which conditions it is defined a fuzzy proximity relation. Following an original idea due to S. Miyamoto we evaluate the similarity between objects and features by means the same mathematical procedure. Joining these concepts and methods we establish an algorithm to clustering fuzzy relational data. Finally, we present an example to make clear all the process
Resumo:
Ireland’s waters constitute one of the richest habitats for cetaceans in Europe. Marine mammals, particularly cetaceans, are known to be definitive hosts of digestive parasites from the Fm.Anisakidae. The main aim of this study is to collect and compile all the information available out there regarding parasites of the Fm. Anisakidae and their definitive hosts. Secondary objectives are to relate the presence of cetacean species with the presence of parasites of the Fm. Anisakidae and to determine whether this greater number of cetaceans relates to a greater level of parasitism. Prevalence and burdens of anisakids in definitive hosts vary widely with host species, geographic location, and season. Results from several post-mortem exams are given. However, they cannot be compared due to differences in collecting techniques. Anisakis simplex is the most commonly and widespread parasite found in the majority of the samples and in a majornumber of hosts, which include harbour porpoise, short-beaked common dolphin and bottlenose dolphin. Studies on harbour porpoise obtained prevalences of Anisakis spp. of 46% (n=26) and of 100% (n= 12). Another study in common dolphin reported a prevalence of 68% (n=25). Several reasons could influence the variations in the presence of Anisakis. Studies on commerciallyexploited fish have reported prevalences of Anisakis simplex ranging from 65-100% in wildAtlantic salmon and from 42-53.4% in Atlantic cod
Resumo:
La infraestructura europea ICOS (Integrated Carbon Observation System), tiene como misión proveer de mediciones de gases de efecto invernadero a largo plazo, lo que ha de permitir estudiar el estado actual y comportamiento futuro del ciclo global del carbono. En este contexto, geomati.co ha desarrollado un portal de búsqueda y descarga de datos que integra las mediciones realizadas en los ámbitos terrestre, marítimo y atmosférico, disciplinas que hasta ahora habían gestionado los datos de forma separada. El portal permite hacer búsquedas por múltiples ámbitos geográficos, por rango temporal, por texto libre o por un subconjunto de magnitudes, realizar vistas previas de los datos, y añadir los conjuntos de datos que se crean interesantes a un “carrito” de descargas. En el momento de realizar la descarga de una colección de datos, se le asignará un identificador universal que permitirá referenciarla en eventuales publicaciones, y repetir su descarga en el futuro (de modo que los experimentos publicados sean reproducibles). El portal se apoya en formatos abiertos de uso común en la comunidad científica, como el formato NetCDF para los datos, y en el perfil ISO de CSW, estándar de catalogación y búsqueda propio del ámbito geoespacial. El portal se ha desarrollado partiendo de componentes de software libre existentes, como Thredds Data Server, GeoNetwork Open Source y GeoExt, y su código y documentación quedarán publicados bajo una licencia libre para hacer posible su reutilización en otros proyecto
Resumo:
Nowadays, there are several services and applications that allow users to locate and move to different tourist areas using a mobile device. These systems can be used either by internet or downloading an application in concrete places like a visitors centre. Although such applications are able to facilitate the location and the search for points of interest, in most cases, these services and applications do not meet the needs of each user. This paper aims to provide a solution by studying the main projects, services and applications, their routing algorithms and their treatment of the real geographical data in Android mobile devices, focusing on the data acquisition and treatment to improve the routing searches in off-line environments.
Resumo:
In this project a research both in finding predictors via clustering techniques and in reviewing the Data Mining free software is achieved. The research is based in a case of study, from where additionally to the KDD free software used by the scientific community; a new free tool for pre-processing the data is presented. The predictors are intended for the e-learning domain as the data from where these predictors have to be inferred are student qualifications from different e-learning environments. Through our case of study not only clustering algorithms are tested but also additional goals are proposed.
Resumo:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completelyabsent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and byMartín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involvedparts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method isintroduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that thetheoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approachhas reasonable properties from a compositional point of view. In particular, it is “natural” in the sense thatit recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in thesame paper a substitution method for missing values on compositional data sets is introduced
Resumo:
The main objective of this paper aims at developing a methodology that takes into account the human factor extracted from the data base used by the recommender systems, and which allow to resolve the specific problems of prediction and recommendation. In this work, we propose to extract the user's human values scale from the data base of the users, to improve their suitability in open environments, such as the recommender systems. For this purpose, the methodology is applied with the data of the user after interacting with the system. The methodology is exemplified with a case study
Resumo:
In the eighties, John Aitchison (1986) developed a new methodological approach for the statistical analysis of compositional data. This new methodology was implemented in Basic routines grouped under the name CODA and later NEWCODA inMatlab (Aitchison, 1997). After that, several other authors have published extensions to this methodology: Marín-Fernández and others (2000), Barceló-Vidal and others (2001), Pawlowsky-Glahn and Egozcue (2001, 2002) and Egozcue and others (2003). (...)
Resumo:
The log-ratio methodology makes available powerful tools for analyzing compositionaldata. Nevertheless, the use of this methodology is only possible for those data setswithout null values. Consequently, in those data sets where the zeros are present, aprevious treatment becomes necessary. Last advances in the treatment of compositionalzeros have been centered especially in the zeros of structural nature and in the roundedzeros. These tools do not contemplate the particular case of count compositional datasets with null values. In this work we deal with \count zeros" and we introduce atreatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichletprobability distribution as a prior and we estimate the posterior probabilities. Then weapply a multiplicative modi¯cation for the non-zero values. We present a case studywhere this new methodology is applied.Key words: count data, multiplicative replacement, composition, log-ratio analysis
Resumo:
In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel densityestimation techniques in the context of compositional data analysis. Indeed, they gavetwo options for the choice of the kernel to be used in the kernel estimator. One ofthese kernels is based on the use the alr transformation on the simplex SD jointly withthe normal distribution on RD-1. However, these authors themselves recognized thatthis method has some deficiencies. A method for overcoming these dificulties based onrecent developments for compositional data analysis and multivariate kernel estimationtheory, combining the ilr transformation with the use of the normal density with a fullbandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares bothmethods in practice, thus exploring the finite-sample behaviour of both estimators
Resumo:
The quantitative estimation of Sea Surface Temperatures from fossils assemblages is afundamental issue in palaeoclimatic and paleooceanographic investigations. TheModern Analogue Technique, a widely adopted method based on direct comparison offossil assemblages with modern coretop samples, was revised with the aim ofconforming it to compositional data analysis. The new CODAMAT method wasdeveloped by adopting the Aitchison metric as distance measure. Modern coretopdatasets are characterised by a large amount of zeros. The zero replacement was carriedout by adopting a Bayesian approach to the zero replacement, based on a posteriorestimation of the parameter of the multinomial distribution. The number of modernanalogues from which reconstructing the SST was determined by means of a multipleapproach by considering the Proxies correlation matrix, Standardized Residual Sum ofSquares and Mean Squared Distance. This new CODAMAT method was applied to theplanktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea.Kew words: Modern analogues, Aitchison distance, Proxies correlation matrix,Standardized Residual Sum of Squares
Resumo:
In the B-ISDN there is a provision for four classes of services, all of them supported by a single transport network (the ATM network). Three of these services, the connected oriented (CO) ones, permit connection access control (CAC) but the fourth, the connectionless oriented (CLO) one, does not. Therefore, when CLO service and CO services have to share the same ATM link, a conflict may arise. This is because a bandwidth allocation to obtain maximum statistical gain can damage the contracted ATM quality of service (QOS); and vice versa, in order to guarantee the contracted QOS, the statistical gain have to be sacrificed. The paper presents a performance evaluation study of the influence of the CLO service on a CO service (a circuit emulation service or a variable bit-rate service) when sharing the same link
Resumo:
Geochemical data that is derived from the whole or partial analysis of various geologic materialsrepresent a composition of mineralogies or solute species. Minerals are composed of structuredrelationships between cations and anions which, through atomic and molecular forces, keep the elementsbound in specific configurations. The chemical compositions of minerals have specific relationships thatare governed by these molecular controls. In the case of olivine, there is a well-defined relationshipbetween Mn-Fe-Mg with Si. Balances between the principal elements defining olivine composition andother significant constituents in the composition (Al, Ti) have been defined, resulting in a near-linearrelationship between the logarithmic relative proportion of Si versus (MgMnFe) and Mg versus (MnFe),which is typically described but poorly illustrated in the simplex.The present contribution corresponds to ongoing research, which attempts to relate stoichiometry andgeochemical data using compositional geometry. We describe here the approach by which stoichiometricrelationships based on mineralogical constraints can be accounted for in the space of simplicialcoordinates using olivines as an example. Further examples for other mineral types (plagioclases andmore complex minerals such as clays) are needed. Issues that remain to be dealt with include thereduction of a bulk chemical composition of a rock comprised of several minerals from which appropriatebalances can be used to describe the composition in a realistic mineralogical framework. The overallobjective of our research is to answer the question: In the cases where the mineralogy is unknown, arethere suitable proxies that can be substituted?Kew words: Aitchison geometry, balances, mineral composition, oxides
Resumo:
Our essay aims at studying suitable statistical methods for the clustering ofcompositional data in situations where observations are constituted by trajectories ofcompositional data, that is, by sequences of composition measurements along a domain.Observed trajectories are known as “functional data” and several methods have beenproposed for their analysis.In particular, methods for clustering functional data, known as Functional ClusterAnalysis (FCA), have been applied by practitioners and scientists in many fields. To ourknowledge, FCA techniques have not been extended to cope with the problem ofclustering compositional data trajectories. In order to extend FCA techniques to theanalysis of compositional data, FCA clustering techniques have to be adapted by using asuitable compositional algebra.The present work centres on the following question: given a sample of compositionaldata trajectories, how can we formulate a segmentation procedure giving homogeneousclasses? To address this problem we follow the steps described below.First of all we adapt the well-known spline smoothing techniques in order to cope withthe smoothing of compositional data trajectories. In fact, an observed curve can bethought of as the sum of a smooth part plus some noise due to measurement errors.Spline smoothing techniques are used to isolate the smooth part of the trajectory:clustering algorithms are then applied to these smooth curves.The second step consists in building suitable metrics for measuring the dissimilaritybetween trajectories: we propose a metric that accounts for difference in both shape andlevel, and a metric accounting for differences in shape only.A simulation study is performed in order to evaluate the proposed methodologies, usingboth hierarchical and partitional clustering algorithm. The quality of the obtained resultsis assessed by means of several indices