835 resultados para Cluster Analysis. Information Theory. Entropy. Cross Information Potential. Complex Data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Currently, one of the biggest challenges for the field of data mining is to perform cluster analysis on complex data. Several techniques have been proposed but, in general, they can only achieve good results within specific areas providing no consensus of what would be the best way to group this kind of data. In general, these techniques fail due to non-realistic assumptions about the true probability distribution of the data. Based on this, this thesis proposes a new measure based on Cross Information Potential that uses representative points of the dataset and statistics extracted directly from data to measure the interaction between groups. The proposed approach allows us to use all advantages of this information-theoretic descriptor and solves the limitations imposed on it by its own nature. From this, two cost functions and three algorithms have been proposed to perform cluster analysis. As the use of Information Theory captures the relationship between different patterns, regardless of assumptions about the nature of this relationship, the proposed approach was able to achieve a better performance than the main algorithms in literature. These results apply to the context of synthetic data designed to test the algorithms in specific situations and to real data extracted from problems of different fields

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The present paper proposes a technical analysis method for extracting information about movement patterning in studies of motor control, based on a cluster analysis of movement kinematics. In a tutorial fashion, data from three different experiments are presented to exemplify and validate the technical method. When applied to three different basketball-shooting techniques, the method clearly distinguished between the different patterns. When applied to a cyclical wrist supination-pronation task, the cluster analysis provided the same results as an analysis using the conventional discrete relative phase measure. Finally, when analyzing throwing performance constrained by distance to target, the method grouped movement patterns together according to throwing distance. In conclusion, the proposed technical method provides a valuable tool to improve understanding of coordination and control in different movement models, including multiarticular actions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper develops a general theory of land inheritance rules. We distinguish between two classes of rules: those that allow a testator discretion in disposing of his land (like a best-qualified rule), and those that constrain his choice (like primogeniture). The primary benefit of the latter is to prevent rent seeking by heirs, but the cost is that testators cannot make use of information about the relative abilities of his heirs to manage the land. We also account for the impact of scale economies in land use. We conclude by offering some empirical tests of the model using a cross-cultural sample of societies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Microarray technology is a high-throughput method for genotyping and gene expression profiling. Limited sensitivity and specificity are one of the essential problems for this technology. Most of existing methods of microarray data analysis have an apparent limitation for they merely deal with the numerical part of microarray data and have made little use of gene sequence information. Because it's the gene sequences that precisely define the physical objects being measured by a microarray, it is natural to make the gene sequences an essential part of the data analysis. This dissertation focused on the development of free energy models to integrate sequence information in microarray data analysis. The models were used to characterize the mechanism of hybridization on microarrays and enhance sensitivity and specificity of microarray measurements. ^ Cross-hybridization is a major obstacle factor for the sensitivity and specificity of microarray measurements. In this dissertation, we evaluated the scope of cross-hybridization problem on short-oligo microarrays. The results showed that cross hybridization on arrays is mostly caused by oligo fragments with a run of 10 to 16 nucleotides complementary to the probes. Furthermore, a free-energy based model was proposed to quantify the amount of cross-hybridization signal on each probe. This model treats cross-hybridization as an integral effect of the interactions between a probe and various off-target oligo fragments. Using public spike-in datasets, the model showed high accuracy in predicting the cross-hybridization signals on those probes whose intended targets are absent in the sample. ^ Several prospective models were proposed to improve Positional Dependent Nearest-Neighbor (PDNN) model for better quantification of gene expression and cross-hybridization. ^ The problem addressed in this dissertation is fundamental to the microarray technology. We expect that this study will help us to understand the detailed mechanism that determines sensitivity and specificity on the microarrays. Consequently, this research will have a wide impact on how microarrays are designed and how the data are interpreted. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes an innovative platform that facilitates the collection of objective safety data around occurrences at railway level crossings using data sources including forward-facing video, telemetry from trains and geo-referenced asset and survey data. This platform is being developed with support by the Australian rail industry and the Cooperative Research Centre for Rail Innovation. The paper provides a description of the underlying accident causation model, the development methodology and refinement process as well as a description of the data collection platform. The paper concludes with a brief discussion of benefits this project is expected to provide the Australian rail industry.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Developing innovative library services requires a real world understanding of faculty members' desired curricular goals. This study aimed to develop a comprehensive and deeper understanding of Purdue's nutrition science and political science faculties' expectations for student learning related to information and data information literacies. Course syllabi were examined using grounded theory techniques that allowed us to identify how faculty were addressing information and data information literacies in their courses, but it also enabled us to understand the interconnectedness of these literacies to other departmental intentions for student learning, such as developing a professional identity or learning to conduct original research. The holistic understanding developed through this research provides the necessary information for designing and suggesting information literacy and data information literacy services to departmental faculty in ways supportive of curricular learning outcomes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Identification of homogeneous hydrometeorological regions (HMRs) is necessary for various applications. Such regions are delineated by various approaches considering rainfall and temperature as two key variables. In conventional approaches, formation of regions is based on principal components (PCs)/statistics/indices determined from time series of the key variables at monthly and seasonal scales. An issue with use of PCs for regionalization is that they have to be extracted from contemporaneous records of hydrometeorological variables. Therefore, delineated regions may not be effective when the available records are limited over contemporaneous time period. A drawback associated with the use of statistics/indices is that they do not provide effective representation of the key variables when the records exhibit non-stationarity. Consequently, the resulting regions may not be effective for the desired purpose. To address these issues, a new approach is proposed in this article. The approach considers information extracted from wavelet transformations of the observed multivariate hydrometeorological time series as the basis for regionalization by global fuzzy c-means clustering procedure. The approach can account for dynamic variability in the time series and its non-stationarity (if any). Effectiveness of the proposed approach in forming HMRs is demonstrated by application to India, as there are no prior attempts to form such regions over the country. Drought severity-area-frequency (SAF) curves are constructed corresponding to each of the newly formed regions for the use in regional drought analysis, by considering standardized precipitation evapotranspiration index (SPEI) as the drought indicator.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The study of codes, classically motivated by the need to communicate information reliably in the presence of error, has found new life in fields as diverse as network communication, distributed storage of data, and even has connections to the design of linear measurements used in compressive sensing. But in all contexts, a code typically involves exploiting the algebraic or geometric structure underlying an application. In this thesis, we examine several problems in coding theory, and try to gain some insight into the algebraic structure behind them.

The first is the study of the entropy region - the space of all possible vectors of joint entropies which can arise from a set of discrete random variables. Understanding this region is essentially the key to optimizing network codes for a given network. To this end, we employ a group-theoretic method of constructing random variables producing so-called "group-characterizable" entropy vectors, which are capable of approximating any point in the entropy region. We show how small groups can be used to produce entropy vectors which violate the Ingleton inequality, a fundamental bound on entropy vectors arising from the random variables involved in linear network codes. We discuss the suitability of these groups to design codes for networks which could potentially outperform linear coding.

The second topic we discuss is the design of frames with low coherence, closely related to finding spherical codes in which the codewords are unit vectors spaced out around the unit sphere so as to minimize the magnitudes of their mutual inner products. We show how to build frames by selecting a cleverly chosen set of representations of a finite group to produce a "group code" as described by Slepian decades ago. We go on to reinterpret our method as selecting a subset of rows of a group Fourier matrix, allowing us to study and bound our frames' coherences using character theory. We discuss the usefulness of our frames in sparse signal recovery using linear measurements.

The final problem we investigate is that of coding with constraints, most recently motivated by the demand for ways to encode large amounts of data using error-correcting codes so that any small loss can be recovered from a small set of surviving data. Most often, this involves using a systematic linear error-correcting code in which each parity symbol is constrained to be a function of some subset of the message symbols. We derive bounds on the minimum distance of such a code based on its constraints, and characterize when these bounds can be achieved using subcodes of Reed-Solomon codes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As we enter an era of ‘big data’, asset information is becoming a deliverable of complex projects. Prior research suggests digital technologies enable rapid, flexible forms of project organizing. This research analyses practices of managing change in Airbus, CERN and Crossrail, through desk-based review, interviews, visits and a cross-case workshop. These organizations deliver complex projects, rely on digital technologies to manage large data-sets; and use configuration management, a systems engineering approach with mid-20th century origins, to establish and maintain integrity. In them, configuration management has become more, rather than less, important. Asset information is structured, with change managed through digital systems, using relatively hierarchical, asynchronous and sequential processes. The paper contributes by uncovering limits to flexibility in complex projects where integrity is important. Challenges of managing change are discussed, considering the evolving nature of configuration management; potential use of analytics on complex projects; and implications for research and practice.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Maine has the highest potential for wind energy in New England and falls within the top twenty states in the nation. It falls just behind Wisconsin and California with an estimate electrical output of 56 billion kWhs. The geological makeup of Maine’s mountains in the western part of the state, and the exposed coastline provide opportune areas to capture wind and convert it into energy. The information included in this poster will suggest the most likely areas for wind development based on a number of factors as recommended by the American Wind Energy Association.