836 results for CATEGORICAL-DATA ANALYSIS
Abstract:
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, understanding and interpreting these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.
It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new and extremely general optimization algorithm, called Relaxation Expectation Maximization (REM), is proposed for learning the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, suboptimal likelihood maxima. REM also leads to a natural framework for model size selection; in combination with standard model selection techniques, the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
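The abstract does not spell out REM's relaxation scheme, but it describes it as an extension of expectation maximization for latent variable models. As a point of reference, the baseline it builds on can be sketched as plain EM for a one-dimensional Gaussian mixture (the data, component count, and initialization below are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

def em_gmm(x, k, n_iter=50):
    """Plain EM for a 1-D Gaussian mixture: the standard algorithm that
    REM, per the abstract, extends with a relaxation scheme to avoid
    poor local likelihood maxima."""
    w = np.full(k, 1.0 / k)                  # mixing weights
    mu = np.linspace(x.min(), x.max(), k)    # spread-out deterministic init
    var = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = (w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2 * np.pi * var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        n_k = r.sum(axis=0)
        w = n_k / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n_k
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
    return w, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 0.8, 200)])
w, mu, var = em_gmm(x, k=2)   # means converge near -2 and 3
```

Plain EM of this form is sensitive to initialization, which is precisely the weakness the abstract says REM addresses.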
The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model leads to new principled algorithms for smoothing and clustering of spike data.
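The second problem above concerns smoothing spike trains recorded under identical conditions. The thesis proposes a latent variable model for this; a much simpler non-model-based baseline, shown here only for orientation, is to bin the spike times and convolve with a Gaussian kernel to estimate a firing rate (the spike times, bin width, and kernel width below are invented for illustration):

```python
import numpy as np

# Hypothetical spike times in seconds from a single 1 s trial.
spikes = np.array([0.05, 0.06, 0.11, 0.12, 0.13, 0.45, 0.46, 0.90])

bin_w = 0.01                                       # 10 ms bins
edges = np.arange(0.0, 1.0 + bin_w, bin_w)
counts, _ = np.histogram(spikes, bins=edges)

sigma = 0.03                                       # 30 ms smoothing kernel
t = np.arange(-3 * sigma, 3 * sigma + bin_w / 2, bin_w)
kernel = np.exp(-0.5 * (t / sigma) ** 2)
kernel /= kernel.sum()                             # kernel sums to 1
# Smoothed firing-rate estimate in spikes per second.
rate = np.convolve(counts, kernel, mode="same") / bin_w
```

Kernel smoothing of this kind is a common fixed-bandwidth baseline; the principled algorithms the abstract refers to instead derive the smoothing from inference in the latent variable model.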
Abstract:
This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.
The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point. The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.
The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.
This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus, and another more closely attuned to the data representation. A comparison of these indicates that the "label class" method yields an orders-of-magnitude improvement over the lower predicate calculus technique.
Abstract:
Large numbers of fishing vessels operating from ports in Latin America participate in surface longline fisheries in the eastern Pacific Ocean (EPO), and several species of sea turtles inhabit the grounds where these fleets operate. The endangered status of several sea turtle species, and the success of circle hooks ('treatment' hooks) in reducing turtle hookings in other ocean areas, as compared to J-hooks and Japanese-style tuna hooks ('control' hooks), prompted the initiation of a hook exchange program on the west coast of Latin America, the Eastern Pacific Regional Sea Turtle Program (EPRSTP). One of the goals of the EPRSTP is to determine whether circle hooks would be effective at reducing turtle bycatch in artisanal fisheries of the EPO without significantly reducing the catch of marketable fish species. Participating fishers were provided with circle hooks at no cost and asked to replace the J/Japanese-style tuna hooks on their longlines with circle hooks in an alternating manner. Data collected by the EPRSTP show differences in longline gear and operational characteristics within and among countries. These aspects of the data, in addition to difficulties encountered with implementation of the alternating-hook design, pose challenges for analysis of these data.
Abstract:
Background: Insects constitute the vast majority of known species, with importance spanning biodiversity, agriculture, and human health. It is likely that the successful adaptation of the Insecta clade depends on specific components in its
Abstract:
This workshop followed on from two previous workshops, held in Colombo, Sri Lanka, in 2012 and in Kochi, India, in 2013. The 14 microsatellite markers previously developed for Indian mackerel (Rastrelliger kanagurta) were used on 31 tissue collections from all eight countries; genotyping was carried out in India.
Abstract:
Vibration methods are used to identify faults, such as spanning and loss of cover, in long off-shore pipelines. A pipeline `pig', propelled by fluid flow, generates transverse vibration in the pipeline and the measured vibration amplitude reflects the nature of the support condition. Large quantities of vibration data are collected and analyzed by Fourier and wavelet methods.
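The Fourier step described above amounts to locating dominant frequencies in the measured transverse vibration. A minimal sketch, with a synthetic signal standing in for the pig-generated vibration record (the abstract gives no sampling rate or frequencies, so the 200 Hz rate and 12 Hz tone below are assumptions):

```python
import numpy as np

fs = 200.0                                   # assumed sampling rate, Hz
t = np.arange(0.0, 5.0, 1.0 / fs)            # 5 s record
rng = np.random.default_rng(0)
# Synthetic vibration: a 12 Hz tone buried in measurement noise.
signal = np.sin(2 * np.pi * 12.0 * t) + 0.3 * rng.standard_normal(t.size)

# Fourier analysis: magnitude spectrum and the dominant frequency.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, 1.0 / fs)
dominant = freqs[spectrum.argmax()]          # recovers 12.0 Hz
```

In the application described above, shifts in such spectral peaks along the pipeline would be the signature of changed support conditions (e.g. spanning); the wavelet analysis mentioned adds localization in time that a plain FFT lacks.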
Abstract:
Compared with construction data sources that are usually stored and analyzed in spreadsheets and single data tables, data sources with more complicated structures, such as text documents, site images, web pages, and project schedules, have been studied less intensively because of the additional challenges they pose in data preparation, representation, and analysis. This paper presents our definition of, and vision for, advanced data analysis addressing such challenges, together with related results from previous work and our recent developments in the analysis of text-based, image-based, web-based, and network-based construction data sources. It is shown that particular data preparation, representation, and analysis operations should be identified and integrated with careful problem investigation and scientific validation measures in order to provide general frameworks in support of information search and knowledge discovery from such information-abundant data sources.