991 results for Multiple datasets
Abstract:
BACKGROUND Multiple sclerosis (MS) is a neurodegenerative, autoimmune disease of the central nervous system. Genome-wide association studies (GWAS) have identified over a hundred polymorphisms with modest individual effects on MS susceptibility, and they have confirmed the main individual effect of the Major Histocompatibility Complex. Additional risk loci containing immunologically relevant genes were found to be significantly overrepresented. Nonetheless, it is accepted that most of the genetic architecture underlying susceptibility to the disease remains to be defined. Candidate association studies of the leukocyte immunoglobulin-like receptor gene LILRA3 in MS have repeatedly been reported, with inconsistent results. OBJECTIVES In an attempt to shed some light on these controversial findings, a combined analysis was performed including the previously published datasets and three newly genotyped cohorts. Both the wild-type and deleted LILRA3 alleles were discriminated in a single-tube PCR amplification, and the resulting products were visualized by their different electrophoretic mobilities. RESULTS AND CONCLUSION Overall, this meta-analysis involved 3200 MS patients and 3069 matched healthy controls, and it did not show a significant association of the LILRA3 deletion [carriers of the LILRA3 deletion: p = 0.25, OR (95% CI) = 1.07 (0.95-1.19)], even after stratification by gender and by the HLA-DRB1*15:01 risk allele.
Abstract:
This paper presents the two datasets (ARENA and P5) and the challenge that form part of the PETS 2015 workshop. The datasets consist of scenarios recorded using multiple visual and thermal sensors. The scenarios in the ARENA dataset involve different staged activities around a parked vehicle in a parking lot in the UK, and those in the P5 dataset involve different staged activities around the perimeter of a nuclear power plant in Sweden. The scenarios of each dataset are grouped into ‘Normal’, ‘Warning’ and ‘Alarm’ categories. The Challenge specifically includes tasks that account for different steps in a video understanding system: Low-Level Video Analysis (object detection and tracking), Mid-Level Video Analysis (‘atomic’ event detection) and High-Level Video Analysis (‘complex’ event detection). The evaluation methodology used for the Challenge includes well-established measures.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in the form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a sizable training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing a set of labeled documents from one domain to be leveraged to classify those of another. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model that distinguishes the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification.
Results show that classification accuracy still requires improvement, but models generated in one domain prove effectively reusable in a different one.
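The nearest centroid approach described in this abstract can be illustrated with a minimal sketch: category profiles are averaged term-frequency vectors built from labeled documents, and a new document is assigned to the category with the most similar profile. The toy categories and documents below are invented for illustration, and the paper's iterative adaptation to the target domain is not reproduced here.

```python
from collections import Counter
from math import sqrt

def centroid(docs):
    """Average length-normalised term-frequency profile of tokenized documents."""
    total = Counter()
    for doc in docs:
        counts = Counter(doc)
        norm = sqrt(sum(c * c for c in counts.values()))
        for term, c in counts.items():
            total[term] += c / norm  # normalise so long documents don't dominate
    return {t: v / len(docs) for t, v in total.items()}

def cosine(profile, doc):
    """Cosine similarity between a category profile and a tokenized document."""
    counts = Counter(doc)
    dot = sum(profile.get(t, 0.0) * c for t, c in counts.items())
    norm_p = sqrt(sum(v * v for v in profile.values()))
    norm_d = sqrt(sum(c * c for c in counts.values()))
    return dot / (norm_p * norm_d) if norm_p and norm_d else 0.0

def classify(profiles, doc):
    """Assign a document to the category whose centroid profile is closest."""
    return max(profiles, key=lambda cat: cosine(profiles[cat], doc))

# Hypothetical labeled (source-domain) documents, one list per category.
labeled = {
    "sports": [["match", "goal", "team"], ["team", "season", "goal"]],
    "finance": [["stock", "market", "profit"], ["market", "shares", "profit"]],
}
profiles = {cat: centroid(docs) for cat, docs in labeled.items()}
print(classify(profiles, ["goal", "team", "win"]))  # expected: sports
```

In the cross-domain setting of the paper, these profiles would then be iteratively re-estimated from the (unlabeled) target-domain documents they attract.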
Abstract:
The Multiple Affect Adjective Check List (MAACL) has been found to have five first-order factors representing Anxiety, Depression, Hostility, Positive Affect, and Sensation Seeking and two second-order factors representing Positive Affect and Sensation Seeking (PASS) and Dysphoria. The present study examines whether these first- and second-order conceptions of affect (based on R-technique factor analysis) can also account for patterns of intraindividual variability in affect (based on P-technique factor analysis) in eight elderly women. Although the hypothesized five-factor model of affect was not testable in all of the present P-technique datasets, the results were consistent with this interindividual model of affect. Moreover, evidence of second-order (PASS and Dysphoria) and third-order (generalized distress) factors was found in one data set. Sufficient convergence in findings between the present P-technique research and prior R-technique research suggests that the MAACL is robust in describing both inter- and intraindividual components of affect in elderly women.
Abstract:
Spatial independent component analysis (sICA) of functional magnetic resonance imaging (fMRI) time series can generate meaningful activation maps and associated descriptive signals, which are useful to evaluate datasets of the entire brain or selected portions of it. Besides computational implications, variations in the input dataset combined with the multivariate nature of ICA may lead to different spatial or temporal readouts of brain activation phenomena. By reducing and increasing a volume of interest (VOI), we applied sICA to different datasets from real activation experiments with multislice acquisition and single or multiple sensory-motor task-induced blood oxygenation level-dependent (BOLD) signal sources with different spatial and temporal structure. Using receiver operating characteristics (ROC) methodology for accuracy evaluation and multiple regression analysis as benchmark, we compared sICA decompositions of reduced and increased VOI fMRI time series containing auditory, motor and hemifield visual activation occurring separately or simultaneously in time. Both approaches yielded valid results; however, the results of the increased VOI approach were spatially more accurate than those of the reduced VOI approach. This is consistent with the capability of sICA to take advantage of extended samples of statistical observations and suggests that sICA is more powerful with extended rather than reduced VOI datasets to delineate brain activity.
Abstract:
We present a 3000-yr rainfall reconstruction from the Galápagos Islands that is based on paired biomarker records from the sediment of El Junco Lake. Located in the eastern equatorial Pacific, the climate of the Galápagos Islands is governed by movements of the Intertropical Convergence Zone (ITCZ) and the El Niño-Southern Oscillation (ENSO). We use a novel method for reconstructing past ENSO- and ITCZ-related rainfall changes through analysis of molecular and isotopic biomarker records representing several types of plants and algae that grow under differing climatic conditions. We propose that δD values of dinosterol, a sterol produced by dinoflagellates, record changes in mean rainfall in El Junco Lake, while δD values of C34 botryococcene, a hydrocarbon unique to the green alga Botryococcus braunii, record changes in rainfall associated with moderate-to-strong El Niño events. We use these proxies to infer changes in mean rainfall and El Niño-related rainfall over the past 3000 yr. During periods in which the inferred change in El Niño-related rainfall opposed the change in mean rainfall, we infer changes in the amount of ITCZ-related rainfall. Simulations with an idealized isotope hydrology model of El Junco Lake help illustrate the interpretation of these proxy reconstructions. Opposing changes in El Niño- and ITCZ-related rainfall appear to account for several of the largest inferred hydrologic changes in El Junco Lake. We propose that these reconstructions can be used to infer changes in frequency and/or intensity of El Niño events and changes in the position of the ITCZ in the eastern equatorial Pacific over the past 3000 yr. Comparison with El Junco Lake sediment grain size records indicates general agreement of inferred rainfall changes over the late Holocene.
Abstract:
The analysis of time-dependent data is an important problem in many application domains, and interactive visualization of time-series data can help in understanding patterns in large time series. Many effective approaches already exist for visual analysis of univariate time series, supporting tasks such as assessment of data quality, detection of outliers, or identification of periodically or frequently occurring patterns. However, far fewer approaches support multivariate time series. The existence of multiple values per time stamp makes the analysis task per se harder, and existing visualization techniques often do not scale well. We introduce an approach for visual analysis of large multivariate time-dependent data, based on the idea of projecting multivariate measurements to a 2D display and visualizing the time dimension by trajectories. We use visual data aggregation metaphors based on grouping of similar data elements to scale with multivariate time series. Aggregation procedures can be based either on statistical properties of the data or on data clustering routines. Appropriately defined user controls allow the user to navigate and explore the data and to interactively steer the parameters of the data aggregation to enhance data analysis. We present an implementation of our approach and apply it to a comprehensive data set from the field of Earth observation, demonstrating the applicability and usefulness of our approach.
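The aggregation step described here, grouping similar (already 2D-projected) measurements and summarising each group, can be sketched minimally with simple grid binning. This is only an illustration of the statistical-aggregation variant with an invented toy trajectory; the paper also supports clustering-based grouping and interactive steering, which are not shown.

```python
from collections import defaultdict

def aggregate(points, cell=1.0):
    """Group 2D-projected measurements into square grid cells and summarise
    each cell by its member count and mean position."""
    bins = defaultdict(list)
    for x, y in points:
        bins[(int(x // cell), int(y // cell))].append((x, y))
    summary = {}
    for key, members in bins.items():
        n = len(members)
        mean_x = sum(p[0] for p in members) / n
        mean_y = sum(p[1] for p in members) / n
        summary[key] = (n, (mean_x, mean_y))
    return summary

# A short hypothetical trajectory: consecutive time stamps projected to 2D.
trajectory = [(0.1, 0.2), (0.3, 0.1), (1.6, 1.4), (1.8, 1.9)]
print(aggregate(trajectory, cell=1.0))
```

A trajectory drawn through the aggregated cell centres, rather than through every raw point, is what lets such a display scale to long multivariate series.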
Abstract:
As the Antarctic Circumpolar Current crosses the South-West Indian Ocean Ridge, it creates an extensive eddy field characterised by high sea level anomaly variability. We investigated the diving behaviour of female southern elephant seals from Marion Island during their post-moult migrations in relation to this eddy field in order to determine its role in the animals' at-sea dispersal. Most seals dived within the region significantly more often than predicted by chance, and these dives were generally shallower and shorter than dives outside the eddy field. Mixed effects models estimated reductions of 44.33 ± 3.00 m (maximum depth) and 6.37 ± 0.10 min (dive duration) as a result of diving within the region, along with low between-seal variability (maximum depth: 5.5 % and dive duration: 8.4 %). U-shaped dives increased in frequency inside the eddy field, whereas W-shaped dives with multiple vertical movements decreased. Results suggest that Marion Island's adult female elephant seals' dives are characterised by lowered cost-of-transport when they encounter the eddy field during the start and end of their post-moult migrations. This might result from changes in buoyancy associated with varying body condition upon leaving and returning to the island. Our results do not suggest that the eddy field is a vital foraging ground for Marion Island's southern elephant seals. However, because seals preferentially travel through this area and likely forage opportunistically while minimising transport costs, we hypothesise that climate-mediated changes in the nature or position of this region may alter the seals' at-sea dispersal patterns.
Abstract:
The software PanGet is a special tool for downloading multiple datasets from PANGAEA. It uses the PANGAEA dataset ID, which is unique and part of the DOI. In a first step, a list of the IDs of the datasets to be downloaded must be created. There are two ways to define this individual collection of datasets. Based on the ID list, the tool will download the datasets. Failed downloads are written to the file *_failed.txt. The functionality of PanGet is also part of the programs Pan2Applic (choose File > Download PANGAEA datasets...) and PanTool2 (choose Basic tools > Download PANGAEA datasets...).
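The workflow PanGet automates (read an ID list, fetch each dataset, record failures) can be sketched in a few lines. This is a hypothetical re-implementation, not PanGet's own code, and the URL pattern below is an assumption about how a PANGAEA dataset ID resolves to a text download rather than something taken from the PanGet documentation.

```python
import urllib.request

# Assumed download URL pattern built from the dataset ID (the part of the DOI
# after "PANGAEA."); verify against the PANGAEA web service before relying on it.
URL_TEMPLATE = "https://doi.pangaea.de/10.1594/PANGAEA.{id}?format=textfile"

def dataset_url(dataset_id):
    """Build the assumed download URL for one PANGAEA dataset ID."""
    return URL_TEMPLATE.format(id=dataset_id)

def download_all(id_list_file, out_dir="."):
    """Read one dataset ID per line, fetch each dataset, and, as PanGet does,
    write the IDs of failed downloads to a *_failed.txt file."""
    with open(id_list_file) as fh:
        ids = [line.strip() for line in fh if line.strip()]
    failed = []
    for ds_id in ids:
        try:
            with urllib.request.urlopen(dataset_url(ds_id)) as resp:
                data = resp.read()
            with open(f"{out_dir}/PANGAEA.{ds_id}.txt", "wb") as out:
                out.write(data)
        except OSError:
            failed.append(ds_id)
    if failed:
        with open(id_list_file.rsplit(".", 1)[0] + "_failed.txt", "w") as fh:
            fh.write("\n".join(failed) + "\n")
    return failed
```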
Abstract:
Summarizing topological relations is fundamental to many spatial applications, including spatial query optimization. In this paper, we present several novel techniques to effectively construct cell-density-based spatial histograms for range (window) summarizations restricted to the four most important topological relations: contains, contained, overlap, and disjoint. We first present a novel framework to construct a multiscale histogram composed of multiple Euler histograms with the guarantee of exact summarization results for aligned windows in constant time. Then we present an approximate algorithm, with an approximation ratio of 19/12, to minimize the storage space of such multiscale Euler histograms, although the problem is generally NP-hard. To conform to a limited storage space where only k Euler histograms are allowed, an effective algorithm is presented to construct multiscale histograms that achieve high accuracy. Finally, we present a new approximate algorithm to query an Euler histogram that cannot guarantee exact answers; it runs in constant time. Our extensive experiments against both synthetic and real-world datasets demonstrate that the approximate multiscale histogram techniques may improve the accuracy of the existing techniques by several orders of magnitude while retaining cost efficiency, and that the exact multiscale histogram technique requires only a storage space linearly proportional to the number of cells for the real datasets.
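The basic Euler histogram that these multiscale techniques build on can be sketched directly: for each object (here taken as an axis-aligned rectangle of grid cells) one increments the cells it covers, the interior edges between covered cells, and the interior vertices. Because a rectangle's Euler characteristic F − E + V is 1, the alternating sum over an aligned query window counts the objects intersecting it exactly. This is a minimal single-resolution sketch, not the paper's multiscale construction, and the class name and toy grid are invented.

```python
class EulerHistogram:
    """Single-resolution Euler histogram over an n x m grid of cells."""

    def __init__(self, n, m):
        self.F = [[0] * m for _ in range(n)]            # cell (face) counts
        self.Ev = [[0] * m for _ in range(n - 1)]       # edges between row-adjacent cells
        self.Eh = [[0] * (m - 1) for _ in range(n)]     # edges between column-adjacent cells
        self.V = [[0] * (m - 1) for _ in range(n - 1)]  # interior vertex counts

    def add(self, x1, y1, x2, y2):
        """Register an object covering the cell rectangle [x1..x2] x [y1..y2]."""
        for i in range(x1, x2 + 1):
            for j in range(y1, y2 + 1):
                self.F[i][j] += 1
        for i in range(x1, x2):
            for j in range(y1, y2 + 1):
                self.Ev[i][j] += 1
        for i in range(x1, x2 + 1):
            for j in range(y1, y2):
                self.Eh[i][j] += 1
        for i in range(x1, x2):
            for j in range(y1, y2):
                self.V[i][j] += 1

    def query(self, x1, y1, x2, y2):
        """Number of objects intersecting the aligned window, via F - E + V."""
        f = sum(self.F[i][j] for i in range(x1, x2 + 1) for j in range(y1, y2 + 1))
        e = sum(self.Ev[i][j] for i in range(x1, x2) for j in range(y1, y2 + 1))
        e += sum(self.Eh[i][j] for i in range(x1, x2 + 1) for j in range(y1, y2))
        v = sum(self.V[i][j] for i in range(x1, x2) for j in range(y1, y2))
        return f - e + v

h = EulerHistogram(4, 4)
h.add(0, 0, 1, 1)           # object A covers cells (0..1, 0..1)
h.add(2, 2, 3, 3)           # object B covers cells (2..3, 2..3)
print(h.query(0, 0, 3, 3))  # both objects intersect the full grid: 2
```

The multiscale framework in the paper combines several such histograms at different resolutions to keep this exactness while reducing storage.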
Abstract:
Ant Colony Optimisation algorithms mimic the way ants use pheromones for marking paths to important locations. Pheromone traces are followed and reinforced by other ants, but also evaporate over time. As a consequence, optimal paths attract more pheromone, whilst the less useful paths fade away. In the Multiple Pheromone Ant Clustering Algorithm (MPACA), ants detect features of objects represented as nodes within graph space. Each node has one or more ants assigned to each feature. Ants attempt to locate nodes with matching feature values, depositing pheromone traces on the way. This use of multiple pheromone values is a key innovation. Ants record other ant encounters, keeping a record of the features and colony membership of ants. The recorded values determine when ants should combine their features to look for conjunctions and whether they should merge into colonies. This ability to detect and deposit pheromone representative of feature combinations, and the resulting colony formation, renders the algorithm a powerful clustering tool. The MPACA operates as follows: (i) initially each node has ants assigned to each feature; (ii) ants roam the graph space searching for nodes with matching features; (iii) when departing matching nodes, ants deposit pheromones to inform other ants that the path goes to a node with the associated feature values; (iv) ant feature encounters are counted each time an ant arrives at a node; (v) if the feature encounters exceed a threshold value, feature combination occurs; (vi) a similar mechanism is used for colony merging. The model varies from traditional ACO in that: (i) a modified pheromone-driven movement mechanism is used; (ii) ants learn feature combinations and deposit multiple pheromone scents accordingly; (iii) ants merge into colonies, the basis of cluster formation. The MPACA is evaluated over synthetic and real-world datasets and its performance compares favourably with alternative approaches.
Abstract:
Popular dimension reduction and visualisation algorithms, for instance Metric Multidimensional Scaling, t-distributed Stochastic Neighbour Embedding and the Gaussian Process Latent Variable Model, rely on the assumption that input dissimilarities are Euclidean. It is well known that this assumption does not hold for most datasets, and high-dimensional data often sits upon a manifold of unknown global geometry. We present a method for improving the manifold charting process, coupled with Elastic MDS, such that we no longer assume that the manifold is Euclidean, or of any particular structure. We draw on the benefits of different dissimilarity measures, allowing the relative responsibilities, under a linear combination, to drive the visualisation process.
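The linear combination of dissimilarity measures mentioned here can be sketched minimally: several measures are blended under weights that play the role of the relative responsibilities. The two measures and the fixed weights below are invented stand-ins; in the paper the responsibilities would be learned to drive the visualisation, not fixed by hand.

```python
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def combined_dissimilarity(a, b, weights):
    """Blend several dissimilarity measures under a normalised linear
    combination; the weights stand in for the relative responsibilities."""
    measures = [euclidean, manhattan]
    total = sum(weights)
    return sum(w * d(a, b) for w, d in zip(weights, measures)) / total

# Equal responsibility for both measures on a toy pair of points.
print(combined_dissimilarity((0, 0), (3, 4), [0.5, 0.5]))  # (5 + 7) / 2 = 6.0
```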
Abstract:
The MAREDAT atlas covers 11 types of plankton, ranging in size from bacteria to jellyfish. Together, these plankton groups determine the health and productivity of the global ocean and play a vital role in the global carbon cycle. Working within a uniform and consistent spatial and depth grid (map) of the global ocean, the researchers compiled thousands to tens of thousands of data points to identify regions of plankton abundance and scarcity as well as areas of data abundance and scarcity. At many of the grid points, the MAREDAT team accomplished the difficult conversion from abundance (numbers of organisms) to biomass (carbon mass of organisms). The MAREDAT atlas provides an unprecedented global data set for ecological and biochemical analysis and modeling as well as a clear mandate for compiling additional existing data and for focusing future data gathering efforts on key groups in key areas of the ocean. The present data set provides depth-integrated values of diazotroph abundance and biomass, computed from a collection of source data sets.