818 results for MULTI-RELATIONAL DATA MINING


Relevance: 100.00%

Abstract:

Advances in sequencing technologies in recent years have made it possible to catalogue genetic variants in human samples, bringing new discoveries and insights to medical and pharmaceutical research, evolutionary research, and population studies. The quantity of sequence data produced is very large, and identifying variants requires several stages of processing the genetic information, each of which generates further information. Along with this immense accumulation of data, the scientific community has needed to organize the data in repositories, at first only to share research results and later to enable statistical studies directly on the genetic data. Large-scale studies involve data volumes on the order of petabytes, whose maintenance continues to be a challenge for infrastructures. Because of the variety and quantity of data produced, databases play a primary role in this challenge. Data models and data organization in this field can make a difference not only for scalability but also, and above all, for suitability for data mining. Indeed, the storage of these data in files with quasi-standard formats, the size of these files, and the computational requirements involved make it difficult to write efficient analysis software and discourage large-scale studies and studies on heterogeneous data. Before designing the database, the evolution over the last twenty years of the quasi-standard formats for biological flat files was therefore studied; these files contain heterogeneous metadata and the actual nucleotide sequences, with records lacking structural relationships. This evolution has recently culminated in the use of the XML standard, but delimited flat files remain the formats best supported by tools and online platforms.
This was followed by an analysis of the internal data organization of public biological databases. These databases contain genes, genetic variants, protein structures, phenotypic ontologies, disease-gene relationships, and drug-gene relationships. The public databases studied include OMIM, Entrez, KEGG, UniProt, and GO. The main objective in studying and modelling the genetic database was to structure the data so as to integrate the heterogeneous data produced and to make data mining processes computationally feasible. The choice of Hadoop/MapReduce technology is particularly effective in this case, owing to the scalability it guarantees and to its efficiency in the most complex and parallel statistical analyses, such as those concerning multi-locus allelic variants.
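The MapReduce pattern named in this abstract can be illustrated with a minimal, single-process sketch. It does not use the Hadoop API and is not the thesis's implementation. The record layout (sample, locus, alleles) and the identifiers `rs0001` and `rs0002` are hypothetical, chosen only to show how per-locus allele counts might be split into a map step and a reduce step.

```python
from collections import defaultdict

def map_phase(record):
    """Emit ((locus, allele), 1) for each allele in one genotype record."""
    _sample_id, locus, alleles = record
    for allele in alleles:
        yield (locus, allele), 1

def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one (locus, allele) key."""
    return key, sum(values)

# Hypothetical genotype records: (sample id, locus id, pair of alleles).
records = [
    ("s1", "rs0001", ("A", "G")),
    ("s2", "rs0001", ("G", "G")),
    ("s1", "rs0002", ("C", "T")),
]

pairs = [kv for record in records for kv in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each record is mapped independently and each key is reduced independently, both steps could in principle be distributed across machines, which is the property the abstract relies on for scalability.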

Relevance: 100.00%

Abstract:

The problem of prediction, that is, the search for predictive patterns within data, has been studied extensively. Many robust and efficient methodologies have been developed, procedures based on the analysis of structured numerical information. Textual information, on the other hand, is highly unstructured. An immediate conclusion would therefore be that predictive analysis of textual data requires methods completely different from the well-known data mining techniques. A prediction problem can instead be solved with the same methods: textual data and documents can be transformed into numerical values, for example by considering the absence or presence of terms, which makes it possible to use the techniques already developed efficiently. Text mining enables concepts from extremely heterogeneous application fields to be brought together. With the immense quantity of textual data available, on the World Wide Web for instance, and its continuous growth due to the pervasive use of smartphones and computers, the fields of application of textual analysis become innumerable. The advent and spread of social networks and of microblogging enable people to share opinions and moods, creating a textual corpus of incalculable size that is updated daily. The new techniques of Sentiment Analysis, or Opinion Mining, analyse the emotional state or the type of opinion expressed in a text document. They are disciplines through which one can, for example, extract indicators of the mood of an individual or of a group of individuals, creating a representation of the social emotional state. Can the trend of the social emotional state influence the evolution of global events at a macroscopic scale?
Studies in Behavioural Economics and Behavioural Finance affirm a link between emotional state, decision-making ability, and economic indicators. Thanks to the available techniques and to the mass of continuously updated textual data on the mood of millions of individuals, it becomes possible to analyse such correlations. In this study, a system is built to forecast variations in stock market indices, based on textual data extracted from the microblogging platform Twitter in the form of public tweets; the system includes techniques for improving the forecast based on the study of text similarity, categorizing the actual contribution of the texts to the forecast.
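The transformation of text into numerical values by term presence or absence, as described in this abstract, can be sketched as follows. This is a minimal illustration under simple assumptions (lowercasing and whitespace tokenization) and is not the study's actual preprocessing. The example messages and the function names `build_vocabulary` and `to_binary_vector` are hypothetical.

```python
def build_vocabulary(documents):
    """Collect the sorted set of lowercase terms seen across all documents."""
    terms = set()
    for doc in documents:
        terms.update(doc.lower().split())
    return sorted(terms)

def to_binary_vector(document, vocabulary):
    """Return 1 for each vocabulary term present in the document, else 0."""
    present = set(document.lower().split())
    return [1 if term in present else 0 for term in vocabulary]

# Hypothetical example messages (not data from the study).
docs = ["markets look strong today", "markets look weak", "feeling calm today"]
vocab = build_vocabulary(docs)
vectors = [to_binary_vector(doc, vocab) for doc in docs]
```

Each document becomes a fixed-length vector of zeros and ones, so any classifier or regressor that accepts numerical features can then be applied to it.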

Relevance: 100.00%

Abstract:

In this work we discuss a project started by the Emilia-Romagna Regional Government concerning the management of public transport. In particular, we perform a data mining analysis on the project's data set. After introducing the Weka software used for our analysis, we identify the most useful data mining techniques and algorithms, and we show how these results can be used to violate the privacy of the public transport operators themselves. Finally, although it is outside the scope of this work, we also say a few words about how this kind of attack can be prevented.

Relevance: 100.00%

Abstract:

Development and analysis of a sample dataset, consisting of about 3 million entries and extracted from a data warehouse of information on the energy consumption of several smart homes.

Relevance: 100.00%

Abstract:

For the first time we present a multi-proxy data set for the Russian Altai, consisting of Siberian larch tree-ring width (TRW), latewood density (MXD), δ13C and δ18O in cellulose chronologies obtained for the period 1779–2007 and cell wall thickness (CWT) for 1900–2008. All of these parameters agree well with each other in their high-frequency variability, while the low-frequency climate information shows systematic differences. The correlation analysis with temperature and precipitation data from the closest weather station and gridded data revealed that annual TRW, MXD, CWT, and δ13C data contain a strong summer temperature signal, while δ18O in cellulose represents a mixed summer and winter temperature and precipitation signal. The temperature and precipitation reconstructions from the Belukha ice core and Teletskoe lake sediments were used to investigate the correspondence of different independent proxies. Low-frequency patterns in TRW and δ13C chronologies are consistent with temperature reconstructions from the nearby Belukha ice core and Teletskoe lake sediments, showing a pronounced warming trend in the last century. Their combination could be used for the regional temperature reconstruction. The long-term δ18O trend agrees with the precipitation reconstruction from the Teletskoe lake sediment, indicating more humid conditions during the twentieth century. Therefore, these two proxies could be combined for the precipitation reconstruction.

Relevance: 100.00%

Abstract:

210Pb, 137Cs and 14C dated sediments of two late Holocene landslide lakes in the Provincial Park Lagunas de Yala (Laguna Rodeo, Laguna Comedero, 24°06′S, 65°30′W, 2100 m asl, northwestern Argentina) reveal a high-resolution multi-proxy data set of climate change and human impact for the past ca. 2000 years. Comparison of the lake sediment data set for the 20th century (sediment mass accumulation rates MARs, pollen spectra, nutrient and charcoal fluxes) with independent dendroecological data from the catchment (fire scars, tree growth) and long regional precipitation series (from 1934 onwards) shows that (1) the lake sediment data set is internally highly consistent and compares well with independent data sets, (2) the chronology of the sediment is reliable, (3) large fires (1940s, 1983/1984–1989) as documented in the local fire scar frequency are recorded in the charcoal flux to the lake sediments and coincide with low wet-season precipitation rates (e.g., 1940s, 1983/1984) and/or high interannual precipitation variability (late 1940s), and (4) the regional increase in precipitation after 1970 is recorded in an increase in the MARs (L. Rodeo from 100 to 390 mg cm−2 yr−1) and in an increase in fern spores reflecting wet vegetation. The most significant change in MARs and nutrient fluxes (Corg and P) of the past 2000 years is observed with the transition from the Inca Empire to the Spanish Conquest around 1600 AD. Compared with the pre-17th century conditions, MARs increased by a factor of ca. 5 to >8 (to 800 +130, −280 mg cm−2 yr−1), PO4 fluxes increased by a factor of 7, and Corg fluxes by a factor of 10.5 for the time between 1640 and 1930 AD. 17th to 19th century MARs and nutrient fluxes also exceed 20th century values. Excess Pb deposition as indicated by a significant increase in Pb/Zr and Pb/Rb ratios in the sediments after the 1950s coincides with a rapid expansion of the regional mining industry.
Excess Pb is interpreted as atmospheric deposition and direct human impact due to Pb smelting.

Relevance: 100.00%

Abstract:

Smart homes for the aging population have recently started attracting the attention of the research community. The "health state" of smart homes is comprised of many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated with their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance on a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes.
Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario.
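The F-measure reported above for the RF classifier can be checked against the reported precision and sensitivity. The sketch below assumes the standard definition of the F-measure as the harmonic mean of precision and sensitivity (recall); the abstract does not state which variant the authors used, and the function name `f_measure` is illustrative.

```python
def f_measure(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)

# Values reported in the abstract for the random forest classifier, in percent.
rf_f = f_measure(74.41, 68.49)
```

Rounded to two decimals, the result matches the 71.33% quoted in the abstract, which suggests the figures are internally consistent under this definition.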

Relevance: 100.00%

Abstract:

This paper presents the results of a Secchi depth data mining study for the North Sea - Baltic Sea region. 40,829 measurements of Secchi depth were compiled from the area as a result of this study. 4.3% of the observations were found in the international data centers [ICES Oceanographic Data Center in Denmark and the World Ocean Data Center A (WDC-A) in the USA], while 95.7% of the data was provided by individuals and ocean research institutions from the surrounding North Sea and Baltic Sea countries. Inquiries made at the World Ocean Data Center B (WDC-B) in Russia suggested that there could be significant additional holdings in that archive but, unfortunately, no data could be made available. The earliest Secchi depth measurement retrieved in this study dates back to 1902 for the Baltic Sea, while the bulk of the measurements were gathered after 1970. The spatial distribution of Secchi depth measurements in the North Sea is very uneven with surprisingly large sampling gaps in the Western North Sea. Quarterly and annual Secchi depth maps with a 0.5° x 0.5° spatial resolution are provided for the transition area between the North Sea and the Baltic Sea (4°E-16°E, 53°N-60°N).
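As an illustration of what a 0.5° x 0.5° map resolution implies, the sketch below bins point measurements into grid cells and averages them per cell. The abstract does not say how the maps were produced, so simple cell averaging is an assumption, and the observations shown are hypothetical values rather than data from the study.

```python
import math
from collections import defaultdict

CELL = 0.5  # grid resolution in degrees, as quoted in the abstract

def cell_index(lon, lat, cell=CELL):
    """Return the lower-left corner (lon, lat) of the grid cell containing the point."""
    return (math.floor(lon / cell) * cell, math.floor(lat / cell) * cell)

def grid_mean(observations, cell=CELL):
    """Average Secchi depth per grid cell from (lon, lat, depth) tuples."""
    sums = defaultdict(lambda: [0.0, 0])
    for lon, lat, depth in observations:
        key = cell_index(lon, lat, cell)
        sums[key][0] += depth
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical observations: (longitude in °E, latitude in °N, Secchi depth in m).
obs = [(10.1, 55.2, 6.0), (10.4, 55.4, 8.0), (12.7, 56.9, 4.0)]
grid = grid_mean(obs)
```

Cells with no observations simply do not appear in the result, which is one way the uneven sampling noted in the abstract would show up as gaps in a gridded map.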

Relevance: 100.00%

Abstract:

The first 1400-year floating varve chronology for north-eastern Germany covering the late Allerød to the early Holocene has been established by microscopic varve counts from the Rehwiese palaeolake sediment record. The Laacher See Tephra (LST), at the base of the studied interval, forms the tephrochronological anchor point. The fine laminations were examined using a combination of micro-facies and µ-XRF analyses and are typical of calcite varves, which in this case provide mainly a warm season signal. Two varve types with different sub-layer structures have been distinguished: (I) complex varves consisting of up to four seasonal sub-layers formed during the Allerød and early Holocene periods, and, (II) simple two sub-layer type varves only occurring during the Younger Dryas. The precision of the chronology has been improved by varve-to-varve comparison of two independently analyzed sediment profiles based on well-defined micro-marker layers. This has enabled both (1) the precise location of single missing varves in one of the sediment profiles, and, (2) the verification of varve interpolation in disturbed varve intervals in the parallel core. Inter-annual and decadal-scale variability in sediment deposition processes were traced by multi-proxy data series including seasonal layer thickness, high-resolution element scans and total organic and inorganic carbon data at a five-varve resolution. These data support the idea of a two-phase Younger Dryas, with the first interval (12,675 - 12,275 varve years BP) characterised by a still significant but gradually decreasing warm-season calcite precipitation and a second phase (12,275 - 11,640 varve years BP) with only weak calcite precipitation.
Detailed correlation of these two phases with the Meerfelder Maar record based on the LST isochrone and independent varve counts provides clues about regional differences and seasonal aspects of YD climate change along a transect from a location proximal to the North Atlantic in the west to a more continental site in the east.

Relevance: 100.00%

Abstract:

Providing accurate maps of coral reefs where the spatial scale and labels of the mapped features correspond to map units appropriate for examining biological and geomorphic structures and processes is a major challenge for remote sensing. The objective of this work is to assess the accuracy and relevance of the process used to derive geomorphic zone and benthic community zone maps for three western Pacific coral reefs produced from multi-scale, object-based image analysis (OBIA) of high-spatial-resolution multi-spectral images, guided by field survey data. Three Quickbird-2 multi-spectral data sets from reefs in Australia, Palau and Fiji and georeferenced field photographs were used in a multi-scale segmentation and object-based image classification to map geomorphic zones and benthic community zones. A per-pixel approach was also tested for mapping benthic community zones. Validation of the maps and comparison to past approaches indicated the multi-scale OBIA process enabled field data, operator field experience and a conceptual hierarchical model of the coral reef environment to be linked to provide output maps at geomorphic zone and benthic community scales on coral reefs. The OBIA mapping accuracies were comparable with previously published work using other methods; however, the classes mapped were matched to a predetermined set of features on the reef.