882 results for large scale data gathering
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering
Abstract:
Contains abstract
Abstract:
The MAP-i doctoral program of the Universities of Minho, Aveiro and Porto
Abstract:
Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins combined with salivary enzymes, beeswax, and pollen. The biological activities described for propolis have also been identified in the resin of donor plants, but a major challenge for standardizing the chemical composition and biological effects of propolis remains a better understanding of how seasonality influences the chemical constituents of this raw material. Since propolis quality depends, among other variables, on the local flora, which is strongly influenced by biotic and abiotic factors over the seasons, unraveling the effect of harvest season on the propolis chemical profile is an issue of recognized importance. Fast, cheap, and robust analytical techniques seem to be the best choice for large-scale quality control processes in the most demanding markets, e.g., human health applications. Accordingly, UV-Visible (UV-Vis) scanning spectrophotometry of hydroalcoholic extracts (HE) of seventy-three propolis samples, collected over the seasons in 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn) in Southern Brazil, was adopted. Machine learning and chemometric techniques were then applied to the UV-Vis dataset to gain insight into the effect of seasonality on the claimed chemical heterogeneity of propolis samples, determined by changes in the flora of the geographic region under study. Descriptive and classification models were built following a chemometric approach, i.e., principal component analysis (PCA) and hierarchical clustering analysis (HCA), supported by scripts written in the R language. The UV-Vis profiles associated with chemometric analysis allowed the identification of a typical pattern in propolis samples collected in the summer. Importantly, the discrimination based on PCA could be improved by using the dataset of the fingerprint region of phenolic compounds (λ = 280-400 nm), suggesting that, beyond their biological activities, these secondary metabolites also play a relevant role in the discrimination and classification of this complex matrix through bioinformatics tools. Finally, a series of machine learning approaches, e.g., partial least squares-discriminant analysis (PLS-DA), k-Nearest Neighbors (kNN), and Decision Trees, proved complementary to PCA and HCA, yielding relevant information on sample discrimination.
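A minimal sketch of the kind of analysis this abstract describes (the study's scripts were written in R; the Python version below uses random placeholder data and invented shapes, and is only illustrative): PCA applied to UV-Vis spectra restricted to the phenolic fingerprint region.

```python
# Hedged sketch: PCA on UV-Vis spectra of propolis extracts, restricted to
# the phenolic fingerprint region (280-400 nm). Data are random placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
wavelengths = np.arange(200, 701)               # 1 nm steps over a full scan
spectra = rng.random((73, wavelengths.size))    # 73 samples, as in the study

# Keep only the 280-400 nm fingerprint region before the decomposition.
mask = (wavelengths >= 280) & (wavelengths <= 400)
X = StandardScaler().fit_transform(spectra[:, mask])

pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
# Plotting `scores` coloured by harvest season would expose the kind of
# seasonal clustering the abstract reports for summer samples.
```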
Abstract:
Biofilm research is growing more diverse and dependent on high-throughput technologies, and the large-scale production of results complicates data substantiation. In particular, experimental protocols are often adapted to meet the needs of a particular laboratory, and no statistical validation of the modified method is provided. This paper discusses the impact of intra-laboratory adaptation and non-rigorous documentation of experimental protocols on biofilm data interchange and validation. The case study is a non-standard, but widely used, workflow for Pseudomonas aeruginosa biofilm development, considering three analysis assays: the crystal violet (CV) assay for biomass quantification, the XTT assay for respiratory activity assessment, and the colony forming units (CFU) assay for determination of cell viability. The ruggedness of the protocol was assessed by introducing small changes in the biofilm growth conditions, which simulate minor protocol adaptations and non-rigorous protocol documentation. Results show that even minor variations in the biofilm growth conditions may affect the results considerably, and that the biofilm analysis assays lack repeatability. Intra-laboratory validation of non-standard protocols is found critical to ensure data quality and enable the comparison of results within and among laboratories.
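The statistics behind such a ruggedness assessment can be sketched compactly. The snippet below is not the paper's analysis code; the plate layout and OD570 values are invented, and it only illustrates how repeatability (within-variant scatter) and ruggedness (between-variant shifts) might be quantified.

```python
# Hedged illustration: coefficient of variation (CV%) of crystal-violet
# biomass readings across replicates, for several minor protocol variants.
import numpy as np

# Rows: three hypothetical variants of the growth conditions;
# columns: replicate OD570 readings. All values are placeholders.
od570 = np.array([
    [1.02, 0.98, 1.10, 1.05],
    [0.81, 0.95, 1.20, 0.77],
    [1.40, 1.32, 1.55, 1.48],
])

cv_percent = od570.std(axis=1, ddof=1) / od570.mean(axis=1) * 100
for i, cv in enumerate(cv_percent, start=1):
    print(f"variant {i}: mean={od570[i-1].mean():.2f}, CV={cv:.1f}%")
# High CV% within a variant signals poor repeatability; large shifts in the
# mean between variants signal poor ruggedness of the protocol.
```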
Abstract:
Large-scale distributed data stores rely on optimistic replication to scale and remain highly available in the face of network partitions. Managing data without coordination results in eventually consistent data stores that allow for concurrent data updates. These systems often use anti-entropy mechanisms (like Merkle trees) to detect and repair divergent data versions across nodes. However, in practice hash-based data structures are too expensive for large amounts of data and create too many false conflicts. Another aspect of eventual consistency is detecting write conflicts. Logical clocks are often used to track data causality, which is necessary to detect causally concurrent writes on the same key. However, there is a non-negligible metadata overhead per key, which also keeps growing with time, proportional to the node churn rate. Another challenge is deleting keys while respecting causality: while the values can be deleted, per-key metadata cannot be permanently removed without coordination. We introduce a new causality management framework for eventually consistent data stores that leverages node logical clocks (Bitmapped Version Vectors) and a new key logical clock (Dotted Causal Container) to provide advantages on multiple fronts: 1) a new efficient and lightweight anti-entropy mechanism; 2) greatly reduced per-key causality metadata size; 3) accurate key deletes without permanent metadata.
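For background, the per-key mechanism this work compresses can be sketched with plain version vectors: the textbook logical clock whose metadata growth motivates the paper. This is a minimal sketch with hypothetical names, not the Bitmapped Version Vector or Dotted Causal Container themselves.

```python
# Minimal sketch of per-key causality tracking with version vectors.
# A clock maps node id -> number of update events seen from that node.
def dominates(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(node, 0) >= n for node, n in b.items())

def compare(a: dict, b: dict) -> str:
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "a dominates b"
    if dominates(b, a):
        return "b dominates a"
    return "concurrent"   # neither saw the other: a write conflict

# Two replicas that updated the same key without coordination:
print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 2}))  # -> concurrent
```

Note how the clock grows with the number of nodes that ever wrote the key; that is the per-key overhead, proportional to churn, that the abstract's node-wide logical clocks aim to eliminate.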
Abstract:
The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short- and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that typically arise in the early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge on the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs-137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
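The core idea admits a compact sketch: an SVR whose kernel is a weighted sum of RBF kernels with different length scales, so the fit can capture a local anomaly on top of a regional trend. This is an illustrative reconstruction under assumptions, not the paper's implementation; the data, weights, and length scales are invented.

```python
# Hedged sketch of a multi-scale SVR: a convex mixture of a short-scale
# and a large-scale RBF kernel, passed to sklearn as a precomputed kernel.
import numpy as np
from sklearn.svm import SVR

def mixed_rbf(A, B, scales=(0.1, 2.0), weights=(0.5, 0.5)):
    """Weighted sum of RBF kernels with different length scales."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sum(w * np.exp(-d2 / (2 * s ** 2)) for s, w in zip(scales, weights))

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, (200, 2))                 # measurement coordinates
y = np.sin(X[:, 0]) + 0.3 * np.sin(8 * X[:, 1])  # two spatial scales mixed

model = SVR(kernel="precomputed", C=10.0)
model.fit(mixed_rbf(X, X), y)

X_new = rng.uniform(0, 10, (5, 2))
print(model.predict(mixed_rbf(X_new, X)))        # kernel against training set
```

In the paper the mixture is learned from data; here the weights are fixed by hand purely to show the mechanics.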
Abstract:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale for the purpose of improving predictions of groundwater flow and solute transport. However, extending corresponding approaches to the regional scale still represents one of the major challenges in the domain of hydrogeophysics. To address this problem, we have developed a regional-scale data integration methodology based on a two-step Bayesian sequential simulation approach. Our objective is to generate high-resolution stochastic realizations of the regional-scale hydraulic conductivity field in the common case where there exist spatially exhaustive but poorly resolved measurements of a related geophysical parameter, as well as highly resolved but spatially sparse collocated measurements of this geophysical parameter and the hydraulic conductivity. To integrate this multi-scale, multi-parameter database, we first link the low- and high-resolution geophysical data via a stochastic downscaling procedure. This is followed by relating the downscaled geophysical data to the high-resolution hydraulic conductivity distribution. After outlining the general methodology of the approach, we demonstrate its application to a realistic synthetic example where we consider as data high-resolution measurements of the hydraulic and electrical conductivities at a small number of borehole locations, as well as spatially exhaustive, low-resolution estimates of the electrical conductivity obtained from surface-based electrical resistivity tomography. The different stochastic realizations of the hydraulic conductivity field obtained using our procedure are validated by comparing their solute transport behaviour with that of the underlying "true" hydraulic conductivity field. We find that, even in the presence of strong subsurface heterogeneity, our proposed procedure allows for the generation of faithful representations of the regional-scale hydraulic conductivity structure and reliable predictions of solute transport over long, regional-scale distances.
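A drastically simplified sketch of the two-step linkage (not the paper's Bayesian sequential simulation): first add stochastic fine-scale variability to the coarse geophysical image, then draw hydraulic conductivity from a borehole-calibrated relationship. Grid sizes, variances, and the log-linear link below are all assumptions made for illustration.

```python
# Hedged toy version of the two-step idea on a 1-D grid.
import numpy as np

rng = np.random.default_rng(0)

# Coarse, spatially exhaustive electrical conductivity (e.g. from ERT):
# 10 coarse cells, each covering 10 fine cells.
coarse_ec = rng.uniform(5, 50, 10)
fine_ec = np.repeat(coarse_ec, 10)

# Step 1: stochastic downscaling -- zero-mean fine-scale fluctuations, so
# each realization honours the coarse values only in expectation.
fine_ec_real = fine_ec + rng.normal(0, 2.0, fine_ec.size)

# Step 2: borehole-calibrated log-linear link EC -> hydraulic conductivity K,
# sampled with residual noise to propagate the remaining uncertainty.
a, b, sigma = 0.08, -6.0, 0.3            # hypothetical calibration constants
log_k = a * fine_ec_real + b + rng.normal(0, sigma, fine_ec_real.size)
k_realization = 10 ** log_k
print(k_realization[:5])                 # one stochastic realization of K
```

Repeating the two steps yields an ensemble of realizations whose spread reflects both the downscaling and the petrophysical uncertainty, which is the role the realizations play in the transport validation described above.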
Abstract:
BACKGROUND: The Complete Arabidopsis Transcript MicroArray (CATMA) initiative combines the efforts of laboratories in eight European countries to deliver gene-specific sequence tags (GSTs) for the Arabidopsis research community. The CATMA initiative offers the power and flexibility to regularly update the GST collection according to evolving knowledge about the gene repertoire. These GST amplicons can easily be reamplified and shared, subsets can be picked at will to print dedicated arrays, and the GSTs can be cloned and used for other functional studies. This ongoing initiative has already produced approximately 24,000 GSTs that have been made publicly available for spotted microarray printing and RNA interference. RESULTS: GSTs from the CATMA version 2 repertoire (CATMAv2, created in 2002) were mapped onto the gene models from two independent Arabidopsis nuclear genome annotation efforts, TIGR5 and PSB-EuGène, to consolidate a list of genes that were targeted by previously designed CATMA tags. A total of 9,027 gene models were not tagged by any amplified CATMAv2 GST, and 2,533 amplified GSTs were no longer predicted to tag an updated gene model. To validate the efficacy of GST mapping criteria and design rules, the predicted and experimentally observed hybridization characteristics associated with GST features were correlated in transcript profiling datasets obtained with the CATMAv2 microarray, confirming the reliability of this platform. To complete the CATMA repertoire, all 9,027 gene models for which no GST had yet been designed were processed with an adjusted version of the Specific Primer and Amplicon Design Software (SPADS). A total of 5,756 novel GSTs were designed and amplified by PCR from genomic DNA. Together with the pre-existing GST collection, this new addition constitutes the CATMAv3 repertoire. It comprises 30,343 unique amplified sequences that tag 24,202 and 23,009 protein-encoding nuclear gene models in the TAIR6 and EuGène genome annotations, respectively. To cover the remaining untagged genes, we identified 543 additional GSTs using less stringent design criteria and designed 990 sequence tags matching multiple members of gene families (Gene Family Tags or GFTs). These latter 1,533 features constitute the CATMAv4 addition. CONCLUSION: To update the CATMA GST repertoire, we designed 7,289 additional sequence tags, bringing the total number of tagged TAIR6-annotated Arabidopsis nuclear protein-coding genes to 26,173. This resource is used both for the production of spotted microarrays and the large-scale cloning of hairpin RNA silencing vectors. All information about the resulting updated CATMA repertoire is available through the CATMA database http://www.catma.org.
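The bookkeeping step at the heart of the update, finding gene models left untagged by the current GST collection so new tags can be designed, is a simple set difference. The identifiers below are invented for illustration; this is not CATMA's pipeline code.

```python
# Toy sketch: which gene models are not covered by any GST amplicon?
gene_models = {"AT1G01010", "AT1G01020", "AT1G01030", "AT1G01040"}
gst_targets = {
    "CATMA1a00010": {"AT1G01010"},
    "CATMA1a00020": {"AT1G01030", "AT1G01040"},  # a Gene Family Tag may hit several
}

tagged = set().union(*gst_targets.values())
untagged = sorted(gene_models - tagged)
print("untagged gene models:", untagged)   # candidates for new GST design
```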
Abstract:
Globalization involves several facility location problems that need to be handled at large scale. Location Allocation (LA) is a combinatorial problem in which the distance among points in the data space matters. Specifically, taking advantage of the distance properties of the domain, we exploit the capability of clustering techniques to partition the data space in order to convert an initial large LA problem into several simpler LA problems. In particular, our motivating problem involves a huge geographical area that can be partitioned under general conditions. We present different types of clustering techniques and then perform a cluster analysis over our dataset in order to partition it. After that, we solve the LA problem by applying a simulated annealing algorithm to the clustered and non-clustered data, in order to determine how profitable the clustering is and which of the presented methods is the most suitable.
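A hedged sketch of the divide-and-conquer idea: partition the demand points with k-means, then solve a small location problem inside each cluster. For brevity, a brute-force 1-median per cluster stands in for the paper's simulated annealing; the data and cluster count are placeholders.

```python
# Cluster-then-locate sketch for a Location Allocation problem.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
points = rng.uniform(0, 100, (500, 2))            # demand point coordinates

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(points)

for c in range(5):
    cluster = points[labels == c]
    # 1-median on the cluster: pick the member minimizing total distance
    # to all other members (simulated annealing would search this space
    # instead of enumerating it).
    dists = np.linalg.norm(cluster[:, None] - cluster[None, :], axis=-1)
    best = dists.sum(axis=1).argmin()
    print(f"cluster {c}: facility at {cluster[best].round(1)}, "
          f"total distance {dists[best].sum():.0f}")
```

The payoff is complexity: solving five problems of 100 points each explores a far smaller search space than one problem of 500 points, which is precisely the profitability question the paper evaluates.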
Abstract:
This thesis proposes a set of adaptive broadcast solutions and an adaptive data replication solution to support the deployment of P2P applications. P2P applications are an emerging type of distributed application running on top of P2P networks; typical examples are video streaming and file sharing. While interesting because they are fully distributed, P2P applications suffer from several deployment problems due to the nature of the environment in which they run. Indeed, defining an application on top of a P2P network often means defining an application where peers contribute resources in exchange for their ability to use the P2P application. For example, in a P2P file sharing application, while the user is downloading some file, the P2P application is in parallel serving that file to other users. Such peers could have limited hardware resources, e.g., CPU, bandwidth, and memory, or the end-user could decide a priori to limit the resources dedicated to the P2P application. In addition, a P2P network is typically immersed in an unreliable environment, where communication links and processes are subject to message losses and crashes, respectively. To support P2P applications, this thesis proposes a set of services that address some underlying constraints related to the nature of P2P networks. The proposed services include a set of adaptive broadcast solutions and an adaptive data replication solution that can be used as the basis of several P2P applications. Our data replication solution increases availability and reduces the communication overhead. The broadcast solutions aim at providing a communication substrate encapsulating one of the key communication paradigms used by P2P applications: broadcast. Our broadcast solutions typically aim at offering reliability and scalability to some upper layer, be it an end-to-end P2P application or another system-level layer, such as a data replication layer. Our contributions are organized in a protocol stack made of three layers. In each layer, we propose a set of adaptive protocols that address specific constraints imposed by the environment. Each protocol is evaluated through a set of simulations. The adaptiveness of our solutions relies on the fact that they take the constraints of the underlying system into account in a proactive manner. To model these constraints, we define an environment approximation algorithm that provides an approximate view of the system or of part of it. This approximate view includes the topology and the reliability of the components, expressed in probabilistic terms. To adapt to the underlying system constraints, the proposed broadcast solutions route messages through tree overlays so as to maximize broadcast reliability. Here, broadcast reliability is expressed as a function of the reliability of the selected paths and of the use of available resources. These resources are modeled as message quotas reflecting the receiving and sending capacities of each node. To allow deployment in a large-scale system, we take into account the memory available at processes by limiting the view they have to maintain of the system. Using this partial view, we propose three scalable broadcast algorithms, based on a propagation overlay that tends toward the global tree overlay and adapts to constraints of the underlying system.
At a higher level, this thesis also proposes a data replication solution that is adaptive both in terms of replica placement and in terms of request routing. At the routing level, this solution takes the unreliability of the environment into account in order to maximize the reliable delivery of requests. At the replica placement level, the dynamically changing origin and frequency of read/write requests are analyzed in order to define a set of replicas that minimizes the communication cost.
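One ingredient of reliability-maximizing routing admits a short illustration: since a path's reliability is the product of its link reliabilities, maximizing it is equivalent to a shortest-path search on -log(reliability) weights. The sketch below is not the thesis' protocol; the overlay topology and link reliabilities are invented.

```python
# Hedged sketch: most-reliable routes over an unreliable overlay, via
# Dijkstra on -log(reliability) edge weights.
import heapq
from math import log, exp

# overlay: node -> {neighbour: link reliability in (0, 1]}
overlay = {
    "a": {"b": 0.99, "c": 0.80},
    "b": {"d": 0.90},
    "c": {"d": 0.95},
    "d": {},
}

def most_reliable_paths(src):
    cost = {src: 0.0}                   # -log of the best reliability so far
    heap = [(0.0, src)]
    while heap:
        c, u = heapq.heappop(heap)
        if c > cost.get(u, float("inf")):
            continue                    # stale queue entry
        for v, r in overlay[u].items():
            nc = c - log(r)
            if nc < cost.get(v, float("inf")):
                cost[v] = nc
                heapq.heappush(heap, (nc, v))
    return {node: exp(-c) for node, c in cost.items()}

print(most_reliable_paths("a"))  # route a->b->d: 0.99*0.90 = 0.891 beats 0.80*0.95
```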
Abstract:
Many terrestrial and marine systems are experiencing accelerating decline due to the effects of global change. This situation has raised concern about the consequences of biodiversity losses for ecosystem function, ecosystem service provision, and human well-being. Coastal marine habitats are a main focus of attention because they harbour a high biological diversity, are among the most productive systems of the world, and present high levels of anthropogenic interaction. The accelerating degradation of many terrestrial and marine systems highlights the urgent need to evaluate the consequences of biodiversity loss. Because marine biodiversity is a dynamic entity and this study was interested in global change impacts, it focused on benthic biodiversity trends over large spatial and long temporal scales. The main aim of this project was to investigate the current extent of biodiversity of the highly diverse benthic coralligenous community in the Mediterranean Sea, detect its changes, and predict its future changes over broad spatial and long temporal scales. These marine communities are characterized by structural species with low growth rates and long life spans; therefore they are considered particularly sensitive to disturbances. For this purpose, this project analyzed permanent photographic plots over time at four locations in the NW Mediterranean Sea. The spatial scale of this study provided information on the level of species similarity between these locations, thus offering a solid background on the amount of large-scale variability in coralligenous communities, whereas the temporal scale was fundamental to determine the natural variability, in order to discriminate between changes due to natural factors and those related to the impact of disturbances (e.g., mass mortality events related to positive temperature anomalies, extreme catastrophic events). This study directly addressed the challenging task of analyzing quantitative biodiversity data of these highly diverse marine benthic communities. Overall, the scientific knowledge gained with this research project will improve our understanding of the functioning of marine ecosystems and their trajectories under global change.
Abstract:
NovoTTF-100A (TTF) is a portable device delivering low-intensity, intermediate-frequency, alternating electric fields using noninvasive, disposable scalp electrodes. TTF interferes with tumor cell division, and it has been approved by the US Food and Drug Administration (FDA) for the treatment of recurrent glioblastoma (rGBM) based on data from a phase III trial. This presentation describes the updated survival data 2 years after completing recruitment. Adults with rGBM (KPS ≥ 70) were randomized (stratified by surgery and center) to either continuous TTF (20-24 h/day, 7 days/week) or efficacious chemotherapy based on best physician choice (BPC). The primary endpoint was overall survival (OS), and secondary endpoints were PFS6, 1-year survival, and QOL. Patients were randomized (28 US and European centers) to either TTF alone (n = 120) or BPC (n = 117). Patient characteristics were balanced, median age was 54 years (range, 23-80 years), and median KPS was 80 (range, 50-100). One quarter of the patients had debulking surgery, and over half of the patients were at their second or later recurrence. OS in the intent-to-treat (ITT) population was equivalent in TTF versus BPC patients (median OS, 6.6 vs. 6.0 months; n = 237; p = 0.26; HR = 0.86). With a median follow-up of 33.6 months, long-term survival in the TTF group was higher than that in the BPC group at 2, 3, and 4 years of follow-up (9.3% vs. 6.6%; 8.4% vs. 1.4%; 8.4% vs. 0.0%, respectively). Analysis of patients who received at least one treatment course demonstrated a survival benefit for TTF patients compared to BPC patients (median OS, 7.8 vs. 6.0 months; n = 93 vs. n = 117; p = 0.012; HR = 0.69). In this group, 1-year survival was 28% vs. 20%, and PFS6 was 26.2% vs. 15.2% (p = 0.034). TTF, a noninvasive, novel cancer treatment modality, shows significant therapeutic efficacy with promising long-term survival results. The impact of TTF was more pronounced when comparing only patients who received the minimal treatment course. A large-scale phase III trial in newly diagnosed GBM is ongoing.
Abstract:
The Anarak, Jandaq and Posht-e-Badam metamorphic complexes occupy the NW part of the Central-East Iranian Microcontinent and are juxtaposed with the Great Kavir block and Sanandaj-Sirjan zone. Our recent findings redefine the origin of these complexes, so far attributed to Precambrian-Early Paleozoic orogenic episodes, and now directly related to the tectonic evolution of the Paleo-Tethys Ocean. This tectonic evolution was initiated by Late Ordovician-Early Devonian rifting events and terminated in the Triassic by the Eocimmerian collision event due to the docking of the Cimmerian blocks with the Asiatic Turan block. The "Variscan accretionary complex" is the new name we propose for the most widely distributed metamorphic rocks connected to the Anarak and Jandaq complexes. This accretionary complex, exposed from SW of Jandaq to the Anarak and Kabudan areas, is a thick, fine-grained siliciclastic sequence accompanied by marginal-sea ophiolitic remnants, including gabbro-basalts with a supra-subduction geochemical signature. New Ar-40/Ar-39 ages of 333-320 Ma were obtained for the metamorphism of this sequence under greenschist to amphibolite facies. Moreover, the limy intercalations in the volcano-sedimentary part of this complex in Godar-e-Siah yielded Upper Devonian-Tournaisian conodonts. The northeastern part of this complex in the Jandaq area was intruded by a 215 +/- 15 Ma arc to collisional granite and pegmatites dated by ID-TIMS, and its metamorphic rocks are characterized by Ar-40/Ar-39 radiometric ages of 163-156 Ma. The "Variscan" accretionary complex was accreted northward to the Airekan granitic terrane, dated at 549 +/- 15 Ma. Later, from the Late Carboniferous to the Triassic, huge amounts of oceanic material were accreted to its southern side and penetrated by several seamounts, such as the Anarak and Kabudan. This new period of accretion is supported by 280-230 Ma Ar-40/Ar-39 ages for the Anarak mild high-pressure metamorphic rocks and a 262 Ma U-Pb age for the trondhjemite-rhyolite association of that area. The Triassic Bayazeh flysch filled the foreland basin during the final closure of the Paleo-Tethys Ocean and was partly deposited and/or thrusted onto the Cimmerian Yazd block. The Paleo-Tethys magmatic arc products have been well preserved in the Late Devonian-Carboniferous Godar-e-Siah intra-arc deposits and the Triassic Nakhlak fore-arc succession. On the passive margin of the Cimmerian block, in the Yazd region, the nearly continuous Upper Paleozoic platform-type deposition was totally interrupted during the Middle to Late Triassic. Local erosion, down to Lower Paleozoic levels, may be related to flexural bulge erosion. The platform was finally unconformably covered by Liassic continental molassic deposits of the Shemshak. One of the extensional periods related to Neo-Tethyan back-arc rifting in Late Cretaceous time finally separated parts of the Eocimmerian collisional domain from the Eurasian Turan domain. The opening and closing of this new ocean, characterized by the Nain and Sabzevar ophiolitic melanges, finally transported the Anarak-Jandaq composite terrane to Central Iran, accompanied by large-scale rotation of the Central-East Iranian Microcontinent (CEIM). Due to the many similarities between the Posht-e-Badam metamorphic complex and the Anarak-Jandaq composite terrane, the former could be part of the latter, if it was transported further south during Tertiary time.