861 results for multiple data sources
Abstract:
Introduction: Brazil is one of the main agricultural producers in the world, ranking 1st in the production of sugarcane, coffee and oranges. It is also the 2nd largest producer of soybeans and a leader in the harvested yields of many other crops. The annual consumption of mineral fertilizers exceeds 20 million mt, 30% of which corresponds to potash fertilizers (ANDA, 2006). From this statistic it may be supposed that fertilizer application in Brazil is rather high compared with many other countries. However, even if it is assumed that only one fourth of this enormous 8.5 million km2 territory is used for agriculture, average levels of fertilizer application per hectare of arable land are not high enough for sustainable production. One of the major constraints is the relatively low natural fertility status of the soils, which contain excessive Fe and Al oxides. Agriculture is also often practised on sandy soils, so that heavy rainfall causes large losses of nutrients through leaching. In general, nutrient removal by crops such as sugarcane and tropical fruits is much greater than the average nutrient application via fertilization, especially in regions with a long history of agricultural production. In the recently developed areas, especially in the Cerrado (Brazilian savanna) where agriculture has expanded since 1980, soils are even poorer than in the "old" agricultural regions, and the high cost of mineral fertilizers has become a significant input factor in determining soybean, maize and cotton planting. The consumption of mineral fertilizers throughout Brazil is very uneven. According to the 1995/96 Agricultural Census, in only eight of the 26 Brazilian states were 50 per cent or more of the farms treated "systematically" with mineral fertilizers; in many states it was less than 25 per cent, and in five states even less than 12 per cent (Brazilian Institute for Geography and Statistics, Censo Agropecuário 1995/96, Instituto Brasileiro de Geografia e Estatística; IBGE, www.ibge.gov.br). The geographical distribution pattern of mineral fertilizer application may therefore be considered an important field of research. Understanding geographical disparities in fertilization level requires a complex approach. This includes evaluation of the availability of nutrients in the soil (and related soil properties, e.g. CEC and texture), the input of nutrients with fertilizer application, and the removal of nutrients by harvested yields. When all these data are compiled, it is possible to evaluate the balance of particular nutrients for certain areas and draw conclusions as to where agricultural practices should be optimized. This kind of research is somewhat complicated because it relies on completely different, often incomparable, sources of data, e.g. soil characteristics attributed to soil type areas, in contrast to yields reported by administrative regions or farms. A priority tool in this case is the Geographical Information System (GIS), which enables attribution of data from different fields to the same territorial units and makes possible the integration of these data in an "input-output" model, where the "input" is the natural availability of a nutrient in the soil plus fertilization, and the "output" is the export of the same nutrient with the removed harvested yield.
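As a minimal illustration of the "input-output" balance described above, the sketch below computes a potash (K) balance for two territorial units; all unit names and per-hectare figures are hypothetical, and a real GIS workflow would first join the soil, fertilizer and yield layers onto the same spatial units.

# Illustrative nutrient-balance sketch for territorial units merged in a GIS.
# All values and unit names are hypothetical; a real workflow would join soil-map
# polygons, fertilizer statistics and yield data onto the same spatial units first.

units = [
    # unit, soil-available K (kg/ha), K applied as fertilizer (kg/ha), K removed by harvest (kg/ha)
    {"unit": "Region A", "soil_k": 40.0, "fert_k": 60.0, "removal_k": 120.0},
    {"unit": "Region B", "soil_k": 90.0, "fert_k": 30.0, "removal_k": 70.0},
]

for u in units:
    # "input" = natural availability plus fertilization; "output" = export with the harvest
    balance = (u["soil_k"] + u["fert_k"]) - u["removal_k"]
    status = "deficit" if balance < 0 else "surplus"
    print(f'{u["unit"]}: K balance = {balance:+.1f} kg/ha ({status})')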
Abstract:
This dissertation consists of three standalone articles that contribute to the economics literature on technology adoption, information diffusion and network economics, using primary data sources from Ethiopia. The first empirical paper identifies the main behavioral factors affecting the adoption of brand-new (radical) and upgraded (incremental) bioenergy innovations in Ethiopia. The results highlight the importance of targeting different instruments to increase the adoption rate of the two types of innovations. The second and third empirical papers use primary data collected from 3,693 high school students in Ethiopia and shed light on how informants should be selected to effectively and equitably disseminate new information, mainly concerning environmental issues. Several well-recognized standard centrality measures are used to select informants. These standard centrality measures, however, are based purely on the network topology (shaped only by the number of connections) and fail to incorporate the intrinsic motivations of the informants. This thesis introduces an augmented centrality measure (ACM) by modifying the eigenvector centrality measure, weighting the adjacency matrix with the altruism levels of connected nodes. The results from the two papers suggest that targeting informants based on both network position and behavioral attributes ensures more effective and more equitable (from a gender perspective) transmission of information in social networks than selecting informants on network centrality measures alone, notably when the information concerns environmental issues.
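The abstract does not spell out how the adjacency matrix is weighted, so the sketch below assumes one simple possibility: each link to a node is scaled by that node's altruism score before the leading eigenvector is extracted. The network and altruism values are invented for illustration.

import numpy as np

# Sketch of an altruism-weighted eigenvector centrality. The exact weighting used for
# the ACM is not given in the abstract; scaling column j by altruism[j] is an assumption.

def augmented_centrality(adj: np.ndarray, altruism: np.ndarray) -> np.ndarray:
    weighted = adj * altruism[np.newaxis, :]          # edge i -> j scaled by altruism of j
    eigvals, eigvecs = np.linalg.eig(weighted)
    leading = np.abs(eigvecs[:, np.argmax(eigvals.real)])
    return leading / leading.sum()                    # normalise to sum to 1

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
altruism = np.array([0.9, 0.2, 0.5, 0.8])             # hypothetical survey-based scores
print(augmented_centrality(adj, altruism))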
Abstract:
Social interactions have been the focus of social science research for a century, but their study has recently been revolutionized by novel data sources and by methods from computer science, network science, and complex systems science. The study of social interactions is crucial for understanding complex societal behaviours. Social interactions are naturally represented as networks, which have emerged as a unifying mathematical language to understand structural and dynamical aspects of socio-technical systems. Networks are, however, high-dimensional objects, especially when considering the scales of real-world systems and the need to model the temporal dimension. Hence the study of empirical data from social systems is challenging both from a conceptual and a computational standpoint. A possible approach to tackling such a challenge is to use dimensionality reduction techniques that represent network entities in a low-dimensional feature space, preserving some desired properties of the original data. Low-dimensional vector space representations, also known as network embeddings, have been extensively studied, in particular as a way to feed network data to machine learning algorithms. Network embeddings were initially developed for static networks and then extended to incorporate temporal network data. We focus on dimensionality reduction techniques for time-resolved social interaction data modelled as temporal networks. We introduce a novel embedding technique that models the temporal and structural similarities of events rather than nodes. Using empirical data on social interactions, we show that this representation captures information relevant for the study of dynamical processes unfolding over the network, such as epidemic spreading. We then turn to another large-scale dataset on social interactions: a popular Web-based crowdfunding platform. We show that tensor-based representations of the data and dimensionality reduction techniques such as tensor factorization allow us to uncover the structural and temporal aspects of the system and to relate them to geographic and temporal activity patterns.
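The thesis's actual event-embedding method is not detailed in this abstract; the sketch below only illustrates the general idea of embedding timestamped events rather than nodes, using an assumed similarity that combines node overlap with a temporal decay, followed by a spectral projection to two dimensions.

import numpy as np

# Illustrative sketch of embedding *events* (timestamped contacts) rather than nodes.
# The similarity used here (node overlap discounted by temporal distance) is an
# assumption made for illustration, not the thesis's actual definition.

events = [("a", "b", 0), ("b", "c", 1), ("a", "b", 5), ("c", "d", 6)]  # (u, v, t)
tau = 3.0                                        # temporal decay scale (hypothetical)

n = len(events)
S = np.zeros((n, n))
for i, (u1, v1, t1) in enumerate(events):
    for j, (u2, v2, t2) in enumerate(events):
        overlap = len({u1, v1} & {u2, v2}) / 2.0          # structural similarity in [0, 1]
        S[i, j] = overlap * np.exp(-abs(t1 - t2) / tau)   # damped by temporal distance

# Low-dimensional coordinates from the leading eigenvectors of the similarity matrix.
vals, vecs = np.linalg.eigh(S)
embedding = vecs[:, -2:] * np.sqrt(np.maximum(vals[-2:], 0.0))
print(embedding)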
Abstract:
In medicine, innovation depends on a better knowledge of the mechanisms of the human body, which is a complex system of multi-scale constituents. Unraveling the complexity underlying diseases proves to be challenging. A deep understanding of the inner workings requires dealing with a great deal of heterogeneous information. Exploring the molecular status and the organization of genes, proteins and metabolites provides insights into what drives a disease, from aggressiveness to curability. Molecular constituents, however, are only the building blocks of the human body and cannot currently tell the whole story of diseases. This is why attention is now growing towards the joint exploitation of multi-scale information. Holistic methods are therefore drawing interest to address the problem of integrating heterogeneous data. The heterogeneity may derive from the diversity across data types and from the diversity within diseases. Here, four studies conducted data integration using custom-designed workflows that implement novel methods and views to tackle the heterogeneous characterization of diseases. The first study was devoted to determining shared gene regulatory signatures for onco-hematology and showed partial co-regulation across blood-related diseases. The second study focused on Acute Myeloid Leukemia and refined the unsupervised integration of genomic alterations, which turned out to better resemble clinical practice. In the third study, network integration for atherosclerosis demonstrated, as a proof of concept, the impact of network intelligibility when modelling heterogeneous data, which was shown to accelerate the identification of new potential pharmaceutical targets. Lastly, the fourth study introduced a new method to integrate multiple data types into a unique latent heterogeneous representation that facilitated the selection of the data types most important for predicting the tumour stage of invasive ductal carcinoma. The results of these four studies laid the groundwork to ease the detection of new biomarkers, ultimately benefiting medical practice and the ever-growing field of Personalized Medicine.
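As a deliberately simple stand-in for the latent integration idea in the fourth study, the sketch below standardizes two hypothetical data types, concatenates them patient-wise, and projects the result into a shared low-dimensional space with PCA; the study's actual method is more elaborate and is not reproduced here.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Simple stand-in for multi-modal integration: standardise each data type, concatenate
# patient-wise, and project into a shared latent space. Data below are synthetic.

rng = np.random.default_rng(0)
n_patients = 50
expression = rng.normal(size=(n_patients, 200))    # hypothetical gene-expression matrix
methylation = rng.normal(size=(n_patients, 120))   # hypothetical methylation matrix

blocks = [StandardScaler().fit_transform(x) for x in (expression, methylation)]
joint = np.hstack(blocks)

latent = PCA(n_components=5).fit_transform(joint)  # shared latent representation
print(latent.shape)  # (50, 5)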
Abstract:
The pervasive availability of connected devices in any industrial and societal sector is pushing for an evolution of the well-established cloud computing model. The emerging paradigm of the cloud continuum embraces this decentralization trend and envisions virtualized computing resources physically located between traditional datacenters and data sources. By totally or partially executing closer to the network edge, applications can react more quickly to events, thus enabling advanced forms of automation and intelligence. However, these applications also induce new data-intensive workloads with low-latency constraints that require the adoption of specialized resources, such as high-performance communication options (e.g., RDMA, DPDK, or XDP). Unfortunately, cloud providers still struggle to integrate these options into their infrastructures. That risks undermining the principle of generality that underlies the economy of scale of cloud computing, by forcing developers to tailor their code to low-level APIs, non-standard programming models, and static execution environments. This thesis proposes a novel system architecture to empower cloud platforms across the whole cloud continuum with Network Acceleration as a Service (NAaaS). To provide commodity yet efficient access to acceleration, this architecture defines a layer of agnostic high-performance I/O APIs, exposed to applications and clearly separated from the heterogeneous protocols, interfaces, and hardware devices that implement it. A novel system component embodies this decoupling by offering a set of agnostic OS features to applications: memory management for zero-copy transfers, asynchronous I/O processing, and efficient packet scheduling. This thesis also explores the design space of the possible implementations of this architecture by proposing two reference middleware systems and by adopting them to support interactive use cases in the cloud continuum: a serverless platform and an Industry 4.0 scenario. A detailed discussion and a thorough performance evaluation demonstrate that the proposed architecture is suitable to enable the easy-to-use, flexible integration of modern network acceleration into next-generation cloud platforms.
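The NAaaS APIs proposed in the thesis are not reproduced here; the hypothetical Python interface below only conveys the decoupling principle, with applications coded against an agnostic channel abstraction and concrete transports (a plain socket here; RDMA, DPDK, or XDP drivers in a real platform) hidden behind it.

import abc
import socket

# Hypothetical interface sketch conveying the decoupling described above: applications
# program against an agnostic I/O API, while concrete backends implement it.
# This is an illustration, not the thesis's actual API.

class AcceleratedChannel(abc.ABC):
    @abc.abstractmethod
    def send(self, payload: bytes) -> None: ...
    @abc.abstractmethod
    def recv(self, max_bytes: int) -> bytes: ...

class SocketChannel(AcceleratedChannel):
    """Fallback backend using ordinary kernel sockets."""
    def __init__(self, host: str, port: int):
        self._sock = socket.create_connection((host, port))
    def send(self, payload: bytes) -> None:
        self._sock.sendall(payload)
    def recv(self, max_bytes: int) -> bytes:
        return self._sock.recv(max_bytes)

def application(channel: AcceleratedChannel) -> None:
    # The application never sees which transport runs underneath.
    channel.send(b"event")
    print(channel.recv(1024))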
Abstract:
The scientific success of the LHC experiments at CERN highly depends on the availability of computing resources which efficiently store, process, and analyse the data collected every year. This is ensured by the Worldwide LHC Computing Grid infrastructure, which connects computing centres distributed all over the world through a high-performance network. The LHC has an ambitious experimental program for the coming years, which includes large investments and improvements both in detector hardware and in software and computing systems, in order to deal with the huge increase in the event rate expected from the High Luminosity LHC (HL-LHC) phase and, consequently, with the huge amount of data that will be produced. In recent years the role of Artificial Intelligence has become increasingly relevant in the High Energy Physics (HEP) world. Machine Learning (ML) and Deep Learning algorithms have been successfully used in many areas of HEP, such as online and offline reconstruction programs, detector simulation, object reconstruction, identification, and Monte Carlo generation, and they will surely be crucial in the HL-LHC phase. This thesis aims at contributing to a CMS R&D project regarding an ML "as a Service" solution for HEP needs (MLaaS4HEP). It consists of a data service able to perform an entire ML pipeline (reading data, processing data, training ML models, and serving predictions) in a completely model-agnostic fashion, directly using ROOT files of arbitrary size from local or distributed data sources. This framework has been extended with new features in the data preprocessing phase, giving the user more flexibility. Since the MLaaS4HEP framework is experiment agnostic, the ATLAS Higgs Boson ML challenge was chosen as the physics use case, with the aim of testing MLaaS4HEP and the contributions made in this work.
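A compact illustration of the read-preprocess-train step such a pipeline performs is sketched below using the uproot and scikit-learn libraries; the file, tree, and branch names are hypothetical, and this is not the MLaaS4HEP interface itself, which additionally handles arbitrary-size files and pluggable ML back ends.

import uproot
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Compact read -> preprocess -> train step on ROOT data. File, tree and branch names
# are hypothetical placeholders for illustration only.

def load_features(path: str, tree: str, branches: list[str], label: str):
    with uproot.open(path) as f:
        arrays = f[tree].arrays(branches + [label], library="np")
    X = np.column_stack([arrays[b] for b in branches])
    y = arrays[label]
    return X, y

X, y = load_features("events.root", "Events", ["pt", "eta", "missing_et"], "signal")
model = GradientBoostingClassifier().fit(X, y)   # any scikit-learn model could be swapped in
print(model.score(X, y))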
Abstract:
A novel two-component system, CbrA-CbrB, was discovered in Pseudomonas aeruginosa; cbrA and cbrB mutants of strain PAO were found to be unable to use several amino acids (such as arginine, histidine and proline), polyamines and agmatine as sole carbon and nitrogen sources. These mutants were also unable to use, or used poorly, many other carbon sources, including mannitol, glucose, pyruvate and citrate. A 7 kb EcoRI fragment carrying the cbrA and cbrB genes was cloned and sequenced. The cbrA and cbrB genes encode a sensor/histidine kinase (Mr 108 379, 983 residues) and a cognate response regulator (Mr 52 254, 478 residues) respectively. The amino-terminal half (490 residues) of CbrA appears to be a sensor membrane domain, as predicted by 12 possible transmembrane helices, whereas the carboxy-terminal part shares homology with the histidine kinases of the NtrB family. The CbrB response regulator shows similarity to the NtrC family members. Complementation and primer extension experiments indicated that cbrA and cbrB are transcribed from separate promoters. In cbrA or cbrB mutants, as well as in the allelic argR9901 and argR9902 mutants, the aot-argR operon was not induced by arginine, indicating an essential role for this two-component system in the expression of the ArgR-dependent catabolic pathways, including the aruCFGDB operon specifying the major aerobic arginine catabolic pathway. The histidine catabolic enzyme histidase was not expressed in cbrAB mutants, even in the presence of histidine. In contrast, proline dehydrogenase, responsible for proline utilization (Pru), was expressed in a cbrB mutant at a level comparable with that of the wild-type strain. When succinate or other C4-dicarboxylates were added to proline medium at 1 mM, the cbrB mutant was restored to a Pru+ phenotype. Such a succinate-dependent Pru+ property was almost abolished by 20 mM ammonia. In conclusion, the CbrA-CbrB system controls the expression of several catabolic pathways and, perhaps together with the NtrB-NtrC system, appears to ensure the intracellular carbon: nitrogen balance in P. aeruginosa.
Abstract:
The first part of my work consisted of sampling conducted in nine different localities of the Salento peninsula and Apulia (Italy): Costa Merlata (BR), Punta Penne (BR), Santa Cesarea Terme (LE), Santa Caterina (LE), Torre Inserraglio (LE), Torre Guaceto (BR), Porto Cesareo (LE), Otranto (LE), and Isole Tremiti (FG). I collected percentage cover data for species of the infralittoral rocky zone, using 50 x 50 cm quadrats. We considered three sites per locality and 10 randomly placed replicates per site. I then added data collected at the same places over several years and combined them for a spatial analysis. I therefore started from a data set of 1,896 samples, but decided not to consider time as a factor, because I have reason to think that over this period anthropogenic stressors and their effects (if present) did not change considerably. The response variable I analysed is the percentage cover of 243 species (subsequently merged into 32 functional groups), including seaweeds, invertebrates, sediment and rock. After the sampling, I spent two months at the Hopkins Marine Station of Stanford University in Monterey (California, USA), in Fiorenza Micheli's laboratory, carrying out statistical analyses on my data set using the software PRIMER 6. My exploratory analysis started with an nMDS in PRIMER 6, considering the original data matrix without, for the moment, the effect of stressors. The result shows a good separation between localities and confirms the outcome of the ANOSIM analysis conducted on the original data matrix. The separation is not driven by a geographic pattern; something else must lead the differences. The presence of at least three groups is clear: one composed of Porto Cesareo, Torre Guaceto and Isole Tremiti (the only marine protected areas considered in this work); another composed of Otranto; and the last composed of the remaining small, impacted localities. Within the localities that include MPAs (Marine Protected Areas), it is also possible to observe a grouping between protected and control areas. The SIMPER analysis shows that most of the species driving the differences between populations are not rare species, for example Cystoseira spp., Mytilus sp. and ECR. Moreover, I assigned discrete values (0, 1, 2) for each stressor to all the sites considered, according to the intensity with which the anthropogenic factor affects each locality. I then tried to establish whether there were significant interactions between stressors: using the Spearman rank correlation and the Spearman tables of significance, with 17 degrees of freedom, some significant stressor interactions emerged. I then built an nMDS considering the stressors as the response variable. The result was positive: localities are well separated by stressors. Consequently, I related the 'localities and species' matrix to the 'localities and stressors' one. The stressor combination explains, with a good significance level, the variability within my populations. I tried all the possible data transformations (none, square root, fourth root, log(X+1), P/A), and the fourth root proved to be the best one, with the highest level of significance, meaning that rare species can also influence the result.
The challenge will be to better characterize which kinds of stressors (including natural ones) act on the ecosystem, to assign them more accurate quantitative values, and to understand how they interact (in an additive or non-additive way).
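The analyses above were performed in PRIMER 6; the sketch below reproduces the same kind of workflow (fourth-root transformation, Bray-Curtis dissimilarities, non-metric MDS) in Python on a made-up cover matrix, purely to illustrate the steps.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Equivalent of the PRIMER workflow described above, on a made-up cover matrix:
# fourth-root transform, Bray-Curtis dissimilarity, then non-metric MDS (nMDS).

rng = np.random.default_rng(1)
cover = rng.uniform(0, 40, size=(30, 32))     # 30 samples x 32 functional groups (synthetic)

transformed = cover ** 0.25                   # fourth root down-weights dominant taxa
dissim = squareform(pdist(transformed, metric="braycurtis"))

nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = nmds.fit_transform(dissim)
print(coords.shape, round(nmds.stress_, 3))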
Abstract:
It is contended that the mineral dust found in Greenlandic ice cores during the Holocene stems from multiple source areas. Particles entrained above a more productive, primary source dominate the signal's multi-seasonal average. Data at sub-annual resolution, however, reveal at least one further source. Whereas distinct inputs from the primary source are visible as elevated concentration levels, the various inputs of the secondary source(s) are reflected in multiple maxima of the coarse particle percentage. As long as the dust sources' respective seasonal cycles are preserved, the primary and secondary sources can be distinguished. Since the detected ejecta of the two sources differ in size, which can be attributed to a difference in atmospheric residence times, it is suggested that the secondary source is located in closer proximity to the drilling site than the primary one.
Abstract:
This research tests the role of perceived support from multinational corporations and host-country nationals for the adjustment of expatriates and their spouses while on international assignments. The investigation is carried out with matched data from 134 expatriates and their spouses based in foreign multinationals in Malaysia. The results highlight the different reliance on support providers that expatriates and their accompanying spouses found beneficial for acclimatizing to the host-country environment. Improved adjustment in turn was found to have positive effects on expatriates' performance. The research findings have implications for both international human resource management researchers and practitioners. © 2014 Taylor & Francis.
Abstract:
Surveys can collect important data that inform policy decisions and drive social science research. Large government surveys collect information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges. In particular, one needs to account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could spend lots of resources getting high-quality responses from a simple random sample, resulting in survey data that are easy to analyze. However, this scenario often is not realistic. To address these practical issues, survey organizations can leverage the information available from other sources of data. For example, in longitudinal studies that suffer from attrition, they can use the information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or survey design to improve inferences. They can use information from gold standard sources to correct for measurement error.
This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.
The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.
The second method incorporates informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data. We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
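A small illustration of the augmentation idea, with hypothetical column names and prior values: synthetic records follow the prior margin for one variable while every other variable is left missing, and the count of augmented records sets the strength of the prior.

import numpy as np
import pandas as pd

# Sketch of the augmentation idea: append synthetic records whose margin of interest
# matches the prior proportions while every other variable is left missing. The number
# of augmented records controls how strongly the prior pulls the posterior.
# Column names and prior values are hypothetical.

original = pd.DataFrame({
    "education": ["HS", "BA", "BA", "PhD"],
    "employed":  ["yes", "yes", "no", "yes"],
})

prior_margin = {"HS": 0.5, "BA": 0.4, "PhD": 0.1}   # prior beliefs about P(education)
n_augment = 200                                      # more records -> tighter prior

augmented = pd.DataFrame({
    "education": np.random.default_rng(0).choice(
        list(prior_margin), size=n_augment, p=list(prior_margin.values())),
    "employed": pd.NA,                               # remaining variables left missing
})

combined = pd.concat([original, augmented], ignore_index=True)
print(combined["education"].value_counts(normalize=True))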
The third method leverages the information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand the question or accidentally select the wrong response. Sometimes survey respondents knowingly select the wrong response, for example, by reporting a higher level of education than they actually have attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive their conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey using the 2010 National Survey of College Graduates as the gold standard survey.
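A toy version of this idea is sketched below: an assumed error model (here, a 10% chance of reporting one education level above the true one) is applied to simulated responses to produce error-corrected imputations. The rates and categories are invented; in practice the error model would be estimated from the gold standard survey and varied in sensitivity analyses.

import numpy as np

# Toy illustration of correcting reporting error under an assumed error model.
# The 10% over-report rate below is invented for illustration only.

rng = np.random.default_rng(2)
levels = ["HS", "BA", "MA"]

reported = rng.choice(levels, size=1000, p=[0.4, 0.45, 0.15])

p_overreport = 0.10                                  # assumed share reporting one level too high
corrected = reported.copy()
overstates = rng.random(reported.size) < p_overreport
for i in np.where(overstates)[0]:
    idx = levels.index(reported[i])
    corrected[i] = levels[max(idx - 1, 0)]           # impute the level below the reported one

print({lvl: int((corrected == lvl).sum()) for lvl in levels})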
Abstract:
Giardia duodenalis is a flagellate protozoan that parasitizes humans and several other mammals. Protozoan contamination has been regularly documented at important environmental sites, although most of these studies were performed at the species level. There is a lack of studies that correlate environmental contamination and clinical infections in the same region. The aim of this study is to evaluate the genetic diversity of a set of clinical and environmental samples and to use the obtained data to characterize the genetic profile of the distribution of G. duodenalis and the potential for zoonotic transmission in a metropolitan region of Brazil. The genetic assemblages and subtypes of G. duodenalis isolates obtained from hospitals, a veterinary clinic, a day-care center and important environmental sites were determined via multilocus sequence-based genotyping using three unlinked gene loci. Cysts of Giardia were detected at all of the environmental sites. Mixed assemblages were detected in 25% of the total samples, and an elevated number of haplotypes was identified. The main haplotypes were shared among the groups, and new subtypes were identified at all loci. Ten multilocus genotypes were identified: 7 for assemblage A and 3 for assemblage B. There is persistent G. duodenalis contamination at important environmental sites in the city. The identified mixed assemblages likely represent mixed infections, suggesting high endemicity of Giardia in these hosts. Most Giardia isolates obtained in this study displayed zoonotic potential. The high degree of genetic diversity in the isolates obtained from both clinical and environmental samples suggests that multiple sources of infection are likely responsible for the detected contamination events. The finding that many multilocus genotypes (MLGs) and haplotypes are shared by different groups suggests that these sources of infection may be related and indicates that there is a notable risk of human infection caused by Giardia in this region.
Abstract:
This work proposes a method based on both preprocessing and data mining with the objective of identifying harmonic current sources in residential consumers. In addition, the methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory tests, i.e., real data were acquired from residential loads. The residential system created in the laboratory was fed by a configurable power source, and the loads and power quality analyzers were connected at its output (all measurements were stored on a microcomputer). The data were then subjected to preprocessing based on attribute selection techniques, in order to reduce the complexity of identifying the loads. A new database was generated keeping only the selected attributes, and Artificial Neural Networks were trained to perform the identification of the loads. In order to validate the proposed methodology, the loads were fed both under ideal conditions (without harmonics) and with harmonic voltages within pre-established limits. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (the power distribution procedures adopted by Brazilian utilities). The results obtained support the validity of the proposed methodology and provide a method that can serve as an alternative to conventional ones.
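An illustrative version of this pipeline, with synthetic measurements standing in for the laboratory data, selects the most informative attributes and then trains a small neural network to label the load type; the number of attributes and classes is assumed for the example.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative version of the pipeline described above: pick the most informative
# harmonic attributes, then train a neural network to label the load type.
# The synthetic matrix below stands in for the laboratory measurements.

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 40))        # e.g. harmonic magnitudes/angles per measurement
y = rng.integers(0, 2, size=300)      # 0 = linear load, 1 = nonlinear load (hypothetical)

model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),     # attribute selection reduces input complexity
    MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0),
)
model.fit(X, y)
print(model.score(X, y))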
Abstract:
The performance of three analytical methods for multiple-frequency bioelectrical impedance analysis (MFBIA) data was assessed. The methods were the established method of Cole and Cole, the newly proposed method of Siconolfi and co-workers and a modification of this procedure. Method performance was assessed from the adequacy of the curve fitting techniques, as judged by the correlation coefficient and standard error of the estimate, and the accuracy of the different methods in determining the theoretical values of impedance parameters describing a set of model electrical circuits. The experimental data were well fitted by all curve-fitting procedures (r = 0.9 with SEE 0.3 to 3.5% or better for most circuit-procedure combinations). Cole-Cole modelling provided the most accurate estimates of circuit impedance values, generally within 1-2% of the theoretical values, followed by the Siconolfi procedure using a sixth-order polynomial regression (1-6% variation). None of the methods, however, accurately estimated circuit parameters when the measured impedances were low (< 20 Ω), reflecting the electronic limits of the impedance meter used. These data suggest that Cole-Cole modelling remains the preferred method for the analysis of MFBIA data.
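For reference, the Cole-Cole model describes the complex impedance as Z(w) = Rinf + (R0 - Rinf) / (1 + (j*w*tau)^alpha). The sketch below fits this model to synthetic multi-frequency data with a least-squares routine; it only illustrates the approach and not the specific procedures compared in the study.

import numpy as np
from scipy.optimize import least_squares

# Cole-Cole impedance model: Z(w) = R_inf + (R0 - R_inf) / (1 + (j*w*tau)**alpha).
# Fitted here to synthetic data; the original study fitted measured MFBIA spectra
# and model circuits, which are not reproduced.

def cole_cole(params, w):
    r0, r_inf, tau, alpha = params
    return r_inf + (r0 - r_inf) / (1 + (1j * w * tau) ** alpha)

def residuals(params, w, z_measured):
    diff = cole_cole(params, w) - z_measured
    return np.concatenate([diff.real, diff.imag])

freqs = np.logspace(3, 6, 50)                    # 1 kHz to 1 MHz
w = 2 * np.pi * freqs
true = (700.0, 400.0, 1e-6, 0.85)                # hypothetical R0, Rinf, tau, alpha
z_obs = cole_cole(true, w) + np.random.default_rng(4).normal(0, 1, w.size)

fit = least_squares(residuals, x0=(600.0, 300.0, 5e-7, 0.7), args=(w, z_obs),
                    bounds=([0, 0, 1e-9, 0.1], [2000, 2000, 1e-3, 1.0]))
print(dict(zip(["R0", "Rinf", "tau", "alpha"], np.round(fit.x, 4))))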