893 results for data driven approach


Relevance: 40.00%

Abstract:

Isotopic data are becoming an important source of information regarding the sources, evolution and mixing processes of water in hydrogeologic systems. However, it is not clear how to treat the geochemical data and the isotopic data together statistically. We propose to introduce the isotopic information as new parts, and to apply compositional data analysis to the resulting enlarged composition. Results are equivalent to downscaling the classical isotopic delta variables, because they are already relative (as needed in the compositional framework) and isotopic variations are almost always very small. This methodology is illustrated and tested with the study of the Llobregat River Basin (Barcelona, NE Spain), where it is shown that, though very small, isotopic variations complement geochemical principal components and help in the better identification of pollution sources.
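As a rough illustration of treating isotopic information as extra parts, the sketch below splits one geochemical part into a light and a heavy isotopic part and applies a centred log-ratio (clr) transform. The abstract does not say how the parts are built or which log-ratio transform is used, so the splitting rule, the function names and the standard ratio value are all assumptions.

```python
import math

def delta_to_ratio(delta_permil, r_standard):
    """Invert delta = (R / R_std - 1) * 1000 to recover the heavy/light ratio R."""
    return r_standard * (1.0 + delta_permil / 1000.0)

def split_isotopic_parts(amount, delta_permil, r_standard):
    """Split one part into (light, heavy) parts so that heavy / light = R.

    This splitting rule is an illustrative assumption, not the authors' stated method.
    """
    r = delta_to_ratio(delta_permil, r_standard)
    light = amount / (1.0 + r)
    heavy = amount - light
    return light, heavy

def clr(parts):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical sample: two ordinary parts plus one part split by isotope.
# r_standard = 0.04 is a placeholder value, not a reference standard.
light, heavy = split_isotopic_parts(100.0, 5.0, r_standard=0.04)
enlarged_composition = [10.0, 20.0, light, heavy]
print(clr(enlarged_composition))
```

Because the clr coordinates depend only on ratios between parts, the heavy/light pair carries the same relative information as the delta value, which is consistent with the abstract's remark that delta variables are already relative.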

Relevance: 40.00%

Abstract:

Background: Multiple logistic regression is precluded from many practical applications in ecology that aim to predict the geographic distributions of species because it requires absence data, which are rarely available or are unreliable. In order to use multiple logistic regression, many studies have simulated "pseudo-absences" through a number of strategies, but it is unknown how the choice of strategy influences models and their geographic predictions of species. In this paper we evaluate the effect of several prevailing pseudo-absence strategies on the predictions of the geographic distribution of a virtual species whose "true" distribution and relationship to three environmental predictors were predefined. We evaluated the effect of using a) real absences, b) pseudo-absences selected randomly from the background, and c) two-step approaches: pseudo-absences selected from low suitability areas predicted by either Ecological Niche Factor Analysis (ENFA) or BIOCLIM. We compared how the choice of pseudo-absence strategy affected model fit, predictive power, and information-theoretic model selection results. Results: Models built with true absences had the best predictive power and the best discriminatory power, and the "true" model (the one that contained the correct predictors) was supported by the data according to AIC, as expected. Models based on random pseudo-absences had among the lowest fit, but yielded the second highest AUC value (0.97), and the "true" model was also supported by the data. Models based on two-step approaches had intermediate fit and the lowest predictive power, and the "true" model was not supported by the data. Conclusion: If ecologists wish to build parsimonious GLM models that will allow them to make robust predictions, a reasonable approach is to use a large number of randomly selected pseudo-absences and to perform model selection based on an information-theoretic approach. However, the resulting models can be expected to have limited fit.
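The random pseudo-absence strategy and the two evaluation quantities named in the abstract (AUC and AIC) can be sketched in a few lines. This is a minimal illustration under stated assumptions: cells are represented by hypothetical integer identifiers, the model scores are made up, and no logistic regression is actually fitted here.

```python
import random

def random_pseudo_absences(background_cells, presence_cells, n, seed=0):
    """Draw n background cells at random, excluding recorded presences."""
    presences = set(presence_cells)
    candidates = [c for c in background_cells if c not in presences]
    rng = random.Random(seed)
    return rng.sample(candidates, n)

def auc(presence_scores, absence_scores):
    """Probability that a presence outscores a (pseudo-)absence; ties count 0.5."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical grid of 100 cells with 5 recorded presences.
background = list(range(100))
presences = [3, 17, 42, 58, 91]
pseudo_absences = random_pseudo_absences(background, presences, n=20)
print(len(pseudo_absences), auc([0.9, 0.8], [0.1, 0.2]), aic(-10.0, 3))
```

Candidate models fitted to the same presence and pseudo-absence data would then be ranked by AIC, with the lowest value preferred.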

Relevance: 40.00%

Abstract:

Crystallographic data about T-Cell Receptor - peptide - major histocompatibility complex class I (TCRpMHC) interaction have revealed extremely diverse TCR binding modes triggering antigen recognition. Understanding the molecular basis that governs TCR orientation over pMHC is still a considerable challenge. We present a simplified rigid approach applied to all non-redundant TCRpMHC crystal structures available. The CHARMM force field in combination with the FACTS implicit solvation model is used to study the role of long-distance interactions between the TCR and pMHC. We demonstrate that the sum of the Coulomb interactions and the electrostatic solvation energies is sufficient to identify two orientations corresponding to energetic minima at 0° and 180° from the native orientation. Interestingly, these results are shown to be robust upon small structural variations of the TCR such as changes induced by Molecular Dynamics simulations, suggesting that shape complementarity is not required to obtain a reliable signal. Accurate energy minima are also identified by confronting unbound TCR crystal structures with pMHC. Furthermore, we decompose the electrostatic energy into residue contributions to estimate their role in the overall orientation. Results show that most of the driving force leading to the formation of the complex is defined by CDR1,2/MHC interactions. This long-distance contribution appears to be independent from the binding process itself, since it is reliably identified without considering either short-range energy terms or CDR induced fit upon binding. Ultimately, we present an attempt to predict the TCR/pMHC binding mode for a TCR structure obtained by homology modeling. The simplicity of the approach and the absence of any fitted parameters also make it easily applicable to other types of macromolecular protein complexes.

Relevance: 40.00%

Abstract:

Foreign trade statistics are the main data source for the study of international trade. However, their accuracy has been under suspicion since Morgenstern published his famous work in 1963. Federico and Tena (1991) have revisited the question, arguing that the statistics can be useful at an adequate level of aggregation. But the geographical assignment problem remains unsolved. This article focuses on the spatial variable through the analysis of the reliability of international textile data for 1913. A geographical bias arises between export and import series, but given its quantitative importance it can be considered negligible at an international scale.

Relevance: 40.00%

Abstract:

Whether for investigative or intelligence aims, crime analysts often face the need to analyse the spatiotemporal distribution of crimes or traces left by suspects. This article presents a visualisation methodology supporting recurrent practical analytical tasks such as the detection of crime series or the analysis of traces left by digital devices like mobile phones or GPS devices. The proposed approach has led to the development of a dedicated tool that has proven its effectiveness in real inquiries and intelligence practices. It supports a more fluent visual analysis of the collected data and may provide critical clues to support police operations, as exemplified by the presented case studies.

Relevance: 40.00%

Abstract:

Objective: Candidate genes for non-alcoholic fatty liver disease (NAFLD) identified by a bioinformatics approach were examined for variant associations with quantitative traits of NAFLD-related phenotypes. Research Design and Methods: By integrating public database text mining, trans-organism protein-protein interaction transferal, and information on liver protein expression, a protein-protein interaction network was constructed, and from this a smaller isolated interactome was identified. Five genes from this interactome were selected for genetic analysis. Twenty-one tag single-nucleotide polymorphisms (SNPs) which captured all common variation in these genes were genotyped in 10,196 Danes and analyzed for association with NAFLD-related quantitative traits, type 2 diabetes (T2D), central obesity, and WHO-defined metabolic syndrome (MetS). Results: 273 genes were included in the protein-protein interaction analysis, and EHHADH, ECHS1, HADHA, HADHB, and ACADL were selected for further examination. A total of 10 nominally statistically significant associations (P<0.05) with quantitative metabolic traits were identified. Also, the case-control study showed associations between variation in the five genes and T2D, central obesity, and MetS, respectively. Bonferroni adjustments for multiple testing negated all associations. Conclusions: Using a bioinformatics approach we identified five candidate genes for NAFLD. However, we failed to provide evidence of associations with major effects between SNPs in these five genes and NAFLD-related quantitative traits, T2D, central obesity, and MetS.
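To see how a Bonferroni adjustment can negate nominally significant associations, consider the minimal sketch below. The p-values and the number of tests are hypothetical; the abstract does not report the individual p-values or the exact number of comparisons corrected for.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted_p, significant) pairs under a Bonferroni correction.

    Each p-value is multiplied by the number of tests (capped at 1.0);
    a test stays significant only if p <= alpha / number_of_tests.
    """
    m = len(p_values)
    return [(min(1.0, p * m), p <= alpha / m) for p in p_values]

# Hypothetical p-values: all below 0.05 nominally, none survives correction.
nominal = [0.012, 0.020, 0.031, 0.044, 0.049]
for p, (adjusted, significant) in zip(nominal, bonferroni(nominal)):
    print(p, round(adjusted, 3), significant)
```

With five tests the per-test threshold drops to 0.01, so even the smallest hypothetical p-value (0.012) no longer counts as significant.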

Relevance: 40.00%

Abstract:

Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for seven cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations.
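One simple way to turn diagnostic marker SNPs into an inversion frequency estimate is to average, across markers, the fraction of reads carrying the inversion-specific allele. The sketch below assumes exactly that; the averaging rule and the read counts are illustrative assumptions, since the abstract does not spell out the estimator.

```python
def inversion_frequency(marker_counts):
    """Estimate inversion frequency from pooled read counts.

    marker_counts: list of (inversion_allele_reads, total_reads), one pair per
    diagnostic marker SNP. Markers without coverage are skipped.
    """
    freqs = [inv / total for inv, total in marker_counts if total > 0]
    if not freqs:
        raise ValueError("no marker SNP has read coverage")
    return sum(freqs) / len(freqs)

# Hypothetical read counts at four marker SNPs of one inversion.
counts = [(12, 100), (9, 80), (15, 120), (0, 0)]
print(round(inversion_frequency(counts), 3))
```

Averaging over several markers dampens the sampling noise that any single SNP shows at typical Pool-Seq coverage.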

Relevance: 40.00%

Abstract:

A major issue in the application of waveform inversion methods to crosshole georadar data is the accurate estimation of the source wavelet. Here, we explore the viability and robustness of incorporating this step into a time-domain waveform inversion procedure through an iterative deconvolution approach. Our results indicate that, at least in non-dispersive electrical environments, such an approach provides remarkably accurate and robust estimates of the source wavelet even in the presence of strong heterogeneity in both the dielectric permittivity and electrical conductivity. Our results also indicate that the proposed source wavelet estimation approach is relatively insensitive to ambient noise and to the phase characteristics of the starting wavelet. Finally, there appears to be little-to-no trade-off between the wavelet estimation and the tomographic imaging procedures.

Relevance: 40.00%

Abstract:

This project develops a smartphone-based prototype system that supplements the 511 system to improve its dynamic traffic routing service to state highway users under non-recurrent congestion. The system will considerably reduce the time needed to provide crucial traffic information and en-route assistance to travelers, helping them avoid being trapped in traffic congestion due to accidents, work zones, hazards, or special events. It also creates a feedback loop between travelers and responsible agencies that enables the state to effectively collect, fuse, and analyze crowd-sourced data for next-gen transportation planning and management. This project can result in substantial economic savings (e.g. less traffic congestion, reduced fuel wastage and emissions) and safety benefits for the freight industry and society due to better dissemination of real-time traffic information by highway users. Such benefits will increase significantly in the future with the expected increase in freight traffic on the network. The proposed system also has the flexibility to be integrated with various transportation management modules to assist state agencies in improving transportation services and daily operations.

Relevance: 40.00%

Abstract:

Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending the corresponding approaches to the regional scale represents a major, and as-of-yet largely unresolved, challenge. To address this problem, we have developed an upscaling procedure based on a Bayesian sequential simulation approach. This method is then applied to the stochastic integration of low-resolution, regional-scale electrical resistivity tomography (ERT) data in combination with high-resolution, local-scale downhole measurements of the hydraulic and electrical conductivities. Finally, the overall viability of this upscaling approach is tested and verified by performing and comparing flow and transport simulation through the original and the upscaled hydraulic conductivity fields. Our results indicate that the proposed procedure does indeed allow for obtaining remarkably faithful estimates of the regional-scale hydraulic conductivity structure and correspondingly reliable predictions of the transport characteristics over relatively long distances.

Relevance: 40.00%

Abstract:

The use of herbicides in agriculture may lead to environmental problems, such as surface water pollution, with a potential risk for aquatic organisms. The herbicide glyphosate is the most used active ingredient in the world and in Switzerland. In the Lavaux vineyards it is nearly the only molecule applied. This work aimed at studying its fate in soils and its transfer to surface waters, using a multi-scale approach: from molecular (10^-9 m) and microscopic scales (10^-6 m), to macroscopic (m) and landscape ones (10^3 m). First of all, an analytical method was developed for the trace level quantification of this widely used herbicide and its main by-product, aminomethylphosphonic acid (AMPA). Due to their polar nature, their derivatization with 9-fluorenylmethyl chloroformate (FMOC-Cl) was done prior to their concentration and purification by solid phase extraction. They were then analyzed by ultra performance liquid chromatography coupled with tandem mass spectrometry (UPLC-MS/MS). The method was tested in different aqueous matrices with spiking tests and validated for the matrix effect correction in relevant environmental samples. Calibration curves established between 10 and 1000 ng/l showed r^2 values above 0.989, mean recoveries varied between 86 and 133%, and limits of detection and quantification of the method were as low as 5 and 10 ng/l respectively. At the parcel scale, two parcels of the Lavaux vineyard area, located near the Lutrive River 6 km to the east of Lausanne, were monitored to assess to what extent glyphosate and AMPA were retained in the soil or exported to surface waters. They were equipped at their bottom with porous ceramic cups and runoff collectors, which allowed retrieving water samples for the growing seasons 2010 and 2011.
Results revealed that the mobility of glyphosate and AMPA in the unsaturated zone was likely driven by the precipitation regime and the soil characteristics, such as slope, porosity structure and contrasts in layer permeability. Elevated glyphosate and AMPA concentrations were measured at 60 and 80 cm depth at parcel bottoms, suggesting their infiltration in the upper parts of the parcels and the presence of preferential flow in the studied parcels. Indeed, the succession of rainy days induced the gradual saturation of the soil porosity, leading to rapid infiltration through macropores, as well as surface runoff formation. Furthermore, the presence of more impervious weathered marls at 100 cm depth induced throughflows, the importance of which for the lateral transport of the herbicide molecules was determined by the slope steepness. Major rainfall events (>10 mm/day) clearly exported molecules from the top soil layer, as indicated by high concentrations in runoff samples. A mass balance showed that total loss (10-20%) mainly occurred through surface runoff (96%) and, to a minor extent, by throughflows in soils (4%), with subsequent exfiltration to surface waters. Observations made in the Lutrive River revealed interesting details of glyphosate and AMPA dynamics in urbanized landscapes, such as the Lavaux vineyards. Indeed, besides their physical and chemical properties, herbicide dynamics at the catchment level strongly depend on application rates, precipitation regime, land use and also on the presence of drains or constructed channels. Elevated concentrations, up to 4970 ng/l, observed just after the application, confirmed the diffuse export of these compounds from the vineyard area by surface runoff during main rain events. From April to September 2011, a total load of 7.1 kg was calculated, with 85% coming from vineyards and minor urban sources and 15% from arable crops.
Small vineyard surfaces could generate high concentrations of herbicides and contribute considerably to the total load calculated at the outlet, due to their steep slopes (~10%). The extrapolated total amount transferred yearly from the Lavaux vineyards to the Lake of Geneva was 190 kg. At the molecular scale, the possible involvement of dissolved organic matter (DOM) in glyphosate and copper transport was studied using UV/Vis fluorescence spectroscopy. Combined with parallel factor (PARAFAC) analysis, this technique allowed characterizing DOM of soil and surface water samples from the studied vineyard area. Glyphosate concentrations were linked to the fulvic-like spectroscopic signature of DOM in soil water samples, as well as to copper, suggesting the formation of ternary complexes. In surface water samples, its concentrations were also correlated with those of copper, but not significantly with the fulvic-like signature. Quenching experiments with standards confirmed field tendencies in the laboratory, with a stronger decrease in fluorescence intensity for the fulvic-like fluorophore than for more aromatic ones. Lastly, based on maximum concentrations measured in the river, an environmental risk for these compounds was assessed, using laboratory tests and ecotoxicity data from the literature. In our case and with the methodology applied, the risk towards aquatic species was found negligible (RF < 1).
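The load and risk figures reported in this abstract rest on simple arithmetic that can be sketched as follows. The abstract does not give the formulas, so this assumes the common conventions: a load is the sum of concentration times discharge times duration, and the risk factor (RF) is a measured concentration divided by a no-effect concentration. The discharge and the no-effect concentration used here are hypothetical placeholders.

```python
NG_PER_KG = 1e12  # 1 kg = 1e12 ng

def load_kg(samples):
    """Total exported mass in kg.

    samples: list of (concentration_ng_per_l, discharge_l_per_s, duration_s).
    """
    total_ng = sum(c * q * dt for c, q, dt in samples)
    return total_ng / NG_PER_KG

def risk_factor(measured_ng_per_l, no_effect_ng_per_l):
    """Ratio of measured to no-effect concentration; below 1 suggests low risk."""
    return measured_ng_per_l / no_effect_ng_per_l

# One day at 1000 ng/l with a hypothetical discharge of 100 l/s.
print(load_kg([(1000.0, 100.0, 86400.0)]))

# Peak concentration from the abstract against a hypothetical no-effect level.
print(risk_factor(4970.0, 10000.0))

# Split of the reported 7.1 kg load: 85% vineyards and urban, 15% arable.
print(round(7.1 * 0.85, 3), round(7.1 * 0.15, 3))
```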

Relevance: 40.00%

Abstract:

1. Few examples of habitat-modelling studies of rare and endangered species exist in the literature, although from a conservation perspective predicting their distribution would prove particularly useful. Paucity of data and lack of valid absences are the probable reasons for this shortcoming. Analytic solutions to accommodate the lack of absences include the ecological niche factor analysis (ENFA) and the use of generalized linear models (GLM) with simulated pseudo-absences. 2. In this study we tested a new approach to generating pseudo-absences, based on a preliminary ENFA habitat suitability (HS) map, for the endangered species Eryngium alpinum. This method of generating pseudo-absences was compared with two others: (i) use of a GLM with pseudo-absences generated totally at random, and (ii) use of an ENFA only. 3. The influence of two different spatial resolutions (i.e. grain) was also assessed for tackling the dilemma of quality (grain) vs. quantity (number of occurrences). Each combination of the three above-mentioned methods with the two grains generated a distinct HS map. 4. Four evaluation measures were used for comparing these HS maps: total deviance explained, best kappa, Gini coefficient and minimal predicted area (MPA). The last is a new evaluation criterion proposed in this study. 5. Results showed that (i) GLM models using ENFA-weighted pseudo-absences provide better results, except for the MPA value, and that (ii) quality (spatial resolution and locational accuracy) of the data appears to be more important than quantity (number of occurrences). Furthermore, the proposed MPA value is suggested as a useful measure of model evaluation when used to complement classical statistical measures. 6. Synthesis and applications. We suggest that the use of ENFA-weighted pseudo-absences is a possible way to enhance the quality of GLM-based potential distribution maps and that data quality (i.e. spatial resolution) prevails over quantity (i.e. number of data).
Increased accuracy of potential distribution maps could help to better define suitable areas for species protection and reintroduction.
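Of the four evaluation measures listed, "best kappa" is the easiest to illustrate: Cohen's kappa is computed at several thresholds on the habitat suitability score and the maximum is kept. The sketch below assumes that reading of the term; the scores, labels and thresholds are hypothetical, and the other measures (deviance explained, Gini coefficient, MPA) are not covered.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: agreement cannot exceed chance
    return (observed - expected) / (1.0 - expected)

def best_kappa(scores, labels, thresholds):
    """Return (max_kappa, threshold), predicting presence when score >= threshold."""
    best = (-1.0, None)
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        k = cohen_kappa(tp, fp, fn, tn)
        if k > best[0]:
            best = (k, t)
    return best

# Hypothetical suitability scores for two presences (1) and two absences (0).
print(best_kappa([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], [0.05, 0.5, 0.95]))
```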