913 results for Nearest Neighbour
Abstract:
Background: Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows the FAO/WHO Codex Alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied ACC pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences.

Results: A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species was selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3) and converted into uniform vectors by auto- and cross-covariance (ACC) transformation, so that each protein was represented as a vector of 45 variables. Five machine learning methods for classification were applied to derive models for allergen prediction: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure, and outperforms other servers for allergen prediction with 94% sensitivity.

Conclusions: AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins. Significantly, as well as allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin. © 2013 Dimitrov et al.; licensee BioMed Central Ltd.
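A vector of 45 variables is consistent with all nine ordered z-descriptor pairs evaluated at five lags (3 × 3 × 5 = 45). The following is a minimal sketch of such an ACC pipeline with a k = 3 kNN classifier, assuming lags 1–5 and scikit-learn; the z-scale values shown are illustrative placeholders, not the published descriptors.

```python
# Minimal sketch of ACC pre-processing plus kNN classification,
# assuming lags 1..5 so that 3 z-descriptors x 3 z-descriptors x 5 lags
# yield the 45 variables per protein mentioned above.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder (z1, z2, z3) values -- the real published z-scales
# must be substituted here for every amino acid.
Z_SCALES = {
    "A": (0.07, -1.73, 0.09),
    "G": (2.23, -5.36, 0.30),
    # ... one triple per amino acid
}

def acc_vector(sequence, max_lag=5):
    """Auto- and cross-covariance (ACC) transform of one protein sequence."""
    z = np.array([Z_SCALES[aa] for aa in sequence])  # shape (N, 3)
    n = len(z)
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(3):          # first descriptor
            for k in range(3):      # second descriptor (j == k: auto-covariance)
                features.append(np.sum(z[:n - lag, j] * z[lag:, k]) / (n - lag))
    return np.array(features)       # 45 values

# With X holding ACC vectors for known allergens/non-allergens and y the labels:
# model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# model.predict([acc_vector(query_sequence)])
```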
Abstract:
2000 Mathematics Subject Classification: 68T01, 62H30, 32C09.
Abstract:
The synthesis of a novel heterocyclic–telechelic polymer, α,ω-oxetanyl-telechelic poly(3-nitratomethyl-3-methyl oxetane), is described. Infrared spectroscopy (IR), gel permeation chromatography (GPC), and nuclear magnetic resonance (NMR) spectroscopy have been used to confirm the successful synthesis, demonstrating the presence of the telechelic-oxetanyl moieties. Synthesis of the terminal functionalities has been achieved via nucleophile-initiated displacement of nitrato groups, in a manner similar to that employed with other leaving groups such as azido, bromo, and nitro. In the present case, displacement occurs at the ends of a nitrato-functionalized polymer, driven by the formation of sodium nitrate and supported by the polar aprotic solvent N,N-dimethylformamide. The formation of an alkoxide at the polymer chain ends is favored and allows internal back-biting to the nearest carbon bearing the nitrato group, in an internal SN2 (SN2(i)) reaction, leading to α,ω-oxetanyl functionalization. The telechelic-oxetanyl moieties have the potential to be cross-linked by chemical (e.g., acidic) or radiative (e.g., ultraviolet) curing methods without the use of high temperatures (typically below 100°C). This type of material was designed for future use as a contraband simulant, whereby it would form the predominant constituent of elastomeric composites comprising a rubbery polymer with small quantities of solids, typically crystals of contraband substances such as explosives or narcotics. This method also provides an alternative approach to ring closure and the synthesis of heterocycles.
Abstract:
This thesis studies survival analysis techniques that deal with censoring in order to produce tools that predict the risk of endovascular aortic aneurysm repair (EVAR) re-intervention. Censoring indicates that some patients do not continue follow-up, so their outcome class is unknown. Existing methods for dealing with censoring have drawbacks and cannot handle the high censoring of the two EVAR datasets collected. This thesis therefore presents a new solution to high censoring by modifying an approach that was previously incapable of differentiating between risk groups of aortic complications. Feature selection (FS) becomes complicated with censoring. Most survival FS methods depend on Cox's model; however, machine learning classifiers (MLCs) are preferred. Few methods have adopted MLCs for survival FS, and those cannot be used with high censoring. This thesis proposes two FS methods that use MLCs to evaluate features. Both FS methods use the new solution to deal with censoring, and both combine factor analysis with a greedy stepwise FS search that allows eliminated features to re-enter the FS process. The first FS method searches for the best neural network configuration and subset of features. The second combines support vector machines, neural networks, and k-nearest neighbour classifiers using simple and weighted majority voting to construct a multiple classifier system (MCS) that improves on the performance of the individual classifiers. It presents a new hybrid FS process that uses the MCS as a wrapper method and merges it with an iterated feature-ranking filter method to further reduce the feature set. The proposed techniques outperformed FS methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in the log-rank test's p-values, sensitivity, and concordance. This demonstrates that the proposed techniques are more powerful in correctly predicting the risk of re-intervention, and consequently enable doctors to set an appropriate future observation plan for each patient.
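A minimal sketch of the simple and weighted majority-voting MCS described above, assuming scikit-learn estimators as stand-ins for the thesis's three classifiers; the voting weights are illustrative.

```python
# Sketch: multiple classifier system (MCS) combining SVM, neural network
# and k-nearest neighbour classifiers by majority voting.
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

members = [
    ("svm", SVC()),
    ("nn", MLPClassifier(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
]

# Simple majority voting: each member gets one vote.
simple_mcs = VotingClassifier(estimators=members, voting="hard")

# Weighted majority voting: votes are weighted (weights are illustrative).
weighted_mcs = VotingClassifier(estimators=members, voting="hard",
                                weights=[2, 1, 1])

# simple_mcs.fit(X_train, y_train); simple_mcs.predict(X_test)
```

In the wrapper setting described above, such an MCS would score candidate feature subsets during the greedy stepwise search.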
Abstract:
Popular dimension reduction and visualisation algorithms, for instance Metric Multidimensional Scaling, t-distributed Stochastic Neighbour Embedding and the Gaussian Process Latent Variable Model, rely on the assumption that input dissimilarities are Euclidean. It is well known that this assumption does not hold for most datasets, and high-dimensional data often lies on a manifold of unknown global geometry. We present a method for improving the manifold charting process, coupled with Elastic MDS, such that we no longer assume that the manifold is Euclidean, or of any particular structure. We draw on the benefits of different dissimilarity measures, allowing their relative responsibilities, under a linear combination, to drive the visualisation process.
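As a rough sketch of the idea under stated assumptions: several candidate dissimilarity matrices are blended under a convex linear combination, and the result drives a precomputed-dissimilarity MDS embedding. Here the relative responsibilities are fixed by hand for illustration, whereas the method above would let them be driven by the data.

```python
# Sketch: a convex linear combination of dissimilarity measures feeding
# an MDS embedding. The weights ("responsibilities") are fixed here;
# the method described above would adapt them.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS

def combined_dissimilarity(X, weights):
    measures = [
        cdist(X, X, metric="euclidean"),
        cdist(X, X, metric="cityblock"),
        cdist(X, X, metric="cosine"),
    ]
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                  # convex combination
    return sum(wi * D for wi, D in zip(w, measures))

X = np.random.default_rng(0).normal(size=(100, 20))
D = combined_dissimilarity(X, weights=[0.5, 0.3, 0.2])
embedding = MDS(n_components=2, dissimilarity="precomputed").fit_transform(D)
```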
Abstract:
The ‘currency war’, as it has become known, has three aspects: 1) the inflexible pegs of undervalued currencies; 2) recent attempts by floating exchange-rate countries to resist currency appreciation; 3) quantitative easing. Europe should primarily be concerned about the first issue, which relates to the renewed debate about the international monetary system. The attempts of floating exchange-rate countries to resist currency appreciation are generally justified while China retains a peg. Quantitative easing cannot be deemed a ‘beggar-thy-neighbour’ policy as long as the Fed’s policy is geared towards price stability. Current US inflationary expectations are at historically low levels. Central banks should come to an agreement about the definition of price stability at a time of deflationary pressures. The euro’s exchange rate has not been greatly impacted by the recent currency war; the euro continues to be overvalued, but less than before.
Abstract:
Natural, unenriched Everglades wetlands are known to be limited by phosphorus (P) and responsive to P enrichment. However, whole-ecosystem evaluations of experimental P additions are rare in Everglades or other wetlands. We tested the response of the Everglades wetland ecosystem to continuous, low-level additions of P (0, 5, 15, and 30 μg L−1 above ambient) in replicate, 100 m flow-through flumes located in unenriched Everglades National Park. After the first six months of dosing, the concentration and standing stock of phosphorus increased in the surface water, periphyton, and flocculent detrital layer, but not in the soil or macrophytes. Of the ecosystem components measured, total P concentration increased the most in the floating periphyton mat (30 μg L−1: mean = 1916 μg P g−1, control: mean = 149 μg P g−1), while the flocculent detrital layer stored most of the accumulated P (30 μg L−1: mean = 1.732 g P m−2, control: mean = 0.769 g P m−2). Significant short-term responses of P concentration and standing stock were observed primarily in the high dose (30 μg L−1 above ambient) treatment. In addition, the biomass and estimated P standing stock of aquatic consumers increased in the 30 and 5 μg L−1 treatments. Alterations in P concentration and standing stock occurred only at the upstream ends of the flumes nearest to the point source of added nutrient. The total amount of P stored by the ecosystem within the flume increased with P dosing, although the ecosystem in the flumes retained only a small proportion of the P added over the first six months. These results indicate that oligotrophic Everglades wetlands respond rapidly to short-term, low-level P enrichment, and the initial response is most noticeable in the periphyton and flocculent detrital layer.
Abstract:
This thesis chronicles the design and implementation of an Internet/intranet- and database-based application for the quality control of hurricane surface wind observations. A quality control session consists of selecting the desired observation types to be viewed and determining a storm-track-based time window for viewing the data. All observations of the selected types are then plotted in a storm-relative view for the chosen time window, and geography is positioned for the storm-center time about which an objective analysis can be performed. Users then make decisions about data validity through visual nearest-neighbor comparison and inspection. The project employed an object-oriented iterative development method from beginning to end, and its implementation primarily features the Java programming language.
Abstract:
The major objective of this study was to determine the relative importance of landscape factors, local abiotic factors, and biotic interactions in influencing tadpole community structure in temporary wetlands. I also examined the influence of agricultural activities in South-central Florida by comparing tadpole communities in native prairie wetlands (a relatively unmodified habitat) at the Kissimmee Prairie Sanctuary (KPS) to tadpole communities in three agriculturally modified habitats found at MacArthur Agro-Ecology Research Center (MAERC). Environmental characteristics were measured in 24 isolated wetlands, and tadpoles were sampled using throw-traps and dipnets during the 1999 wet season (June–October). Landscape characteristics were expected to predominantly influence all aspects of community structure because anurans associated with temporary wetland systems are likely to exist as metapopulations. Both landscape characteristics (wetland proximity to the nearest woodland and the amount of woodland surrounding the wetland) and biotic interactions (fish predation) had the largest influence on tadpole community structure. Predatory fish influenced tadpole communities more than expected due to the ubiquity of wetlands, lack of topographic relief, and dispersal abilities of several fish species. Differences in tadpole community structure among habitat types were attributed to differences in woodland attributes and susceptibility to fish colonization. Furthermore, agricultural modification of prairie habitats in South-central Florida may benefit amphibian communities, particularly woodland-dwelling species that are unable to coexist with predatory fish. From a conservation standpoint, temporary wetlands proximal to woodland areas and isolated from permanent water sources appear to be most important to amphibians. In addition, the high tadpole densities attained in these wetlands suggest that they serve as biological hotspots within the landscape, with benefits extending into the adjacent terrestrial matrix. Further research is needed to quantify the biological productivity of these systems and determine the spatial dynamics of anurans in surrounding terrestrial habitats.
Abstract:
Microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data; thus, effective data processing and analysis are critical for making reliable inferences from the data.

The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed, and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed the other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior.

The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method, Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified.

The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
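For reference, two of the four pooling techniques named above reduce to short formulas over the p-values that a gene obtains in k independent experiments. A minimal sketch follows (scipy.stats.combine_pvalues implements the same methods):

```python
# Sketch of two p-value pooling techniques: Fisher's inverse chi-square
# method and the (Liptak-)Stouffer weighted Z-method.
import numpy as np
from scipy.stats import chi2, norm

def fisher_pool(pvalues):
    """Fisher inverse chi-square: X = -2 * sum(ln p_i) ~ chi2 with 2k df."""
    p = np.asarray(pvalues)
    statistic = -2.0 * np.sum(np.log(p))
    return chi2.sf(statistic, df=2 * len(p))

def stouffer_pool(pvalues, weights=None):
    """Liptak-Stouffer weighted Z: Z = sum(w_i z_i) / sqrt(sum w_i^2)."""
    p = np.asarray(pvalues)
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    z = norm.isf(p)                       # per-experiment z-scores
    return norm.sf(np.sum(w * z) / np.sqrt(np.sum(w ** 2)))

# Example: the same gene tested in three independent experiments.
print(fisher_pool([0.04, 0.10, 0.02]))                       # pooled p-value
print(stouffer_pool([0.04, 0.10, 0.02], weights=[1.0, 0.5, 1.0]))
```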
Abstract:
Modern geographical databases, which are at the core of geographic information systems (GIS), store a rich set of aspatial attributes in addition to geographic data. Typically, aspatial information comes in textual and numeric format. Retrieving information constrained on both spatial and aspatial data from geodatabases gives GIS users the ability to perform more interesting spatial analyses, and lets applications support composite location-aware searches; for example, in a real estate database: “Find the nearest homes for sale to my current location that have a backyard and whose prices are between $50,000 and $80,000”. Efficient processing of such queries requires combined indexing strategies over multiple types of data. Existing spatial query engines commonly apply a two-filter approach (spatial filter followed by nonspatial filter, or vice versa), which can incur large performance overheads. More recently, the amount of geolocation data in databases has grown rapidly, due in part to advances in geolocation technologies (e.g., GPS-enabled smartphones) that allow users to associate location data with objects or events. This poses data-ingestion challenges for practical GIS databases faced with large data volumes. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre-processing task) can be scaled in MapReduce, a widely adopted parallel programming model for data-intensive problems. The evaluation of our algorithms in a Hadoop cluster showed close to linear scalability in building R-tree indexes. Subsequently, we develop efficient algorithms for processing spatial queries with aspatial conditions. Novel techniques for simultaneously indexing spatial, textual, and numeric data are developed to that end. Experimental evaluations with real-world, large spatial datasets measured query response times within the sub-second range for most cases, and up to a few seconds for a small number of cases, which is reasonable for interactive applications. Overall, these results show that the MapReduce parallel model is suitable for indexing tasks in spatial databases, and that an adequate combination of spatial and aspatial attribute indexes can attain acceptable response times for interactive spatial queries with constraints on aspatial data.
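For contrast with the combined indexes developed in the dissertation, the two-filter baseline can be sketched in a few lines: an R-tree answers the spatial part (here via the rtree package over libspatialindex), and the aspatial predicate is applied afterwards. The homes dataset and its fields are invented for illustration.

```python
# Sketch of the two-filter approach: R-tree spatial filter (k nearest
# candidates) followed by an aspatial filter on textual/numeric attributes.
from rtree import index

homes = {
    1: {"loc": (-80.19, 25.76), "backyard": True,  "price": 75000},
    2: {"loc": (-80.21, 25.77), "backyard": False, "price": 60000},
    3: {"loc": (-80.18, 25.75), "backyard": True,  "price": 95000},
}

idx = index.Index()
for home_id, home in homes.items():
    x, y = home["loc"]
    idx.insert(home_id, (x, y, x, y))       # points stored as degenerate boxes

def nearest_matching(query_point, k=10):
    """Spatial filter first, then the aspatial (attribute) filter."""
    x, y = query_point
    candidates = idx.nearest((x, y, x, y), num_results=k)
    return [
        hid for hid in candidates
        if homes[hid]["backyard"] and 50000 <= homes[hid]["price"] <= 80000
    ]

print(nearest_matching((-80.20, 25.76)))    # -> [1]
```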
Abstract:
We present here a 4-year dataset (2001–2004) on the spatial and temporal patterns of aboveground net primary production (ANPP) by dominant primary producers (sawgrass, periphyton, mangroves, and seagrasses) along two transects in the oligotrophic Florida Everglades coastal landscape. The 17 sites of the Florida Coastal Everglades Long Term Ecological Research (FCE LTER) program are located along fresh-estuarine gradients in Shark River Slough (SRS) and Taylor River/C-111/Florida Bay (TS/Ph) basins that drain the western and southern Everglades, respectively. Within the SRS basin, sawgrass and periphyton ANPP did not differ significantly among sites but mangrove ANPP was highest at the site nearest the Gulf of Mexico. In the southern Everglades transect, there was a productivity peak in sawgrass and periphyton at the upper estuarine ecotone within Taylor River but no trends were observed in the C-111 Basin for either primary producer. Over the 4 years, average sawgrass ANPP in both basins ranged from 255 to 606 g m−2 year−1. Average periphyton productivity at SRS and TS/Ph was 17–68 g C m−2 year−1 and 342–10371 g C m−2 year−1, respectively. Mangrove productivity ranged from 340 g m−2 year−1 at Taylor River to 2208 g m−2 year−1 at the lower estuarine Shark River site. Average Thalassia testudinum productivity ranged from 91 to 396 g m−2 year−1 and was 4-fold greater at the site nearest the Gulf of Mexico than in eastern Florida Bay. There were no differences in periphyton productivity at Florida Bay. Interannual comparisons revealed no significant differences within each primary producer at either SRS or TS/Ph with the exception of sawgrass at SRS and the C-111 Basin. Future research will address difficulties in assessing and comparing ANPP of different primary producers along gradients as well as the significance of belowground production to the total productivity of this ecosystem.
Abstract:
The economic development of any region involves some consequences for the environment, so the choice of a socially optimal development plan must consider a measure of each strategy's environmental impact. This dissertation tackles this problem by examining the environmental impacts of new production activities, drawing on the experience of the Carajás region in the north of Brazil. This region, which prior to the 1960s was an isolated outpost of the Amazon area, was integrated with the rest of the country by an unsophisticated but strategic road system and eventually became the second largest iron ore mining area in the world. Finally, in the 1980s, the area was linked by railroad to the nearest seaport on the Atlantic Ocean. The consequence of these changes was a burst of economic growth along the railroad Corridor and neighboring areas. In this work, a Social Accounting Matrix (SAM) is used to construct a two-region (Corridor and surrounding area), fixed-price, Computable General Equilibrium (CGE) model to examine the relationship between production and pollution by measuring the different pollution effects of alternative growth strategies. SAMs are a very useful tool for examining the environmental impacts of development, since they link production activities to measurable indices of natural resource degradation. The simulation results suggest that the strategies leading to faster economic growth in the short run are also those that lead to faster rates of environmental degradation, and that they do so at the price of a rate of environmental depletion that is unsustainable from a long-run perspective. These results therefore support the concern expressed by environmental economists and policy makers regarding possible trade-offs between economic growth and environmental preservation, and stress the need for careful analysis of the environmental impacts of alternative growth strategies.
Abstract:
Voice communication systems such as Voice-over IP (VoIP), Public Switched Telephone Networks, and Mobile Telephone Networks are an integral means of human tele-interaction. These systems pose distinctive challenges due to their unique characteristics, such as low volume, burstiness, and stringent delay/loss requirements, across heterogeneous underlying network technologies. Effective quality evaluation methodologies are important for system development and refinement, particularly when they adopt user-feedback-based measurement. Presently, most evaluation models are system-centric (Quality of Service, or QoS-based), which prompted us to explore a user-centric (Quality of Experience, or QoE-based) approach as a step towards the human-centric paradigm of system design. We investigate an affect-based QoE evaluation framework that attempts to capture users' perception while they are engaged in voice communication. Our modular approach consists of feature extraction from multiple information sources, including various affective cues, and different classification procedures such as Support Vector Machines (SVM) and k-Nearest Neighbor (kNN). The experimental study is illustrated in depth with a detailed analysis of results. The evidence collected supports the feasibility of our approach for QoE evaluation and suggests that human affective attributes should be considered in modeling user experience.
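A minimal sketch of the classification stage, assuming scikit-learn's SVM and kNN over affect-based feature vectors; the features and labels below are synthetic stand-ins for the extracted affective cues.

```python
# Sketch: affect-based feature vectors from voice sessions classified
# with SVM and kNN, compared by cross-validation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))          # e.g. prosodic/affective cues per session
y = rng.integers(0, 2, size=200)        # perceived quality: 0 = poor, 1 = good

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("kNN", KNeighborsClassifier(n_neighbors=5))]:
    model = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```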
Abstract:
Melanomagenesis is influenced by environmental and genetic factors. In normal cells, ultraviolet (UV)-induced photoproducts are successfully repaired by the nucleotide excision repair (NER) pathway. Mice carrying mutations in the xeroderma pigmentosum (Xp) complementation group of genes (Xpa-Xpg) lack the NER pathway and are therefore highly sensitive to UV light; however, they do not develop melanoma after UV exposure. In humans, the Endothelin 3 signaling pathway has been linked to melanoma progression and its metastatic potential. Transgenic mice that over-express Edn3 under the control of the Keratin 5 promoter (K5-Edn3) and exhibit a hyperpigmentation phenotype were crossed with Xp-deficient mice. Because melanoma is highly metastatic and many primary malignancies spread via the lymphatic system, analyzing the lymph nodes may prove useful in assessing the possible spread of tumor cells to other tissues. This study aimed to determine whether the over-expression of Edn3 is sufficient to lead to melanoma metastasis to the lymph nodes. Mice were exposed to UV radiation and analyzed for the presence of skin lesions. Mice presenting skin lesions were sacrificed, and the nearest lymph nodes were excised and examined for the presence of metastasis. Mice with melanoma skin lesions presented enlarged and hyperpigmented lymph nodes. Diagnosis of melanoma was established by immunostaining with melanocyte and melanoma cell markers. While UV radiation caused the development of skin lesions in both K5-Edn3 transgenic and control mice, only mice carrying the K5-Edn3 transgene developed melanoma metastasis to the lymph nodes. These results indicate that over-expression of Edn3 is sufficient to lead to lymph node metastasis in mice exposed to at least one dose of UV radiation.