970 results for datasets
Abstract:
In this study, 137 corn distillers dried grains with solubles (DDGS) samples from a range of geographical origins (Jilin Province of China, Heilongjiang Province of China, USA and Europe) were collected and analysed. Different near-infrared spectrometers, combined with different chemometric packages, were used in two independent laboratories to investigate the feasibility of classifying the geographical origin of DDGS. Based on the same dataset, one laboratory developed a partial least squares discriminant analysis (PLS-DA) model and the other developed an orthogonal partial least squares discriminant analysis (OPLS-DA) model. Results showed that both models could perfectly classify DDGS samples from the different geographical origins. These promising results encourage larger-scale efforts to produce datasets that can be used to differentiate the geographical origin of DDGS; such efforts are required to provide higher-level food security measures on a global scale.
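The abstract does not give model details. As a hedged illustration of the general idea behind PLS-DA (regressing dummy-coded class labels on spectra through a few latent components, then assigning each sample to the class with the largest predicted score), here is a minimal pure-Python sketch. It assumes one-vs-rest NIPALS PLS1 models, one per class, rather than a single PLS2 model fitted to all dummy columns at once; the function names and the toy three-variable "spectra" are hypothetical and are not the laboratories' models or data.

```python
import math


def fit_pls1(X, y, n_components):
    """Fit a PLS1 model (single response) with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Mean-centre predictors and response.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [yi - y_mean for yi in y]
    comps = []  # (weights w, X-loadings, y-loading q) for each component
    for _ in range(n_components):
        # Weights point along the covariance between residual X and residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def predict_pls1(model, x):
    """Predict the response for one sample by replaying the deflation steps."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


def fit_plsda(X, labels, n_components=2):
    """One-vs-rest PLS-DA: one PLS1 model per class on a 0/1 dummy response."""
    return {c: fit_pls1(X, [1.0 if lab == c else 0.0 for lab in labels], n_components)
            for c in sorted(set(labels))}


def plsda_predict(models, x):
    """Assign x to the class whose model gives the largest predicted score."""
    return max(models, key=lambda c: predict_pls1(models[c], x))
```

With real NIR spectra the number of components would be chosen by cross-validation; one component is enough for a toy set with a single discriminating direction.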
Abstract:
Data registration refers to a series of techniques for matching similar objects or datasets and bringing them into alignment. These techniques are widely used in diverse applications, such as video coding, tracking, object and face detection and recognition, surveillance and satellite imaging, medical image analysis and structure from motion. Registration methods are as numerous as their uses, ranging from pixel-level, block-based and feature-based methods to Fourier-domain methods.
This book presents algorithms and image and video techniques for registration, together with performance metrics for assessing registration quality. The authors provide various assessment metrics for measuring registration quality alongside analyses of registration techniques, introducing and explaining both familiar and state-of-the-art registration methodologies used in a variety of targeted applications.
Key features:
- Provides a state-of-the-art review of image and video registration techniques, allowing readers to develop an understanding of how well the techniques perform by using specific quality assessment criteria
- Addresses a range of applications from familiar image and video processing domains to satellite and medical imaging among others, enabling readers to discover novel methodologies with utility in their own research
- Discusses quality evaluation metrics for each application domain with an interdisciplinary approach from different research perspectives
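As a hedged illustration of one of the Fourier-domain methods mentioned above, the sketch below estimates a pure translation by phase correlation: the normalised cross-power spectrum of two signals keeps only their phase difference, and its inverse transform peaks at the displacement. It is a 1-D, whole-sample, circular-shift sketch with a naive DFT so that it needs only the standard library; the function names are hypothetical, and real image registration would use 2-D FFTs (for example `numpy.fft.fft2`) and usually sub-pixel refinement.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for a short demo)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]


def phase_correlation_shift(reference, moved, eps=1e-12):
    """Estimate the circular shift s such that moved[k] == reference[k - s]."""
    F_ref, F_mov = dft(reference), dft(moved)
    cross = []
    for a, b in zip(F_mov, F_ref):
        c = a * b.conjugate()
        # Normalise to unit magnitude so that only the phase difference remains.
        cross.append(c / abs(c) if abs(c) > eps else 0)
    surface = [v.real for v in idft(cross)]
    # The correlation surface peaks at the displacement.
    return max(range(len(surface)), key=surface.__getitem__)
```

Because the normalisation discards amplitude, the peak is sharp and largely insensitive to global brightness changes, which is one reason Fourier-domain methods are attractive for translation-only registration.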
Abstract:
Classification methods with embedded feature selection capability are very appealing for the analysis of complex processes, since they allow root causes to be analysed even when the number of input variables is high. In this work, we investigate the performance of three classification techniques within a Monte Carlo strategy, with the aim of root cause analysis. We consider the naive Bayes classifier and the logistic regression model with two different implementations for controlling model complexity, namely a LASSO-like implementation with L1-norm regularization and a fully Bayesian implementation of the logistic model, the so-called relevance vector machine. Several challenges, mainly linked to the characteristics of the data, can arise when estimating such models: a large number of input variables, high correlation among subsets of variables, more variables than available data points, and unbalanced datasets. Using an ecological dataset and a semiconductor manufacturing dataset, we show the advantages and drawbacks of each method, highlighting the superior classification accuracy of the relevance vector machine with respect to the other classifiers. Moreover, we show how combining the proposed techniques with the Monte Carlo approach yields more robust insights into the problem under analysis when facing challenging modelling conditions.
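The abstract does not describe its Monte Carlo scheme in detail. As a hedged sketch of the general idea for the LASSO-like model only (the relevance vector machine and naive Bayes are not reproduced), the code below fits an L1-penalised logistic regression by proximal gradient descent (ISTA) on repeated random subsamples and reports how often each input variable keeps a non-zero coefficient. The subsampling fraction, penalty `lam`, step size and all function names are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=300):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    Minimises mean log-loss + lam * sum(|w_j|); the intercept b is not penalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        b -= lr * grad_b
        # Gradient step, then soft-thresholding (the proximal operator of the
        # L1 term), which sets weakly supported coefficients exactly to zero.
        w = [math.copysign(max(abs(v) - lr * lam, 0.0), v)
             for v in (wj - lr * gj for wj, gj in zip(w, grad_w))]
    return w, b


def selection_frequency(X, y, lam, n_runs=10, frac=0.7, seed=0):
    """Monte Carlo loop: refit on random subsamples and count non-zero coefficients."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_runs):
        idx = rng.sample(range(n), int(frac * n))
        w, _ = fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, wj in enumerate(w):
            if wj != 0.0:
                counts[j] += 1
    return [c / n_runs for c in counts]
```

Variables selected in nearly every run are the more credible root-cause candidates. Because the L1 penalty is scale-dependent, inputs would normally be standardised first, and `lam` would be tuned, for example by cross-validation.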
Abstract:
Continuous research on hard turning (HT), covering both machine tools and cutting tools, has made previously reported daunting limits easily attainable today. This presents an opportunity for a systematic investigation into the currently attainable limits of hard turning on a CNC turret lathe. Accordingly, this study contributes to the existing literature by providing the latest experimental results for hard turning of AISI 4340 steel (69 HRC) with a CBN cutting tool. An orthogonal array was developed using a set of judiciously chosen cutting parameters, and longitudinal turning trials were then carried out according to a full factorial-based Taguchi matrix. The expectation was confirmed: a mirror-finished, optical-quality machined surface (average surface roughness of 45 nm) was achieved by the conventional cutting method. Furthermore, signal-to-noise (S/N) ratio analysis, analysis of variance (ANOVA) and multiple regression analysis were carried out on the experimental datasets to establish the influence of each machining variable on the machined surface roughness and to optimize the machining parameters. One of the key findings was that when the feed rate during hard turning is very low (about 0.02 mm/rev), it alone can be the most significant parameter (99.16% contribution) influencing the machined surface roughness (Ra). However, a low feed rate was also shown to cause high tool wear, so the selection of machining parameters for hard turning must be governed by a trade-off between cost and quality.
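The abstract does not state its formulas. The sketch below assumes the textbook forms: the Taguchi "smaller-the-better" S/N ratio, the usual criterion for roughness, and the ANOVA percent contribution, 100 × SS_factor / SS_total, which is the usual meaning of a figure such as 99.16%. The function names and the illustrative roughness values are hypothetical, not the study's data.

```python
import math


def sn_smaller_the_better(values):
    """Taguchi smaller-the-better S/N ratio in dB: -10 * log10(mean(y**2))."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))


def percent_contributions(factor_levels, responses):
    """ANOVA percent contribution of each factor: 100 * SS_factor / SS_total.

    factor_levels maps a factor name to the level used in each run.
    """
    grand = sum(responses) / len(responses)
    ss_total = sum((r - grand) ** 2 for r in responses)
    result = {}
    for name, levels in factor_levels.items():
        groups = {}
        for level, r in zip(levels, responses):
            groups.setdefault(level, []).append(r)
        # Between-level sum of squares: n_level * (level mean - grand mean)^2.
        ss = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
        result[name] = 100.0 * ss / ss_total
    return result
```

A higher S/N ratio means lower and more consistent roughness. A factor whose level means differ while the other factors' do not, such as feed rate in the study's low-feed regime, takes almost all of the total sum of squares.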
Abstract:
Nematode neuropeptide systems comprise an exceptionally complex array of approximately 250 peptidic signaling molecules that operate within a structurally simple nervous system of approximately 300 neurons. A relatively complete picture of the neuropeptide complement is available for Caenorhabditis elegans, with 30 flp, 38 ins and 43 nlp genes documented; accumulating evidence indicates similar complexity in parasitic nematodes from clades I, III, IV and V. In contrast, the picture for parasitic platyhelminths is less clear: the limited peptide sequence data available provide concrete evidence only for FMRFamide-like peptide (FLP) and neuropeptide F (NPF) signaling systems, each comprising only one or two peptides. With the completion of the Schmidtea mediterranea and Schistosoma mansoni genome projects, and with expressed sequence tag datasets becoming available for other flatworm parasites, the time is ripe for a detailed reanalysis of neuropeptide signaling in flatworms. Although the neuropeptides themselves offer limited obvious value as targets for chemotherapeutic control strategies, they highlight the signaling systems present in these helminths and provide tools for discovering more amenable targets, such as neuropeptide receptors or neuropeptide-processing enzymes. They also offer opportunities to evaluate the potential of their associated signaling pathways as targets through RNA interference (RNAi)-based target validation strategies. Currently, within both helminth phyla, the flp signaling systems appear to merit further investigation, as they are intrinsically linked with motor function, a proven target for successful anti-parasitics; some nematode NLPs clearly also play a role in motor function and could have similar appeal. At present, it is unclear whether flatworm NPF and nematode INS peptides operate in pathways that have utility for parasite control.
Clearly, RNAi-based validation could be a starting point for scoring potential target pathways within neuropeptide signaling for parasiticide discovery programs. In addition, recent successes in applying in planta RNAi control strategies to plant-parasitic nematodes reveal a strategy whereby neuropeptide-encoding genes could become targets for parasite control. The possibility of developing these approaches for the control of animal and human parasites is intriguing, but will require significant advances in the delivery of RNAi triggers.
Abstract:
Biodiversity continues to decline in the face of increasing anthropogenic pressures such as habitat destruction, exploitation, pollution and the introduction of alien species. Existing global databases of species' threat status or population time series are dominated by charismatic species. The collation of datasets with broad taxonomic and biogeographic extents, which support the computation of a range of biodiversity indicators, is necessary to better understand historical declines and to project, and avert, future declines. We describe and assess a new database of more than 1.6 million samples from 78 countries representing over 28,000 species, collated from existing spatial comparisons of local-scale biodiversity exposed to different intensities and types of anthropogenic pressures, from terrestrial sites around the world. The database contains measurements taken in 208 (of 814) ecoregions, 13 (of 14) biomes, 25 (of 35) biodiversity hotspots and 16 (of 17) megadiverse countries. It contains more than 1% of all described species, and more than 1% of the described species within many taxonomic groups, including flowering plants, gymnosperms, birds, mammals, reptiles, amphibians, beetles, lepidopterans and hymenopterans. The dataset, which is still growing, is therefore already considerably larger and more representative than those used by previous quantitative models of biodiversity trends and responses. The database is being assembled as part of the PREDICTS project (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems; http://www.predicts.org.uk). We make site-level summary data available alongside this article. The full database will be publicly available in 2015.
Abstract:
A role for the minichromosome maintenance (MCM) proteins in cancer initiation and progression is slowly emerging. Functioning as a complex that ensures a single round of chromosomal replication per cell cycle, the six family members have been implicated in several neoplastic disease states, including breast cancer. Our study aimed to investigate the prognostic significance of these proteins in breast cancer. We studied MCM expression in various datasets and its association with clinicopathological parameters. When considered alone, high-level MCM4 overexpression was only weakly associated with shorter survival in the combined breast cancer patient cohort (n = 1441, hazard ratio = 1.31; 95% confidence interval = 1.11-1.55; p = 0.001). In contrast, when we studied all six components of the MCM complex, we found that overexpression of all MCMs was strongly associated with shorter survival in the same cohort (n = 1441, hazard ratio = 1.75; 95% confidence interval = 1.31-2.34; p < 0.001), suggesting that these MCM proteins may cooperate to promote breast cancer progression. Indeed, their expression levels were significantly correlated with each other in these cohorts. In addition, we found that an increasing number of overexpressed MCMs was associated with negative ER status as well as with treatment response. Together, our findings, which were reproducible across seven independent breast cancer cohorts comprising 1441 patients, suggest that MCM profiling could potentially be used to predict treatment response and prognosis in breast cancer patients.
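The reported hazard ratios and confidence intervals can be cross-checked against the quoted p-values. The sketch below assumes Wald-type intervals that are symmetric on the log scale, which the abstract does not state. Under that assumption it gives roughly p ≈ 0.0015 for MCM4 alone and p < 0.001 for the six-MCM signature. Small differences from the published figures are expected, because the intervals are rounded and the original test (for example a log-rank test) may differ. The function name is hypothetical.

```python
import math


def wald_check(hr, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate SE(log HR), the Wald z statistic and a two-sided p-value.

    Assumes the 95% CI is symmetric on the log scale: log(hr) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return se, z, p
```

For example, `wald_check(1.31, 1.11, 1.55)` and `wald_check(1.75, 1.31, 2.34)` reproduce the two comparisons in the abstract.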
Abstract:
Retrospective clinical datasets are often characterized by a relatively small sample size and a large amount of missing data. In this case, a common way of handling missingness is to discard patients with missing covariates from the analysis, which further reduces the sample size. Alternatively, if the mechanism that generated the missing data allows it, incomplete data can be imputed on the basis of the observed data, avoiding a reduction in sample size and allowing methods that require complete data to be applied afterwards. Moreover, data imputation methodologies may depend on the particular purpose and may achieve better results by considering specific characteristics of the domain. We study the problem of missing data treatment in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding results similar to the original ones. Instead, our methodology models the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, allowing the survival tree to be applied to the completed dataset. The Bayesian network is learned directly from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing data imputation methods, and the resulting imputation improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
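The structural EM procedure, its exact anytime maximization step and the survival tree are not reproduced here. As a much simpler, hedged illustration of the underlying idea (learn a Bayesian network from incomplete records with EM, then use it to impute), the sketch runs parameter-only EM on a fixed two-node network A → B with hypothetical binary clinical variables, where either value may be missing. Missing entries are then filled with the most probable completion. The network, variable coding and function names are all assumptions.

```python
from itertools import product


def joint(a, b, pi, theta):
    """P(A=a, B=b) for the two-node network A -> B (binary variables)."""
    pa = pi if a == 1 else 1.0 - pi
    pb = theta[a] if b == 1 else 1.0 - theta[a]
    return pa * pb


def completions(record):
    """All (a, b) assignments consistent with a record; None marks a missing value."""
    a_obs, b_obs = record
    a_vals = [a_obs] if a_obs is not None else [0, 1]
    b_vals = [b_obs] if b_obs is not None else [0, 1]
    return list(product(a_vals, b_vals))


def em_fit(records, n_iter=100):
    """Parameter EM: returns pi = P(A=1) and theta[a] = P(B=1 | A=a)."""
    pi, theta = 0.5, {0: 0.5, 1: 0.5}
    for _ in range(n_iter):
        # E-step: spread each record over its consistent completions.
        counts = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
        for record in records:
            cells = completions(record)
            weights = [joint(a, b, pi, theta) for a, b in cells]
            total = sum(weights)
            if total == 0.0:
                continue  # record impossible under the current parameters
            for cell, w in zip(cells, weights):
                counts[cell] += w / total
        # M-step: maximum-likelihood estimates from the expected counts.
        n_a1 = counts[(1, 0)] + counts[(1, 1)]
        n_a0 = counts[(0, 0)] + counts[(0, 1)]
        pi = n_a1 / (n_a0 + n_a1)
        theta = {0: counts[(0, 1)] / n_a0, 1: counts[(1, 1)] / n_a1}
    return pi, theta


def impute(record, pi, theta):
    """Fill missing entries with the most probable completion under the model."""
    return max(completions(record), key=lambda cell: joint(cell[0], cell[1], pi, theta))
```

Unlike complete-case analysis, every record contributes to the estimates here, including those with a missing value. This is the sample-size argument made in the abstract.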
Abstract:
This paper presents a machine learning approach to sarcasm detection on Twitter in two languages: English and Czech. Although there has been some research on sarcasm detection in languages other than English (e.g., Dutch, Italian and Brazilian Portuguese), our work is the first attempt at sarcasm detection in Czech. We created a large Czech Twitter corpus of 7,000 manually labeled tweets and make it available to the community. We evaluate two classifiers with various combinations of features on both the Czech and English datasets. Furthermore, we tackle the issues posed by rich Czech morphology by examining different preprocessing techniques. Experiments show that our language-independent approach significantly outperforms adapted state-of-the-art methods in English (F-measure 0.947) and also represents a strong baseline for further research in Czech (F-measure 0.582).
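For reference, the F-measure quoted above is the harmonic mean of precision and recall. The abstract does not say whether the reported values are computed for the sarcastic class only or macro-averaged over classes, so the hedged sketch below provides both. The function names and the toy labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, lab)[2] for lab in labels) / len(labels)
```

On imbalanced corpora (sarcastic tweets are usually a minority) the two averages can differ substantially, so the averaging scheme matters when comparing scores across languages.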
Abstract:
Ellerman Bombs (EBs) are thought to arise as a result of photospheric magnetic reconnection. We use data from the Swedish 1-m Solar Telescope (SST) to study EB events on the solar disk and at the limb. Both datasets show that EBs are connected to the foot-points of forming chromospheric jets. The limb observations show that a bright structure in the Hα blue wing connects to the EB, initially fuelling it and leading to the ejection of material upwards. The material moves along a loop structure where a newly formed jet is subsequently observed in the red wing of Hα. In the disk dataset, an EB initiates a jet which propagates away from the apparent reconnection site within the EB flame. The EB then splits into two, with associated brightenings in the inter-granular lanes (IGLs). Micro-jets are then observed, extending to 500 km with a lifetime of a few minutes. Observed velocities of the micro-jets are approximately 5-10 km s⁻¹, while their chromospheric counterparts range from 50 to 80 km s⁻¹. MURaM simulations of quiet-Sun reconnection show that micro-jets with properties similar to those observed follow the line of reconnection in the photosphere, with associated Hα brightening at the location of increased temperature.
Abstract:
These guidelines provide a practical and evidence-based resource for the management of patients with Barrett's oesophagus and related early neoplasia. The Appraisal of Guidelines for Research and Evaluation (AGREE II) instrument was followed to provide a methodological strategy for guideline development. A systematic review of the literature was performed for English-language articles published up until December 2012 in order to address controversial issues in Barrett's oesophagus, including definition, screening and diagnosis, surveillance, pathological grading for dysplasia, management of dysplasia, and early cancer, including training requirements. The rigour and quality of the studies were evaluated using the SIGN checklist system. Recommendations on each topic were scored by each author using a five-tier system (from A+, strong agreement, to D+, strong disagreement). Statements that failed to reach substantial agreement among authors, defined as >80% agreement (A or A+), were revisited and modified until substantial agreement (>80%) was reached. In formulating these guidelines, we took into consideration benefits and risks for the population and the national health system, as well as patient perspectives. For the first time, we suggest stratifying patients according to their estimated cancer risk based on clinical and histopathological criteria. To improve communication between clinicians, we recommend the use of minimum datasets for reporting endoscopic and pathological findings. We advocate endoscopic therapy for high-grade dysplasia and early cancer, which should be performed in high-volume centres. We hope that these guidelines will standardise and improve management for patients with Barrett's oesophagus and related neoplasia.
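The consensus rule described above (a statement passes only when strictly more than 80% of authors score it A or A+, otherwise it is revised and re-scored) can be sketched as follows. This is a hedged illustration only. The abstract names just the A+, A and D+ tiers, so any other score is simply treated as "not agreeing". The manual revision of statements between rounds is not modelled, and the function names are hypothetical.

```python
def substantial_agreement(scores, threshold=0.80):
    """True if strictly more than `threshold` of the scores are "A" or "A+"."""
    agree = sum(1 for s in scores if s in ("A+", "A"))
    return agree / len(scores) > threshold


def first_consensus_round(rounds, threshold=0.80):
    """1-based index of the first scoring round that reaches agreement, else None."""
    for i, scores in enumerate(rounds, start=1):
        if substantial_agreement(scores, threshold):
            return i
    return None
```

Note that the rule is a strict inequality, so exactly 80% agreement (for example 8 of 10 authors) does not pass.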
Abstract:
Clade V nematodes comprise several parasitic species, including the cyathostomins, the primary helminth pathogens of horses. Next-generation transcriptome datasets are available for eight parasitic clade V nematodes, although no equine parasites are included in this group. Here, we report next-generation transcriptome sequencing analysis for a common cyathostomin species, Cylicostephanus goldi. A cDNA library was generated from RNA extracted from 17 male and female adult C. goldi parasites. Sequencing on a 454 GS FLX pyrosequencer generated a total of 475,215 reads, which were assembled into 26,910 contigs. Using the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases, 27% of the transcriptome was annotated. Further in-depth analysis compared the C. goldi dataset with the next-generation transcriptomes and genomes of other clade V nematodes; the Oesophagostomum dentatum transcriptome and the Haemonchus contortus genome showed the highest levels of sequence identity with the cyathostomin dataset (45%). The C. goldi transcriptome was mined for genes associated with anthelmintic mode of action and/or resistance. Sequences encoding proteins previously associated with the three major anthelmintic classes used in horses were identified, with the exception of the P-glycoprotein group. Targeted resequencing of the glutamate-gated chloride channel α4 subunit (glc-3), one of the primary targets of the macrocyclic lactone anthelmintics, was performed for several cyathostomin species. We believe this study reports the first transcriptome dataset for an equine helminth parasite, providing the opportunity for in-depth analysis of these important parasites at the molecular level. Sequences encoding enzymes involved in key processes and genes associated with levamisole/pyrantel and macrocyclic lactone resistance, in particular the glutamate-gated chloride channels, were identified.
These novel data will inform future studies of cyathostomin biology and anthelmintic resistance.
Abstract:
We present the latest analysis and results from SEPPCoN (Survey of Ensemble Physical Properties of Cometary Nuclei). This ongoing survey involves studying 100 Jupiter-family comets (JFCs), about 25% of the known population, at both mid-infrared and visible wavelengths to constrain the distributions of sizes, shapes, spins and albedos of this population. Having earlier reported results from measuring the thermal emission of our sample nuclei [1,2,3,4], we report here progress on the visible-wavelength observations that we have obtained at many ground-based facilities in Chile, Spain and the United States. To date we have attempted observations of 91% of our sample of 100 JFCs, and at least 64 of those were successfully detected. In most cases the comets were at heliocentric distances between 3.0 and 6.5 AU, to decrease the odds of a comet having a coma. Of the 64 detected comets, 48 were apparently bare, showing no extended emission. Our datasets are further augmented by archival data and photometry from the NEAT program [5]. An important goal of SEPPCoN is to accumulate a large, comprehensive set of high-quality physical data on cometary nuclei in order to make accurate statistical comparisons with other minor-body populations such as Trojans, Centaurs and Kuiper-belt objects. Information on the size, shape, spin-rate, albedo and color distributions is critical for understanding the origins of these bodies and the evolutionary processes affecting them.
Abstract:
We present new results from SEPPCoN, a Survey of Ensemble Physical Properties of Cometary Nuclei. This project is currently surveying 100 Jupiter-family comets (JFCs) to measure the mid-infrared thermal emission and visible reflected sunlight of the nuclei. The scientific goal is to determine the distributions of radius, geometric albedo, thermal inertia, axial ratio and color among the JFC nuclei. We have previously presented results from the completed mid-IR observations of our sample [1]; here we present preliminary results from ongoing broadband visible-wavelength observations of nuclei obtained at a variety of ground-based facilities (Mauna Kea, Cerro Pachon, La Silla, La Palma, Apache Point, Table Mtn. and Palomar Mtn.), including contributions from the Near-Earth Asteroid Tracking (NEAT) project archive. The nuclei were observed at large heliocentric distances (usually over 4 AU), so many comets show little or no contamination from dust coma. While several nuclei have been observed only as snapshots, we have multi-epoch photometry for many of our targets. With our datasets we are building a large photometric database, which is essential for deriving the albedos and shapes of a large number of nuclei and for understanding biases in the survey. Support for this work was provided by NSF and the NASA Planetary Astronomy program. Reference: [1] Fernandez, Y.R., et al. 2007, BAAS 39, 827.
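Why the visible photometry matters for albedo: mid-infrared thermal emission constrains a nucleus's size largely independently of its reflectivity, whereas reflected visible light scales with albedo times cross-section. Combining the two therefore yields the albedo. A hedged sketch using the standard relation D = 1329 km / √p × 10^(−H/5), with the 1329 km constant appropriate for the V band, is shown below. The survey may instead work in another band (for example R), with a different constant and phase-angle corrections. The function names and example values are illustrative.

```python
import math


def diameter_km(albedo, abs_mag):
    """Effective diameter (km) from geometric albedo and absolute magnitude H."""
    return 1329.0 / math.sqrt(albedo) * 10.0 ** (-abs_mag / 5.0)


def geometric_albedo(diameter, abs_mag):
    """Geometric albedo from a thermally derived diameter (km) and H."""
    return (1329.0 / diameter * 10.0 ** (-abs_mag / 5.0)) ** 2
```

For a fixed absolute magnitude, a smaller thermally derived size implies a brighter (higher-albedo) surface. Multi-epoch photometry matters because a single snapshot of a rotating, elongated nucleus can bias H.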
Abstract:
In the British Isles, wind power is currently dominated by onshore wind farms, but both the United Kingdom and the Republic of Ireland have high renewable energy targets that are expected to be met mostly by wind power. As demand for wind power grows, to ensure security of energy supply, as a potentially cheaper alternative to fossil fuels and to meet greenhouse gas emission reduction targets, offshore wind power will grow rapidly as the availability of suitable onshore sites decreases. However, wind is variable and stochastic by nature, and thus difficult to schedule. To plan for these uncertainties, market operators use wind forecasting tools, reserve plant and ancillary service agreements. Onshore wind power forecasting techniques have improved dramatically and continue to advance, but offshore wind power forecasting is more difficult owing to limited datasets and knowledge. As the amount of offshore wind power in the British Isles increases, robust forecasting and planning techniques become even more critical. This paper presents a methodology to investigate the impacts of better offshore wind forecasting on the operation and management of the single wholesale electricity market in the Republic of Ireland and Northern Ireland using PLEXOS for Power Systems. © 2013 IEEE.