820 results for Lanczos, Linear systems, Generalized cross validation
Abstract:
This dissertation established a state-of-the-art programming tool for designing and training artificial neural networks (ANNs) and showed its applicability to brain research. The developed tool, called NeuralStudio, allows users without programming skills to conduct studies based on ANNs in a powerful and very user-friendly interface. A series of unique features has been implemented in NeuralStudio, such as ROC analysis, cross-validation, network averaging, topology optimization, and optimization of the activation function’s slopes. It also includes a Support Vector Machines module for comparison purposes. Once the tool was fully developed, it was applied to two studies in brain research. In the first study, the goal was to create and train an ANN to detect epileptic seizures from subdural EEG. This analysis involved extracting features from the spectral power in the gamma frequencies. In the second application, a unique method was devised to link EEG recordings to epileptic and non-epileptic subjects. The contribution of this method consisted of developing a descriptor matrix that can be used to represent any EEG file regardless of its duration and number of electrodes. The first study showed that the inter-electrode mean of the spectral power in the gamma frequencies and its duration above a specific threshold perform better than the other frequencies in seizure detection, exhibiting an accuracy of 95.90%, a sensitivity of 92.59%, and a specificity of 96.84%. The second study showed that Hjorth’s parameter ‘activity’ is sufficient to accurately relate EEG to epileptic and non-epileptic subjects. After testing, the accuracy, sensitivity and specificity of the classifier were all above 0.9667. Statistical tests confirmed the superiority of activity with over 99.99% certainty. It was demonstrated that (1) the spectral power in the gamma frequencies is highly effective in locating seizures from EEG and (2) activity can be used to link EEG recordings to epileptic and non-epileptic subjects. Both studies involved a high computational load and could be addressed thanks to NeuralStudio. From a medical perspective, both methods proved the merits of NeuralStudio in brain research applications. For its outstanding features, NeuralStudio has recently been awarded a patent (US patent No. 7502763).
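The two EEG features named above are simple to compute. The sketch below is an illustrative reconstruction, not the NeuralStudio code: it derives the inter-electrode mean of gamma-band spectral power and Hjorth's activity (the per-channel signal variance) from a multichannel segment; the sampling rate, gamma band limits and the synthetic data are assumptions.

```python
# Illustrative reconstruction (not the NeuralStudio implementation). Sampling
# rate, gamma band limits and the synthetic EEG segment are assumptions.
import numpy as np
from scipy.signal import welch

def gamma_power_mean(eeg, fs=256.0, band=(30.0, 100.0)):
    """Inter-electrode mean of the spectral power in the gamma band.
    eeg: array of shape (n_electrodes, n_samples)."""
    freqs, psd = welch(eeg, fs=fs, axis=-1)            # PSD per electrode
    mask = (freqs >= band[0]) & (freqs <= band[1])     # gamma frequencies
    band_power = psd[:, mask].sum(axis=-1) * (freqs[1] - freqs[0])
    return band_power.mean()

def hjorth_activity(eeg):
    """Hjorth's 'activity' is simply the variance of each channel."""
    return np.var(eeg, axis=-1)

rng = np.random.default_rng(0)
segment = rng.standard_normal((16, 2560))              # 16 electrodes, 10 s at 256 Hz
print(gamma_power_mean(segment), hjorth_activity(segment)[:3])
```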
Abstract:
In this study we identified key genes that are critical in the development of astrocytic tumors. Meta-analysis of microarray studies comparing normal tissue to astrocytoma revealed a set of 646 genes differentially expressed in the majority of astrocytomas. Reverse engineering of these 646 genes using Bayesian network analysis produced a gene network for each grade of astrocytoma (Grade I–IV), and ‘key genes’ within each grade were identified. The genes found to be most influential in the development of the highest grade of astrocytoma, glioblastoma multiforme, were: COL4A1, EGFR, BTF3, MPP2, RAB31, CDK4, CD99, ANXA2, TOP2A, and SERBP1. All of these genes were up-regulated, except MPP2 (down-regulated). These 10 genes were able to predict tumor status with 96–100% confidence when using logistic regression, cross-validation, and support vector machine analysis. The Markov blanket genes interact with NF-κB, ERK, MAPK, VEGF, growth hormone and collagen to produce a network whose top biological functions are cancer, neurological disease, and cellular movement. Three of the 10 genes in particular - EGFR, COL4A1, and CDK4 - seemed to be potential ‘hubs of activity’. Modified expression of these 10 Markov blanket genes increases the lifetime risk of developing glioblastoma compared to the normal population. The glioblastoma risk estimates increased dramatically with joint effects of 4 or more Markov blanket genes. Joint interaction effects of 4, 5, 6, 7, 8, 9 or 10 Markov blanket genes produced increases of 9, 13, 20.9, 26.7, 52.8, 53.2, 78.1 or 85.9%, respectively, in the lifetime risk of developing glioblastoma compared to the normal population. In summary, it appears that modified expression of several ‘key genes’ may be required for the development of glioblastoma. Further studies are needed to validate these ‘key genes’ as useful tools for early detection and novel therapeutic options for these tumors.
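As a hedged illustration of the classification step described above, the sketch below runs cross-validated logistic regression and an SVM on a 10-gene expression panel. Only the gene names come from the abstract; the expression matrix, labels and classifier settings are synthetic assumptions.

```python
# Illustrative only: cross-validated prediction of tumor status from a 10-gene
# panel. Gene names come from the abstract; the data and settings are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

genes = ["COL4A1", "EGFR", "BTF3", "MPP2", "RAB31",
         "CDK4", "CD99", "ANXA2", "TOP2A", "SERBP1"]

rng = np.random.default_rng(42)
X = rng.normal(size=(80, len(genes)))      # 80 samples x 10 genes (synthetic expression)
y = rng.integers(0, 2, size=80)            # 0 = normal, 1 = tumor (synthetic labels)

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("SVM (RBF kernel)", SVC(kernel="rbf"))]:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean cross-validated accuracy = {acc.mean():.2f}")
```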
Abstract:
A number of studies have shown that Fourier transform infrared spectroscopy (FTIR) can be applied to quantitatively assess lacustrine sediment constituents. In this study, we developed calibration models based on FTIR for the quantitative determination of biogenic silica (BSi; n = 420; gradient: 0.9-56.5%), total organic carbon (TOC; n = 309; gradient: 0-2.9%), and total inorganic carbon (TIC; n = 152; gradient: 0-0.4%) in a 318 m-long sediment record with a basal age of 3.6 million years from Lake El'gygytgyn, Far East Russian Arctic. The developed partial least squares (PLS) regression models yield high cross-validated (CV) coefficients of determination (R²CV = 0.86-0.91) and low root mean square errors of cross-validation (RMSECV; 3.1-7.0% of the gradient for the different properties). By applying these models to 6771 samples from the entire sediment record, we obtained detailed insight into bioproductivity variations in Lake El'gygytgyn throughout the middle to late Pliocene and Quaternary. High accumulation rates of BSi indicate a productivity maximum during the middle Pliocene (3.6-3.3 Ma), followed by gradually decreasing rates during the late Pliocene and Quaternary. The average BSi accumulation during the middle Pliocene was ~3 times higher than maximum accumulation rates during the past 1.5 million years. The indicated progressive deterioration of environmental and climatic conditions in the Siberian Arctic starting at ca. 3.3 Ma is consistent with the first occurrence of glacial periods and the eventual full establishment of glacial-interglacial cycles during the Quaternary.
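For readers unfamiliar with the figures of merit, the following minimal sketch builds a PLS calibration of this kind and reports a cross-validated R² and RMSECV; the spectra, reference values, fold count and number of latent variables are illustrative assumptions, not the study's data.

```python
# Minimal sketch of a cross-validated PLS calibration; spectra, reference values,
# fold count and number of latent variables are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 500))            # 300 spectra x 500 wavenumbers (synthetic)
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=300)   # e.g. a BSi-like property

pls = PLSRegression(n_components=8)
y_cv = cross_val_predict(pls, X, y, cv=10).ravel()   # 10-fold cross-validated predictions

print(f"R2_CV  = {r2_score(y, y_cv):.2f}")
print(f"RMSECV = {np.sqrt(mean_squared_error(y, y_cv)):.2f}")
```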
Abstract:
River runoff is an essential climate variable as it is directly linked to the terrestrial water balance and controls a wide range of climatological and ecological processes. Despite its scientific and societal importance, there are to date no pan-European observation-based runoff estimates available. Here we employ a recently developed methodology to estimate monthly runoff rates on a regular spatial grid in Europe. For this we first assemble an unprecedented collection of river flow observations, combining information from three distinct databases. Observed monthly runoff rates are first tested for homogeneity and then related to gridded atmospheric variables (E-OBS version 12) using machine learning. The resulting statistical model is then used to estimate monthly runoff rates (December 1950 - December 2015) on a 0.5° x 0.5° grid. The performance of the newly derived runoff estimates is assessed in terms of cross-validation. The paper closes with example applications, illustrating the potential of the new runoff estimates for climatological assessments and drought monitoring.
Abstract:
River runoff is an essential climate variable as it is directly linked to the terrestrial water balance and controls a wide range of climatological and ecological processes. Despite its scientific and societal importance, there are to date no pan-European observation-based runoff estimates available. Here we employ a recently developed methodology to estimate monthly runoff rates on a regular spatial grid in Europe. For this we first assemble an unprecedented collection of river flow observations, combining information from three distinct databases. Observed monthly runoff rates are first tested for homogeneity and then related to gridded atmospheric variables (E-OBS version 11) using machine learning. The resulting statistical model is then used to estimate monthly runoff rates (December 1950-December 2014) on a 0.5° × 0.5° grid. The performance of the newly derived runoff estimates is assessed in terms of cross-validation. The paper closes with example applications, illustrating the potential of the new runoff estimates for climatological assessments and drought monitoring.
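The abstracts above do not specify the learning algorithm, so the sketch below is only one plausible realization of the approach: a random forest relating gridded atmospheric predictors (here, synthetic monthly precipitation and temperature) to observed runoff, scored by cross-validation.

```python
# One plausible realization (the abstract only says "machine learning"): a random
# forest relating gridded atmospheric predictors to runoff, scored by
# cross-validation. All data below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 2000                                                   # grid cells x months, flattened
precip = rng.gamma(shape=2.0, scale=30.0, size=n)          # monthly precipitation [mm]
temp = rng.normal(loc=8.0, scale=6.0, size=n)              # monthly temperature [deg C]
X = np.column_stack([precip, temp])
runoff = 0.6 * precip - 2.0 * temp + rng.normal(scale=10.0, size=n)   # toy target [mm]

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, runoff, cv=5, scoring="r2")
print("cross-validated R2:", scores.mean().round(2))
```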
Abstract:
Background. In pre-school and primary education, pupils differ in many abilities and competences (‘giftedness’). Yet mainstream educational practice seems rather homogeneous in providing age-based or grade-class subject matter approaches. Aims. To clarify whether pupils initially scoring at a high ability level develop and attain differently at school with respect to language and arithmetic compared with pupils displaying other initial ability levels. To investigate whether specific individual, family or educational variables co-vary with the attainment of these different types of pupils in school. Samples. Data from the large-scale PRIMA cohort study including a total of 8258 grade 2 and 4 pupils from 438 primary schools in The Netherlands. Methods. Secondary analyses were carried out to construct gain scores for both language and arithmetic proficiency and a number of behavioural, attitudinal, family and educational characteristics. The pupils were grouped into different ability categories (highly able; able; above average; average and below). Further analyses used Pearson correlations and analyses of variance both between and within ability categories. Cross-validation was done by introducing a cohort of younger pupils in pre-school and grouping both cohorts into decile groups based on initial ability in language and arithmetic. Results. Highly able pupils generally decreased in attainment in both language and arithmetic, whereas pupils in average and below average groups improved their language and arithmetic scores. Only with highly able pupils were some educational characteristics correlated with the pupils’ development in achievement, behaviour and attitudes. Conclusions. Pre-school and primary education should better match pupils’ differences in abilities and competences from their start in pre-school to improve their functioning, learning processes and outcomes. Recommendations for educational improvement strategies are presented in closing.
Abstract:
In order to predict the compressive strength of geopolymers prepared from alumina-silica natural products, based on the effects of the Al2O3/SiO2, Na2O/Al2O3, Na2O/H2O, and Na/[Na+K] ratios, more than 50 data points were gathered from the literature. The data were used to train and test a multilayer artificial neural network (ANN). A multilayer feedforward network was therefore designed with the chemical compositions of the alumina silicate and alkali activators as inputs and compressive strength as output. Feedforward networks with various numbers of hidden layers and neurons were tested to select the optimum network architecture. The developed three-layer neural network model, based on the feedforward backpropagation architecture, demonstrated its ability to learn the given input/output patterns. Cross-validation data were used to show the validity and high prediction accuracy of the network. This makes it possible to identify the optimum chemical composition, so that the best paste can be made from activated alumina-silica natural products using alkaline hydroxide and alkaline silicate. The results are in agreement with the mechanism of geopolymerization.
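A minimal sketch of such a network is given below, assuming the four composition ratios as inputs and compressive strength as the output; the training data are synthetic stand-ins and the layer sizes are arbitrary, so it illustrates the approach rather than reproducing the paper's model.

```python
# Sketch under stated assumptions: a small feedforward network with the four
# composition ratios as inputs and compressive strength as output. The ~50
# literature data points are not reproduced; the data below are synthetic.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
# columns: Al2O3/SiO2, Na2O/Al2O3, Na2O/H2O, Na/[Na+K]   (illustrative ranges)
X = rng.uniform(low=[0.2, 0.5, 0.05, 0.5], high=[0.6, 1.5, 0.20, 1.0], size=(50, 4))
strength = 40 * X[:, 1] - 30 * X[:, 0] + rng.normal(scale=2.0, size=50)   # MPa, toy rule

net = MLPRegressor(hidden_layer_sizes=(8,), activation="tanh",
                   solver="lbfgs", max_iter=5000, random_state=0)
print("cross-validated R2:", cross_val_score(net, X, strength, cv=5).mean().round(2))
```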
Abstract:
This thesis develops bootstrap methods for factor models, which have been widely used to generate forecasts since the pioneering article by Stock and Watson (2002) on diffusion indices. These models accommodate a large number of macroeconomic and financial variables as predictors, a useful feature for incorporating the diverse information available to economic agents. The thesis therefore proposes econometric tools that improve inference in factor models using latent factors extracted from a large panel of observed predictors. It is divided into three complementary chapters, the first two written in collaboration with Sílvia Gonçalves and Benoit Perron. In the first chapter, we study how bootstrap methods can be used for inference in models that forecast h periods into the future. To that end, it examines bootstrap inference in a factor-augmented regression framework in which the errors may be autocorrelated. It generalizes the results of Gonçalves and Perron (2014) and proposes and justifies two residual-based approaches: the block wild bootstrap and the dependent wild bootstrap. Our simulations show improved coverage rates for confidence intervals on the estimated coefficients when these approaches are used, compared with asymptotic theory and with the wild bootstrap, in the presence of serial correlation in the regression errors. The second chapter proposes bootstrap methods for constructing prediction intervals that relax the assumption of normally distributed innovations. We propose bootstrap prediction intervals for an observation h periods into the future and for its conditional mean. We assume that these forecasts are made using a set of factors extracted from a large panel of variables. Because we treat these factors as latent, our forecasts depend on both the estimated factors and the estimated regression coefficients. Under regularity conditions, Bai and Ng (2006) proposed constructing asymptotic intervals under the assumption of Gaussian innovations. The bootstrap allows us to relax this assumption and to construct prediction intervals that are valid under more general assumptions. Moreover, even under Gaussianity, the bootstrap yields more accurate intervals when the cross-sectional dimension is relatively small, because it accounts for the bias of the ordinary least squares estimator, as shown in a recent study by Gonçalves and Perron (2014). In the third chapter, we propose consistent selection procedures for factor-augmented regressions in finite samples. We first show that the usual cross-validation method is inconsistent, but that its generalization, leave-d-out cross-validation, selects the smallest set of estimated factors spanning the space generated by the true factors. The second criterion, whose validity we also establish, generalizes Shao's (1996) bootstrap approximation to factor-augmented regressions. Simulations show an improvement in the probability of parsimoniously selecting the estimated factors compared with the available selection methods. The empirical application revisits the relationship between macroeconomic and financial factors and excess returns on the US stock market. Among the factors estimated from a large panel of US macroeconomic and financial data, the factors strongly correlated with interest rate spreads and the Fama-French factors have good predictive power for excess returns.
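A rough sketch of the leave-d-out idea from the third chapter is given below. It is a simplification, not the thesis's procedure: factors are extracted once by principal components from a synthetic panel, and leave-d-out cross-validation is approximated by repeated random splits that hold out d observations when scoring the factor-augmented regression.

```python
# Rough sketch, not the thesis's procedure: choosing the number of estimated
# factors in a factor-augmented regression by "leave-d-out" cross-validation,
# approximated by repeated random splits that each hold out d observations.
# Factors are extracted by principal components from a synthetic panel.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

rng = np.random.default_rng(0)
T, N = 200, 100                                   # time periods x observed predictors
F = rng.normal(size=(T, 2))                       # two true latent factors
panel = F @ rng.normal(size=(2, N)) + rng.normal(scale=0.5, size=(T, N))
y = F @ np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=T)

d = 40                                            # size of each held-out block
cv = ShuffleSplit(n_splits=50, test_size=d, random_state=0)
for k in range(1, 6):                             # candidate numbers of estimated factors
    F_hat = PCA(n_components=k).fit_transform(panel)
    mse = -cross_val_score(LinearRegression(), F_hat, y, cv=cv,
                           scoring="neg_mean_squared_error").mean()
    print(f"k = {k}: leave-{d}-out MSE = {mse:.3f}")
```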
Abstract:
Thesis (Master's)--University of Washington, 2016-08
Abstract:
Recently, there has been considerable interest in solving viscoelastic problems in 3D, particularly with the improvement in modern computing power. In many applications the emphasis has been on economical algorithms which can cope with the extra complexity that the third dimension brings. Storage and computer time are of the essence. The advantage of the finite volume formulation is that it does not require a large amount of memory. Iterative methods rather than direct methods can be used to solve the resulting linear systems efficiently.
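The point about memory can be made concrete with a small sketch: a sparse system of the kind a finite volume discretisation produces is solved with a Krylov iteration that only ever applies the matrix to vectors. The diagonally dominant tridiagonal operator below is a stand-in, not a viscoelastic discretisation.

```python
# Small sketch of the memory point: a sparse system solved by conjugate gradients,
# which needs only matrix-vector products and never forms a dense matrix. The
# tridiagonal operator is a stand-in, not a viscoelastic finite volume scheme.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 100_000
A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csr")  # diagonally dominant
b = np.ones(n)

x, info = cg(A, b, maxiter=1000)                 # iterative solve, no dense storage
print("converged" if info == 0 else f"info = {info}",
      "| residual norm:", np.linalg.norm(A @ x - b))
```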
Abstract:
Considering the social and economic importance of milk, the objective of this study was to evaluate the incidence of antimicrobial residues in this food and to quantify them. The samples were collected from dairy plants in southwestern Paraná state, covering all ten municipalities in the region of Pato Branco. The work focused on the development of appropriate models for the identification and quantification of the analytes tetracycline, sulfamethazine, sulfadimethoxine, chloramphenicol and ampicillin, all antimicrobials of health interest. For calibration and validation of the models, Fourier transform infrared spectroscopy was used in association with a chemometric method based on partial least squares (PLS) regression. To prepare the antimicrobial working solution, the five analytes of interest were used in increasing doses, namely tetracycline from 0 to 0.60 ppm, sulfamethazine from 0 to 0.12 ppm, sulfadimethoxine from 0 to 2.40 ppm, chloramphenicol from 0 to 1.20 ppm and ampicillin from 0 to 1.80 ppm, so that the work addressed multiresidue analysis. The performance of the constructed models was evaluated through the figures of merit: mean square errors of calibration and cross-validation, correlation coefficients and offset performance ratio. For the purposes of this work, the models generated for tetracycline, sulfadimethoxine and chloramphenicol were considered viable, showing the greatest predictive power and efficiency, and were then employed to evaluate the quality of raw milk from the region of Pato Branco. Among the samples analyzed by NIR, 70% were in conformity with sanitary legislation, and 5% of these had concentrations below the maximum residue limit permitted, which is also satisfactory. However, 30% of the sample set showed unsatisfactory results for contamination with antimicrobial residues, a non-conformity related to the presence of antimicrobials whose use is unauthorized or to concentrations above the permitted limits. This work shows that laboratory testing in the food area using infrared spectroscopy with multivariate calibration is fast, reduces costs and generates minimal laboratory waste. Thus, the proposed alternative method meets the quality and efficiency requirements of the industrial sector and society in general.
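For reference, the root mean square errors of calibration (RMSEC) and of cross-validation (RMSECV) commonly used as figures of merit are usually defined as below; the exact variants used in the study may differ.

```latex
% RMSEC over the n_c calibration samples, with fitted values \hat{y}_i;
% RMSECV over all n samples, with \hat{y}_{(i)} predicted while sample i is
% left out of the calibration set.
\mathrm{RMSEC}  = \sqrt{\frac{1}{n_c}\sum_{i=1}^{n_c}\left(\hat{y}_i - y_i\right)^2},
\qquad
\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_{(i)} - y_i\right)^2}
```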
Abstract:
Routine analyses for the quantification of organic acids and sugars are generally slow methods that involve the preparation and use of several reagents, require trained professionals and special equipment, and are expensive. In this context, investment has been increasing in research aimed at developing substitutes for the reference methods that are faster, cheaper and simpler, and infrared spectroscopy has stood out in this regard. The present study developed multivariate calibration models for the simultaneous quantitative determination of ascorbic, citric, malic and tartaric acids, the sugars sucrose, glucose and fructose, and soluble solids in fruit juices and nectars, as well as classification models based on principal component analysis. Near-infrared (NIR) spectroscopy was used in association with partial least squares (PLS) regression. Forty-two samples of juices and fruit nectars commercially available in local shops were used. For the construction of the models, reference analyses were performed using high-performance liquid chromatography (HPLC), and refractometry was used for the analysis of soluble solids. Subsequently, the spectra were acquired in triplicate over the spectral range 12500 to 4000 cm-1. The best models were applied to the quantification of the analytes under study in natural juices and in juice samples produced in the southwestern region of Paraná. The juices used in the application of the models also underwent physicochemical analysis. Validation of the chromatographic methodology showed satisfactory results, since the external calibration curves gave coefficients of determination (R2) above 0.98 and coefficients of variation (%CV) for intermediate precision and repeatability below 8.83%. Principal component analysis (PCA) made it possible to separate the juice samples into two major groups, grape and apple versus tangerine and orange, while the nectar groups separated into guava and grape versus pineapple and apple. Using different validation methods and pre-processing techniques, separately and in combination, multivariate calibration models were obtained with root mean square errors of prediction (RMSEP) and of cross-validation (RMSECV) below 1.33 and 1.53 g/100 mL, respectively, and R2 above 0.771, except for malic acid. The physicochemical analyses enabled characterization of the beverages, including the pH working range (2.83 to 5.79) and acidity within the regulatory parameters for each flavor. The regression models demonstrated that ascorbic, citric, malic and tartaric acids, as well as sucrose, glucose and fructose, can be determined successfully from a single spectrum, suggesting that the models are economically viable for quality control and product standardization in the fruit juice and nectar processing industry.
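As a small illustration of the PCA grouping step mentioned above, the sketch below projects synthetic "spectra" from two hypothetical sample groups onto the first principal components; the data and group labels are invented, so it only shows the mechanics of the score plot.

```python
# Illustrative mechanics only: a PCA score projection of the kind used to group
# the juice samples. The 'spectra' and group labels below are invented.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
group_a = rng.normal(loc=0.0, scale=1.0, size=(10, 200))   # 10 spectra x 200 points
group_b = rng.normal(loc=0.8, scale=1.0, size=(10, 200))   # second, shifted group
spectra = np.vstack([group_a, group_b])

scores = PCA(n_components=2).fit_transform(spectra)        # score plot coordinates
print("mean PC1, group A:", scores[:10, 0].mean().round(2))
print("mean PC1, group B:", scores[10:, 0].mean().round(2))
```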
Abstract:
Solving linear systems is an important problem for scientific computing. Exploiting parallelism is essential for solving complex systems, and this traditionally involves writing parallel algorithms on top of a library such as MPI. The SPIKE family of algorithms is one well-known example of a parallel solver for linear systems. The Hierarchically Tiled Array data type extends traditional data-parallel array operations with explicit tiling and allows programmers to directly manipulate tiles. The tiles of the HTA data type map naturally to the block nature of many numeric computations, including the SPIKE family of algorithms. The higher level of abstraction of the HTA enables the same program to be portable across different platforms. Current implementations target both shared-memory and distributed-memory models. In this thesis we present a proof-of-concept for portable linear solvers. We implement two algorithms from the SPIKE family using the HTA library. We show that our implementations of SPIKE exploit the abstractions provided by the HTA to produce a compact, clean code that can run on both shared-memory and distributed-memory models without modification. We discuss how we map the algorithms to HTA programs as well as examine their performance. We compare the performance of our HTA codes to comparable codes written in MPI as well as current state-of-the-art linear algebra routines.
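A conceptual sketch of the tiling idea is given below (plain Python, not the HTA library): a matrix stored as a grid of tiles and a tiled matrix-vector product in which every block operation is independent, which is the property that lets the same tiled algorithm target shared- or distributed-memory back ends.

```python
# Conceptual sketch in plain Python (not the HTA library): a matrix held as a
# grid of tiles, and a tiled matrix-vector product in which every block product
# is independent and could be assigned to a different core or node.
import numpy as np

def to_tiles(A, tile):
    """Split a square matrix into a 2-D grid of tile x tile blocks."""
    n = A.shape[0]
    return [[A[i:i + tile, j:j + tile] for j in range(0, n, tile)]
            for i in range(0, n, tile)]

def tiled_matvec(tiles, x, tile):
    """Compute y = A @ x tile by tile."""
    y = np.zeros(len(tiles) * tile)
    for i, row in enumerate(tiles):
        for j, block in enumerate(row):
            y[i * tile:(i + 1) * tile] += block @ x[j * tile:(j + 1) * tile]
    return y

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
x = rng.normal(size=8)
print(np.allclose(tiled_matvec(to_tiles(A, 4), x, 4), A @ x))   # True
```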
Abstract:
In this paper we use some classical ideas from linear systems theory to analyse convolutional codes. In particular, we exploit input-state-output representations of periodic linear systems to study periodically time-varying convolutional codes. In this preliminary work we focus on the column distance of these codes and derive explicit necessary and sufficient conditions for an (n, 2, 1) periodically time-varying convolutional code to have Maximum Distance Profile (MDP).
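The following exploratory sketch, with an arbitrary small example rather than a code from the paper, shows what an input-state-output representation over GF(2) looks like and how column distances can be checked by brute force for short input sequences.

```python
# Exploratory sketch with an arbitrary small example (not a code from the paper):
# a binary input-state-output realization
#   x_{t+1} = A x_t + B u_t,   v_t = C x_t + D u_t   (arithmetic mod 2)
# and a brute-force computation of the column distances d_0, d_1, ... over all
# short input sequences with u_0 != 0, starting from the zero state.
import itertools
import numpy as np

A = np.array([[0]])
B = np.array([[1]])            # one state bit, k = 1 input
C = np.array([[1], [1]])
D = np.array([[1], [0]])       # n = 2 code bits per time step

def column_distance(j):
    k, best = B.shape[1], None
    for bits in itertools.product([0, 1], repeat=k * (j + 1)):
        u = np.array(bits).reshape(j + 1, k)
        if not u[0].any():                     # require a nonzero first input
            continue
        x, weight = np.zeros(A.shape[0], dtype=int), 0
        for t in range(j + 1):
            v = (C @ x + D @ u[t]) % 2         # codeword block at time t
            weight += int(v.sum())
            x = (A @ x + B @ u[t]) % 2         # state update
        best = weight if best is None else min(best, weight)
    return best

print([column_distance(j) for j in range(4)])   # non-decreasing column distances
```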