874 results for cross validation
Abstract:
Penile squamous cell carcinoma (PSCC) accounts for 95% of penile neoplasms and almost always affects uncircumcised patients, being frequently associated with poor local hygiene and phimosis. In Brazil its incidence is 2.7%, although in some areas of the country it can reach 17% of the cases diagnosed per year. The tumor may occur in any part of the male sexual organ, and the staging system to be used is controversial; the Broders classification is the most widely used. Studies suggest a relationship between the development of penile carcinoma and infection by HPV (human papillomavirus). The method of evaluating the inguinal lymph nodes remains controversial, since it is difficult to distinguish reactive inflammatory lymphadenopathy from metastatic involvement. Physical examination is not a reliable predictor of lymph node involvement, because patients with palpable lymph nodes may not harbor metastases. Few publications address the molecular mechanisms involved in the genesis and progression of PSCC. Although several markers have been evaluated, their clinical application is currently limited, and most of the markers studied require invasive procedures to obtain tumor tissue. There is therefore a need to identify, through a minimally invasive technique, circulating tumor markers capable of distinguishing PSCC patients with and without metastatic involvement. In this type of neoplasm, the discovery of prognostic biomarkers is relevant because physical examination is not a reliable indicator of lymph node involvement or survival. The objectives were: 1) to review and discuss the epidemiology, etiology, surgical approaches, and controversies in the surgical treatment of penile cancer; 2) to investigate, using the ClinProt/MALDI-TOF platform, the presence of plasma markers capable of discriminating healthy individuals from patients affected by penile squamous cell carcinoma (PSCC); and 3) to assess the importance of these markers in the course of the disease. Plasma from 36 healthy individuals and 25 patients with invasive PSCC, who underwent surgical treatment between June 2010 and June 2011 at the urology services of the Instituto Nacional de Câncer and the Hospital Mário Kröeff (Rio de Janeiro), was collected and analyzed on the ClinProt/MALDI-TOF platform. Our results pointed to a pair of peptides (A = m/z 1897.22 ± 9 Da and B = m/z 2021.99 ± 9 Da) capable of differentiating PSCC patients from control individuals. These peptides were later identified as fragments of complement components C3 and C4-A/B. Cross-validation using the entire cohort showed 62.5% sensitivity and 86.76% specificity, with high sensitivity (100%) and specificity (97%) in the patients who died of the disease. In addition, patients with lymph node involvement showed a sensitivity of 80% and a specificity of 97%. It was shown that as the disease progresses the peptide pair becomes increasingly underexpressed compared with healthy individuals. These results may be useful as tools for assessing the prognosis of these patients.
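To make the reported figures concrete, the sketch below shows how cross-validated sensitivity and specificity can be computed for a two-peptide discriminator. It is a minimal Python illustration with synthetic intensities and a generic linear discriminant, not the ClinProt classification pipeline used in the study.

# Hypothetical illustration: leave-one-out cross-validated sensitivity and
# specificity of a simple classifier built on two peptide intensities.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# X: intensities of peptides A (m/z 1897.22) and B (m/z 2021.99); y: 1 = PSCC, 0 = control
X = np.vstack([rng.normal(10, 2, (36, 2)), rng.normal(7, 2, (25, 2))])
y = np.array([0] * 36 + [1] * 25)

tp = tn = fp = fn = 0
for train, test in LeaveOneOut().split(X):
    pred = LinearDiscriminantAnalysis().fit(X[train], y[train]).predict(X[test])[0]
    if pred == 1 and y[test][0] == 1: tp += 1
    elif pred == 0 and y[test][0] == 0: tn += 1
    elif pred == 1: fp += 1
    else: fn += 1

print("sensitivity =", tp / (tp + fn), "specificity =", tn / (tn + fp))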
Abstract:
The goal of this work is to contribute to the development of a technique, based on intelligent systems, that allows the exact or approximate location of the point of origin of a short-duration voltage variation (VTCD) caused by a fault in an electric power distribution system. This work uses a Phase-Locked Loop (PLL) to detect the faults. Once a fault is detected, the voltage signals recorded during the fault are decomposed into instantaneous symmetrical components by the proposed method. The energies of the symmetrical components are then computed and used to estimate the fault location. Two structures based on Artificial Neural Networks (ANNs) are evaluated in this research: the first is designed to classify the fault location as one of the possible points, and the second is designed to estimate the distance from the fault to the feeder. The technique proposed here applies to three-phase feeders with balanced loads. Its development assumes that voltage measurements are available at the initial node of the feeder and also at sparse points along the distribution network. The database used was obtained from simulations of a radial feeder model in the PSCAD/EMTDC program. Sensitivity tests using cross-validation are carried out on both neural network architectures to verify the reliability of the results obtained. In addition, tests with faults not initially contained in the database were performed in order to verify the generalization capability of the networks. The performance of both neural network architectures was satisfactory and demonstrates the feasibility of the proposed techniques for fault location in distribution networks.
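A minimal Python sketch of the cross-validation step described above, with a synthetic feature matrix standing in for the symmetrical-component energies extracted from the PSCAD/EMTDC simulations (the network size, the number of candidate locations, and the 5-fold scheme are assumptions for illustration):

# Hypothetical sketch: k-fold cross-validation of an ANN fault-location classifier
# trained on energies of the instantaneous symmetrical components.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.random((500, 9))          # assumed: energies of +, -, 0 sequence components at 3 measurement points
y = rng.integers(0, 12, 500)      # assumed: 12 candidate fault locations along the feeder

clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
print("mean accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))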
Abstract:
From 2011 onward, events of great repercussion for the city of Rio de Janeiro have taken place and are still to take place, such as the United Nations Rio+20 conference and sporting events of major worldwide importance (the FIFA World Cup and the Olympic and Paralympic Games). These events attract financial resources to the city, as well as generating jobs, infrastructure improvements, and real-estate appreciation of both land and buildings. When choosing a residential property in a given neighborhood, buyers evaluate not only the property itself but also the urban amenities available in the area. In this context, it was possible to define a qualitative linguistic characterization of the neighborhoods of the city of Rio de Janeiro by integrating three Computational Intelligence techniques for the evaluation of benefits: Fuzzy Logic, Support Vector Machines, and Genetic Algorithms. The database was built from information on the web and from government institutes, covering the cost of residential properties and the benefits and weaknesses of the city's neighborhoods. Fuzzy Logic was first implemented as an unsupervised clustering model, using Ellipsoidal Rules obtained through the Extension Principle with the Mahalanobis distance, to configure inferentially the linguistic groups (Good, Fair, and Poor) according to twelve urban characteristics. From this discrimination it became feasible to use a Support Vector Machine combined with Genetic Algorithms as a supervised method, in order to search for and select the smallest subset of the clustering variables that best classifies the neighborhoods (principle of parsimony). Analysis of the error rates allowed the selection of the best classification model with a reduced variable space, resulting in a subset containing information on: HDI, number of bus lines, educational institutions, average price per square metre, open-air spaces, entertainment venues, and crime. The modeling that combined the three Computational Intelligence techniques ranked the neighborhoods of Rio de Janeiro with acceptable error rates, supporting decision making in the purchase and sale of residential properties. With respect to public transport in the city in question, it was apparent that the road network is still the predominant one.
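The sketch below illustrates, under simplified assumptions, the SVM-plus-genetic-algorithm wrapper described above: a binary chromosome selects a subset of the twelve urban variables and is scored by cross-validated SVM accuracy with a small penalty on subset size (parsimony). The data, population size, and genetic operators are illustrative, not those of the thesis.

# Hypothetical sketch of the SVM + genetic-algorithm wrapper: a binary chromosome
# selects a subset of the twelve urban variables and is scored by the
# cross-validated accuracy of an SVM classifier (data here are synthetic).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.random((160, 12))                  # 12 urban characteristics per neighborhood (assumed)
y = rng.integers(0, 3, 160)                # classes: Good / Fair / Poor

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    score = cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=5).mean()
    return score - 0.01 * mask.sum()       # parsimony: penalize larger subsets

pop = rng.integers(0, 2, (20, 12))         # initial population of 20 chromosomes
for gen in range(15):
    fits = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(fits)[-10:]]  # keep the 10 fittest
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, 12)
        child = np.concatenate([a[:cut], b[cut:]])          # one-point crossover
        flip = rng.random(12) < 0.05                        # mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected variables:", np.flatnonzero(best))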
Abstract:
We describe the design steps and final implementation of a MIMO OFDM prototype platform developed to enhance the performance of wireless LAN standards such as HiperLAN/2 and 802.11, using multiple transmit and multiple receive antennas. We first describe the channel measurement campaign used to characterize the indoor operational propagation environment, and analyze the influence of the channel on code design through a ray-tracing channel simulator. We also comment on some antenna and RF issues which are of importance for the final realization of the testbed. Multiple coding, decoding, and channel estimation strategies are discussed and their respective performance-complexity trade-offs are evaluated over the realistic channel obtained from the propagation studies. Finally, we present the design methodology, including cross-validation of the Matlab, C++, and VHDL components, and the final demonstrator architecture. We highlight the increased measured performance of the MIMO testbed over the single-antenna system.
Abstract:
By screening computational methods, this thesis organically combines what are currently regarded as the most promising multivariate statistical methods, principal component regression (PCR) and partial least squares (PLS), together with the widely used CPA matrix method, with mature and commonly used spectrophotometric analysis, and applies them to the simultaneous determination of multicomponent mixtures. The basic mathematical principles of multiple linear regression (MLR), PLS, and PCR are described in detail, and computer programs for the CPA matrix method, PCR, and PLS were written in the fast FORTRAN language, so that the spectral data matrix and the calibration concentration matrix are processed entirely by computer, with the expected results. The simultaneous determinations of several multicomponent mixtures processed with these programs also gave satisfactory results. By comparing the determinations computed with the CPA matrix method, PCR, and PLS, the thesis summarizes their respective advantages and disadvantages, and makes a fairly systematic and useful exploration of the statistical design of the calibration sample series, of the use of cross-validation to determine the optimal number of factors of the calibration model, and of the use of the incompatibility factor (DF) to check the reliability of the results, putting forward some novel viewpoints and demonstrating broad prospects for application. Even for the quantitative analysis of drug samples with strong mutual interactions, satisfactory results were obtained. The thesis covers the following four topics: 1. study of the CPA matrix method for the simultaneous determination of multicomponent systems in spectrophotometric analysis; 2. application of partial least squares (PLS), a multivariate statistical method based on factor analysis, to quantitative spectrophotometric analysis; 3. simultaneous determination of tungsten, molybdenum, and vanadium by principal component regression (PCR), which combines factor analysis (FA) with MLR and therefore inherits the advantages of FA and of the classical least squares (CLS) and inverse least squares (ILS) variants of multiple linear regression; 4. study of the application of multivariate statistical methods in spectrophotometric analysis, in which the better-performing computational methods, PCR and PLS, were applied to a variety of analytical determinations. In summary, PCR and PLS are both products of combining factor analysis (FA) with multiple linear regression (MLR) and are currently considered the most promising multivariate statistical methods.
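As a minimal Python illustration of selecting the optimal number of factors by cross-validation (the thesis used FORTRAN programs; the synthetic Beer's-law mixtures and the RMSECV criterion below are assumptions made for the sketch):

# Hypothetical sketch: choosing the number of PLS factors by cross-validation
# (synthetic spectra; not the thesis' FORTRAN implementation).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
C = rng.random((30, 3))                    # calibration concentrations of 3 components
S = rng.random((3, 100))                   # pure-component spectra over 100 wavelengths
A = C @ S + rng.normal(0, 0.01, (30, 100)) # mixture absorbance matrix (Beer's law + noise)

for n in range(1, 8):
    C_pred = cross_val_predict(PLSRegression(n_components=n), A, C, cv=10)
    rmsecv = np.sqrt(np.mean((C_pred - C) ** 2))
    print("factors =", n, " RMSECV = %.4f" % rmsecv)
# the number of factors with the smallest RMSECV is taken as the calibration model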
Abstract:
In regional soil and water loss modeling, spatial interpolation can provide the meteorological data for each computational grid cell. Considering that the correlation between rainfall and elevation in the study area is very weak, the gradient-plus-inverse-distance-squared method (GIDS) is not appropriate; therefore, inverse distance weighting (IDW) and ordinary kriging were used to interpolate the monthly rainfall for May to October of 2000-2003 at 50 stations in and around the Yan'an demonstration area. Cross-validation results show that, after logarithmic transformation of the data, the mean relative error (MRE) of the two methods was 8.30% and 7.67%, which is 23.17% and 23.50% lower, respectively, than the MRE obtained by interpolating the raw data, indicating that the interpolation accuracy was improved; for interpolating the monthly precipitation of a given year in the study area, the kriging method is more accurate than IDW.
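A minimal Python sketch of leave-one-out cross-validation of IDW on log-transformed rainfall, reporting the mean relative error; the station coordinates and rainfall values are synthetic stand-ins for the 50 stations of the study:

# Hypothetical sketch: leave-one-out cross-validation of inverse distance
# weighting (IDW), reporting the mean relative error (MRE) on log-transformed rainfall.
import numpy as np

rng = np.random.default_rng(3)
xy = rng.random((50, 2)) * 100          # assumed station coordinates (km)
rain = rng.gamma(4.0, 20.0, 50)         # assumed monthly rainfall at 50 stations (mm)

def idw(xy_obs, z_obs, xy_new, power=2.0):
    d = np.linalg.norm(xy_obs - xy_new, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return np.sum(w * z_obs) / np.sum(w)

z = np.log(rain)                        # log transform before interpolation
pred = np.array([idw(np.delete(xy, i, 0), np.delete(z, i), xy[i]) for i in range(50)])
mre = np.mean(np.abs(np.exp(pred) - rain) / rain)
print("IDW leave-one-out MRE: %.2f%%" % (100 * mre))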
Abstract:
Mapping the spatial distribution of contaminants in soils is the basis of pollution evaluation and risk control. Interpolation methods are extensively applied in the mapping processes to estimate the heavy metal concentrations at unsampled sites. The performances of interpolation methods (inverse distance weighting, local polynomial, ordinary kriging and radial basis functions) were assessed and compared using the root mean square error for cross validation. The results indicated that all interpolation methods provided a high prediction accuracy of the mean concentration of soil heavy metals. However, the classic method based on percentages of polluted samples, gave a pollution area 23.54-41.92% larger than that estimated by interpolation methods. The difference in contaminated area estimation among the four methods reached 6.14%. According to the interpolation results, the spatial uncertainty of polluted areas was mainly located in three types of region: (a) the local maxima concentration region surrounded by low concentration (clean) sites, (b) the local minima concentration region surrounded with highly polluted samples; and (c) the boundaries of the contaminated areas.
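The sketch below illustrates, with synthetic data, how a polluted-area fraction derived from an interpolated grid can be compared against the classic estimate based on the percentage of polluted samples; the interpolator (SciPy's griddata), the threshold, and the sample layout are assumptions, not those of the study:

# Hypothetical sketch: estimating the polluted area from an interpolated grid and
# comparing it with the classic estimate based on the percentage of polluted samples.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(7)
xy = rng.random((120, 2))                       # sample locations in a 1 x 1 study area
cd = rng.lognormal(-1.0, 0.6, 120)              # assumed Cd concentrations (mg/kg)
threshold = 0.3                                 # assumed regulatory limit

# interpolate onto a 100 x 100 grid and count cells above the threshold
gx, gy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
grid = griddata(xy, cd, (gx, gy), method="linear")
grid = np.where(np.isnan(grid), griddata(xy, cd, (gx, gy), method="nearest"), grid)
area_interp = np.mean(grid > threshold)         # fraction of the study area polluted

area_classic = np.mean(cd > threshold)          # fraction of polluted samples
print("interpolated area: %.1f%%, classic estimate: %.1f%%" %
      (100 * area_interp, 100 * area_classic))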
Abstract:
A novel edge degree f(i) for heteroatoms and multiple bonds in a molecular graph is derived on the basis of the edge degree delta(e(r)), and a novel edge connectivity index F-m is introduced. Multiple linear regression using the edge connectivity index F-m together with the alcohol-type parameter delta and the alcohol-distance parameter L provides high-quality QSPR models for the normal boiling points (BPs), molar volumes (MVs), molar refractions (MRs), water solubility (log(1/S)), and octanol/water partition coefficients (logP) of alcohols with up to 17 non-hydrogen atoms. The results imply that these physical properties may be expressed as a linear combination of the edge connectivity index, the alcohol-type parameter delta, and the alcohol-distance parameter L. For the models of the five properties, the correlation coefficients r and standard errors are 0.9969, 3.022; 0.9993, 1.504; 0.9992, 0.446; 0.9924, 0.129; and 0.9973, 0.123 for BPs, MVs, MRs, log(1/S), and logP, respectively. Leave-one-out cross-validation demonstrates that the models are highly reliable from a statistical point of view.
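A minimal Python sketch of leave-one-out cross-validation for a three-descriptor multiple linear regression QSPR model, reporting the cross-validated q^2; the descriptors and property values are synthetic, not the published alcohol data set:

# Hypothetical sketch: leave-one-out cross-validation of a multiple linear
# regression QSPR model (synthetic descriptors, not the published data set).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(5)
X = rng.random((40, 3))                         # columns: F-m index, delta, L (assumed)
bp = 50 + 80 * X[:, 0] + 15 * X[:, 1] + 10 * X[:, 2] + rng.normal(0, 3, 40)

pred = cross_val_predict(LinearRegression(), X, bp, cv=LeaveOneOut())
press = np.sum((bp - pred) ** 2)                # predictive residual sum of squares
q2 = 1 - press / np.sum((bp - bp.mean()) ** 2)  # cross-validated r^2 (q^2)
print("LOO q^2 = %.3f" % q2)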
Abstract:
Formation resistivity is one of the most important parameters in reservoir evaluation. To obtain the true resistivity of the virgin formation, various types of resistivity logging tools have been developed. However, as proved reserves increase, the pay zones of interest are becoming thinner and thinner, especially in terrestrial deposit oilfields, so that electrical logging tools, limited by the conflicting requirements of resolution and depth of investigation, cannot provide the true formation resistivity directly. Resistivity inversion techniques have therefore become popular for determining true formation resistivity from the improved logging data provided by new tools. In geophysical inverse problems, non-unique solutions are inevitable because of noisy data and insufficient measurement information. This dissertation addresses the problem from three aspects: data acquisition, data processing/inversion, and application of the results together with uncertainty evaluation of the non-unique solution. Other shortcomings of traditional inversion methods, such as slow convergence and dependence on the initial model, are also considered. First, the uncertainties in the data to be processed are examined. The combination of the micro-spherically focused log (MSFL) and the dual laterolog (DLL) is the standard program for determining formation resistivity. During inversion, the corrected MSFL readings are taken as the resistivity of the invaded zone. However, the errors can be as large as 30 percent because of mudcake effects, even when the influence of a rugose borehole on the MSFL readings can be ignored, and it is still debated whether the two logs can be combined quantitatively to determine formation resistivities, given their different measurement principles. A new type of laterolog tool is therefore designed theoretically; it provides three curves with different depths of investigation and nearly the same resolution, about 0.4 m. Second, because the popular iterative inversion method based on least-squares estimation cannot solve for more than two parameters simultaneously, and because the new laterolog tool has not yet been applied in practice, the work focuses on two-parameter inversion (invasion radius and virgin-formation resistivity) of conventional dual laterolog data. An unequally weighted damping-factor revision method is developed to replace the parameter-revision technique used in the traditional inversion: each parameter update depends not only on the damping factor itself but also on the misfit between the measured and fitted data in the different layers. At least two iterations fewer than the older method are required, reducing the computational cost of the inversion. The damped least-squares inversion method is a realization of Tikhonov's trade-off between the smoothness of the solution and the stability of the inversion process. Because it is based on linearization of a non-linear inverse problem, the solution inevitably depends on the initial parameter values, and the efficiency of such methods is increasingly debated as non-linear processing methods develop. An artificial neural network method is therefore proposed in this dissertation.
A database of the tool's response to formation parameters is built by modeling the laterolog tool and is then used to train the neural networks. A unit model is put forward to simplify the data space, and an additional physical constraint is applied to optimize the network after cross-validation. The results show that the neural-network inversion can replace the traditional inversion for a single formation and can also be used to provide the initial model for the traditional method. Whatever method is used, non-uniqueness and uncertainty of the solution remain inevitable, so it is wise to evaluate them when applying the inversion results. Bayes' theorem provides a way to do this; the approach is illustrated for a single formation and yields plausible results. Finally, the traditional least-squares inversion method is used to process raw logging data; compared with core analysis, the calculated oil saturation is 20 percent higher than that obtained from the unprocessed data.
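A rough Python sketch of the neural-network inversion idea with cross-validation; the forward model generating the synthetic laterolog responses below is a toy stand-in for the actual tool-modeling code, and the parameter ranges are assumptions:

# Hypothetical sketch: a synthetic database of tool responses is generated with a toy
# forward model, a network is trained to map responses back to (invasion radius,
# virgin-formation resistivity), and cross-validation checks its reliability.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
ri = rng.uniform(0.2, 2.0, 2000)             # invasion radius (m), assumed range
rt = rng.uniform(1.0, 200.0, 2000)           # virgin-formation resistivity (ohm-m), assumed range

# toy forward model: deep (LLD) and shallow (LLS) readings mix the virgin resistivity
# with an invaded-zone resistivity, weighted by the invasion depth (illustrative only)
rxo = 5.0
w = np.clip(ri / 2.0, 0.0, 1.0)
lld = (1 - 0.3 * w) * rt + 0.3 * w * rxo + rng.normal(0, 0.5, 2000)
lls = (1 - 0.7 * w) * rt + 0.7 * w * rxo + rng.normal(0, 0.5, 2000)

X = np.column_stack([lld, lls])              # tool responses (inputs to the inversion)
Y = np.column_stack([ri, rt])                # formation parameters to recover

net = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(30, 30), max_iter=3000, random_state=0))
scores = cross_val_score(net, X, Y, cv=5, scoring="r2")
print("cross-validated R^2: %.3f +/- %.3f" % (scores.mean(), scores.std()))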
Abstract:
Many problems in early vision are ill posed. Edge detection is a typical example. This paper applies regularization techniques to the problem of edge detection. We derive an optimal filter for edge detection with a size controlled by the regularization parameter $\lambda$ and compare it to the Gaussian filter. A formula relating the signal-to-noise ratio to the parameter $\lambda$ is derived from regularization analysis for the case of small values of $\lambda$. We also discuss the method of Generalized Cross Validation for obtaining the optimal filter scale. Finally, we use our framework to explain two perceptual phenomena: coarsely quantized images becoming recognizable by either blurring or adding noise.
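For reference, generalized cross-validation selects the regularization parameter by minimizing the ratio of the mean squared residual to the square of the normalized trace of $I - A(\lambda)$; in the standard notation for a linear smoother with influence matrix $A(\lambda)$ acting on data $y$ (this is the generic form, not necessarily the paper's notation): $\mathrm{GCV}(\lambda) = \dfrac{\frac{1}{n}\|(I - A(\lambda))\,y\|^{2}}{\left[\frac{1}{n}\operatorname{tr}(I - A(\lambda))\right]^{2}}$, and the filter scale is taken at $\hat{\lambda} = \arg\min_{\lambda}\mathrm{GCV}(\lambda)$.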
Abstract:
BACKGROUND: In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database, such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top to bottom annotation rules which protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is through the use of transitive closure to predictions. RESULTS: We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing. CONCLUSION: A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., Nearest-Neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capabilities (i.e., increased positive predictive value), and that this increase is consistent uniformly with GO-term depth. Additional in silico validation on a collection of new annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction, that exploits the ontological structure of protein annotation databases in a principled manner, can offer substantial advantages over the successive application of 'flat' network-based methods.
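A minimal Python sketch of the 'true-path' rule itself (not the probabilistic classifier proposed in the paper): in a toy GO-like hierarchy, a child term's predicted score is capped by its parents' scores so that predictions are consistent from root to leaves. The term names and scores are made up for illustration.

# Hypothetical sketch of the 'true-path' rule: a protein annotated with a GO term
# must be annotated with all of that term's ancestors, so a child's predicted
# probability is capped by its parents' (already consistent) scores.
parents = {                       # child -> parents in a tiny GO-like DAG (assumed)
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B", "GO:C"],
}

def true_path_consistent(scores):
    """Cap each term's score by its parents' scores, processing root to leaves."""
    fixed = dict(scores)
    for term in topological_order(parents):          # parents are processed before children
        for p in parents.get(term, []):
            fixed[term] = min(fixed[term], fixed[p])
    return fixed

def topological_order(parents):
    order, seen = [], set()
    def visit(t):
        if t in seen:
            return
        seen.add(t)
        for p in parents.get(t, []):
            visit(p)
        order.append(t)
    for t in set(parents) | {p for ps in parents.values() for p in ps}:
        visit(t)
    return order

print(true_path_consistent({"GO:A": 0.6, "GO:B": 0.9, "GO:C": 0.4, "GO:D": 0.8}))
# -> GO:B capped at 0.6, GO:D capped at min(0.6, 0.4) = 0.4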
Abstract:
Spotting patterns of interest in an input signal is a very useful task in many different fields including medicine, bioinformatics, economics, speech recognition and computer vision. Example instances of this problem include spotting an object of interest in an image (e.g., a tumor), a pattern of interest in a time-varying signal (e.g., audio analysis), or an object of interest moving in a specific way (e.g., a human's body gesture). Traditional spotting methods, which are based on Dynamic Time Warping or hidden Markov models, use some variant of dynamic programming to register the pattern and the input while accounting for temporal variation between them. At the same time, those methods often suffer from several shortcomings: they may give meaningless solutions when input observations are unreliable or ambiguous, they require a high complexity search across the whole input signal, and they may give incorrect solutions if some patterns appear as smaller parts within other patterns. In this thesis, we develop a framework that addresses these three problems, and evaluate the framework's performance in spotting and recognizing hand gestures in video. The first contribution is a spatiotemporal matching algorithm that extends the dynamic programming formulation to accommodate multiple candidate hand detections in every video frame. The algorithm finds the best alignment between the gesture model and the input, and simultaneously locates the best candidate hand detection in every frame. This allows for a gesture to be recognized even when the hand location is highly ambiguous. The second contribution is a pruning method that uses model-specific classifiers to reject dynamic programming hypotheses with a poor match between the input and model. Pruning improves the efficiency of the spatiotemporal matching algorithm, and in some cases may improve the recognition accuracy. The pruning classifiers are learned from training data, and cross-validation is used to reduce the chance of overpruning. The third contribution is a subgesture reasoning process that models the fact that some gesture models can falsely match parts of other, longer gestures. By integrating subgesture reasoning the spotting algorithm can avoid the premature detection of a subgesture when the longer gesture is actually being performed. Subgesture relations between pairs of gestures are automatically learned from training data. The performance of the approach is evaluated on two challenging video datasets: hand-signed digits gestured by users wearing short sleeved shirts, in front of a cluttered background, and American Sign Language (ASL) utterances gestured by ASL native signers. The experiments demonstrate that the proposed method is more accurate and efficient than competing approaches. The proposed approach can be generally applied to alignment or search problems with multiple input observations, that use dynamic programming to find a solution.
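A simplified Python sketch of the first contribution's core idea: a DTW-style dynamic-programming alignment in which each video frame contributes several candidate hand detections and the recursion keeps the best candidate per (model state, frame) pair. The features, transition structure, and data are illustrative assumptions, not the thesis implementation.

# Hypothetical sketch: dynamic-programming alignment between a gesture model and an
# input video in which every frame may contain several candidate hand detections;
# the recursion keeps, for each (model state, frame) pair, the best cost over candidates.
import numpy as np

rng = np.random.default_rng(2)
model = rng.random((10, 2))                    # gesture model: 10 states, 2-D hand position features
video = [rng.random((rng.integers(1, 4), 2))   # each frame: 1-3 candidate hand detections
         for _ in range(25)]

INF = np.inf
n, T = len(model), len(video)
D = np.full((n, T), INF)
for t in range(T):
    # local cost of matching model state i to the best candidate in frame t
    local = np.array([min(np.linalg.norm(model[i] - c) for c in video[t]) for i in range(n)])
    for i in range(n):
        if t == 0:
            prev = 0.0 if i == 0 else INF      # alignment must start at the first model state
        else:
            prev = min(D[i, t - 1],            # stay in the same model state
                       D[i - 1, t - 1] if i > 0 else INF)   # advance to the next state
        D[i, t] = local[i] + prev

score = D[n - 1].min()                         # best end frame for a complete alignment
print("best matching cost: %.3f" % score)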
Abstract:
As more diagnostic testing options become available to physicians, it becomes more difficult to combine the various types of medical information in order to optimize the overall diagnosis. To improve diagnostic performance, here we introduce an approach to optimize a decision-fusion technique to combine heterogeneous information, such as from different modalities, feature categories, or institutions. For classifier comparison we used two performance metrics: the area under the receiver operating characteristic (ROC) curve (AUC) and the normalized partial area under the curve (pAUC). This study used four classifiers: linear discriminant analysis (LDA), an artificial neural network (ANN), and two variants of our decision-fusion technique, AUC-optimized (DF-A) and pAUC-optimized (DF-P) decision fusion. We applied each of these classifiers with 100-fold cross-validation to two heterogeneous breast cancer data sets: one of mass lesion features and a much more challenging one of microcalcification lesion features. For the calcification data set, DF-A outperformed the other classifiers in terms of AUC (p < 0.02) and achieved AUC = 0.85 +/- 0.01. The DF-P surpassed the other classifiers in terms of pAUC (p < 0.01) and reached pAUC = 0.38 +/- 0.02. For the mass data set, DF-A outperformed both the ANN and the LDA (p < 0.04) and achieved AUC = 0.94 +/- 0.01. Although for this data set there were no statistically significant differences among the classifiers' pAUC values (pAUC = 0.57 +/- 0.07 to 0.67 +/- 0.05, p > 0.10), the DF-P did significantly improve specificity versus the LDA at both 98% and 100% sensitivity (p < 0.04). In conclusion, decision fusion directly optimized clinically significant performance measures, such as AUC and pAUC, and sometimes outperformed two well-known machine-learning techniques when applied to two different breast cancer data sets.
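The sketch below shows, on synthetic scores, how the two performance metrics can be computed: the ROC AUC and a normalized partial AUC restricted to a high-sensitivity region (here sensitivity >= 0.90). The exact pAUC definition and the decision-fusion classifiers of the study are not reproduced.

# Hypothetical sketch: AUC and a normalized partial AUC over the high-sensitivity
# portion of the ROC curve, computed on synthetic classifier scores.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(4)
y = np.concatenate([np.zeros(200), np.ones(100)])            # 0 = benign, 1 = malignant
scores = np.concatenate([rng.normal(0, 1, 200), rng.normal(1.2, 1, 100)])

auc = roc_auc_score(y, scores)

fpr, tpr, _ = roc_curve(y, scores)
mask = tpr >= 0.90                                           # high-sensitivity portion of the ROC curve
# area between the curve and the sensitivity floor, normalized by its maximum possible value
pauc = np.trapz(tpr[mask] - 0.90, fpr[mask]) / (0.10 * (fpr[mask][-1] - fpr[mask][0]))
print("AUC = %.3f, normalized pAUC (sens >= 0.90) = %.3f" % (auc, pauc))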
Abstract:
BACKGROUND: Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, which are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. RESULTS: We have developed a CUDA-based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. CONCLUSIONS: permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
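A plain-NumPy sketch of permutation resampling for a gene-by-gene two-group statistic (a CPU illustration of the kind of analysis permGPU accelerates; the data, statistic, and permutation count are assumptions):

# Hypothetical sketch of permutation resampling for a microarray association study
# (plain NumPy on the CPU; permGPU itself is a CUDA/R implementation).
import numpy as np

rng = np.random.default_rng(8)
n_genes, n_case, n_ctrl, n_perm = 1000, 20, 20, 2000
expr = rng.normal(0, 1, (n_genes, n_case + n_ctrl))     # expression matrix: genes x samples
labels = np.array([1] * n_case + [0] * n_ctrl)          # binary trait

def t_stats(expr, labels):
    a, b = expr[:, labels == 1], expr[:, labels == 0]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

obs = t_stats(expr, labels)
exceed = np.zeros(n_genes)
for _ in range(n_perm):
    perm = rng.permutation(labels)                      # shuffle the trait labels
    exceed += np.abs(t_stats(expr, perm)) >= np.abs(obs)
pval = (exceed + 1) / (n_perm + 1)                      # permutation p-values per gene
print("smallest permutation p-value: %.4f" % pval.min())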
Abstract:
Motivated by recent findings in the field of consumer science, this paper evaluates the causal effect of debit cards on household consumption using population-based data from the Italian Survey on Household Income and Wealth (SHIW). Within the Rubin Causal Model, we focus on the estimand of population average treatment effect for the treated (PATT). We consider three existing estimators, based on regression, on mixed matching and regression, and on propensity score weighting, and propose a new doubly-robust estimator. A semiparametric specification based on power series for the potential outcomes and the propensity score is adopted. Cross-validation is used to select the order of the power series. We conduct a simulation study to compare the performance of the estimators. The key assumptions, overlap and unconfoundedness, are systematically assessed and validated in the application. Our empirical results suggest statistically significant positive effects of debit cards on monthly household spending in Italy.
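A minimal Python sketch of a doubly-robust estimator of the treatment effect on the treated, combining a control-outcome regression with propensity-score (odds) weighting; logistic and linear models replace the paper's power-series specification, and the data are synthetic rather than the SHIW:

# Hypothetical sketch of a doubly-robust ATT estimator: an outcome model fitted on the
# controls plus propensity-score odds weighting (synthetic household data).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(6)
n = 2000
X = rng.normal(0, 1, (n, 3))                      # household covariates (assumed)
p = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))  # true propensity of holding a debit card
T = rng.binomial(1, p)                            # treatment: debit card ownership
Y = 100 + 20 * X[:, 0] + 10 * X[:, 2] + 15 * T + rng.normal(0, 10, n)  # monthly spending

e_hat = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]           # propensity score
m0_hat = LinearRegression().fit(X[T == 0], Y[T == 0]).predict(X)        # control-outcome model

resid = Y - m0_hat
w = e_hat / (1 - e_hat)                           # odds weights for the controls
att = (np.sum(T * resid) - np.sum((1 - T) * w * resid)) / T.sum()
print("doubly-robust ATT estimate: %.2f (true simulated effect: 15)" % att)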