886 results for Optimal test set
Abstract:
In the design of practical web page classification systems one often encounters a situation in which the labeled training set is created by choosing some examples from each class, but the class proportions in this set are not the same as those in the test distribution to which the classifier will actually be applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods, to deal with this situation. We empirically show that when the labeled training data is small, a TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; when this estimate is used, both TSVM and EC/ER give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.
Abstract:
We provide new analytical results concerning the spread of information or influence under the linear threshold social network model introduced by Kempe et al., in the information dissemination context. The seeder starts by providing the message to a set of initial nodes and is interested in maximizing the number of nodes that will ultimately receive the message. A node's decision to forward the message depends on the set of nodes from which it has received the message. Under the linear threshold model, the decision to forward the information depends on comparing the total influence of the nodes from which a node has received the packet with its own threshold of influence. We derive analytical expressions for the expected number of nodes that ultimately receive the message, as a function of the initial set of nodes, for a generic network. We show that the problem can be recast in the framework of Markov chains. We then use the analytical expression to gain insights into information dissemination in some simple network topologies such as the star, ring and mesh, and on acyclic graphs. We also derive the optimal initial set in the above networks, and hint at general heuristics for picking a good initial set.
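The forwarding rule described in this abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's analytical method: it assumes the usual formulation in which each node draws its threshold uniformly from [0, 1], and the function names, the `star` example graph and its edge weights are hypothetical choices made for illustration only.

```python
import random

def simulate_linear_threshold(in_weights, seeds, rng):
    """Run one realization of the linear threshold model.

    in_weights: dict mapping node -> {in_neighbor: weight}; the weights into
                each node are assumed to sum to at most 1.
    seeds:      iterable of initially active nodes.
    Returns the set of nodes that are active when the process stops.
    """
    # Assumption: thresholds are drawn uniformly at random from [0, 1).
    thresholds = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in in_weights.items():
            if v in active:
                continue
            # Total influence of the already-active in-neighbors of v.
            influence = sum(w for u, w in nbrs.items() if u in active)
            # A node with no active in-neighbor never forwards the message.
            if influence > 0 and influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active

def expected_spread(in_weights, seeds, runs=2000, seed=0):
    """Estimate the expected number of nodes reached by averaging over runs."""
    rng = random.Random(seed)
    total = sum(len(simulate_linear_threshold(in_weights, seeds, rng))
                for _ in range(runs))
    return total / runs

# Hypothetical star network: hub 0 and four leaves, with illustrative weights.
star = {
    0: {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25},
    1: {0: 1.0}, 2: {0: 1.0}, 3: {0: 1.0}, 4: {0: 1.0},
}

hub_spread = expected_spread(star, [0])   # seeding the hub reaches all 5 nodes
leaf_spread = expected_spread(star, [1])  # close to 1 + 0.25 * 4 = 2 on average
```

In this toy star, seeding the hub always reaches every node, while seeding a leaf reaches the rest of the network only when the hub's threshold is at most 0.25, which is the kind of comparison between initial sets that the abstract discusses.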
Abstract:
Identical parallel-connected converters with unequal load sharing have unequal terminal voltages. The difference in terminal voltages is more pronounced in case of back-to-back connected converters, operated in power-circulation mode for the purpose of endurance tests. In this paper, a synchronous reference frame based analysis is presented to estimate the grid current distortion in interleaved, grid-connected converters with unequal terminal voltages. Influence of carrier interleaving angle on rms grid current ripple is studied theoretically as well as experimentally. Optimum interleaving angle to minimize the rms grid current ripple is investigated for different applications of parallel converters. The applications include unity power factor rectifiers, inverters for renewable energy sources, reactive power compensators, and circulating-power test set-up used for thermal testing of high-power converters. Optimum interleaving angle is shown to be a strong function of the average of the modulation indices of the two converters, irrespective of the application. The findings are verified experimentally on two parallel-connected converters, circulating reactive power of up to 150 kVA between them.
Abstract:
The characterization of a closed-cell aluminum foam with the trade name Alporas is carried out here under compression loading for a nominal cross-head speed of 1 mm/min. Foam samples in the form of cubes are tested in a UTM and the average stress-strain behavior is obtained which clearly displays a plateau strength of approximately 2 MPa. It is noted that the specific energy absorption capacity of the foam can be high despite its low strength which makes it attractive as a material for certain energy-absorbing countermeasures. The mechanical behavior of the present Alporas foam is simulated using cellular (i.e. so-called microstructure-based) and solid element-based finite element models. The efficacy of the cellular approach is shown, perhaps for the first time in published literature, in terms of prediction of both stress-strain response and inclined fold formation during axial crush under compression loading. Keeping in mind future applications under impact loads, limited results are presented when foam samples are subjected to low velocity impact in a drop-weight test set-up.
Abstract:
In this paper, we describe a method for feature extraction and classification of characters manually isolated from scene or natural images. Characters in a scene image may be affected by low resolution, uneven illumination or occlusion. We propose a novel method to perform binarization on gray scale images by minimizing an energy functional. The Discrete Cosine Transform and the Angular Radial Transform are used to extract features from characters after normalization for scale and translation. We have evaluated our method on the complete test set of the Chars74k dataset for English and Kannada scripts, consisting of handwritten and synthesized characters as well as characters extracted from camera-captured images. We utilize only synthesized and handwritten characters from this dataset as the training set. Nearest neighbor classification is used in our experiments.
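The final classification step named in this abstract, nearest neighbor classification, can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the Euclidean distance, the function name `nearest_neighbor` and the two-dimensional toy feature vectors are hypothetical, and stand in for the DCT and Angular Radial Transform features that the paper actually extracts.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the training vector closest to the query.

    train: list of (feature_vector, label) pairs.
    query: feature vector of the same length as the training vectors.
    Distance is Euclidean (an assumption; the abstract does not name a metric).
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        d = math.dist(features, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy, made-up feature vectors for two character classes.
train = [
    ((1.0, 0.0), "A"),
    ((0.8, 0.2), "A"),
    ((0.0, 1.0), "B"),
    ((0.1, 0.9), "B"),
]

predicted = nearest_neighbor(train, (0.9, 0.1))  # closest to the "A" examples
```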
Abstract:
Automatic and accurate detection of the closure-burst transition events of stops and affricates serves many applications in speech processing. A temporal measure named the plosion index is proposed to detect such events, which are characterized by an abrupt increase in energy. Using the maxima of the pitch-synchronous normalized cross correlation as an additional temporal feature, a rule-based algorithm is designed that aims at selecting only those events associated with the closure-burst transitions of stops and affricates. The performance of the algorithm, characterized by receiver operating characteristic curves and temporal accuracy, is evaluated using the labeled closure-burst transitions of stops and affricates of the entire TIMIT test and training databases. The robustness of the algorithm is studied with respect to global white and babble noise as well as local noise using the TIMIT test set and on telephone quality speech using the NTIMIT test set. For these experiments, the proposed algorithm, which does not require explicit statistical training and is based on two one-dimensional temporal measures, gives a performance comparable to or better than the state-of-the-art methods. In addition, to test the scalability, the algorithm is applied on the Buckeye conversational speech corpus and databases of two Indian languages. (C) 2014 Acoustical Society of America.
Abstract:
We address the problem of multi-instrument recognition in polyphonic music signals. Individual instruments are modeled within a stochastic framework using Student's-t Mixture Models (tMMs). We impose a mixture of these instrument models on the polyphonic signal model. No a priori knowledge is assumed about the number of instruments in the polyphony. The mixture weights are estimated in a latent variable framework from the polyphonic data using an Expectation Maximization (EM) algorithm, derived for the proposed approach. The weights are shown to indicate instrument activity. The output of the algorithm is an Instrument Activity Graph (IAG), from which the instruments that are active at a given time can be identified. An average F-ratio of 0.75 is obtained for polyphonies containing 2-5 instruments, on an experimental test set of 8 instruments: clarinet, flute, guitar, harp, mandolin, piano, trombone and violin.
Abstract:
Anaplastic astrocytoma (AA; Grade III) and glioblastoma (GBM; Grade IV) are diffusely infiltrating tumors and are called malignant astrocytomas. The treatment regimen and prognosis are distinctly different between anaplastic astrocytoma and glioblastoma patients. Although the current histopathology-based grading system is well accepted and largely reproducible, intratumoral histologic variations often lead to difficulties in the classification of malignant astrocytoma samples. In order to obtain a more robust molecular classifier, we analysed RT-qPCR expression data of 175 differentially regulated genes across astrocytoma using Prediction Analysis of Microarrays (PAM) and found the most discriminatory 16-gene expression signature for the classification of anaplastic astrocytoma and glioblastoma. The 16-gene signature obtained in the training set was validated in the test set with a diagnostic accuracy of 89%. Additionally, validation of the 16-gene signature in multiple independent cohorts revealed that the signature predicted anaplastic astrocytoma and glioblastoma samples with accuracy rates of 99%, 88%, and 92% in the TCGA, GSE1993 and GSE4422 datasets, respectively. The protein-protein interaction network and pathway analysis suggested that the 16 genes of the signature identified the epithelial-mesenchymal transition (EMT) pathway as the most differentially regulated pathway in glioblastoma compared to anaplastic astrocytoma. In addition to identifying a 16-gene classification signature, we also demonstrated that genes involved in epithelial-mesenchymal transition may play an important role in distinguishing glioblastoma from anaplastic astrocytoma.
Abstract:
In this article, we aim at reducing the error rate of the online Tamil symbol recognition system by employing multiple experts to reevaluate certain decisions of the primary support vector machine classifier. Motivated by the relatively high percentage of occurrence of base consonants in the script, a reevaluation technique has been proposed to correct any ambiguities arising in the base consonants. Secondly, a dynamic time-warping method is proposed to automatically extract the discriminative regions for each set of confused characters. Class-specific features derived from these regions aid in reducing the degree of confusion. Thirdly, statistics of specific features are proposed for resolving any confusions in vowel modifiers. The reevaluation approaches are tested on two databases (a) the isolated Tamil symbols in the IWFHR test set, and (b) the symbols segmented from a set of 10,000 Tamil words. The recognition rate of the isolated test symbols of the IWFHR database improves by 1.9 %. For the word database, the incorporation of the reevaluation step improves the symbol recognition rate by 3.5 % (from 88.4 to 91.9 %). This, in turn, boosts the word recognition rate by 11.9 % (from 65.0 to 76.9 %). The reduction in the word error rate has been achieved using a generic approach, without the incorporation of language models.
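The improvements quoted in this abstract are absolute gains in recognition rate (percentage points). As a worked check, the sketch below converts them into relative reductions in error rate. The relative figures are derived here from the quoted rates and are not stated in the abstract, and the helper name `relative_error_reduction` is hypothetical.

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative reduction in error rate (in %), given accuracies in %."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before

# Symbol recognition: 88.4 % -> 91.9 %, i.e. error 11.6 % -> 8.1 %.
symbol_reduction = relative_error_reduction(88.4, 91.9)

# Word recognition: 65.0 % -> 76.9 %, i.e. error 35.0 % -> 23.1 %.
word_reduction = relative_error_reduction(65.0, 76.9)

print(f"symbol error reduced by {symbol_reduction:.1f} % (relative)")
print(f"word error reduced by {word_reduction:.1f} % (relative)")
```

Under this reading, the 3.5-point gain at the symbol level corresponds to removing roughly 30 % of the symbol errors, and the 11.9-point gain at the word level to removing 34 % of the word errors.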
Abstract:
Glioblastomas (GBM) are largely incurable as they diffusely infiltrate adjacent brain tissues and are difficult to diagnose at early stages. Biomarkers derived from serum, which can be obtained by minimally invasive procedures, may help in early diagnosis, prognosis and treatment monitoring. To develop a serum cytokine signature, we profiled 48 cytokines in sera derived from normal healthy individuals (n = 26) and patients with different grades of glioma (n = 194). We divided the normal and grade IV glioma/GBM serum samples randomly into equal-sized training and test sets. In the training set, the Prediction Analysis for Microarrays (PAM) identified a panel of 18 cytokines that could discriminate GBM sera from normal sera with maximum accuracy (95.40%) and minimum error (4.60%). The 18-cytokine signature obtained in the training set discriminated GBM sera from normal sera in the test set as well (accuracy 96.55%; error 3.45%). Interestingly, the 18-cytokine signature also differentiated grade II/Diffuse Astrocytoma (DA) and grade III/Anaplastic Astrocytoma (AA) sera from normal sera very efficiently (DA vs. normal: accuracy 96.00%, error 4.00%; AA vs. normal: accuracy 95.83%, error 4.17%). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using the 18 cytokines resulted in the enrichment of two pathways, cytokine-cytokine receptor interaction and JAK-STAT, with high significance. Thus our study identified an 18-cytokine signature for distinguishing glioma sera from normal healthy individual sera and also demonstrated the importance of their differential abundance in glioma biology.
Abstract:
This paper describes the development of the 2003 CU-HTK large vocabulary speech recognition system for Conversational Telephone Speech (CTS). The system was designed based on a multi-pass, multi-branch structure where the output of all branches is combined using system combination. A number of advanced modelling techniques such as Speaker Adaptive Training, Heteroscedastic Linear Discriminant Analysis, Minimum Phone Error estimation and specially constructed Single Pronunciation dictionaries were employed. The effectiveness of each of these techniques and their potential contribution to the result of system combination was evaluated in the framework of a state-of-the-art LVCSR system with sophisticated adaptation. The final 2003 CU-HTK CTS system constructed from some of these models is described and its performance on the DARPA/NIST 2003 Rich Transcription (RT-03) evaluation test set is discussed.
Abstract:
A global numerical model for shallow water flows on the cubed-sphere grid is proposed in this paper. The model is constructed by using the constrained interpolation profile/multi-moment finite volume method (CIP/MM FVM). Two kinds of moments, i.e. the point value (PV) and the volume-integrated average (VIA), are defined and independently updated in the present model by different numerical formulations. The Lax-Friedrichs upwind splitting is used to update the PV moment in terms of a derivative Riemann problem, and a finite volume formulation derived by integrating the governing equations over each mesh element is used to predict the VIA moment. The cubed-sphere grid is applied to get around the polar singularity and to obtain uniform grid spacing for a spherical geometry. Highly localized reconstruction in CIP/MM FVM is well suited for the cubed-sphere grid, especially in dealing with the discontinuity in the coordinates between different patches. Mass conservation is fully achieved over the whole globe. The numerical model has been verified using Williamson's standard test set for shallow water equation models on the sphere. The results reveal that the present model is competitive with most existing ones. (C) 2008 Elsevier Inc. All rights reserved.
Abstract:
Raman spectroscopy on single, living epithelial cells captured in a laser trap is shown to have diagnostic power for colorectal cancer. This new single-cell technology comprises three major components: primary culture processing of human tissue samples to produce single-cell suspensions, Raman detection on singly trapped cells, and diagnoses of the cells by artificial neural network classifications. It is compared with DNA flow cytometry for similarities and differences. Its advantages over tissue Raman spectroscopy are also discussed. In the actual construction of a diagnostic model for colorectal cancer, real patient data were taken to generate a training set of 320 Raman spectra and a test set of 80. By incorporating outlier corrections into a conventional binary neural classifier, our network accomplished significantly better predictions than logistic regressions, with sensitivity improved from 77.5% to 86.3% and specificity improved from 81.3% to 86.3% for the training set, and moderate improvements for the test set. Most important, the network approach enables a sensitivity map analysis to quantitate the relevance of each Raman band to the normal-to-cancer transform at the cell level. Our technique has direct clinical applications for diagnosing cancers and basic science potential in the study of cell dynamics of carcinogenesis. (C) 2007 Society of Photo-Optical Instrumentation Engineers.
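The sensitivity and specificity figures quoted in this abstract follow the standard definitions for a binary classifier. The sketch below shows how they are computed from true and predicted labels; the function name and the eight toy labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): fraction of positive (e.g. cancer) cases found.
    specificity = TN / (TN + FP): fraction of negative (e.g. normal) cases kept.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up labels: 1 = cancer cell, 0 = normal cell.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
```

In this toy case three of the four positive cells and three of the four negative cells are classified correctly, so both measures equal 0.75.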
Abstract:
Brazil's wind power potential, counting steady winds that are economically viable to exploit, is 143 GW. This is equivalent to twice the country's entire installed generation capacity. In Brazil, wind energy has a seasonality complementary to that of hydroelectric energy, because the periods with the best wind conditions coincide with those of lowest reservoir capacity. The project developed in this work originated from a public call by FINEP, under the auspices of the recently created CEPER. The project was given an investigative character, of original scientific contribution, resulting in a product of innovative technology for low-power wind turbines. Among the objectives of the project, we highlight the experimental evaluation of wind turbines rated at 5000 W. More specifically, the general objective of this project includes structural analysis, aerodynamic analysis, and feasibility analysis of new materials to be employed. For each of the different areas of knowledge that make up the project, the most appropriate methodology is adopted. For the aerodynamic analysis, a preliminary numerical simulation was carried out, followed by experimental tests in a wind tunnel. The procedures adopted are described in Chapter 3. Chapter 4 is devoted to the electrical tests. In this stage, a test bench was developed to obtain the specific characteristics of the base machines, such as power curves, electrical efficiency, analysis of mechanical and electrical losses, and heating. This chapter ends with a critical analysis of the values obtained. Field tests of the entire assembled set were carried out. Currently, the 5 kW wind turbine is in operation, instrumented and equipped with a data acquisition system for consolidation of the reliability tests.
The field tests are taking place in the city of Campos, RJ, and covered the following dimensions of analysis: efficiency tests to determine the power curve, noise levels, and the operation of safety devices. The results expected from the project were achieved, consolidating the design of a 5000 W wind turbine.