925 results for probabilistic principal component analysis (probabilistic PCA)
Abstract:
Being able to accurately classify the application or program that generates each traffic flow in an Internet network gives companies and public bodies a useful tool for managing network resources, as well as the ability to establish policies that block or prioritize specific traffic. The proliferation of new applications and techniques in recent years has made it harder to rely on the well-known port values assigned by the IANA (Internet Assigned Numbers Authority) to identify those applications. P2P (peer-to-peer) networks, the use of unregistered or random ports, and the masquerading of application traffic as HTTP and HTTPS to traverse firewalls and NATs (Network Address Translation), among other factors, create the need for new traffic-detection methods. The aim of this study is to develop a set of practices that accomplish this task with techniques that go beyond the observation of ports and other well-known values. Several methodologies already exist: Deep Packet Inspection (DPI) searches for signatures, patterns built from packet contents including the payload, that characterize each application; machine-learning approaches apply statistical analysis to flow parameters, learning from them to estimate the likelihood that a flow belongs to a given application; and, finally, heuristic techniques rely on the intuition or the researcher's own knowledge of the traffic being analysed. Specifically, this study proposes combining some of the techniques above with data-mining methods, namely Principal Component Analysis (PCA) and clustering of statistics extracted from the flows in network traffic capture files. This requires configuring several parameters through an iterative trial-and-error process in search of a reliable traffic classification. The ideal result would identify each application present in the traffic in its own cluster, or in clusters that group applications of a similar nature. To this end, traffic captures will be created in a controlled environment in which every capture is labelled with its corresponding application, and the flows will then be extracted from those captures. Next, selected parameters of the packets belonging to each flow will be obtained, such as the arrival date and time or the length in octets of the IP packet. These parameters will be loaded into a MySQL database, where each flow is assigned to its specific application, and used to compute statistics that describe each flow as accurately as possible. The data-mining techniques mentioned above (PCA and clustering) will then be applied to these statistics using the RapidMiner software. Finally, the classification produced by data mining will be compared with the true classification of the flows stored in the database; a confusion matrix will be used for this comparison, allowing the accuracy of the developed classification process to be properly assessed.
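As a concrete illustration of the pipeline this abstract describes (flow statistics, then PCA, then clustering, then a confusion matrix), here is a minimal Python sketch using scikit-learn in place of the RapidMiner workflow; the CSV file and its column names are hypothetical placeholders.

```python
# Minimal sketch of the described pipeline, assuming scikit-learn instead of
# RapidMiner. File and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# One row per flow: per-flow statistics plus the true application label.
flows = pd.read_csv("flow_statistics.csv")            # hypothetical export
y_true = flows.pop("application").to_numpy()

# Standardise the statistics, then keep the components explaining 95% of
# the variance before clustering.
X = PCA(n_components=0.95).fit_transform(
    StandardScaler().fit_transform(flows.to_numpy()))

# One cluster per known application is the ideal outcome described above;
# the number of clusters is one of the parameters to tune iteratively.
labels = KMeans(n_clusters=len(np.unique(y_true)), n_init=10,
                random_state=0).fit_predict(X)

# Confusion matrix comparing clusters against the true applications.
print(pd.crosstab(y_true, labels,
                  rownames=["application"], colnames=["cluster"]))
```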
Abstract:
Averaged event-related potential (ERP) data recorded from the human scalp reveal electroencephalographic (EEG) activity that is reliably time-locked and phase-locked to experimental events. We report here the application of a method based on information theory that decomposes one or more ERPs recorded at multiple scalp sensors into a sum of components with fixed scalp distributions and sparsely activated, maximally independent time courses. Independent component analysis (ICA) decomposes ERP data into a number of components equal to the number of sensors. The derived components have distinct but not necessarily orthogonal scalp projections. Unlike dipole-fitting methods, the algorithm does not model the locations of their generators in the head. Unlike methods that remove second-order correlations, such as principal component analysis (PCA), ICA also minimizes higher-order dependencies. Applied to detected—and undetected—target ERPs from an auditory vigilance experiment, the algorithm derived ten components that decomposed each of the major response peaks into one or more ICA components with relatively simple scalp distributions. Three of these components were active only when the subject detected the targets, three other components only when the target went undetected, and one in both cases. Three additional components accounted for the steady-state brain response to a 39-Hz background click train. Major features of the decomposition proved robust across sessions and changes in sensor number and placement. This method of ERP analysis can be used to compare responses from multiple stimuli, task conditions, and subject states.
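A minimal sketch of this kind of temporal decomposition follows, using scikit-learn's FastICA in place of the infomax algorithm the authors apply; the ERP array is a synthetic stand-in for real multichannel recordings.

```python
# Sketch of ICA decomposition of averaged ERPs: as many components as
# sensors, each with a fixed scalp map and an independent time course.
import numpy as np
from sklearn.decomposition import FastICA

n_channels, n_times = 31, 512
erp = np.random.randn(n_channels, n_times)    # stand-in for averaged ERPs

# Time points act as samples and channels as features, so independence is
# enforced between the component time courses (temporal ICA).
ica = FastICA(n_components=n_channels, random_state=0)
sources = ica.fit_transform(erp.T).T          # (n_channels, n_times) courses
scalp_maps = ica.mixing_                      # one fixed scalp map per column
```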
Abstract:
A method is given for determining the time course and spatial extent of consistently and transiently task-related activations from other physiological and artifactual components that contribute to functional MRI (fMRI) recordings. Independent component analysis (ICA) was used to analyze two fMRI data sets from a subject performing 6-min trials composed of alternating 40-sec Stroop color-naming and control task blocks. Each component consisted of a fixed three-dimensional spatial distribution of brain voxel values (a “map”) and an associated time course of activation. For each trial, the algorithm detected, without a priori knowledge of their spatial or temporal structure, one consistently task-related component activated during each Stroop task block, plus several transiently task-related components activated at the onset of one or two of the Stroop task blocks only. Activation patterns occurring during only part of the fMRI trial are not observed with other techniques, because their time courses cannot easily be known in advance. Other ICA components were related to physiological pulsations, head movements, or machine noise. By using higher-order statistics to specify stricter criteria for spatial independence between component maps, ICA produced improved estimates of the temporal and spatial extent of task-related activation in our data compared with principal component analysis (PCA). ICA appears to be a promising tool for exploratory analysis of fMRI data, particularly when the time courses of activation are not known in advance.
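The fMRI variant is spatial ICA: independence is imposed on the component maps rather than on the time courses, which are therefore free to capture transient, block-specific activations. A sketch on synthetic data, again with FastICA standing in for the authors' algorithm:

```python
# Sketch of spatial ICA for fMRI: each component is an independent spatial
# map paired with an unconstrained activation time course. Shapes are
# illustrative placeholders.
import numpy as np
from sklearn.decomposition import FastICA

n_timepoints, n_voxels = 360, 20000
bold = np.random.randn(n_timepoints, n_voxels)   # stand-in for fMRI data

# Voxels act as samples and time points as features, so independence is
# enforced between the spatial maps (spatial ICA).
ica = FastICA(n_components=20, random_state=0)
maps = ica.fit_transform(bold.T).T               # (20, n_voxels) maps
time_courses = ica.mixing_                       # (n_timepoints, 20)
```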
Abstract:
This work evaluates the physical-mechanical performance and durability of sugarcane bagasse particleboards bonded with a two-component castor-oil-based resin (BCP) and compares them with commercial wood particleboards (Medium Density Particleboard, MDP). The sugarcane bagasse panels were manufactured with a castor-oil-based polyurethane resin content of 15%. The physical and mechanical performance of the particleboards was analysed according to the requirements of the current standards. Both materials were surface-coated with the two-component castor-oil-based polyurethane resin, and the influence of edge treatment on the deterioration and performance of the panels was evaluated. The physical-mechanical properties were monitored before and after ageing tests consisting of natural exposure for 3, 6 and 12 months, accelerated ageing and artificial weathering. The susceptibility of the materials to colonization by mould and decay fungi was assessed during natural ageing and in the accelerated test. Colorimetric analysis was performed to identify changes in colour and gloss after the deterioration tests, and X-ray densitometry and near-infrared (NIR) spectroscopy were also employed. The results indicated that the edge sealing made it possible to evaluate the exposed surface of the material, with water entering through that surface, and thus to assess the effect of the deterioration agents. The retention of the modulus of rupture (MOR) after the soak-and-dry ageing test (APA D1) was 87% and 3% for uncoated BCP and MDP, respectively, and 90% and 3% for coated BCP and MDP. The retention of the mechanical properties of both materials under natural exposure decreased over time; the MOR retention for surface-coated BCP and MDP was 76% and 60%, respectively. Natural exposure showed that mould fungi predominated on both materials, and both surface-coated materials showed between 1% and 10% colonization with a 70% probability. The castor-oil resin coating reduced fungal growth on both materials in the accelerated test. The densitometry profile made it possible to analyse the panel manufacturing process and to identify the gradual deterioration of both materials after the ageing tests. Interpretation of the NIR data by principal component analysis (PCA) allowed the characteristics associated with each deterioration test to be classified for both materials without surface coating. Based on these results, contributions were proposed for adjusting methodologies for evaluating the durability and the physical and mechanical performance of particleboards, with a view to their technical viability in building construction systems.
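The NIR-plus-PCA step described above amounts to projecting each specimen's spectrum onto a few principal components and checking whether specimens group by deterioration test. A minimal sketch, assuming a hypothetical CSV of spectra and labels:

```python
# Sketch of PCA applied to NIR spectra for grouping specimens by ageing
# test. The file name, wavelength columns and label column are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

spectra = pd.read_csv("nir_spectra.csv")      # rows: specimens, cols: bands
labels = spectra.pop("ageing_test")           # deterioration test per specimen

scores = PCA(n_components=3).fit_transform(
    StandardScaler().fit_transform(spectra.to_numpy()))

# Mean PC1/PC2 score per test: well-separated means suggest the spectra
# carry a signature of each deterioration regime.
for test in labels.unique():
    print(test, scores[(labels == test).to_numpy(), :2].mean(axis=0))
```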
Abstract:
This work proposes an innovation model for the women's fashion industry. The model seeks to understand the behaviour of styles and trends defined and diffused by companies. Building such a model is justified by the contribution that a study of innovation can make to the fashion industry, which faces low standards of competitiveness in both foreign and domestic markets. Moreover, although there are many articles on the subject, this research found few innovation models for the fashion industry, and an evaluation of those models indicated that there is room for one that addresses the behaviour of styles and trends over time. The model rests on three conceptual pillars: neo-Schumpeterian economic theory, innovation models, and innovation models for the fashion industry. Its central feature is to assess whether there are styles that remain in fashion continuously or discontinuously. Since some styles are conceptually similar with respect to gender identity (androgyny and femininity), these styles were merged under that denomination; styles that did not fit this classification were labelled neutral. As the research has a phenomenological, qualitative and longitudinal approach, a hypothetical-deductive methodology was adopted to build the model. To test the hypotheses, an exploratory data analysis was performed using descriptive statistics and a decomposition of the variability structure through principal component analysis (PCA). Both analyses provided evidence for the hypotheses in question, which were also tested with a binomial test and a permutation-based multivariate analysis of variance. The results confirmed that some styles remain in fashion continuously and that there are periods of polarization of the style groupings.
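To make the statistical step concrete, here is a small sketch of the exploratory PCA plus a binomial test of style persistence; the data file, the "feminine" column and the season index are hypothetical placeholders, not the thesis's actual data.

```python
# Sketch of the exploratory analysis: PCA of style-frequency data per season
# and a binomial test of whether a style stays in fashion "continuously".
import pandas as pd
from scipy.stats import binomtest
from sklearn.decomposition import PCA

styles = pd.read_csv("style_frequencies.csv", index_col="season")  # hypothetical

pca = PCA(n_components=2)
scores = pca.fit_transform(styles.to_numpy())
print(pca.explained_variance_ratio_)     # variability captured by PC1/PC2

# Does the style appear in significantly more than half of the seasons?
present = int((styles["feminine"] > 0).sum())
print(binomtest(present, n=len(styles), p=0.5, alternative="greater"))
```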
Abstract:
Deformable template models are applied to track the inner wall of coronary arteries in intravascular ultrasound sequences, mainly to assist angioplasty surgery. A circular template is used to initialize an elliptical deformable model that tracks wall deformation as a balloon placed at the tip of the catheter is inflated. We define a new energy function to drive the behavior of the template and test its robustness on both real and synthetic images. Finally, we introduce a framework for learning and recognizing spatio-temporal geometric constraints based on Principal Component Analysis (eigenconstraints).
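One way to read "eigenconstraints" is as a PCA subspace learned from the tracked template parameters, with deviation from that subspace acting as a penalty. A minimal sketch under that assumption, with illustrative shapes and values:

```python
# Sketch of learning eigenconstraints: PCA over vectors of ellipse
# parameters tracked across frames; distance from the learned deformation
# subspace can then penalise implausible template configurations.
import numpy as np
from sklearn.decomposition import PCA

# One row per frame: ellipse centre (x, y), axes (a, b) and orientation.
params = np.random.randn(200, 5)          # stand-in for tracked templates

pca = PCA(n_components=3).fit(params)

def constraint_energy(p: np.ndarray) -> float:
    """Squared distance from the subspace of principal deformations."""
    reconstructed = pca.inverse_transform(pca.transform(p[None, :]))
    return float(np.sum((p - reconstructed) ** 2))
```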
Abstract:
A combined chemometrics-metabolomics approach [excitation–emission matrix (EEM) fluorescence spectroscopy, nuclear magnetic resonance (NMR) and high performance liquid chromatography–mass spectrometry (HPLC–MS)] was used to analyse the rhizodeposition of the tritrophic system: tomato, the plant-parasitic nematode Meloidogyne javanica and the nematode-egg parasitic fungus Pochonia chlamydosporia. Exudates from M. javanica-infected roots were sampled at root penetration (early) and gall development (late). EEM indicated that late root exudates from M. javanica treatments contained more aromatic amino acid compounds than the rest (control, P. chlamydosporia, or P. chlamydosporia and M. javanica). 1H NMR showed that organic acids (acetate, lactate, malate, succinate and formic acid) and one unassigned aromatic compound (peak no. 22) were the most relevant metabolites in root exudates. Robust principal component analysis (PCA) grouped early exudates by nematode (PC1) or fungus presence (PC3). PCA found (PC1, 73.31 %) increased acetate and reduced lactate and unassigned peak no. 22, characteristic of M. javanica root exudates resulting from nematode invasion and feeding. An increase of peak no. 22 (PC3, 4.82 %), characteristic of P. chlamydosporia exudates, could be a plant “primer” defence. In late exudates, the presence of the nematode grouped the samples in PC3 (8.73 %). HPLC–MS determined rhizosphere fingerprints of 16 (early) and 25 (late) m/z signals, respectively. Late signals were exclusive to M. javanica exudates, confirming the EEM and 1H NMR results. A 235 m/z signal reduced in M. javanica root exudates (early and late) could be a repressed plant defense. This metabolomic approach and other rhizosphere -omics studies could help to improve plant growth and reduce nematode damage sustainably.
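The grouping step above can be sketched as PCA of a samples-by-metabolites table. scikit-learn has no robust PCA variant, so a RobustScaler plus ordinary PCA stands in for it here; file and column names are hypothetical.

```python
# Sketch of grouping root-exudate samples by (robust-scaled) PCA; a stand-in
# for the robust PCA used in the study. Names are hypothetical.
import pandas as pd
from sklearn.preprocessing import RobustScaler
from sklearn.decomposition import PCA

nmr = pd.read_csv("exudate_metabolites.csv")   # rows: samples, cols: peaks
treatment = nmr.pop("treatment")               # control / Mj / Pc / Pc+Mj

pca = PCA(n_components=3)
scores = pca.fit_transform(RobustScaler().fit_transform(nmr.to_numpy()))
print(pca.explained_variance_ratio_)           # e.g. a dominant PC1

# Largest-magnitude PC1 loadings reveal which metabolites (e.g. acetate,
# lactate, peak 22) drive the nematode/control separation.
print(pd.Series(pca.components_[0], index=nmr.columns).abs().nlargest(5))
```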
Abstract:
We examine the quantitative composition of benthic foraminiferal assemblages of Rose Bengal-stained surface samples from 37 stations in the Laptev Sea, and combine this data set with an existing data set along a transect from Spitsbergen to the central Arctic Ocean. Foraminiferal test accumulation rates, diversity, faunal composition and statistically defined foraminiferal associations are analysed for living (Rose Bengal-stained) and dead foraminifers. We compare the results of several benthic foraminiferal diversity indices and statistically defined foraminiferal associations, including Fisher's alpha and Shannon-Wiener diversity indices, Q-mode principal component analysis and correspondence analysis. Diversity and faunal density (standing stock) of living benthic foraminifers are positively correlated to trophic resources. In contrast, the accumulation rate of dead foraminifers (BFAR) shows fluctuating values depending on test disintegration processes. Foraminiferal associations defined by Q-mode principal component analysis and correspondence analysis are comparable. The factor values of the correspondence analysis allow a quantitative correlation between the foraminiferal fauna and the local carbon flux, which may be used as a tool to estimate changes in primary productivity.
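Two of the quantitative steps described here are easy to make concrete: the Shannon-Wiener diversity index per station and a Q-mode PCA in which stations, not taxa, are the variables. A sketch with a synthetic count matrix:

```python
# Sketch of Shannon-Wiener diversity and Q-mode PCA on a stations-by-taxa
# count matrix. The counts are synthetic placeholders.
import numpy as np
from sklearn.decomposition import PCA

counts = np.random.randint(0, 50, size=(37, 60))   # 37 stations x 60 taxa

# Shannon-Wiener index H' = -sum(p_i ln p_i) over taxa, per station.
p = counts / counts.sum(axis=1, keepdims=True)
p_safe = np.where(p > 0, p, 1.0)                   # avoid log(0)
shannon = -(p * np.log(p_safe)).sum(axis=1)

# Q-mode PCA: taxa act as observations and stations as variables, so the
# component loadings group the stations by assemblage similarity.
q_pca = PCA(n_components=4).fit(100 * p.T)         # relative abundances (%)
station_loadings = q_pca.components_.T             # (37 stations, 4 PCs)
```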
Abstract:
Edaphic factors affect the quality of onions (Allium cepa). Two experiments were carried out in the field and glasshouse to investigate the effects of N (field: 0, 120 kg ha⁻¹; glasshouse: 0, 108 kg ha⁻¹), S (field: 0, 20 kg ha⁻¹; glasshouse: 0, 4.35 kg ha⁻¹) and soil type (clay, sandy loam) on onion quality. A conducting polymer sensor electronic nose (E-nose) was used to classify onion headspace volatiles. Relative changes in the E-nose sensor resistance ratio (%dR/R) were reduced following N and S fertilisation. A 2D Principal Component Analysis (PCA) of the E-nose data sets accounted for c. 100% of the variation in onion headspace volatiles in both experiments. For the field experiment, E-nose data set clusters for headspace volatiles of no-N onions overlapped (D² = 1.0) irrespective of S treatment. Headspace volatiles of N-fertilised onions on the glasshouse sandy loam also overlapped (D² = 1.1) irrespective of S treatment, in contrast to the distinct separations among clusters for the clay soil. N fertilisation significantly (P < 0.01) reduced onion bulb pyruvic acid concentration (flavour) in both experiments. S fertilisation increased pyruvic acid concentration significantly (P < 0.01) in the glasshouse experiment, especially on the clay soil, but had no effect on pyruvic acid concentration in the field. N and S fertilisation significantly (P < 0.01) increased lachrymatory potency (pungency), but reduced total soluble solids (TSS) content in the field experiment. In the glasshouse experiment, N and S had no effect on TSS. TSS content was increased on the clay by 1.2-fold as compared with the sandy loam. Onion tissue N : water-soluble SO₄²⁻ ratios of between five and eight were associated with greater %dR/R and pyruvic acid concentration values. N did not affect inner bulb tissue microbial load. In contrast, S fertilisation reduced inner bulb tissue microbial load by 80% in the field experiment and by between 27% (sandy loam) and 92% (clay) in the glasshouse experiment. Overall, onion bulb quality discriminated by the E-nose responded to N, S and soil type treatments, and reflected their interactions. However, the conventional analytical and sensory measures of onion quality did not correlate with %dR/R.
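The cluster-overlap statistic quoted above is a squared Mahalanobis distance (D²) between treatment groups in PCA space; small values mean overlapping clusters. A sketch with synthetic sensor data:

```python
# Sketch of the E-nose analysis: PCA of sensor responses, then Mahalanobis
# D^2 between treatment-group centroids. Sensor data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import mahalanobis

resp = np.random.randn(40, 32)        # 40 samples x 32 sensor %dR/R values
treat = np.repeat([0, 1], 20)         # e.g. no-N vs N-fertilised onions

scores = PCA(n_components=2).fit_transform(resp)
a, b = scores[treat == 0], scores[treat == 1]

# Pooled covariance of the two clusters, then squared Mahalanobis distance
# between their centroids.
cov = np.cov(np.vstack([a - a.mean(0), b - b.mean(0)]).T)
d2 = mahalanobis(a.mean(0), b.mean(0), np.linalg.inv(cov)) ** 2
print(f"D^2 = {d2:.2f}")              # ~1 would indicate heavy overlap
```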
Abstract:
Biological wastewater treatment is a complex, multivariate process in which a number of physical and biological processes occur simultaneously. In this study, principal component analysis (PCA) and parallel factor analysis (PARAFAC) were used to profile and characterise Lagoon 115E, a multistage biological lagoon treatment system at Melbourne Water's Western Treatment Plant (WTP) in Melbourne, Australia, with the objective of increasing our understanding of the multivariate processes taking place in the lagoon. The data used in the study span a 7-year period during which samples were collected as often as weekly from the ponds of Lagoon 115E and subjected to analysis. The resulting database, involving 19 chemical and physical variables, was studied using the multivariate data analysis methods PCA and PARAFAC. With these methods, alterations in the state of the wastewater due to intrinsic and extrinsic factors could be discerned. The methods were effective in illustrating and visually representing the complex purification stages and cyclic changes occurring along the lagoon system. The two methods proved complementary, with each having its own beneficial features.
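Where PCA flattens the measurements into a samples-by-variables matrix, PARAFAC keeps the natural three-way structure (pond x variable x time). A minimal sketch using the tensorly library, with a synthetic array standing in for the lagoon data:

```python
# Sketch of a rank-3 PARAFAC decomposition of a pond x variable x time
# array, as a complement to PCA. Uses tensorly; data are synthetic.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

data = tl.tensor(np.random.rand(10, 19, 52))   # ponds x 19 variables x weeks

# One factor matrix per mode, jointly describing how chemical profiles
# vary along the lagoon and over the year.
cp = parafac(data, rank=3, n_iter_max=200)
ponds, variables, weeks = cp.factors
print(ponds.shape, variables.shape, weeks.shape)   # (10,3) (19,3) (52,3)
```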
Abstract:
Genotype, sulphur (S) nutrition and soil-type effects on spring onion quality were assessed using a 32-conducting-polymer-sensor E-nose. Relative changes in sensor resistance ratio (% dR/R) varied among eight spring onion genotypes. The % dR/R was reduced by S application in four of the eight genotypes; among the other four, S application gave no change in % dR/R in three and increased % dR/R in the other. E-nose classification of headspace volatiles by a two-dimensional principal component analysis (PCA) plot for spring onion genotypes differed for S fertilisation vs. no S fertilisation. Headspace volatiles data set clusters for cv. 'White Lisbon' grown on clay or on sandy loam overlapped when 2.9 [Mahalanobis distance value (D²) = 1.6] or 5.8 (D² = 0.3) kg S ha⁻¹ was added. In contrast, clear separation (D² = 7.5) was recorded for headspace volatile clusters for 0 kg S ha⁻¹ on clay vs. sandy loam. Addition of 5.8 kg S ha⁻¹ increased pyruvic acid content (mmol g⁻¹ fresh weight) by 1.7-fold on average across the eight genotypes. However, increasing S from 2.9 to 5.8 kg ha⁻¹ did not significantly (P > 0.05) influence % dR/R, % dry matter (DM) or total soluble solids (TSS) contents, but significantly (P < 0.05) increased pyruvic acid content. TSS was significantly (P < 0.05) reduced by S addition, while % DM was unaffected. In conclusion, the 32-conducting-polymer E-nose discerned differences in spring onion quality attributable to genotype and to variations in growing conditions, as shown by the significant (P < 0.05) interaction effects for % dR/R.
Abstract:
Most face recognition systems only work well under quite constrained environments. In particular, the illumination conditions, facial expressions and head pose must be tightly controlled for good recognition performance. In 2004, we proposed a new face recognition algorithm, Adaptive Principal Component Analysis (APCA) [4], which performs well against both lighting variation and expression change. But like other eigenface-derived face recognition algorithms, APCA only performs well with frontal face images. The work presented in this paper extends our previous work to also accommodate variations in head pose. Following the approach of Cootes et al., we develop a face model and a rotation model which can be used to interpret facial features and synthesize realistic frontal face images when given a single novel face image. We use a Viola-Jones based face detector to detect the face in real time and thus solve the initialization problem for our Active Appearance Model search. Experiments show that our approach can achieve good recognition rates on face images across a wide range of head poses. Indeed, recognition rates are improved by up to a factor of 5 compared to standard PCA.
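The front end of such a pipeline, Viola-Jones detection followed by projection onto an eigenface (PCA) subspace, can be sketched with OpenCV and scikit-learn. APCA itself (the adaptive per-feature whitening against lighting and expression) is not reproduced here; the probe image and the gallery are placeholders, and the cascade file is the stock one shipped with OpenCV.

```python
# Sketch of Viola-Jones face detection plus eigenface projection; a stand-in
# front end for the APCA pipeline, not the APCA algorithm itself.
import cv2
import numpy as np
from sklearn.decomposition import PCA

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

gallery = np.random.rand(100, 64 * 64)   # stand-in for aligned face images
eigenfaces = PCA(n_components=50).fit(gallery)

img = cv2.imread("probe.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical probe
for (x, y, w, h) in cascade.detectMultiScale(img, 1.1, 5):
    face = cv2.resize(img[y:y + h, x:x + w], (64, 64)).reshape(1, -1) / 255.0
    coeffs = eigenfaces.transform(face)
    # match `coeffs` against projected gallery faces (e.g. nearest neighbour)
```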
Abstract:
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecule physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help domain experts (screening scientists, chemists, biologists, etc.) understand the data and make meaningful decisions. We also benchmark these principled methods against better-known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets encountered in the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool lets the domain experts explore the projections obtained from the visualization algorithms, providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry-standard software.
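GTM and HGTM have no stock scikit-learn implementation, so this sketch only reproduces two of the paper's benchmarks: a linear PCA projection and a SOM, the latter via the third-party MiniSom package. The compound descriptor matrix is a synthetic placeholder.

```python
# Sketch of the PCA and SOM benchmark projections for compound data.
import numpy as np
from sklearn.decomposition import PCA
from minisom import MiniSom

X = np.random.rand(500, 12)      # compounds x activity/property descriptors

pca_map = PCA(n_components=2).fit_transform(X)   # linear 2-D projection

som = MiniSom(10, 10, X.shape[1], sigma=1.0, learning_rate=0.5,
              random_seed=0)
som.train_random(X, 1000)
# Each compound maps to its best-matching unit on the 10x10 grid.
som_map = np.array([som.winner(x) for x in X])
```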
Abstract:
A new principled domain-independent watermarking framework is presented. The new approach is based on embedding the message in statistically independent sources of the covertext to minimise covertext distortion, maximise the information embedding rate and improve the method's robustness against various attacks. Experiments comparing the performance of the new approach under several standard attacks show the proposed approach to be competitive with other state-of-the-art domain-specific methods.
Abstract:
A novel approach to watermarking of audio signals using Independent Component Analysis (ICA) is proposed. It exploits the statistical independence of components obtained by practical ICA algorithms to provide a robust watermarking scheme with high information rate and low distortion. Numerical simulations have been performed on audio signals, showing good robustness of the watermark against common attacks with unnoticeable distortion, even for high information rates. An important aspect of the method is its domain independence: it can be used to hide information in other types of data, with minor technical adaptations.
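The core idea, embedding a low-amplitude message in independent components and remixing, can be illustrated in a few lines. This is a toy sketch under strong assumptions (synthetic audio, simple block framing, no synchronisation or error-coding layer), not the authors' scheme:

```python
# Toy sketch of ICA-domain watermarking: decompose framed audio with ICA,
# add a weak watermark to one independent source, and remix.
import numpy as np
from sklearn.decomposition import FastICA

# Frame a mono signal into blocks that act as mixed "channels".
audio = np.random.randn(8 * 4096)             # stand-in for real audio
blocks = audio.reshape(8, 4096)               # 8 channels x 4096 samples

ica = FastICA(n_components=8, random_state=0)
sources = ica.fit_transform(blocks.T)         # (4096, 8) independent sources

watermark = np.sign(np.random.randn(4096))    # +/-1 message chips
sources[:, 0] += 0.01 * watermark             # low-distortion embedding

# Remix the marked sources back into a time-domain signal.
marked = ica.inverse_transform(sources).T.reshape(-1)
```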