908 resultados para NIR spectroscopy. Hair. Forensic analysis. PCA. Nicotine
Resumo:
We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand the regional linguistic variation in the U.S. Prior work on regional linguistic variations usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using the principal component analysis (PCA). Finally a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting linguistic regional variations in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
Resumo:
The use of quantitative methods has become increasingly important in the study of neuropathology and especially in neurodegenerative disease. Disorders such as Alzheimer's disease (AD) and the frontotemporal dementias (FTD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This chapter reviews the advantages and limitations of the different methods of quantifying pathological lesions in histological sections including estimates of density, frequency, coverage, and the use of semi-quantitative scores. The sampling strategies by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are described. In addition, data analysis methods commonly used to analysis quantitative data in neuropathology, including analysis of variance (ANOVA), polynomial curve fitting, multiple regression, classification trees, and principal components analysis (PCA), are discussed. These methods are illustrated with reference to quantitative studies of a variety of neurodegenerative disorders.
Resumo:
The objectives of this research are to analyze and develop a modified Principal Component Analysis (PCA) and to develop a two-dimensional PCA with applications in image processing. PCA is a classical multivariate technique where its mathematical treatment is purely based on the eigensystem of positive-definite symmetric matrices. Its main function is to statistically transform a set of correlated variables to a new set of uncorrelated variables over $\IR\sp{n}$ by retaining most of the variations present in the original variables.^ The variances of the Principal Components (PCs) obtained from the modified PCA form a correlation matrix of the original variables. The decomposition of this correlation matrix into a diagonal matrix produces a set of orthonormal basis that can be used to linearly transform the given PCs. It is this linear transformation that reproduces the original variables. The two-dimensional PCA can be devised as a two successive of one-dimensional PCA. It can be shown that, for an $m\times n$ matrix, the PCs obtained from the two-dimensional PCA are the singular values of that matrix.^ In this research, several applications for image analysis based on PCA are developed, i.e., edge detection, feature extraction, and multi-resolution PCA decomposition and reconstruction. ^
Resumo:
The Internet has become an integral part of our nation’s critical socio-economic infrastructure. With its heightened use and growing complexity however, organizations are at greater risk of cyber crimes. To aid in the investigation of crimes committed on or via the Internet, a network forensics analysis tool pulls together needed digital evidence. It provides a platform for performing deep network analysis by capturing, recording and analyzing network events to find out the source of a security attack or other information security incidents. Existing network forensics work has been mostly focused on the Internet and fixed networks. But the exponential growth and use of wireless technologies, coupled with their unprecedented characteristics, necessitates the development of new network forensic analysis tools. This dissertation fostered the emergence of a new research field in cellular and ad-hoc network forensics. It was one of the first works to identify this problem and offer fundamental techniques and tools that laid the groundwork for future research. In particular, it introduced novel methods to record network incidents and report logged incidents. For recording incidents, location is considered essential to documenting network incidents. However, in network topology spaces, location cannot be measured due to absence of a ‘distance metric’. Therefore, a novel solution was proposed to label locations of nodes within network topology spaces, and then to authenticate the identity of nodes in ad hoc environments. For reporting logged incidents, a novel technique based on Distributed Hash Tables (DHT) was adopted. Although the direct use of DHTs for reporting logged incidents would result in an uncontrollably recursive traffic, a new mechanism was introduced that overcome this recursive process. These logging and reporting techniques aided forensics over cellular and ad-hoc networks, which in turn increased their ability to track and trace attacks to their source. These techniques were a starting point for further research and development that would result in equipping future ad hoc networks with forensic components to complement existing security mechanisms.
Resumo:
This dissertation establishes a novel data-driven method to identify language network activation patterns in pediatric epilepsy through the use of the Principal Component Analysis (PCA) on functional magnetic resonance imaging (fMRI). A total of 122 subjects’ data sets from five different hospitals were included in the study through a web-based repository site designed here at FIU. Research was conducted to evaluate different classification and clustering techniques in identifying hidden activation patterns and their associations with meaningful clinical variables. The results were assessed through agreement analysis with the conventional methods of lateralization index (LI) and visual rating. What is unique in this approach is the new mechanism designed for projecting language network patterns in the PCA-based decisional space. Synthetic activation maps were randomly generated from real data sets to uniquely establish nonlinear decision functions (NDF) which are then used to classify any new fMRI activation map into typical or atypical. The best nonlinear classifier was obtained on a 4D space with a complexity (nonlinearity) degree of 7. Based on the significant association of language dominance and intensities with the top eigenvectors of the PCA decisional space, a new algorithm was deployed to delineate primary cluster members without intensity normalization. In this case, three distinct activations patterns (groups) were identified (averaged kappa with rating 0.65, with LI 0.76) and were characterized by the regions of: (1) the left inferior frontal Gyrus (IFG) and left superior temporal gyrus (STG), considered typical for the language task; (2) the IFG, left mesial frontal lobe, right cerebellum regions, representing a variant left dominant pattern by higher activation; and (3) the right homologues of the first pattern in Broca's and Wernicke's language areas. Interestingly, group 2 was found to reflect a different language compensation mechanism than reorganization. Its high intensity activation suggests a possible remote effect on the right hemisphere focus on traditionally left-lateralized functions. In retrospect, this data-driven method provides new insights into mechanisms for brain compensation/reorganization and neural plasticity in pediatric epilepsy.
Resumo:
The Internet has become an integral part of our nation's critical socio-economic infrastructure. With its heightened use and growing complexity however, organizations are at greater risk of cyber crimes. To aid in the investigation of crimes committed on or via the Internet, a network forensics analysis tool pulls together needed digital evidence. It provides a platform for performing deep network analysis by capturing, recording and analyzing network events to find out the source of a security attack or other information security incidents. Existing network forensics work has been mostly focused on the Internet and fixed networks. But the exponential growth and use of wireless technologies, coupled with their unprecedented characteristics, necessitates the development of new network forensic analysis tools. This dissertation fostered the emergence of a new research field in cellular and ad-hoc network forensics. It was one of the first works to identify this problem and offer fundamental techniques and tools that laid the groundwork for future research. In particular, it introduced novel methods to record network incidents and report logged incidents. For recording incidents, location is considered essential to documenting network incidents. However, in network topology spaces, location cannot be measured due to absence of a 'distance metric'. Therefore, a novel solution was proposed to label locations of nodes within network topology spaces, and then to authenticate the identity of nodes in ad hoc environments. For reporting logged incidents, a novel technique based on Distributed Hash Tables (DHT) was adopted. Although the direct use of DHTs for reporting logged incidents would result in an uncontrollably recursive traffic, a new mechanism was introduced that overcome this recursive process. These logging and reporting techniques aided forensics over cellular and ad-hoc networks, which in turn increased their ability to track and trace attacks to their source. These techniques were a starting point for further research and development that would result in equipping future ad hoc networks with forensic components to complement existing security mechanisms.
Resumo:
This dissertation establishes a novel data-driven method to identify language network activation patterns in pediatric epilepsy through the use of the Principal Component Analysis (PCA) on functional magnetic resonance imaging (fMRI). A total of 122 subjects’ data sets from five different hospitals were included in the study through a web-based repository site designed here at FIU. Research was conducted to evaluate different classification and clustering techniques in identifying hidden activation patterns and their associations with meaningful clinical variables. The results were assessed through agreement analysis with the conventional methods of lateralization index (LI) and visual rating. What is unique in this approach is the new mechanism designed for projecting language network patterns in the PCA-based decisional space. Synthetic activation maps were randomly generated from real data sets to uniquely establish nonlinear decision functions (NDF) which are then used to classify any new fMRI activation map into typical or atypical. The best nonlinear classifier was obtained on a 4D space with a complexity (nonlinearity) degree of 7. Based on the significant association of language dominance and intensities with the top eigenvectors of the PCA decisional space, a new algorithm was deployed to delineate primary cluster members without intensity normalization. In this case, three distinct activations patterns (groups) were identified (averaged kappa with rating 0.65, with LI 0.76) and were characterized by the regions of: 1) the left inferior frontal Gyrus (IFG) and left superior temporal gyrus (STG), considered typical for the language task; 2) the IFG, left mesial frontal lobe, right cerebellum regions, representing a variant left dominant pattern by higher activation; and 3) the right homologues of the first pattern in Broca's and Wernicke's language areas. Interestingly, group 2 was found to reflect a different language compensation mechanism than reorganization. Its high intensity activation suggests a possible remote effect on the right hemisphere focus on traditionally left-lateralized functions. In retrospect, this data-driven method provides new insights into mechanisms for brain compensation/reorganization and neural plasticity in pediatric epilepsy.
Resumo:
The trioxsalen (Tri) is a low-dose drug used in the treatment of psoriasis and other skin diseases. The aim of the study was applying the thermal analysis and complementary techniques for characterization, evaluation of the trioxsalen stability and components of manipulated pharmaceutical formulations. The thermal behavior of the Tri by TG/DTG-DTA in dynamic atmosphere of synthetic air and nitrogen showed the same profile with a melting peak followed by a volatilization-related event. From the curves TG / DTG is observed a single stage of mass loss. By heating the drug in the stove at temperatures of 80, 240 and 260 °C, it had no change in chemical structure through the techniques of XRD, HPLC, MIR, OM and SEM. From the non-isothermal and isothermal TG kinetic studies was possible to calculate the activation energy and reaction order for the Tri. The drug showed good thermal stability. Studies on drug-excipient compatibility showed interaction of trissoralen with sodium lauryl sulfate 1:1. There was no interaction with aerosol, pregelatinized starch, sodium starch glycolate, cellulose, croscarmellose sodium, magnesium stearate, lactose and mannitol.The characterization of three trioxsalen formulations at concentrations of 2.5, 5, 7.5, 10, 12.5 and 15 mg was performed by DSC, TG / DTG, XRD, NIR and MIR. The PCA classification method based on spectral data from the NIR and MIR of trissoralen formulations allows successful differentiation into three groups. The formulation 3 was the one that best showed analytical profile with the following composition of aerosil excipients, pre-gelatinized starch and cellulose. The activation energy of the volatilization process of the drug was determined in binary mixtures and formulation 3 through fitting and isoconversional methods. The binary mixture with sodium starch glycolate and lactose showed differences in kinetic parameters compared to the drug isolated. The thermoanalytical techniques (DSC and TG / DTG) were shown to be promising methodologies for quantifying trioxsalen obtained by the linearity, selectivity, no use solvents, without sample preparation, speed and practicality.
Resumo:
This work was developed with the objective of proposing a simple, fast and versatile methodological routine using near-infrared spectroscopy (NIR) combined with multivariate analysis for the determination of ash content, moisture, protein and total lipids present in the gray shrimp (Litopenaeus vannamei ) which is conventionally performed gravimetrically after ashing at 550 ° C gravimetrically after drying at 105 ° C for the determination of moisture gravimetrically after a Soxhlet extraction using volumetric and after digestion and distillation Kjedhal respectively. Was first collected the spectra of 63 samples processed boiled shrimp Litopenaeus vannamei species. Then, the determinations by conventional standard methods were carried out. The spectra centered average underwent multiplicative scattering correction of light, smoothing Saviztky-Golay 15 points and first derivative, eliminated the noisy region, the working range was from 1100,36 to 2502,37 nm. Thus, the PLS models for predicting ash showed R 0,9471; 0,1017 and RMSEP RMSEC 0,1548; Moisture R was 0,9241; 2,5483 and RMSEP RMSEC 4,1979; R protein to 0,9201; 1,9391 and RMSEP RMSEC 2,7066; for lipids R 0,8801; 0,2827 and RMSEP RMSEC 0,2329 So that the results showed that the relative errors found between the reference method and the NIR were small and satisfactory. These results are an excellent indication that you can use the NIR to these analyzes, which is quite advantageous, since conventional techniques are time consuming, they spend a lot of reagents and involve a number of professionals, which requires a reasonable runtime while after the validation of the methodology execution using NIR reduces all this time to a few minutes, saving reagents, time and without waste generation, and that this is a non-destructive technique.
Resumo:
Finite-Differences Time-Domain (FDTD) algorithms are well established tools of computational electromagnetism. Because of their practical implementation as computer codes, they are affected by many numerical artefact and noise. In order to obtain better results we propose using Principal Component Analysis (PCA) based on multivariate statistical techniques. The PCA has been successfully used for the analysis of noise and spatial temporal structure in a sequence of images. It allows a straightforward discrimination between the numerical noise and the actual electromagnetic variables, and the quantitative estimation of their respective contributions. Besides, The GDTD results can be filtered to clean the effect of the noise. In this contribution we will show how the method can be applied to several FDTD simulations: the propagation of a pulse in vacuum, the analysis of two-dimensional photonic crystals. In this last case, PCA has revealed hidden electromagnetic structures related to actual modes of the photonic crystal.
Resumo:
This work outlines the theoretical advantages of multivariate methods in biomechanical data, validates the proposed methods and outlines new clinical findings relating to knee osteoarthritis that were made possible by this approach. New techniques were based on existing multivariate approaches, Partial Least Squares (PLS) and Non-negative Matrix Factorization (NMF) and validated using existing data sets. The new techniques developed, PCA-PLS-LDA (Principal Component Analysis – Partial Least Squares – Linear Discriminant Analysis), PCA-PLS-MLR (Principal Component Analysis – Partial Least Squares –Multiple Linear Regression) and Waveform Similarity (based on NMF) were developed to address the challenging characteristics of biomechanical data, variability and correlation. As a result, these new structure-seeking technique revealed new clinical findings. The first new clinical finding relates to the relationship between pain, radiographic severity and mechanics. Simultaneous analysis of pain and radiographic severity outcomes, a first in biomechanics, revealed that the knee adduction moment’s relationship to radiographic features is mediated by pain in subjects with moderate osteoarthritis. The second clinical finding was quantifying the importance of neuromuscular patterns in brace effectiveness for patients with knee osteoarthritis. I found that brace effectiveness was more related to the patient’s unbraced neuromuscular patterns than it was to mechanics, and that these neuromuscular patterns were more complicated than simply increased overall muscle activity, as previously thought.
Resumo:
Quantitative methods can help us understand how underlying attributes contribute to movement patterns. Applying principal components analysis (PCA) to whole-body motion data may provide an objective data-driven method to identify unique and statistically important movement patterns. Therefore, the primary purpose of this study was to determine if athletes’ movement patterns can be differentiated based on skill level or sport played using PCA. Motion capture data from 542 athletes performing three sport-screening movements (i.e. bird-dog, drop jump, T-balance) were analyzed. A PCA-based pattern recognition technique was used to analyze the data. Prior to analyzing the effects of skill level or sport on movement patterns, methodological considerations related to motion analysis reference coordinate system were assessed. All analyses were addressed as case-studies. For the first case study, referencing motion data to a global (lab-based) coordinate system compared to a local (segment-based) coordinate system affected the ability to interpret important movement features. Furthermore, for the second case study, where the interpretability of PCs was assessed when data were referenced to a stationary versus a moving segment-based coordinate system, PCs were more interpretable when data were referenced to a stationary coordinate system for both the bird-dog and T-balance task. As a result of the findings from case study 1 and 2, only stationary segment-based coordinate systems were used in cases 3 and 4. During the bird-dog task, elite athletes had significantly lower scores compared to recreational athletes for principal component (PC) 1. For the T-balance movement, elite athletes had significantly lower scores compared to recreational athletes for PC 2. In both analyses the lower scores in elite athletes represented a greater range of motion. Finally, case study 4 reported differences in athletes’ movement patterns who competed in different sports, and significant differences in technique were detected during the bird-dog task. Through these case studies, this thesis highlights the feasibility of applying PCA as a movement pattern recognition technique in athletes. Future research can build on this proof-of-principle work to develop robust quantitative methods to help us better understand how underlying attributes (e.g. height, sex, ability, injury history, training type) contribute to performance.
Resumo:
A compositional multivariate approach is used to analyse regional scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20cm depths on a non-aligned grid at one site per 2 km2. Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic, up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis method were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PC’s) implying “significant” structure in the data. Analysis of variance showed that only 10 PC’s were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons and linear discriminant analysis (LDA) undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat covered areas since the creation of superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.
Resumo:
Microsecond long Molecular Dynamics (MD) trajectories of biomolecular processes are now possible due to advances in computer technology. Soon, trajectories long enough to probe dynamics over many milliseconds will become available. Since these timescales match the physiological timescales over which many small proteins fold, all atom MD simulations of protein folding are now becoming popular. To distill features of such large folding trajectories, we must develop methods that can both compress trajectory data to enable visualization, and that can yield themselves to further analysis, such as the finding of collective coordinates and reduction of the dynamics. Conventionally, clustering has been the most popular MD trajectory analysis technique, followed by principal component analysis (PCA). Simple clustering used in MD trajectory analysis suffers from various serious drawbacks, namely, (i) it is not data driven, (ii) it is unstable to noise and change in cutoff parameters, and (iii) since it does not take into account interrelationships amongst data points, the separation of data into clusters can often be artificial. Usually, partitions generated by clustering techniques are validated visually, but such validation is not possible for MD trajectories of protein folding, as the underlying structural transitions are not well understood. Rigorous cluster validation techniques may be adapted, but it is more crucial to reduce the dimensions in which MD trajectories reside, while still preserving their salient features. PCA has often been used for dimension reduction and while it is computationally inexpensive, being a linear method, it does not achieve good data compression. In this thesis, I propose a different method, a nonmetric multidimensional scaling (nMDS) technique, which achieves superior data compression by virtue of being nonlinear, and also provides a clear insight into the structural processes underlying MD trajectories. I illustrate the capabilities of nMDS by analyzing three complete villin headpiece folding and six norleucine mutant (NLE) folding trajectories simulated by Freddolino and Schulten [1]. Using these trajectories, I make comparisons between nMDS, PCA and clustering to demonstrate the superiority of nMDS. The three villin headpiece trajectories showed great structural heterogeneity. Apart from a few trivial features like early formation of secondary structure, no commonalities between trajectories were found. There were no units of residues or atoms found moving in concert across the trajectories. A flipping transition, corresponding to the flipping of helix 1 relative to the plane formed by helices 2 and 3 was observed towards the end of the folding process in all trajectories, when nearly all native contacts had been formed. However, the transition occurred through a different series of steps in all trajectories, indicating that it may not be a common transition in villin folding. The trajectories showed competition between local structure formation/hydrophobic collapse and global structure formation in all trajectories. Our analysis on the NLE trajectories confirms the notion that a tight hydrophobic core inhibits correct 3-D rearrangement. Only one of the six NLE trajectories folded, and it showed no flipping transition. All the other trajectories get trapped in hydrophobically collapsed states. The NLE residues were found to be buried deeply into the core, compared to the corresponding lysines in the villin headpiece, thereby making the core tighter and harder to undo for 3-D rearrangement. Our results suggest that the NLE may not be a fast folder as experiments suggest. The tightness of the hydrophobic core may be a very important factor in the folding of larger proteins. It is likely that chaperones like GroEL act to undo the tight hydrophobic core of proteins, after most secondary structure elements have been formed, so that global rearrangement is easier. I conclude by presenting facts about chaperone-protein complexes and propose further directions for the study of protein folding.