932 results for data reduction by factor analysis
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2016
Abstract:
Sentiment classification over Twitter is usually affected by the noisy nature (abbreviations, irregular forms) of tweet data. A popular procedure to reduce the noise of textual data is to remove stopwords by using pre-compiled stopword lists or more sophisticated methods for dynamic stopword identification. However, the effectiveness of removing stopwords in the context of Twitter sentiment classification has been debated in the last few years. In this paper we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods. To this end, we apply six different stopword identification methods to Twitter data from six different datasets and observe how removing stopwords affects two well-known supervised sentiment classification methods. We assess the impact of removing stopwords by observing fluctuations in the level of data sparsity, the size of the classifier's feature space and its classification performance. Our results show that using pre-compiled lists of stopwords negatively impacts the performance of Twitter sentiment classification approaches. On the other hand, the dynamic generation of stopword lists, by removing those infrequent terms appearing only once in the corpus, appears to be the optimal method for maintaining a high classification performance while reducing the data sparsity and substantially shrinking the feature space.
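The singleton-removal idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the helper name `remove_singletons`, and the toy tweets are all invented for the example.

```python
from collections import Counter


def remove_singletons(docs):
    """Drop every term that occurs exactly once in the whole corpus.

    `docs` is a list of token lists. Hypothetical helper for illustration.
    """
    counts = Counter(token for doc in docs for token in doc)
    return [[t for t in doc if counts[t] > 1] for doc in docs]


# Toy corpus (invented for this example), tokenized by whitespace.
tweets = [
    "love this phone".split(),
    "hate this phone".split(),
    "love the battery lol".split(),
]

filtered = remove_singletons(tweets)

# Feature-space size before and after removing singleton terms.
vocab_before = {t for doc in tweets for t in doc}
vocab_after = {t for doc in filtered for t in doc}
print(len(vocab_before), len(vocab_after))
```

The toy corpus also shows a side effect worth checking on real data: removing singletons can delete sentiment-bearing words (here "hate") when the corpus is very small.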
Abstract:
The purpose of this study was to better understand the study behaviors and habits of university undergraduate students. It was designed to determine whether undergraduate students could be grouped based on their self-reported study behaviors and, if such a grouping could be determined, whether group membership was related to students’ academic achievement. A total of 152 undergraduate students voluntarily participated in the current study by completing the Study Behavior Inventory instrument. All participants were enrolled in the fall semester of 2010 at Florida International University. The Q factor analysis technique using principal components extraction and a varimax rotation was used in order to examine the participants in relation to each other and to detect a pattern of intercorrelations among participants based on their self-reported study behaviors. The Q factor analysis yielded a two-factor structure representing two distinct student types among participants regarding their study behaviors. The first student type (i.e., Factor 1) describes proactive learners who organize both their study materials and study time well. Type 1 students are labeled “Proactive Learners with Well-Organized Study Behaviors”. The second type (i.e., Factor 2) represents students who are poorly organized as well as being very likely to procrastinate. Type 2 students are labeled “Disorganized Procrastinators”. Hierarchical linear regression was employed to examine the relationship between student type and academic achievement as measured by current grade point averages (GPAs). The results showed significant differences in GPAs between Type 1 and Type 2 students at the .05 significance level. Furthermore, student type was found to be a significant predictor of academic achievement above and beyond students’ attribute variables including sex, age, major, and enrollment status.
The study has several implications for educational researchers, practitioners, and policy makers in terms of improving college students' learning behaviors and outcomes.
Abstract:
The recently proposed global monsoon hypothesis interprets monsoon systems as part of one global-scale atmospheric overturning circulation, implying a connection between the regional monsoon systems and an in-phase behaviour of all northern hemispheric monsoons on annual timescales (Trenberth et al., 2000). Whether this concept can be applied to past climates and variability on longer timescales is still under debate, because the monsoon systems exhibit different regional characteristics such as different seasonality (i.e. onset, peak, and withdrawal). To investigate the interconnection of different monsoon systems during the pre-industrial Holocene, five transient global climate model simulations have been analysed with respect to the rainfall trend and variability in different sub-domains of the Afro-Asian monsoon region. Our analysis suggests that on millennial timescales with varying orbital forcing, the monsoons do not behave as a tightly connected global system. According to the models, the Indian and North African monsoons are coupled, showing similar rainfall trend and moderate correlation in rainfall variability in all models. The East Asian monsoon changes independently during the Holocene. The dissimilarities in the seasonality of the monsoon sub-systems lead to a stronger response of the North African and Indian monsoon systems to the Holocene insolation forcing than of the East Asian monsoon and affect the seasonal distribution of Holocene rainfall variations. Within the Indian and North African monsoon domain, precipitation solely changes during the summer months, showing a decreasing Holocene precipitation trend. In the East Asian monsoon region, the precipitation signal is determined by an increasing precipitation trend during spring and a decreasing precipitation change during summer, partly balancing each other. 
A synthesis of reconstructions and the model results does not reveal an impact of the different seasonality on the timing of the Holocene rainfall optimum in the different sub-monsoon systems. Rather, it indicates locally inhomogeneous rainfall changes and shows that single palaeo-records should not be used to characterise the rainfall change and monsoon evolution for entire monsoon sub-systems.
Abstract:
Multi-frequency eddy current measurements are employed in estimating the pressure tube (PT) to calandria tube (CT) gap in CANDU fuel channels, a critical inspection activity required to ensure fitness for service of fuel channels. In this thesis, a comprehensive characterization of eddy current gap data is laid out in order to extract further information on fuel channel condition and to identify generalized applications for multi-frequency eddy current data. A surface profiling technique, generalizable to multiple probe and conductive material configurations, has been developed. This technique has allowed for identification of various pressure tube artefacts, has been independently validated (using ultrasonic measurements), and has been deployed and commissioned at Ontario Power Generation. Dodd and Deeds solutions to the electromagnetic boundary value problem associated with the PT to CT gap probe configuration were experimentally validated for amplitude response to changes in gap. Using the validated Dodd and Deeds solutions, principal components analysis (PCA) has been employed to identify independence and redundancies in multi-frequency eddy current data. This has allowed for an enhanced visualization of factors affecting gap measurement. Results of the PCA of simulation data are consistent with the skin depth equation, and are validated against PCA of physical experiments. Finally, compressed data acquisition has been realized, allowing faster data acquisition for multi-frequency eddy current systems with hardware limitations, and is generalizable to other applications where real-time acquisition of large data sets is prohibitive.
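For context, the skin depth equation mentioned in this abstract is, in its standard form for a good conductor, delta = sqrt(rho / (pi * f * mu)), where rho is resistivity, f is excitation frequency and mu is magnetic permeability. The sketch below only illustrates how penetration depth shrinks as frequency rises, which is why different excitation frequencies carry partly independent information. The resistivity and the frequencies are placeholder assumptions, not values taken from the thesis.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # permeability of free space, H/m


def skin_depth(resistivity, frequency, mu_r=1.0):
    """Standard skin depth in metres: sqrt(rho / (pi * f * mu_0 * mu_r))."""
    return math.sqrt(resistivity / (math.pi * frequency * MU_0 * mu_r))


# Illustrative values only (assumptions, not from the thesis).
RHO = 5.0e-7  # ohm*m, placeholder resistivity for a non-magnetic alloy
for f_hz in (4e3, 8e3, 16e3):
    depth_mm = skin_depth(RHO, f_hz) * 1e3
    print(f"{f_hz:>8.0f} Hz -> {depth_mm:.2f} mm")
```

Because the depth scales with the inverse square root of frequency, quadrupling the frequency halves the skin depth.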
Abstract:
Although the benefits of mindfulness meditation practices have been widely documented, research data suggest that there are barriers to regularly engaging in meditation behavior. In order to explore research questions pertaining to meditation initiation and adherence, psychometrically valid scales to assess barriers to meditation practice are necessary. The aim of the present study was to explore the factor structure and construct validity of the Determinants of Meditation Practice Inventory (DMPI) (Williams et al., 2011), a scale of perceived barriers to meditation. Exploratory and confirmatory factor analyses along with construct validity tests were performed on data obtained from two large community samples. Results supported the DMPI as a valid scale assessing perceived barriers with four factors: Lack of Interest, Knowledge Concerns, Pragmatic Concerns and Sociocultural Beliefs. The present study offers a DMPI-revised scale that may be reliably used to assess attitudes and beliefs that might impede meditation behavior.
Abstract:
Researchers frequently have to analyze scales in which some participants have failed to respond to some items. In this paper we focus on the exploratory factor analysis of multidimensional scales (i.e., scales that consist of a number of subscales) where each subscale is made up of a number of Likert-type items, and the aim of the analysis is to estimate participants' scores on the corresponding latent traits. We propose a new approach to deal with missing responses in such a situation that is based on (1) multiple imputation of non-responses and (2) simultaneous rotation of the imputed datasets. We applied the approach to a real dataset where missing responses were artificially introduced following a real pattern of non-responses, and in a simulation study based on artificial datasets. The results show that our approach (specifically, Hot-Deck multiple imputation followed by Consensus Promin rotation) was able to successfully compute factor score estimates even for participants who have missing data.
Abstract:
Some decades of research on emotional development have underlined the contribution of several domains to emotional understanding in childhood. Based on this research, Pons and colleagues (Pons & Harris, 2002; Pons, Harris & Rosnay, 2004) have proposed the Test of Emotion Comprehension (TEC), which assesses nine domains of emotional understanding, namely: the recognition of emotions based on facial expressions; the comprehension of external emotional causes; the impact of desire on emotions; emotions based on beliefs; the influence of memory on emotions; the possibility of emotional regulation; the possibility of hiding an emotional state; having mixed emotions; and the contribution of morality to emotional experiences. This instrument was administered individually to 182 Portuguese children aged between 8 and 11 years, in the 3rd and 4th grades, in public schools. Additionally, we used the Socially in Action-Peers (SAp) (Rocha, Candeias & Lopes da Silva, 2012) to assess TEC’s criterion-related validity. Mean differences in TEC results by gender and by socio-economic status (SES) were analyzed. The TEC’s psychometric analysis was performed in terms of item sensitivity and reliability (stability, test-retest). Finally, in order to explore the theoretical structure underlying the TEC, a Confirmatory Factor Analysis and a Similarity Structure Analysis were computed. Implications of these findings for emotional understanding assessment and intervention in childhood are discussed.
Abstract:
In the digital age, e-health technologies play a pivotal role in the processing of medical information. As personal health data represents sensitive information concerning a data subject, enhancing data protection and security of systems and practices has become a primary concern. In recent years, there has been an increasing interest in the concept of Privacy by Design (PbD), which aims at developing a product or a service in a way that supports privacy principles and rules. In the EU, Article 25 of the General Data Protection Regulation provides a binding obligation to implement Data Protection by Design (DPbD) technical and organisational measures. This thesis explores how an e-health system could be developed and how data processing activities could be carried out to apply data protection principles and requirements from the design stage. The research attempts to bridge the gap between the legal and technical disciplines on DPbD by providing a set of guidelines for the implementation of the principle. The work is based on literature review, legal and comparative analysis, and investigation of the existing technical solutions and engineering methodologies. The work can be divided into theoretical and applied perspectives. First, it conducts a critical legal analysis of the principle of PbD and studies the DPbD legal obligation and the related provisions. Then, the research contextualises the rule in the health care field by investigating the applicable legal framework for personal health data processing. Moreover, the research focuses on the US legal system by conducting a comparative analysis. Adopting an applied perspective, the research investigates the existing technical methodologies and tools to design data protection and proposes a set of comprehensive DPbD organisational and technical guidelines for a crucial case study, namely an Electronic Health Record system.
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física
Abstract:
Breast weight has great economic importance in the poultry industry, and may be associated with other variables. This work aimed to estimate phenotypic correlations between performance (live body weight at 7 and 28 days, and at slaughter, and depth of the breast muscle measured by ultrasonography), carcass (eviscerated body weight and leg weight) and body composition (heart, liver and abdominal fat weight) traits in a broiler line, and to quantify the direct and indirect influence of these traits on breast weight. Path analysis was used by expanding the matrix of partial correlations into coefficients which give the direct influence of one trait on another, regardless of the effect of the other traits. The simultaneous maintenance of live body weight at slaughter and eviscerated body weight in the matrix of correlations might be harmful for statistical analysis involving systems of normal equations, like path analysis, due to the observed multicollinearity. The live body weight at slaughter and the depth of the breast muscle as measured by ultrasonography directly affected breast weight and were identified as the factors most responsible for the magnitude of the correlation coefficients obtained between the studied traits and breast weight. Individual pre-selection for these traits could favor an increased breast weight in the future breeder candidates of this line if the broilers' environmental conditions and housing are maintained, since the live body weight at slaughter and the depth of breast muscle measured by ultrasonography were directly related to breast weight.
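The decomposition of a correlation into direct and indirect effects can be sketched for the simplest case of two predictor traits. This is a textbook two-predictor illustration, not the authors' procedure: the function name `path_coefficients` and the correlation values are hypothetical and are not data from the study.

```python
def path_coefficients(r1y, r2y, r12):
    """Direct effects (standardized path coefficients) of two traits on y.

    Solves the normal equations
        r1y = p1 + r12 * p2
        r2y = r12 * p1 + p2
    in closed form. Hypothetical helper for illustration.
    """
    denom = 1.0 - r12 ** 2
    if denom <= 0.0:
        raise ValueError("predictors are perfectly collinear")
    p1 = (r1y - r12 * r2y) / denom
    p2 = (r2y - r12 * r1y) / denom
    return p1, p2


# Invented correlations: trait 1 vs y, trait 2 vs y, trait 1 vs trait 2.
r1y, r2y, r12 = 0.8, 0.6, 0.5
p1, p2 = path_coefficients(r1y, r2y, r12)

direct_1 = p1          # direct effect of trait 1 on y
indirect_1 = r12 * p2  # effect of trait 1 routed through trait 2
print(round(direct_1, 4), round(indirect_1, 4))
```

The denominator `1 - r12**2` approaches zero as the two predictors become highly correlated, which makes the coefficients unstable. This is the multicollinearity problem the abstract describes for live body weight at slaughter and eviscerated body weight.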
Abstract:
OBJECTIVES: We investigated the influence of sildenafil on cardiac contractility and diastolic relaxation and examined the distribution of phosphodiesterase-5 in the hearts of hypertensive rats that were treated with NG-nitro-L-arginine methyl ester (L-NAME). METHODS: Male Wistar rats were treated with L-NAME and/or sildenafil for eight weeks. The Langendorff method was used to examine the effects of sildenafil on cardiac contractility and diastolic relaxation. The presence and location of phosphodiesterase-5 and phosphodiesterase-3 were assessed by immunohistochemistry, and cGMP plasma levels were measured by ELISA. RESULTS: In isolated hearts, sildenafil prevented the reduction of diastolic relaxation (dP/dt) that was induced by L-NAME. In addition, phosphodiesterase-5 immunoreactivity was localized in the intercalated discs between the myocardial cells. The staining intensity was reduced by L-NAME, and sildenafil treatment abolished this reduction. Consistent with these results, the plasma levels of cGMP were decreased in the L-NAME-treated rats but not in rats that were treated with L-NAME + sildenafil. CONCLUSION: The sildenafil-induced attenuation of the deleterious hemodynamic and cardiac morphological effects of L-NAME in cardiac myocytes is mediated (at least in part) by the inhibition of phosphodiesterase-5.
Abstract:
This paper presents case studies in power systems using Sensitivity Analysis (SA) oriented by Optimal Power Flow (OPF) problems in different operation scenarios. The case studies start from a known optimal solution obtained by OPF. This optimal solution is called the base case, and from this solution new operation points may be evaluated by SA when perturbations occur in the system. The SA is based on Fiacco's Theorem and has the advantage of not being an iterative process. To demonstrate the performance of the proposed technique, tests were carried out on the IEEE 14-, 118- and 300-bus systems. (C) 2010 Elsevier Ltd. All rights reserved.