10 results for non-negative matrix factorization
in Digital Commons at Florida International University
Abstract:
In the last decade, large numbers of social media services have emerged and been widely used in people's daily lives as important tools for sharing and acquiring information. With a substantial amount of user-contributed text data on social media, it has become necessary to develop methods and tools for analyzing this emerging data, in order to better use it to deliver meaningful information to users. Previous work on text analytics in the last several decades has focused mainly on traditional types of text such as emails, news, and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect the sentiment of text on social media, we propose a dual active supervision method based on non-negative matrix tri-factorization (tri-NMF) to minimize human labeling effort for this new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating-set-based summarization framework and a learning-to-rank-based summarization framework. The dominating-set-based framework can be applied to different types of summarization problems, while the learning-to-rank-based framework helps use existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.
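For orientation, the generic form of non-negative matrix tri-factorization can be written as below. This is only the standard unconstrained objective, not the dissertation's formulation: the abstract does not state how the dual active supervision terms enter, and the symbols $X$, $F$, $S$, $G$ and the dimensions $n$, $m$, $k$, $l$ are notation assumed here for illustration.

```latex
% Generic tri-NMF objective (assumed notation, no supervision terms):
% X is a non-negative n-by-m data matrix, for example a document-term matrix.
\min_{F \ge 0,\; S \ge 0,\; G \ge 0} \; \left\lVert X - F S G^{\top} \right\rVert_F^2,
\qquad
X \in \mathbb{R}_{+}^{n \times m},\quad
F \in \mathbb{R}_{+}^{n \times k},\quad
S \in \mathbb{R}_{+}^{k \times l},\quad
G \in \mathbb{R}_{+}^{m \times l}.
```

In this reading, $F$ groups the rows, $G$ groups the columns, and $S$ links row groups to column groups.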
Abstract:
This dissertation introduced substance abuse to the Dynamic Vulnerability Formulation (DVF) and the social competence model to determine whether the relationship between schizophrenic symptomatology and coping ability in the DVF also applied to the dually diagnosed schizophrenic or whether these variables needed to be modified. It compared the coping abilities of dually and singly diagnosed clients in day treatment and identified, examined, and assessed the relative influence of relevant mediating variables on two dimensions of coping ability of the dually diagnosed: coping skills and coping effort. These variables were: presence of negative and non-negative symptoms, duration of mental illness, type of substance used, and age of first substance use. A priori effect sizes based on previous empirical research were used to interpret the results related to the comparison of demographic, socioeconomic, and treatment characteristics between the singly and dually diagnosed study samples. The data suggested that the singly diagnosed group had higher coping skills than the dually diagnosed group, particularly in the areas of housing stability, work affect, and total social adjustment. The dually diagnosed group had lower scores on one aspect of coping effort: agency, or self-efficacy. The data supported the presence of an inverse relationship between symptom severity and coping skills, particularly for the dually diagnosed group. The data did not support the presence of an inverse relationship between symptom severity and coping effort, but did suggest a positive relationship between symptom severity and one measure of coping effort, agency, for the dually diagnosed group. Regression equations using each summary measure of coping skill (social adjustment and role functioning) yielded statistically significant F-ratios. Thirty-six percent of the variance in social adjustment and thirty-one percent of the variance in role functioning were explained by the relative influence of the relevant variables. Negative and non-negative symptoms were the only significant predictors of social adjustment. The non-negative symptoms variable was the sole significant predictor of role functioning. The results of this study provided partial support for the use of the Dynamic Vulnerability Formulation (DVF) with the dually diagnosed.
Abstract:
As massive data sets become increasingly available, people face the problem of how to process and understand these data effectively. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, because of both the large size of the data sets and their high dimensionality. This dissertation, in the same direction as other research based on MapReduce, develops techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the non-negative matrix factorization technique are scaled up to factorize matrices with dimensions on the order of millions in MapReduce, based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
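To make the link between non-negative matrix factorization and matrix multiplication concrete, here is a minimal single-machine sketch of the classic multiplicative-update scheme for the Frobenius objective. It is not the dissertation's MapReduce implementation, and the abstract does not say which three NMF variants were scaled up; the function names, the parameters `rank`, `n_iter`, and `eps`, and the random test matrix are all assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (n x m) as W @ H.

    Uses the multiplicative update rules for the Frobenius norm.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random initialisation keeps the factors non-negative.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update is built from matrix products only; these products
        # are the part a distributed implementation would parallelize.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def frobenius_error(V, W, H):
    """Reconstruction error ||V - W H||_F."""
    return float(np.linalg.norm(V - W @ H))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic non-negative matrix of exact rank 5 (invented test data).
    V = rng.random((20, 5)) @ rng.random((5, 15))
    W, H = nmf_multiplicative(V, rank=5)
    print("reconstruction error:", frobenius_error(V, W, H))
```

Every step inside the loop reduces to products such as `W.T @ V` and `V @ H.T`, so the choice of distributed matrix multiplication strategy plausibly dominates the cost at large scale.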
Abstract:
This dissertation utilizes a cross-sectional study to examine the phenomenon of caregiving within a theoretically grounded stress, appraisal, and coping model. Hispanic and non-Hispanic caregivers were studied to examine the factors associated with variance in caregiver appraisal, coping, and outcomes of caregiving strain (depression and somatic complaints) and caregiving gain (life satisfaction, mastery, and personal gain). A purposive sampling strategy was used to recruit 204 Alzheimer's disease caregivers in South Florida. A self-report questionnaire was used to collect demographic data, and to measure stress, appraisal, coping, and psychological well-being of caregivers. Regression equations were developed to compare moderating and mediating models of appraisal and coping. Emotion-focused coping skills were found to significantly moderate the effects of stress (F [1,195] = 4.62, p < .05), explaining approximately 21% of the variance in satisfaction was found to moderate the effects of stress (F [1,195] = 7.09; p < .05), explaining approximately 27% of the variance in personal gain and approximately 8% of the variance in life satisfaction (F [1,195] = 4.14; p < .05). Appraisal of Burden was found to significantly mediate the effects of stress, explaining approximately 30% of the variance in somatic complaints (F [1,196] = 31.60; p < .001) and 32% of the variance in depression (F [1,196] = 38.18; p < .001). The results of the analyses indicate that appraisal and coping skills are important variables in the stress process. The results of this study underscore the importance of accounting for positive and negative outcomes in providing a fuller understanding of the stress, appraisal, and coping process of Alzheimer's disease caregivers.
Abstract:
A comprehensive investigation of sensitive ecosystems in South Florida is reported, with the main goal of determining the identity, spatial distribution, and sources of both organic biocides and trace elements in different environmental compartments. This study presents the development and validation of a fractionation and isolation method for twelve polar acidic herbicides commonly applied in the vicinity of the study areas, including 2,4-D, MCPA, dichlorprop, mecoprop, and picloram, in surface water. Solid phase extraction (SPE) was used to isolate the analytes from abiotic matrices containing large amounts of dissolved organic material. Atmospheric-pressure ionization (API) with electrospray ionization in negative mode (ESP-) in a Quadrupole Ion Trap mass spectrometer was used to characterize the herbicides of interest. The application of Laser Ablation-ICP-MS methodology in the analysis of soils and sediments is also reported. The analytical performance of the method was evaluated on certified standards and real soil and sediment samples. Residential soils were analyzed to evaluate the feasibility of using this technique as a routine and rapid method to monitor potentially contaminated sites. Forty-eight sediments were also collected from semi-pristine areas in South Florida to screen baseline levels of bioavailable elements in support of risk evaluation. The LA-ICP-MS data were used to perform a statistical evaluation of the elemental composition as a tool for environmental forensics. An LA-ICP-MS protocol was also developed and optimized for the elemental analysis of a wide range of elements in polymeric filters containing atmospheric dust. A quantitative strategy based on internal and external standards allowed for a rapid determination of airborne trace elements in filters containing both contemporary African dust and local dust emissions. These distributions were used to qualitatively and quantitatively assess differences in composition and to establish provenance and fluxes to protected regional ecosystems such as coral reefs and national parks.
Abstract:
Non-native predators may have negative impacts on native communities, and these effects may be dependent on interactions among multiple non-native predators. Sequential invasions by predators can enhance risk for native prey. Prey have a limited ability to respond to multiple threats since appropriate responses may conflict, and interactions with recent invaders may be novel. We examined predator–prey interactions among two non-native predators, a recent invader, the African jewelfish, and the longer-established Mayan cichlid, and a native Florida Everglades prey assemblage. Using field enclosures and laboratory aquaria, we compared predatory effects and antipredator responses across five prey taxa. Total predation rates were higher for Mayan cichlids, which also targeted more prey types. The cichlid invaders had similar microhabitat use, but varied in foraging styles, with African jewelfish being more active. The three prey species that experienced predation were those that overlapped in habitat use with predators. Flagfish were consumed by both predators, while riverine grass shrimp and bluefin killifish were eaten only by Mayan cichlids. In mixed predator treatments, we saw no evidence of emergent effects, since interactions between the two cichlid predators were low. Prey responded to predator threats by altering activity but not vertical distribution. Results suggest that prey vulnerability is affected by activity and habitat domain overlap with predators and may be lower to newly invading predators, perhaps due to novelty in the interaction.
Abstract:
The purpose of this study was to determine whether there was a relationship between pressure to perform on state-mandated, high-stakes tests and the rate of student escape behavior, defined as the number of school suspensions and absences. The state-assigned grade of a school was used as a surrogate measure of pressure, with the assumption that pressure increased as the school grade decreased. Student attendance and suspension data were gathered from all 33 of the regular public high schools in Miami-Dade County Public Schools. The research questions were: Is the number of suspensions highest in the third quarter, when most FCAT preparation takes place, for each of the 3 school years 2007-08 through 2009-10? How accurately does the high school's grade predict the number of suspensions and number of absences during each of the 4 school years 2005-06 through 2008-09? The research questions were answered using repeated measures analysis of variance for research question #1 and non-linear multiple regression for research question #2. No significant difference was found between the numbers of suspensions in each of the grading periods, nor was there a relationship between the number of suspensions and school grade. A statistically significant relationship was found between student attendance and school grade. When plotted, this relationship was found to be quadratic in nature and formed a loose inverted U for each of the four years during which data were collected. This indicated that students in very high and very low performing schools had low levels of absences, while those in the midlevel of the distribution of school performance (C schools) had the greatest rates of absence. Identifying a relationship between the pressures associated with high-stakes testing and student escape behavior suggests that it might be useful for building administrators to reevaluate the test preparation activities and procedures being used in their buildings and to include anxiety-reducing strategies. As a relationship was found, it sets the foundation for future studies to identify whether testing-related activities are impacting some students emotionally and causing unintended consequences of testing mandates.
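The inverted-U pattern described above can be illustrated with a small quadratic fit. This is a hedged sketch only: the study used non-linear multiple regression on real district data, whereas the code below fits a single-predictor quadratic to invented numbers. The numeric coding of school grades (F = 0 through A = 4), the absence values, and the helper names are all assumptions.

```python
import numpy as np


def fit_quadratic(x, y):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    a, b, c = np.polyfit(x, y, deg=2)
    return a, b, c


def vertex_x(a, b):
    """x-coordinate of the parabola's turning point."""
    return -b / (2.0 * a)


if __name__ == "__main__":
    # Hypothetical coding of school grades: F=0, D=1, C=2, B=3, A=4.
    grade = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    # Invented absence rates that peak at C schools (not study data).
    absences = np.array([8.0, 11.0, 12.0, 11.0, 8.0])
    a, b, c = fit_quadratic(grade, absences)
    # A negative leading coefficient means the curve opens downward,
    # which is the inverted-U shape.
    print("a =", a, "peak at grade code", vertex_x(a, b))
```

A negative `a` together with a turning point inside the observed range is what distinguishes an inverted U from a monotonic trend.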