844 resultados para Failure time data analysis
Resumo:
We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand the regional linguistic variation in the U.S. Prior work on regional linguistic variations usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using the principal component analysis (PCA). Finally a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting linguistic regional variations in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
Resumo:
Objective: Recently, much research has been proposed using nature inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of ants. Taking advantage of the ACO in traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering (Ant-C) and an ant-based association rule mining (Ant-ARM) algorithms are proposed for gene expression data analysis. The proposed algorithms make use of the natural behavior of ants such as cooperation and adaptation to allow for a flexible robust search for a good candidate solution. Results: Ant-C has been tested on the three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared to other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate optimal number of clusters without incorporating any other algorithms such as K-means or agglomerative hierarchical clustering. For associative classification, while a few of the well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.
Resumo:
With the latest development in computer science, multivariate data analysis methods became increasingly popular among economists. Pattern recognition in complex economic data and empirical model construction can be more straightforward with proper application of modern softwares. However, despite the appealing simplicity of some popular software packages, the interpretation of data analysis results requires strong theoretical knowledge. This book aims at combining the development of both theoretical and applicationrelated data analysis knowledge. The text is designed for advanced level studies and assumes acquaintance with elementary statistical terms. After a brief introduction to selected mathematical concepts, the highlighting of selected model features is followed by a practice-oriented introduction to the interpretation of SPSS1 outputs for the described data analysis methods. Learning of data analysis is usually time-consuming and requires efforts, but with tenacity the learning process can bring about a significant improvement of individual data analysis skills.
Resumo:
Regulatory focus theory (RFT) proposes two different social-cognitive motivational systems for goal pursuit: a promotion system, which is organized around strategic approach behaviors and "making good things happen," and a prevention system, which is organized around strategic avoidance and "keeping bad things from happening." The promotion and prevention systems have been extensively studied in behavioral paradigms, and RFT posits that prolonged perceived failure to make progress in pursuing promotion or prevention goals can lead to ineffective goal pursuit and chronic distress (Higgins, 1997).
Research has begun to focus on uncovering the neural correlates of the promotion and prevention systems in an attempt to differentiate them at the neurobiological level. Preliminary research suggests that the promotion and prevention systems have both distinct and overlapping neural correlates (Eddington, Dolcos, Cabeza, Krishnan, & Strauman, 2007; Strauman et al., 2013). However, little research has examined how individual differences in regulatory focus develop and manifest. The development of individual differences in regulatory focus is particularly salient during adolescence, a crucial topic to explore given the dramatic neurodevelopmental and psychosocial changes that take place during this time, especially with regard to self-regulatory abilities. A number of questions remain unexplored, including the potential for goal-related neural activation to be modulated by (a) perceived proximity to goal attainment, (b) individual differences in regulatory orientation, specifically general beliefs about one's success or failure in attaining the two kinds of goals, (c) age, with a particular focus on adolescence, and (d) homozygosity for the Met allele of the catechol-O-methyltransferase (COMT) Val158Met polymorphism, a naturally occurring genotype which has been shown to impact prefrontal cortex activation patterns associated with goal pursuit behaviors.
This study explored the neural correlates of the promotion and prevention systems through the use of a priming paradigm involving rapid, brief, masked presentation of individually selected promotion and prevention goals to each participant while being scanned. The goals used as priming stimuli varied with regard to whether participants reported that they were close to or far away from achieving them (i.e. a "match" versus a "mismatch" representing perceived success or failure in personal goal pursuit). The study also assessed participants' overall beliefs regarding their relative success or failure in attaining promotion and prevention goals, and all participants were genotyped for the COMT Val158Met polymorphism.
A number of significant findings emerged. Both promotion and prevention priming were associated with activation in regions associated with self-referential cognition, including the left medial prefrontal cortex, cuneus, and lingual gyrus. Promotion and prevention priming were also associated with distinct patterns of neural activation; specifically, left middle temporal gyrus activation was found to be significantly greater during prevention priming. Activation in response to promotion and prevention goals was found to be modulated by self-reports of both perceived proximity to goal achievement and goal orientation. Age also had a significant effect on activation, such that activation in response to goal priming became more robust in the prefrontal cortex and in default mode network regions as a function of increasing age. Finally, COMT genotype also modulated the neural response to goal priming both alone and through interactions with regulatory focus and age. Overall, these findings provide further clarification of the neural underpinnings of the promotion and prevention systems as well as provide information about the role of development and individual differences at the personality and genetic level on activity in these neural systems.
Resumo:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
Resumo:
Image processing offers unparalleled potential for traffic monitoring and control. For many years engineers have attempted to perfect the art of automatic data abstraction from sequences of video images. This paper outlines a research project undertaken at Napier University by the authors in the field of image processing for automatic traffic analysis. A software based system implementing TRIP algorithms to count cars and measure vehicle speed has been developed by members of the Transport Engineering Research Unit (TERU) at the University. The TRIP algorithm has been ported and evaluated on an IBM PC platform with a view to hardware implementation of the pre-processing routines required for vehicle detection. Results show that a software based traffic counting system is realisable for single window processing. Due to the high volume of data required to be processed for full frames or multiple lanes, system operations in real time are limited. Therefore specific hardware is required to be designed. The paper outlines a hardware design for implementation of inter-frame and background differencing, background updating and shadow removal techniques. Preliminary results showing the processing time and counting accuracy for the routines implemented in software are presented and a real time hardware pre-processing architecture is described.
Resumo:
For a long time, electronic data analysis has been associated with quantitative methods. However, Computer Assisted Qualitative Data Analysis Software (CAQDAS) are increasingly being developed. Although the CAQDAS has been there for decades, very few qualitative health researchers report using it. This may be due to the difficulties that one has to go through to master the software and the misconceptions that are associated with using CAQDAS. While the issue of mastering CAQDAS has received ample attention, little has been done to address the misconceptions associated with CAQDAS. In this paper, the author reflects on his experience of interacting with one of the popular CAQDAS (NVivo) in order to provide evidence-based implications of using the software. The key message is that unlike statistical software, the main function of CAQDAS is not to analyse data but rather to aid the analysis process, which the researcher must always remain in control of. In other words, researchers must equally know that no software can analyse qualitative data. CAQDAS are basically data management packages, which support the researcher during analysis.
Resumo:
Analyzing large-scale gene expression data is a labor-intensive and time-consuming process. To make data analysis easier, we developed a set of pipelines for rapid processing and analysis poplar gene expression data for knowledge discovery. Of all pipelines developed, differentially expressed genes (DEGs) pipeline is the one designed to identify biologically important genes that are differentially expressed in one of multiple time points for conditions. Pathway analysis pipeline was designed to identify the differentially expression metabolic pathways. Protein domain enrichment pipeline can identify the enriched protein domains present in the DEGs. Finally, Gene Ontology (GO) enrichment analysis pipeline was developed to identify the enriched GO terms in the DEGs. Our pipeline tools can analyze both microarray gene data and high-throughput gene data. These two types of data are obtained by two different technologies. A microarray technology is to measure gene expression levels via microarray chips, a collection of microscopic DNA spots attached to a solid (glass) surface, whereas high throughput sequencing, also called as the next-generation sequencing, is a new technology to measure gene expression levels by directly sequencing mRNAs, and obtaining each mRNA’s copy numbers in cells or tissues. We also developed a web portal (http://sys.bio.mtu.edu/) to make all pipelines available to public to facilitate users to analyze their gene expression data. In addition to the analyses mentioned above, it can also perform GO hierarchy analysis, i.e. construct GO trees using a list of GO terms as an input.
The Optimal Smoothing of the Wigner-Ville Distribution for Real-Life Signals Time-Frequency Analysis