845 resultados para mining data streams
MINING AND VERIFICATION OF TEMPORAL EVENTS WITH APPLICATIONS IN COMPUTER MICRO-ARCHITECTURE RESEARCH
Resumo:
Computer simulation programs are essential tools for scientists and engineers to understand a particular system of interest. As expected, the complexity of the software increases with the depth of the model used. In addition to the exigent demands of software engineering, verification of simulation programs is especially challenging because the models represented are complex and ridden with unknowns that will be discovered by developers in an iterative process. To manage such complexity, advanced verification techniques for continually matching the intended model to the implemented model are necessary. Therefore, the main goal of this research work is to design a useful verification and validation framework that is able to identify model representation errors and is applicable to generic simulators. The framework that was developed and implemented consists of two parts. The first part is First-Order Logic Constraint Specification Language (FOLCSL) that enables users to specify the invariants of a model under consideration. From the first-order logic specification, the FOLCSL translator automatically synthesizes a verification program that reads the event trace generated by a simulator and signals whether all invariants are respected. The second part consists of mining the temporal flow of events using a newly developed representation called State Flow Temporal Analysis Graph (SFTAG). While the first part seeks an assurance of implementation correctness by checking that the model invariants hold, the second part derives an extended model of the implementation and hence enables a deeper understanding of what was implemented. The main application studied in this work is the validation of the timing behavior of micro-architecture simulators. The study includes SFTAGs generated for a wide set of benchmark programs and their analysis using several artificial intelligence algorithms. This work improves the computer architecture research and verification processes as shown by the case studies and experiments that have been conducted.
Resumo:
Thanks to the advanced technologies and social networks that allow the data to be widely shared among the Internet, there is an explosion of pervasive multimedia data, generating high demands of multimedia services and applications in various areas for people to easily access and manage multimedia data. Towards such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, which ranges from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (i.e., IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. At last, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
Resumo:
Purpose – The purpose of this paper is to contribute to the ongoing debate on governance, accountability, transparency and corporate social responsibility (CSR) in the mining sector of a developing country context. It examines the reporting practices of the two largest transnational gold-mining companies in Tanzania in order to draw attention to the role played by local government regulations and advocacy and campaigning by nationally organised non-governmental organisations (NGOs) with respect to promoting corporate social reporting practices. Design/methodology/approach – The paper takes a political economy perspective to consider the serious implications of the neo-liberal ideologies of the global capitalist economy, as manifested in Tanzania’s regulatory framework and in NGO activism, for the corporate disclosure, accountability and responsibility of transnational companies (TNCs). A qualitative field case study methodology is adopted to locate the largely unfamiliar issues of CSR in the Tanzanian mining sector within a more familiar literature on social accounting. Data for the case study were obtained from interviews and from analysis of documents such as annual reports, social responsibility reports, newspapers, NGO reports and other publicly available documents. Findings – Analysis of interviews, press clips and NGO reports draws attention to social and environmental problems in the Tanzanian mining sector, which are arguably linked to the manifestation of the broader crisis of neo-liberal agendas. While these issues have serious impacts on local populations in the mining areas, they often remain invisible in mining companies’ social disclosures. Increasing evidence of social and environmental ills raises serious questions about the effectiveness of the regulatory frameworks, as well as the roles played by NGOs and other pressure groups in Tanzania. Practical implications – By empowering local NGOs through educational, capacity building, technological and other support, NGOs’ advocacy, campaigning and networking with other civil society groups can play a pivotal role in encouraging corporations, especially TNCs, to adopt more socially and environmentally responsible business practices and to adhere to international and local standards, which in turn may help to improve the lives of many poor people living in developing countries in general, and Tanzania in particular. Originality/value – This paper contributes insights from gold-mining activities in Tanzania to the existing literature on CSR in the mining sector. It also contributes to political economy theory by locating CSR reporting within the socio-political and regulatory context in which mining operations take place in Tanzania. It is argued that, for CSR reporting to be effective, robust regulations and enforcement and stronger political pressure must be put in place.
Resumo:
Les métaheuristiques sont très utilisées dans le domaine de l'optimisation discrète. Elles permettent d’obtenir une solution de bonne qualité en un temps raisonnable, pour des problèmes qui sont de grande taille, complexes, et difficiles à résoudre. Souvent, les métaheuristiques ont beaucoup de paramètres que l’utilisateur doit ajuster manuellement pour un problème donné. L'objectif d'une métaheuristique adaptative est de permettre l'ajustement automatique de certains paramètres par la méthode, en se basant sur l’instance à résoudre. La métaheuristique adaptative, en utilisant les connaissances préalables dans la compréhension du problème, des notions de l'apprentissage machine et des domaines associés, crée une méthode plus générale et automatique pour résoudre des problèmes. L’optimisation globale des complexes miniers vise à établir les mouvements des matériaux dans les mines et les flux de traitement afin de maximiser la valeur économique du système. Souvent, en raison du grand nombre de variables entières dans le modèle, de la présence de contraintes complexes et de contraintes non-linéaires, il devient prohibitif de résoudre ces modèles en utilisant les optimiseurs disponibles dans l’industrie. Par conséquent, les métaheuristiques sont souvent utilisées pour l’optimisation de complexes miniers. Ce mémoire améliore un procédé de recuit simulé développé par Goodfellow & Dimitrakopoulos (2016) pour l’optimisation stochastique des complexes miniers stochastiques. La méthode développée par les auteurs nécessite beaucoup de paramètres pour fonctionner. Un de ceux-ci est de savoir comment la méthode de recuit simulé cherche dans le voisinage local de solutions. Ce mémoire implémente une méthode adaptative de recherche dans le voisinage pour améliorer la qualité d'une solution. Les résultats numériques montrent une augmentation jusqu'à 10% de la valeur de la fonction économique.
Resumo:
Intelligent systems are currently inherent to the society, supporting a synergistic human-machine collaboration. Beyond economical and climate factors, energy consumption is strongly affected by the performance of computing systems. The quality of software functioning may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, being their interpretability one of the most important features of computational systems. Software maintenance is a critical discipline to support automatic and life-long system operation. As most software registers its inner events by means of logs, log analysis is an approach to keep system operation. Logs are characterized as Big data assembled in large-flow streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty, identifying ideal time periods for detailed software analyses. LP provides deeper semantics interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely, Fuzzy set-Based evolving Model (FBeM), evolving Granular Neural Network (eGNN), and evolving Gaussian Fuzzy Classifier (eGFC), are compared considering the AD problem. The evolving Log Parsing (eLP) method is proposed to approach the automatic parsing applied to system logs. All the methods perform recursive mechanisms to create, update, merge, and delete information granules according with the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences. Essentially, regarding to AD accuracy, FBeM achieved (85.64+-3.69)%; eGNN reached (96.17+-0.78)%; eGFC obtained (92.48+-1.21)%; and eLP reached (96.05+-1.04)%. Besides being competitive, eLP particularly generates a log grammar, and presents a higher level of model interpretability.
Resumo:
Digital forensics as a field has progressed alongside technological advancements over the years, just as digital devices have gotten more robust and sophisticated. However, criminals and attackers have devised means for exploiting the vulnerabilities or sophistication of these devices to carry out malicious activities in unprecedented ways. Their belief is that electronic crimes can be committed without identities being revealed or trails being established. Several applications of artificial intelligence (AI) have demonstrated interesting and promising solutions to seemingly intractable societal challenges. This thesis aims to advance the concept of applying AI techniques in digital forensic investigation. Our approach involves experimenting with a complex case scenario in which suspects corresponded by e-mail and deleted, suspiciously, certain communications, presumably to conceal evidence. The purpose is to demonstrate the efficacy of Artificial Neural Networks (ANN) in learning and detecting communication patterns over time, and then predicting the possibility of missing communication(s) along with potential topics of discussion. To do this, we developed a novel approach and included other existing models. The accuracy of our results is evaluated, and their performance on previously unseen data is measured. Second, we proposed conceptualizing the term “Digital Forensics AI” (DFAI) to formalize the application of AI in digital forensics. The objective is to highlight the instruments that facilitate the best evidential outcomes and presentation mechanisms that are adaptable to the probabilistic output of AI models. Finally, we enhanced our notion in support of the application of AI in digital forensics by recommending methodologies and approaches for bridging trust gaps through the development of interpretable models that facilitate the admissibility of digital evidence in legal proceedings.
Resumo:
Mine drainage is an important environmental disturbance that affects the chemical and biological components in natural resources. However, little is known about the effects of neutral mine drainage on the soil bacteria community. Here, a high-throughput 16S rDNA pyrosequencing approach was used to evaluate differences in composition, structure, and diversity of bacteria communities in samples from a neutral drainage channel, and soil next to the channel, at the Sossego copper mine in Brazil. Advanced statistical analyses were used to explore the relationships between the biological and chemical data. The results showed that the neutral mine drainage caused changes in the composition and structure of the microbial community, but not in its diversity. The Deinococcus/Thermus phylum, especially the Meiothermus genus, was in large part responsible for the differences between the communities, and was positively associated with the presence of copper and other heavy metals in the environmental samples. Other important parameters that influenced the bacterial diversity and composition were the elements potassium, sodium, nickel, and zinc, as well as pH. The findings contribute to the understanding of bacterial diversity in soils impacted by neutral mine drainage, and demonstrate that heavy metals play an important role in shaping the microbial population in mine environments.
Resumo:
Acid drainage influence on the water and sediment quality was investigated in a coal mining area (southern Brazil). Mine drainage showed pH between 3.2 and 4.6 and elevated concentrations of sulfate, As and metals, of which, Fe, Mn and Zn exceeded the limits for the emission of effluents stated in the Brazilian legislation. Arsenic also exceeded the limit, but only slightly. Groundwater monitoring wells from active mines and tailings piles showed pH interval and chemical concentrations similar to those of mine drainage. However, the river and ground water samples of municipal public water supplies revealed a pH range from 7.2 to 7.5 and low chemical concentrations, although Cd concentration slightly exceeded the limit adopted by Brazilian legislation for groundwater. In general, surface waters showed large pH range (6 to 10.8), and changes caused by acid drainage in the chemical composition of these waters were not very significant. Locally, acid drainage seemed to have dissolved carbonate rocks present in the local stratigraphic sequence, attenuating the dispersion of metals and As. Stream sediments presented anomalies of these elements, which were strongly dependent on the proximity of tailings piles and abandoned mines. We found that precipitation processes in sediments and the dilution of dissolved phases were responsible for the attenuation of the concentrations of the metals and As in the acid drainage and river water mixing zone. In general, a larger influence of mining activities on the chemical composition of the surface waters and sediments was observed when enrichment factors in relation to regional background levels were used.
Resumo:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Resumo:
The article seeks to investigate patterns of performance and relationships between grip strength, gait speed and self-rated health, and investigate the relationships between them, considering the variables of gender, age and family income. This was conducted in a probabilistic sample of community-dwelling elderly aged 65 and over, members of a population study on frailty. A total of 689 elderly people without cognitive deficit suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-related health was assessed using a 5-point scale. The males and the younger elderly individuals scored significantly higher on grip strength and gait speed than the female and oldest did; the richest scored higher than the poorest on grip strength and gait speed; females and men aged over 80 had weaker grip strength and lower gait speed; slow gait speed and low income arose as risk factors for a worse health evaluation. Lower muscular strength affects the self-rated assessment of health because it results in a reduction in functional capacity, especially in the presence of poverty and a lack of compensatory factors.
Resumo:
Obstructive sleep apnea syndrome has a high prevalence among adults. Cephalometric variables can be a valuable method for evaluating patients with this syndrome. To correlate cephalometric data with the apnea-hypopnea sleep index. We performed a retrospective and cross-sectional study that analyzed the cephalometric data of patients followed in the Sleep Disorders Outpatient Clinic of the Discipline of Otorhinolaryngology of a university hospital, from June 2007 to May 2012. Ninety-six patients were included, 45 men, and 51 women, with a mean age of 50.3 years. A total of 11 patients had snoring, 20 had mild apnea, 26 had moderate apnea, and 39 had severe apnea. The distance from the hyoid bone to the mandibular plane was the only variable that showed a statistically significant correlation with the apnea-hypopnea index. Cephalometric variables are useful tools for the understanding of obstructive sleep apnea syndrome. The distance from the hyoid bone to the mandibular plane showed a statistically significant correlation with the apnea-hypopnea index.
Resumo:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subjected to some upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to an Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
Resumo:
To assess the completeness and reliability of the Information System on Live Births (Sinasc) data. A cross-sectional analysis of the reliability and completeness of Sinasc's data was performed using a sample of Live Birth Certificate (LBC) from 2009, related to births from Campinas, Southeast Brazil. For data analysis, hospitals were grouped according to category of service (Unified National Health System, private or both), 600 LBCs were randomly selected and the data were collected in LBC-copies through mothers and newborns' hospital records and by telephone interviews. The completeness of LBCs was evaluated, calculating the percentage of blank fields, and the LBCs agreement comparing the originals with the copies was evaluated by Kappa and intraclass correlation coefficients. The percentage of completeness of LBCs ranged from 99.8%-100%. For the most items, the agreement was excellent. However, the agreement was acceptable for marital status, maternal education and newborn infants' race/color, low for prenatal visits and presence of birth defects, and very low for the number of deceased children. The results showed that the municipality Sinasc is reliable for most of the studied variables. Investments in training of the professionals are suggested in an attempt to improve system capacity to support planning and implementation of health activities for the benefit of maternal and child population.
Resumo:
Often in biomedical research, we deal with continuous (clustered) proportion responses ranging between zero and one quantifying the disease status of the cluster units. Interestingly, the study population might also consist of relatively disease-free as well as highly diseased subjects, contributing to proportion values in the interval [0, 1]. Regression on a variety of parametric densities with support lying in (0, 1), such as beta regression, can assess important covariate effects. However, they are deemed inappropriate due to the presence of zeros and/or ones. To evade this, we introduce a class of general proportion density, and further augment the probabilities of zero and one to this general proportion density, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and application to a real dataset from a clinical periodontology study.
Resumo:
Patients with obstructive sleep apnea syndrome usually present with changes in upper airway morphology and/or body fat distribution, which may occur throughout life and increase the severity of obstructive sleep apnea syndrome with age. To correlate cephalometric and anthropometric measures with obstructive sleep apnea syndrome severity in different age groups. A retrospective study of cephalometric and anthropometric measures of 102 patients with obstructive sleep apnea syndrome was analyzed. Patients were divided into three age groups (≥20 and <40 years, ≥40 and <60 years, and ≥60 years). Pearson's correlation was performed for these measures with the apnea-hypopnea index in the full sample, and subsequently by age group. The cephalometric measures MP-H (distance between the mandibular plane and the hyoid bone) and PNS-P (distance between the posterior nasal spine and the tip of the soft palate) and the neck and waist circumferences showed a statistically significant correlation with apnea-hypopnea index in both the full sample and in the ≥40 and <60 years age group. These variables did not show any significant correlation with the other two age groups (<40 and ≥60 years). Cephalometric measurements MP-H and PNS-P and cervical and waist circumferences correlated with obstructive sleep apnea syndrome severity in patients in the ≥40 and <60 age group.