821 results for event sequences
Abstract:
Pattern discovery in temporal event sequences is of great importance in many application domains, such as telecommunication network fault analysis. In reality, not every type of event has an accurate timestamp. Some events, defined as inaccurate events, may only have an interval as their possible time of occurrence. The existence of inaccurate events may cause uncertainty in event ordering. The traditional support model cannot deal with this uncertainty, which causes some interesting patterns to be missed. A new concept, precise support, is introduced to evaluate the probability that a pattern is contained in a sequence. Based on this new metric, we define the uncertainty model and present an algorithm to discover interesting patterns in a sequence database that has one type of inaccurate event. In our model, the number of types of inaccurate events can readily be extended to k, although at the cost of increased computational complexity.
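To make the ordering uncertainty concrete, the following is a minimal sketch and not the paper's definition of precise support. It assumes, purely for illustration, that an inaccurate event is uniformly distributed over its interval, and it scores the pattern "inaccurate event A occurs before accurate event B". The function names `prob_before` and `expected_support` and the toy database are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def prob_before(interval: Interval, t: float) -> float:
    """P(X < t) for X uniform on [lo, hi]; uniformity is an assumption."""
    lo, hi = interval
    if hi <= lo:
        raise ValueError("interval must satisfy lo < hi")
    if t <= lo:
        return 0.0
    if t >= hi:
        return 1.0
    return (t - lo) / (hi - lo)


def expected_support(db: List[Tuple[Interval, float]]) -> float:
    """Average probability, over sequences, that 'A before B' holds.

    Each entry pairs the interval of inaccurate event A with the
    exact timestamp of accurate event B in one sequence.
    """
    if not db:
        return 0.0
    return sum(prob_before(iv, t) for iv, t in db) / len(db)


# Toy database (made-up values): three sequences.
db = [((0.0, 10.0), 5.0), ((0.0, 10.0), 12.0), ((4.0, 8.0), 3.0)]
print(expected_support(db))
```

A classical support count would have to commit to one ordering per sequence; the probabilistic score keeps the first sequence as a half-match instead of discarding it.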
Abstract:
Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover why these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but are limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery.
Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. 
This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
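Running many comparison metrics at once raises the question of which results remain significant after accounting for the number of tests. The abstract does not say how HVHT handles this, so the sketch below is an assumption: it applies the standard Benjamini-Hochberg procedure to a set of p-values. The metric names and p-values are invented for illustration and are not results from the dissertation.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the metric names that pass the Benjamini-Hochberg procedure.

    The procedure sorts p-values in ascending order, finds the largest
    rank k with p_(k) <= alpha * k / m, and accepts the k smallest.
    """
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank
    return [name for name, _ in ranked[:cutoff]]


# Hypothetical metrics comparing two cohorts, with made-up p-values.
pvals = {
    "event_order": 0.001,
    "event_duration": 0.02,
    "event_frequency": 0.04,
    "patient_gender": 0.30,
}
print(benjamini_hochberg(pvals))
```

With these toy values, two of the four metrics survive the correction, whereas an uncorrected 0.05 threshold would have flagged three.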
Abstract:
A major task of traditional temporal event sequence mining is to find all frequent event patterns from a long temporal sequence. In many real applications, however, events are often grouped into different types, and not all types are of equal importance. In this paper, we consider the problem of efficient mining of temporal event sequences which lead to an instance of a specific type of event. Temporal constraints are used to ensure sensibility of the mining results. We will first generalise and formalise the problem of event-oriented temporal sequence data mining. After discussing some unique issues in this new problem, we give a set of criteria, which are adapted from traditional data mining techniques, to measure the quality of patterns to be discovered. Finally we present an algorithm to discover potentially interesting patterns.
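The idea of mining what leads up to a specific type of event, under a temporal constraint, can be sketched as follows. This is not the paper's algorithm: it handles only single event types rather than general patterns, and the window-based constraint, the function name `preceding_type_support`, and the toy sequence are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (timestamp, event type)


def preceding_type_support(
    seq: List[Event], target: str, window: float
) -> Dict[str, float]:
    """Fraction of target occurrences preceded by each event type.

    An event type counts once per target occurrence if it appears
    in the half-open interval [t - window, t) before that occurrence.
    """
    targets = [t for t, e in seq if e == target]
    counts: Counter = Counter()
    for t in targets:
        seen = {e for ts, e in seq if t - window <= ts < t and e != target}
        counts.update(seen)
    n = len(targets)
    return {e: c / n for e, c in counts.items()} if n else {}


# Toy sequence (made-up values); "X" is the target event type.
seq = [(1, "a"), (2, "b"), (3, "X"), (6, "a"), (7, "X"), (20, "b"), (30, "X")]
print(preceding_type_support(seq, target="X", window=3))
```

Here "a" precedes two of the three target occurrences within the window and "b" precedes one; the isolated "b" at time 20 is ignored because it falls outside every window.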
Abstract:
This Master's thesis investigates the performance of the Olkiluoto 1 and 2 APROS model in the case of fast transients. The thesis includes a general description of the Olkiluoto 1 and 2 nuclear power plants and of the most important safety systems. The theoretical background of the APROS code as well as the scope and the content of the Olkiluoto 1 and 2 APROS model are also described. The event sequences of the anticipated operational transients considered in the thesis are presented in detail, as they form the basis for the analysis of the APROS calculation results. The calculated fast operational transient situations comprise loss-of-load cases and two cases related to an inadvertent closure of one main steam isolation valve. As part of the thesis work, the inaccurate initial data values found in the original 1-D reactor core model were corrected. The input data needed for the creation of a more accurate 3-D core model were defined. The analysis of the APROS calculation results showed that while the main results were in good accordance with the measured plant data, differences were also detected. These differences were found to be caused by deficiencies and uncertainties related to the calculation model. According to the results, the reactor core and the feedwater systems cause most of the differences between the calculated and measured values. Based on these findings, it will be possible to develop the APROS model further to make it a reliable and accurate tool for the analysis of operational transients and possible plant modifications.
Abstract:
We tested amnesic patients, patients with frontal lobe lesions, and control subjects with the deferred imitation task, a nonverbal test used to demonstrate memory abilities in human infants. On day 1, subjects were given sets of objects to obtain a baseline measure of their spontaneous performance of target actions. Then different event sequences were modeled with the object sets. On day 2, the objects were given to the subjects again, first without any instructions to imitate the sequences, and then with explicit instructions to imitate the actions exactly as they had been modeled. Control subjects and frontal lobe patients reproduced the events under both uninstructed and instructed conditions. In contrast, performance by the amnesic patients did not significantly differ from that of a second control group who had the same opportunities to handle the objects but were not shown the modeled actions. These findings suggest that deferred imitation is dependent on the brain structures essential for declarative memory that are damaged in amnesia, and they support the view that infants who imitate actions after long delays have an early capacity for long-term declarative memory.
Abstract:
Two shallow water late Cenomanian to early Turonian sequences of NE Egypt have been investigated to evaluate the response to OAE2. Age control based on calcareous nannoplankton, planktic foraminifera and ammonite biostratigraphies integrated with δ¹³C stratigraphy is relatively good despite low diversity and sporadic occurrences. Planktic and benthic foraminiferal faunas are characterized by dysoxic, brackish and mesotrophic conditions, as indicated by low species diversity, low oxygen and low salinity tolerant planktic and benthic species, along with oyster-rich limestone layers. In these subtidal to inner neritic environments the OAE2 δ¹³C excursion appears comparable and coeval to that of open marine environments. However, in contrast to open marine environments where anoxic conditions begin after the first δ¹³C peak and end at or near the Cenomanian-Turonian boundary, in shallow coastal environments anoxic conditions do not appear until the early Turonian. This delay in anoxia appears to be related to the sea-level transgression that reached its maximum in the early Turonian, as observed in shallow water sections from Egypt to Morocco. © 2011 Elsevier Ltd. All rights reserved.
Abstract:
The Integrated Safety Assessment (ISA) methodology, developed by the Spanish Nuclear Safety Council (CSN), has been applied to a thermal-hydraulic analysis of a Westinghouse 3-loop PWR plant by means of dynamic event trees (DET) for Steam Generator Tube Rupture (SGTR) sequences. The ISA methodology makes it possible to obtain the SGTR dynamic event tree while taking into account the operator actuation times. Simulations are performed with SCAIS (Simulation Code system for Integrated Safety Assessment), which includes a dynamic coupling with the MAAP thermal-hydraulic code. The results show the capability of the ISA methodology and the SCAIS platform to obtain the DET of complex sequences.
Abstract:
Pattern discovery in a long temporal event sequence is of great importance in many application domains. Most previous work focuses on identifying positive associations among timestamped event types. In this paper, we introduce the problem of defining and discovering negative associations that, like positive rules, may also serve as a source of knowledge discovery. In general, an event-oriented pattern is a pattern that is associated with a selected type of event, called a target event. As a counterpart to previous research, we identify patterns that have a negative relationship with the target events. A set of criteria is defined to evaluate the interestingness of patterns associated with such negative relationships. For counting the frequency of a pattern, we propose a new approach, called unique minimal occurrence, which guarantees that the Apriori property holds for all patterns in a long sequence. Based on the interestingness measures, algorithms are proposed to discover potentially interesting patterns for this negative rule problem. Finally, experiments are conducted on a real application.
Abstract:
The evolution of event time and size statistics in two heterogeneous cellular automaton models of earthquake behavior is studied and compared to the evolution of these quantities during observed periods of accelerating seismic energy release prior to large earthquakes. The two automata have different nearest neighbor laws, one of which produces self-organized critical (SOC) behavior (PSD model) and the other of which produces quasi-periodic large events (crack model). In the PSD model, periods of accelerating energy release before large events are rare. In the crack model, many large events are preceded by periods of accelerating energy release. When compared to randomized event catalogs, accelerating energy release before large events occurs more often than random in the crack model but less often than random in the PSD model; it is easier to tell the crack and PSD model results apart from each other than to tell either model apart from a random catalog. The evolution of event sizes during the accelerating energy release sequences in all models is compared to that of observed sequences. The accelerating energy release sequences in the crack model consist of an increase in the rate of events of all sizes, which is consistent with observations from a small number of natural cases but inconsistent with a larger number of cases in which there is an increase in the rate of only moderate-sized events. On average, no increase in the rate of events of any size is seen before large events in the PSD model.
Abstract:
Many of our everyday tasks require the control of the serial order and the timing of component actions. Using the dynamic neural field (DNF) framework, we address the learning of representations that support the performance of precisely timed action sequences. In continuation of previous modeling work and robotics implementations, we specifically ask how feedback about executed actions might be used by the learning system to fine-tune a joint memory representation of the ordinal and the temporal structure which has been initially acquired by observation. The perceptual memory is represented by a self-stabilized, multi-bump activity pattern of neurons encoding instances of a sensory event (e.g., color, position or pitch) which guides sequence learning. The strength of the population representation of each event is a function of elapsed time since sequence onset. We propose and test in simulations a simple learning rule that detects a mismatch between the expected and realized timing of events and adapts the activation strengths in order to compensate for the movement time needed to achieve the desired effect. The simulation results show that the effector-specific memory representation can be robustly recalled. We discuss the impact of the fast, activation-based learning that the DNF framework provides for robotics applications.
Abstract:
One of the main implications of the efficient market hypothesis (EMH) is that expected future returns on financial assets are not predictable if investors are risk neutral. In this paper we argue that financial time series offer more information than this hypothesis seems to supply. In particular, we postulate that runs of very large returns can be predictable for small time periods. In order to prove this we propose a TAR(3,1)-GARCH(1,1) model that is able to describe two different types of extreme events: a first type generated by large uncertainty regimes where runs of extremes are not predictable, and a second type where extremes come from isolated dread/joy events. This model is new in the literature on nonlinear processes. Its novelty resides in two features that make it different from previous TAR methodologies: the regimes are motivated by the occurrence of extreme values, and the threshold variable is defined by the shock affecting the process in the preceding period. In this way the model is able to uncover dependence and clustering of extremes in high as well as in low volatility periods. The model is tested with data from General Motors stock prices corresponding to two crises that had a substantial impact on financial markets worldwide: Black Monday of October 1987 and September 11th, 2001. By analyzing the periods around these crises we find evidence of statistical significance of our model, and thereby of predictability of extremes, for September 11th but not for Black Monday. These findings support the hypotheses of a big negative event producing runs of negative returns in the first case, and of the burst of a worldwide stock market bubble in the second example. JEL classification: C12; C15; C22; C51. Keywords and Phrases: asymmetries, crises, extreme values, hypothesis testing, leverage effect, nonlinearities, threshold models
Abstract:
Electron microscopic analysis of heteroduplexes between the most distantly related Xenopus vitellogenin genes (A genes × B genes) has revealed the distribution of homologous regions that have been preferentially conserved after the duplication events that gave rise to the multigene family in Xenopus laevis. DNA sequence analysis was limited to the region downstream of the transcription initiation site of the Xenopus genes A1, B1 and B2, and a comparison with the Xenopus A2 and the major chicken vitellogenin gene is presented. Within the coding regions of the first three exons, nucleotide substitutions resulting in amino acid changes accumulate at a rate similar to that observed in globin genes. This suggests that the duplication event which led to the formation of the A and B ancestral genes in Xenopus laevis occurred about 150 million years ago. Homologous exons of the A1-A2 and B1-B2 gene pairs, which formed about 30 million years ago, show a quite similar sequence divergence. In contrast, A1-A2 homologous introns seem to have evolved much faster than their B1-B2 counterparts.
Abstract:
When dealing with multi-angular image sequences, problems of reflectance changes due either to illumination and acquisition geometry, or to interactions with the atmosphere, naturally arise. These phenomena interplay with the scene and lead to a modification of the measured radiance: for example, depending on the angle of acquisition, tall objects may be seen from the top or from the side, and different light scatterings may affect the surfaces. This results in shifts in the acquired radiance that make the problem of multi-angular classification harder and might lead to catastrophic results, since surfaces with the same reflectance return significantly different signals. In this paper, rather than performing atmospheric or bidirectional reflectance distribution function (BRDF) correction, a non-linear manifold learning approach is used to align data structures. This method maximizes the similarity between the different acquisitions by deforming their manifold, thus enhancing the transferability of classification models among the images of the sequence.
Abstract:
The Borborema Province in northeastern South America is a typical Brasiliano-Pan-African branching system of Neoproterozoic orogens that forms part of the Western Gondwana assembly. The province is positioned between the Sao Luis-West Africa craton to the north and the Sao Francisco (Congo-Kasai) craton to the south. For this province the main characteristics are (a) its subdivision into five major tectonic domains, bounded mostly by long shear zones, as follows: Medio Coreau, Ceara Central, Rio Grande do Norte, Transversal, and Southern; (b) the alternation of supracrustal belts with reworked basement inliers (Archean nuclei + Paleoproterozoic belts); and (c) the diversity of granitic plutonism, from Neoproterozoic to Early Cambrian ages, that affects supracrustal rocks as well as basement inliers. Recently, rock assemblages of early Tonian (1000-920 Ma) orogenic evolution have been recognized, which are restricted to the Transversal and Southern domains of the Province. Within the Transversal Zone, the Alto Pajeu terrane locally includes some remnants of oceanic crust along with island arc and continental arc rock assemblages, but the dominant supracrustal rocks are mature and immature pelitic metasedimentary and metavolcaniclastic rocks. Contiguous and parallel to the Alto Pajeu terrane, the Riacho Gravata subterrane consists mainly of low-grade metamorphic successions of metarhythmites, some of which are clearly turbiditic in origin, metaconglomerates, and sporadic marbles, along with interbedded metarhyolitic and metadacitic volcanic or metavolcaniclastic rocks. Both terrane and subterrane are cut by syn-contractional intrusive sheets of dominantly peraluminous high-K calc-alkaline, granitic to granodioritic metaplutonic rocks. The geochemical patterns of both supracrustal and intrusive rocks show similarities with associations of mature continental arc volcano-sedimentary sequences, but some subordinate intra-plate characteristics are also found.
In both the Alto Pajeu and Riacho Gravata terranes, TIMS and SHRIMP U-Pb isotopic data from zircons from both metavolcanic and metaplutonic rocks yield ages between 1.0 and 0.92 Ga, which define the time span for an event of orogenic character, the Cariris Velhos event. Less extensive occurrences of rocks of Cariris Velhos age are recognized mainly in the southernmost domains of the Province, as for example in the Polo Redondo-Maranco terrane, where arc-affinity migmatite-granitic and meta-volcano-sedimentary rocks show U-Pb ages (SHRIMP data) around 0.98-0.97 Ga. For all these domains, Sm-Nd data exhibit T(DM) model ages between 1.9 and 1.1 Ga with corresponding slightly negative to slightly positive epsilon(Nd)(t) values. These domains, along with the Borborema Province as a whole, were significantly affected by tectonic and magmatic events of the Brasiliano Cycle (0.7-0.5 Ga), so it is possible that there are some other early Tonian rock assemblages which were completely masked and hidden by these later Brasiliano events. Cariris Velhos processes are younger than the majority of orogenic systems at the end of the Mesoproterozoic Era and the beginning of the Neoproterozoic throughout the world, e.g. the Irumide belt, Kibaride belt and Namaqua-Natal belt, and considerably younger than those of the youngest orogenic process (Ottawan) in the Grenvillian System. Therefore, they were probably not associated with the proposed assembly of Rodinia. We suggest, instead, that Cariris Velhos magmatism and tectonism could have been related to a continental margin magmatic arc, with possible back-arc associations, and that this margin may have been a short-lived (<100 m.y.) leading edge of the newly assembled Rodinia supercontinent. © 2009 Elsevier Ltd. All rights reserved.
Abstract:
Intra- and inter-population genetic variability and the demographic history of Heliothis virescens (F.) populations were evaluated by using mtDNA markers (coxI, coxII and nad6) with samples from the major cotton- and soybean-producing regions in Brazil in the growing seasons 2007/08, 2008/09 and 2009/10. AMOVA indicated low and non-significant genetic structure, regardless of geographical scale, growing season or crop, with most of the genetic variation occurring within populations. Clustering analyses also indicated low genetic differentiation. The haplotype network obtained with combined datasets resulted in 35 haplotypes, with 28 exclusive occurrences, four of them sampled only from soybean fields. The minimum spanning network showed star-shaped structures typical of populations that underwent a recent demographic expansion. The recent expansion was supported by other demographic analyses, such as the Bayesian skyline plot, the unimodal distribution of paired differences among mitochondrial sequences, and negative and significant values of the neutrality tests for Tajima's D and Fu's Fs parameters. In addition, high values of haplotype diversity (Ĥ) and low values of nucleotide diversity (π), combined with a high number of low-frequency haplotypes and values of θ(π) < θ(W), suggested a recent demographic expansion of H. virescens populations in Brazil. This demographic event could be responsible for the low genetic structure currently found; however, haplotypes present uniquely in the same geographic regions and from one specific host plant suggest an initial differentiation among H. virescens populations within Brazil.
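The two diversity statistics mentioned above have standard textbook definitions (Nei's estimators), which the sketch below computes for aligned sequences of equal length. The abstract does not state which software or exact estimators the authors used, so this is an illustration only; the function names and the toy sequences are made up and are not data from the study.

```python
from collections import Counter
from itertools import combinations
from typing import List


def haplotype_diversity(seqs: List[str]) -> float:
    """Nei's haplotype diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def nucleotide_diversity(seqs: List[str]) -> float:
    """Average number of pairwise differences per site (pi)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (made-up): three haplotypes among four sequences.
toy = ["ACGT", "ACGT", "ACGA", "TCGT"]
print(haplotype_diversity(toy), nucleotide_diversity(toy))
```

Many distinct haplotypes that differ at only a few sites push haplotype diversity up while keeping nucleotide diversity low, which is the combination the abstract reads as a signal of recent expansion.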