991 results for log analysis
Abstract:
The South Carolina Department of Consumer Affairs publishes an annual mortgage log report as a requirement of the South Carolina Mortgage Lending Act, which became effective on January 1, 2010. The mortgage log report analyzes the following data concerning all mortgage loan applications taken: the borrower’s credit score, term of the loan, annual percentage rate, type of rate, and appraised value of the property. The mortgage log report also analyzes data required by the Home Mortgage Disclosure Act, including the following information: the loan type, property type, purpose of the loan, owner/occupancy status, loan amount, action taken, reason for denial, property location, gross annual income, purchaser of the loan, rate spread, HOEPA status, and lien status, as well as the applicant and co-applicant’s race, ethnicity, and gender.
Abstract:
The main purpose of the Log2XML application is to transform log files in delimited text format (fields separated by a delimiter character) into a standardized XML format. To allow the application to work with logs from different systems or applications, it provides a template system (specifying the field order and the separator character) that defines the minimal structure needed to extract information from any type of separator-based log. Finally, the application allows the extracted information to be processed to generate reports and statistics. The project also explores the Grails technology in depth.
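As a rough illustration of the template idea described in this abstract (field order plus a separator character), the following Python sketch converts delimited log lines into XML; the field names, separator, and XML layout are assumptions for illustration, not the actual Log2XML implementation.

```python
# Minimal sketch of the template idea (field order plus separator character);
# the template contents and the XML layout are illustrative assumptions.
import xml.etree.ElementTree as ET

def log_to_xml(lines, fields, sep=";"):
    """Convert separator-delimited log lines into an XML tree.

    fields -- ordered list of field names, e.g. ["timestamp", "level", "message"]
    sep    -- field separator character used by the source log
    """
    root = ET.Element("log")
    for line in lines:
        values = line.rstrip("\n").split(sep)
        entry = ET.SubElement(root, "entry")
        for name, value in zip(fields, values):
            ET.SubElement(entry, name).text = value
    return ET.ElementTree(root)

# Example usage with a hypothetical template.
template = {"fields": ["timestamp", "level", "message"], "sep": "|"}
sample = ["2014-05-01 12:00:03|ERROR|disk quota exceeded"]
ET.dump(log_to_xml(sample, template["fields"], template["sep"]))
```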
Abstract:
The ATLAS experiment, like the other experiments operating at the Large Hadron Collider, produces petabytes of data every year, which must then be archived and processed. The experiments have also committed to making these data accessible worldwide. In response to these needs, the Worldwide LHC Computing Grid was designed, combining the computing power and storage capacity of more than 170 sites spread across the world. At most WLCG sites, storage management technologies have been developed that also handle user requests and data transfers. These systems record their activities in log files, rich in information that helps operators pinpoint a problem when the system malfunctions. In anticipation of a larger data flow in the coming years, work is under way to make these sites even more reliable, and one possible way to do so is to develop a system able to analyse the log files autonomously and identify the anomalies that precede a failure. To build such a system, the most suitable method for log file analysis must first be identified. This thesis studies an approach to the problem that uses artificial intelligence to analyse the log files; more specifically, it studies an approach based on the K-means clustering algorithm.
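The following Python sketch illustrates the kind of K-means-based log analysis this abstract refers to: log lines are vectorized, clustered, and lines far from their cluster centre are flagged as possible anomalies. The TF-IDF features, cluster count, and threshold are assumptions, not the thesis's actual pipeline.

```python
# Sketch of K-means log analysis: cluster log lines and flag lines far from
# their assigned centroid as potential anomalies. Feature extraction and the
# distance threshold are illustrative choices.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def kmeans_log_anomalies(lines, n_clusters=8, quantile=0.99):
    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z]+")
    X = vectorizer.fit_transform(lines)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # Distance of each line to its own cluster centre, used as an anomaly score.
    distances = km.transform(X)[np.arange(X.shape[0]), km.labels_]
    threshold = np.quantile(distances, quantile)
    return [line for line, d in zip(lines, distances) if d > threshold]
```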
Abstract:
This study focuses on the detailed conceptualization of the circulation model of the Chaves thermomineral aquifer in its discharge zone in the Chaves hydromineral and geothermal field (Northern Portugal). To that end, it uses information from the available literature, from the construction of the existing thermomineral water boreholes AC1 and AC2, and, above all, information obtained from the research, prospecting, and construction work for borehole CC3, carried out in 2014 to reinforce the flow rate of the Caldas de Chaves. Focusing mainly on the analysis of geophysical profiles, drilling parameters, cuttings, and wireline logs from the CC3 exploratory borehole and its subsequent conversion into a production well, and then integrating all the resulting information, the study culminates in a detailed conceptual model for the circulation of “Chaves”-type waters in the discharge zone, a model that, with considerable robustness and in the light of the data known to date, provides a better understanding of the aquifer for future projects to develop the hydromineral and geothermal resources of Chaves.
Abstract:
During the past decades testing has matured from an ad-hoc activity into being an integral part of the development process. The benefits of testing are obvious for modern communication systems, which operate in heterogeneous environments amongst devices from various manufacturers. The increased demand for testing also creates demand for tools and technologies that support and automate testing activities. This thesis discusses the applicability of visualization techniques in the result analysis part of the testing process. In particular, the primary focus of this work is the visualization of test execution logs produced by a TTCN-3 test system. TTCN-3 is an internationally standardized test specification and implementation language. The TTCN-3 standard suite includes a specification of a test logging interface and a graphical presentation format, but does not define an immediate relationship between them. This thesis presents a technique for mapping the log events to the graphical presentation format, along with a concrete implementation, which is integrated with the Eclipse Platform and the OpenTTCN Tester toolchain. Results of this work indicate that for the majority of the log events, a visual representation may be derived from the TTCN-3 standard suite. The remaining events were analysed and three categories relevant to either log analysis or implementation of the visualization tool were identified: events indicating insertion of something into the incoming queue of a port, events indicating a mismatch, and events describing the control flow during the execution. The applicability of the results is limited to the domain of TTCN-3, but the developed mapping and the implementation may be utilized with any TTCN-3 tool that is able to produce the execution log in the standardized XML format.
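A hypothetical sketch of the kind of mapping this abstract describes, from log event types to elements of the graphical presentation format; the event-type names and visual elements below are illustrative assumptions, not the vocabulary of the TTCN-3 standard suite.

```python
# Illustrative mapping from log event types to presentation elements.
# Both the keys and the values are assumed names, not the standard's terms.
EVENT_TO_GRAPHIC = {
    "send":           "message-arrow",    # communication between components
    "receive":        "message-arrow",
    "enqueue":        "port-queue-note",  # insertion into a port's incoming queue
    "mismatch":       "mismatch-comment", # received value did not match the template
    "testcase-start": "execution-frame",  # control flow during execution
    "testcase-stop":  "execution-frame",
    "verdict":        "verdict-symbol",
}

def to_graphic(event_type):
    """Return a presentation element, or a generic note for unmapped events."""
    return EVENT_TO_GRAPHIC.get(event_type, "generic-note")
```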
Abstract:
Analysis of firewall and antivirus log files without any log analysis tool can be very difficult for an ordinary computer user. In log files every event is organized by time, but reading them with understanding, without any log analysis tool, requires expert knowledge. In this Bachelor’s Thesis I put together a software package for ordinary private computer users that allows the user to analyze log files in a Windows environment without any additional effort. Most private computer users do not have much experience with computers and data security, so this Bachelor’s Thesis can also be used as a manual for the analysis tool used in this work.
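As an illustration of the kind of analysis such a tool performs, the sketch below counts dropped connections per source address, assuming a Windows Firewall log in the W3C-style format with a "#Fields:" header line (e.g. pfirewall.log); the function name and the choice of analysis are assumptions, not the thesis's software package.

```python
# Sketch: read field names from the "#Fields:" header, then count DROP records
# per source IP. Format details beyond the header convention are assumptions.
from collections import Counter

def dropped_by_source(path):
    fields, dropped = [], Counter()
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("#Fields:"):
                fields = line.split()[1:]          # column names from the header
            elif line and not line.startswith("#") and fields:
                record = dict(zip(fields, line.split()))
                if record.get("action") == "DROP":
                    dropped[record.get("src-ip", "?")] += 1
    return dropped.most_common(10)                 # top sources of dropped packets
```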
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
This thesis examines web analytics (visitor tracking) methods and applies them in practice. The operation of web analytics software is explored, focusing mainly on Google Analytics. The goal is to determine the usage volumes of the Lappeenranta tourist information terminals and to break them down per device. A literature review of web analytics is conducted, and visitor tracking data from two different websites is analyzed and compared. In addition, the logs of the tourist terminals' website are examined with data mining methods, using a Python application developed for this purpose. Based on this work, it can be concluded that with the current implementation the terminals' usage volumes cannot be separated per device. The number of sessions and events can, however, be tracked. Several problems are identified in the terminals' visitor tracking, such as the distorting effect of the terminals' automatic page refresh on the results, the partial Google Analytics integration, and, most importantly, the lack of an identifier that uniquely identifies a terminal. The thesis proposes solutions that enable effective use of visitor tracking and per-device monitoring. The results highlight the importance of careful planning when implementing visitor tracking.
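In the spirit of the Python log-mining application mentioned above, the following sketch counts requests per client from a web server access log; the combined log format and the (IP, user-agent) grouping are assumptions for illustration, and the thesis itself notes that the terminals lack a truly unique identifier.

```python
# Sketch: parse common/combined-format access log lines and count requests
# per (client IP, user agent) pair. The grouping key is an approximation.
import re
from collections import Counter

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def requests_per_client(path):
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = LOG_RE.match(line)
            if m:
                counts[(m.group("ip"), m.group("agent") or "-")] += 1
    return counts.most_common()
```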
Abstract:
In spite of significant study and exploration of the Potiguar Basin, the easternmost Brazilian equatorial margin, by the oil industry, it still provides an interesting discussion about its origin and the mechanisms of hydrocarbon trapping. The mapping and interpretation of 3D seismic reflection data of the Baixa Grande Fault, in the SW portion of the Umbuzeiro Graben, point to an extensional deformational process as responsible for the configuration of the basin architecture. The fault geometry is the most important deformation boundary condition of the rift strata. The development of flat-ramp geometries is responsible for the formation of important extensional anticline folds, many of them hydrocarbon traps in this basin segment. The dominant extensional deformation in the studied area, marked by the development of normal faults and associated with structures indicative of obliquity, suggests that the deformational regime of the Potiguar Basin varied through a multiphase process. The changes in structural trend allow the generation of local transpression and transtension zones, which results in the complex deformation pattern displayed by the Potiguar Basin syn-rift strata. Seismostratigraphic and log analyses show that the Baixa Grande Fault acted as a listric growth fault at the onset of sedimentation. The generation of a relay ramp between the Baixa Grande Fault and the Carnaubais Fault was probably responsible for the balance between subsidence and sedimentary influx rates, inhibiting its growth behaviour. The seismosequence analysis indicates that the generation of the extensional folds is diachronic, so the folds can be both syn- and post-depositional.
Abstract:
The electromagnetic propagation tool (EPT) provides the propagation time (Tpl) and the attenuation (A) of an electromagnetic wave propagating in a lossy medium. These EPT responses are functions of the dielectric permittivity of the medium. There are several models and mixing formulas for the dielectric permittivity of reservoir rocks that can be used in the interpretation of this high-frequency tool. However, the mixing formulas do not take into account the distribution and geometry of the pore space, and these parameters are essential to obtain dielectric responses closer to those of a real rock. A model based on the parameters described above was selected and applied to dielectric data available in the literature. Good agreement was obtained between the theoretical curves and the experimental data, confirming that the distribution and geometry of the pores must be taken into account in the development of a realistic model. Pore aspect-ratio distribution functions were also obtained, from which several curves relating the EPT responses to different oil/gas saturations were generated. These curves were applied to log analysis. Since the selected model fits the dielectric data available in the literature well, it becomes attractive to apply it to experimental data obtained from rocks of Brazilian hydrocarbon-producing fields for the interpretation of the EPT run in wells of these oil fields.
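For context on how Tpl and A depend on the dielectric permittivity, the standard plane-wave relations for a lossy medium are sketched below; this is general background, not the specific pore-geometry model selected in the study.

```latex
% Plane-wave propagation in a lossy medium (general background, not the
% specific pore-geometry model used in the work).
\[
  k \;=\; \omega\sqrt{\mu_0\,\varepsilon_0\,\varepsilon_r^{*}}
       \;=\; \beta - i\,\alpha ,
  \qquad
  \varepsilon_r^{*} \;=\; \varepsilon_r' - i\,\frac{\sigma}{\omega\varepsilon_0},
\]
\[
  t_{pl} \;=\; \frac{\beta}{\omega}
  \quad\text{(propagation time per unit length)},
  \qquad
  A \;=\; 20\,\alpha\,\log_{10} e \;\approx\; 8.686\,\alpha
  \quad\text{(attenuation, dB/m)}.
\]
```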
Abstract:
The textural and compositional characteristics of the 400 m sequence of Pleistocene wackestones and packstones intersected at Ocean Drilling Program (ODP) Site 820 reflect deposition controlled by fluctuations in sea-level, and by variations in the rate of sediment supply. The development of an effective reefal barrier adjacent to Site 820, between 760 k.y. and 1.01 Ma, resulted in a marked reduction in sediment accumulation rates on the central Great Barrier Reef outermost shelf and upper slope. This marked change corresponds with the transition from sigmoidal prograding seismic geometry in the lower 254 m of the sequence, to aggradational geometry in the top 146 m. The reduction in the rate of sediment accumulation that followed development of the reefal barrier also caused a fundamental change in the way in which fluctuations in sea-level controlled sediment deposition. In the lower, progradational portion of the sequence, sea-level cyclicity is represented by superimposed coarsening-upward cycles. Although moderately calcareous throughout (mostly 35%-75% CaCO3), the depositional system acted in a similar manner to siliciclastic shelf depositional systems. Relative sea-level rises resulted in deposition of more condensed, less calcareous, fine, muddy wackestones at the base of each cycle. Sea-level highstands resulted in increased sedimentation rates and greater influx of coarse bioclastic material. Continued high rates of sedimentation of both coarse bioclastic material and mixed carbonate and terrigenous mud marked falling and low sea-levels. This lower part of the sequence therefore is dominated by coarse packstones, with only thin wackestone intervals representing transgressions. In contrast, sea-level fluctuations following formation of an effective reefal barrier produced a markedly different sedimentary record. The more slowly deposited aggradational sequence is characterized by discrete thin interbeds of relatively coarse packstone within a predominantly fine wackestone sequence. These thin packstone beds resulted from relatively low sedimentation rates during falling and low sea-levels, with much higher rates of muddy sediment accumulation during rising and high sea-levels. The transition from progradational to aggradational sequence geometry therefore corresponds to a transition from a "siliciclastic-type" to a "carbonate-type" depositional system.
Collection-Level Subject Access in Aggregations of Digital Collections: Metadata Application and Use
Abstract:
Problems in subject access to information organization systems have been under investigation for a long time. Focusing on item-level information discovery and access, researchers have identified a range of subject access problems, including quality and application of metadata, as well as the complexity of user knowledge required for successful subject exploration. While aggregations of digital collections built in the United States and abroad generate collection-level metadata of various levels of granularity and richness, no research has yet focused on the role of collection-level metadata in user interaction with these aggregations. This dissertation research sought to bridge this gap by answering the question “How does collection-level metadata mediate scholarly subject access to aggregated digital collections?” This goal was achieved using three research methods: • in-depth comparative content analysis of collection-level metadata in three large-scale aggregations of cultural heritage digital collections: Opening History, American Memory, and The European Library • transaction log analysis of user interactions with Opening History, and • interview and observation data on academic historians interacting with two aggregations: Opening History and American Memory. It was found that subject-based resource discovery is significantly influenced by collection-level metadata richness. This richness includes such components as: 1) describing a collection’s subject matter with mutually complementary values in different metadata fields, and 2) a variety of collection properties/characteristics encoded in the free-text Description field; types and genres of objects in a digital collection, as well as topical, geographic, and temporal coverage, are the most consistently represented collection characteristics in free-text Description fields. Analysis of user interactions with aggregations of digital collections yields a number of interesting findings. Item-level user interactions were found to occur more often than collection-level interactions. Collection browse is initiated more often than search, while subject browse (topical and geographic) is used most often. The majority of collection search queries fall within FRBR Group 3 categories: object, concept, and place. Significantly more object, concept, and corporate body searches and fewer individual person, event, and class-of-persons searches were observed in collection searches than in item searches. While collection search is most often satisfied by Description and/or Subjects collection metadata fields, it would not retrieve a significant proportion of collection records without controlled-vocabulary subject metadata (Temporal Coverage, Geographic Coverage, Subjects, and Objects), and free-text metadata (the Description field). Observation data show that collection metadata records in the Opening History and American Memory aggregations are often viewed. Transaction log data show a high level of engagement with collection metadata records in Opening History, with the total page views for collections more than 4 times greater than item page views. Scholars observed viewing collection records valued descriptive information on provenance, collection size, types of objects, subjects, geographic coverage, and temporal coverage.
They also considered the structured display of collection metadata in Opening History more useful than the alternative approach taken by other aggregations, such as American Memory, which displays only the free-text Description field to the end-user. The results extend the understanding of the value of collection-level subject metadata, particularly free-text metadata, for the scholarly users of aggregations of digital collections. The analysis of the collection metadata created by three large-scale aggregations provides a better understanding of collection-level metadata application patterns and suggests best practices. This dissertation is also the first empirical research contribution to test the FRBR model as a conceptual and analytic framework for studying collection-level subject access.
Abstract:
Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but are limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts.
This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
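A minimal sketch of the high-volume hypothesis testing (HVHT) idea described above: one statistical test per metric is run on the two cohorts and the false discovery rate is controlled across all tests. The Mann-Whitney test, the FDR correction, and the metric names are illustrative assumptions, not the dissertation's exact design.

```python
# Sketch: one test per metric across two cohorts, with Benjamini-Hochberg FDR
# control over the resulting family of p-values.
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

def compare_cohorts(cohort_a, cohort_b):
    """cohort_a/cohort_b map metric name -> list of per-record values."""
    metrics = sorted(set(cohort_a) & set(cohort_b))
    pvalues = [mannwhitneyu(cohort_a[m], cohort_b[m]).pvalue for m in metrics]
    rejected, corrected, _, _ = multipletests(pvalues, alpha=0.05, method="fdr_bh")
    return list(zip(metrics, corrected, rejected))

# Example with two hypothetical metrics for two patient cohorts.
a = {"duration_days": [3, 5, 4, 6, 2], "num_events": [7, 8, 6, 9, 7]}
b = {"duration_days": [8, 9, 7, 10, 11], "num_events": [7, 6, 8, 7, 9]}
print(compare_cohorts(a, b))
```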
Abstract:
Intelligent systems are now inherent to society, supporting a synergistic human-machine collaboration. Beyond economic and climate factors, energy consumption is strongly affected by the performance of computing systems. The quality of software functioning may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, and their interpretability is one of the most important features of computational systems. Software maintenance is a critical discipline to support automatic and life-long system operation. As most software registers its internal events by means of logs, log analysis is an approach to keeping systems operational. Logs are characterized as Big Data assembled in large-flow streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty and identifying ideal time periods for detailed software analyses. LP provides a deeper semantic interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely the Fuzzy set-Based evolving Model (FBeM), the evolving Granular Neural Network (eGNN), and the evolving Gaussian Fuzzy Classifier (eGFC), are compared on the AD problem. The evolving Log Parsing (eLP) method is proposed for the automatic parsing of system logs. All the methods use recursive mechanisms to create, update, merge, and delete information granules according to the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences. Regarding AD accuracy, FBeM achieved (85.64±3.69)%; eGNN reached (96.17±0.78)%; eGFC obtained (92.48±1.21)%; and eLP reached (96.05±1.04)%. Besides being competitive, eLP additionally generates a log grammar and presents a higher level of model interpretability.
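A highly simplified sketch of the create/update mechanism shared by evolving granular classifiers of this kind: each granule is a hyperbox that expands to cover new samples, and a sample that no granule can cover within a width tolerance is treated as anomalous and seeds a new granule. The parameters and rules are illustrative, not those of FBeM, eGNN, eGFC, or eLP.

```python
# Simplified evolving-granule sketch: hyperbox granules that expand to cover
# incoming samples; uncovered samples are flagged and create new granules.
import numpy as np

class EvolvingGranules:
    def __init__(self, rho=0.3):
        self.rho = rho                 # maximum allowed granule width per dimension
        self.granules = []             # list of (lower_bound, upper_bound) arrays

    def update(self, x):
        """Process one sample; return True if it was anomalous (no granule fit)."""
        x = np.asarray(x, dtype=float)
        for i, (lo, hi) in enumerate(self.granules):
            new_lo, new_hi = np.minimum(lo, x), np.maximum(hi, x)
            if np.all(new_hi - new_lo <= self.rho):    # expansion stays within rho
                self.granules[i] = (new_lo, new_hi)
                return False
        self.granules.append((x.copy(), x.copy()))     # create a new granule
        return True
```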
Abstract:
In this paper, we compare three residuals to assess departures from the error assumptions as well as to detect outlying observations in log-Burr XII regression models with censored observations. These residuals can also be used for the log-logistic regression model, which is a special case of the log-Burr XII regression model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the modified martingale-type residual in log-Burr XII regression models with censored data.
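For reference, the general censored-data martingale residual and its deviance-type modification are shown below, where delta_i is the censoring indicator and S-hat the fitted survival function; the exact modified martingale-type residual used in the paper may differ.

```latex
% General censored-data residuals of the kind referred to above; the paper's
% exact modified form may differ.
\[
  \hat r_{M_i} \;=\; \delta_i + \log \hat S(y_i \mid \hat{\boldsymbol\theta}),
  \qquad
  \hat r_{D_i} \;=\; \operatorname{sign}(\hat r_{M_i})
  \sqrt{-2\left[\hat r_{M_i} + \delta_i \log\!\left(\delta_i - \hat r_{M_i}\right)\right]} .
\]
```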