893 results for data driven approach


Relevance:

40.00%

Publisher:

Abstract:

Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies they were not expecting, but are limited by uncertainty in their findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question in mind and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges in running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. This work opens avenues for future research in comparing two or more groups of temporal event sequences, in opening traditional machine learning and data mining techniques to user interaction, and in extending the principles found in this dissertation to data types beyond temporal event sequences.
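A minimal sketch of the multiple-testing step behind an approach like HVHT — running one test per metric across two cohorts and correcting for multiple comparisons — might look like the following. The metric names and data are hypothetical placeholders, and the Benjamini-Hochberg FDR correction shown is a standard choice, not necessarily the dissertation's exact procedure:

import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Hypothetical per-record metrics for two cohorts (e.g., event durations,
# event counts, gaps) -- placeholders for real event-sequence data.
cohort_a = {m: rng.normal(0.0, 1.0, 200) for m in ["duration", "n_events", "gap"]}
cohort_b = {m: rng.normal(0.2, 1.0, 180) for m in ["duration", "n_events", "gap"]}

# Run one test per metric (a two-sample t-test here; a high-volume approach
# would sweep many more metrics over structure, attributes, and timestamps).
metrics, pvals = [], []
for m in cohort_a:
    _, p = stats.ttest_ind(cohort_a[m], cohort_b[m], equal_var=False)
    metrics.append(m)
    pvals.append(p)

# Correct for multiple comparisons so the flood of tests stays interpretable.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for m, p, r in zip(metrics, p_adj, reject):
    print(f"{m}: adjusted p = {p:.4f}, significant = {r}")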

Relevance:

40.00%

Publisher:

Abstract:

In this work, we further extend the recently developed adaptive data analysis method, the Sparse Time-Frequency Representation (STFR) method. This method is based on the assumption that many physical signals inherently contain AM-FM representations. We propose a sparse optimization method to extract the AM-FM representations of such signals. We prove the convergence of the method for periodic signals under certain assumptions and provide practical algorithms specifically for the non-periodic STFR, which extends the method to tackle problems that former STFR methods could not handle, including stability to noise and non-periodic data analysis. This is a significant improvement, since many adaptive and non-adaptive signal processing methods are not fully capable of handling non-periodic signals. Moreover, we propose a new STFR algorithm to study intrawave signals with strong frequency modulation and analyze the convergence of this new algorithm for periodic signals. Such signals have previously remained a bottleneck for all signal processing methods. Furthermore, we propose a modified version of STFR that facilitates the extraction of intrawaves that have overlapping frequency content. We show that the STFR methods can be applied to the realm of dynamical systems and cardiovascular signals. In particular, we present a simplified and modified version of the STFR algorithm that is potentially useful for the diagnosis of some cardiovascular diseases. We further explain some preliminary work on the nature of Intrinsic Mode Functions (IMFs) and how they can have different representations in different phase coordinates. This analysis shows that the uncertainty principle is fundamental to all oscillating signals.
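For context, the AM-FM model assumed by STFR expresses a signal as a small number of smoothly modulated oscillatory components. In notation common to the sparse time-frequency literature (illustrative here, not necessarily the thesis's exact formulation):

f(t) = \sum_{k=1}^{M} a_k(t) \cos\theta_k(t), with a_k(t) > 0 and \theta_k'(t) > 0,

where the envelopes a_k vary slowly, \theta_k'(t) is the instantaneous frequency of the k-th component, and the sparse optimization seeks the decomposition with the fewest such components consistent with the data.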

Relevance:

40.00%

Publisher:

Abstract:

The present work used multivariate and mathematical methods to integrate chemical and ecotoxicological data obtained for the Santos Estuarine System and for the region near the discharge zone of the Santos submarine outfall, with the aim of establishing environmental risks more accurately and thereby identifying priority areas and guiding control programmes and public policies. For both data sets, violations of numerical sediment quality guidelines tended to be associated with the occurrence of toxicity. For the estuary, this tendency was corroborated by correlations between toxicity and the concentrations of PAHs and Cu, while for the outfall region it was corroborated by the correlation between toxicity and the mercury content of the sediment. Values normalized with respect to the means were calculated for each sample, allowing the samples to be ranked according to toxicity and contamination. Cluster analyses confirmed the classification results. For the estuarine system data, the samples separated into three categories: stations SSV-2, SSV-3 and SSV-4 are at greatest risk, followed by station SSV-6. Stations SSV-1 and SSV-5 showed better conditions. As for the outfall region, samples 1 and 2 showed better conditions, while station 5 appeared to be at greater risk, followed by stations 3 and 4, which showed only some signs of alteration.
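A minimal sketch of the kind of analysis described — mean-normalizing measured values per sample and then clustering the stations — could look like this. The station values below are placeholders, and the study may have used different normalization and linkage choices:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical sediment measurements per station: [PAHs, Cu, Hg, toxicity]
stations = ["SSV-1", "SSV-2", "SSV-3", "SSV-4", "SSV-5", "SSV-6"]
data = np.array([
    [0.4, 0.3, 0.2, 0.1],
    [2.1, 1.8, 0.9, 1.5],
    [2.4, 2.0, 1.1, 1.7],
    [1.9, 1.7, 0.8, 1.4],
    [0.5, 0.4, 0.3, 0.2],
    [1.2, 1.0, 0.6, 0.9],
])

# Normalize each variable relative to its mean across samples.
normalized = data / data.mean(axis=0)

# Agglomerative clustering of the stations on their normalized profiles.
tree = linkage(normalized, method="average")
groups = fcluster(tree, t=3, criterion="maxclust")
for s, g in zip(stations, groups):
    print(s, "-> cluster", g)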

Relevance:

40.00%

Publisher:

Abstract:

Tourist accommodation expenditure is a widely investigated topic, as it represents a major share of total tourist expenditure. The identification of its determinant factors has commonly been based on supply-driven applications, while little research has examined important travel characteristics. This paper proposes a demand-driven analysis of tourist accommodation prices by focusing on data generated from room bookings. The investigation focuses on modeling the relationship between key travel characteristics and the price paid to book the accommodation. To accommodate the distributional characteristics of the expenditure variable, the analysis is based on the estimation of a quantile regression model. The findings support the econometric approach used and enable the elaboration of relevant managerial implications.
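As an illustration, a quantile regression of booking price on travel characteristics can be estimated in a few lines; the variable names below are hypothetical stand-ins for the paper's travel characteristics:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500

# Hypothetical booking-level data: price paid and travel characteristics.
df = pd.DataFrame({
    "log_price": rng.normal(4.5, 0.6, n),
    "nights": rng.integers(1, 10, n),
    "party_size": rng.integers(1, 5, n),
    "lead_time": rng.integers(0, 180, n),
})

# Estimate conditional quantiles of price (the median and the 90th
# percentile here), robust to the skewed distribution of expenditure.
for q in (0.5, 0.9):
    res = smf.quantreg("log_price ~ nights + party_size + lead_time", df).fit(q=q)
    print(f"q = {q}:")
    print(res.params)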

Relevance:

40.00%

Publisher:

Abstract:

The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying service-level agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications render administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that substantially reduce data center management complexity. We specifically addressed two crucial data center operations. First, we precisely estimated the capacity requirements of client virtual machines (VMs) when renting server space in a cloud environment. Second, we proposed a systematic process for efficiently allocating physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. The benefits of accurate application performance modeling are manifold. Cloud users can size their VMs appropriately and pay only for the resources they need; service providers can also offer a new charging model based on the VMs' performance instead of their configured sizes. As a result, clients will pay exactly for the performance they actually experience; administrators, in turn, will be able to maximize their total revenue by utilizing application performance models and SLAs. This thesis made the following contributions. First, we identified the resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment. Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Networks and Support Vector Machines, for accurately modeling the performance of virtualized applications. Moreover, we suggested and evaluated the modeling optimizations necessary to improve prediction accuracy when using these tools. Third, we presented an approach to optimal VM sizing that employs the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm that maximizes the SLA-generated revenue of a data center.
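As a sketch of the modeling step described — learning application performance as a function of resource allocations — one of the two tools named (a Support Vector Machine regressor) could be used as follows. The features, targets, and parameter values are hypothetical placeholders, not the thesis's actual data or settings:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
n = 400

# Hypothetical training data: resource allocations -> measured performance.
# Columns: CPU cap (%), memory (GB), I/O bandwidth share (%).
X = np.column_stack([
    rng.uniform(10, 100, n),
    rng.uniform(0.5, 16, n),
    rng.uniform(5, 100, n),
])
# Placeholder response-time measurements (ms) with contention noise.
y = 500 / X[:, 0] + 80 / X[:, 1] + 200 / X[:, 2] + rng.normal(0, 0.5, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale the features, then fit an SVM regressor to the performance surface.
model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
model.fit(X_train, y_train)
print("R^2 on held-out allocations:", model.score(X_test, y_test))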

Relevance:

40.00%

Publisher:

Abstract:

The uncertainty of the future of a firm has to be modelled and incorporated into the valuation of companies beyond their explicit period of analysis, i.e., in the continuing or terminal value considered within valuation models. However, there is a multiplicity of factors that influence the continuing value of businesses which are not currently considered within valuation models. Ignoring these factors may cause significant errors of judgment, which can lead models to goodwill or badwill values far from the substantial value of the underlying assets. Consequently, the results provided will differ markedly from market values. So, why not consider alternative models that incorporate the life expectancy of companies, as well as the influence of other attributes of the company, in order to achieve a smoother adjustment between market prices and valuation methods? This study aims to contribute to this area, having as its main objective the analysis of potential determinants of firm value in the long term. Using a sample of 714 listed companies from 15 European countries and panel data for the period between 1992 and 2011, our results show that continuing value cannot be regarded as the present value of a constant or growing perpetuity of a single attribute of the company, but should instead be determined by a set of attributes such as free cash flow, net income, the average life expectancy of the company, investment in R&D, capabilities and quality of management, liquidity, and financing structure.
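For reference, the constant-growth perpetuity that the authors argue against is the standard terminal-value formula (written here in the usual notation; the paper's alternative replaces this single-attribute perpetuity with a set of firm attributes):

TV_n = \frac{FCF_{n+1}}{WACC - g},

where FCF_{n+1} is the first post-horizon free cash flow, WACC is the discount rate, and g is the assumed perpetual growth rate.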

Relevance:

40.00%

Publisher:

Abstract:

The high cost of maize in Kenya is driven largely by East African regional commodity demand and agricultural drought. The production of maize, a common staple food in Kenya, is greatly affected by agricultural drought. However, calculations of drought risk and its impact on maize production in Kenya are limited by the scarcity of reliable rainfall data. The objective of this study was to apply a novel remote sensing method to model temporal fluctuations of maize production and prices in five markets in Kenya. SPOT-VEGETATION NDVI time series were corrected for seasonal effects by computing standardized NDVI anomalies. The maize residual price time series was then related to the seasonal NDVI anomalies using a multiple linear regression modelling approach. The results show a moderately strong positive relationship (0.67) between the residual price series and global maize prices. Maize prices were high during drought periods (i.e., negative NDVI anomalies) and low during wet seasons (i.e., positive NDVI anomalies). This study concludes that NDVI is a good index for monitoring the evolution of maize prices and for food security emergency planning in Kenya. To obtain a stronger correlation between the wholesale maize price and the global maize price, future research could consider adding other price-driving factors to the regression models.
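A minimal sketch of the anomaly-plus-regression pipeline described — computing standardized seasonal NDVI anomalies and regressing residual maize prices on them — could look like this; all series below are synthetic placeholders:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
idx = pd.date_range("2000-01-01", periods=120, freq="MS")

# Synthetic monthly NDVI with a seasonal cycle plus noise.
ndvi = pd.Series(0.4 + 0.2 * np.sin(2 * np.pi * idx.month / 12)
                 + rng.normal(0, 0.05, len(idx)), index=idx)

# Standardized NDVI anomalies: subtract the month-specific mean and divide
# by the month-specific standard deviation to remove seasonality.
by_month = ndvi.groupby(ndvi.index.month)
anom = (ndvi - by_month.transform("mean")) / by_month.transform("std")

# Synthetic residual maize price: higher when the NDVI anomaly is negative.
price_resid = -0.5 * anom + rng.normal(0, 0.3, len(idx))

# Linear regression of the residual price on the anomaly.
res = sm.OLS(price_resid.values, sm.add_constant(anom.values)).fit()
print("slope:", res.params[1], "R^2:", res.rsquared)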

Relevance:

40.00%

Publisher:

Abstract:

Understanding fluctuations in population abundance is a central question in fisheries. The sardine fishery is of great importance to Portugal; it is data-rich and of primary concern to fisheries managers. In Portugal, sub-stocks of Sardina pilchardus (sardine) are found in different regions: the Northwest (IXaCN), the Southwest (IXaCS), and the South coast (IXaS-Algarve). Each of these sardine sub-stocks is affected differently by a unique set of climate and ocean conditions, mainly during larval development and recruitment, which consequently affect sardine fisheries in the short term. Taking this hypothesis into consideration, we examined the effects of hydrographic (river discharge), sea surface temperature, wind-driven phenomena, upwelling, climatic (North Atlantic Oscillation), and fisheries (fishing effort) variables on S. pilchardus catch rates (landings per unit effort, LPUE, as a proxy for sardine biomass). A 20-year time series (1989-2009) was used for the different subdivisions of the Portuguese coast (sardine sub-stocks). For this analysis a multi-model approach was used, applying different time series models for data fitting (Dynamic Factor Analysis, Generalised Least Squares) and forecasting (Autoregressive Integrated Moving Average), as well as Surplus Production stock assessment models. The different models were evaluated and compared, and the most important variables explaining changes in LPUE were identified. The type of relationship between sardine catch rates and environmental variables varied across regional scales due to region-specific recruitment responses. Seasonality plays an important role in sardine variability within the three study regions. In IXaCN, autumn (the season with minimum spawning activity and the lowest larvae and egg concentrations) SST, northerly wind, and wind magnitude were negatively related to LPUE. In IXaCS, none of the explanatory variables tested was clearly related to LPUE. In IXaS-Algarve (South Portugal), both spring (the period when large abundances of larvae are found) northerly wind and wind magnitude were negatively related to LPUE, revealing that environmental effects match the regional peak in spawning time. Overall, the results suggest that management of small, short-lived pelagic species, such as sardine quotas/sustainable yields, should be adapted to a regional scale because of regional environmental variability.
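As a sketch of one element of the multi-model approach named here — an ARIMA forecast of a monthly LPUE series — the following could serve; the series is synthetic and the order (1, 0, 1) is an illustrative choice, not the one selected in the study:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
idx = pd.date_range("1989-01-01", periods=252, freq="MS")

# Synthetic autocorrelated LPUE series, standing in for real landings per
# unit effort from one of the sub-stock regions.
noise = rng.normal(0, 1, len(idx))
lpue = pd.Series(10 + np.convolve(noise, [1, 0.6, 0.3], mode="same"), index=idx)

# Fit an ARIMA model and forecast the next 12 months of catch rates.
fit = ARIMA(lpue, order=(1, 0, 1)).fit()
print(fit.forecast(steps=12))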

Relevance:

40.00%

Publisher:

Abstract:

The fisheries for mackerel scad, Decapterus macarellus, are particularly important in Cape Verde, constituting almost 40% of total catches at the peak of the fishery in 1997 and 1998 (about 3,700 tonnes). Catches have been stable at a much lower level of about 2,100 tonnes in recent years. Given the importance of mackerel scad in terms of catch weight and local food security, there is an urgent need for an updated assessment. A stock assessment was carried out using a Bayesian approach to biomass dynamic modelling. In order to tackle the problem of a non-informative CPUE series, the intrinsic rate of increase, r, was estimated separately, and the ratio B0/K, initial biomass relative to carrying capacity, was assumed based on available information. The results indicated that the current level of fishing is sustainable. The probability of collapse is low, particularly in the short term, and it is likely that biomass may increase further above Bmsy, indicating a healthy stock level. It would appear relatively safe to increase catches even up to 4,000 tonnes. However, the marginal posterior of r was almost identical to the prior, indicating relatively low information content in the CPUE series; this was also the case for B0/K. There have been substantial increases in fishing efficiency which have not been adequately captured by the measure used for effort (days or trips), implying that the results may be overly optimistic and should be considered preliminary.
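For context, the biomass dynamic (surplus production) model underlying this kind of assessment is typically the Schaefer form; a minimal simulation sketch, with illustrative parameter values rather than the study's posteriors, is:

import numpy as np

# Schaefer surplus production model:
#   B[t+1] = B[t] + r * B[t] * (1 - B[t] / K) - C[t]
r = 0.4          # intrinsic rate of increase (illustrative)
K = 20000.0      # carrying capacity in tonnes (illustrative)
B0_over_K = 0.8  # assumed initial biomass relative to carrying capacity
catches = np.full(20, 2100.0)  # constant recent-level catch, tonnes/year

B = B0_over_K * K
for C in catches:
    B = max(B + r * B * (1 - B / K) - C, 1e-6)

Bmsy = K / 2  # biomass at maximum sustainable yield for the Schaefer model
print(f"final biomass: {B:.0f} t, B/Bmsy = {B / Bmsy:.2f}")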

Relevance:

40.00%

Publisher:

Abstract:

The amount of data collected from an individual player during a football match has increased significantly in recent years, following technological evolution in positional tracking. However, given the short time that separates competitions, common analyses of these data focus on the magnitude of each player's actions, considering either technical or physical performance. This focus leaves a considerable amount of information out of performance optimization, particularly when considering a sequence of different matches of the same team. In this presentation, we introduce a tactical performance indicator that considers players' overall positioning and their level of coordination during the match. This performance indicator will be applied at different time scales, with a particular focus on possible practical applications.

Relevance:

40.00%

Publisher:

Abstract:

Underactuated cable-driven parallel robots (UACDPRs) shift a 6-degree-of-freedom end-effector (EE) with fewer than 6 cables. This thesis proposes a new automatic calibration technique applicable to UACDPRs. The purpose of this work is to develop a method that uses free motion as an exciting trajectory for the acquisition of calibration data. The key point of this approach is to find a relationship between the unknown parameters to be calibrated (the cable lengths) and the parameters that can be measured by sensors (the swivel-pulley angles measured by encoders and the roll and pitch angles measured by inclinometers on the platform). The equations involved are the geometrical-closure equations and the finite-difference velocity equations, solved using a least-squares algorithm. Simulations are performed on a parallel robot driven by 4 cables for validation. The ultimate purpose of the calibration method remains the determination of the platform's initial pose. As a consequence of underactuation, the EE is underconstrained and, for assigned cable lengths, the EE pose cannot be obtained by means of forward kinematics alone. Hence, a direct-kinematics algorithm for a 4-cable UACDPR using redundant sensor measurements is proposed. The proposed method measures two orientation parameters of the EE besides the cable lengths, in order to determine the other four pose variables, namely the 3 position coordinates and one additional orientation parameter. We then study the performance of the direct-kinematics algorithm by computing the sensitivity of the direct-kinematics solution to measurement errors. Furthermore, upper limits on position and orientation errors are computed for bounded cable-length errors resulting from the calibration procedure and for roll- and pitch-angle errors due to inclinometer inaccuracies.
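A minimal sketch of the estimation step described — fitting unknown cable lengths to sensed quantities by least squares over closure-type residuals — might look like this. The residual function below is a toy planar stand-in, not the robot's actual geometric-closure equations:

import numpy as np
from scipy.optimize import least_squares

# Toy stand-in: anchor points of 4 cables and a sensed platform position;
# the "unknowns" are the cable lengths consistent with the measurements.
anchors = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])
measured_pos = np.array([1.8, 1.2])  # hypothetical sensed pose parameters

def residuals(lengths, pos):
    # Closure-type residual: difference between the hypothesized cable
    # lengths and the distances implied by the measured pose.
    return lengths - np.linalg.norm(anchors - pos, axis=1)

guess = np.full(4, 2.0)  # rough initial estimate of the cable lengths
sol = least_squares(residuals, guess, args=(measured_pos,))
print("estimated cable lengths:", np.round(sol.x, 3))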

Relevance:

40.00%

Publisher:

Abstract:

This PhD was driven by an interest in inclusive and participatory approaches. The methodology that bridges science and society is known as 'citizen science' and is experiencing a huge upsurge worldwide, in both the sciences and the humanities. In this thesis, I focused on three topics: i) assessing the reliability of data collected by volunteers; ii) evaluating the impact of environmental education activities in tourist facilities; and iii) monitoring marine biodiversity through citizen science. In addition to these topics, during my research stay abroad I developed a questionnaire to investigate people's perceptions of natural areas, to promote the implementation of co-management. The results showed that volunteers are not only able to collect sufficiently reliable data, but that during their participation in this type of project they can also increase their knowledge of marine biology and ecology and their awareness of the impact of human behaviour on the environment. The short-term analysis showed that volunteers are able to retain what they have learned. In the long term, knowledge is usually forgotten, but awareness is retained. Increased awareness could lead to a change in behaviour, in this case a more environmentally friendly attitude. This aspect could be of interest for the development of environmental education projects in tourism facilities, reducing the impact of tourism on the environment while adding a valuable service to the tourism offer. We also found that nature experiences in childhood are important for connecting with nature in adulthood. The results also suggest that membership or volunteering in an environmental education association could be a predictor of people's interest in more participatory approaches to nature management. In most cases, the COVID-19 pandemic had not changed participants' perceptions of the natural environment.

Relevance:

40.00%

Publisher:

Abstract:

This dissertation proposes an analysis of the governance of European scientific research, focusing on the emergence of the Open Science paradigm: a new way of doing science, oriented towards the openness of every phase of the scientific research process and able to take full advantage of digital ICTs. The emergence of this paradigm is relatively recent, but in recent years it has become increasingly relevant. The European institutions have expressed a clear intention to embrace the Open Science paradigm (e.g., the European Open Science Cloud, EOSC, or the establishment of the Horizon Europe programme). This dissertation provides a conceptual framework for the multiple interventions of the European institutions in the field of Open Science, addressing the major legal challenges of its implementation. The study investigates the notion of Open Science, proposing a definition that takes into account all its dimensions related to the human and fundamental rights framework in which Open Science is grounded. The inquiry addresses the legal challenges related to the openness of research data, in light of the European Open Data framework and the impact of the GDPR on the context of Open Science. The last part of the study is devoted to the infrastructural dimension of the Open Science paradigm, exploring e-infrastructures. The focus is on a specific type of computational infrastructure: the High Performance Computing (HPC) facility. The adoption of HPC for research is analysed from the European perspective, investigating the EuroHPC project, and from the local perspective, proposing a case study of the HPC facility of the University of Luxembourg, the ULHPC. This dissertation underlines the relevance of a legal coordination approach among all actors and phases of the process in order to develop and implement the Open Science paradigm in adherence to the underlying human and fundamental rights.

Relevance:

40.00%

Publisher:

Abstract:

In this work an Underactuated Cable-Driven Parallel Robot (UACDPR) operating in three-dimensional Euclidean space is considered. The end-effector has 6 degrees of freedom and is actuated by 4 cables, so from a mechanical point of view the robot is underconstrained. However, since only three pose variables are controlled, the degree of redundancy from a control standpoint can be considered one. The aim of this thesis is to design a feedback controller for point-to-point motion that satisfies the transient requirements and is capable of reducing the oscillations that derive from the reduced number of constraints. Force control is chosen for positioning the end-effector, and the error with respect to the reference is computed from the measurements of several sensors (load cells, encoders, and inclinometers): cable lengths, cable tensions, and platform orientation. In order to express the relation between pose and cable tensions, the inverse model is derived from the kinematic and dynamic models of the parallel robot. The intrinsically non-linear nature of UACDPR systems introduces an additional level of complexity in the development of the controller; as a result, the control law combines partial feedback linearization with damping injection to reduce orientation instability. The fourth cable makes it possible to satisfy an additional tension-distribution constraint, ensuring positive cable tensions at every instant of motion. Simulations with different initial conditions are then presented in order to optimize the control parameters, and lastly an experimental validation of the model is carried out; the results are analysed and the limits of the presented approach are defined.
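To illustrate the damping-injection idea on a far simpler system than the robot itself, the sketch below adds a velocity-dependent term to a point-to-point control law for a single oscillatory degree of freedom; all dynamics and gains are illustrative, not the thesis's model:

import numpy as np

# Single underdamped degree of freedom standing in for a platform
# oscillation: x'' = -w2 * x + u, integrated with explicit Euler.
w2, dt, T = 4.0, 0.001, 8.0
x, v = 0.0, 0.0
x_ref = 1.0                 # point-to-point target
kp, kd_inject = 6.0, 5.0    # proportional gain and injected damping

for _ in range(int(T / dt)):
    # Cancel the restoring term (feedback linearization of this toy model),
    # then add proportional action plus damping injection: the
    # -kd_inject * v term suppresses the residual oscillation.
    u = w2 * x + kp * (x_ref - x) - kd_inject * v
    a = -w2 * x + u
    v += a * dt
    x += v * dt

print(f"final position: {x:.4f} (target {x_ref})")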

Relevance:

30.00%

Publisher:

Abstract:

Xanthomonas citri subsp. citri (X. citri) is the causative agent of citrus canker, a disease that affects several citrus plants in Brazil and across the world. Although many studies have demonstrated the importance of particular genes for infection and pathogenesis in this bacterium, there were no data on its phosphate uptake and assimilation pathways. To identify the proteins involved in the phosphate response, we performed a proteomic analysis of X. citri extracts after growth in three culture media with different phosphate concentrations. Using mass spectrometry and bioinformatics analysis, we showed that X. citri conserves orthologues of the Escherichia coli Pho regulon genes, including the two-component system PhoR/PhoB, the ATP-binding cassette (ABC) transporter Pst for phosphate uptake, and the alkaline phosphatase PhoA. Analyses performed under phosphate starvation provided evidence of the relevance of the Pst system for phosphate uptake, as well as of both periplasmic binding proteins, PhoX and PstS, which were produced in high abundance. The results from this study are the first evidence of Pho regulon activation in X. citri and bring new insights for studies related to bacterial metabolism and physiology.

Biological significance: Using proteomics and bioinformatics analyses, we showed for the first time that the phytopathogenic bacterium X. citri conserves a set of proteins belonging to the Pho regulon, which are induced during phosphate starvation. The most relevant, in terms of conservation and up-regulation, were the periplasmic binding proteins PstS and PhoX from the phosphate ABC transporter PstSBAC, the two-component system composed of PhoR/PhoB, and the alkaline phosphatase PhoA.