893 results for "data driven approach"
Abstract:
This article focuses on the deviations from normality of stock returns before and after a financial liberalisation reform, and shows the extent to which inference based on statistical measures of stock market efficiency can be affected by not controlling for breaks. Drawing on recent advances in the econometrics of structural change, it compares the distribution of the returns of five East Asian emerging markets when breaks in the mean and variance are either (i) imposed using certain official liberalisation dates or (ii) detected non-parametrically using a data-driven procedure. The results suggest that measuring deviations from normality of stock returns without allowing for potential breaks introduces substantial bias, which is likely to severely affect any inference based on the corresponding descriptive or test statistics.
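To illustrate the mechanism described above, the following minimal Python sketch simulates returns that are normal within each regime but whose standard deviation doubles at a hypothetical break point. It is a toy under stated assumptions (a variance break only, an invented break, sample size and seed), not the article's non-parametric break-detection procedure. Pooling the two regimes, i.e. ignoring the break, produces positive excess kurtosis that looks like a deviation from normality, while each regime on its own is close to normal.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Sample excess kurtosis m4 / m2**2 - 3 (approximately 0 for normal data)."""
    mean = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mean) ** 2 for x in xs])
    m4 = statistics.fmean([(x - mean) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0


# Hypothetical returns: normal in each regime, but the standard deviation
# doubles after an assumed break point (e.g. a liberalisation date).
rng = random.Random(42)
before = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
after = [rng.gauss(0.0, 2.0) for _ in range(20_000)]
pooled = before + after  # what an analysis that ignores the break would see

print(f"pooled (break ignored): {excess_kurtosis(pooled):.2f}")
print(f"before break:           {excess_kurtosis(before):.2f}")
print(f"after break:            {excess_kurtosis(after):.2f}")
```

For a 50/50 mixture of N(0, 1) and N(0, 4), the theoretical excess kurtosis is 3 × 8.5 / 2.5² − 3 = 1.08, so the pooled estimate should land near that value even though neither regime departs from normality.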
Abstract:
The deployment of bioenergy technologies is a key part of UK and European renewable energy policy. A key barrier to this deployment is the management of biomass supply chains, including the evaluation of suppliers and the contracting of biomass. In the undeveloped biomass-for-energy market, buyers of biomass face three major challenges when developing new bioenergy projects: what characteristics a given supply of biomass will have, how to evaluate biomass suppliers, and which suppliers to contract with so as to obtain a portfolio that best satisfies the needs of the project and its stakeholder group while also satisfying crisp and non-crisp technological constraints. The problem description is taken from the situation faced by the industrial partner in this research, Express Energy Ltd. This research tackles these three areas separately and then combines them into a decision framework, BioSS, to assist biomass buyers with the strategic sourcing of biomass. The BioSS framework has three modes that mirror the development stages of bioenergy projects: BioSS.2 for early-stage development, BioSS.3 for the financial close stage, and BioSS.Op for the operational phase of the project. BioSS consists of a fuels library, a supplier evaluation module and an order allocation module; a Monte-Carlo analysis module is also included to evaluate the accuracy of the recommended portfolios. In each mode, BioSS can recommend which suppliers should be contracted and how much material should be purchased from each. The recommended blend should have chemical characteristics within the technological constraints of the conversion technology and should also best satisfy the stakeholder group. The fuels library is compiled from a wide variety of sources and contains around 100 unique descriptions of potential biomass sources that a developer may encounter.
The library takes a broad data collection approach, with the aim of allowing biomass characteristics to be estimated without expensive and time-consuming testing. The supplier evaluation part of BioSS uses a QFD-AHP method to assign importance weightings to 27 different evaluation criteria. The criteria were compiled from interviews with stakeholders and from policy and position documents, and the weightings were assigned through a mixture of workshops and expert interviews. The weighted importance scores allow potential suppliers to better tailor their business offering and provide a robust framework for decision makers to better understand the requirements of bioenergy project stakeholder groups. The order allocation part of BioSS uses a chance-constrained programming approach to allocate orders of material among potential suppliers based on their chemical characteristics and preference scores. The optimisation program finds the portfolio of orders that performs best in the eyes of the stakeholder group while complying with technological constraints. If the decision maker requires it, a technological constraint can be set as a chance constraint, so that it may be breached with at most a specified probability. This allows a wider range of biomass sources to be procured and a greater overall performance to be realised than with crisp constraints or deterministic programming approaches. BioSS is demonstrated against two scenarios faced by UK bioenergy developers: a large-scale combustion power project and a small-scale gasification project. BioSS is applied in each mode for both scenarios and is shown to adapt the solution to the stakeholder group's importance weightings and to the different constraints of the different conversion technologies, while finding a globally optimal portfolio for stakeholder satisfaction.
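The order allocation idea can be sketched in Python under strong simplifying assumptions. This is not the BioSS model: the three suppliers, their preference scores (which BioSS would derive via QFD-AHP), their ash-content statistics, the 4.5% ash limit and the 5% grid search are all hypothetical. Assuming each supplier's ash content is an independent normal variable, the chance constraint P(blend ash ≤ limit) ≥ α has the deterministic equivalent mean + z_α · sd ≤ limit, where z_α is the standard normal quantile.

```python
from itertools import product
from statistics import NormalDist

# Hypothetical suppliers: (preference score, mean ash %, std. dev. of ash %).
# Illustrative numbers only; they do not come from BioSS.
SUPPLIERS = {
    "A": (0.9, 6.0, 1.5),
    "B": (0.6, 4.0, 1.0),
    "C": (0.3, 2.0, 0.5),
}
ASH_LIMIT = 4.5  # assumed technological limit on blend ash content (%)
STEPS = 20       # search blend fractions in increments of 1/STEPS (5%)


def blend_ok(fractions, alpha):
    """Deterministic equivalent of P(blend ash <= limit) >= alpha, assuming
    independent, normally distributed ash content for each supplier."""
    mean = sum(f * SUPPLIERS[s][1] for s, f in fractions.items())
    var = sum((f * SUPPLIERS[s][2]) ** 2 for s, f in fractions.items())
    z = NormalDist().inv_cdf(alpha)
    return mean + z * var ** 0.5 <= ASH_LIMIT


def best_portfolio(alpha):
    """Grid search for the feasible blend with the highest weighted preference."""
    names = list(SUPPLIERS)
    best = None
    for i, j in product(range(STEPS + 1), repeat=2):
        k = STEPS - i - j
        if k < 0:
            continue  # fractions must sum to 1
        fractions = dict(zip(names, (i / STEPS, j / STEPS, k / STEPS)))
        if not blend_ok(fractions, alpha):
            continue
        score = sum(f * SUPPLIERS[s][0] for s, f in fractions.items())
        if best is None or score > best[0]:
            best = (score, fractions)
    return best


for alpha in (0.99, 0.95, 0.80):
    score, fractions = best_portfolio(alpha)
    print(alpha, round(score, 3), fractions)
```

Lowering α relaxes the constraint, so the best achievable preference score can only stay the same or rise. This mirrors the point above that chance constraints admit a wider range of biomass sources than crisp ones. A real implementation would use a proper optimisation solver rather than a grid search.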
Abstract:
This paper focuses on the concept of innovation, which is increasingly recognized as significant for companies across all business sectors. The paper first reviews the innovation literature in terms of the types, classifications, and sources of innovation that have been proposed over time. It then examines innovation in the context of the food industry and attempts to identify the innovation strategies followed by Greek food companies, based on a value-driven approach to innovation. Finally, the paper provides insights from eight Greek food companies selected from four subsectors: fruit and vegetables, dairy products, meat products (cured meats), and bakery products. The selection criterion was market success and outstanding performance (e.g. market share, achieved results). The evidence indicates that companies tend to innovate along the dimension of offerings, which is more closely related to the traditional view of product and process innovation.
Abstract:
Different types of ontologies, and the knowledge or metaknowledge connected to them, are considered and analyzed with a view to their realization in contemporary information security systems (ISS), especially intrusion detection systems (IDS) and intrusion prevention systems (IPS). The human-centered methods INCONSISTENCY, FUNNEL, CALEIDOSCOPE and CROSSWORD are ontology-based algorithmic or data-driven methods. They interact competitively, on the principle of ‘survival of the fittest’, and are controlled by a Synthetic MetaMethod (SMM). It is shown that data analysis frequently requires an act of creation, especially when applied to knowledge-poor environments, and that human-centered methods, which are often based on dynamic ontologies, are well suited to such cases.
Abstract:
As torrents of new data emerge from microbial genomics, the bioinformatic prediction of immunogenic epitopes remains challenging but vital. In silico methods often produce paradoxically inconsistent results: good prediction rates on some test sets but not on others. The inherent complexity of immune presentation and recognition processes complicates epitope prediction. Two encouraging developments offer hope for the future: data-driven, sequence-based artificial intelligence methods for epitope prediction, and molecular modeling methods based on three-dimensional protein structures.
Abstract:
A model of the cognitive process of natural language processing has been developed using the formalism of generalized nets. According to this stage-simulating model, information processing inevitably includes phases that require joint operations in two knowledge spaces: language and semantics. To examine and formalize the relations between the language and semantic levels of processing, language is presented as an information system built on human cognitive resources, semantic primitives, semantic operators, and language rules and data. This approach is applied to model a specific grammatical rule, secondary predication in Russian. Grammatical rules of the language space are expressed as operators in the semantic space. Examples from the linguistics domain are analyzed, and several conclusions are drawn about the semantics of the modeled rule. The results of applying the information system approach to language turn out to be consistent with the processing stages modeled with the generalized net.
Abstract:
In non-linear random effects models, some attention has recently been devoted to the analysis of suitable transformations of the response variables, either separately from (Taylor 1996) or jointly with (Oberg and Davidian 2000) the transformations of the covariates; as far as we know, however, no investigation has been carried out on the choice of link function in such models. In our study we consider the use of a random effects model in which a parameterized family of links (Aranda-Ordaz 1981, Prentice 1996, Pregibon 1980, Stukel 1988 and Czado 1997) is introduced. We point out the advantages and drawbacks of this data-driven kind of modeling. Difficulties in interpreting the regression parameters, and therefore in understanding the influence of covariates, are discussed, as are problems related to loss of efficiency of the estimates and overfitting. A case study on radiotherapy usage in breast cancer treatment is presented.
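As an illustration of one of the parameterized link families cited above, the asymmetric family of Aranda-Ordaz (1981) can be written through its inverse link π = 1 − (1 + λe^η)^(−1/λ), with λ > 0. Setting λ = 1 recovers the logistic (logit) link, and the limit λ → 0 gives the complementary log-log link. The minimal Python sketch below only evaluates this inverse link; estimating λ from the data is what makes this kind of modeling data-driven, and fitting it inside a random effects model is beyond the scope of the sketch.

```python
import math


def aranda_ordaz_inverse(eta, lam):
    """Inverse link of the Aranda-Ordaz (1981) asymmetric family.

    pi = 1 - (1 + lam * exp(eta)) ** (-1 / lam), for lam > 0.
    lam = 1 gives the logistic (logit) link; lam = 0 is treated as the
    limiting case, the complementary log-log link: pi = 1 - exp(-exp(eta)).
    """
    if lam < 0:
        raise ValueError("this sketch only covers lam >= 0")
    if lam == 0:
        return 1.0 - math.exp(-math.exp(eta))
    # log1p keeps the computation accurate when lam * exp(eta) is small
    return 1.0 - math.exp(-math.log1p(lam * math.exp(eta)) / lam)


# Same linear predictor, different members of the link family.
for lam in (0.0, 0.5, 1.0, 2.0):
    print(lam, [round(aranda_ordaz_inverse(eta, lam), 3) for eta in (-2.0, 0.0, 2.0)])
```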
Abstract:
Perhaps the most sought-after buzzword in today's IT world is cloud computing, or, in Hungarian translation, "számítási felhő" (computing cloud). The translation comes from the EU (Hungarian version of the Digital Agenda, 2010). A detailed description of the cloud computing business model is given in (Bőgel, 2009), where György Bőgel describes the emergence and economic benefits of this new, utility-like IT service and predicts a great future for cloud computing in the competition among business models. In addition to the business benefits of cloud computing, the author of this paper places greater emphasis on the factors hindering its rapid adoption, and on what its advantages and disadvantages mean for a business, IT, or compliance executive. Without diminishing the economic significance of the cloud model, he considers it important to address its problems and risks as well. He emphasizes that there are substantial differences in risks, especially security and data protection risks, between the European Economic Area and the rest of the world, e.g. the United States. The article points out these differences and explains why cloud computing is expected to spread more slowly in Europe than in other parts of the world. It also presents the EU's efforts to promote the spread of cloud computing in Europe, given the model's positive effect on competitiveness. / === / One of the most popular concepts in recent web searches is cloud computing. Several authors present detailed descriptions of the new service model and its business benefits, and cite the optimistic forecasts of cloud experts regarding the competition among information system service models. The author analyses the operational benefits of cloud applications and gives a detailed description of the inhibitors of the rapid expansion of the service model. He also analyses the pros and cons of the cloud for a business manager, an information officer and a compliance officer.
When considering the advantages of the cloud, it is equally important to review the problems and risks associated with the model. The paper lists the expected cloud-specific risks. It also explains the differences in security and data protection approaches between the European Economic Area and the rest of the world, including the USA, and why slower expansion of the cloud model is expected in Europe than elsewhere. The efforts of the EU to help spread the cloud model are also presented, as EU officials consider the model an important element of competitiveness.
Abstract:
The need for elemental analysis techniques to solve forensic problems continues to grow as samples collected from crime scenes become more complex. Laser ablation ICP-MS (LA-ICP-MS) has been shown to provide a high degree of discrimination between samples that originate from different sources. In the first part of this research, two LA-ICP-MS systems, one using a nanosecond laser and the other a femtosecond laser source, were compared for the forensic analysis of glass. The results showed that femtosecond LA-ICP-MS did not provide significant improvements in accuracy, precision or discrimination, although it did provide lower detection limits. In addition, it was determined that even with femtosecond LA-ICP-MS an internal standard should be used to obtain accurate analytical results for glass. In the second part, a laser-induced breakdown spectroscopy (LIBS) method for the forensic analysis of glass was shown to provide excellent discrimination for a set of 41 automotive glass fragments. Its discrimination power was compared with that of two of the leading elemental analysis techniques, μXRF and LA-ICP-MS, and the results were similar: all methods achieved >99% discrimination, and the pairs found indistinguishable were similar. An extensive data analysis approach for LIBS glass analyses was developed to minimize Type I and Type II errors, leading to a recommendation of 10 ratios to be used for glass comparisons. Finally, an LA-ICP-MS method for the qualitative analysis and discrimination of gel ink sources was developed and tested on a set of ink samples. In the first discrimination study, qualitative analysis achieved 95.6% discrimination in a blind study of 45 black gel ink samples provided by the United States Secret Service, with a false exclusion (Type I) error rate of 0.4% and a false inclusion (Type II) error rate of 3.9%.
In the second discrimination study, 99% discrimination power was achieved for a set of 24 self-collected black gel ink pens. The two pairs found to be indistinguishable came from the same source (the same manufacturer and type of pen, purchased in different locations). It was also found that gel ink from the same pen was indistinguishable regardless of its age, as were four gel ink pens originating from the same pack.
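The discrimination figures above follow from simple pairwise arithmetic: with n samples there are n(n − 1)/2 distinct pairs, and discrimination power is the fraction of pairs that are told apart. The Python sketch below reproduces the 24-pen case (276 pairs, 2 indistinguishable, about 99%). The pairwise `match` rule in the toy example is hypothetical; the study's actual comparison criteria are not given in the abstract.

```python
from itertools import combinations


def discrimination_power(n_samples, n_indistinguishable_pairs):
    """Fraction of all pairwise comparisons in which the two samples were told apart."""
    total_pairs = n_samples * (n_samples - 1) // 2  # n choose 2
    return (total_pairs - n_indistinguishable_pairs) / total_pairs


def count_indistinguishable(samples, match):
    """Count pairs that a (hypothetical) comparison rule `match` cannot separate."""
    return sum(1 for a, b in combinations(samples, 2) if match(a, b))


# Figures from the second gel ink study: 24 pens, 2 indistinguishable pairs.
print(f"{discrimination_power(24, 2):.2%}")  # → 99.28%

# Toy example: single-value "concentrations" and a hypothetical tolerance rule.
toy = [1.00, 1.02, 1.50, 2.00]
pairs = count_indistinguishable(toy, lambda a, b: abs(a - b) < 0.05)
print(pairs, round(discrimination_power(len(toy), pairs), 3))
```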
Abstract:
Modern IT infrastructures consist of large-scale computing systems administered by IT service providers. Manually maintaining such large computing systems is costly and inefficient, so service providers often seek automatic or semi-automatic methods for detecting and resolving system issues in order to improve service quality and efficiency. This dissertation investigates several data-driven approaches for assisting service providers in achieving this goal. The problems studied fall into three aspects of the service workflow: 1) preprocessing raw textual system logs into structured events; 2) refining monitoring configurations to eliminate false positives and false negatives; and 3) improving the efficiency of system diagnosis on detected alerts. Solving these problems usually requires a large amount of domain knowledge about the particular computing systems. The approaches investigated in this dissertation are based on event mining algorithms, which can automatically derive part of that knowledge from historical system logs, events and tickets.
In particular, two textual clustering algorithms are developed for converting raw textual logs into system events. To refine the monitoring configuration, a rule-based alert prediction algorithm is proposed for eliminating false alerts (false positives) without losing any real alerts, and a textual classification method is applied to identify missing alerts (false negatives) from manual incident tickets. For system diagnosis, this dissertation presents an efficient algorithm for discovering temporal dependencies between system events, with their corresponding time lags, which can help administrators identify redundant monitoring situations and dependencies among system components.
To improve the efficiency of incident ticket resolution, several KNN-based algorithms that recommend relevant historical tickets, with their resolutions, for incoming tickets are investigated. Finally, this dissertation offers a novel algorithm for searching for similar textual event segments in large system logs, which helps administrators locate similar system behaviors in the logs. Extensive empirical evaluation on system logs, events and tickets from real IT infrastructures demonstrates the effectiveness and efficiency of the proposed approaches.
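A minimal sketch of KNN-style ticket recommendation, assuming a plain bag-of-words representation and cosine similarity: for an incoming ticket, return the k most similar historical tickets together with their resolutions. The ticket texts are hypothetical, and the dissertation's own KNN-based algorithms may represent and weight tickets differently.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens; a deliberately simple stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(history, incoming, k=2):
    """Return the k historical (description, resolution) pairs most similar to `incoming`."""
    query = Counter(tokens(incoming))
    scored = [(cosine(query, Counter(tokens(desc))), desc, res) for desc, res in history]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(desc, res) for _, desc, res in scored[:k]]


# Hypothetical ticket history: (description, resolution).
HISTORY = [
    ("disk usage above 90 percent on /var", "rotated and compressed old logs"),
    ("database connection pool exhausted", "restarted app server, raised pool size"),
    ("cpu utilization high on web node", "killed runaway batch process"),
]

print(recommend(HISTORY, "disk usage 95 percent on /var partition", k=1))
```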
Abstract:
Research on spoken dialogue systems in the 1990s and 2000s led to the deployment of commercial spoken dialogue systems (SDS) in microdomains such as customer service automation, reservation/booking and question answering. Recent SDS research has focused on applications in other domains (e.g. virtual counseling, personal coaches, social companions), which require more sophistication than the previous generation of commercial SDS. The focus of this research project is the delivery of behavior change interventions, based on the brief intervention counseling style, via spoken dialogue systems.
Brief interventions (BI) are evidence-based, short, well-structured, one-on-one counseling sessions. Delivering BIs to people in need involves many challenges, such as finding the time to administer them in busy doctors' offices, obtaining the extra training that helps staff become comfortable providing these interventions, and managing the cost of delivering them. Fortunately, recent developments in spoken dialogue systems make it possible to build systems that can deliver brief interventions.
The overall objective of this research is to develop a data-driven, adaptable dialogue system, based on reinforcement learning methods, for brief interventions targeting problematic drinking behavior. The implications of this research project include, but are not limited to, assessing the feasibility of delivering structured brief health interventions with a data-driven spoken dialogue system. Furthermore, while the experimental system focuses on harmful alcohol drinking as the target behavior, the knowledge and experience produced may also lead to the implementation, using statistical machine learning approaches, of similarly structured health interventions and assessments outside the alcohol domain (e.g. obesity, drug use, lack of exercise).
Beyond the design of the dialogue system itself, the semantic and emotional meaning of user utterances strongly affects the interaction. To perform domain-specific reasoning and recognize concepts in user utterances, a named-entity recognizer and an ontology are designed and evaluated. To understand affective information conveyed through text, lexicons and a sentiment analysis module are developed and tested.
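The reinforcement learning core of such a system can be illustrated with a one-step tabular Q-learning update. This Python sketch is an assumption, not the dissertation's design: the dialogue states, the three system actions, the rewards and the learning parameters (α = 0.5, γ = 0.9) are all hypothetical. After one rewarded and one penalized exchange, the greedy policy prefers giving feedback in the `assessment_done` state.

```python
from collections import defaultdict

# Hypothetical system actions for a brief-intervention dialogue.
ACTIONS = ("ask_question", "give_feedback", "close_session")


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9, terminal=False):
    """One Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    future = 0.0 if terminal else max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state):
    """Pick the action with the highest estimated value in `state`."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
# Hypothetical logged exchanges: feedback after the assessment was rated well,
# while ending the session at that point was rated poorly.
q_update(q, "assessment_done", "give_feedback", reward=1.0, next_state="feedback_given")
q_update(q, "assessment_done", "close_session", reward=-0.5, next_state="end", terminal=True)
print(greedy_action(q, "assessment_done"))
```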
Abstract:
Many systems and applications continuously produce events. These events record the status of the systems and trace their behavior. By examining these events, system administrators can identify potential problems in these systems. If the temporal dynamics of the systems are investigated further, underlying patterns can be discovered, and the uncovered knowledge can be leveraged to predict future system behavior or to mitigate potential risks. Moreover, system administrators can use the temporal patterns to set up event management rules that make the systems more intelligent. With the popularity of data mining techniques in recent years, these events have become increasingly useful. Despite recent advances in data mining techniques, however, their application to system event mining is still at a rudimentary stage. Most work still focuses on episode mining or frequent pattern discovery. These methods cannot provide a brief yet comprehensible summary that reveals valuable information from a high-level perspective, and they provide little actionable knowledge to help system administrators better manage the systems. To make better use of the recorded events, more practical techniques are required. From the perspective of data mining, three correlated directions are considered helpful for system management: (1) providing concise yet comprehensive summaries of the running status of the systems; (2) making the systems more intelligent and autonomous; and (3) effectively detecting abnormal behaviors of the systems. Owing to the richness of event logs, all of these directions can be addressed in a data-driven manner. In this way, the robustness of the systems can be enhanced and the goal of autonomous management can be approached.
This dissertation mainly focuses on the foregoing directions, leveraging temporal mining techniques to facilitate system management. More specifically, three concrete topics are discussed: event summarization, resource demand prediction, and streaming anomaly detection. Besides the theoretical contributions, an experimental evaluation is presented to demonstrate the effectiveness and efficacy of the corresponding solutions.
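For the streaming anomaly detection topic, a common baseline (not necessarily the dissertation's method) flags an event count whose z-score against the running mean exceeds a threshold, updating the mean and variance online with Welford's algorithm so that no history needs to be stored. The event counts, the threshold of 3 and the 10-observation warm-up below are hypothetical choices.

```python
import math


class StreamingZScore:
    """Flag values far from the running mean; the mean and variance are
    updated online with Welford's algorithm, so no history is stored."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold
        self.warmup = max(2, warmup)  # need >= 2 points for a sample variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Return True if x looks anomalous, then fold x into the statistics."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = std > 0 and abs(x - self.mean) / std > self.threshold
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


# Hypothetical per-minute error-event counts with one burst at the end.
counts = [4, 5, 3, 4, 6, 5, 4, 5, 3, 4, 5, 4, 40]
detector = StreamingZScore()
flags = [detector.update(c) for c in counts]
print([c for c, f in zip(counts, flags) if f])  # → [40]
```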
Abstract:
The Highway Safety Manual (HSM) estimates roadway safety performance based on predictive models that were calibrated using national data. Calibration factors are then used to adjust these predictive models to local conditions for local applications. The HSM recommends that local calibration factors be estimated using 30 to 50 randomly selected sites that experienced a total of at least 100 crashes per year. It also recommends that the factors be updated every two to three years, preferably annually. However, these recommendations are based primarily on expert opinion rather than on data-driven research findings. Furthermore, most agencies do not have data for many of the input variables recommended in the HSM. This dissertation aims to determine the best way to meet three major data needs affecting the estimation of calibration factors: (1) the required minimum sample sizes for different roadway facilities, (2) the required frequency of calibration factor updates, and (3) the influential variables affecting calibration factors. Statewide segment and intersection data were first collected for most of the HSM-recommended calibration variables using a Google Maps application. In addition, eight years (2005-2012) of traffic and crash data were retrieved from existing Florida Department of Transportation databases. With these data, the effect of the sample size criterion on calibration factor estimates was first studied using a sensitivity analysis. The results showed that the minimum sample sizes not only vary across roadway facility types but are also significantly higher than those recommended in the HSM. In addition, results from paired-sample t-tests showed that calibration factors in Florida need to be updated annually.
To identify influential variables affecting the calibration factors for roadway segments, the variables were prioritized by combining the results from three different methods: negative binomial regression, random forests, and boosted regression trees. Only a few variables were found to explain most of the variation in the crash data. Traffic volume was consistently found to be the most influential. In addition, roadside object density, major and minor commercial driveway densities, and minor residential driveway density were also identified as influential variables.
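The HSM calibration procedure computes a local calibration factor as the total observed crashes divided by the total crashes predicted by the uncalibrated model over the calibration sites. The Python sketch below shows that ratio, plus a paired-sample t statistic of the kind that could be used to compare paired quantities across consecutive years. The site data are hypothetical, and the exact pairing used in the dissertation's t-tests is not stated in the abstract.

```python
import math
import statistics


def calibration_factor(observed, predicted):
    """HSM-style local calibration factor: total observed crashes divided by
    total predicted crashes over the same calibration sites and period."""
    return sum(observed) / sum(predicted)


def paired_t(before, after):
    """Paired-sample t statistic for the differences after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical site-level data for one facility type and one year.
observed = [3, 0, 5, 2, 1]
predicted = [2.1, 0.8, 3.9, 1.7, 1.5]
print(round(calibration_factor(observed, predicted), 3))
```

A factor above 1 means the national model under-predicts crashes locally; the t statistic would still need to be compared with a t distribution with n − 1 degrees of freedom to obtain a p-value.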
Abstract:
Wireless Sensor and Actuator Networks (WSAN) are a key component of Ubiquitous Computing Systems and have many applications in different knowledge domains. Programming for such networks is very hard and requires developers to know the specifics of the available sensor platforms, which steepens the learning curve for developing WSAN applications. This work proposes ArchWiSeN, an MDA (Model-Driven Architecture) approach for WSAN application development. The goal of this approach is to facilitate development by providing: (i) a WSAN domain-specific language; (ii) a methodology for WSAN application development; and (iii) an MDA infrastructure composed of several software artifacts (PIM, PSMs and transformations). ArchWiSeN allows domain experts to contribute directly to WSAN application development without needing specialized knowledge of WSAN platforms and, at the same time, allows network experts to manage the application requirements without needing specific knowledge of the application domain. Furthermore, the approach aims to enable developers to express and validate functional and non-functional requirements of the application, incorporate services offered by WSAN middleware platforms, and promote reuse of the developed software artifacts. In this sense, this Thesis proposes an approach that covers all WSAN development stages for current and emerging scenarios through the proposed MDA infrastructure.
The proposal was evaluated through: (i) a proof of concept covering three different scenarios, in which the MDA infrastructure was used to describe the WSAN development process following the application engineering process; (ii) a controlled experiment assessing the use of the proposed approach compared with the traditional method of WSAN application development; (iii) an analysis of ArchWiSeN's support for middleware services, to ensure that WSAN applications using such services can meet their requirements; and (iv) a systematic analysis of ArchWiSeN in terms of the desired characteristics of MDA tools, compared with other existing MDA tools for WSAN.
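The PIM-to-PSM idea at the heart of MDA can be sketched in a few lines of Python. This is a hypothetical toy, not ArchWiSeN's metamodels or transformation languages: a platform-independent description of a sense-and-actuate task is refined into platform-specific settings for two invented platforms, `platform_a` and `platform_b`. The domain expert edits only the PIM, while platform knowledge lives in the transformation.

```python
from dataclasses import dataclass


@dataclass
class SensingTaskPIM:
    """Platform-independent model of a simple sense-and-actuate task."""
    sensor: str       # e.g. "temperature"
    period_s: int     # sampling period in seconds
    threshold: float  # actuate when a reading exceeds this value
    actuator: str     # e.g. "fan"


# Hypothetical platform descriptions; real MDA tools use metamodels and
# transformation languages rather than Python dictionaries.
PLATFORMS = {
    "platform_a": {"timer_unit": "ms", "sensor_prefix": "SensorA_"},
    "platform_b": {"timer_unit": "s", "sensor_prefix": "sb."},
}


def to_psm(pim, platform):
    """Model-to-model transformation: refine the PIM into a
    platform-specific model (here just a dictionary of settings)."""
    spec = PLATFORMS[platform]
    period = pim.period_s * 1000 if spec["timer_unit"] == "ms" else pim.period_s
    return {
        "platform": platform,
        "timer_period": period,
        "timer_unit": spec["timer_unit"],
        "sensor_driver": spec["sensor_prefix"] + pim.sensor,
        "rule": f"if reading > {pim.threshold}: activate({pim.actuator!r})",
    }


task = SensingTaskPIM(sensor="temperature", period_s=30, threshold=28.5, actuator="fan")
for name in PLATFORMS:
    print(to_psm(task, name))
```

A further PSM-to-code transformation would turn each settings dictionary into platform source code; that step is omitted here.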