963 results for count data models
Abstract:
This thesis describes a semi-automated cell analysis system based on image processing. To this end, an image processing algorithm was studied for segmenting cells semi-automatically. The main goal is to speed up the cell image segmentation process without significantly affecting the results. Although a fully manual system can produce the best results, it is slow and repetitive when a large number of images must be processed. An active contour algorithm was tested on a sequence of images taken with a microscope. This algorithm, more commonly known as snakes, lets the user define an initial region that contains the cell. The algorithm then runs for several iterations, making the contour of the initial region converge to the cell boundaries. From the final contour, region properties were extracted and statistical data were produced. These data indicate that the algorithm produces results similar to those of a purely manual system, but faster. It is slower than a purely automatic approach, but it lets the user adjust the contour, which makes it more versatile and more tolerant of image variations.
Abstract:
Theoretical epidemiology aims to understand the dynamics of diseases in populations and communities. Biological and behavioral processes are abstracted into mathematical formulations which aim to reproduce epidemiological observations. In this thesis a new system for the self-reporting of syndromic data — Influenzanet — is introduced and assessed. The system is currently being extended to address greater challenges of monitoring the health and well-being of tropical communities.(...)
Abstract:
Demand for interoperable systems for exchanging data in collaborative business environments has increased significantly, and cooperation agreements between the enterprises involved have followed. However, even within the same community or domain, there is a wide variety of knowledge representations that do not coincide semantically, which leads to interoperability problems in enterprise information systems that need to be addressed. In addition, most organizations face other problems with their information systems: 1) domain knowledge is not easily accessible to all stakeholders (even within the enterprise); 2) domain knowledge is not represented in a standard format; and 3) even when it is available in a standard format, it is not supported by semantic annotations or described using a common, understandable lexicon. This dissertation proposes an approach for establishing an enterprise reference lexicon from business models. It addresses the automation of information model mapping for constructing the reference lexicon. It combines a formal and conceptual representation of the business domain with a clear definition of the lexicon used, to facilitate an overall understanding by all stakeholders involved, including non-IT personnel.
Abstract:
Computational power is increasing day by day. Despite that, some tasks are still difficult or even impossible for a computer to perform. For example, identifying a facial expression is easy for a human, but for a computer it is still an area under development. To tackle this and similar issues, crowdsourcing has grown as a way to use human computation on a large scale. Crowdsourcing is a novel approach to collecting labels quickly and cheaply by sourcing them from crowds. However, these labels lack reliability, since annotators are not guaranteed to have any expertise in the field. This has led to a new research area in which annotation models must be created or adapted to handle such weakly labeled data. Current techniques treat the annotators’ expertise and the task difficulty as variables that influence the correctness of labels. Other specific aspects are also considered by noisy-label analysis techniques. The main contribution of this thesis is a process for collecting reliable crowdsourced labels for a facial expression dataset. The process consists of two steps: first, we design our crowdsourcing tasks to collect annotators’ labels; next, we infer the true label from the collected labels by applying state-of-the-art crowdsourcing algorithms. At the same time, a facial expression dataset is created, containing 40,000 images and their respective labels. Finally, we publish the resulting dataset.
Abstract:
Assessing the wind energy resource for the development of deep offshore wind plants requires every possible source of data and, in many cases, includes data gathered at meteorological stations installed on islands, islets or even oil platforms, all of which are structures that interfere with, and change, the flow characteristics. This work aims to contribute to the evaluation of such changes in the flow by developing a correction methodology and applying it to the case of Berlenga island, Portugal. The study is performed using computational fluid dynamics (CFD) simulations validated by wind tunnel tests. To simulate the incoming offshore flow with CFD models, a wind profile, unknown a priori, was established using observations from two coastal wind stations, and a power-law wind profile was fitted to the existing data (a=0.165). The results show that the resulting horizontal wind speed at 80 m above sea level is 16% lower than the wind speed at 80 m above the island for the dominant wind direction sector.
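As an illustration of the power-law wind profile mentioned in this abstract, the sketch below extrapolates a wind speed from one height to another. It assumes that the reported value 0.165 is the power-law exponent; the function name, the reference speed of 8 m/s and the reference height of 10 m are hypothetical and do not come from the study.

```python
def power_law_wind_speed(u_ref, z_ref, z, exponent=0.165):
    """Extrapolate wind speed from height z_ref to height z.

    Uses the power-law profile u(z) = u_ref * (z / z_ref) ** exponent.
    The default exponent is the value quoted in the abstract (assumed
    here to be the power-law exponent).
    """
    if z_ref <= 0 or z <= 0:
        raise ValueError("heights must be positive")
    return u_ref * (z / z_ref) ** exponent


# Hypothetical reference observation: 8 m/s measured at 10 m.
u_80 = power_law_wind_speed(u_ref=8.0, z_ref=10.0, z=80.0)
print(f"Estimated wind speed at 80 m: {u_80:.2f} m/s")
```

With a positive exponent the estimated speed grows with height, so the value at 80 m is higher than the 10 m reference.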
Abstract:
The lives of humans and most living beings depend on sensation and perception for the best assessment of the surrounding world. Sensory organs acquire a variety of stimuli that are interpreted and integrated in the brain for immediate use or stored in memory for later recall. As part of reasoning, a person has to decide what to do with the available information. Emotions are classifiers of collected information: they assign a personal meaning to objects, events and individuals, and form part of our identity. Emotions play a decisive role in cognitive processes such as reasoning, decision-making and memory by assigning relevance to collected information. Access to pervasive computing devices, empowered by the ability to sense and perceive the world, provides new ways of acquiring and integrating information. But before data can be assessed for usefulness, systems must capture it and ensure that it is properly managed for a range of possible goals. Portable and wearable devices can now gather and store information from the environment and from our bodies, using cloud-based services and Internet connections. The limitations of systems in handling sensory data, compared with our own sensory capabilities, constitute one identified problem. Another is the lack of interoperability between humans and devices, since devices do not properly understand human emotional states and needs. Addressing these problems motivates the present research. Its mission is to include sensory and physiological data in a Framework that can manage collected data in support of human cognitive functions, backed by a new data model.
By learning from selected human functional and behavioural models and reasoning over collected data, the Framework aims to evaluate a person’s emotional state in order to empower human-centric applications, and to store episodic information on a person’s life, with physiological indicators of emotional states, for use by new-generation applications.
Abstract:
Grasslands in semi-arid regions, such as the Mongolian steppes, are facing desertification and degradation due to climate change. Mongolia’s main economic activity consists of extensive livestock production, so this is a matter of concern for decision makers. Remote sensing and Geographic Information Systems provide tools for advanced ecosystem management and have been widely used for monitoring and managing pasture resources. This study investigates the highest thematic detail that can be achieved through remote sensing when mapping the steppe vegetation, using medium-resolution earth observation imagery in three districts (soums) of Mongolia: Dzag, Buutsagaan and Khureemaral. After considering different thematic levels of detail for classifying the steppe vegetation, the existing pasture types within the steppe were chosen to be mapped. To investigate which combination of data sets yields the best results and which classification algorithm is more suitable for incorporating these data sets, different classification methods were compared for the study area. Sixteen classifications were performed using different combinations of estimators, Landsat-8 data (spectral bands and Landsat-8-derived NDVI) and geophysical data (elevation, mean annual precipitation and mean annual temperature), with two classification algorithms: maximum likelihood and decision tree. The best-performing model was the one that incorporated Landsat-8 bands with mean annual precipitation and mean annual temperature (Model 13), using the decision tree. For maximum likelihood, the model that incorporated Landsat-8 bands with mean annual precipitation (Model 5) and the one that incorporated Landsat-8 bands with mean annual precipitation and mean annual temperature (Model 13) achieved the highest accuracies. The decision tree models consistently outperformed the maximum likelihood ones.
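For readers unfamiliar with the NDVI estimator named in this abstract, the sketch below computes it for a single pixel from red and near-infrared reflectance (bands 4 and 5 on Landsat-8). The function name, the sample reflectance values and the choice to return None for an empty pixel are illustrative assumptions, not details taken from the study.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    NDVI = (NIR - Red) / (NIR + Red), with values in [-1, 1] for
    non-negative reflectances. Returns None when both bands are zero
    (an assumed no-data convention, to avoid dividing by zero).
    """
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator


# Hypothetical surface reflectances for a vegetated and a bare pixel.
vegetated = ndvi(nir=0.50, red=0.10)
bare_soil = ndvi(nir=0.30, red=0.25)
print(vegetated, bare_soil)
```

Dense green vegetation reflects strongly in the near infrared and absorbs red light, so it yields a higher NDVI than bare soil.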
Abstract:
As huge amounts of data become available in organizations and society, specific data analytics skills and techniques are needed to explore these data and extract useful patterns, trends, models or other knowledge that can support decision-making, help define new strategies or explain what is happening in a specific field. Only with a deep understanding of a phenomenon is it possible to fight it. In this paper, a data-driven analytics approach is used to analyse the increasing incidence of pneumonia fatalities in the Portuguese population, characterizing the disease and its incidence in terms of fatalities. This knowledge can be used to define appropriate strategies for reducing this phenomenon, which has increased by more than 65% in a decade.
Abstract:
This paper aims to develop a collision prediction model for three-leg junctions located on national roads (NR) in Northern Portugal. The focus is to identify factors that contribute to collision-type crashes at those locations, mainly factors related to road geometric consistency, since the literature on those is scarce, and to investigate the impact of three modeling methods (generalized estimating equations, random-effects negative binomial models and random-parameters negative binomial models) on the factors of those models. The database used included data published between 2008 and 2010 for 177 three-leg junctions. The contributing factors were split into three groups, which were tested sequentially for each of the adopted models: first, traffic only; then, traffic and the geometric characteristics of the junctions within their area of influence; and, lastly, factors that capture the difference between the geometric characteristics of the segments bordering the junctions’ area of influence and those of the segment included in that area. The choice of the best modeling technique was supported by a cross-validation carried out to identify the best model for the three sets of contributing factors. The random-parameters negative binomial models performed best. In the best models obtained with every modeling technique, the characteristics of the road environment, including proxy measures of geometric consistency, along with traffic volume, contribute significantly to the number of collisions. Both the variables concerning the junctions and the national highway segments in their area of influence, and the variations in those characteristics in the roadway segments bordering that area, proved relevant; there is therefore a clear need to incorporate the effect of geometric consistency in safety studies of three-leg junctions.
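The negative binomial distribution underlying two of the three modeling methods can be sketched as follows. This shows only the basic count distribution in its mean and dispersion form, not the random-effects or random-parameters extensions used in the paper; the function name and the parameter values are hypothetical.

```python
import math


def neg_binomial_pmf(y, mu, r):
    """P(Y = y) for a negative binomial with mean mu and dispersion r.

    In this parametrization the variance is mu + mu**2 / r, so smaller
    r means stronger overdispersion relative to a Poisson model.
    """
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


# Hypothetical values: 2 expected collisions per junction, dispersion 1.5.
mu, r = 2.0, 1.5
probs = [neg_binomial_pmf(y, mu, r) for y in range(200)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, variance)
```

The extra variance term mu**2 / r is what makes this distribution a common choice for crash counts, which are usually more dispersed than a Poisson model allows.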
Abstract:
Earthworks aim to level the ground surface of a target construction area and precede any kind of structural construction (e.g., road and railway construction). They comprise sequential tasks, such as excavation, transportation, spreading and compaction, and rely heavily on heavy mechanical equipment and repetitive processes. In this context, it is essential to optimize the use of all available resources under two key criteria: the cost and the duration of earthwork projects. In this paper, we present an integrated system that uses two artificial-intelligence-based techniques: data mining and evolutionary multi-objective optimization. The former is used to build data-driven models capable of providing realistic estimates of resource productivity, while the latter is used to optimize resource allocation considering the two main earthwork objectives (duration and cost). Experiments using real-world data from a construction site have shown that the proposed system is competitive with current manual earthwork design.
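Optimizing cost and duration together usually means looking for trade-off solutions rather than a single optimum. The sketch below illustrates only the Pareto dominance test that multi-objective methods rely on; it is not the evolutionary algorithm used in the paper, and the candidate allocations are hypothetical (cost, duration) pairs.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on every objective
    and strictly better on at least one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical resource allocations as (cost, duration in days).
candidates = [(100, 30), (120, 25), (110, 35), (150, 20), (130, 25)]
front = pareto_front(candidates)
print(front)
```

Here (110, 35) is dropped because (100, 30) is both cheaper and faster, and (130, 25) is dropped because (120, 25) is cheaper for the same duration; the remaining allocations are genuine cost and duration trade-offs.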
Abstract:
Developing and implementing data-oriented workflows for data migration processes are complex tasks that involve several problems related to integrating data from different schemas. They usually involve very specific requirements, and almost every process is unique. A way to abstract their representation helps us to better understand them and validate them with business users, which is a crucial step in requirements validation. In this demo we present an approach for incrementally enriching conceptual models so that their corresponding physical implementation can be produced automatically. We show how the B2K (Business to Kettle) system transforms BPMN 2.0 conceptual models into executable Kettle data-integration processes, covering the most relevant aspects of model design and enrichment, model-to-system transformation, and system execution.
Abstract:
ETL conceptual modeling is a very important activity in the implementation of any data warehousing system project. Having a high-level system representation that allows the main parts of a data warehousing system to be clearly identified is a great advantage, especially in the early stages of design and development. However, the effort of modeling an ETL system conceptually is rarely properly rewarded. Translating ETL conceptual models directly into something that saves work and time in the concrete implementation of the system would, in fact, be a great help. In this paper we present and discuss a hybrid approach to this problem, combining the simplicity of interpretation and expressive power of BPMN for ETL system conceptualization with the use of ETL patterns to automatically produce an ETL skeleton, a first prototype system, that can be executed in a commercial ETL tool such as Kettle.
Abstract:
Worldwide, around 9% of children are born before 37 weeks of gestation, which puts the premature child at risk, since the child is not prepared to perform a number of basic functions that begin soon after birth. To ensure that these at-risk pregnancies are properly monitored by obstetricians in time to avoid those problems, Data Mining (DM) models were induced in this study to predict preterm births in a real environment, using data from 3376 patients (women) admitted to the maternal and perinatal care unit of Centro Hospitalar of Oporto. A sensitive metric for predicting preterm deliveries was developed, assisting physicians in the decision-making process regarding the patients’ observation. Promising results were obtained, with sensitivity and specificity values of 96% and 98%, respectively.
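Sensitivity and specificity, the two metrics reported here, follow directly from the counts in a confusion matrix. The sketch below shows the calculation; the function name and the counts are hypothetical and were chosen only so that the output matches the percentages quoted in the abstract, not taken from the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of actual positives detected.
    specificity = TN / (TN + FP): share of actual negatives detected.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts: 100 preterm and 100 term deliveries.
sens, spec = sensitivity_specificity(tp=96, fn=4, tn=98, fp=2)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

In this setting a high sensitivity means few preterm births are missed, while a high specificity means few term pregnancies are flagged unnecessarily.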
Abstract:
In maternity care, a quick decision has to be made about the most suitable delivery type for the current patient. Physicians follow guidelines to support that decision; however, those practice recommendations are limited and underused. In recent years, caesarean delivery has been performed in over 28% of pregnancies, and other operative techniques for specific problems have also been used excessively. This study identifies obstetric and pregnancy factors that can be used to predict the most appropriate delivery technique, through the induction of data mining models using real data gathered in the perinatal and maternal care unit of Centro Hospitalar of Oporto (CHP). Predicting the type of birth is expected to support high-quality services and increase the safety and effectiveness of specific practices, helping to guide maternity care decisions and facilitate optimal outcomes for mother and child. Good results were obtained, with sensitivity and specificity values of 90.11% and 80.05%, respectively, providing the CHP with a model capable of correctly identifying caesarean sections and vaginal deliveries.
Abstract:
Rockburst is characterized by the violent explosion of a block, causing a sudden rupture in the rock, and is quite common in deep tunnels. It is critical to understand the rockburst phenomenon, focusing on its patterns of occurrence, so that these events can be avoided and/or managed, saving costs and possibly lives. The failure mechanism of rockburst needs to be better understood. Laboratory experiments are under way at the Laboratory for Geomechanics and Deep Underground Engineering (SKLGDUE) in Beijing, and the system is described. A large number of rockburst tests were performed, and their information was collected, stored in a database and analyzed. Data Mining (DM) techniques were applied to the database to develop predictive models for the rockburst maximum stress (σRB) and the rockburst risk index (IRB), which otherwise require the results of such tests to be determined. With the developed models, these parameters can be predicted with high accuracy using data from the rock mass and the specific project.