811 results for Data-driven analysis
Abstract:
In recent years, rapid advances in information technology have led to various data collection systems that are enriching the sources of empirical data for use in transport systems. Currently, traffic data are collected through various sensors, including loop detectors, probe vehicles, cell phones, Bluetooth, video cameras, remote sensing and public transport smart cards. It has been argued that combining the complementary information from multiple sources generally results in better accuracy, increased robustness and reduced ambiguity. Despite substantial advances in data assimilation techniques to reconstruct and predict the traffic state from multiple data sources, such methods are generally data-driven and do not fully exploit the power of traffic models. Furthermore, the existing methods are still limited to freeway networks and are not yet applicable in the urban context due to the greater complexity of the flow behavior. The main traffic phenomena on urban links are generally caused by the boundary conditions at intersections, signalized or un-signalized, at which the switching of the traffic lights and the turning maneuvers of road users lead to shock-wave phenomena that propagate upstream of the intersections. This paper develops a new model-based methodology to build a real-time traffic prediction model for arterial corridors using data from multiple sources, particularly loop detectors and partial observations from Bluetooth and GPS devices.
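The shock waves described here follow kinematic wave (LWR) theory, in which the speed of the interface between two traffic states is given by the Rankine-Hugoniot condition. A minimal sketch of that relation (the numbers are illustrative and not taken from the paper):

```python
def shockwave_speed(q_up, k_up, q_down, k_down):
    """Rankine-Hugoniot speed of the interface between two traffic states.

    q: flow (veh/h), k: density (veh/km); a negative speed means the
    wave propagates upstream, e.g. a queue growing back from a red light.
    """
    return (q_down - q_up) / (k_down - k_up)

# free-flowing traffic (1800 veh/h at 20 veh/km) meeting a stopped queue
# (0 veh/h at 120 veh/km): the queue tail moves upstream at 18 km/h
speed = shockwave_speed(1800, 20, 0, 120)  # -18.0 km/h
```

This is why the boundary conditions at signalized intersections dominate urban link dynamics: every red phase launches such a backward-moving wave.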
Abstract:
Since 2006, we have been conducting urban informatics research, which we define as “the study, design, and practice of urban experiences across different urban contexts that are created by new opportunities of real-time, ubiquitous technology and the augmentation that mediates the physical and digital layers of people networks and urban infrastructures” [1]. Various new research initiatives under the label “urban informatics” have since been started by universities (e.g., NYU’s Center for Urban Science and Progress) and industry (e.g., Arup, McKinsey) worldwide. Yet many of these new initiatives are limited to what Townsend calls “data-driven approaches to urban improvement” [2]. One of the key challenges is that no quantity of aggregated data translates directly into quality insights that better our understanding of cities. In this talk, I will raise questions about the purpose of urban informatics research beyond data, and show examples of media architecture, participatory city making, and citizen activism. I argue for (1) broadening the disciplinary foundations that urban science approaches draw on; (2) maintaining a hybrid perspective that considers both the bird’s-eye view and the citizen’s view; and (3) employing design research not merely to understand, but to produce actionable knowledge that will drive change for good.
Abstract:
Big Data and predictive analytics have received significant attention from the media and academic literature throughout the past few years, and it is likely that these emerging technologies will materially impact the mining sector. This short communication argues, however, that these technological forces will probably unfold differently in the mining industry than they have in many other sectors because of significant differences in the marginal cost of data capture and storage. To this end, we offer a brief overview of what Big Data and predictive analytics are, and explain how they are bringing about changes in a broad range of sectors. We discuss the “N = all” approach to data collection being promoted by many consultants and technology vendors in the marketplace but, by considering the economic and technical realities of data acquisition and storage, we then explain why an “n ≪ all” data collection strategy probably makes more sense for the mining sector. Finally, toward shaping the industry’s policies with regard to technology-related investments in this area, we conclude by putting forward a conceptual model for leveraging Big Data tools and analytical techniques that is a better fit for the mining sector.
Abstract:
Robust estimation often relies on a dispersion function that varies more slowly at large values than the square function. However, the choice of tuning constant in the dispersion function can substantially affect estimation efficiency. For a given family of dispersion functions, such as the Huber family, we suggest obtaining the "best" tuning constant from the data so that the asymptotic efficiency is maximized. This data-driven approach automatically adjusts the value of the tuning constant to provide the necessary resistance against outliers. Simulation studies show that substantial efficiency gains can be achieved by this data-dependent approach compared with the traditional approach in which the tuning constant is fixed. We briefly illustrate the proposed method using two datasets.
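As a rough illustration of the idea (a sketch, not the authors' implementation): the asymptotic variance of a Huber M-estimator, V(c) = E[ψ²]/(E[ψ'])², can be estimated from sample residuals, and the tuning constant chosen to minimize it over a grid:

```python
import numpy as np

def huber_psi(r, c):
    """Huber psi function: linear inside [-c, c], clipped outside."""
    return np.clip(r, -c, c)

def asymptotic_variance(residuals, c):
    """Estimate V(c) = E[psi^2] / (E[psi'])^2 from residuals."""
    psi = huber_psi(residuals, c)
    psi_prime = (np.abs(residuals) <= c).astype(float)
    return np.mean(psi ** 2) / np.mean(psi_prime) ** 2

def best_tuning_constant(residuals, grid=np.linspace(0.5, 3.0, 26)):
    """Pick the tuning constant minimizing the estimated variance."""
    variances = [asymptotic_variance(residuals, c) for c in grid]
    return float(grid[int(np.argmin(variances))])
```

On clean Gaussian residuals the selected constant drifts toward the upper end of the grid (approaching least squares efficiency); under heavy contamination it shrinks, buying resistance against the outliers, which is exactly the automatic adjustment the abstract describes.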
Abstract:
The aim of the study was to get acquainted with the activity of Näppärät Mummot, a Lahti-based crafts society, and its importance to the wellness of the members of the group. The selected aim, i.e., analyzing wellness, largely shaped the whole research process and its results. According to earlier studies in the field, different forms of craft and expressional activity promote one's wellness as well as support the work for one's identity. Based on my theoretical knowledge, my research set out to: 1) form a general view of crafts culture within Näppärät Mummot and 2) find out how recollective craft that promotes wellness is perceived through communality, experiential activity, work for one's identity, and divided as well as undivided craft. The qualitative field work was governed by an ethnographic research strategy, according to which I set out to become thoroughly familiar with the society I was studying. The methods I used to collect data were participant observation and thematic interviews. I used a field diary to write down all data I acquired through observation. The interviewee group consisted of seven members of Näppärät Mummot. An MP3 recorder was used to record the interviews, which I transcribed later. The method for data analysis was qualitative content analysis, for which I used Weft QDA, a qualitative analysis software application. Using coding and theory-driven analysis, I formed themes from the data that shed light on the research tasks. Throughout the analysis process I kept the literature and the collected data in dialogue with each other. Lastly, drawing from the classes of meaning of therapeutic craft that I sketched by means of summarizing and classifying, I presented the central concepts that describe the main results of the study.
The main results were six concepts that describe Näppärät Mummot's crafts culture and recollective craft with its wellness-promoting effect: 1) autobiographical craft, 2) shared work for one's identity, 3) shared intention for craft, 4) craft as a partner, 5) individual manner of craft, and 6) shared improvement. Craft promoted wellness in many ways. It was used to promote inner life management in difficult times, and it also provided sensations of empowerment through the pleasure of craft. Expressional, shared craft also served as a means of reinforcing one's identity in various ways. Expressional work for one's identity through autobiographical themes of craft represented rearranging one's life through holistic craft. A personal way of doing things also served as expressional action and work for one's identity even with divided craft. Shared work for identities meant reinforcing the identities of the members through discourses of craft and interaction with their close ones. The interconnection between communality and craft, and their shared meaning, is shown by the fact that communality motivated the members to work on their craft projects, while craft served as a means of communication between the members: communication through craft was easier than verbal communication. The results cannot be generalized to other groups: they describe the versatile means by which recollective craft promotes well-being within the crafts society Näppärät Mummot. However, the results do introduce a new perspective to the social discussion on how cultural activities promote well-being.
Abstract:
Data-driven approaches such as Gaussian Process (GP) regression have been used extensively in recent robotics literature to achieve estimation by learning from experience. To ensure satisfactory performance, in most cases, multiple learning inputs are required. Intuitively, adding new inputs can often contribute to better estimation accuracy; however, it may come at the cost of a new sensor, a larger training dataset and/or more complex learning, sometimes for limited benefit. Therefore, it is crucial to have a systematic procedure to determine the actual impact each input has on the estimation performance. To address this issue, in this paper we propose to analyse the impact of each input on the estimate using a variance-based sensitivity analysis method. We propose an approach built on Analysis of Variance (ANOVA) decomposition, which can characterise how the prediction changes as one or more of the inputs change, and also quantify the share of prediction uncertainty attributable to each input in the framework of dependent inputs. We apply the proposed approach to a terrain-traversability estimation method we proposed in prior work, which is based on multi-task GP regression, and we validate this implementation experimentally using a rover on a Mars-analogue terrain.
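The paper's ANOVA decomposition handles dependent inputs; as a simplified illustration of variance-based sensitivity analysis under independent inputs, first-order Sobol indices can be estimated with the standard pick-and-freeze scheme (the toy model below is hypothetical, not the rover's traversability estimator):

```python
import numpy as np

def first_order_sobol(f, d, n=100_000, seed=0):
    """Pick-and-freeze (Saltelli) estimator of first-order Sobol indices
    for a model f on the unit hypercube with independent inputs."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = f(A), f(B)
    total_var = np.var(np.concatenate([fA, fB]))
    S = np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]          # resample only input i, freeze the rest
        S[i] = np.mean(fB * (f(AB) - fA)) / total_var
    return S

# hypothetical toy model: output leans heavily on x1, lightly on x0,
# and not at all on x2 -- indices should be close to [0.2, 0.8, 0.0]
toy = lambda X: X[:, 0] + 2.0 * X[:, 1]
S = first_order_sobol(toy, d=3)
```

An input with a near-zero index, like x2 here, is exactly the kind of candidate the paper argues should be dropped: it adds sensing and training cost for limited estimation benefit.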
Abstract:
Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
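The "most likely segmentation into some number of blocks" under a piecewise-constant model with squared error can be computed exactly by dynamic programming over segment boundaries; a compact O(k·n²) sketch:

```python
import numpy as np

def segment(seq, k):
    """Optimal piecewise-constant k-segmentation under squared error,
    via dynamic programming over segment boundaries."""
    n = len(seq)
    p1 = np.concatenate([[0.0], np.cumsum(seq)])
    p2 = np.concatenate([[0.0], np.cumsum(np.square(seq))])

    def cost(i, j):
        # SSE of seq[i:j] when described by its mean
        s = p1[j] - p1[i]
        return p2[j] - p2[i] - s * s / (j - i)

    dp = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = dp[m - 1, i] + cost(i, j)
                if c < dp[m, j]:
                    dp[m, j], back[m, j] = c, i
    # walk back to recover the segment boundaries
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = int(back[m, j])
        bounds.append((i, j))
        j = i
    return float(dp[k, n]), bounds[::-1]
```

Each segment's single description is its mean, and the prefix sums make every segment cost an O(1) lookup, which is what makes the exact computation efficient.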
Abstract:
What can the statistical structure of natural images teach us about the human brain? Even though the visual cortex is one of the most studied parts of the brain, surprisingly little is known about how exactly images are processed to leave us with a coherent percept of the world around us, so we can recognize a friend or drive on a crowded street without any effort. By constructing probabilistic models of natural images, the goal of this thesis is to understand the structure of the stimulus that is the raison d'être for the visual system. Following the hypothesis that the optimal processing has to be matched to the structure of that stimulus, we attempt to derive computational principles, features that the visual system should compute, and properties that cells in the visual system should have. Starting from machine learning techniques such as principal component analysis and independent component analysis, we construct a variety of statistical models to discover structure in natural images that can be linked to receptive field properties of neurons in primary visual cortex such as simple and complex cells. We show that by representing images with phase-invariant, complex-cell-like units, a better statistical description of the visual environment is obtained than with linear simple-cell units, and that complex cell pooling can be learned by estimating both layers of a two-layer model of natural images. We investigate how a simplified model of the processing in the retina, where adaptation and contrast normalization take place, is connected to the natural stimulus statistics. Analyzing the effect that retinal gain control has on later cortical processing, we propose a novel method to perform gain control in a data-driven way. Finally we show how models like those presented here can be extended to capture whole visual scenes rather than just small image patches.
By using a Markov random field approach we can model images of arbitrary size, while still being able to estimate the model parameters from the data.
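The phase invariance of complex-cell-like units can be illustrated with the classical energy model: squaring and pooling a quadrature pair of linear filters yields a response that is independent of stimulus phase. This is a textbook sketch, not the thesis's learned two-layer model:

```python
import numpy as np

def complex_cell_response(x, w_even, w_odd):
    """Energy model: pool a quadrature pair of linear filters.
    Squaring and summing removes the stimulus phase."""
    return np.hypot(w_even @ x, w_odd @ x)

# quadrature pair of sinusoidal filters (4 full cycles over 64 samples)
n, cycles = 64, 4
theta = 2 * np.pi * cycles * np.arange(n) / n
w_even, w_odd = np.cos(theta), np.sin(theta)

# drive the unit with gratings of the same frequency but varying phase
responses = [complex_cell_response(np.cos(theta + phi), w_even, w_odd)
             for phi in np.linspace(0.0, np.pi, 8)]
```

The pooled response stays constant as the grating's phase shifts, whereas either linear (simple-cell-like) filter alone would modulate strongly with phase.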
Abstract:
The purpose of this research is to examine whether short-term communication training can improve the communication capacity of working communities, and what the prerequisites for the creation of such capacity are. The subjects of this research were short-term communication trainings aimed at the managerial and expert levels of enterprises and communities. The research endeavors to find out how communication trainings with an impact should be devised and implemented, and what this requires from the client and the provider of the training service. The research data consists mostly of quantitative feedback collected at the end of a training day, as well as delayed interviews. The evaluations were based on a stakeholder approach; those concerned were the participants in the trainings, the clients who commissioned the trainings, and the communication trainers. The principal method of the qualitative analysis is data-driven content analysis. Two research instruments were constructed for the analysis and for the presentation of the results: an evaluation circle for the purposes of holistic evaluation and a development matrix for structuring an effective training. The core concept of the matrix is the carrier wave effect, which is needed to carry the abstractions from the training into concrete functions in everyday life. The relevance of the results was tested in a pilot organization. The immediate assessments and delayed evaluations gave very different pictures of the trainings. The immediate feedback was of nearly commendable level, but the effects carried forward into the everyday situations of the working community were small, and the learning was rarely applied in practice. A training session that receives good feedback does not automatically result in the development of individual competence, let alone that of the community.
The results show that even short-term communication training can promote communication competence that eventually changes the working culture on an organizational level, provided that the training is designed as a process and that the connections to the participants' work are ensured. It is essential that all eight elements of the carrier wave effect are taken into account. The entire purchaser-provider process must function without omitting the contribution of the participants themselves. The research illustrates the so-called bow-tie model of effective communication training based on the carrier wave effect. Testing the results in pilot trainings showed that a rather small change in the training approach may have a significant effect on the outcome of the training as well as on the effects that are carried over into the working community. The evaluation circle proved to be a useful tool that can be used in planning, executing and evaluating training in practice. The development matrix works as a tool for those producing the training service, those using the service, and those deciding on its purchase, in planning and evaluating training that sustainably improves communication capacity. Thus the evaluation circle also works to support and ensure the long-term effects of short-term trainings. In addition to communication trainings, the tools developed for this research are usable for many other needs where an organization seeks to improve its operations and profitability through training.
Abstract:
This paper presents a fast algorithm for data exchange in a network of processors organized as a reconfigurable tree structure. For a given data exchange table, the algorithm generates a sequence of tree configurations in which the data exchanges are to be executed. A significant feature of the algorithm is that each exchange is executed in a tree configuration in which the source and destination nodes are adjacent to each other. A theorem establishes that for every pair of nodes in the reconfigurable tree structure, there always exist exactly two configurations in which the two nodes are adjacent. The algorithm utilizes this fact and determines the solution so as to optimize both the number of configurations required and the time to perform the data exchanges. Analysis of the algorithm shows that it has linear time complexity and provides a large reduction in run-time compared to a previously proposed algorithm. This is confirmed by experimental results obtained by executing a large number of randomly generated data exchange tables. Another significant feature of the algorithm is that the bit-size of the routing information code is always two bits, irrespective of the number of nodes in the tree. This not only increases the speed of the algorithm but also results in simpler hardware inside each node.
Abstract:
Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallel models enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compiling loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of inter-band parallelism (cross-barrier parallelism) in a number of numerical computations ranging from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviate the burden of a full-blown dependence resolver tracking the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.
Abstract:
Partial differential equations (PDEs) with multiscale coefficients are very difficult to solve due to the wide range of scales in the solutions. In the thesis, we propose some efficient numerical methods for both deterministic and stochastic PDEs based on the model reduction technique.
For the deterministic PDEs, the main purpose of our method is to derive an effective equation for the multiscale problem. An essential ingredient is to decompose the harmonic coordinate into a smooth part and a highly oscillatory part of which the magnitude is small. Such a decomposition plays a key role in our construction of the effective equation. We show that the solution to the effective equation is smooth, and could be resolved on a regular coarse mesh grid. Furthermore, we provide error analysis and show that the solution to the effective equation plus a correction term is close to the original multiscale solution.
For the stochastic PDEs, we propose a model-reduction-based data-driven stochastic method and a multilevel Monte Carlo method. In the multi-query setting, and under the assumption that the ratio between the smallest and largest scales is not too small, we propose the multiscale data-driven stochastic method. We construct a data-driven stochastic basis and solve the coupled deterministic PDEs to obtain the solutions. For more challenging problems, we propose the multiscale multilevel Monte Carlo method. We apply the multilevel scheme to the effective equations and assemble the stiffness matrices efficiently on each coarse mesh grid. In both methods, the Karhunen–Loève (KL) expansion plays an important role in extracting the main parts of some stochastic quantities.
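The Karhunen–Loève expansion extracts the dominant stochastic modes as the leading eigenpairs of the covariance operator; a minimal discrete sketch, with an exponential covariance kernel chosen purely for illustration:

```python
import numpy as np

def kl_modes(cov, num_modes):
    """Discrete Karhunen-Loeve expansion: the leading eigenpairs of the
    covariance matrix give the dominant stochastic modes."""
    vals, vecs = np.linalg.eigh(cov)         # eigh returns ascending order
    order = np.argsort(vals)[::-1]           # reorder: largest first
    return vals[order][:num_modes], vecs[:, order][:, :num_modes]

# illustrative exponential covariance C(x, y) = exp(-|x - y| / ell)
x = np.linspace(0.0, 1.0, 100)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)
vals, vecs = kl_modes(C, 10)
captured = vals.sum() / np.trace(C)   # variance fraction kept by 10 modes
```

Because the eigenvalues decay quickly for a smooth covariance, a handful of modes captures most of the variance, which is what makes truncating the expansion an effective model reduction.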
For both the deterministic and stochastic PDEs, numerical results are presented to demonstrate the accuracy and robustness of the methods. We also show the reduction in computational cost in the numerical examples.
Abstract:
This work presents a Monte Carlo simulation study of correlations between kinematic variables in the single-diffraction and double-pomeron-exchange topologies, aiming to delimit and study the phase space of these topologies, in particular as regards inclusive dijet production in the context of the CMS/LHC experiment. We also present an analysis (also via Monte Carlo simulation) of inclusive dijet production by single diffraction at a centre-of-mass energy of √s = 14 TeV, in which we establish a procedure, to be used with data, for the observation of this type of process. We further analyse the influence of several values of the rapidity-gap survival probability, [|S|], on the results, showing that with 10 pb⁻¹ of accumulated data, a simple observation of inclusive diffractive dijet production by the proposed method could exclude very small values of [|S|].
Abstract:
The retrieval of DNA from ancient human specimens is not always successful owing to DNA deterioration and contamination, although it is vital for providing new insights into the genetic structure of ancient peoples and for reconstructing past history. Normally, only short DNA fragments can be retrieved from ancient specimens. Identifying the authenticity of the DNA obtained and uncovering the information it contains are both difficult. We employed the ancient mtDNAs reported from Central Asia (including Xinjiang, China) as an example to discern potentially extraneous DNA contamination based on the updated mtDNA phylogeny derived from mtDNA control region, coding region, and complete sequence information. Our results demonstrated that many of the mtDNAs reported are more or less problematic. Starting from a reliable mtDNA phylogeny and incorporating the available modern data into the analysis, one can ascertain the authenticity of the ancient DNA, identify the potential errors in a data set, and efficiently decipher the meager information it harbors. The reappraisal of the mtDNAs older than 2000 years from Central Asia supports the suggestion of extensive (pre)historic gene admixture in this region.
Abstract:
The baseline survey for the project, conducted in 2009, had gaps that did not allow project performance to be assessed on the outcome and impact indicators. This study was therefore commissioned to reconstruct the baseline data, aligned to the impact and outcome indicators on the project logframe and results framework, against which project achievements could be assessed. The purpose and scope of the study were to reconstruct the baseline data and analysis describing the situation prior to QAFM Project inception, taking 2008 as the baseline year, aligned to the project logframe outcome and impact indicators; and to collect data on the current status to compare project outcomes (and, where possible, impacts) at improved fish handling sites against the baseline as well as against comparable non-improved fish landing sites as a control group. The study was conducted through secondary data searches from sources at NaFIRRI, DFR and ICEIDA. Field data collection was carried out using a sample survey covering 312 respondents, including boat and gear owners, crew members, processors and traders, at eight project and two control landing sites. Key Informant Interviews were conducted with DFOs and BMU leaders in the study districts and at the landing sites, respectively.