10 results for Data-driven analysis

in Helda - Digital Repository of the University of Helsinki


Relevance: 100.00%

Abstract:

The core aim of machine learning is to make a computer program learn from experience. Learning from data is usually defined as a task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is called multi-view learning, where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such a scenario would be to study a biological concept using several biological measurements like gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach during exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning for novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications such as matching of metabolites between humans and mice, and matching of sentences between documents in two languages.
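The abstract does not detail the matching algorithm itself. As a rough, generic illustration of the final step of multi-view matching, the sketch below (invented example data, standard SciPy routines) recovers a one-to-one correspondence between two views that already live in a comparable feature space by solving a minimum-cost assignment problem with the Hungarian algorithm.

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Illustration only: two views of the same 40 objects (e.g. metabolite
# profiles in two species mapped to a shared feature space); the sample
# order of the second view is unknown.
rng = np.random.default_rng(0)
n = 40
view_a = rng.normal(size=(n, 8))
perm = rng.permutation(n)                               # true, hidden correspondence
view_b = view_a[perm] + 0.05 * rng.normal(size=(n, 8))  # noisy, shuffled counterpart

cost = cdist(view_a, view_b)                            # pairwise dissimilarities
rows, cols = linear_sum_assignment(cost)                # Hungarian algorithm
accuracy = np.mean(perm[cols] == np.arange(n))          # fraction of correctly matched samples
print(f"correctly matched samples: {accuracy:.0%}")

In the applications mentioned in the abstract the two views would first have to be brought into a comparable space (for example with a canonical-correlation-type projection), which is where the data-driven part of the method does the real work.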

Relevance: 90.00%

Abstract:

The aim of the study was to get acquainted with the activity of Näppärät Mummot, a Lahti-based crafts society, and its importance to the wellness of the members of the group. The selected aim, i.e., analyzing wellness, largely shaped the whole research process and its results. According to earlier studies in the field, different forms of craft and expressional activity promote one's wellness as well as support the work for one's identity. Based on my theoretical knowledge, my research set out to: 1) form a general view of crafts culture within Näppärät Mummot and 2) find out how recollective craft that promotes wellness is perceived through communality, experiential activity, work for one's identity, and divided as well as undivided craft. Qualitative field work was governed by an ethnographic research strategy, according to which I set out to get thoroughly familiar with the society I was studying. The methods I used to collect data were participant observation and thematic interviews. I used a field diary for writing down all data I acquired through observation. The interviewee group consisted of seven members of Näppärät Mummot. An MP3 recorder was used to record the interviews, which I transcribed later. The method for data analysis was qualitative content analysis, for which I used Weft QDA, a qualitative analysis software application. I formed themes that shed light on the research tasks from the data using coding and theory-driven analysis. I kept the literature and the collected data in dialogue throughout the analysis process. Lastly, drawing from the classes of meaning of therapeutic craft that I sketched by means of summarizing and classifying, I presented the central concepts that describe the main results of the study. The main results were six concepts that describe Näppärät Mummot's crafts culture and recollective craft with its wellness-promoting effect: 1) autobiographical craft, 2) shared work for one's identity, 3) shared intention for craft, 4) craft as a partner, 5) individual manner of craft, and 6) shared improvement. Craft promoted wellness in many ways. It was used to promote inner life management in difficult times, and it also provided sensations of empowerment through the pleasure of craft. Expressional, shared craft also served as a means of reinforcing one's identity in various ways. Expressional work for one's identity through autobiographical themes of craft represented rearranging one's life through holistic craft. A personal way of doing things also served as expressional action and work for one's identity even with divided craft. Shared work for identities meant reinforcing the identities of the members through discourses of craft and interaction with their close ones. The interconnection between communality and craft, as well as their shared meaning, is shown by the fact that communality motivated the members to work on their craft projects, while craft served as the means of communication between the members: communication through craft was easier than verbal communication. The results cannot be generalized to other groups: they describe the versatile ways in which recollective craft promotes well-being within the crafts society Näppärät Mummot. However, the results do introduce a new perspective to the social discussion on how cultural activities promote well-being.

Relevance: 90.00%

Abstract:

Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
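As a point of reference for the piecewise-constant segmentation discussed above, the following sketch implements the classical O(n²k) dynamic-programming algorithm that finds the error-minimising split of a sequence into k constant segments; the thesis's exact formulation and notation may differ.

import numpy as np

def segment(x, k):
    """Optimal split of x into k piecewise-constant segments (minimum squared error)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s1 = np.r_[0.0, np.cumsum(x)]            # prefix sums
    s2 = np.r_[0.0, np.cumsum(x * x)]        # prefix sums of squares

    def sse(i, j):                           # squared error of x[i:j] described by its mean
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    E = np.full((k + 1, n + 1), np.inf)      # E[j, i]: best error for first i points, j segments
    E[0, 0] = 0.0
    back = np.zeros((k + 1, n + 1), dtype=int)
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            E[j, i], back[j, i] = min((E[j - 1, t] + sse(t, i), t) for t in range(j - 1, i))
    bounds, i = [n], n                       # recover segment boundaries by backtracking
    for j in range(k, 0, -1):
        i = back[j, i]
        bounds.append(i)
    return E[k, n], bounds[::-1]

x = np.concatenate([np.full(30, 1.0), np.full(40, 4.0), np.full(30, 2.5)])
x += 0.2 * np.random.default_rng(1).normal(size=x.size)
print(segment(x, 3)[1])                      # boundaries close to [0, 30, 70, 100]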

Relevance: 90.00%

Abstract:

What can the statistical structure of natural images teach us about the human brain? Even though the visual cortex is one of the most studied parts of the brain, surprisingly little is known about how exactly images are processed to leave us with a coherent percept of the world around us, so we can recognize a friend or drive on a crowded street without any effort. By constructing probabilistic models of natural images, the goal of this thesis is to understand the structure of the stimulus that is the raison d'être for the visual system. Following the hypothesis that the optimal processing has to be matched to the structure of that stimulus, we attempt to derive computational principles, features that the visual system should compute, and properties that cells in the visual system should have. Starting from machine learning techniques such as principal component analysis and independent component analysis, we construct a variety of statistical models to discover structure in natural images that can be linked to receptive field properties of neurons in primary visual cortex such as simple and complex cells. We show that by representing images with phase invariant, complex cell-like units, a better statistical description of the visual environment is obtained than with linear simple cell units, and that complex cell pooling can be learned by estimating both layers of a two-layer model of natural images. We investigate how a simplified model of the processing in the retina, where adaptation and contrast normalization take place, is connected to the natural stimulus statistics. Analyzing the effect that retinal gain control has on later cortical processing, we propose a novel method to perform gain control in a data-driven way. Finally we show how models like those presented here can be extended to capture whole visual scenes rather than just small image patches. By using a Markov random field approach we can model images of arbitrary size, while still being able to estimate the model parameters from the data.
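The first modelling step described above, learning linear features of natural image patches, can be sketched with off-the-shelf tools. The example below runs ICA on 16x16 patches of a single photograph bundled with scikit-learn (any natural image would do) and typically yields localised, oriented, simple-cell-like filters. It is a generic illustration, not the thesis's two-layer or Markov random field models.

import numpy as np
from sklearn.datasets import load_sample_image
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import FastICA

img = load_sample_image("china.jpg").astype(float).mean(axis=2)     # grayscale natural image
patches = extract_patches_2d(img, (16, 16), max_patches=20000, random_state=0)
X = patches.reshape(len(patches), -1)
X -= X.mean(axis=1, keepdims=True)                                  # remove the mean (DC) of each patch

ica = FastICA(n_components=64, random_state=0, max_iter=1000).fit(X)
filters = ica.components_.reshape(64, 16, 16)                       # learned linear features
print(filters.shape)                                                # inspect e.g. with matplotlib imshow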

Relevance: 90.00%

Abstract:

The purpose of this research is to examine whether short-term communication training can have an impact on the improvement of the communication capacity of working communities, and what the prerequisites are for creating such capacity. The subjects of this research were short-term communication trainings aimed at the managerial and expert levels of enterprises and communities. The research endeavors to find out how communication trainings with an impact should be devised and implemented, and what this requires from the client and provider of the training service. The research data consist mostly of quantitative feedback collected at the end of a training day, as well as delayed interviews. The evaluations have been based on a stakeholder approach; the stakeholders were the training participants, the clients who commissioned the trainings, and the communication trainers. The principal method of the qualitative analysis is data-driven content analysis. Two research instruments have been constructed for the analysis and for the presentation of the results: an evaluation circle for the purposes of a holistic evaluation and a development matrix for the structuring of an effective training. The core concept of the matrix is a carrier wave effect, which is needed to carry the abstractions from the training into concrete actions in everyday life. The relevance of the results has been tested in a pilot organization. The immediate assessment and the delayed evaluations gave very different pictures of the trainings. The immediate feedback was of nearly commendable level, but the effects carried forward into the everyday situations of the working community were small, and the learning was rarely put into practice. A training session that receives good feedback does not automatically result in the development of individual competence, let alone that of the community. The results show that even short-term communication training can promote communication competence that eventually changes the working culture on an organizational level, provided that the training is designed as a process and that the connections to the participants' work are ensured. It is essential that all eight elements of the carrier wave effect are taken into account. The entire purchaser-provider process must function, without omitting the contribution of the participants themselves. The research illustrates the so-called bow-tie model of effective communication training based on the carrier wave effect. Testing the results in pilot trainings showed that a rather small change in the training approach may have a significant effect on the outcome of the training as well as on the effects that carry over into the working community. The evaluation circle proved to be a useful tool, which can be used in planning, executing and evaluating training in practice. The development matrix works as a tool for those producing the training service, those using it, and those deciding on its purchase, when planning and evaluating training that sustainably improves communication capacity. Thus the evaluation circle also works to support and ensure the long-term effects of short-term trainings. In addition to communication trainings, the tools developed in this research are usable for many other needs where an organization seeks to improve its operations and profitability through training.

Relevance: 80.00%

Abstract:

Various reasons, such as ethical issues in maintaining blood resources, growing costs, and strict requirements for safe blood, have increased the pressure for efficient use of resources in blood banking. The competence of blood establishments can be characterized by their ability to predict the volume of blood collection to be able to provide cellular blood components in a timely manner as dictated by hospital demand. The stochastically varying clinical need for platelets (PLTs) sets a specific challenge for balancing supply with requests. Labour has been proven a primary cost-driver and should be managed efficiently. International comparisons of blood banking could recognize inefficiencies and allow reallocation of resources. Seventeen blood centres from 10 countries in continental Europe, Great Britain, and Scandinavia participated in this study. The centres were national institutes (5), parts of the local Red Cross organisation (5), or integrated into university hospitals (7). This study focused on the departments of blood component preparation of the centres. The data were obtained retrospectively by computerized questionnaires completed via Internet for the years 2000-2002. The data were used in four original articles (numbered I through IV) that form the basis of this thesis. Non-parametric data envelopment analysis (DEA, II-IV) was applied to evaluate and compare the relative efficiency of blood component preparation. Several models were created using different input and output combinations. The focus of comparisons was on the technical efficiency (II-III) and the labour efficiency (I, IV). An empirical cost model was tested to evaluate the cost efficiency (IV). Purchasing power parities (PPP, IV) were used to adjust the costs of the working hours and to make the costs comparable among countries. The total annual number of whole blood (WB) collections varied from 8,880 to 290,352 in the centres (I). Significant variation was also observed in the annual volume of produced red blood cells (RBCs) and PLTs. The annual number of PLTs produced by any method varied from 2,788 to 104,622 units. In 2002, 73% of all PLTs were produced by the buffy coat (BC) method, 23% by aphaeresis and 4% by the platelet-rich plasma (PRP) method. The annual discard rate of PLTs varied from 3.9% to 31%. The mean discard rate (13%) remained in the same range throughout the study period and demonstrated similar levels and variation in 2003-2004 according to a specific follow-up question (14%, range 3.8%-24%). The annual PLT discard rates were, to some extent, associated with production volumes. The mean RBC discard rate was 4.5% (range 0.2%-7.7%). Technical efficiency showed marked variation (median 60%, range 41%-100%) among the centres (II). Compared to the efficient departments, the inefficient departments used excess labour resources (and probably) production equipment to produce RBCs and PLTs. Technical efficiency tended to be higher when the (theoretical) proportion of lost WB collections (total RBC+PLT loss) from all collections was low (III). The labour efficiency varied remarkably, from 25% to 100% (median 47%) when working hours were the only input (IV). Using the estimated total costs as the input (cost efficiency) revealed an even greater variation (13%-100%) and overall lower efficiency level compared to labour only as the input. 
In cost efficiency only, the savings potential (observed inefficiency) was more than 50% in 10 departments, whereas labour and cost savings potentials were both more than 50% in six departments. The association between department size and efficiency (scale efficiency) could not be verified statistically in the small sample. In conclusion, international evaluation of the technical efficiency in component preparation departments revealed remarkable variation. A suboptimal combination of manpower and production output levels was the major cause of inefficiency, and the efficiency did not directly relate to production volume. Evaluation of the reasons for discarding components may offer a novel approach to study efficiency. DEA was proven applicable in analyses including various factors as inputs and outputs. This study suggests that analytical models can be developed to serve as indicators of technical efficiency and promote improvements in the management of limited resources. The work also demonstrates the importance of integrating efficiency analysis into international comparisons of blood banking.
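For readers unfamiliar with DEA, the sketch below sets up the standard input-oriented, constant-returns-to-scale (CCR) envelopment model as one linear program per department and solves it with SciPy. The input and output figures are invented placeholders, not data from the study, and the study's actual models used several other input-output combinations.

import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR technical efficiency of decision-making unit (DMU) `o`.
    X: (n_dmus, n_inputs) inputs, e.g. working hours and costs.
    Y: (n_dmus, n_outputs) outputs, e.g. RBC and PLT units produced."""
    n, m = X.shape
    _, s = Y.shape
    # Decision variables: [theta, lambda_1, ..., lambda_n]; minimise theta.
    c = np.r_[1.0, np.zeros(n)]
    # Input constraints: sum_j lambda_j * x_ij - theta * x_io <= 0
    A_in = np.hstack([-X[[o]].T, X.T])
    b_in = np.zeros(m)
    # Output constraints: -sum_j lambda_j * y_rj <= -y_ro
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])
    b_out = -Y[o]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=np.r_[b_in, b_out],
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[0]

# Toy example: 4 departments, inputs = [labour hours, cost], outputs = [RBC units, PLT units].
X = np.array([[100, 50], [120, 60], [80, 45], [150, 90]], dtype=float)
Y = np.array([[900, 300], [950, 280], [850, 310], [1000, 320]], dtype=float)
for o in range(len(X)):
    print(f"DMU {o}: efficiency = {ccr_efficiency(X, Y, o):.2f}")

A department with efficiency 1.0 lies on the best-practice frontier; a score of 0.5 would indicate that, judged against the frontier, the same output could in principle be produced with half the inputs.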

Relevance: 80.00%

Abstract:

The objective was to measure productivity growth and its components in Finnish agriculture, especially in dairy farming. The objective was also to compare different methods and models - both parametric (stochastic frontier analysis) and non-parametric (data envelopment analysis) - in estimating the components of productivity growth and the sensitivity of results with respect to different approaches. The parametric approach was also applied in the investigation of various aspects of heterogeneity. A common feature of the first three of five articles is that they concentrate empirically on technical change, technical efficiency change and the scale effect, mainly on the basis of the decompositions of the Malmquist productivity index. The last two articles explore an intermediate route between the Fisher and Malmquist productivity indices and develop a detailed but meaningful decomposition for the Fisher index, including also empirical applications. Distance functions play a central role in the decomposition of the Malmquist and Fisher productivity indices. Three panel data sets from the 1990s have been applied in the study. The common feature of all data used is that they cover the periods before and after Finnish EU accession. Another common feature is that the analysis mainly concentrates on dairy farms or their roughage production systems. Productivity growth on Finnish dairy farms was relatively slow in the 1990s: approximately one percent per year, independent of the method used. Despite considerable annual variation, productivity growth seems to have accelerated towards the end of the period. There was a slowdown in the mid-1990s at the time of EU accession. No clear immediate effects of EU accession with respect to technical efficiency could be observed. Technical change has been the main contributor to productivity growth on dairy farms. However, average technical efficiency often showed a declining trend, meaning that the deviations from the best practice frontier were increasing over time. This suggests different paths of adjustment at the farm level. However, different methods to some extent provide different results, especially for the sub-components of productivity growth. In most analyses on dairy farms the scale effect on productivity growth was minor. A positive scale effect would be important for improving the competitiveness of Finnish agriculture through increasing farm size. This small effect may also be related to the structure of agriculture and to the allocation of investments to specific groups of farms during the research period. The result may also indicate that the utilization of scale economies faces special constraints in Finnish conditions. However, the analysis of a sample of all types of farms suggested a more considerable scale effect than the analysis on dairy farms.
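For reference, the decomposition underlying the first three articles can be written, in its standard output-oriented geometric-mean form (the textbook Färe et al. decomposition; the thesis extends it further, e.g. with a scale effect), as

M\left(x^{t},y^{t},x^{t+1},y^{t+1}\right)
  = \underbrace{\frac{D^{t+1}\!\left(x^{t+1},y^{t+1}\right)}{D^{t}\!\left(x^{t},y^{t}\right)}}_{\text{efficiency change}}
    \times
    \underbrace{\left[\frac{D^{t}\!\left(x^{t+1},y^{t+1}\right)}{D^{t+1}\!\left(x^{t+1},y^{t+1}\right)}
    \cdot
    \frac{D^{t}\!\left(x^{t},y^{t}\right)}{D^{t+1}\!\left(x^{t},y^{t}\right)}\right]^{1/2}}_{\text{technical change}}

where $D^{s}(x,y)$ is the output distance function evaluated against the period-$s$ technology; values of $M$ above one indicate productivity growth between periods $t$ and $t+1$.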

Relevance: 80.00%

Abstract:

The main objective of this study is to evaluate selected geophysical, structural and topographic methods on regional, local, and tunnel and borehole scales, as indicators of the properties of fracture zones or fractures relevant to groundwater flow. Such information serves, for example, groundwater exploration and prediction of the risk of groundwater inflow in underground construction. This study aims to address how the features detected by these methods link to groundwater flow in qualitative and semi-quantitative terms and how well the methods reveal properties of fracturing affecting groundwater flow in the studied sites. The investigated areas are: (1) the Päijänne Tunnel for water conveyance, whose study serves as a verification of structures identified on regional and local scales; (2) the Oitti fuel spill site, to telescope across scales and compare geometries of structural assessment; and (3) Leppävirta, where fracturing and the hydrogeological environment have been studied on the scale of a drilled well. The methods applied in this study include: the interpretation of lineaments from topographic data and their comparison with aeromagnetic data; the analysis of geological structures mapped in the Päijänne Tunnel; borehole video surveying; groundwater inflow measurements; groundwater level observations; and information on the tunnel's deterioration as demonstrated by block falls. The study combined geological and geotechnical information on relevant factors governing groundwater inflow into a tunnel and indicators of fracturing, as well as environmental datasets as overlays for spatial analysis using GIS. Geophysical borehole logging and fluid logging were used in Leppävirta to compare the responses of different methods to fracturing and other geological features on the scale of a drilled well. Results from some of the geophysical measurements of boreholes were affected by the large diameter (gamma radiation) or uneven surface (caliper) of these structures. However, different anomalies indicating a more fractured upper part of the bedrock traversed by well HN4 in Leppävirta suggest that several methods can be used for detecting fracturing. Fracture trends appear to align similarly on different scales in the zone of the Päijänne Tunnel. For example, the regional magnetic trends were found to correlate with the orientations of topographic lineaments interpreted as expressions of fracture zones. The same structural orientations as those of the larger structures on local or regional scales were observed in the tunnel, even though a match could not be made in every case. The size and orientation of the observation space (patch of terrain at the surface, tunnel section, or borehole), the characterization method, with its typical sensitivity, and the characteristics of the location influence the identification of the fracture pattern. Through due consideration of the influence of the sampling geometry and by utilizing complementary fracture characterization methods in tandem, some of the complexities of the relationship between fracturing and groundwater flow can be addressed. The flow connections demonstrated by the response of the groundwater level in monitoring wells to pressure decrease in the tunnel, and by the transport of MTBE through fractures in bedrock in Oitti, highlight the importance of protecting the tunnel water from the risk of contamination.
In general, the largest values of drawdown occurred in monitoring wells closest to the tunnel and/or close to the topographically interpreted fracture zones. It seems that, to some degree, the rate of inflow shows a positive correlation with the level of reinforcement, as both are connected with the fracturing in the bedrock. The following geological features increased the vulnerability of tunnel sections to pollution, especially when several factors affected the same locations: (1) fractured bedrock, particularly with associated groundwater inflow; (2) thin or permeable overburden above fractured rock; (3) a hydraulically conductive layer underneath the surface soil; and (4) a relatively thin bedrock roof above the tunnel. The observed anisotropy of the geological media should ideally be taken into account in the assessment of vulnerability of tunnel sections and eventually for directing protective measures.

Relevance: 80.00%

Abstract:

A sensitive framework has been developed for modelling young radiata pine survival, its growth and its size class distribution, from time of planting to age 5 or 6 years. The data and analysis refer to the Central North Island region of New Zealand. The survival function is derived from a Weibull probability density function, to reflect diminishing mortality with the passage of time in young stands. An anamorphic family of trends was used, as very little between-tree competition can be expected in young stands. An exponential height function was found to fit best the lower portion of its sigmoid form. The most appropriate basal area/ha exponential function included an allometric adjustment which resulted in compatible mean height and basal area/ha models. Each of these equations successfully represented the effects of several establishment practices by making coefficients linear functions of site factors, management activities and their interactions. Height and diameter distribution modelling techniques that ensured compatibility with stand values were employed to represent the effects of management practices on crop variation. Model parameters for this research were estimated using data from site preparation experiments in the region and were tested with some independent data sets.
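The abstract does not give the exact parameterisation of the survival function; a generic Weibull-based survival curve, which produces the diminishing mortality rate mentioned above whenever the shape parameter is below one, takes the form

N(t) = N_{0}\,\exp\!\left[-\left(\frac{t}{b}\right)^{c}\right],
\qquad
h(t) = \frac{c}{b}\left(\frac{t}{b}\right)^{c-1},

where $N(t)$ is the number of surviving stems at stand age $t$, $N_{0}$ is the number planted, $b$ and $c$ are scale and shape parameters, and $h(t)$ is the instantaneous mortality rate, which declines with age when $c < 1$.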

Relevance: 80.00%

Abstract:

The objectives of this study were to make a detailed and systematic empirical analysis of microfinance borrowers and non-borrowers in Bangladesh and also examine how efficiency measures are influenced by the access to agricultural microfinance. In the empirical analysis, this study used both parametric and non-parametric frontier approaches to investigate differences in efficiency estimates between microfinance borrowers and non-borrowers. This thesis, based on five articles, applied data obtained from a survey of 360 farm households from north-central and north-western regions in Bangladesh. The methods used in this investigation involve stochastic frontier (SFA) and data envelopment analysis (DEA) in addition to sample selectivity and limited dependent variable models. In article I, technical efficiency (TE) estimation and identification of its determinants were performed by applying an extended Cobb-Douglas stochastic frontier production function. The results show that farm households had a mean TE of 83% with lower TE scores for the non-borrowers of agricultural microfinance. Addressing institutional policies regarding the consolidation of individual plots into farm units, ensuring access to microfinance, extension education for the farmers with longer farming experience are suggested to improve the TE of the farmers. In article II, the objective was to assess the effects of access to microfinance on household production and cost efficiency (CE) and to determine the efficiency differences between the microfinance participating and non-participating farms. In addition, a non-discretionary DEA model was applied to capture directly the influence of microfinance on farm households production and CE. The results suggested that under both pooled DEA models and non-discretionary DEA models, farmers with access to microfinance were significantly more efficient than their non-borrowing counterparts. Results also revealed that land fragmentation, family size, household wealth, on farm-training and off farm income share are the main determinants of inefficiency after effectively correcting for sample selection bias. In article III, the TE of traditional variety (TV) and high-yielding-variety (HYV) rice producers were estimated in addition to investigating the determinants of adoption rate of HYV rice. Furthermore, the role of TE as a potential determinant to explain the differences of adoption rate of HYV rice among the farmers was assessed. The results indicated that in spite of its much higher yield potential, HYV rice production was associated with lower TE and had a greater variability in yield. It was also found that TE had a significant positive influence on the adoption rates of HYV rice. In article IV, we estimated profit efficiency (PE) and profit-loss between microfinance borrowers and non-borrowers by a sample selection framework, which provided a general framework for testing and taking into account the sample selection in the stochastic (profit) frontier function analysis. After effectively correcting for selectivity bias, the mean PE of the microfinance borrowers and non-borrowers were estimated at 68% and 52% respectively. This suggested that a considerable share of profits were lost due to profit inefficiencies in rice production. The results also demonstrated that access to microfinance contributes significantly to increasing PE and reducing profit-loss per hectare land. 
In article V, the effects of credit constraints on TE, allocative efficiency (AE) and CE were assessed while adequately controlling for sample selection bias. The confidence intervals were determined by the bootstrap method for both samples. The results indicated that differences in average efficiency scores of credit constrained and unconstrained farms were not statistically significant although the average efficiencies tended to be higher in the group of unconstrained farms. After effectively correcting for selectivity bias, household experience, number of dependents, off-farm income, farm size, access to on farm training and yearly savings were found to be the main determinants of inefficiencies. In general, the results of the study revealed the existence substantial technical, allocative, economic inefficiencies and also considerable profit inefficiencies. The results of the study suggested the need to streamline agricultural microfinance by the microfinance institutions (MFIs), donor agencies and government at all tiers. Moreover, formulating policies that ensure greater access to agricultural microfinance to the smallholder farmers on a sustainable basis in the study areas to enhance productivity and efficiency has been recommended. Key Words: Technical, allocative, economic efficiency, DEA, Non-discretionary DEA, selection bias, bootstrapping, microfinance, Bangladesh.