98 results for Unstructured content search
Abstract:
Human-wildlife conflicts are today an integral part of the rural development discourse. In this research, the main focus is on the spatial explanation, which is not a very common approach in the reviewed literature. My research hypothesis is based on the assumption that human-wildlife conflicts occur when a wild animal crosses a perceived borderline between nature and culture and enters the realm of the other. The borderline between nature and culture marks a perceived division of spatial content in our senses of place. The animal subject that crosses this border becomes a subject out of place, meaning that the animal is then spatially located in a space where it should not be or where it does not belong according to tradition, custom, rules, law, public opinion, prevailing discourse or some other criteria set by human beings. An appearance of a wild animal in a domesticated space brings an uncontrolled subject into that space where humans have previously commanded total control of all other natural elements. A wild animal out of place may also threaten the biosecurity of the place in question. To test my hypothesis, I carried out a case study in the Liwale district in south-eastern Tanzania during June and July 2002. I also collected documents and carried out interviews in Dar es Salaam in 2003. I studied the human-wildlife conflicts in six rural villages, where a total of 183 persons participated in the village meetings. My research methods included semi-structured interviews, participatory mapping, a questionnaire survey and Q-methodology. The rural communities in the Liwale district have a long history of co-existing with wildlife and they still have traditional knowledge of wildlife management and hunting. Wildlife conservation through the establishment of game reserves during the colonial era has escalated human-wildlife conflicts in the Liwale district.
This study shows that the villagers perceive some wild animals differently in their images of the African countryside than the district and regional level civil servants do. From the small-scale subsistence farmers' point of view, wild animals continue to challenge the separation of the wild (the forests) and the domestic spaces (the cultivated fields) by moving across the perceived borders in search of food and shelter. As a result, the farmers may lose their crops, livestock or even their own lives in confrontations with wild animals. Human-wildlife conflicts in the Liwale district are manifold and cannot be explained simply on the basis of attitudes or perceived images of landscapes. However, the spatial explanation of these conflicts gives us a better understanding of why human-wildlife conflicts are so widely found across the world.
Abstract:
Event-based systems are seen as good candidates for supporting distributed applications in dynamic and ubiquitous environments because they support decoupled and asynchronous many-to-many information dissemination. Event systems are widely used, because asynchronous messaging provides a flexible alternative to RPC (Remote Procedure Call). They are typically implemented using an overlay network of routers. A content-based router forwards event messages based on filters that are installed by subscribers and other routers. The filters are organized into a routing table in order to forward incoming events to proper subscribers and neighbouring routers. This thesis addresses the optimization of content-based routing tables organized using the covering relation and presents novel data structures and configurations for improving local and distributed operation. Data structures are needed for organizing filters into a routing table that supports efficient matching and runtime operation. We present novel results on dynamic filter merging and the integration of filter merging with content-based routing tables. In addition, the thesis examines the cost of client mobility using different protocols and routing topologies. We also present a new matching technique called temporal subspace matching. The technique combines two new features. The first feature, temporal operation, supports notifications, or content profiles, that persist in time. The second feature, subspace matching, allows more expressive semantics, because notifications may contain intervals and be defined as subspaces of the content space. We also present an application of temporal subspace matching pertaining to metadata-based continuous collection and object tracking.
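As a rough illustration of how the covering relation can keep a content-based routing table small, the following minimal sketch models a filter as a conjunction of closed intervals over named attributes. The filter model, the names `covers`, `matches` and `RoutingTable`, and the rule of pruning only among filters that share a destination are assumptions made for this example, not the data structures developed in the thesis.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]   # closed interval [low, high]
Filter = Dict[str, Interval]     # conjunction of per-attribute intervals
Event = Dict[str, float]         # attribute name -> value


def matches(f: Filter, event: Event) -> bool:
    """An event matches when every constrained attribute is present and in range."""
    return all(a in event and lo <= event[a] <= hi for a, (lo, hi) in f.items())


def covers(f: Filter, g: Filter) -> bool:
    """f covers g if every event that matches g also matches f."""
    return all(a in g and lo <= g[a][0] and g[a][1] <= hi
               for a, (lo, hi) in f.items())


class RoutingTable:
    """Hypothetical table that stores only non-covered filters per destination."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Filter, str]] = []

    def subscribe(self, f: Filter, dest: str) -> bool:
        """Install f for dest; return False if an existing filter already covers it."""
        if any(d == dest and covers(g, f) for g, d in self.entries):
            return False
        # Drop filters for the same destination that the new filter makes redundant.
        self.entries = [(g, d) for g, d in self.entries
                        if not (d == dest and covers(f, g))]
        self.entries.append((f, dest))
        return True

    def route(self, event: Event) -> List[str]:
        """Return the destinations whose filters match the event."""
        return sorted({d for f, d in self.entries if matches(f, event)})


table = RoutingTable()
table.subscribe({"price": (0, 100)}, "routerB")
table.subscribe({"price": (10, 20)}, "routerB")              # covered, not stored
table.subscribe({"price": (10, 20), "volume": (0, 5)}, "client1")
print(table.route({"price": 15, "volume": 3}))
```

The sketch deliberately ignores unsubscription: once a covered filter has been discarded, removing the covering filter would require restoring it, which is one reason more elaborate structures are used in practice.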
Abstract:
Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
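The basic piecewise-constant problem described above can be sketched with the classic dynamic program below. It assumes that "most likely" corresponds to minimising the sum of squared errors around each segment mean (which holds under Gaussian noise with equal variance); the function name `segment` and the O(n²k) formulation are illustrative choices, not necessarily the algorithms analysed in the thesis.

```python
from typing import List, Sequence, Tuple


def segment(seq: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Return (total squared error, segment start indices) of the best k-segmentation."""
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(seq)")

    # Prefix sums of values and squared values give O(1) segment costs.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(seq):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i: int, j: int) -> float:
        """Squared error of describing seq[i:j] by its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal error of splitting seq[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover where each segment starts.
    starts: List[int] = []
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()
    return best[k][n], starts


error, starts = segment([1, 1, 1, 5, 5, 5, 2, 2], 3)
print(error, starts)
```

Running the same function for several values of k and comparing the resulting errors under a penalised criterion is one way to approach the question of choosing the number of segments.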
Abstract:
Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures to problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem - searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example is genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine - i.e. they should also hold in future data. This is an important distinction from the traditional association rules, which - in spite of their name and a similar appearance to dependency rules - do not necessarily represent statistical dependencies at all or represent only spurious connections, which occur by chance. Therefore, the principal objective is to search for the rules with statistical significance measures. Another important objective is to search for only non-redundant rules, which express the real causes of dependence, without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither statistical dependency nor statistical significance is a monotonic property, which means that the traditional pruning techniques do not work.
As a solution, we first derive the mathematical basis for pruning the search space with any well-behaved statistical significance measure. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measure, such as Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over the existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or whether the data still contains better, but undiscovered, dependencies.
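To make the significance measure concrete, the sketch below scores a single positive rule X->A with a one-sided Fisher's exact test computed from the 2x2 contingency counts. It only evaluates one given rule; it does not implement the pruning or search strategy of the thesis. The row representation (a set of the attributes that are true) and the names `fisher_p` and `rule_p_value` are assumptions made for this example.

```python
from math import comb
from typing import List, Set


def fisher_p(n: int, n_x: int, n_a: int, n_xa: int) -> float:
    """One-sided Fisher p-value for a positive dependency.

    Probability, under independence with fixed margins, of observing at
    least n_xa rows that contain both X and A.
    """
    upper = min(n_x, n_a)
    tail = sum(comb(n_x, k) * comb(n - n_x, n_a - k)
               for k in range(n_xa, upper + 1))
    return tail / comb(n, n_a)


def rule_p_value(rows: List[Set[str]], x: Set[str], a: str) -> float:
    """Score the rule X->A on binary data given as sets of true attributes."""
    n = len(rows)
    n_x = sum(1 for r in rows if x <= r)             # rows where all of X holds
    n_a = sum(1 for r in rows if a in r)             # rows where A holds
    n_xa = sum(1 for r in rows if x <= r and a in r)
    return fisher_p(n, n_x, n_a, n_xa)


# Toy data: the hypothetical attributes 'gene' and 'smoking' always co-occur with 'disease'.
rows = [{"gene", "smoking", "disease"}] * 4 + [set()] * 4
print(rule_p_value(rows, {"gene", "smoking"}, "disease"))
```

A smaller p-value indicates a more significant rule. A negative rule X->not A could be scored the same way by counting rows where A is absent instead.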
Abstract:
The publish/subscribe paradigm has lately received much attention. In publish/subscribe systems, a specialized event-based middleware delivers notifications of events created by producers (publishers) to consumers (subscribers) interested in that particular event. The paradigm is considered a good approach for implementing Internet-wide distributed systems as it provides full decoupling of the communicating parties in time, space and synchronization. One flavor of the paradigm is content-based publish/subscribe, which allows the subscribers to express their interests very accurately. In order to implement a content-based publish/subscribe middleware in a way suitable for Internet scale, its underlying architecture must be organized as a peer-to-peer network of content-based routers that take care of forwarding the event notifications to all interested subscribers. A communication infrastructure that provides such a service is called a content-based network. A content-based network is an application-level overlay network. Unfortunately, the expressiveness of the content-based interaction scheme comes with a price - compiling and maintaining the content-based forwarding and routing tables is very expensive when the number of nodes in the network is large. The routing tables are usually data structures based on partially ordered sets (posets). In this work, we present an algorithm that aims to improve scalability in content-based networks by reducing the workload of content-based routers by offloading some of their content routing cost to clients. We also provide experimental results of the performance of the algorithm. Additionally, we give an introduction to the publish/subscribe paradigm and content-based networking and discuss alternative ways of improving scalability in content-based networks. ACM Computing Classification System (CCS): C.2.4 [Computer-Communication Networks]: Distributed Systems - Distributed applications
Abstract:
The usual task in music information retrieval (MIR) is to find occurrences of a monophonic query pattern within a music database, which can contain both monophonic and polyphonic content. The so-called query-by-humming systems are a well-known instance of content-based MIR. In such a system, the user's hummed query is converted into symbolic form to perform search operations in a similarly encoded database. The symbolic representation (e.g., textual, MIDI or vector data) is typically a quantized and simplified version of the sampled audio data, yielding faster search algorithms and space requirements that can be met in real-life situations. In this thesis, we investigate geometric approaches to MIR. We first study some musicological properties often needed in MIR algorithms, and then give a literature review on traditional (e.g., string-matching-based) MIR algorithms and novel techniques based on geometry. We also introduce some concepts from digital image processing, namely mathematical morphology, which we will use to develop and implement four algorithms for geometric music retrieval. The symbolic representation in the case of our algorithms is a binary 2-D image. We use various morphological pre- and post-processing operations on the query and the database images to perform template matching / pattern recognition for the images. The algorithms are basically extensions to classic image correlation and hit-or-miss transformation techniques used widely in template matching applications. They aim to be a future extension to the retrieval engine of C-BRAHMS, which is a research project of the Department of Computer Science at the University of Helsinki.
Abstract:
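The idea of matching a query against a binary 2-D image can be sketched with plain binary erosion, the simplest morphological operation related to the techniques named above. The example assumes a piano-roll encoding in which each set pixel is a (time step, pitch) pair; the function name `erode` and the exact-match behaviour are illustrative only, and the thesis algorithms add pre- and post-processing that this sketch omits.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]  # (time step, pitch), one set pixel of the binary image


def erode(image: Set[Point], pattern: Set[Point]) -> List[Point]:
    """Binary erosion of the image by the pattern.

    Returns every translation (dt, dp) for which all translated pattern
    pixels are set in the image, i.e. every exact occurrence of the query,
    including transposed ones (dp != 0).
    """
    if not pattern:
        return []
    anchor = min(pattern)  # the anchor pixel must land on some image pixel
    hits: List[Point] = []
    for t, p in sorted(image):
        dt, dp = t - anchor[0], p - anchor[1]
        if all((qt + dt, qp + dp) in image for qt, qp in pattern):
            hits.append((dt, dp))
    return hits


# Database: a rising three-note figure, then the same figure transposed later on.
database = {(0, 60), (1, 62), (2, 64), (4, 65), (5, 67), (6, 69)}
query = {(0, 0), (1, 2), (2, 4)}
print(erode(database, query))
```

Tolerance to small timing or pitch errors could be approximated by dilating the database image before the erosion, at the cost of more false matches.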
Online content services can greatly benefit from personalisation features that enable delivery of content that is suited to each user's specific interests. This thesis presents a system that applies text analysis and user modeling techniques in an online news service for the purpose of personalisation and user interest analysis. The system creates a detailed thematic profile for each content item and observes the user's actions towards content items to learn the user's preferences. A handcrafted taxonomy of concepts, or ontology, is used in profile formation to extract relevant concepts from the text. User preference learning is automatic and there is no need for explicit preference settings or ratings from the user. Learned user profiles are segmented into interest groups using clustering techniques with the objective of providing a source of information for the service provider. Some theoretical background for the chosen techniques is presented, while the main focus is on finding practical solutions to some of the current information needs, which are not optimally served by traditional techniques.
Abstract:
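A minimal sketch of implicit preference learning is given below, assuming that both content items and users are described by weighted concept vectors. The decay factor, the action weight and the function names are hypothetical choices for illustration; the abstract does not specify how the actual system weights actions or compares profiles.

```python
from math import sqrt
from typing import Dict

Profile = Dict[str, float]  # concept name -> weight


def update_profile(user: Profile, item: Profile,
                   action_weight: float, decay: float = 0.9) -> Profile:
    """Fade old interests, then add the item's concepts scaled by the action."""
    updated = {concept: weight * decay for concept, weight in user.items()}
    for concept, weight in item.items():
        updated[concept] = updated.get(concept, 0.0) + action_weight * weight
    return updated


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity between two concept vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# The user reads one sports article (assumed action weight 1.0).
user = update_profile({}, {"sports": 1.0, "football": 0.5}, action_weight=1.0)
print(cosine(user, {"sports": 1.0}), cosine(user, {"politics": 1.0}))
```

The same similarity function could serve as the distance basis for clustering learned profiles into interest groups.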
Three different Norway spruce cutting clones growing in three environments with different soil and climatic conditions were studied. The purpose was to follow variation in the radial growth rate, wood properties and lignin content and to modify wood lignin with a natural monolignol, coniferyl alcohol, by making use of inherent wood peroxidases. In addition, the incorporation of chlorinated anilines into lignin was studied with synthetic model compounds and synthetic lignin preparations to show whether unnatural compounds originating from pesticides could be bound in the lignin polymer. The lignin content of heartwood, sapwood and earlywood was determined by applying Fourier transform infrared (FTIR) spectroscopy and a principal component regression (PCR) technique. Wood blocks were treated with coniferyl alcohol by using a vacuum impregnation method. The effect of impregnation was assessed by FTIR and by a fungal decay test. Trees from a fertile site showed the highest growth rate and sapwood lignin content and the lowest latewood proportion, weight density and modulus of rupture (MOR). Trees from a medium fertile site had the lowest growth rate and the highest latewood proportion, weight density, modulus of elasticity (MOE) and MOR. The most rapidly growing clone showed the lowest latewood proportion, weight density, MOE and MOR. The slowest growing clone had the lowest sapwood lignin content and the highest latewood proportion, weight density, MOE and MOR. Differences between the sites and clones were small, while fairly large variation was found between the individual trees and growing seasons. The cutting clones maintained clone-dependent wood properties in the different growing sites although variation between trees was high and climatic factors affected growth. The coniferyl alcohol impregnation increased the content of different lignin-type phenolic compounds in the wood as well as wood decay resistance against a white-rot fungus, Coriolus versicolor. 
During the synthetic lignin preparation 3,4-dichloroaniline became bound by a benzylamine bond to β-O-4 structures in the polymer and it could not be released by mild acid hydrolysis. The natural monolignol, coniferyl alcohol, and chlorinated anilines could be incorporated into the lignin polymer in vivo and in vitro, respectively.
Abstract:
Cord blood is a well-established alternative to bone marrow and peripheral blood stem cell transplantation. To this day, over 400 000 unrelated donor cord blood units have been stored in cord blood banks worldwide. To enable successful cord blood transplantation, recent efforts have been focused on finding ways to increase the hematopoietic progenitor cell content of cord blood units. In this study, factors that may improve the selection and quality of cord blood collections for banking were identified. In 167 consecutive cord blood units collected from healthy full-term neonates and processed at a national cord blood bank, mean platelet volume (MPV) correlated with the numbers of cord blood unit hematopoietic progenitors (CD34+ cells and colony-forming units); this is a novel finding. Mean platelet volume can be thought to represent general hematopoietic activity, as newly formed platelets have been reported to be large. Stress during delivery is hypothesized to lead to the mobilization of hematopoietic progenitor cells through cytokine stimulation. Accordingly, low-normal umbilical arterial pH, thought to be associated with perinatal stress, correlated with high cord blood unit CD34+ cell and colony-forming unit numbers. The associations were closer in vaginal deliveries than in Cesarean sections. Vaginal delivery entails specific physiological changes, which may also affect the hematopoietic system. Thus, different factors may predict cord blood hematopoietic progenitor cell numbers in the two modes of delivery. Theoretical models were created to enable the use of platelet characteristics (mean platelet volume) and perinatal factors (umbilical arterial pH and placental weight) in the selection of cord blood collections with high hematopoietic progenitor cell counts. These observations could thus be implemented as a part of the evaluation of cord blood collections for banking. The quality of cord blood units has been the focus of several recent studies. 
However, hemostasis activation during cord blood collection is scarcely evaluated in cord blood banks. In this study, hemostasis activation was assessed with prothrombin activation fragment 1+2 (F1+2), a direct indicator of thrombin generation, and platelet factor 4 (PF4), indicating platelet activation. Altogether three sample series were collected during the set-up of the cord blood bank as well as after changes in personnel and collection equipment. The activation decreased from the first to the subsequent series, which were collected with the bank fully in operation and following international standards, and was at a level similar to that previously reported for healthy neonates. As hemostasis activation may have unwanted effects on cord blood cell contents, it should be minimized. The assessment of hemostasis activation could be implemented as a part of process control in cord blood banks. Culture assays provide information about the hematopoietic potential of the cord blood unit. In processed cord blood units prior to freezing, megakaryocytic colony growth was evaluated in semisolid cultures with a novel scoring system. Three investigators analyzed the colony assays, and the scores were highly concordant. With such scoring systems, the growth potential of various cord blood cell lineages can be assessed. In addition, erythroid cells were observed in liquid cultures of cryostored and thawed, unseparated cord blood units without exogenous erythropoietin. This was hypothesized to be due to the erythropoietic effect of thrombopoietin, endogenous erythropoietin production, and diverse cell-cell interactions in the culture. This observation underscores the complex interactions of cytokines and supporting cells in the heterogeneous cell population of the thawed cord blood unit.
Abstract:
By detecting leading protons produced in the Central Exclusive Diffractive process, p+p → p+X+p, one can measure the missing mass, and scan for possible new particle states such as the Higgs boson. This process augments - in a model-independent way - the standard methods for new particle searches at the Large Hadron Collider (LHC) and will allow detailed analyses of the produced central system, such as the spin-parity properties of the Higgs boson. The exclusive central diffractive process makes possible precision studies of gluons at the LHC and complements the physics scenarios foreseen at the next e+e− linear collider. This thesis first presents the conclusions of the first systematic analysis of the expected precision measurement of the leading proton momentum and the accuracy of the reconstructed missing mass. In this initial analysis, the scattered protons are tracked along the LHC beam line and the uncertainties expected in beam transport and detection of the scattered leading protons are accounted for. The main focus of the thesis is on developing the necessary radiation-hard precision detector technology for coping with the extremely demanding experimental environment of the LHC. This will be achieved by using a 3D silicon detector design, which in addition to the radiation hardness of up to 5×10^15 neutrons/cm², offers properties such as a high signal-to-noise ratio, fast signal response to radiation and sensitivity close to the very edge of the detector. This work reports on the development of a novel semi-3D detector design that simplifies the 3D fabrication process, but preserves the necessary properties of the 3D detector design required in the LHC and in other imaging applications.
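The missing-mass measurement mentioned above can be illustrated with the standard approximation M² ≈ ξ₁ ξ₂ s, where ξ₁ and ξ₂ are the fractional momentum losses of the two leading protons and √s is the collision energy. The approximation neglects the protons' transverse momenta; the function name and the example values are assumptions for illustration and are not taken from the thesis.

```python
from math import sqrt


def missing_mass(xi1: float, xi2: float, sqrt_s: float) -> float:
    """Approximate central-system mass M = sqrt(xi1 * xi2) * sqrt(s).

    xi1, xi2: fractional momentum losses of the two leading protons (0..1).
    sqrt_s:   centre-of-mass energy, in the same unit as the returned mass.
    """
    if not (0.0 <= xi1 <= 1.0 and 0.0 <= xi2 <= 1.0):
        raise ValueError("fractional momentum losses must lie in [0, 1]")
    return sqrt(xi1 * xi2) * sqrt_s


# Example values (assumed): each proton loses 1% of its momentum at 14 TeV.
print(missing_mass(0.01, 0.01, 14000.0))  # in GeV, approximately 140
```

Because M depends on the product of two small quantities, the mass resolution is driven directly by how precisely each proton's momentum loss can be measured, which is why the detector precision matters.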