66 results for Processing wikipedia data


Relevance:

30.00%

Publisher:

Abstract:

We investigated the on-line processing of unaccusative and unergative sentences in a group of eight Greek-speaking individuals diagnosed with Broca aphasia and a group of language-unimpaired subjects used as the baseline. The processing of unaccusativity refers to the reactivation of the postverbal trace by retrieving the mnemonic representation of the verb’s syntactically defined antecedent provided in the early part of the sentence. Our results demonstrate that the Broca group showed selective reactivation of the antecedent for the unaccusatives. We consider several interpretations for our data, including explanations focusing on the transitivization properties of nonactive and active voice-alternating unaccusatives, the costly procedure claimed to underlie the parsing of active nonvoice-alternating unaccusatives, and the animacy of the antecedent modulating the syntactic choices of the patients.

Relevance:

30.00%

Publisher:

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element
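To make the scalability concern concrete, the following is a minimal sketch of one iteration of the straightforward parallel k-means formulation, simulated in a single process. It is not the authors' code: the function names (`local_stats`, `global_reduce`, `parallel_kmeans_step`) are hypothetical, and the "nodes" are just array partitions. The point it illustrates is that every node must contribute its partial sums and counts to one global reduction before any centroid can be updated.

```python
import numpy as np

def local_stats(points, centroids):
    """Per-node work: assign local points to the nearest centroid and
    return the partial coordinate sums and point counts per cluster."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k, dtype=int)
    np.add.at(sums, labels, points)   # unbuffered accumulation per cluster
    np.add.at(counts, labels, 1)
    return sums, counts

def global_reduce(partials):
    """Stand-in for the global reduction: element-wise sum over all nodes.
    On a real system this would be a collective operation across processes."""
    sums = sum(s for s, _ in partials)
    counts = sum(c for _, c in partials)
    return sums, counts

def parallel_kmeans_step(partitions, centroids):
    """One k-means iteration over uniformly partitioned data."""
    partials = [local_stats(p, centroids) for p in partitions]
    sums, counts = global_reduce(partials)
    new_centroids = centroids.copy()
    nonempty = counts > 0             # keep the old centroid for empty clusters
    new_centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_centroids
```

Because the reduction only adds partial sums, the result equals that of the centralised algorithm on the full dataset, which is the deterministic behaviour the abstract says the proposed formulation also preserves.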

Relevance:

30.00%

Publisher:

Abstract:

This chapter introduces the latest practices and technologies in the interactive interpretation of environmental data. With environmental data becoming ever larger, more diverse and more complex, there is a need for a new generation of tools that provides new capabilities over and above those of the standard workhorses of science. These new tools aid the scientist in discovering interesting new features (and also problems) in large datasets by allowing the data to be explored interactively using simple, intuitive graphical tools. In this way, new discoveries are made that are commonly missed by automated batch data processing. This chapter discusses the characteristics of environmental science data, common current practice in data analysis and the supporting tools and infrastructure. New approaches are introduced and illustrated from the points of view of both the end user and the underlying technology. We conclude by speculating as to future developments in the field and what must be achieved to fulfil this vision.

Relevance:

30.00%

Publisher:

Abstract:

In the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used as complements to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, whereas data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.

Relevance:

30.00%

Publisher:

Abstract:

We report findings from psycholinguistic experiments investigating the detailed timing of processing morphologically complex words by proficient adult second (L2) language learners of English in comparison to adult native (L1) speakers of English. The first study employed the masked priming technique to investigate -ed forms with a group of advanced Arabic-speaking learners of English. The results replicate previously found L1/L2 differences in morphological priming, even though in the present experiment an extra temporal delay was offered after the presentation of the prime words. The second study examined the timing of constraints against inflected forms inside derived words in English using the eye-movement monitoring technique and an additional acceptability judgment task with highly advanced Dutch L2 learners of English in comparison to adult L1 English controls. Whilst offline the L2 learners performed native-like, the eye-movement data showed that their online processing was not affected by the morphological constraint against regular plurals inside derived words in the same way as in native speakers. Taken together, these findings indicate that L2 learners are not just slower than native speakers in processing morphologically complex words, but that the L2 comprehension system employs real-time grammatical analysis (in this case, morphological information) less than the L1 system.

Relevance:

30.00%

Publisher:

Abstract:

Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
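As an illustration of how a multidimensional binary search tree (k-d tree) can induce a non-uniform, spatially local data distribution, here is a minimal sketch. The median split and the cycling of split axes are assumptions made for the example, and `kd_partition` is a hypothetical name; the abstract does not specify how the tree is built.

```python
import numpy as np

def kd_partition(points, depth_limit, depth=0):
    """Recursively split points at the median along cycling axes.

    Returns a list of leaf arrays (up to 2**depth_limit of them). Each leaf
    covers a compact region of space, so a process that owns one leaf only
    holds points from that region.
    """
    if depth == depth_limit or len(points) <= 1:
        return [points]
    axis = depth % points.shape[1]                  # cycle through dimensions
    order = np.argsort(points[:, axis], kind="stable")
    mid = len(points) // 2
    left, right = points[order[:mid]], points[order[mid:]]
    return (kd_partition(left, depth_limit, depth + 1)
            + kd_partition(right, depth_limit, depth + 1))
```

With this kind of layout, a centroid typically attracts points from only a few leaves, which is the property that could let a reduction involve a small dynamic group of processes rather than all of them.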

Relevance:

30.00%

Publisher:

Abstract:

In this article, we review the state-of-the-art techniques in mining data streams for mobile and ubiquitous environments. We start the review with a concise background of data stream processing, presenting the building blocks for mining data streams. In a wide range of applications, data streams are required to be processed on small ubiquitous devices like smartphones and sensor devices. Mobile and ubiquitous data mining targets these applications with tailored techniques and approaches addressing scarcity of resources and mobility issues. Two categories can be identified for mobile and ubiquitous mining of streaming data: single-node and distributed. This survey will cover both categories. Mining mobile and ubiquitous data requires algorithms with the ability to monitor and adapt the working conditions to the available computational resources. We identify the key characteristics of these algorithms and present illustrative applications. Distributed data stream mining in the mobile environment is then discussed, presenting the Pocket Data Mining framework. Mobility of users stimulates the adoption of context-awareness in this area of research. Context-awareness and collaboration are discussed in the Collaborative Data Stream Mining, where agents share knowledge to learn adaptive accurate models.

Relevance:

30.00%

Publisher:

Abstract:

The ability to match individual patients to tailored treatments has the potential to greatly improve outcomes for individuals suffering from major depression. In particular, while the vast majority of antidepressant treatments affect either serotonin or noradrenaline or a combination of these two neurotransmitters, it is not known whether there are particular patients or symptom profiles which respond preferentially to the potentiation of serotonin over noradrenaline or vice versa. Experimental medicine models suggest that the primary mode of action of these treatments may be to remediate negative biases in emotional processing. Such models may provide a useful framework for interrogating the specific actions of antidepressants. Here, we therefore review evidence from studies examining the effects of drugs which potentiate serotonin, noradrenaline or a combination of both neurotransmitters on emotional processing. These results suggest that antidepressants targeting serotonin and noradrenaline may have some specific actions on emotion and reward processing which could be used to improve tailoring of treatment or to understand the effects of dual-reuptake inhibition. Specifically, serotonin may be particularly important in alleviating distress symptoms, while noradrenaline may be especially relevant to anhedonia. The data reviewed here also suggest that noradrenergic-based treatments may have earlier effects on emotional memory than those which affect serotonin.

Relevance:

30.00%

Publisher:

Abstract:

The WFDEI meteorological forcing data set has been generated using the same methodology as the widely used WATCH Forcing Data (WFD) by making use of the ERA-Interim reanalysis data. We discuss the specifics of how changes in the reanalysis and processing have led to improvement over the WFD. We attribute improvements in precipitation and wind speed to the latest reanalysis basis data and improved downward shortwave fluxes to the changes in the aerosol corrections. Covering 1979–2012, the WFDEI will allow more thorough comparisons of hydrological and Earth System model outputs with hydrologically and phenologically relevant satellite products than using the WFD.

Relevance:

30.00%

Publisher:

Abstract:

In recent years, there has been an increasing interest in the adoption of emerging ubiquitous sensor network (USN) technologies for instrumentation within a variety of sustainability systems. USN is emerging as a sensing paradigm that is being newly considered by the sustainability management field as an alternative to traditional tethered monitoring systems. Researchers have been discovering that USN is an exciting technology that should not be viewed simply as a substitute for traditional tethered monitoring systems. In this study, we investigate how a movement monitoring measurement system of a complex building is developed as a research environment for USN and related decision-supportive technologies. To address the apparent danger of building movement, agent-mediated communication concepts have been designed to autonomously manage large volumes of exchanged information. In this study, we additionally detail the design of the proposed system, including its principles, data processing algorithms, system architecture, and user interface specifics. Results of the test and case study demonstrate the effectiveness of the USN-based data acquisition system for real-time monitoring of movement operations.

Relevance:

30.00%

Publisher:

Abstract:

A Canopy Height Profile (CHP) procedure presented in Harding et al. (2001) for large-footprint LiDAR data was tested in a closed canopy environment as a way of extracting vertical foliage profiles from LiDAR raw waveforms. In this study, an adaptation of this method to small-footprint data has been shown, tested and validated in an Australian sparse canopy forest at plot and site level. Further, the methodology itself has been enhanced by implementing a dataset-adjusted reflectance ratio calculation according to Armston et al. (2013) in the processing chain, and tested against a fixed ratio of 0.5 estimated for the laser wavelength of 1550 nm. As a by-product of the methodology, effective leaf area index (LAIe) estimates were derived and compared to hemispherical photography-derived values. To assess the influence of LiDAR aggregation area size on the estimates in a sparse canopy environment, LiDAR CHPs and LAIes were generated by aggregating waveforms to plot- and site-level footprints (plot/site-aggregated) as well as in 5 m grids (grid-processed). LiDAR profiles were then compared to leaf biomass field profiles generated from field tree measurements. The correlation between field and LiDAR profiles was very high, with a mean R² of 0.75 at plot level and 0.86 at site level for 55 plots and the corresponding 11 sites. Gridding had almost no impact on the correlation between LiDAR and field profiles (only marginal improvement), nor did the dataset-adjusted reflectance ratio. However, gridding and the dataset-adjusted reflectance ratio were found to improve the correlation between raw-waveform LiDAR and hemispherical photography LAIe estimates, yielding the highest correlations of 0.61 at plot level and of 0.83 at site level. This proved the validity of the approach and the superiority of the dataset-adjusted reflectance ratio of Armston et al. (2013) over a fixed ratio of 0.5 for LAIe estimation, and showed the adequacy of small-footprint LiDAR data for LAIe estimation in discontinuous canopy forests.

Relevance:

30.00%

Publisher:

Abstract:

Environment monitoring applications using Wireless Sensor Networks (WSNs) have received a lot of attention in recent years. In much of this research, tasks such as sensor data processing, decision making on environment states and events, and emergency message sending are done by a remote server. A proposed cross-layer protocol for two different applications, where reliability of delivered data, delay and lifetime of the network need to be considered, has been simulated and the results are presented in this paper. A WSN designed for the proposed applications needs efficient MAC and routing protocols to provide a guarantee for the reliability of the data delivered from source nodes to the sink. A cross-layer protocol based on the design given in [1] has been extended and simulated for the proposed applications, with new features, such as route discovery algorithms, added. Simulation results show that the proposed cross-layer protocol can conserve energy for nodes and provide the required performance in terms of network lifetime, delay and reliability.

Relevance:

30.00%

Publisher:

Abstract:

This study has compared preliminary estimates of effective leaf area index (LAI) derived from fish-eye lens photographs to those estimated from airborne full-waveform small-footprint LiDAR data for a forest dataset in Australia. The full-waveform data was decomposed and optimized using a trust-region-reflective algorithm to extract denser point clouds. LAI LiDAR estimates were derived in two ways: (1) from the probability of discrete pulses reaching the ground without being intercepted (point method) and (2) from raw waveform canopy height profile processing adapted to small-footprint laser altimetry (waveform method), accounting for the reflectance ratio between vegetation and ground. The best results, which matched hemispherical photography estimates, were achieved for the waveform method with a study area-adjusted reflectance ratio of 0.4 (RMSE of 0.15 and 0.03 at plot and site level, respectively). The point method generally overestimated, whereas the waveform method with an arbitrary reflectance ratio of 0.5 underestimated, the fish-eye lens LAI estimates.
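A rough sketch of the point method follows. The abstract only says that LAI is derived from the probability of pulses reaching the ground; the Beer-Lambert relation used here, LAIe = -ln(P_gap) / k, and the extinction coefficient k = 0.5 are assumptions commonly used for this kind of estimate, not values taken from the study. Note that this k is unrelated to the vegetation-to-ground reflectance ratio discussed for the waveform method. The function names are hypothetical.

```python
import math

def gap_probability(ground_pulses, total_pulses):
    """Fraction of emitted pulses that reach the ground unintercepted."""
    if total_pulses <= 0:
        raise ValueError("total_pulses must be positive")
    return ground_pulses / total_pulses

def effective_lai(p_gap, extinction=0.5):
    """Effective LAI from gap probability via an assumed Beer-Lambert model.

    extinction=0.5 is an assumed coefficient (roughly a spherical leaf angle
    distribution viewed near nadir), not a value reported in the abstract.
    """
    if not 0.0 < p_gap <= 1.0:
        raise ValueError("p_gap must be in (0, 1]")
    return -math.log(p_gap) / extinction
```

For example, `effective_lai(gap_probability(600, 1000))` gives the estimate for a plot where 60% of pulses reach the ground; a fully open plot (`p_gap = 1.0`) yields an LAI of zero.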

Relevance:

30.00%

Publisher:

Abstract:

Human brain imaging techniques, such as Magnetic Resonance Imaging (MRI) or Diffusion Tensor Imaging (DTI), have been established as scientific and diagnostic tools and their adoption is growing in popularity. Statistical methods, machine learning and data mining algorithms have successfully been adopted to extract predictive and descriptive models from neuroimage data. However, the knowledge discovery process typically also requires the adoption of pre-processing, post-processing and visualisation techniques in complex data workflows. Currently, a main problem for the integrated preprocessing and mining of MRI data is the lack of comprehensive platforms able to avoid the manual invocation of preprocessing and mining tools, which leads to an error-prone and inefficient process. In this work we present K-Surfer, a novel plug-in of the Konstanz Information Miner (KNIME) workbench, that automates the preprocessing of brain images and leverages the mining capabilities of KNIME in an integrated way. K-Surfer supports the importing, filtering, merging and pre-processing of neuroimage data from FreeSurfer, a tool for human brain MRI feature extraction and interpretation. K-Surfer automates the steps for importing FreeSurfer data, reducing time costs, eliminating human errors and enabling the design of complex analytics workflows for neuroimage data by leveraging the rich functionalities available in the KNIME workbench.

Relevance:

30.00%

Publisher:

Abstract:

Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in and reliant on social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades, going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.