963 results for Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining


Relevance:

40.00%

Publisher:

Abstract:

Classifying novel terrain or objects from sparse, complex data may require the resolution of conflicting information from sensors working at different times, locations, and scales, and from sources with different goals and situations. Information fusion methods can help resolve inconsistencies, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods described here consider a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, and man-made. Underlying relationships among objects are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships.
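
As a concrete illustration of the one-to-many idea, the toy sketch below infers a class hierarchy purely from label co-occurrence: if one label is present every time another is, the latter is inferred to sit above the former. The observation set and the inference rule are illustrative assumptions, not the ARTMAP architecture itself.

```python
# Toy hierarchy discovery from one-to-many labels (illustrative only;
# this co-occurrence rule stands in for the full ARTMAP fusion system).
from collections import defaultdict

# Each observation carries several consistent labels at different levels.
observations = [
    {"car", "vehicle", "man-made"},
    {"car", "vehicle", "man-made"},
    {"truck", "vehicle", "man-made"},
    {"airplane", "vehicle", "man-made"},
    {"building", "man-made"},
]

count = defaultdict(int)      # how often each label occurs
together = defaultdict(int)   # how often each ordered pair co-occurs
for obs in observations:
    for a in obs:
        count[a] += 1
        for b in obs:
            if a != b:
                together[(a, b)] += 1

# Infer "a is a kind of b" when b appears every time a does, but not
# vice versa -- no supervised labeling of the hierarchy is needed.
for a in count:
    for b in count:
        if a != b and together[(a, b)] == count[a] and count[b] > count[a]:
            print(f"{a} -> {b}")   # e.g. car -> vehicle, vehicle -> man-made
```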

Relevance:

40.00%

Publisher:

Abstract:

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of the gene selection approaches for classification are single-gene based. Second, many of the gene selection procedures are not embedded within the algorithm itself. The technique of random forests has been found to perform well in high-dimensional data settings with survival outcomes, and it has an embedded feature to identify variables of importance. It is therefore an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes, allowing us to better utilize the information available from microarray data with survival outcomes.
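
A minimal sketch of this kind of embedded gene selection, assuming the scikit-survival package and synthetic expression data in place of the authors' actual pipeline:

```python
# Gene selection via a random survival forest plus permutation importance
# (a sketch under stated assumptions, not the paper's exact method).
import numpy as np
from sklearn.inspection import permutation_importance
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(0)
n_samples, n_genes = 100, 50
X = rng.normal(size=(n_samples, n_genes))      # synthetic expression matrix
risk = X[:, 0] + 0.5 * X[:, 1]                 # genes 0 and 1 drive survival
time = rng.exponential(scale=np.exp(-risk))    # survival times
event = rng.random(n_samples) < 0.7            # roughly 30% censoring
y = Surv.from_arrays(event=event, time=time)   # structured survival outcome

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10, random_state=0)
rsf.fit(X, y)

# Importance = drop in concordance when a gene's values are shuffled.
imp = permutation_importance(rsf, X, y, n_repeats=5, random_state=0)
top_genes = np.argsort(imp.importances_mean)[::-1][:5]
print("candidate prognostic genes:", top_genes)
```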

Relevance:

40.00%

Publisher:

Abstract:

The Scotia Sea has been a focus of biological and physical oceanographic study since the Discovery expeditions in the early 1900s. It is a physically energetic region with some of the highest levels of productivity in the Southern Ocean. It is also a region within which there have been greater than average levels of change in upper water column temperature. We describe the results of three cruises transecting the central Scotia Sea from south to north in consecutive years, covering spring, summer and autumn periods. We also report on community-level syntheses using both current-day and historical data from this region. A wide range of parameters were measured during the field campaigns, covering the physical oceanography of the region, air–sea CO2 fluxes, macro- and micronutrient concentrations, the composition and biomass of the nano-, micro- and mesoplankton communities, and the distribution and biomass of Antarctic krill and mesopelagic fish. Process studies examined the effect of iron stress on the physiology of primary producers, reproduction and egestion in Antarctic krill, and the transfer of stable isotopes between trophic layers, from primary consumers up to birds and seals. Community-level syntheses included an examination of biomass spectra, food-web modelling, spatial analysis of multiple trophic layers, and historical species distributions. The spatial analyses in particular identified two distinct community types: a northern warmer-water community and a southern cold community, their boundary being broadly consistent with the position of the Southern Antarctic Circumpolar Current Front (SACCF). Temperature and ice cover appeared to be the dominant, overriding factors driving this pattern. Extensive phytoplankton blooms were a major feature of the surveys and were persistent in areas such as South Georgia. In situ and bioassay measurements emphasised the important role of iron inputs as facilitators of these blooms. Based on seasonal DIC deficits, the South Georgia bloom was found to contain the strongest seasonal carbon uptake in the ice-free zone of the Southern Ocean. The surveys also encountered low-production, iron-limited regions, a situation more typical of the wider Southern Ocean. The response of primary and secondary consumers to spatial and temporal heterogeneity in production was complex. Many of the life-cycles of small pelagic organisms showed a close coupling to the seasonal cycle of food availability. For instance, Antarctic krill showed a dependence on early, non-ice-associated blooms to facilitate early reproduction. Strategies to buffer against environmental variability were also examined, such as the prevalence of multiyear life-cycles and variability in energy storage levels. Such traits were seen to influence the way in which Scotia Sea communities are structured, with biomass levels in the larger size classes being higher than in other ocean regions. Seasonal development also altered trophic function, with the trophic level of higher predators increasing through the course of the year as additional predator-prey interactions emerged in the lower trophic levels. Finally, our studies re-emphasised the role that the simple phytoplankton-krill-higher predator food chain plays in this Southern Ocean region, particularly south of the SACCF. To the north, alternative food chains, such as those involving copepods, macrozooplankton and mesopelagic fish, were increasingly important. Continued ocean warming in this region is likely to increase the prevalence of such alternative food chains, with Antarctic krill predicted to move southwards.

Relevance:

40.00%

Publisher:

Abstract:

The importance and use of text extraction from camera-based coloured scene images is rapidly increasing. Text within a camera-grabbed image can contain a huge amount of metadata about that scene. Such metadata can be useful for identification, indexing and retrieval purposes. While the segmentation and recognition of text from document images is quite successful, detection of coloured scene text is a new challenge for all camera-based images. Common problems for text extraction from camera-based images are the lack of prior knowledge of any kind of text features, such as colour, font, size and orientation, as well as the location of the probable text regions. In this paper, we document the development of a fully automatic and extremely robust text segmentation technique that can be used for any type of camera-grabbed frame, be it a single image or video. A new algorithm is proposed which can overcome the current problems of text segmentation. The algorithm exploits text appearance in terms of colour and spatial distribution. When the new text extraction technique was tested on a variety of camera-based images, it was found to outperform existing techniques. The proposed technique also overcomes problems that can arise due to an unconstrained complex background. The novelty of the work arises from the fact that this is the first time colour and spatial information have been used simultaneously for the purpose of text extraction.
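
A rough sketch of the colour-plus-spatial idea, assuming OpenCV; the cluster count, geometric thresholds and file name are illustrative rather than the paper's tuned algorithm:

```python
# Candidate text detection: colour quantisation followed by spatial
# filtering of connected components (illustrative sketch only).
import cv2
import numpy as np

img = cv2.imread("scene.jpg")                  # hypothetical input frame
pixels = img.reshape(-1, 3).astype(np.float32)

# Colour step: text tends to occupy a few uniform colours.
k = 4
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, _ = cv2.kmeans(pixels, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
labels = labels.reshape(img.shape[:2])

# Spatial step: keep components whose geometry looks text-like.
boxes = []
for c in range(k):
    mask = (labels == c).astype(np.uint8) * 255
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    for i in range(1, n):
        x, y, w, h, area = stats[i]
        if 8 < h < img.shape[0] // 4 and 0.1 < w / h < 10 and area > 0.2 * w * h:
            boxes.append((x, y, w, h))

print(f"{len(boxes)} candidate text components")
```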

Relevance:

40.00%

Publisher:

Abstract:

In the last decade, data mining has emerged as one of the most dynamic and lively areas in information technology. Although many algorithms and techniques for data mining have been proposed, they either focus on domain-independent techniques or on very specific domain problems. A general requirement in bridging the gap between academia and business is to cater to general domain-related issues surrounding real-life applications, such as constraints, organizational factors, domain expert knowledge, domain adaptation, and operational knowledge. Unfortunately, these either have not been addressed, or have not been sufficiently addressed, in current data mining research and development. Domain-Driven Data Mining (D3M) aims to develop general principles, methodologies, and techniques for modeling and merging comprehensive domain-related factors and synthesized ubiquitous intelligence surrounding problem domains with the data mining process, and for discovering knowledge to support business decision-making. This paper aims to report original, cutting-edge, and state-of-the-art progress in D3M. It covers theoretical and applied contributions aiming to: 1) propose next-generation data mining frameworks and processes for actionable knowledge discovery; 2) investigate effective (automated, human- and machine-centered, and/or human-machine-cooperated) principles and approaches for acquiring, representing, modeling, and engaging ubiquitous intelligence in real-world data mining; and 3) develop workable and operational systems that balance technical significance and application concerns, and that convert and deliver actionable knowledge into operational application rules to seamlessly engage application processes and systems.
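
One way to make the principle concrete is constraint-based pattern mining, where domain knowledge prunes candidates inside the mining loop rather than filtering results afterwards. The sketch below is a hypothetical illustration; the transactions and the expert constraint are invented for the example:

```python
# Frequent itemset mining with a domain constraint applied before
# counting (a toy illustration of the D3M idea, not its framework).
from itertools import combinations

transactions = [
    {"loan", "default", "self-employed"},
    {"loan", "repaid", "salaried"},
    {"loan", "default", "salaried"},
    {"loan", "repaid", "self-employed"},
    {"loan", "default", "self-employed"},
]

def domain_constraint(itemset):
    """Expert rule: only patterns mentioning an outcome are actionable."""
    return bool(itemset & {"default", "repaid"})

min_support = 2
items = sorted(set().union(*transactions))
for size in (1, 2, 3):
    for cand in combinations(items, size):
        s = set(cand)
        if not domain_constraint(s):
            continue                     # domain knowledge prunes early
        support = sum(s <= t for t in transactions)
        if support >= min_support:
            print(sorted(s), "support:", support)
```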

Relevance:

40.00%

Publisher:

Abstract:

This study investigates face recognition with partial occlusion, illumination variation, and their combination, assuming no prior information about the mismatch and limited training data for each person. The authors extend their previous posterior union model (PUM) to give a new method capable of dealing with all these problems. PUM is an approach for selecting the optimal local image features for recognition, improving robustness to partial occlusion. The extension is in two stages. First, the authors extend PUM from a probability-based formulation to a similarity-based formulation, so that it operates with as little as a single training sample and offers robustness to partial occlusion. Second, they extend this new formulation to make it robust to illumination variation, and to combined illumination variation and partial occlusion, through a novel combination of multicondition relighting and optimal feature selection. To evaluate the new methods, a number of databases with various simulated and realistic occlusion/illumination mismatches have been used. The results demonstrate the improved robustness of the new methods.
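
A minimal sketch of the similarity-based selection idea: score each local patch independently and keep only the best-matching subset, so occluded patches are simply ignored. The patch grid and the selection fraction are assumptions, not the authors' exact formulation:

```python
# Occlusion-robust matching via optimal local feature selection
# (illustrative sketch in the spirit of the similarity-based PUM).
import numpy as np

def patch_features(face, grid=4):
    """Split a square face image into grid x grid local patch vectors."""
    h, w = face.shape
    ph, pw = h // grid, w // grid
    return [face[r*ph:(r+1)*ph, c*pw:(c+1)*pw].ravel()
            for r in range(grid) for c in range(grid)]

def robust_similarity(probe, gallery, keep=0.6):
    """Average the best-matching patches; worst (likely occluded) ones drop out."""
    scores = []
    for p, g in zip(patch_features(probe), patch_features(gallery)):
        denom = np.linalg.norm(p) * np.linalg.norm(g) + 1e-8
        scores.append(float(p @ g) / denom)    # cosine similarity per patch
    scores.sort(reverse=True)
    m = max(1, int(keep * len(scores)))
    return sum(scores[:m]) / m

rng = np.random.default_rng(0)
gallery = rng.random((64, 64))                 # one training sample per person
probe = gallery.copy()
probe[40:, :] = 0.0                            # occlude the lower face
print(f"robust similarity: {robust_similarity(probe, gallery):.3f}")
```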

Relevance:

40.00%

Publisher:

Abstract:

A general approach to information correction and fusion for belief functions is proposed, where not only may the information items be irrelevant, but sources may lie as well. We introduce a new correction scheme, which takes into account uncertain metaknowledge on the source's relevance and truthfulness and which generalizes Shafer's discounting operation. We then show how to reinterpret all connectives of Boolean logic in terms of source behavior assumptions with respect to relevance and truthfulness. This leads us to generalize the unnormalized Dempster's rule to all Boolean connectives, while taking into account the uncertainties pertaining to assumptions about the behavior of sources. Eventually, we further extend this approach to an even more general setting, where source behavior assumptions need not be restricted to relevance and truthfulness. We also establish the commutativity property between correction and fusion processes when the behaviors of the sources are independent.
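
To make the operations concrete, here is a small sketch of Shafer's discounting and the unnormalized (conjunctive) Dempster's rule over a three-element frame; the frame and mass assignments are illustrative assumptions:

```python
# Shafer discounting and the unnormalized Dempster's rule on a toy frame.
from itertools import product

FRAME = frozenset({"car", "truck", "airplane"})

def discount(m, alpha):
    """Move mass alpha toward total ignorance (the whole frame)."""
    out = {A: (1 - alpha) * v for A, v in m.items()}
    out[FRAME] = out.get(FRAME, 0.0) + alpha
    return out

def conjunctive(m1, m2):
    """Intersect focal sets; mass may remain on the empty set (conflict)."""
    out = {}
    for (A, v1), (B, v2) in product(m1.items(), m2.items()):
        C = A & B
        out[C] = out.get(C, 0.0) + v1 * v2
    return out

m1 = {frozenset({"car"}): 0.8, FRAME: 0.2}     # source 1, partly reliable
m2 = {frozenset({"truck"}): 0.6, FRAME: 0.4}   # source 2, partly reliable
fused = conjunctive(discount(m1, 0.1), discount(m2, 0.1))
for A, v in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(A) or "conflict (empty set)", round(v, 3))
```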

Relevance:

40.00%

Publisher:

Abstract:

Many plain text information hiding techniques demand deep semantic processing, and so suffer in reliability. In contrast, syntactic processing is a more mature and reliable technology. Assuming a perfect parser, this paper evaluates a set of automated and reversible syntactic transforms that can hide information in plain text without changing the meaning or style of a document. A large representative collection of newspaper text is fed through a prototype system. In contrast to previous work, the output is subjected to human testing to verify that the text has not been significantly compromised by the information hiding procedure, yielding a success rate of 96% and a bandwidth of 0.3 bits per sentence.
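
As a toy illustration of a reversible transform, the sketch below encodes one bit per eligible sentence through the optional complementizer "that" after a reporting verb. This particular transform and its regex-based eligibility test are assumptions; the paper's system relies on a full parser and a broader set of transforms:

```python
# One-bit-per-sentence hiding via an optional complementizer
# (a toy stand-in for parser-verified syntactic transforms).
import re

ELIGIBLE = re.compile(r"\b(said|claimed|reported)( that)?\b")

def embed(sentences, bits):
    """Set or drop the optional 'that' to carry one bit per eligible sentence."""
    out, i = [], 0
    for s in sentences:
        m = ELIGIBLE.search(s)
        if m and i < len(bits):
            verb = m.group(1)
            s = ELIGIBLE.sub(f"{verb} that" if bits[i] else verb, s, count=1)
            i += 1
        out.append(s)
    return out

def extract(sentences):
    """Recover the bits: 'verb that' reads as 1, a bare verb as 0."""
    return [1 if m.group(2) else 0
            for s in sentences if (m := ELIGIBLE.search(s))]

cover = ["She said that the markets fell.", "He claimed the deal was done."]
stego = embed(cover, [0, 1])
print(stego)           # ['She said the markets fell.', 'He claimed that the deal was done.']
print(extract(stego))  # [0, 1]
```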

Relevance:

40.00%

Publisher:

Abstract:

Modern cancer research on prognostic and predictive biomarkers demands the integration of established and emerging high-throughput technologies. However, these data are meaningless unless carefully integrated with patient clinical outcome and epidemiological information. Integrated datasets hold the key to discovering new biomarkers and therapeutic targets in cancer. We have developed a novel approach and set of methods, PICan, for integrating and interrogating phenomic, genomic and clinical data sets to facilitate cancer biomarker discovery and patient stratification. Applied to a known paradigm, the biological and clinical relevance of TP53, PICan was able to recapitulate the known biomarker status and prognostic significance at the DNA, RNA and protein levels.

Relevance:

40.00%

Publisher:

Abstract:

With over 50 billion downloads and more than 1.3 million apps in Google's official market, Android has continued to gain popularity amongst smartphone users worldwide. At the same time there has been a rise in malware targeting the platform, with more recent strains employing highly sophisticated detection-avoidance techniques. As traditional signature-based methods become less potent in detecting unknown malware, alternatives are needed for timely zero-day discovery. This paper therefore proposes an approach that utilizes ensemble learning for Android malware detection, combining the advantages of static analysis with the efficiency and performance of ensemble machine learning to improve detection accuracy. The machine learning models are built using a large repository of malware samples and benign apps from a leading antivirus vendor. Experimental results and analysis show that the proposed method, which uses a large feature space to leverage the power of ensemble learning, achieves 97.3% to 99% detection accuracy with very low false positive rates.
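
A minimal sketch of ensemble learning over static features, assuming scikit-learn and synthetic feature vectors in place of the vendor-supplied app repository described above:

```python
# Soft-voting ensemble over binary static features (permissions, API
# calls, intents) -- an illustrative sketch, not the paper's models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_apps, n_features = 1000, 200
X = (rng.random((n_apps, n_features)) < 0.1).astype(int)  # binary feature vectors
y = (X[:, :5].sum(axis=1) >= 2).astype(int)               # synthetic malware labels

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=10, random_state=0)),
    ],
    voting="soft",   # average predicted probabilities across members
)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
ensemble.fit(X_tr, y_tr)
print(f"held-out accuracy: {ensemble.score(X_te, y_te):.3f}")
```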