899 results for Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining


Relevance:

40.00%

Publisher:

Abstract:

This presentation was given at the Digital Commons Southeastern User Group conference at Winthrop University, South Carolina on June 5, 2015. The presentation discusses how the digital collections center (DCC) at Florida International University uses Digital Commons as their tool for ingesting, editing, tracking, and publishing university theses and dissertations. The basic DCC workflow is covered as well as institutional repository promotion.

Relevance:

40.00%

Publisher:

Abstract:

Ensemble stream modeling and data cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and to choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), a sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the F-test is performed during each node’s split to select the best attribute. The ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier.
The ensemble framework for data cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications, which further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
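
As a hedged illustration of the F-measure named in this abstract: it is conventionally defined as the (weighted) harmonic mean of precision and recall. The sketch below is not the authors' implementation; the helper name `f_measure` and the detection counts are hypothetical and chosen only for illustration.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true positives, false positives, and false negatives."""
    if tp == 0:
        # No correct detections: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of raised alarms that were real events
    recall = tp / (tp + fn)     # fraction of real events that raised an alarm
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for fire-event detection (assumed, not from the study).
score = f_measure(tp=80, fp=20, fn=20)
print(f"F1 = {score:.3f}")
```

With `beta = 1` precision and recall are weighted equally; a larger `beta` weights recall more heavily, which may suit settings where missed events cost more than false alarms.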

Relevance:

40.00%

Publisher:

Abstract:

Endemic zoonotic diseases remain a serious but poorly recognised problem in affected communities in developing countries. Despite the overall burden of zoonoses on human and animal health, information about their impacts in endemic settings is lacking, and most of these diseases continue to be neglected. The non-specific clinical presentation of these diseases has been identified as a major challenge in their identification (even with good laboratory diagnosis) and control. The signs and symptoms in animals and humans are easily confused with those of non-zoonotic diseases, leading to widespread misdiagnosis in areas where diagnostic capacity is limited. The communities most affected by these diseases live in close proximity to the animals on which they depend for their livelihood, which further complicates the understanding of the epidemiology of zoonoses. This thesis reviews the pattern of reporting of zoonotic pathogens that cause febrile illness in malaria-endemic countries and evaluates the recognition of animal associations among other risk factors in the transmission and management of zoonoses. The findings of the review chapter were further investigated in the subsequent chapters through a laboratory study of risk factors for bovine leptospirosis and a study of exposure patterns of livestock coxiellosis. A review was undertaken of 840 articles that were part of a larger review of zoonotic pathogens that cause human fever. The review process involved three main steps: filtering and reference classification; identification of abstracts that describe risk factors; and data extraction and summary analysis. Abstracts of the 840 references were transferred into a Microsoft Excel spreadsheet, where several subsets of abstracts were generated using Excel filters and text searches to classify the content of each abstract.
Data were then extracted and summarised to describe geographical patterns of the pathogens reported and to determine how frequently animal-related risk factors were considered among studies that investigated risk factors for zoonotic pathogen transmission. Subsequently, a seroprevalence study of bovine leptospirosis in northern Tanzania was undertaken in the second chapter of this thesis. The study involved screening serum samples, obtained from an abattoir survey and a cross-sectional study (Bacterial Zoonoses Project), for antibodies against Leptospira serovar Hardjo. The data were analysed using generalised linear mixed models (GLMMs) to identify risk factors for cattle infection. The final chapter analysed Q fever data, also obtained from the Bacterial Zoonoses Project, to determine exposure patterns across livestock species, again using GLMMs. Leptospira spp. (10.8%, 90/840) and Rickettsia spp. (10.7%, 86/840) were identified as the most frequently reported zoonotic pathogens that cause febrile illness, while Rabies virus (0.4%, 3/840) and Francisella spp. (0.1%, 1/840) were the least reported, across malaria-endemic countries. The majority of the pathogens were reported in Asia, and the frequency of reporting seems to be higher in areas where outbreaks are mostly reported. It was also observed that animal-related risk factors are not often considered among other risk factors for zoonotic pathogens that cause human fever in malaria-endemic countries. The seroprevalence study indicated that Leptospira serovar Hardjo is widespread in the cattle population in northern Tanzania, and that animal husbandry system and age are the two most important risk factors influencing seroprevalence. Cattle in pastoral systems and adult cattle were significantly more likely to be seropositive than non-pastoral and young animals respectively, while there was no significant effect of cattle breed or sex.
Exposure patterns of Coxiella burnetii appear to differ for each livestock species. While several risk factors were identified for goats (animal husbandry system, age and sex) and sheep (animal husbandry system and sex), none were identified for cattle. In addition, there was no evidence of a significant influence of mixed livestock-keeping on animal coxiellosis. Zoonotic agents that cause human fever are common in developing countries. The role of animals in the transmission of zoonotic pathogens that cause febrile illness is not fully recognised and appreciated. Since Leptospira spp. and C. burnetii are among the most frequently reported pathogens that cause human fever across malaria-endemic countries, and are also prevalent in livestock populations, control and preventive measures that recognise animals as a source of infection would be very important, especially in livestock-keeping communities where people live in close proximity to their animals.

Relevance:

40.00%

Publisher:

Abstract:

This work belongs to one of the major fields of organizational studies: strategy. The classical perspective in this field promoted the idea that projecting oneself into the future means designing a plan (a series of deliberate actions). Later advances showed that strategy could be understood in other ways. However, the evolution of the field has to some extent privileged the classical view, establishing, for example, multiple models for ‘formulating’ a strategy while relegating to second place the way in which a strategy can ‘emerge’. The purpose of this research is therefore to contribute to the current level of understanding of emergent strategies in organizations. To do so, it considers a concept that is opposite (though complementary) to ‘planning’ and, in fact, very close in nature to that type of strategy: improvisation. Since this concept has been nourished by valuable contributions from the world of music, the study drew on the knowledge of that domain, using ‘metaphor’ as a theoretical resource to understand it and achieve the proposed objective. The results show that 1) deliberate and emergent strategies coexist and complement each other, 2) improvisation is always present in the organizational context, 3) improvisation is more intense in the ‘how’ of strategy than in the ‘what’ and, contrary to the conventional idea on the matter, 4) some preparation is required in order to improvise adequately.

Relevance:

40.00%

Publisher:

Abstract:

The main purpose of this study is to evaluate the best set of features for automatically identifying argumentative sentences in unstructured text. As a corpus, we use case law from the European Court of Human Rights (ECHR). Three kinds of experiments are conducted: Basic Experiments, Multi Feature Experiments and Tree Kernel Experiments. These experiments are categorized according to the type of features available in the corpus. The features are extracted from the corpus, and Support Vector Machine (SVM) and Random Forest are used as the machine learning algorithms. We achieved an F1 score of 0.705 for identifying argumentative sentences, which is a promising result that can be used as the basis for a general argument-mining framework.

Relevance:

40.00%

Publisher:

Abstract:

As our collective knowledge continues to be digitized and stored, it becomes more difficult to find and discover what we are looking for. We need new computational tools to help organize, retrieve, and understand these vast amounts of information. Language models are powerful tools that can be used to extract statistically significant and interpretable knowledge through unsupervised learning, from text or from source code. The aim of this thesis is to use a descriptive text mining methodology, called POIROT, to analyze the medical reports of the Adverse Drug Reaction (ADE) dataset. The goal is to establish significant correlations that make it possible to understand why a given medical report does or does not provide information about side effects caused by taking certain drugs.

Relevance:

30.00%

Publisher:

Abstract:

Despite remarkable improvements in breast cancer (BC) characterization, accurate prediction of BC clinical behavior is often still difficult to achieve. Some studies have investigated the association between molecular subtype, namely basal-like BC, and the pattern of relapse; however, only a few have investigated the association between relapse pattern and immunohistochemically defined triple-negative breast cancers (TNBCs). The aim of this study was to evaluate the pattern of relapse in patients with TNBC, namely the primary distant relapse site. One hundred twenty-nine (129) invasive breast carcinomas with follow-up information were classified according to molecular subtype using immunohistochemistry for ER, PgR and Her2. The association between TNBC and primary distant relapse site was analyzed by logistic regression. In multivariate logistic regression analysis, patients with TNBC had only 0.09 times (95% CI: 0.00-0.74; p=0.02) the odds of non-TNBC patients of developing primary relapse in bone. Regarding visceral and lymph-node relapse, no differences between the groups were found in this cohort. Though classically regarded as aggressive tumors, TNBCs rarely develop primary relapse in bone when compared with non-TNBCs, a clinically relevant fact when investigating a metastasis of an occult or non-sampled primary BC.
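
To make the "odds" comparison in this abstract concrete, here is a minimal sketch of a crude (unadjusted) odds ratio from a 2x2 table. It is an assumption-laden illustration: the study reports an adjusted estimate from multivariate logistic regression, which this does not reproduce, and the function name `odds_ratio` and all counts are hypothetical.

```python
def odds_ratio(group_events: int, group_non_events: int,
               ref_events: int, ref_non_events: int) -> float:
    """Crude odds ratio of an event in a group relative to a reference group."""
    if min(group_non_events, ref_events, ref_non_events) <= 0:
        raise ValueError("non-event and reference counts must be positive")
    odds_group = group_events / group_non_events  # odds of the event in the group
    odds_ref = ref_events / ref_non_events        # odds of the event in the reference
    return odds_group / odds_ref


# Hypothetical counts (not from the study): bone relapse vs. relapse elsewhere.
ratio = odds_ratio(group_events=1, group_non_events=20,   # TNBC
                   ref_events=30, ref_non_events=60)      # non-TNBC
print(f"crude odds ratio = {ratio:.2f}")
```

An odds ratio below 1 means the event is less likely in the group than in the reference; a confidence interval that excludes 1, as in the abstract, indicates the difference is statistically significant at the stated level.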

Relevance:

30.00%

Publisher:

Abstract:

A pterosaur bone bed with at least 47 individuals (wing spans: 0.65-2.35 m) of a new species is reported from an interdunal lake deposit of a Cretaceous desert in southern Brazil, shedding new light on several biological aspects of those flying reptiles. The material represents a new pterosaur, Caiuajara dobruskii gen. et sp. nov., which is the southernmost occurrence of the edentulous clade Tapejaridae (Tapejarinae, Pterodactyloidea) recovered so far. Caiuajara dobruskii differs from all other members of this clade in several cranial features, including the presence of a ventral sagittal bony expansion projected inside the nasoantorbital fenestra, which is formed by the premaxillae; and features of the lower jaw, like a marked rounded depression in the occlusal concavity of the dentary. Ontogenetic variation of Caiuajara dobruskii is mainly reflected in the size and inclination of the premaxillary crest, changing from small and inclined (∼ 115°) in juveniles to large and steep (∼ 90°) in adults. No particular ontogenetic features are observed in postcranial elements. The available information suggests that this species was gregarious, living in colonies, and most likely precocial, being able to fly at a very young age, which might have been a general trend for at least derived pterosaurs.

Relevance:

30.00%

Publisher:

Abstract:

Seasonally dry tropical plant formations (SDTF) are likely to exhibit phylogenetic clustering owing to niche conservatism driven by a strong environmental filter (water stress), but heterogeneous edaphic environments and life histories may result in heterogeneity in degree of phylogenetic clustering. We investigated phylogenetic patterns across ecological gradients related to water availability (edaphic environment and climate) in the Caatinga, a SDTF in Brazil. Caatinga is characterized by semiarid climate and three distinct edaphic environments - sedimentary, crystalline, and inselberg -representing a decreasing gradient in soil water availability. We used two measures of phylogenetic diversity: Net Relatedness Index based on the entire phylogeny among species present in a site, reflecting long-term diversification; and Nearest Taxon Index based on the tips of the phylogeny, reflecting more recent diversification. We also evaluated woody species in contrast to herbaceous species. The main climatic variable influencing phylogenetic pattern was precipitation in the driest quarter, particularly for herbaceous species, suggesting that environmental filtering related to minimal periods of precipitation is an important driver of Caatinga biodiversity, as one might expect for a SDTF. Woody species tended to show phylogenetic clustering whereas herbaceous species tended towards phylogenetic overdispersion. We also found phylogenetic clustering in two edaphic environments (sedimentary and crystalline) in contrast to phylogenetic overdispersion in the third (inselberg). We conclude that while niche conservatism is evident in phylogenetic clustering in the Caatinga, this is not a universal pattern likely due to heterogeneity in the degree of realized environmental filtering across edaphic environments. Thus, SDTF, in spite of a strong shared environmental filter, are potentially heterogeneous in phylogenetic structuring. 
Our results support the need for scientifically informed conservation strategies in the Caatinga and other SDTF regions that have not previously been prioritized for conservation in order to take into account this heterogeneity.

Relevance:

30.00%

Publisher:

Abstract:

Potamotrygon tatianae sp. nov. is described from Río Madre de Dios, Peru, upper Rio Madeira basin. The new species is distinguished from all congeners by a unique combination of characters, including its dorsal color pattern formed by a relatively slender, highly convoluted, beige to dark brown vermicular pattern, a single row of dorsal tail spines, and a relatively longer tail posterior to caudal stings. Potamotrygon tatianae sp. nov. occurs sympatrically with other species of Potamotrygon (P. falkneri, P. orbignyi and P. motoro). From the similar species P. falkneri, P. tatianae sp. nov. is further distinguished by the absence of circular, reniform, and oval spots, by its proportionally much longer tail, by having dorsal tail spines in one irregular row, and by features of the ventral lateral-line canal, dermal denticles and neurocranium. From P. orbignyi, the new species is distinct by lacking a reticulate pattern on dorsal disc and by the presence of two angular cartilages. From P. motoro, P. tatianae sp. nov. is further separated by the lack of ocelli formed by strong black concentric rings, by the more flattened aspect of its head and disc, and by having smaller and more numerous teeth. The discovery of a new species that so closely resembles a congeneric form in color pattern, a feature highly variable within the latter, highlights the importance of examining large series of individuals and of detailed morphological analyses in revealing the potentially highly cryptic nature of the diversity within the family.

Relevance:

30.00%

Publisher:

Abstract:

Motivated by a recently proposed biologically inspired face recognition approach, we investigated the relation between human behavior and a computational model based on Fourier-Bessel (FB) spatial patterns. We measured human recognition performance of FB-filtered face images using an 8-alternative forced-choice method. Test stimuli were generated by converting the images from the spatial to the FB domain, filtering the resulting coefficients with a band-pass filter, and finally taking the inverse FB transformation of the filtered coefficients. The performance of the computational models was tested using a simulation of the psychophysical experiment. In the FB model, face images were first filtered by simulated V1-type neurons and later analyzed globally for their content of FB components. In general, there was a higher human contrast sensitivity to radially than to angularly filtered images, but both functions peaked at the 11.3-16 frequency interval. The FB-based model presented similar behavior with regard to peak position and relative sensitivity, but had a wider frequency bandwidth and a narrower response range. The response patterns of two alternative models, based on local FB analysis and on raw luminance, strongly diverged from the human behavior patterns. These results suggest that human performance can be constrained by the type of information conveyed by polar patterns, and consequently that humans might use FB-like spatial patterns in face processing.

Relevance:

30.00%

Publisher:

Abstract:

This work describes the seasonal and diurnal variations of downward longwave atmospheric irradiance (LW) at the surface in Sao Paulo, Brazil, using 5-min-averaged values of LW, air temperature, relative humidity, and solar radiation observed continuously and simultaneously from 1997 to 2006 on a micrometeorological platform, located at the top of a 4-story building. An objective procedure, including 2-step filtering and dome emission effect correction, was used to evaluate the quality of the 9-yr-long LW dataset. The comparison between LW values observed and yielded by the Surface Radiation Budget project shows spatial and temporal agreement, indicating that monthly and annual average values of LW observed in one point of Sao Paulo can be used as representative of the entire metropolitan region of Sao Paulo. The maximum monthly averaged value of the LW is observed during summer (389 ± 14 W m⁻²; January), and the minimum is observed during winter (332 ± 12 W m⁻²; July). The effective emissivity follows the LW and shows a maximum in summer (0.907 ± 0.032; January) and a minimum in winter (0.818 ± 0.029; June). The mean cloud effect, identified objectively by comparing the monthly averaged values of the LW during clear-sky days and all-sky conditions, intensified the monthly average LW by about 32.0 ± 3.5 W m⁻² and the atmospheric effective emissivity by about 0.088 ± 0.024. In August, the driest month of the year in Sao Paulo, the diurnal evolution of the LW shows a minimum (325 ± 11 W m⁻²) at 0900 LT and a maximum (345 ± 12 W m⁻²) at 1800 LT, which lags behind (by 4 h) the maximum diurnal variation of the screen temperature. The diurnal evolution of effective emissivity shows a minimum (0.781 ± 0.027) during daytime and a maximum (0.842 ± 0.030) during nighttime.
The diurnal evolution of all-sky condition and clear-sky day differences in the effective emissivity remains relatively constant (7% ± 1%), indicating that clouds do not change the emissivity diurnal pattern. The relationship between effective emissivity and screen air temperature and between effective emissivity and water vapor is complex. During the night, when the planetary boundary layer is shallower, the effective emissivity can be estimated by screen parameters. During the day, the relationship between effective emissivity and screen parameters varies from place to place and depends on the planetary boundary layer process. Because the empirical expressions do not contain enough information about the diurnal variation of the vertical stratification of air temperature and moisture in Sao Paulo, they are likely to fail in reproducing the diurnal variation of the surface emissivity. The most accurate way to estimate the LW for clear-sky conditions in Sao Paulo is to use an expression derived from a purely empirical approach.
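
The abstract does not spell out how effective emissivity is computed. A common convention, assumed here, is the ratio of the observed downward longwave irradiance to blackbody emission at the screen-level air temperature, that is, LW / (sigma T^4). The sketch below follows that assumption; the function name and the 294.9 K temperature are hypothetical, and only the 389 W per square metre January mean comes from the abstract.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def effective_emissivity(lw_down: float, air_temp_k: float) -> float:
    """Observed downward longwave irradiance divided by blackbody emission
    at the screen-level air temperature (assumed definition)."""
    if air_temp_k <= 0:
        raise ValueError("air temperature must be a positive value in kelvin")
    return lw_down / (STEFAN_BOLTZMANN * air_temp_k ** 4)


# 389 W m^-2 is the January mean LW from the abstract;
# 294.9 K is a hypothetical screen temperature chosen for illustration.
eps = effective_emissivity(389.0, 294.9)
print(f"effective emissivity = {eps:.3f}")
```

Under this definition a value near 1 would mean the atmosphere radiates downward almost like a blackbody at screen temperature, which is why cloudier or more humid conditions tend to raise it.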

Relevance:

30.00%

Publisher:

Abstract:

How information transmission processes between individuals are shaped by natural selection is a key question for the understanding of the evolution of acoustic communication systems. Environmental acoustics predict that signal structure will differ depending on general features of the habitat. Social features, like individual spacing and mating behavior, may also be important for the design of communication. Here we present the first experimental study investigating how a tropical rainforest bird, the white-browed warbler Basileuterus leucoblepharus, extracts various information from a received song: species-specific identity, individual identity and location of the sender. Species-specific information is encoded in a resistant acoustic feature and is thus a public signal helping males to reach a wide audience. Conversely, individual identity is supported by song features susceptible to propagation: this private signal is reserved for neighbors. Finally, the receivers can locate the singers by using propagation-induced song modifications. Thus, this communication system is well matched to the acoustic constraints of the rain forest and to the ecological requirements of the species. Our results emphasize that, in a constraining acoustic environment, the efficiency of a sound communication system results from a coding/decoding process particularly well tuned to the acoustic properties of this environment.

Relevance:

30.00%

Publisher:

Abstract:

The parallel mutation-selection evolutionary dynamics, in which mutation and replication are independent events, is solved exactly in the case that the Malthusian fitnesses associated to the genomes are described by the random energy model (REM) and by a ferromagnetic version of the REM. The solution method uses the mapping of the evolutionary dynamics into a quantum Ising chain in a transverse field and the Suzuki-Trotter formalism to calculate the transition probabilities between configurations at different times. We find that in the case of the REM landscape the dynamics can exhibit three distinct regimes: pure diffusion or stasis for short times, depending on the fitness of the initial configuration, and a spin-glass regime for large times. The dynamic transition between these dynamical regimes is marked by discontinuities in the mean-fitness as well as in the overlap with the initial reference sequence. The relaxation to equilibrium is described by an inverse time decay. In the ferromagnetic REM, we find in addition to these three regimes, a ferromagnetic regime where the overlap and the mean-fitness are frozen. In this case, the system relaxes to equilibrium in a finite time. The relevance of our results to information processing aspects of evolution is discussed.

Relevance:

30.00%

Publisher:

Abstract:

Background: Feature selection is a pattern recognition approach for choosing important variables according to some criterion in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). Many genomic and proteomic applications rely on feature selection to answer questions such as selecting signature genes that are informative about some biological state, e.g., normal tissues and several types of cancer, or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point to the best solution for each application. Results: The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system. Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software's effectiveness in two distinct types of biological problems. In addition, the environment can be used in different pattern recognition applications, although its main focus is bioinformatics tasks.
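
As a hedged sketch of what a "criterion function" can look like, the example below ranks discrete features by their mutual information with a class label. Mutual information is an assumed choice for illustration; the abstract does not say which criteria the software implements. The function names and the toy gene data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences,
    estimated from empirical joint and marginal frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(samples, labels):
    """Return (feature, score) pairs sorted from most to least informative."""
    names = list(samples[0].keys())
    scores = {name: mutual_information([s[name] for s in samples], labels)
              for name in names}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up expression states (1 = high, 0 = low) for two hypothetical genes.
samples = [
    {"gene_a": 1, "gene_b": 0},
    {"gene_a": 1, "gene_b": 1},
    {"gene_a": 0, "gene_b": 0},
    {"gene_a": 0, "gene_b": 1},
]
labels = ["tumor", "tumor", "normal", "normal"]

for name, score in rank_features(samples, labels):
    print(f"{name}: {score:.3f} bits")
```

With so few samples the empirical frequencies are poor estimates of the true joint probabilities, which is exactly the small-sample problem the abstract raises; real use would need more data or a corrected estimator.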