896 results for computation- and data-intensive applications
Abstract:
The paper briefly presents the “2nd Generation Open Access Infrastructure for Research in Europe” project (http://www.openaire.eu/) and what has been done in Bulgaria over the past year in the area of open access to scientific information and data.
Abstract:
This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mining enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. The Pb-Zn cluster data mining model was tested and validated to show that its accuracy is reasonable before it is used in a production environment. The model can be used to adjust the mine's grinding and flotation processing parameters in near real time, which is important for the efficiency of the Pb-Zn ore beneficiation process. ACM Computing Classification System (1998): H.2.8, H.3.3.
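The study built its model with DMX queries in a data mining server. As a small illustration of what "segmentation and prediction" by clustering means, the sketch below swaps in plain k-means (Lloyd's algorithm), which may differ from the algorithm the authors actually used. Records are assumed to be numeric feature tuples (for instance, hypothetical Pb and Zn grade percentages), and `kmeans` and `predict` are illustrative names, not part of any DMX API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # one assay record as a numeric feature vector (assumed encoding)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids: Sequence[Point], record: Sequence[float]) -> int:
    """Index of the centroid nearest to `record`, i.e. the cluster the record falls into."""
    return min(range(len(centroids)), key=lambda i: squared_distance(centroids[i], record))


def kmeans(points: Sequence[Point], k: int, iterations: int = 100) -> List[Point]:
    """Lloyd's k-means; centroids are seeded with the first k points so runs are deterministic."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: put every record in the cluster of its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[predict(centroids, p)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid where it was
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids
```

Calling `kmeans(records, 5)` would produce five clusters, matching the number reported in the study; `predict` then assigns a new assay record to one of them.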
Abstract:
This work was supported in part by the EU “2nd Generation Open Access Infrastructure for Research in Europe” (OpenAIRE+) project. The autumn training school Development and Promotion of Open Access to Scientific Information and Research is organized within the framework of the Fourth International Conference on Digital Presentation and Preservation of Cultural and Scientific Heritage (DiPP2014, September 18–21, 2014, Veliko Tarnovo, Bulgaria, http://dipp2014.math.bas.bg/), held under UNESCO patronage. The main organizer is the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, with the support of the EU project FOSTER (http://www.fosteropenscience.eu/) and the P. R. Slaveykov Regional Public Library in Veliko Tarnovo, Bulgaria.
Abstract:
Big data comes in various ways, types, shapes, forms, and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics, and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough outline of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in characterizing data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category of the bigness taxonomy it falls into. Large-p-small-n data sets, for instance, require a different set of tools from the large-n-small-p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, and Sequentialization. It is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that no universally superior method outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham’s razor and its non-plurality principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
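To make the large-p-small-n case concrete, consider ridge regression, one instance of the Regularization and Penalization tools listed above (chosen here purely for illustration; the paper's own comparison may use other methods). With a design matrix X ∈ ℝ^(n×p), response y ∈ ℝ^n, and penalty λ > 0, the estimator has two algebraically equivalent closed forms:

```latex
\hat{\beta}_{\lambda}
  = \arg\min_{\beta \in \mathbb{R}^{p}} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = (X^{\top} X + \lambda I_p)^{-1} X^{\top} y
  = X^{\top} (X X^{\top} + \lambda I_n)^{-1} y
```

The last equality follows from the identity Xᵀ(XXᵀ + λIₙ) = (XᵀX + λIₚ)Xᵀ. The first form inverts a p×p matrix and suits the large-n-small-p case; the second inverts only an n×n matrix and suits the large-p-small-n case. Replacing XXᵀ with a kernel Gram matrix K gives kernel ridge regression, a simple example of Kernelization.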
Abstract:
For wireless power transfer (WPT) systems, communication between the primary side and the pickup side is challenging because of the large air gap and magnetic interference. This paper proposes a novel method that integrates bidirectional data communication into a high-power WPT system. Power and data transfer share the same inductive link between coreless coils. A power/data frequency-division multiplexing technique is applied: power and data are transmitted on different frequency carriers and controlled independently. A circuit model of the multiband system is provided to analyze the transmission gain of the communication channel as well as the power delivery performance. Crosstalk interference between the two carriers is discussed. In addition, the signal-to-noise ratios of the channels are estimated, which provides a guideline for the design of the modulation/demodulation circuits. Finally, a 500-W WPT prototype has been built to demonstrate the effectiveness of the proposed WPT system.
Abstract:
Sentiment classification over Twitter is usually affected by the noisy nature (abbreviations, irregular forms) of tweet data. A popular procedure to reduce the noise of textual data is to remove stopwords, using either pre-compiled stopword lists or more sophisticated methods for dynamic stopword identification. However, the effectiveness of removing stopwords in the context of Twitter sentiment classification has been debated in the last few years. In this paper, we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods. To this end, we apply six different stopword identification methods to Twitter data from six different datasets and observe how removing stopwords affects two well-known supervised sentiment classification methods. We assess the impact of removing stopwords by observing fluctuations in the level of data sparsity, the size of the classifier's feature space, and its classification performance. Our results show that using pre-compiled lists of stopwords negatively impacts the performance of Twitter sentiment classification approaches. On the other hand, dynamically generating stopword lists by removing infrequent terms that appear only once in the corpus appears to be the optimal method for maintaining high classification performance while reducing data sparsity and substantially shrinking the feature space.
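The dynamic stopword generation that performed best in this study (removing terms that occur only once in the corpus) is simple to express. The sketch below assumes tweets have already been tokenized and normalized into lists of strings; the tokenizer and the function names are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from typing import List, Set

Tweet = List[str]  # a tweet as a list of already-normalized tokens (assumed preprocessing)


def singleton_stopwords(corpus: List[Tweet]) -> Set[str]:
    """Dynamic stopword list: every term whose total frequency in the corpus is exactly 1."""
    counts = Counter(token for tweet in corpus for token in tweet)
    return {token for token, count in counts.items() if count == 1}


def remove_stopwords(corpus: List[Tweet], stopwords: Set[str]) -> List[Tweet]:
    """Drop stopwords from every tweet, preserving token order."""
    return [[token for token in tweet if token not in stopwords] for tweet in corpus]


def vocabulary(corpus: List[Tweet]) -> Set[str]:
    """Distinct terms, i.e. the feature space of a bag-of-words classifier."""
    return {token for tweet in corpus for token in tweet}
```

On the toy corpus `[["great", "phone", "lol"], ["great", "battery"], ["phone", "died", "lol"]]`, the singletons are `battery` and `died`, so the vocabulary shrinks from 5 to 3 terms. This mirrors the feature-space reduction the abstract describes, though on real data the effect on accuracy has to be measured.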
Abstract:
We simultaneously inscribe FBGs in all cores of a four-core fiber and investigate their thermal, strain, and bending (both direction and magnitude) responses. The influence of core distance on bending sensitivity is also discussed. © 2015 OSA.
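A commonly used first-order model of multicore FBG sensing helps explain why core distance matters. It is a textbook approximation, not necessarily the exact model used in the paper, and sign conventions vary. For core i, the relative Bragg shift combines strain and temperature, and the bending strain scales with the core's offset from the fiber axis:

```latex
\frac{\Delta\lambda_{B,i}}{\lambda_{B,i}} = (1 - p_e)\,\varepsilon_i + (\alpha + \xi)\,\Delta T,
\qquad
\varepsilon_i = \varepsilon_{\mathrm{ax}} - \frac{d_i}{R}\cos(\theta_b - \theta_i)
```

Here p_e is the effective photoelastic coefficient (about 0.22 for silica fiber), α and ξ are the thermal-expansion and thermo-optic coefficients, ε_ax is the axial strain shared by all cores, d_i and θ_i are the radial distance and angular position of core i, R is the bend radius, and θ_b is the bend direction. Because the bending term is proportional to d_i/R, cores farther from the axis are more bend-sensitive. Assuming identical cores, the temperature and axial-strain terms are common to all cores and cancel in the difference of two cores' relative shifts, leaving only bending. Comparing cores at different angular positions θ_i is what allows both bend direction and magnitude to be recovered.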
Abstract:
We have UV-inscribed fiber Bragg gratings (FBGs), long-period gratings (LPGs), and tilted fiber gratings (TFGs) operating in the mid-IR 2 μm range using three common optical fiber grating fabrication techniques (two-beam holographic, phase mask, and point-by-point). The fabricated FBGs have been evaluated for their thermal and strain responses. The results reveal that FBG devices with responses in the mid-IR range are much more sensitive to temperature than those in the near-IR range. To explore their unique cladding-mode coupling function, we investigated the thermal and refractive index sensitivities of LPGs and found that the coupled cladding modes in the mid-IR range are also much more sensitive to temperature and to changes in the refractive index of the surrounding medium. The 45° tilted fiber gratings (45°-TFGs) have been investigated as mid-IR polarizing devices for their polarization extinction characteristics. As efficient reflection filters and in-cavity polarizers, the mid-IR FBGs and 45°-TFGs have been employed in fiber laser cavities to realize multi-wavelength 2 μm Tm-doped CW and mode-locked fiber lasers, respectively.
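One part of the higher thermal sensitivity at 2 μm follows directly from the standard Bragg condition. The fractional shift per kelvin is set by the thermal-expansion and thermo-optic coefficients α and ξ, so the absolute shift scales with the Bragg wavelength itself. The worked ratio below assumes 1550 nm as a typical near-IR reference and the same (α + ξ) at both wavelengths. Both are simplifying assumptions, not the paper's measured values.

```latex
\lambda_B = 2\,n_{\mathrm{eff}}\,\Lambda,
\qquad
\Delta\lambda_B \approx \lambda_B\,(\alpha + \xi)\,\Delta T
\;\;\Longrightarrow\;\;
\frac{(\mathrm{d}\lambda_B/\mathrm{d}T)\big|_{2000\,\mathrm{nm}}}{(\mathrm{d}\lambda_B/\mathrm{d}T)\big|_{1550\,\mathrm{nm}}}
\approx \frac{2000\ \mathrm{nm}}{1550\ \mathrm{nm}} \approx 1.29
```

This first-order scaling predicts only about a 1.3× increase. It does not replace the measured sensitivities, and it ignores any wavelength dependence of n_eff, α, and ξ.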
Abstract:
Groundwater systems of different densities are often mathematically modeled to understand and predict environmental behavior such as seawater intrusion or submarine groundwater discharge. Additional data collection may be justified if it will cost-effectively help reduce the uncertainty of a model's prediction. Collecting salinity as well as temperature data could help reduce predictive uncertainty in a variable-density model. However, before numerical models can be created, the modeling code must be rigorously tested. This research documents the benchmark testing of a new modeling code, SEAWAT Version 4. The benchmark problems include various combinations of density-dependent flow resulting from variations in concentration and temperature. The verified code, SEAWAT, was then applied to two different hydrological analyses to explore the capacity of a variable-density model to guide data collection.
The first analysis tested a linear method to guide data collection by quantifying the contribution of different data types and locations toward reducing predictive uncertainty in a nonlinear variable-density flow and transport model. The relative contributions of temperature and concentration measurements at different locations within a simulated carbonate platform to predicting the movement of the saltwater interface were assessed. Results showed that, in this case, concentration data had greater worth than temperature data in reducing predictive uncertainty. Results also indicated that a linear method could be used to quantify data worth in a nonlinear model.
The second hydrological analysis used a model to identify the transient response of the salinity, temperature, age, and amount of submarine groundwater discharge to changes in tidal ocean stage, seasonal temperature variations, and different types of geology. The model was compared to multiple kinds of data to (1) calibrate and verify the model and (2) explore its potential to guide data collection using techniques such as electromagnetic resistivity, thermal imagery, and seepage meters. Results indicated that the model can be used to give insight into submarine groundwater discharge and to guide data collection.
Abstract:
This paper is based on the novel use of a very high fidelity decimation filter chain for electrocardiogram (ECG) signal acquisition and data conversion. The multiplier-free, multi-stage structure of the proposed filters lowers power dissipation while minimizing circuit area. Both are crucial design constraints for wireless, noninvasive, wearable health-monitoring products, given the scarce operational resources available for their electronic implementation. The presented filter has a decimation ratio of 128, works in tandem with a 1-bit third-order Sigma-Delta (ΣΔ) modulator, and achieves 0.04 dB passband ripple and -74 dB stopband attenuation. The work reported here investigates the effects of the proposed decimation filters' nonlinear phase on the ECG signal through a comparative study after phase correction. It concludes that enhanced phase linearity is not crucial for ECG acquisition and data conversion applications, since the distortion of the acquired signal due to phase nonlinearity is insignificant for both the original and the phase-compensated filters. As noted in the state of the art, and to the best of the authors’ knowledge, freedom from signal distortion is essential, since distortion might lead to misdiagnosis. This article demonstrates that, with their minimal power consumption and minimal signal distortion, the proposed decimation filters can effectively be employed in biosignal data processing units.
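The abstract does not detail the filter stages beyond "multiplier-free and multi-stage". A cascaded integrator-comb (CIC) filter is one well-known multiplier-free decimator that is often used as the first stage after a ΣΔ modulator, so the sketch below shows a CIC stage with decimation ratio R = 128. This is an assumption for illustration, not the authors' design. A bare CIC would not by itself meet the reported 0.04 dB ripple and -74 dB attenuation, because its passband droop is usually corrected by later stages. The stage count N = 3 and the function names are hypothetical.

```python
from typing import List, Sequence


def bits_to_bipolar(bits: Sequence[int]) -> List[int]:
    """Map a 1-bit sigma-delta stream {0, 1} to the bipolar values {-1, +1}."""
    return [1 if b else -1 for b in bits]


def cic_decimate(x: Sequence[int], R: int, N: int) -> List[float]:
    """N-stage CIC decimator (differential delay 1) that keeps one output per R inputs.

    Integrators run at the input rate and combs at the output rate, using only
    additions and subtractions. The result is divided by the DC gain R**N for
    readability; in hardware, when R is a power of two, that scaling is a bit shift.
    Python ints never overflow, whereas fixed-width hardware relies on wrap-around.
    """
    integrators = [0] * N
    comb_delays = [0] * N
    gain = R ** N
    out: List[float] = []
    for n, sample in enumerate(x):
        acc = sample
        for i in range(N):  # cascade of N integrators: y[n] = y[n-1] + x[n]
            integrators[i] += acc
            acc = integrators[i]
        if (n + 1) % R == 0:  # decimate: keep every R-th integrator output
            y = acc
            for i in range(N):  # cascade of N combs: y[m] = x[m] - x[m-1]
                y, comb_delays[i] = y - comb_delays[i], y
            out.append(y / gain)
    return out
```

Feeding the output of `bits_to_bipolar` into `cic_decimate(..., R=128, N=3)` yields one sample per 128 modulator clocks. After the initial transient, a constant input of +1 decodes to 1.0, and an alternating 1/0 bit stream (50% density) decodes to 0.0.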
Abstract:
Mass spectrometry measures the mass of ions according to their mass-to-charge ratio. This technique is used in many fields and can analyze complex mixtures. Imaging mass spectrometry (IMS), a branch of mass spectrometry, enables the analysis of ions on a surface while preserving the spatial organization of the detected ions. To date, the most frequently studied samples in IMS are plant or animal tissue sections. Among the molecules commonly analyzed by IMS, lipids have attracted considerable interest. Lipids are involved in disease and in normal cell function: they form the cell membrane and play several roles, such as regulating cellular events. Given the involvement of lipids in biology and the ability of MALDI IMS to analyze them, we developed analytical strategies for sample handling and for the analysis of large lipid datasets. Lipid degradation is a major concern in the food industry. Likewise, lipids in tissue sections are at risk of degrading. Their degradation products can therefore introduce artifacts into IMS analysis, and the loss of lipid species can compromise the accuracy of abundance measurements. Since oxidized lipids are also important mediators in the development of several diseases, their faithful preservation becomes critical. In multi-institutional studies, where samples are often transported from one site to another, adapted and validated protocols as well as degradation measurements are required.
Our main results are as follows: a time-dependent increase in oxidized phospholipids and lysophospholipids under ambient conditions, a decrease in lipids bearing unsaturated fatty acids, and an inhibitory effect on these phenomena when sections are stored cold under N2. At ambient temperature and atmosphere, phospholipids are oxidized on a time scale typical of a normal IMS preparation (~30 minutes). Phospholipids are also broken down into lysophospholipids on a time scale of several days. Validating a sample-handling method is all the more important when a larger number of samples is to be analyzed. Atherosclerosis is a cardiovascular disease caused by the accumulation of cellular material on the arterial wall. Since atherosclerosis is a three-dimensional (3D) phenomenon, serial 3D IMS is useful for two reasons: it can localize molecules along the full length of an atheromatous plaque, and it can identify molecular mechanisms of plaque development or rupture. Serial 3D IMS faces certain specific challenges, many of which relate simply to 3D reconstruction and to interpreting the molecular reconstruction in real time. With these objectives in mind, and using lipid IMS to study atherosclerotic plaques from a human carotid artery and from a murine model of atherosclerosis, we developed open-source methods for reconstructing 3D IMS data. Our methodology provides a means of obtaining high-quality visualizations and demonstrates a strategy for rapidly interpreting 3D IMS data through multivariate segmentation. Analysis of aortas from a murine model was the starting point for developing the methods, because these are better-controlled samples.
By correlating data acquired in positive and negative ionization modes, 3D IMS revealed an accumulation of phospholipids in the aortic sinuses. In addition, AgLDI IMS revealed a differential localization of free fatty acids, cholesterol, cholesterol esters, and triglycerides. Multivariate segmentation of lipid signals from IMS analysis of a human carotid artery reveals a molecular histology that correlates with the degree of arterial stenosis. This research improves our understanding of the biological complexity of atherosclerosis and may help predict the development of certain clinical cases. Colorectal cancer liver metastasis (CRCLM) is the metastatic disease of primary colorectal cancer, one of the most common cancers in the world. CRCLM tumors are assessed and given a prognosis through histopathology, which carries a margin of error. We used lipid IMS to identify the histological compartments of CRCLM and to extract their lipid signatures. By exploiting these molecular signatures, we were able to determine a quantitative, objective histopathological score that correlates with prognosis. Furthermore, by dissecting the lipid signatures, we identified individual lipid species that discriminate between the different CRCLM histologies and that could potentially be used as biomarkers for determining response to therapy. More specifically, we found a series of plasmalogens and sphingolipids that distinguish two different types of necrosis: infarct-like necrosis (ILN) and usual necrosis (UN). ILN is associated with response to chemotherapy, whereas UN is associated with the normal functioning of the tumor.
Abstract:
As IT infrastructures grow in complexity and Internet of Things (IoT) scenarios become pervasive, new computational models are needed. These models are based on autonomous entities that achieve high-level goals by interacting with one another, supported by Fog Computing, which sits close to the data sources, and by Cloud Computing, which offers complex back-end analytics services able to deliver results to millions of users. These new scenarios call for rethinking how software is designed and developed from an agile perspective. The activities of development teams (Dev) should be closely tied to those of the teams that support the Cloud (Ops), following new methodologies now known as DevOps. However, because adequate abstractions are lacking at the programming-language level, IoT developers are often led to follow bottom-up development approaches. These frequently prove inadequate for dealing with the complexity of applications in this field and the heterogeneity of the software components that make them up. Since the monolithic applications of the past appear hard to scale and manage in a multi-user Cloud environment, many consider it necessary to adopt a new architectural style. In this style, an application is viewed as a composition of microservices, each dedicated to a specific application function and each the responsibility of a small team of developers, from problem analysis through deployment and management.
There is not yet a single, shared definition of microservices and of other concepts emerging from IoT and the Cloud, let alone specialized languages for this field. Defining custom metamodels, coupled with the automatic generation of the glue software that connects to the infrastructure, could therefore help a development team raise the level of abstraction by encapsulating implementation details in a corporate software factory. Thanks to software production systems based on Model Driven Software Development (MDSD), the top-down approach that is currently missing can be recovered, allowing attention to focus on the applications' business logic. The thesis presents an example of this possible approach, starting from the idea that an IoT application is first and foremost a distributed software system in which the interaction between active components (modeled as actors) plays a fundamental role.
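The thesis treats an IoT application as a distributed system of actors. The Python sketch below is a minimal, in-process illustration of the actor idea: each actor has private state and a mailbox, and processes its messages one at a time. It uses only the standard library and is not the thesis's metamodel or generated code. `Actor` and `Averager` are hypothetical names. A real deployment would place actors in separate processes or devices and carry messages over the network.

```python
import queue
import threading
from typing import Any, Optional


class Actor:
    """Minimal actor: private state, a mailbox, and one thread that handles messages one at a time."""

    _STOP = object()  # private sentinel that tells the actor's thread to exit

    def __init__(self) -> None:
        self._mailbox: "queue.Queue[Any]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message: Any) -> None:
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self) -> None:
        """Ask the actor to finish its pending messages, then wait for its thread to exit."""
        self._mailbox.put(Actor._STOP)
        self._thread.join()

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is Actor._STOP:
                break
            self.receive(message)

    def receive(self, message: Any) -> None:
        raise NotImplementedError


class Averager(Actor):
    """Hypothetical back-end actor that averages the sensor readings sent to it."""

    def __init__(self) -> None:
        self.total = 0.0  # state is initialized before the thread starts
        self.count = 0
        super().__init__()

    def receive(self, message: Any) -> None:
        self.total += float(message)
        self.count += 1

    def average(self) -> Optional[float]:
        return self.total / self.count if self.count else None
```

For example, sending the readings 20.0, 22.0, and 24.0 to an `Averager` and then calling `stop()` leaves `average()` at 22.0. Only the actor's own thread ever touches its state, so no locks are needed, which is the property that makes actors attractive for heterogeneous IoT components.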