960 results for k-means
Abstract:
Although the debate of what data science is has a long history and has not reached a complete consensus yet, Data Science can be summarized as the process of learning from data. Guided by the above vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task for supporting research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, showing performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset collecting pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze error messages produced by failed transfers and propose a Machine Learning pipeline that leverages the word2vec language model and K-means clustering. This provides groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing promising ability in understanding the message content and providing meaningful groupings, in line with previously reported incidents by human operators.
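The pipeline described in the second part (embed each error message, then group the embeddings with K-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the sample messages are invented, and a fixed random vector per token stands in for a trained word2vec model so that the block runs with the standard library alone. The function names `embed` and `kmeans` are hypothetical.

```python
import random


def embed(messages, dim=16, seed=0):
    """Average per-token vectors into one vector per message.

    ASSUMPTION: random token vectors stand in for trained word2vec embeddings.
    """
    rng = random.Random(seed)
    token_vectors = {}
    embedded = []
    for message in messages:
        tokens = message.lower().split() or ["<empty>"]
        for token in tokens:
            if token not in token_vectors:
                token_vectors[token] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        embedded.append([
            sum(token_vectors[t][i] for t in tokens) / len(tokens)
            for i in range(dim)
        ])
    return embedded


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _step in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented example messages (not taken from the ATLAS data).
messages = [
    "connection timed out to storage endpoint",
    "connection timed out to storage endpoint",
    "checksum mismatch after transfer",
    "checksum mismatch after transfer",
]
labels, centroids = kmeans(embed(messages), k=2)
```

Messages with identical wording receive identical vectors and therefore always land in the same cluster; how well distinct messages separate depends on the embedding and on the choice of `k`, both of which would need tuning on real data.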
Abstract:
Long-term monitoring of acoustical environments is gaining popularity thanks to the scientific and engineering insights it provides. The growing interest is driven by the steady increase in storage capacity and in the computational power available to process large amounts of data. In this perspective, machine learning (ML) provides a broad family of data-driven statistical techniques for dealing with large databases. At present, conventional sound level meter practice limits the global description of a sound scene to an energetic point of view: the equivalent continuous level Leq is the main metric used to define an acoustic environment. Finer analyses involve the use of statistical levels. However, acoustic percentiles are based on temporal assumptions, which are not always reliable. A statistical approach based on the occurrences of sound pressure levels would bring a different perspective to the analysis of long-term monitoring. Depicting a sound scene through the most probable sound pressure level, rather than through portions of energy, provides more specific information about the activity carried out during the measurements. The statistical mode of the occurrences can capture typical behaviors of specific kinds of sound sources. The present work proposes an ML-based method to identify, separate and measure coexisting sound sources in real-world scenarios. It is based on long-term monitoring and is addressed to acousticians who analyze environmental noise in a variety of contexts. The presented method is based on clustering analysis. Two algorithms, Gaussian Mixture Model and K-means clustering, form the core of a process to investigate different active spaces monitored through sound level meters. The procedure has been applied in two different contexts: university lecture halls and offices.
The proposed method shows robust and reliable results in describing the acoustic scenario, and it could represent an important analytical tool for acousticians.
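The contrast drawn above between the energetic description (Leq) and the most probable level can be illustrated with a short sketch. It assumes equally spaced short-term levels in dB; the function names, the 1 dB bin width and the sample values are illustrative choices, not taken from the thesis.

```python
import math
from collections import Counter


def leq(levels_db):
    """Equivalent continuous level: energetic average of levels given in dB."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def modal_level(levels_db, bin_width=1.0):
    """Most frequently occurring level after grouping into bins of bin_width dB."""
    bins = Counter(round(level / bin_width) for level in levels_db)
    most_common_bin, _count = bins.most_common(1)[0]
    return most_common_bin * bin_width


# Invented scene: mostly a 40 dB background with occasional 70 dB events.
levels = [40.0] * 90 + [70.0] * 10
scene_leq = leq(levels)
scene_mode = modal_level(levels)
```

In this invented scene the few loud events pull Leq to about 60 dB, while the mode stays at the 40 dB background, which is the kind of difference the occurrence-based description is meant to expose.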
Abstract:
The ATLAS experiment, like the other experiments operating at the Large Hadron Collider, produces petabytes of data every year, which must then be stored and processed. The experiments have also committed to making these data accessible worldwide. To meet these needs, the Worldwide LHC Computing Grid (WLCG) was designed; it combines the computing power and storage capacity of more than 170 sites spread across the world. At most WLCG sites, storage management technologies have been developed that also handle user requests and data transfers. These systems record their activity in log files, which are rich in information that helps operators identify the problem when the system malfunctions. In anticipation of a larger data flow in the coming years, work is under way to make these sites even more reliable, and one possible way to do so is to develop a system that can analyze log files autonomously and detect the anomalies that precede a malfunction. Building such a system first requires identifying the most suitable method for analyzing log files. This thesis studies an approach to the problem that uses artificial intelligence to analyze log files; more specifically, it studies an approach based on the K-means clustering algorithm.
Abstract:
This thesis is the result of an internship at Gruppo Montenegro S.r.l., whose objective was to develop an algorithm for palletization and for saturating the transport vehicle for the Food Division. Specifically, a heuristic algorithm written in the Python programming language is proposed. The Food Division comprises three categories: Cannamela, Cuore and Vitalia. These include highly heterogeneous products. With the involvement of the company's Packaging and Quality functions, the constraints to be respected when palletizing the products were established. The proposed algorithm is described by dividing the process into three macro-steps. The first part addresses the 3D Bin Packing Problem, using and modifying a program already available in the literature to meet the needs of the Cannamela category. Unlike the other categories, Cannamela is prepared as pre-assembled groupage, because Cannamela orders can contain quantities that are not multiples of the quantities contained in the secondary packaging. The second part of the algorithm creates the pallets for the Cuore and Vitalia categories. Using the K-means clustering algorithm, families of product codes were created so that pallets could be assembled from products considered similar. Accordingly, the palletization algorithm for these two categories was developed from scratch, based on the percentage of the pallet occupied by each product. The last part of the algorithm studies the possibility of stacking the previously created pallets. Finally, a strategic period is analyzed by comparing the results of the Python algorithm with those of the algorithm in the company's management software. The results are then analyzed with respect to two impacts that matter to the company: economic and environmental.
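The second step (building pallets family by family from each product's share of a pallet) might look roughly like the sketch below. It is a simple greedy fill written for illustration and is not the thesis algorithm: product codes, family names, the `cases_per_full_pallet` table and the function name `build_pallets` are all hypothetical, and the families are assumed to have been produced beforehand (for example by K-means).

```python
from collections import defaultdict
from fractions import Fraction


def build_pallets(order_lines, cases_per_full_pallet, family_of):
    """Greedy fill: within each family, add cases until a pallet would exceed 100%.

    order_lines: list of (product_code, number_of_cases).
    cases_per_full_pallet: product_code -> cases that fill one pallet (hypothetical data).
    family_of: product_code -> family label, assumed precomputed.
    Returns a list of (family, {product_code: cases}, occupied_fraction).
    """
    by_family = defaultdict(list)
    for code, n_cases in order_lines:
        by_family[family_of[code]].append((code, n_cases))

    pallets = []
    for family, lines in by_family.items():
        current, used = defaultdict(int), Fraction(0)
        for code, n_cases in lines:
            # Share of a pallet occupied by a single case of this product.
            share = Fraction(1, cases_per_full_pallet[code])
            for _case in range(n_cases):
                if used + share > 1:  # pallet full: close it and open a new one
                    pallets.append((family, dict(current), used))
                    current, used = defaultdict(int), Fraction(0)
                current[code] += 1
                used += share
        if current:
            pallets.append((family, dict(current), used))
    return pallets


# Invented example data.
cases_per_full_pallet = {"A": 10, "B": 20, "C": 5}
family_of = {"A": "F1", "B": "F1", "C": "F2"}
order = [("A", 8), ("B", 6), ("C", 7)]
pallets = build_pallets(order, cases_per_full_pallet, family_of)
```

Exact fractions are used so that a pallet filled to precisely 100% is not rejected because of floating-point rounding. Packaging and quality constraints, as well as pallet stacking, are outside the scope of this sketch.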
Abstract:
The inorganic chemical characterization of suspended sediments is of utmost relevance for understanding the dynamics and movement of chemical elements in aquatic and wet ecosystems. Despite the complexity of designing an effective study of this ecological compartment, this work has tested a procedure for analyzing suspended sediments by instrumental neutron activation analysis, k(0) method (k(0)-INAA). The chemical elements As, Ba, Br, Ca, Ce, Co, Cr, Cs, Eu, Fe, Hf, Hg, K, La, Mo, Na, Ni, Rb, Sb, Sc, Se, Sm, Sr, Ta, Tb, Th, Yb and Zn were quantified in the suspended sediment compartment by means of k(0)-INAA. Compared with the world average for rivers, high mass fractions of Fe (222,900 mg/kg), Ba (4990 mg/kg), Zn (1350 mg/kg), Cr (646 mg/kg), Co (74.5 mg/kg), Br (113 mg/kg) and Mo (31.9 mg/kg) were quantified in suspended sediments from the Piracicaba River, the Piracicamirim Stream and the Marins Stream. Results of the principal component analysis for standardized chemical element mass fractions indicated an intricate correlation among the chemical elements evaluated, reflecting the contribution of natural and anthropogenic sources of chemical elements to the ecosystems. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
In the current work, we studied the effect of the nonionic detergent dodecyloctaethyleneglycol, C(12)E(8), on the structure and oligomeric form of the Na,K-ATPase membrane enzyme (sodium-potassium pump) in aqueous suspension, by means of small-angle X-ray scattering (SAXS). Samples composed of 2 mg/mL of Na,K-ATPase, extracted from rabbit kidney medulla, in the presence of a small amount of C(12)E(8) (0.005 mg/mL) and at larger concentrations ranging from 2.7 to 27 mg/mL did not show catalytic activity. Under this condition, an oligomerization of the alpha subunits is expected. SAXS data were analyzed by means of a global fitting procedure assuming that the scattering is due to two independent contributions: one from the enzyme and the other from C(12)E(8) micelles. At the low detergent content (0.005 mg/mL), the SAXS results showed that Na,K-ATPase is associated into aggregates larger than the (alpha beta)(2) form. When 2.7 mg/mL of C(12)E(8) is added, the data analysis revealed the presence of alpha(4) aggregates in the solution and some free micelles. Increasing the detergent amount up to 27 mg/mL does not disturb the alpha(4) aggregate: more micelles of the same size and shape are simply formed in solution, in proportion to the added detergent. We believe that our results contribute to a better understanding of how nonionic detergents induce subunit dissociation and reassembly to minimize the exposure of hydrophobic residues to the aqueous solvent.
Abstract:
Nitrogen adsorption on the surface of a non-porous reference material is widely used in the characterization of porous solids. Traditionally, the enhancement of the solid-fluid potential in a porous solid is accounted for by incorporating the surface curvature into the solid-fluid potential of the flat reference surface. However, this calculation procedure has not been justified experimentally. In this paper, we derive the solid-fluid potential of mesoporous MCM-41 solid by using solely the adsorption isotherm of that solid. This solid-fluid potential is then compared with that of the non-porous reference surface. In deriving the solid-fluid potential for both the reference surface and mesoporous MCM-41 silica (diameter ranging from 3 to 6.5 nm), we employ the nonlocal density functional theory developed for amorphous solids. It is found that, to our surprise, the solid-fluid potential of a porous solid is practically the same as that of the reference surface, indicating that there is no enhancement due to surface curvature. Further investigation is required to explain this unusual departure from the conventional wisdom of curvature-induced enhancement. Accepting the curvature-independent solid-fluid potential derived from the non-porous reference surface, we analyze the hysteresis features of a series of MCM-41 samples. (c) 2005 Elsevier Inc. All rights reserved.
Abstract:
Background: Supraceliac aortic cross-clamping can be an option to save patients with hypovolemic shock due to abdominal trauma. However, this maneuver is associated with ischemia/reperfusion (I/R) injury, which is strongly related to oxidative stress and reduced nitric oxide bioavailability. Moreover, several studies have demonstrated impaired relaxation after I/R, but the duration of I/R necessary to induce vascular dysfunction is still controversial. We tested the hypothesis that 60 minutes of ischemia followed by 30 minutes of reperfusion does not change the relaxation of visceral arteries or the plasma and renal levels of malondialdehyde (MDA) and nitrite plus nitrate (NOx). Methods: Male mongrel dogs (n = 27) were randomly allocated to one of three groups: sham (no clamping, n = 9), ischemia (supraceliac aortic cross-clamping for 60 minutes, n = 9), and I/R (60 minutes of ischemia followed by reperfusion for 30 minutes, n = 9). Relaxation of visceral arteries (celiac trunk, renal and superior mesenteric arteries) was studied in organ chambers. MDA and NOx concentrations were determined using a commercially available kit and an ozone-based chemiluminescence assay, respectively. Results: Both acetylcholine and calcium ionophore caused relaxation in endothelium-intact rings, and no statistical differences were observed among the three groups. Sodium nitroprusside promoted relaxation in endothelium-denuded rings, and there were no inter-group statistical differences. Both plasma and renal concentrations of MDA and NOx showed no significant difference among the groups. Conclusion: Supraceliac aortic cross-clamping for 60 minutes, alone or followed by 30 minutes of reperfusion, did not impair relaxation of canine visceral arteries or evoke biochemical alterations in plasma or renal tissue.
Abstract:
Correlations of charged hadrons of 1 < p(T) < 10 GeV/c with high-p(T) direct photons and pi(0) mesons in the range 5 < p(T) < 15 GeV/c are used to study jet fragmentation in the gamma + jet and dijet channels, respectively. The magnitude of the partonic transverse momentum, k(T), is obtained by comparing to a model incorporating a Gaussian k(T) smearing. The sensitivity of the associated charged hadron spectra to the underlying fragmentation function is tested, and the data are compared to calculations using recent global fit results. The shape of the direct photon-associated hadron spectrum as well as its charge asymmetry are found to be consistent with a sample dominated by quark-gluon Compton scattering. No significant evidence of fragmentation photon correlated production is observed within experimental uncertainties.
Abstract:
The generator-coordinate method is a flexible and powerful reformulation of the variational principle. Here we show that by introducing a generator coordinate in the Kohn-Sham equation of density-functional theory, excitation energies can be obtained from ground-state density functionals. As a viability test, the method is applied to ground-state energies and various types of excited-state energies of atoms and ions from the He and the Li isoelectronic series. Results are compared to a variety of alternative DFT-based approaches to excited states, in particular time-dependent density-functional theory with exact and approximate potentials.
Abstract:
The approach presented in this paper consists of an energy-based field-circuit coupling in combination with multi-physics simulation of the acoustic radiation of electrical machines. The proposed method is applied to a special switched reluctance motor with asymmetric pole geometry to improve the start-up torque. The pole shape has been optimized, subject to low torque ripple, in a previous study. The proposed approach here is used to analyze the impact of the optimization on the overall acoustic behavior. The field-circuit coupling is based on a temporary lumped-parameter model of the magnetic part incorporated into a circuit simulation based on the modified nodal analysis. The harmonic force excitation is calculated by means of stress tensor computation, and it is transformed to a mechanical mesh by mapping techniques. The structural dynamic problem is solved in the frequency domain using a finite-element modal analysis and superposition. The radiation characteristic is obtained from boundary element acoustic simulation. Simulation results of both rotor types are compared, and measurements of the drive are presented.
Abstract:
In this paper we consider the problem of providing standard errors of the component means in normal mixture models fitted to univariate or multivariate data by maximum likelihood via the EM algorithm. Two methods of estimating the standard errors are considered: the standard information-based method and the computationally intensive bootstrap method. They are compared empirically by their application to three real data sets and by a small-scale Monte Carlo experiment.
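The bootstrap route to these standard errors can be sketched for the simplest case, a two-component univariate mixture. This is an illustrative sketch under several assumptions that the abstract does not specify: a nonparametric bootstrap (resampling the data with replacement), a fixed number of EM iterations, initialization from the lower and upper halves of the sorted data, and components ordered by mean to avoid label switching between replicates. The information-based method is not shown. The function names and the simulated data are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_mixture(data, iters=50):
    """EM for a two-component univariate normal mixture.

    Returns (weights, means, sigmas) with components sorted by mean.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    mu = [statistics.fmean(ordered[:half]), statistics.fmean(ordered[half:])]
    sigma = [statistics.pstdev(data) or 1.0] * 2
    w = [0.5, 0.5]
    for _step in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [w[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(2)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: weighted updates of weights, means and standard deviations.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            w[j] = nj / len(data)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigma[j] = max(math.sqrt(var), 1e-3)  # floor avoids degenerate components
    order = sorted(range(2), key=lambda j: mu[j])
    return [w[j] for j in order], [mu[j] for j in order], [sigma[j] for j in order]


def bootstrap_se(data, n_boot=100, seed=0):
    """Standard errors of the two component means from refits on resampled data."""
    rng = random.Random(seed)
    boot_means = [[], []]
    for _rep in range(n_boot):
        resample = rng.choices(data, k=len(data))
        _w, mu, _s = fit_mixture(resample)
        for j in range(2):
            boot_means[j].append(mu[j])
    return [statistics.stdev(values) for values in boot_means]


# Simulated data: two well-separated components (illustrative only).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(75)] + [rng.gauss(6.0, 1.0) for _ in range(75)]
weights, means, sigmas = fit_mixture(data)
std_errors = bootstrap_se(data, n_boot=50)
```

Each bootstrap replicate repeats the full EM fit, which is why the method is computationally intensive; with poorly separated components, the ordering-by-mean convention used here may not be enough to keep replicates comparable.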
Abstract:
Osteosarcoma (OS) is the most frequent bone tumor in children and adolescents. Tumor antigens are encoded by genes that are expressed in many types of solid tumors but are silent in normal tissues, with the exception of placenta and male germ-line cells. It has been proposed that tumor antigens are potential tumor markers. The premise of this study is that the identification of novel OS-associated transcripts will lead to a better understanding of the events involved in OS pathogenesis and biology. We analyzed the expression of a panel of seven tumor antigens in OS samples to identify possible tumor markers. After selecting the tumor antigen expressed in most samples of the panel, gene expression profiling was used to identify osteosarcoma-associated molecular alterations. A microarray was employed because of its ability to accurately produce comprehensive expression profiles. PRAME was identified as the tumor antigen expressed in most OS samples; it was detected in 68% of the cases. Microarray results showed differences in expression for genes functioning in cell signaling and adhesion as well as extracellular matrix-related genes, implying that such tumors could indeed differ in regard to distinct patterns of tumorigenesis. The hypothesis inferred in this study was drawn mostly from available data concerning other kinds of tumors. There is circumstantial evidence that PRAME expression might be related to distinct patterns of tumorigenesis. Further investigation is needed to validate the differential expression of genes belonging to tumorigenesis-related pathways in PRAME-positive and PRAME-negative tumors.