863 results for Data sources detection
Abstract:
The main objective of this work was to develop a novel dimensionality reduction technique as part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages, based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the continuous nature of the admixtures produced in-house to be modelled as data series instead of discrete points. Maintaining the continuous structure of the data manifold enables better visualisation of the classification problem under examination and facilitates more accurate use of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP combined with the k-Nearest Neighbors (kNN) algorithm was found to outperform the other state-of-the-art pattern recognition techniques considered.
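CLPP itself is not available in common machine-learning libraries, so no faithful reproduction is attempted here. The minimal Python sketch below only illustrates the shape of the pipeline described above (a linear projection to a low-dimensional space followed by kNN classification of spectra), with PCA standing in for CLPP and purely synthetic "spectra" and adulteration levels.

```python
# Illustrative sketch only: PCA stands in for the paper's CLPP projection,
# and the "spectra" are synthetic. Real use would load Raman/FT-IR spectra.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_wavenumbers = 200, 600
# Synthetic "spectra": pure olive oil vs. oil with increasing adulterant level.
adulteration = rng.uniform(0, 0.2, size=n_samples)            # 0-20 % hazelnut oil
base = np.sin(np.linspace(0, 6 * np.pi, n_wavenumbers))
spectra = base + adulteration[:, None] * 0.5 + rng.normal(0, 0.05, (n_samples, n_wavenumbers))
labels = (adulteration > 0.05).astype(int)                    # 1 = adulterated

X_train, X_test, y_train, y_test = train_test_split(spectra, labels, random_state=0)

projector = PCA(n_components=10).fit(X_train)                 # stand-in for CLPP
knn = KNeighborsClassifier(n_neighbors=5).fit(projector.transform(X_train), y_train)
print("hold-out accuracy:", knn.score(projector.transform(X_test), y_test))
```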
Abstract:
Data mining can be defined as the extraction of implicit, previously unknown, and potentially useful information from data. Numerous researchers have been developing security technology and exploring new methods to detect cyber-attacks with the DARPA 1998 dataset for Intrusion Detection and the modified versions of this dataset, KDDCup99 and NSL-KDD, but until now no one has examined the performance of the Top 10 data mining algorithms selected by experts in data mining. The classification learning algorithms compared in this thesis are C4.5, CART, k-NN and Naïve Bayes. The performance of these algorithms is compared in terms of accuracy, error rate and average cost on modified versions of the NSL-KDD train and test datasets, where the instances are classified into normal traffic and four cyber-attack categories: DoS, Probing, R2L and U2R. Additionally, the most important features for detecting cyber-attacks across all categories and within each category are evaluated with Weka's Attribute Evaluator and ranked according to Information Gain. The results show that the classification algorithm with the best performance on the dataset is the k-NN algorithm. The most important features for detecting cyber-attacks are basic features such as the duration of a network connection in seconds, the protocol used for the connection, the network service used, the normal or error status of the connection, and the number of data bytes sent. The most important features for detecting DoS, Probing and R2L attacks are basic features, and the least important are content features; for U2R attacks, in contrast, content features are the most important.
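As a rough illustration of the comparison described above, the sketch below uses scikit-learn stand-ins: DecisionTreeClassifier with entropy and Gini criteria only approximates C4.5 and CART, mutual_info_classif approximates Weka's Information Gain ranking, and the generated data merely stands in for the preprocessed NSL-KDD set.

```python
# Rough scikit-learn analogue of the comparison described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.feature_selection import mutual_info_classif

# Placeholder data; in the thesis this would be the preprocessed NSL-KDD set.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)

models = {
    "C4.5-like": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "CART": DecisionTreeClassifier(criterion="gini", random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:12s} accuracy = {acc:.3f}  error rate = {1 - acc:.3f}")

# Feature ranking by (approximate) information gain.
gain = mutual_info_classif(X, y, random_state=0)
print("top features:", np.argsort(gain)[::-1][:5])
```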
Abstract:
Incidental findings on low-dose CT images obtained during hybrid imaging are an increasing phenomenon as CT technology advances. Understanding the diagnostic value of incidental findings along with the technical limitations is important when reporting image results and recommending follow-up, which may result in an additional radiation dose from further diagnostic imaging and an increase in patient anxiety. This study assessed lesions incidentally detected on CT images acquired for attenuation correction on two SPECT/CT systems. Methods: An anthropomorphic chest phantom containing simulated lesions of varying size and density was imaged on an Infinia Hawkeye 4 and a Symbia T6 using the low-dose CT settings applied for attenuation correction acquisitions in myocardial perfusion imaging. Twenty-two interpreters assessed 46 images from each SPECT/CT system (15 normal images and 31 abnormal images; 41 lesions). Data were evaluated using a jackknife alternative free-response receiver-operating-characteristic analysis (JAFROC). Results: JAFROC analysis showed a significant difference (P < 0.0001) in lesion detection, with the figures of merit being 0.599 (95% confidence interval, 0.568, 0.631) and 0.810 (95% confidence interval, 0.781, 0.839) for the Infinia Hawkeye 4 and Symbia T6, respectively. Lesion detection on the Infinia Hawkeye 4 was generally limited to larger, higher-density lesions. The Symbia T6 allowed improved detection rates for midsized lesions and some lower-density lesions. However, interpreters struggled to detect small (5 mm) lesions on both image sets, irrespective of density. Conclusion: Lesion detection is more reliable on low-dose CT images from the Symbia T6 than from the Infinia Hawkeye 4. This phantom-based study gives an indication of potential lesion detection in the clinical context as shown by two commonly used SPECT/CT systems, which may assist the clinician in determining whether further diagnostic imaging is justified.
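For readers unfamiliar with JAFROC, its figure of merit can be read informally as the probability that a true lesion is rated higher than the highest-rated false-positive mark on a normal image. The sketch below computes that empirical figure of merit for a single hypothetical reader with synthetic ratings; the full analysis in the study also involves jackknifing over cases and readers, which is omitted here.

```python
# Minimal sketch of an empirical JAFROC-style figure of merit for one reader:
# FOM = P(lesion rating > highest false-positive rating on a normal image),
# estimated over all (lesion, normal image) pairs; ties count as 0.5.
# Ratings below are synthetic; the real study used 41 lesions and 15 normal images.
import numpy as np

rng = np.random.default_rng(1)
lesion_ratings = rng.normal(2.0, 1.0, size=41)        # ratings given to true lesions
normal_image_max_fp = rng.normal(0.5, 1.0, size=15)   # highest false-positive rating per normal image

wins = (lesion_ratings[:, None] > normal_image_max_fp[None, :]).sum()
ties = (lesion_ratings[:, None] == normal_image_max_fp[None, :]).sum()
fom = (wins + 0.5 * ties) / (lesion_ratings.size * normal_image_max_fp.size)
print(f"figure of merit ≈ {fom:.3f}")
```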
Abstract:
In order to optimize frontal detection in sea surface temperature fields at 4 km resolution, a combined statistical and expert-based approach is applied to test different spatial smoothings of the data prior to the detection process. Fronts are usually detected at 1 km resolution using the histogram-based, single image edge detection (SIED) algorithm developed by Cayula and Cornillon in 1992, with a standard preliminary smoothing using a median filter and a 3 × 3 pixel kernel. Here, detections are performed in three study regions (off Morocco, the Mozambique Channel, and north-western Australia) and across the Indian Ocean basin using the combination of multiple windows (CMW) method developed by Nieto, Demarcq and McClatchie in 2012, which improves on the original Cayula and Cornillon algorithm. Detections at 4 km and 1 km resolution are compared. Fronts are divided into two intensity classes ("weak" and "strong") according to their thermal gradient. A preliminary smoothing is applied prior to the detection using different convolutions: three types of filters (median, average and Gaussian) combined with four kernel sizes (3 × 3, 5 × 5, 7 × 7, and 9 × 9 pixels) and three detection window sizes (16 × 16, 24 × 24 and 32 × 32 pixels), to test the effect of these smoothing combinations on reducing the background noise of the data and therefore on improving the frontal detection. The performance of the combinations on 4 km data is evaluated using two criteria: detection efficiency and front length. We find that the optimal combination of preliminary smoothing parameters for enhancing detection efficiency and preserving front length consists of a median filter, a 16 × 16 pixel window size, and a 5 × 5 pixel kernel for strong fronts or a 7 × 7 pixel kernel for weak fronts. Results show an improvement in detection performance (from largest to smallest window size) of 71% for strong fronts and 120% for weak fronts. Despite the small window used (16 × 16 pixels), the length of the fronts is preserved relative to that found with 1 km data. This optimal preliminary smoothing and the CMW detection algorithm on 4 km sea surface temperature data are then used to describe the spatial distribution of the monthly frequencies of occurrence for both strong and weak fronts across the Indian Ocean basin. In general, strong fronts are observed in coastal areas whereas weak fronts, with some seasonal exceptions, are mainly located in the open ocean. This study shows that adequate noise reduction through preliminary smoothing of the data considerably improves frontal detection efficiency as well as the overall quality of the results. Consequently, the use of 4 km data enables frontal detections similar to those obtained with 1 km data (using a standard median 3 × 3 convolution) in terms of detectability, length and location. This method, using 4 km data, is easily applicable to large regions or at the global scale with far fewer constraints on data manipulation and processing time relative to 1 km data.
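The SIED and CMW algorithms are specialized and not reproduced here, but the preliminary smoothing step being compared can be illustrated directly with scipy.ndimage. In the sketch below the SST field, the filter parameters and the gradient threshold are all illustrative, and a simple gradient magnitude stands in for the histogram-based detection.

```python
# Sketch of the preliminary smoothing step only (SIED/CMW are not reproduced).
# A synthetic SST field is smoothed with the three filter types and kernel sizes
# compared in the study.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
sst = np.cumsum(rng.normal(0, 0.1, size=(256, 256)), axis=1)   # synthetic SST field
sst += rng.normal(0, 0.2, size=sst.shape)                      # sensor-like noise

kernels = (3, 5, 7, 9)
smoothed = {("median", k): ndimage.median_filter(sst, size=k) for k in kernels}
smoothed.update({("average", k): ndimage.uniform_filter(sst, size=k) for k in kernels})
smoothed.update({("gaussian", k): ndimage.gaussian_filter(sst, sigma=k / 3.0) for k in kernels})

# A simple gradient magnitude stands in for the histogram-based SIED detection,
# just to show how smoothing reduces spurious "fronts" from background noise.
for name, k in [("median", 3), ("median", 5), ("median", 7)]:
    gy, gx = np.gradient(smoothed[(name, k)])
    strong = np.hypot(gx, gy) > 0.3          # arbitrary illustrative threshold
    print(f"{name} {k}x{k}: {strong.sum()} candidate front pixels")
```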
Abstract:
Jabuticaba (Myrciaria cauliflora Mart.) is a highly perishable fruit native to Brazil, which is consumed both fresh and industrially processed in the form of juices, jams, wines and distilled liqueurs. This processing generates a large amount of waste by-products, which represent approximately 50% of the fruit weight. The by-products are of interest for obtaining valuable bioactive compounds that could be used as nutraceuticals or functional ingredients. In this study, fermented and non-fermented jabuticaba pomaces were studied regarding their hydrophilic and lipophilic compounds, as well as their antioxidant properties, including: soluble sugars, organic acids and tocopherols (using high performance liquid chromatography coupled to refractive index, diode array and fluorescence detectors, respectively); phenolics and anthocyanins (using liquid chromatography coupled to diode array detection and mass spectrometry with electrospray ionization); and fatty acids (using gas-liquid chromatography with flame ionization detection). The analytical data demonstrated that jabuticaba pomaces are a rich source of bioactive compounds such as tocopherols, polyunsaturated fatty acids and phenolic compounds (namely hydrolyzable tannins and anthocyanins) with antioxidant potential. Therefore, jabuticaba pomace may have good potential as a functional ingredient in the production of human foods and animal feed.
Abstract:
MEGAGEO - Moving megaliths in the Neolithic is a project that aims to determine the provenance of the lithic materials used in the construction of tombs. A multidisciplinary approach is adopted, involving researchers from several fields of knowledge. This work presents a spatial data warehouse specially developed for this project, which comprises information from national archaeological databases, geographic and geological information, and new geochemical and petrographic data obtained during the project. The use of the spatial data warehouse proved to be essential in the data analysis phase of the project. The Redondo area is presented as a case study in which the spatial data warehouse is applied to analyze the relations between geochemistry, geology and the tombs in the area.
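The warehouse schema is not described in detail here, but the kind of spatial analysis it supports can be sketched with geopandas. In the hypothetical example below the file names and the columns "lithology" and "tomb_id" are assumptions, not the project's actual schema.

```python
# Hypothetical sketch of the kind of spatial query such a warehouse supports:
# relate each tomb to the geological unit it sits on, then summarise by lithology.
import geopandas as gpd

tombs = gpd.read_file("redondo_tombs.gpkg")            # point layer of tombs (assumed file)
geology = gpd.read_file("redondo_geology.gpkg")        # polygon layer of geological units (assumed file)

tombs = tombs.to_crs(geology.crs)                      # make the coordinate systems match
joined = gpd.sjoin(tombs, geology[["lithology", "geometry"]], how="left", predicate="within")
print(joined.groupby("lithology")["tomb_id"].count())  # tombs per lithological unit
```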
Abstract:
Correctness of information gathered in production environments is an essential part of quality assurance processes in many industries. This task is often performed by human operators who take visual annotations at various steps of the production flow. Depending on the task performed, the correlation between where exactly the information is gathered and what it represents is more often than not lost in the process. The lack of labeled data places a severe limit on the application of deep neural networks to object detection tasks; moreover, supervised training of deep models requires a large amount of data to be available. Reaching an adequately large collection of labeled images through classic data annotation techniques is an exhausting and costly task, and not suitable for every scenario. A possible solution is to generate synthetic data that replicates the real data and use it to fine-tune a deep neural network trained on one or more source domains to a different target domain. The purpose of this thesis is to present a real-world scenario in which the provided data were both scarce and missing the required annotations. Subsequently, a possible approach is presented in which synthetic data is generated to address those issues while serving as the training base for deep neural networks for object detection, capable of working on images taken in production-like environments. Lastly, performance is compared across different types of synthetic data and across the convolutional neural networks used as backbones for the model.
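The thesis's actual generation pipeline is not reproduced here; the sketch below only illustrates the core idea of synthetic data generation for object detection: pasting foreground object crops onto background images at random positions and recording the resulting bounding boxes as annotations. Paths, the annotation format and the assumption that crops are smaller than the backgrounds are all illustrative.

```python
# Minimal sketch: paste object crops onto backgrounds and record bounding boxes.
# Paths and the annotation format are illustrative, not the thesis's actual pipeline.
import json
import random
from pathlib import Path
from PIL import Image

def generate_synthetic_image(background_path, object_paths, out_path):
    bg = Image.open(background_path).convert("RGB")
    boxes = []
    for obj_path in object_paths:
        obj = Image.open(obj_path).convert("RGBA")
        # Assumes each crop is smaller than the background.
        x = random.randint(0, bg.width - obj.width)
        y = random.randint(0, bg.height - obj.height)
        bg.paste(obj, (x, y), obj)                       # alpha-composite the object
        boxes.append({"label": Path(obj_path).stem,
                      "bbox": [x, y, x + obj.width, y + obj.height]})
    bg.save(out_path)
    return boxes

annotations = {}
for i in range(100):
    objs = random.sample(sorted(Path("crops").glob("*.png")), k=3)
    annotations[f"synthetic_{i}.jpg"] = generate_synthetic_image(
        "backgrounds/factory_floor.jpg", objs, f"out/synthetic_{i}.jpg")
Path("out/annotations.json").write_text(json.dumps(annotations, indent=2))
```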
Abstract:
This thesis analyzes different techniques for the detection of active, constant jammers in an uplink satellite communication. The goal is to identify the presence of a jammer from a limited number of received samples. To this end, the following binary classifiers were implemented: support vector machine (SVM), multilayer perceptron (MLP), spectrum guarding and autoencoder. These machine learning algorithms depend on the features they receive as input, so particular attention was paid to feature selection. The accuracies obtained by detectors trained on different types of information were therefore compared: raw time-domain signals, statistical features, wavelet transforms and the cyclic spectrum. The patterns produced by extracting these features from the satellite signals can be high-dimensional, so before detection the following dimensionality reduction algorithms are applied: principal component analysis (PCA) and linear discriminant analysis (LDA). The purpose of this step is not to discard the least relevant features but to combine them so as to preserve as much information as possible while avoiding overfitting and underfitting. The numerical simulations showed that the cyclic spectrum provides the best features for detection but produces high-dimensional patterns, which is why dimensionality reduction was necessary. In particular, PCA extracted better information than LDA, whose accuracies were too sensitive to the type of jammer used during training. Finally, the algorithm that delivered the best performance was the multilayer perceptron, which required limited training time and achieved high accuracy values.
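The best-performing combination reported above (PCA for dimensionality reduction followed by a multilayer perceptron) can be sketched with scikit-learn as follows; the generated feature vectors merely stand in for the high-dimensional cyclic-spectrum patterns, and the network size is illustrative.

```python
# Sketch of the PCA + MLP combination on synthetic feature vectors that stand in
# for cyclic-spectrum features (jammer vs. no jammer).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=500, n_informative=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

detector = make_pipeline(
    PCA(n_components=30),                                   # dimensionality reduction
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
detector.fit(X_train, y_train)
print("detection accuracy:", detector.score(X_test, y_test))
```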
Abstract:
During the last semester of my Master's Degree in Artificial Intelligence, I carried out my internship at TXT e-Solution on the ADMITTED project. This paper describes the work done during those months. The thesis is divided into two parts, representing the two different tasks I was assigned over the course of the experience. The first part introduces the project and describes the work done on the admittedly library, maintaining the code base and writing the test suites; this work is more closely connected to the software engineering role of developing features, fixing bugs and testing. The second part describes the experiments performed on the anomaly detection task using a deep learning technique called an autoencoder; this task, on the other hand, is more closely connected to the data science role. The two tasks were not carried out simultaneously but dealt with one after the other, which is why I preferred to divide them into two separate parts of this paper.
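The project's actual autoencoder architecture is not described here; the following PyTorch sketch only illustrates the general recipe of autoencoder-based anomaly detection: train on normal data, then flag samples whose reconstruction error exceeds a threshold. The dimensions, the 3-sigma threshold rule and the data are illustrative assumptions.

```python
# Minimal sketch of autoencoder-based anomaly detection via reconstruction error.
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, n_features=32, n_latent=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 16), nn.ReLU(), nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

torch.manual_seed(0)
normal = torch.randn(1000, 32)                      # placeholder "normal" samples
model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):                                # train to reconstruct normal data only
    optimizer.zero_grad()
    loss = loss_fn(model(normal), normal)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    errors = ((model(normal) - normal) ** 2).mean(dim=1)
    threshold = errors.mean() + 3 * errors.std()    # simple 3-sigma rule
    anomaly = torch.randn(1, 32) * 4                # an obviously out-of-distribution sample
    score = ((model(anomaly) - anomaly) ** 2).mean()
    print("anomalous:", bool(score > threshold))
```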
Abstract:
Flavanones (hesperidin, naringenin, naringin, and poncirin) in industrial, hand-squeezed, and fresh in-squeeze-machine orange juices were determined by HPLC/DAD analysis using a previously described liquid-liquid extraction method. Method validation, including accuracy, was performed using recovery tests. Samples (36) collected from different Brazilian locations and brands were analyzed. Concentrations were determined using an external standard curve. The limits of detection (LOD) calculated were 0.0037, 1.87, 0.0147, and 0.0066 mg 100 g⁻¹, and the limits of quantification (LOQ) were 0.0089, 7.84, 0.0302, and 0.0200 mg 100 g⁻¹, for naringin, hesperidin, poncirin, and naringenin, respectively. The results demonstrated that hesperidin was present at the highest concentration levels, especially in the industrial orange juices; its average content was 69.85 mg 100 g⁻¹ and its concentration range 18.80-139.00 mg 100 g⁻¹. The other flavanones showed lower concentration levels: their average contents and concentration ranges were 0.019 (0.01-0.30), 0.12 (0.1-0.17), and 0.13 (0.01-0.36) mg 100 g⁻¹, respectively. The results were also evaluated using principal component analysis (PCA), a multivariate technique, which showed that poncirin, naringenin, and naringin were the principal elements contributing to the variability in the sample concentrations.
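The abstract does not state which convention was used to compute the LOD and LOQ; a common one derives them from the calibration curve as 3.3·σ/S and 10·σ/S, where S is the calibration slope and σ the residual standard deviation of the regression. The sketch below applies that convention to synthetic calibration data for illustration only.

```python
# Sketch of a common way to obtain LOD/LOQ from an external standard curve:
# LOD = 3.3 * sigma / slope and LOQ = 10 * sigma / slope, with sigma taken as the
# residual standard deviation of the calibration regression. Illustrative data only.
import numpy as np

conc = np.array([0.01, 0.05, 0.1, 0.5, 1.0, 5.0])        # standard concentrations (mg/100 g)
area = 1200.0 * conc + np.array([2.1, -1.5, 0.8, 3.0, -2.2, 1.0])   # synthetic peak areas

slope, intercept = np.polyfit(conc, area, 1)
residuals = area - (slope * conc + intercept)
sigma = residuals.std(ddof=2)                              # ddof=2: two fitted parameters

lod = 3.3 * sigma / slope
loq = 10.0 * sigma / slope
print(f"LOD = {lod:.4f} mg/100 g, LOQ = {loq:.4f} mg/100 g")
```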
Abstract:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits, depending on the quantification assays. A complication arises when these continuous repeated measures have heavy-tailed behavior. For such data structures, we propose a robust censored linear model based on the multivariate Student's t-distribution. To account for the autocorrelation among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation-maximization (EM) type algorithm is developed for computing the maximum likelihood estimates, obtaining as by-products the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus/AIDS (HIV-AIDS) study and several simulation studies.
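The damped exponential correlation (DEC) structure mentioned above sets the correlation between two within-subject errors to φ1 raised to the power |t_ij − t_ik|^φ2, which reduces to a continuous-time AR(1) when φ2 = 1. A minimal sketch of building that correlation matrix for one subject's irregular visit times follows; the parameter values and times are illustrative only.

```python
# Sketch of the damped exponential correlation (DEC) structure for irregularly
# spaced measurement times: Corr(e_ij, e_ik) = phi1 ** (|t_ij - t_ik| ** phi2).
import numpy as np

def dec_correlation(times, phi1, phi2):
    """Damped exponential correlation matrix for one subject's measurement times."""
    lags = np.abs(np.subtract.outer(times, times))
    return phi1 ** (lags ** phi2)

times = np.array([0.0, 0.3, 1.1, 2.4, 5.0])   # irregular visit times (e.g., months)
R = dec_correlation(times, phi1=0.8, phi2=0.5)
print(np.round(R, 3))
```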
Abstract:
Giardia duodenalis is a flagellate protozoan that parasitizes humans and several other mammals. Protozoan contamination has been regularly documented at important environmental sites, although most of these studies were performed at the species level. There is a lack of studies that correlate environmental contamination and clinical infections in the same region. The aim of this study is to evaluate the genetic diversity of a set of clinical and environmental samples and to use the obtained data to characterize the genetic profile of the distribution of G. duodenalis and the potential for zoonotic transmission in a metropolitan region of Brazil. The genetic assemblages and subtypes of G. duodenalis isolates obtained from hospitals, a veterinary clinic, a day-care center and important environmental sites were determined via multilocus sequence-based genotyping using three unlinked gene loci. Cysts of Giardia were detected at all of the environmental sites. Mixed assemblages were detected in 25% of the total samples, and an elevated number of haplotypes was identified. The main haplotypes were shared among the groups, and new subtypes were identified at all loci. Ten multilocus genotypes were identified: 7 for assemblage A and 3 for assemblage B. There is persistent G. duodenalis contamination at important environmental sites in the city. The identified mixed assemblages likely represent mixed infections, suggesting high endemicity of Giardia in these hosts. Most Giardia isolates obtained in this study displayed zoonotic potential. The high degree of genetic diversity in the isolates obtained from both clinical and environmental samples suggests that multiple sources of infection are likely responsible for the detected contamination events. The finding that many multilocus genotypes (MLGs) and haplotypes are shared by different groups suggests that these sources of infection may be related and indicates that there is a notable risk of human infection caused by Giardia in this region.
Abstract:
The purpose of this study was to compare the polymerization shrinkage stress of composite resins (microfilled, microhybrid and hybrid) photoactivated by quartz-tungsten halogen light (QTH) and light-emitting diode (LED). Glass rods (5.0 mm × 5.0 cm) were fabricated and had one of the surfaces air-abraded with aluminum oxide and coated with a layer of an adhesive system, which was photoactivated with the QTH unit. The glass rods were vertically assembled, in pairs, in a universal testing machine and the composites were applied to the lower rod. The upper rod was positioned 2 mm away, and an extensometer was attached to the rods. The 20 composite specimens were polymerized with either the QTH (n=10) or the LED (n=10) curing unit. Polymerization was carried out using two devices positioned on opposite sides, which were activated simultaneously for 40 s. Shrinkage stress was analyzed twice: shortly after polymerization (t40s) and 10 min later (t10min). Data were analyzed statistically by two-way ANOVA and Tukey's test (α=5%). The shrinkage stress for all composites was higher at t10min than at t40s, regardless of the activation source. Microfilled composite resins showed lower shrinkage stress values than the other composite resins. The light source had no influence on the shrinkage stress of the hybrid and microhybrid composite resins, while for the microfilled composite an effect was observed only at t10min. It may be concluded that the composition of composite resins is the factor with the strongest influence on shrinkage stress.