13 results for Materials management - Data processing

in Digital Commons at Florida International University


Relevance:

100.00% 100.00%

Publisher:

Abstract:

As massive data sets become increasingly available, people face the problem of how to process and understand these data effectively. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, because of both the large size of the data sets and their high dimensionality. Following the direction of other MapReduce-based research, this dissertation aims to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices with dimensions on the order of millions in MapReduce, based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
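The abstract does not say how its matrix multiplications are laid out in MapReduce, so the following is only a minimal single-process sketch of one common formulation: the map step emits partial products keyed by output cell, and the reduce step sums them. The sparse dictionary representation and the function names `map_phase` and `reduce_phase` are assumptions made for illustration, not the dissertation's implementation.

```python
from collections import defaultdict

def map_phase(A, B):
    """Emit ((i, j), partial product) pairs for C = A x B.

    A and B are sparse matrices stored as {(row, col): value} dicts
    (an assumed representation, chosen to keep the sketch short).
    """
    # Index B by row so each A entry only meets the B entries it multiplies.
    b_by_row = defaultdict(list)
    for (k, j), b in B.items():
        b_by_row[k].append((j, b))
    for (i, k), a in A.items():
        for j, b in b_by_row[k]:
            yield (i, j), a * b

def reduce_phase(pairs):
    """Sum all partial products that share the same output key (i, j)."""
    C = defaultdict(float)
    for key, value in pairs:
        C[key] += value
    return dict(C)

# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]] in sparse form.
A = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}
B = {(0, 0): 4.0, (1, 0): 5.0, (1, 1): 6.0}
C = reduce_phase(map_phase(A, B))
```

In a real cluster the emitted pairs would be shuffled by key across machines; here a single dictionary stands in for that step.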

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The outcome of this research is an Intelligent Retrieval System for Conditions of Contract Documents. The objective of the research is to improve the method of retrieving data from a computer version of a construction Conditions of Contract document. SmartDoc, a prototype computer system, has been developed for this purpose. The system provides recommendations to aid the user in retrieving clauses from the construction Conditions of Contract document. The prototype system integrates two computer technologies: hypermedia and expert systems. Hypermedia is used to provide a dynamic way of retrieving data from the document. Expert systems technology is used to build a set of rules that activate the recommendations to aid the user while retrieving clauses. The rules are based on expert knowledge. The prototype system helps the user retrieve related clauses that are not explicitly cross-referenced but, according to expert experience, are relevant to the topic that the user is interested in.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This research presents several components encompassing the scope of the objective of Data Partitioning and Replication Management in Distributed GIS Database. Modern Geographic Information Systems (GIS) databases are often large and complicated. Therefore, data partitioning and replication management problems need to be addressed in the development of an efficient and scalable solution. Part of the research is to study the patterns of geographical raster data processing and to propose algorithms that improve the availability of such data. These algorithms and approaches target the granularity of geographic data objects as well as data partitioning in geographic databases to achieve high data availability and Quality of Service (QoS), considering distributed data delivery and processing. To achieve this goal, a dynamic, real-time approach for mosaicking digital images of different temporal and spatial characteristics into tiles is proposed. This dynamic approach reuses digital images on demand and generates mosaicked tiles only for the required region according to the user's requirements, such as resolution, temporal range, and target bands, to reduce redundancy in storage and to use available computing and storage resources more efficiently. Another part of the research pursued methods for efficiently acquiring GIS data from external heterogeneous databases and Web services, as well as end-user GIS data delivery enhancements, automation, and 3D virtual reality presentation. Vast numbers of computing, network, and storage resources available on the Internet are idle or not fully utilized. The proposed "Crawling Distributed Operating System" (CDOS) approach employs such resources and creates benefits for the hosts that lend their CPU, network, and storage resources to be used in the GIS database context. The results of this dissertation demonstrate effective ways to develop a highly scalable GIS database. The approach developed in this dissertation has resulted in the creation of the TerraFly GIS database, which is used by the US government, researchers, and the general public to facilitate Web access to remotely sensed imagery and GIS vector information.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as “histogram binning” inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data and consequently undermines the accuracy of the analysis and the eventual interpretation of the data. Researchers in the field have contended with this dilemma for many years, either resorting to hardware approaches that are rather costly and carry inherent calibration and noise effects, or developing software techniques that filter the binning effect without successfully preserving the statistical content of the original data. The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data, knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seems intricate, the concise nature of the derivations allows for a procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation.
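The abstract gives no formulas, so the sketch below only illustrates the general phenomenon, not the dissertation's patented method. It log-transforms a histogram of linear channels in two ways: by sending each channel to a single output bin, which leaves gaps and spikes, and by spreading each channel's counts over every output bin its interval covers, a generic proportional rebinning that is "non-integer and multi-channel" in a loose sense. The channel counts `N_IN` and `N_OUT` and both function names are assumptions chosen for illustration.

```python
import math

N_IN, N_OUT = 1024, 256            # assumed channel counts, for illustration only
SCALE = N_OUT / math.log(N_IN)     # maps log(1)..log(1024) onto 0..256

def integer_binning(counts):
    """Send each linear channel's counts to exactly one log-scale output bin."""
    out = [0.0] * N_OUT
    for c in range(1, N_IN):
        out[int(math.log(c) * SCALE)] += counts[c]
    return out

def fractional_binning(counts):
    """Spread each channel's counts over every output bin its interval covers."""
    out = [0.0] * N_OUT
    for c in range(1, N_IN):
        # Channel c is treated as the interval [c, c + 1) on the linear axis.
        lo, hi = math.log(c) * SCALE, math.log(c + 1) * SCALE
        b = int(lo)
        while b < hi and b < N_OUT:
            overlap = min(hi, b + 1) - max(lo, b)
            out[b] += counts[c] * overlap / (hi - lo)
            b += 1
    return out

counts = [1.0] * N_IN              # a flat input histogram
coarse = integer_binning(counts)
smooth = fractional_binning(counts)
```

With the flat input, `coarse` contains empty bins at the low end, because channel 1 and channel 2 land many output bins apart, while `smooth` fills every bin and keeps the same total count. The inner loop depends only on the channel index, so its weights could be precomputed into a lookup table, in the spirit of the real-time implementation the abstract mentions.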

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present 8 yr of long-term water quality, climatological, and water management data for 17 locations in Everglades National Park, Florida. Total phosphorus (P) concentration data from freshwater sites (typically <0.25 μmol L⁻¹, or 8 μg L⁻¹) indicate the oligotrophic, P-limited nature of this large freshwater–estuarine landscape. Total P concentrations at estuarine sites near the Gulf of Mexico (average ≈0.5 μmol L⁻¹) demonstrate the marine source for this limiting nutrient. This “upside down” phenomenon, with the limiting nutrient supplied by the ocean and not the land, is a defining characteristic of the Everglades landscape. We present a conceptual model of how the seasonality of precipitation and the management of canal water inputs control the marine P supply, and we hypothesize that seasonal variability in water residence time controls water quality through internal biogeochemical processing. Low freshwater inflows during the dry season increase estuarine residence times, enabling local processes to control nutrient availability and water quality. El Niño–Southern Oscillation (ENSO) events tend to mute the seasonality of rainfall without altering total annual precipitation inputs. The Niño3 ENSO index (which indicates an ENSO event when positive and a La Niña event when negative) was positively correlated with both annual rainfall and the ratio of dry season to wet season precipitation. This ENSO-driven disruption in seasonal rainfall patterns affected salinity patterns and tended to reduce marine inputs of P to Everglades estuaries. ENSO events also decreased dry season residence times, reducing the importance of estuarine nutrient processing. The combination of variable water management activities and interannual differences in precipitation patterns has a strong influence on nutrient and salinity patterns in Everglades estuaries.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In his discussion, Database As A Tool For Hospitality Management, William O'Brien, Assistant Professor, School of Hospitality Management at Florida International University, offers at the outset, “Database systems offer sweeping possibilities for better management of information in the hospitality industry. The author discusses what such systems are capable of accomplishing.” The author opens with a bit of background on database system development, which also gives an impression of the complexion of the rest of the article: it is a shade technical. “In early 1981, Ashton-Tate introduced dBase II. It was the first microcomputer database management processor to offer relational capabilities and a user-friendly query system combined with a fast, convenient report writer,” O’Brien informs. “When 16-bit microcomputers such as the IBM PC series were introduced late the following year, more powerful database products followed: dBase III, Friday!, and Framework. The effect on the entire business community, and the hospitality industry in particular, has been remarkable,” he further offers. Professor O’Brien offers a few anecdotal situations to illustrate how much a comprehensive database system means to a hospitality operation, especially when billing is involved. Although attitudes about computer systems, as well as the systems themselves, have changed since this article was written, there is pertinent, fundamental information to be gleaned. With regard to the loss of the personal touch when a customer is engaged with a computer system, O’Brien says, “A modern data processing system should not force an employee to treat valued customers as numbers…” He also cautions, “Any computer system that decreases the availability of the personal touch is simply unacceptable.” Regarding a system’s ability to process information, O’Brien suggests that in the past businesses were so enamored with just having an automated system that they failed to take full advantage of its capabilities. O’Brien says that a lot of savings, in time and money, went unnoticed or underappreciated. Today, everyone has an integrated system, and the wise business manager is the one who takes full advantage of all available resources. O’Brien invokes the 80/20 rule, and offers, “…the last 20 percent of results costs 80 percent of the effort. But times have changed. Everyone is automating data management, so that last 20 percent that could be ignored a short time ago represents a significant competitive differential.” The evolution of data systems takes center stage for much of the article; pitfalls also emerge.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as "histogram binning" inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data and consequently undermines the accuracy of the analysis and the eventual interpretation of the data. Researchers in the field have contended with this dilemma for many years, either resorting to hardware approaches that are rather costly and carry inherent calibration and noise effects, or developing software techniques that filter the binning effect without successfully preserving the statistical content of the original data. The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data, knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seems intricate, the concise nature of the derivations allows for a procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying Service-Level Agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications are rendering administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that would help substantially reduce data center management complexity. We specifically addressed two crucial data center operations. First, we precisely estimated the capacity requirements of client virtual machines (VMs) when renting server space in a cloud environment. Second, we proposed a systematic process to efficiently allocate physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. The benefits of accurate application performance modeling are manifold. Cloud users can size their VMs appropriately and pay only for the resources that they need; service providers can also offer a new charging model based on the VMs' performance instead of their configured sizes. As a result, clients will pay exactly for the performance they actually experience, while administrators will be able to maximize their total revenue by using application performance models and SLAs. This thesis made the following contributions. First, we identified resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment. Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Network and Support Vector Machine, to accurately model the performance of virtualized applications. Moreover, we suggested and evaluated modeling optimizations necessary to improve prediction accuracy when using these modeling tools. Third, we presented an approach to optimal VM sizing that employs the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm that maximizes the SLA-generated revenue for a data center.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method, Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
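Three of the four pooling techniques named above have short textbook definitions, sketched below for one-sided p-values from independent experiments. This is a generic illustration of those standard formulas, not the dissertation's code; the function names are assumptions, and the Logit method is left out.

```python
import math
from statistics import NormalDist

def fisher_combined(p_values):
    """Fisher's inverse chi-square method: X = -2 * sum(ln p) ~ chi2 with 2k d.f."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)   # this is X / 2
    # The chi-square survival function has a closed form for even d.f.:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

def stouffer_combined(p_values, weights=None):
    """Stouffer's Z; passing weights gives the Liptak-Stouffer weighted Z.

    Each p-value must lie strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    std = NormalDist()
    z = [std.inv_cdf(1.0 - p) for p in p_values]
    z_comb = sum(w * zi for w, zi in zip(weights, z)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return 1.0 - std.cdf(z_comb)
```

A call such as `stouffer_combined([0.04, 0.20], weights=[30, 10])` would weight the first experiment more heavily; how the weights should be chosen (for example from sample sizes) is not stated in the abstract.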

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This dissertation established a software-hardware integrated design for a multisite data repository in pediatric epilepsy. A total of 16 institutions formed a consortium for this web-based application. This innovative, fully operational web application allows users to upload and retrieve information through a unique human-computer graphical interface that is remotely accessible to all users of the consortium. A solution based on a Linux platform with MySQL and PHP (Personal Home Page) scripts was selected. Research was conducted to evaluate mechanisms to electronically transfer diverse datasets from different hospitals and collect the clinical data in concert with the related functional magnetic resonance imaging (fMRI). What was unique in the approach considered is that all pertinent clinical information about patients is synthesized, with input from clinical experts, into 4 different forms: Clinical, fMRI scoring, Image information, and Neuropsychological data entry forms. A first contribution of this dissertation was in proposing an integrated processing platform that was site and scanner independent in order to uniformly process the varied fMRI datasets and to generate comparative brain activation patterns. The data collection from the consortium complied with the IRB requirements and provides all the safeguards for security and confidentiality requirements. An fMRI-based software library was used to perform data processing and statistical analysis to obtain the brain activation maps. The Lateralization Index (LI) of healthy control (HC) subjects, in contrast to that of localization-related epilepsy (LRE) subjects, was evaluated. Over 110 activation maps were generated, and their respective LIs were computed, yielding the following groups: (a) strong right lateralization: (HC=0%, LRE=18%), (b) right lateralization: (HC=2%, LRE=10%), (c) bilateral: (HC=20%, LRE=15%), (d) left lateralization: (HC=42%, LRE=26%), (e) strong left lateralization: (HC=36%, LRE=31%). Moreover, nonlinear multidimensional decision functions were used to seek an optimal separation between typical and atypical brain activations on the basis of the demographics as well as the extent and intensity of these brain activations. The intent was not to seek the highest output measures, given the inherent overlap of the data, but rather to assess which of the many dimensions were critical in the overall assessment of typical and atypical language activations, with the freedom to select any number of dimensions and impose any degree of complexity in the nonlinearity of the decision space.
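The abstract does not define the lateralization index or the cutoffs that separate the five groups. A commonly used definition, assumed here, compares left and right hemisphere activation (for example, counts of active voxels) as LI = (L - R) / (L + R). The function name and its inputs are illustrative only.

```python
def lateralization_index(left, right):
    """Return LI = (L - R) / (L + R): +1 is fully left, -1 is fully right."""
    total = left + right
    if total == 0:
        raise ValueError("no activation in either hemisphere")
    return (left - right) / total

# Hypothetical voxel counts: 300 active on the left, 100 on the right.
example_li = lateralization_index(300, 100)
```

Mapping an LI value to groups such as "strong left" or "bilateral" would require thresholds, which the abstract does not report, so none are assumed here.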

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Accurately assessing the extent of myocardial tissue injury induced by myocardial infarction (MI) is critical to the planning and optimization of MI patient management. With this in mind, this study investigated the feasibility of using combined fluorescence and diffuse reflectance spectroscopy to characterize a myocardial infarct at the different stages of its development. An animal study was conducted using twenty male Sprague-Dawley rats with MI. In vivo fluorescence spectra at 337 nm excitation and diffuse reflectance between 400 nm and 900 nm were measured from the heart using a portable fiber-optic spectroscopic system. Spectral acquisition was performed on (1) the normal heart region, (2) the region immediately surrounding the infarct, and (3) the infarcted region, at one, two, three, and four weeks into MI development. The spectral data were divided into six subgroups according to the histopathological features associated with various degrees of myocardial tissue injury as well as various stages of myocardial tissue remodeling post infarction. Various data processing and analysis techniques were employed to recognize the representative spectral features corresponding to various histopathological features associated with myocardial infarction. The identified spectral features were used in discriminant analysis to further evaluate their effectiveness in classifying tissue injuries induced by MI. In this study, it was observed that MI induced significant alterations (p < 0.05) in the diffuse reflectance spectra, especially between 450 nm and 600 nm, from myocardial tissue within the infarcted and surrounding regions. In addition, MI induced a significant elevation in fluorescence intensities at 400 and 460 nm from the myocardial tissue from the same regions. The extent of these spectral alterations was related to the duration of the infarction. Using the spectral features identified, an effective tissue injury classification algorithm was developed, which produced a satisfactory overall classification result (87.8%). The findings of this research support the concept that optical spectroscopy represents a useful tool to non-invasively determine the in vivo pathophysiological features of a myocardial infarct and its surrounding tissue, thereby providing valuable real-time feedback to surgeons during various surgical interventions for MI.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A number of factors influence the information processing needs of organizations, particularly with respect to the coordination and control mechanisms within a hotel. The authors use a theoretical framework to illustrate alternative mechanisms that can be used to coordinate and control hotel operations.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Accurately assessing the extent of myocardial tissue injury induced by myocardial infarction (MI) is critical to the planning and optimization of MI patient management. With this in mind, this study investigated the feasibility of using combined fluorescence and diffuse reflectance spectroscopy to characterize a myocardial infarct at the different stages of its development. An animal study was conducted using twenty male Sprague-Dawley rats with MI. In vivo fluorescence spectra at 337 nm excitation and diffuse reflectance between 400 nm and 900 nm were measured from the heart using a portable fiber-optic spectroscopic system. Spectral acquisition was performed on (1) the normal heart region, (2) the region immediately surrounding the infarct, and (3) the infarcted region, at one, two, three, and four weeks into MI development. The spectral data were divided into six subgroups according to the histopathological features associated with various degrees of myocardial tissue injury as well as various stages of myocardial tissue remodeling post infarction. Various data processing and analysis techniques were employed to recognize the representative spectral features corresponding to various histopathological features associated with myocardial infarction. The identified spectral features were used in discriminant analysis to further evaluate their effectiveness in classifying tissue injuries induced by MI. In this study, it was observed that MI induced significant alterations (p < 0.05) in the diffuse reflectance spectra, especially between 450 nm and 600 nm, from myocardial tissue within the infarcted and surrounding regions. In addition, MI induced a significant elevation in fluorescence intensities at 400 and 460 nm from the myocardial tissue from the same regions. The extent of these spectral alterations was related to the duration of the infarction. Using the spectral features identified, an effective tissue injury classification algorithm was developed, which produced a satisfactory overall classification result (87.8%). The findings of this research support the concept that optical spectroscopy represents a useful tool to non-invasively determine the in vivo pathophysiological features of a myocardial infarct and its surrounding tissue, thereby providing valuable real-time feedback to surgeons during various surgical interventions for MI.