918 results for Spatial analysis statistics -- Data processing


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Deep space probes are most commonly navigated using spacecraft Doppler tracking. Orbital parameters are determined from a series of repeated measurements of the frequency shift of a microwave carrier over a given integration time. Currently, both ESA and NASA operate antennas at several sites around the world to ensure the tracking of deep space probes. Only a small number of software packages are currently used to process Doppler observations. The Astronomical Institute of the University of Bern (AIUB) has recently started developing Doppler data processing capabilities within the Bernese GNSS Software. This software has been used extensively for Precise Orbit Determination of Earth-orbiting satellites using GPS data collected by on-board receivers, and for the subsequent determination of the Earth's gravity field. In this paper, we present the current status of the Doppler data modeling and orbit determination capabilities in the Bernese GNSS Software using GRAIL data. In particular, we focus on the orbit determination procedure implemented for the combined analysis of Doppler and inter-satellite Ka-band data. We show that even at this early stage of development we can achieve an accuracy of a few mHz on two-way S-band Doppler observations and of 2 µm/s on KBRR data from the GRAIL primary mission phase.
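
To put the quoted Doppler accuracy in velocity terms, the following minimal sketch converts a two-way frequency residual into an equivalent line-of-sight velocity magnitude using the first-order relation v ≈ c·Δf / (2·f). The 2.2 GHz carrier value and the neglect of the transponder turnaround ratio are assumptions made for illustration only; neither is stated in the abstract, and this is not the modeling used in the Bernese GNSS Software.

```python
C = 299_792_458.0  # speed of light in m/s


def two_way_doppler_to_velocity(delta_f_hz, carrier_hz):
    """First-order line-of-sight velocity magnitude (m/s) for a two-way shift.

    The factor 2 accounts for the signal travelling up and down the link.
    """
    return C * delta_f_hz / (2.0 * carrier_hz)


# Assumed S-band carrier frequency (hypothetical, for illustration only).
ASSUMED_S_BAND_HZ = 2.2e9

v = two_way_doppler_to_velocity(1e-3, ASSUMED_S_BAND_HZ)
print(f"1 mHz of two-way S-band Doppler ~ {v * 1e6:.1f} um/s")
```

Under these assumptions, a residual of a few mHz corresponds to a line-of-sight velocity uncertainty well below a millimetre per second.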

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A wide variety of spatial data collection efforts are ongoing throughout local, state and federal agencies, private firms and non-profit organizations. Each effort is established for a different purpose, but organizations and individuals often collect and maintain the same or similar information. The United States federal government has undertaken many initiatives, such as the National Spatial Data Infrastructure, the National Map and Geospatial One-Stop, to reduce duplicative spatial data collection and promote the coordinated use, sharing, and dissemination of spatial data nationwide. A key premise in most of these initiatives is that no national government will be able to gather and maintain more than a small percentage of the geographic data that users want. Thus, national initiatives typically depend on the cooperation of those already gathering spatial data and those using GIS to meet specific needs to help construct and maintain these spatial data infrastructures and geo-libraries for their nations (Onsrud 2001). Some of the impediments to widespread spatial data sharing are well known from directly asking GIS data producers why they are not currently involved in creating datasets in common or compatible formats, documenting their datasets in a standardized metadata format, or making their datasets more readily available to others through data clearinghouses or geo-libraries. The research described in this thesis addresses the impediments to wide-scale spatial data sharing faced by GIS data producers and explores a new conceptual data-sharing approach, the Public Commons for Geospatial Data, that supports user-friendly metadata creation, open access licenses, archival services and documentation of the parent lineage of the contributors and value-adders of digital spatial data sets.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background. The purpose of this study was to describe the risk factors and demographics of persons with salmonellosis and shigellosis and to investigate both seasonal and spatial variations in the occurrence of these infections in Texas from 2000 to 2004, using time-series analyses and geographic information system (GIS) digital mapping methods.
Methods. Spatial analysis: MapInfo software was used to map the distribution of age-adjusted rates of reported shigellosis and salmonellosis in Texas from 2000–2004 by zip code. Census data on poverty level (above or below), household income, highest level of educational attainment, race, ethnicity, and urban/rural community status were obtained from the 2000 Decennial Census for each zip code. The zip codes in the upper 10% and lower 10% of rates were compared using t-tests and logistic regression to identify potential risk factors.
Temporal analysis: Seasonal patterns in the prevalence of infections in Texas from 2000 to 2003 were determined by performing time-series analysis on the numbers of cases of salmonellosis and shigellosis. A linear regression was also performed to assess trends in the incidence of each disease, along with auto-correlation and multi-component cosinor analysis.
Results. Spatial analysis: Analysis by general linear model showed a significant association between infection rates and age, with children aged under 5 years and those aged 5–9 years at increased risk of infection for both diseases. The data demonstrated that populations with high percentages of people educated beyond high school were less likely to be represented in zip codes with high rates of shigellosis. For salmonellosis, however, logistic regression models indicated that, compared with populations with high percentages of non-high school graduates, having a high school diploma or equivalent increased the odds of a high rate of infection.
Temporal analysis: For shigellosis, multi-component cosinor analyses were used to determine the approximating cosine curve that gave a statistically significant fit to the time-series data for all age groups by sex. The shigellosis results show two peaks, with a major peak occurring in June and a secondary peak appearing around October. Salmonellosis results showed a single peak and trough in all age groups, with the peak occurring in August and the trough in February.
Conclusion. The results from this study can be used by public health agencies to determine the timing of public health awareness programs and interventions in order to prevent salmonellosis and shigellosis. Because young children depend on adults for their meals, it is important to increase the awareness of day-care workers and new parents about modes of transmission and hygienic methods of food preparation and storage.
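
The multi-component cosinor analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual linearized form, y ≈ M + Σ A_k·cos(2πt/T_k + φ_k), fitted by ordinary least squares with NumPy. The monthly counts, the 12- and 6-month periods, and the function names are hypothetical and are not taken from the study.

```python
import numpy as np


def fit_cosinor(t, y, periods):
    """Least-squares fit of y ~ M + sum_k A_k * cos(2*pi*t/T_k + phi_k).

    Each component is linearized as beta*cos(w*t) + gamma*sin(w*t),
    where beta = A*cos(phi) and gamma = -A*sin(phi).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(t)]
    for period in periods:
        omega = 2.0 * np.pi / period
        columns += [np.cos(omega * t), np.sin(omega * t)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    components = []
    for k, period in enumerate(periods):
        beta, gamma = coef[1 + 2 * k], coef[2 + 2 * k]
        amplitude = np.hypot(beta, gamma)
        acrophase = np.arctan2(-gamma, beta)
        components.append((period, amplitude, acrophase))
    return coef[0], components, design @ coef


def peak_time(period, acrophase):
    """Time within one period at which the cosine component is maximal."""
    return (-acrophase * period / (2.0 * np.pi)) % period


# Synthetic monthly case counts (hypothetical): month index 0 is January.
months = np.arange(48)
counts = (100.0
          + 30.0 * np.cos(2 * np.pi * (months - 5) / 12)
          + 10.0 * np.cos(2 * np.pi * (months - 3) / 6))

mesor, components, fitted = fit_cosinor(months, counts, periods=[12, 6])
for period, amplitude, acrophase in components:
    print(period, round(float(amplitude), 3),
          round(float(peak_time(period, acrophase)), 3))
```

In a real analysis the significance of each component would also be tested; this sketch only recovers the amplitudes and peak times.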

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Floods are the leading cause of fatalities related to natural disasters in Texas. Texas leads the nation in flash flood fatalities. From 1959 through 2009 there were three times more fatalities in Texas (840) than in the next-ranked state, Pennsylvania (265). Texas also leads the nation in flood-related injuries (7753). Flood fatalities in Texas represent a serious public health problem. This study addresses several objectives of Healthy People 2010, including reducing deaths from motor vehicle accidents (Objective 15-15), reducing nonfatal motor vehicle injuries (Objective 15-17), and reducing drownings (Objective 15-29). The study examined flood fatalities that occurred in Texas between 1959 and 2008. Flood fatality statistics were extracted from three sources: flood fatality databases from the National Climatic Data Center, the Spatial Hazard Event and Loss Database for the United States, and the Texas Department of State Health Services. The data collected for flood fatalities include the date, time, gender, age, location, and type of flood. Inconsistencies among the three databases were identified and discussed. Analysis reveals that most fatalities (77%) result from driving into flood water. Spatial analysis indicates that more fatalities occurred in counties containing major urban centers: some of the Flash Flood Alley counties (Bexar, Dallas, Travis, and Tarrant), Harris County (Houston), and Val Verde County (Del Rio). An intervention strategy targeting the behavior of driving into flood water is proposed. The intervention is based on the Health Belief Model. The main recommendation of the study is that flood fatalities in Texas can be reduced through a combination of improved hydrometeorological forecasting, educational programs aimed at enhancing public awareness of flood risk and the seriousness of flood warnings, and timely and appropriate action by local emergency and safety authorities.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Objective. The goal of this study is to characterize the current workforce of Certified Industrial Hygienists (CIHs) and the lengths of the professional careers of past and current CIHs.
Methods. This is a secondary analysis of data compiled from all of the nearly 50 annual roster listings of the American Board of Industrial Hygiene (ABIH) for Certified Industrial Hygienists active in each year since 1960. Survival analysis was used to measure the primary outcome of interest, with the Kaplan-Meier method used to estimate the survival function.
Study subjects: The population studied is all Certified Industrial Hygienists (CIHs). A CIH is defined by the ABIH as an individual who has met the minimum requirements for education and working experience and, through examination, has demonstrated a minimum level of knowledge and competency in the prevention of occupational illnesses.
Results. A Cox proportional hazards model analysis was performed by start-time cohort of CIHs, with cohort 1 chosen as the reference cohort. The estimated relative risks of the event (defined as retirement, or absence from the listing for 5 consecutive years) for cohorts 2, 3, 4 and 5 relative to cohort 1 are 0.385, 0.214, 0.234 and 0.299, respectively. The results show that cohort 2 (CIHs issued from 1970-1980) has the lowest hazard ratio, which indicates the lowest retirement rate.
Conclusion. The number of CIHs still actively practicing up to the end of 2009 increased tremendously starting in 1980 and has reached a plateau in recent decades. This indicates that the supply and demand of the profession may have reached equilibrium. More demographic information and variables are needed to predict the future number of CIHs needed.
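
The Kaplan-Meier estimate of a survival function, as named in this abstract, can be sketched in a few lines. The version below is a plain-Python illustration under stated assumptions: the "event" is leaving the roster, subjects still listed at the end of follow-up are treated as censored, and the five career lengths are invented for the example. It does not reproduce the study's data or software.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per subject (e.g. years on the roster).
    observed:  True if the event occurred, False if the subject was censored.
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        time = records[i][0]
        events = 0
        leaving = 0
        # Group all subjects whose follow-up ends at this time.
        while i < len(records) and records[i][0] == time:
            events += 1 if records[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve


# Hypothetical career lengths in years; False means still certified (censored).
years_listed = [1, 2, 2, 3, 4]
left_roster = [True, True, False, True, False]

km_curve = kaplan_meier(years_listed, left_roster)
for time, probability in km_curve:
    print(time, round(probability, 3))
```

Censored subjects reduce the number at risk without lowering the survival estimate, which is what lets the method use careers that are still in progress.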

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Objectives. The central objective of this study was to systematically examine the internal structure of multihospital systems, determining the management principles used and the performance levels achieved in medical care and administrative areas.
The Universe. The study universe consisted of short-term general American hospitals owned and operated by multihospital corporations. The corporations compared were the investor-owned (for-profit) and the voluntary multihospital systems. The individual hospital was the unit of analysis for the study.
Theoretical Considerations. Contingency theory, using selected aspects of the classical and human relations schools of thought, seemed well suited to describing multihospital organization and was used in this research.
The Study Hypotheses. The main null hypotheses generated were that there are no significant differences between the voluntary and the investor-owned multihospital sectors in their (1) hospital structures and (2) patient care and administrative performance levels.
The Sample. A stratified random sample of 212 hospitals owned by multihospital systems was selected to represent the two study sectors equally. Of the sampled hospitals approached, 90.1% responded.
The Analysis. Sixteen scales were constructed in conjunction with 16 structural variables developed from the major questions and sub-items of the questionnaire. This was followed by analysis of an additional 7 structural and 24 effectiveness (performance) measures, using frequency distributions. Finally, summary statistics and statistical tests for each variable and its sub-items were completed and recorded in 38 tables.
Study Findings. While it has been argued that there are great differences between the two sectors, this study found that, with a few exceptions, the null hypotheses of no difference in the organizational and operational characteristics of non-profit and for-profit hospitals were accepted. However, several significant differences were found in the structural variables: functional specialization and autonomy were significantly higher in the voluntary sector. Only centralization was significantly different in the investor-owned sector. Among the effectiveness measures, occupancy rate, cost of data processing, total man-hours worked, F.T.E. ratios, and personnel per occupied bed were significantly higher in the voluntary sector. The findings indicated that both voluntary and for-profit systems were converging toward a common hierarchical corporate management approach. Size and management style may characterize a specific multihospital group better than its profit or nonprofit status. (Abstract shortened with permission of author.)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Clinical Research Data Quality Literature Review and Pooled Analysis
We present a literature review and secondary analysis of data accuracy in clinical research and related secondary data uses. A total of 93 papers meeting our inclusion criteria were categorized according to the data processing methods. Quantitative data accuracy information was abstracted from the articles and pooled. Our analysis demonstrates that the accuracy associated with data processing methods varies widely, with error rates ranging from 2 errors per 10,000 files to 5019 errors per 10,000 fields. Medical record abstraction was associated with the highest error rates (70–5019 errors per 10,000 fields). Data entered and processed at healthcare facilities had comparable error rates to data processed at central data processing centers. Error rates for data processed with single entry in the presence of on-screen checks were comparable to double entered data. While data processing and cleaning methods may explain a significant amount of the variability in data accuracy, additional factors not resolvable here likely exist.
Defining Data Quality for Clinical Research: A Concept Analysis
Despite notable previous attempts by experts to define data quality, the concept remains ambiguous and subject to the vagaries of natural language. This current lack of clarity continues to hamper research related to data quality issues. We present a formal concept analysis of data quality, which builds on and synthesizes previously published work. We further posit that discipline-level specificity may be required to achieve the desired definitional clarity. To this end, we combine work from the clinical research domain with findings from the general data quality literature to produce a discipline-specific definition and operationalization for data quality in clinical research. While the results are helpful to clinical research, the methodology of concept analysis may be useful in other fields to clarify data quality attributes and to achieve operational definitions.
Medical Record Abstractor’s Perceptions of Factors Impacting the Accuracy of Abstracted Data
Medical record abstraction (MRA) is known to be a significant source of data errors in secondary data uses. Factors impacting the accuracy of abstracted data are not reported consistently in the literature. Two Delphi processes were conducted with experienced medical record abstractors to assess abstractor’s perceptions about the factors. The Delphi process identified 9 factors that were not found in the literature, and differed with the literature by 5 factors in the top 25%. The Delphi results refuted seven factors reported in the literature as impacting the quality of abstracted data. The results provide insight into and indicate content validity of a significant number of the factors reported in the literature. Further, the results indicate general consistency between the perceptions of clinical research medical record abstractors and registry and quality improvement abstractors.
Distributed Cognition Artifacts on Clinical Research Data Collection Forms
Medical record abstraction, a primary mode of data collection in secondary data use, is associated with high error rates. Distributed cognition in medical record abstraction has not been studied as a possible explanation for abstraction errors. We employed the theory of distributed representation and representational analysis to systematically evaluate cognitive demands in medical record abstraction and the extent of external cognitive support employed in a sample of clinical research data collection forms. We show that the cognitive load required for abstraction in 61% of the sampled data elements was high, exceedingly so in 9%. Further, the data collection forms did not support external cognition for the most complex data elements. High working memory demands are a possible explanation for the association of data errors with data elements requiring abstractor interpretation, comparison, mapping or calculation. The representational analysis used here can be used to identify data elements with high cognitive demands.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Detailed data on land use and land cover constitute important information for Earth system models, environmental monitoring and ecosystem services research. Global land cover products are evolving rapidly; however, there is still a lack of information particularly for heterogeneous agricultural landscapes. We censused land use and land cover field by field in the agricultural mosaic catchment Haean in South Korea. We recorded the land cover types with additional information on agricultural practice. In this paper we introduce the data, their collection and the post-processing protocol. Furthermore, because it is important to quantitatively evaluate available land use and land cover products, we compared our data with the MODIS Land Cover Type product (MCD12Q1). During the studied period, a large portion of dry fields was converted to perennial crops. Compared to our data, the forested area was underrepresented and the agricultural area overrepresented in MCD12Q1. In addition, linear landscape elements such as waterbodies were missing in the MODIS product due to its coarse spatial resolution. The data presented here can be useful for earth science and ecosystem services research.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this study, forward seismic modelling of four geological models with hydrocarbon (HC) traps was performed by the ray tracing method to produce a synthetic seismogram of each model. The aim is to identify hydrocarbon indicators (HCIs) such as bright spots, flat spots, dim spots and Bottom Simulating Reflectors (BSRs) in the synthetic seismograms. The modelling was performed in the DISCO/FOCUS 5.0 seismic data processing programme. Strong positive and negative reflection amplitudes and some artifact reflection horizons were observed on the resulting seismograms, due to rapid changes in subsurface velocity and geometry, respectively. Additionally, Amplitude-versus-Angle (AVA) curves for each HCI were calculated with the CREWES Zoeppritz Explorer programme. The AVA curves show how the reflection coefficients change with the density and the P- and S-wave velocities of each layer, such as oil-, gas-, gas hydrate- or water-saturated sediments. According to the AVA curves, an increase in reflection amplitude with the incidence angle of the seismic waves is an indicator of a hydrocarbon reservoir.
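
As a rough illustration of how an AVA curve depends on density and on P- and S-wave velocities, the sketch below uses the two-term Shuey approximation, R(θ) ≈ R0 + G·sin²θ. This is a swapped-in simplification: the study itself used the exact Zoeppritz equations via the CREWES Zoeppritz Explorer. The layer properties (a shale over a gas sand) are hypothetical values chosen only to show the trend.

```python
import math


def shuey_two_term(vp1, vs1, rho1, vp2, vs2, rho2, theta_deg):
    """Approximate P-wave reflection coefficient at incidence angle theta.

    Layer 1 is above the interface, layer 2 below. Valid for small contrasts
    and moderate angles (roughly below 30 degrees).
    """
    vp, vs, rho = (vp1 + vp2) / 2.0, (vs1 + vs2) / 2.0, (rho1 + rho2) / 2.0
    dvp, dvs, drho = vp2 - vp1, vs2 - vs1, rho2 - rho1
    intercept = 0.5 * (dvp / vp + drho / rho)
    gradient = (0.5 * dvp / vp
                - 2.0 * (vs / vp) ** 2 * (drho / rho + 2.0 * dvs / vs))
    theta = math.radians(theta_deg)
    return intercept + gradient * math.sin(theta) ** 2


# Hypothetical layer properties: velocities in m/s, densities in g/cm^3.
shale = (2700.0, 1200.0, 2.35)
gas_sand = (2300.0, 1500.0, 2.05)

ava_curve = [(angle, shuey_two_term(*shale, *gas_sand, angle))
             for angle in range(0, 35, 5)]
for angle, coefficient in ava_curve:
    print(angle, round(coefficient, 4))
```

For these assumed values the reflection coefficient is negative at normal incidence and grows in magnitude with angle, the kind of amplitude increase the abstract associates with a hydrocarbon reservoir.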

Relevance:

100.00% 100.00%

Publisher:

Abstract:

One of the main lines of research in urban economics is the behavior of the real estate market and its relationship to territorial structure. Within this context, reflecting on the meaning of urban property value and addressing its variability is a particularly important topic, given the significant role that real estate activity has played, and continues to play, in Spain. The main objective of this study is to identify the location-related factors that explain the formation of property values and account for their variability. Defining this process requires an evaluation on a territorial scale that establishes the socioeconomic, environmental and urban planning factors which structure urban development, condition the demand for housing and, therefore, shape the processes by which its value is formed. The analysis focuses on residential real estate values in coastal areas where pressure from the tourism industry has prompted large-scale transformations. The study area is the Vega Baja del Segura district, located on the Spanish Mediterranean coast in the south of the province of Alicante. Characterized by wide ecological and scenic diversity, the area has historically maintained a clear distinction between urban and rural space. This dichotomy has changed drastically in recent decades, with strong demographic and economic growth linked to the tourism and real estate sectors, which has been clearly reflected in property values. The development of the district is a clear example of the expansionary land-market policy that has taken place on the Spanish coast over the last two decades, resulting in a large increase in suburban development.
Understanding the territorial framework has made it possible to carry out a spatial variability analysis through massive data processing, as well as an econometric analysis that determines the factors valued positively and negatively by potential buyers. These relationships make it possible to establish different mathematical structures based on hedonic pricing models, which identify differential features in the economic, social and spatial spheres and their impact on property values. A territorial valuation process has also been systematized through the analysis of the concept of structural vulnerability, understood as a situation of fragility due to social and economic circumstances, both current and as a future trend. Currently, this demand structure for second homes and services has shown its fragility and has blocked the area's economic development, as investment in the real estate sector fell drastically with the global debt crisis. The process has been aggravated by a marginal industrial base into which no significant investment has been channeled and by the progressive abandonment of agricultural and livestock holdings. The tourism model is not in itself the cause of the blockage in the district's economic development; rather, the cause is the way in which it has been implemented on the Costa Blanca, with short-term land consumption, little respect for landscape and environmental aspects, and no comprehensive territorial organization. The link between vulnerability indexes and property values is not particularly significant, which indicates that future fragility trends have not been incorporated when setting the sale prices of the real estate product analyzed.
Property value shows a clear dependence on the settlement system and the conservation of environmental areas, and a clear recognition of typologies characteristic of rural areas, albeit linked to the tourism sector. At present, the continued drop in tourism demand has caused a clear change in the population and economic structure. Incorporating these changes into the specified models reveals a real collapse in values. It is possible that the stock of already-built housing is currently aimed at a potential buyer who is in decline and who is associated with territorial characteristics that no longer exist. Finding solutions that can be adapted to the existing supply implies the viability of renewing the population system or of changes at the economic level. The search for answers to these questions points to the need to rechannel development without overlooking the area's potential.
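
A hedonic pricing model of the kind referred to in this abstract is commonly specified as a regression of the logarithm of price on property and location attributes. The sketch below is a minimal illustration under that assumption, using ordinary least squares in NumPy on synthetic data. The attributes (floor area and distance to the coast), the coefficient values and the function name are hypothetical and do not come from the study.

```python
import numpy as np


def fit_hedonic(log_prices, attributes):
    """OLS fit of log(price) = b0 + b1*x1 + ... + bk*xk.

    Returns the coefficient vector [b0, b1, ..., bk].
    """
    attributes = np.asarray(attributes, dtype=float)
    design = np.column_stack([np.ones(len(log_prices)), attributes])
    coef, *_ = np.linalg.lstsq(design, np.asarray(log_prices, dtype=float),
                               rcond=None)
    return coef


# Synthetic, noiseless sample (hypothetical): 200 dwellings.
rng = np.random.default_rng(0)
floor_area_m2 = rng.uniform(50.0, 200.0, size=200)
distance_to_coast_km = rng.uniform(0.0, 20.0, size=200)
log_price = 11.0 + 0.006 * floor_area_m2 - 0.03 * distance_to_coast_km

coef = fit_hedonic(log_price,
                   np.column_stack([floor_area_m2, distance_to_coast_km]))
print(np.round(coef, 4))
```

In a semi-log specification like this one, each coefficient multiplied by 100 approximates the percentage change in price per unit change in the attribute, which is how positively and negatively valued factors can be read off the fitted model.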

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Stereo video techniques are effective for estimating the space–time wave dynamics over an area of the ocean. Indeed, a stereo camera view allows retrieval of both spatial and temporal data whose statistical content is richer than that of time series data retrieved from point wave probes. We present an application of the Wave Acquisition Stereo System (WASS) for the analysis of offshore video measurements of gravity waves in the Northern Adriatic Sea and near the southern seashore of the Crimean peninsula, in the Black Sea. We use classical epipolar techniques to reconstruct the sea surface from the stereo pairs sequentially in time, viz. as a sequence of spatial snapshots. We also present a variational approach that exploits the entire image data set to provide a global space–time imaging of the sea surface, viz. the simultaneous reconstruction of several spatial snapshots of the surface in order to guarantee continuity of the sea surface in both space and time. Analysis of the WASS measurements shows that the sea surface can be accurately estimated in space and time together, yielding associated directional spectra and wave statistics at a point in time that agree well with probabilistic models. In particular, WASS stereo imaging is able to capture typical features of the wave surface, especially the crest-to-trough asymmetry due to second-order nonlinearities, and the observed shape of large waves is fairly well described by theoretical models based on the theory of quasi-determinism (Boccotti, 2000). Further, we investigate space–time extremes of the observed stationary sea states, viz. the largest surface wave heights expected over a given area during the sea state duration. The WASS analysis provides the first experimental proof that a space–time extreme is generally larger than that observed in time via point measurements, in agreement with the predictions based on stochastic theories for global maxima of Gaussian fields.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A basic requirement of the data acquisition systems used in long-pulse fusion experiments is the real-time detection of physical events in signals. Developing such applications is usually a complex task, so a set of hardware and software tools is needed to simplify their implementation. This type of application can be implemented in ITER using fast controllers. ITER is standardizing the architectures to be used for fast controller implementation. So far, the standards chosen are PXIe architectures (based on PCIe) for the hardware and EPICS middleware for the software. This work presents a methodology for implementing data acquisition and pre-processing using FPGA-based DAQ cards and shows how to integrate these in fast controllers using EPICS.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Advances in hardware make large volumes of data available, giving rise to applications that must deliver information in near-real time, e.g., patient monitoring, health monitoring of water pipes, etc. The needs of these applications bring out the data streaming model as opposed to the store-then-process model. Whereas in the store-then-process model data are stored and queried later, in streaming systems data are processed as they arrive, producing continuous responses without ever being stored. This new view poses challenges for processing data on the fly: 1) responses must be produced continuously each time new data arrive in the system; 2) data are accessed only once and, in general, are not stored in their entirety; and 3) the per-item processing time needed to produce a response must be low. Two models exist for computing continuous responses, the evolving model and the sliding window model; the latter is a better fit for certain applications because it considers only the most recently received data rather than the entire data history. In recent years, data stream mining has focused on the evolving model. Less work has been presented for the sliding window model, since these algorithms must not only be incremental but must also delete the information that expires as the window slides, while still meeting the three challenges above. One of the fundamental tasks in data mining is clustering: given a data set, the goal is to find representative groups that provide a concise description of the set.
These clusterings are fundamental in applications such as network intrusion detection or customer segmentation in marketing and advertising. Because of the massive amounts of data that must be processed in such applications (millions of events per second), centralized solutions may be unable to meet the processing-time constraints and must resort to discarding data during load peaks. To avoid this data loss, distributed stream processing becomes necessary; in particular, clustering algorithms must be adapted to environments in which the data are distributed. In streaming, research focuses not only on designs for general tasks such as clustering, but also on the search for new approaches that are better suited to particular scenarios. For example, an ad-hoc clustering mechanism turns out to be more suitable for defending against Distributed Denial of Service (DDoS) attacks than the traditional k-means problem. This thesis aims to contribute to the streaming clustering problem in both centralized and distributed environments. We have designed a centralized clustering algorithm and shown, in an extensive evaluation, that it discovers high-quality clusters in low time compared with other state-of-the-art solutions. In addition, we have worked on a structure that significantly reduces the memory space required while keeping the error of the computations under control at all times. Our work also provides two protocols for distributing the clustering computation. Two fundamental characteristics have been analyzed: the impact on clustering quality of performing the computation in a distributed way, and the conditions required to reduce processing time relative to the centralized solution.
Finalmente, hemos desarrollado un entorno para la detección de ataques DDoS basado en agrupaciones. En este último caso, se ha caracterizado el tipo de ataques detectados y se ha desarrollado una evaluación sobre la eficiencia y eficacia de la mitigación del impacto del ataque. ABSTRACT Advances in hardware allow to collect huge volumes of data emerging applications that must provide information in near-real time, e.g., patient monitoring, health monitoring of water pipes, etc. The data streaming model emerges to comply with these applications overcoming the traditional store-then-process model. With the store-then-process model, data is stored before being consulted; while, in streaming, data are processed on the fly producing continuous responses. The challenges of streaming for processing data on the fly are the following: 1) responses must be produced continuously whenever new data arrives in the system; 2) data is accessed only once and is generally not maintained in its entirety, and 3) data processing time to produce a response should be low. Two models exist to compute continuous responses: the evolving model and the sliding window model; the latter fits best with applications must be computed over the most recently data rather than all the previous data. In recent years, research in the context of data stream mining has focused mainly on the evolving model. In the sliding window model, the work presented is smaller since these algorithms must be incremental and they must delete the information which expires when the window slides. Clustering is one of the fundamental techniques of data mining and is used to analyze data sets in order to find representative groups that provide a concise description of the data being processed. Clustering is critical in applications such as network intrusion detection or customer segmentation in marketing and advertising. 
Due to the huge amount of data that such applications must process (up to millions of events per second), centralized solutions are often unable to cope with timing restrictions and resort to load shedding, in which data is discarded during load peaks. To avoid discarding data, stream processing must be distributed; in particular, clustering algorithms must be adapted to environments where the data is distributed. In streaming, research focuses not only on designs for general tasks, such as clustering, but also on finding new approaches that better fit particular scenarios. As an example, an ad-hoc grouping mechanism turns out to be more adequate than the traditional k-means problem for defense against Distributed Denial of Service (DDoS) attacks. This thesis contributes to clustering in data streams in both centralized and distributed environments. We present a centralized clustering algorithm that discovers high-quality clusters in low time, and we compare it with existing state-of-the-art solutions in an extensive evaluation. We have also worked on a data structure that significantly reduces memory requirements while controlling, at all times, the error of the cluster statistics. We further provide two distributed clustering protocols, and analyze two key features: the impact on clustering quality when the computation is distributed, and the conditions required to reduce processing time compared to the centralized solution. Finally, with respect to ad-hoc grouping techniques, we have developed a DDoS detection framework based on clustering. We have characterized the attacks detected and evaluated the efficiency and effectiveness of mitigating the attack impact.
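The requirement that sliding-window algorithms be incremental and also delete expired information can be illustrated with a small sketch. It is not the algorithm of the thesis: the class name `SlidingWindowClusters`, the count-based window, the fixed `radius` rule for opening a new cluster and the per-cluster statistics (count and linear sum) are all assumptions chosen for brevity.

```python
from collections import deque
from math import dist
from typing import Deque, Dict, List, Tuple

Point = Tuple[float, ...]


class SlidingWindowClusters:
    """Per-cluster count and linear sum over the last `window_size` points.

    Hypothetical illustration only: a point joins the nearest centroid if
    it lies within `radius`, otherwise it opens a new cluster.
    """

    def __init__(self, window_size: int, radius: float) -> None:
        self.window_size = window_size
        self.radius = radius
        self.window: Deque[Tuple[Point, int]] = deque()  # (point, cluster id)
        self.count: Dict[int, int] = {}
        self.linear_sum: Dict[int, List[float]] = {}
        self._next_id = 0

    def centroid(self, cid: int) -> Point:
        n = self.count[cid]
        return tuple(s / n for s in self.linear_sum[cid])

    def _expire_oldest(self) -> None:
        # Deletion step: subtract the expiring point from its cluster.
        point, cid = self.window.popleft()
        self.count[cid] -= 1
        if self.count[cid] == 0:
            del self.count[cid]
            del self.linear_sum[cid]
        else:
            for i, x in enumerate(point):
                self.linear_sum[cid][i] -= x

    def add(self, point: Point) -> int:
        """Insert one point and return the id of the cluster it joined."""
        if len(self.window) == self.window_size:
            self._expire_oldest()
        nearest = min(
            self.count,
            key=lambda c: dist(point, self.centroid(c)),
            default=None,
        )
        if nearest is None or dist(point, self.centroid(nearest)) > self.radius:
            nearest = self._next_id          # open a new cluster
            self._next_id += 1
            self.count[nearest] = 0
            self.linear_sum[nearest] = [0.0] * len(point)
        # Incremental step: add the point to the cluster statistics.
        self.count[nearest] += 1
        for i, x in enumerate(point):
            self.linear_sum[nearest][i] += x
        self.window.append((point, nearest))
        return nearest
```

Each arrival touches only the statistics of one or two clusters, and each point is read once on entry and once on expiry. The window itself is still stored here, which is precisely the memory cost that a compact summary structure, such as the one mentioned in the abstract, would aim to reduce.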

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Process mineralogy provides the mineralogical information required by geometallurgists to address the inherent variation of geological data. The successful beneficiation of ores depends largely on how efficiently mineral processing can be adapted to the ore characteristics, with liberation being one of the most relevant mineralogical parameters. The liberation characteristics of ores are intimately related to mineral texture. Therefore, characterizing liberation requires identifying and quantifying the textural features that have a major bearing on it. From this point of view, grain size, bonding between mineral grains and intergrowth types are considered the most influential textural attributes. While grain size quantification is a routine output of current automated technologies, information about grain boundaries and intergrowth types is usually descriptive and difficult to quantify for inclusion in the geometallurgical model. To enable the systematic and quantitative analysis of intergrowth types within mineral particles, a new methodology based on digital image analysis has been developed. In this work, the ability of this methodology to achieve a more complete characterization of liberation is explored through the analysis of chalcopyrite in the rougher concentrate of the Kansanshi copper-gold mine (Zambia). The results show that the method provides valuable textural information for a better understanding of mineral behaviour during concentration processes. The potential of the method is enhanced by the fact that it provides data that current technologies cannot supply. This opens up new perspectives on the quantitative analysis of mineral processing performance based on textural attributes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Stereo video techniques are effective for estimating the space-time wave dynamics over an area of the ocean. Indeed, a stereo camera view allows retrieval of both spatial and temporal data whose statistical content is richer than that of time series data retrieved from point wave probes. To demonstrate this, we consider an application of the Wave Acquisition Stereo System (WASS) to the analysis of offshore video measurements of gravity waves in the Northern Adriatic Sea. In particular, we deployed WASS at the oceanographic platform Acqua Alta, off the Venice coast, Italy. Three experimental studies were performed, and the overlapping field of view of the acquired stereo images covered an area of approximately 1100 m². Analysis of the WASS measurements shows that the sea surface can be accurately estimated in space and time together, yielding associated directional spectra and wave statistics that agree well with theoretical models. From the observed wavenumber-frequency spectrum one can also predict the vertical profile of the current flow underneath the wave surface. Finally, future improvements of WASS and applications are discussed.
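The link between a wavenumber-frequency spectrum and the underlying current can be sketched with linear wave theory: a current Doppler-shifts the observed frequency of each wave component away from the still-water dispersion relation. The example below is a deliberately simplified, one-dimensional illustration that assumes deep water and a depth-uniform current, so it recovers a single current value rather than the vertical profile mentioned in the abstract. The function names and the least-squares fit are illustrative assumptions, not the WASS processing chain.

```python
import math
from typing import Sequence

G = 9.81  # gravitational acceleration in m/s^2


def deep_water_omega(k: float) -> float:
    """Still-water angular frequency (rad/s) from omega^2 = g * |k|."""
    return math.sqrt(G * abs(k))


def estimate_uniform_current(
    ks: Sequence[float], omegas: Sequence[float]
) -> float:
    """Least-squares fit of U in omega = sqrt(g * |k|) + k * U.

    `ks` are wavenumbers (rad/m) along one direction and `omegas` are the
    observed angular frequencies (rad/s) of the spectral peaks. A uniform
    current U (m/s) along that direction is assumed.
    """
    numerator = sum(k * (w - deep_water_omega(k)) for k, w in zip(ks, omegas))
    denominator = sum(k * k for k in ks)
    return numerator / denominator


if __name__ == "__main__":
    true_current = 0.3  # m/s, synthetic value used only for this demo
    ks = [0.1, 0.2, 0.5]
    omegas = [deep_water_omega(k) + k * true_current for k in ks]
    print(estimate_uniform_current(ks, omegas))
```

Estimating a depth-varying profile would require letting the fitted current depend on wavenumber, since longer waves are affected by the flow over a greater depth.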