971 results for ALS data-set
Abstract:
This dataset consists of 2D footprints of the buildings in the metropolitan Boston area, based on tiles in the orthoimage index (orthophoto quad ID: 229890, 229894, 229898, 229902, 233886, 233890, 233894, 233898, 233902, 237890, 237894, 237898, 237902, 241890, 241894, 241898, 241902, 245898, 245902). This data set was collected using 3Di's Digital Airborne Topographic Imaging System II (DATIS II). Roof height and footprint elevation attributes (derived from 1-meter resolution LIDAR (LIght Detection And Ranging) data) are included as part of each building feature. This data can be combined with other datasets to create 3D representations of buildings and the surrounding environment.
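A minimal sketch of how the height attributes might be used to prepare such footprints for 3D extrusion, assuming the layer is read with geopandas and that the roof-height and footprint-elevation fields are named ROOF_HT and FOOT_ELEV (the field names and file paths are assumptions, not documented in the record):

import geopandas as gpd

# File and field names below are hypothetical, not documented in the record.
footprints = gpd.read_file("boston_building_footprints.shp")

# Building height above ground = LIDAR roof height minus footprint elevation.
footprints["bldg_height_m"] = footprints["ROOF_HT"] - footprints["FOOT_ELEV"]

# Keep only plausible heights before handing off to a 3D renderer.
footprints = footprints[footprints["bldg_height_m"].between(2, 300)]
footprints.to_file("boston_footprints_3d_ready.shp")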
Abstract:
The present data set includes 268,127 vertical in situ fluorescence profiles obtained from several available online databases and from published and unpublished individual sources. Metadata for each profile are given in further detail in the file provided here. The majority of profiles come from the National Oceanographic Data Center (NODC) and from the fluorescence profiles acquired by Bio-Argo floats available on the Oceanographic Autonomous Observations (OAO) platform (63.7% and 12.5%, respectively).
Different modes of acquisition were used to collect the data presented in this study: (1) CTD profiles are acquired using a fluorometer mounted on a CTD-rosette; (2) OSD (Ocean Station Data) profiles are derived from water samples and are defined as low resolution profiles; (3) the UOR (Undulating Oceanographic Recorder) profiles are acquired by a
Abstract:
Acoustic and pelagic trawl data were collected during pelagic surveys carried out by IFREMER each May between 2000 and 2012 (except 2001) on the eastern continental shelf of the Bay of Biscay (Pelgas series). The acoustic data were collected with a Simrad EK60 echosounder operating at 38 kHz (beam angle at -3 dB: 7°, pulse length set to 1.024 ms). The echosounder transducer was mounted on the vessel keel, 6 m below the sea surface. The sampling design consisted of parallel transects spaced 12 nm apart, oriented perpendicular to the coastline, from 20 m to about 200 m bottom depth. The nominal sailing speed was 10 knots, reduced to about 3 knots during fishing operations. Scrutinising (species identification) of the acoustic data was done by first characterising acoustic schools by type and then linking these types to the species composition of specific trawl hauls. The data set contains nautical area backscattering values and biomass and abundance estimates for blue whiting along one-nautical-mile transect segments. Further information on the survey design, scrutinising and biomass estimation can be found in Doray et al. (2012).
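As a rough illustration of how such nautical area backscattering (NASC) values translate into abundance, the sketch below applies the standard echo-integration conversion via a target-strength-to-length relation; the b20 constant and the length and weight figures are placeholders, not values from the Pelgas series:

import math

def fish_density_per_nmi2(nasc, mean_length_cm, b20=-67.0):
    """Convert NASC (m^2 nmi^-2) to areal fish density via TS = 20*log10(L) + b20."""
    ts = 20.0 * math.log10(mean_length_cm) + b20   # target strength, dB
    sigma_bs = 10.0 ** (ts / 10.0)                 # backscattering cross-section, m^2
    sigma_sp = 4.0 * math.pi * sigma_bs            # spherical cross-section, m^2
    return nasc / sigma_sp                         # fish per nmi^2

# Placeholder inputs: 250 m^2 nmi^-2 of NASC, 25 cm mean length, 80 g mean weight.
density = fish_density_per_nmi2(nasc=250.0, mean_length_cm=25.0)
biomass_kg_per_nmi2 = density * 0.08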
Abstract:
The Arctic Ocean system is a key player in Earth's climatic changes. Its highly sensitive ice cover, the exchange of surface and deep water masses with the global ocean, and the coupling with the atmosphere interact directly with global climatic changes. The outflow of cold polar water and sea ice influences the production of deep water in the North Atlantic and controls the global ocean circulation ("the conveyor belt"). The Arctic Ocean is surrounded by the large Northern Hemisphere ice sheets, which not only affect sedimentation in the Arctic Ocean but are also thought to drive the course of glacials and interglacials. Terrigenous sediment delivered from the ice sheets by icebergs and meltwater, as well as through sea ice, is a major component of Arctic Ocean sediments. Hence, the terrigenous content of Arctic Ocean sediments is an outstanding archive for investigating changes in the paleoenvironment. Glacigenic sediments of the Canadian Arctic Archipelago and surface samples of the Arctic Ocean and the Siberian shelf regions were investigated by means of X-ray diffraction of the bulk fraction, with the aim of deciphering the source regions of distinct mineral compositions. Given the complex circumpolar geology, stable crystalline shield rocks, active and ancient fold belts including magmatic and metamorphic rocks, sedimentary rocks, and wide periglacial lowlands with permafrost provide a complete range of possible mineral combinations. Non-glaciated shelf regions mix the local input from a possible point source of a particular mineral combination with the material of the whole shelf, and thus act as a sampler of the entire region draining to the shelf. To take this into account, a literature survey was performed, and descriptions of outcropping lithologies and Arctic Ocean sediments were scanned for their mineral associations. The analyses of glacigenic and shelf sediments yielded a close relationship between their mineral composition and the adjacent source region. The most striking difference between the circumpolar source regions is the extensive outcrop of carbonate rocks in the vicinity of the Canadian Arctic Archipelago and in northern Greenland, while siliciclastic sediments dominate the Siberian shelves. In the Siberian shelf region, the eastern Kara Sea and the western Laptev Sea form a distinct region defined by high smectite, (clino-)pyroxene and plagioclase input. The source of this signal is the extensive outcrop of the Siberian trap basalts on the Putorana Plateau, which is drained by tributaries of the Yenisei and Khatanga. The eastern Laptev Sea and the East Siberian Sea can also be treated as one source region, containing a feldspar, quartz, illite, mica and chlorite association combined with the trace minerals hornblende and epidote. Franz Josef Land provides a mineral composition rich in quartz and kaolinite. The diverse rock suite of the Svalbard archipelago distributes specific mineral compositions of highly metamorphic crystalline rocks, dolomite-rich carbonate rocks, and sedimentary rocks with a higher diagenetic potential, manifested in stable newly formed diagenetic minerals and high organic maturity. To reconstruct the last 30,000 years as an example of the transition between glacial and interglacial conditions, a profile of sediment cores recovered during the RV "Polarstern" expedition ARK-VIII/3 (ARCTIC '91), together with additional sediment cores from around Svalbard, was investigated.
Besides the mineralogy of different grain-size fractions, several additional sedimentological and organo-geochemical parameters were used, and a detailed stratigraphic framework was achieved. By exploiting this data set, changes in the mineral composition of the Eurasian Basin sediments can be related to climatic changes. Certain mineral compositions can even be associated with particular transport processes, e.g. the smectite/pyroxene association with sea-ice transport from the eastern Kara Sea and the western Laptev Sea. Hence, it is possible to decipher the complex interplay between the influx of warm Atlantic waters into the southwest of the Eurasian Basin, the waxing and waning of the Svalbard/Barents Sea and Kara Sea ice sheets, the flooding of the Siberian shelf regions, and the surface and deep water circulation. Until now the Arctic Ocean was assumed to have been a rather stable system during the last 30,000 years, switching only from complete ice cover during the glacial to seasonally open waters during the interglacial. But this work, using mineral assemblages of sediment cores in the vicinity of Svalbard, revealed fast changes in the inflow of warm Atlantic water with the West Spitsbergen Current (<1,000 years), short periods of advance and retreat of the marine-based Eurasian ice sheets (1,000-3,000 years), and short melting phases (400 years?). Deglaciation of the marine-based Eurasian and the land-based North American and Greenland ice sheets was not simultaneous. This thesis postulates that the Kara Sea Ice Sheet released an early meltwater signal prior to 15,000 14C years, leading the Barents Sea Ice Sheet, while the western land-based ice sheets followed later than 13,500 14C years. The northern Eurasian Basin records the shift between iceberg and sea-ice material derived from the Canadian Arctic Archipelago and northern Greenland and material transported by sea ice and surface currents from the Siberian shelf region. The phasing of the deglaciation becomes very obvious in the dolomite and quartz/phyllosilicate records. It is also proposed that the flooding of the Laptev Sea during the Holocene is manifested in a stepwise increase of sediment input at the Lomonosov Ridge between the Eurasian and Amerasian Basins. Depending on the strength of meltwater pulses from the adjacent ice sheets, the Transpolar Drift was probably relocated; these movements are traceable through the distribution of indicator minerals. Based on the outcome of this work, bulk mineral determination qualifies as an excellent tool for paleoenvironmental reconstructions in the Arctic Ocean. The easy preparation and objective determination of bulk mineralogy provided by the QUAX software give this analysis the potential to serve as a basic measuring method preceding more time-consuming and highly specialised mineralogical investigations (e.g. clay mineralogy, heavy-mineral determination).
Abstract:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
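scikit-learn ships no mixture-of-factor-analyzers estimator, but the dimension-reduction route mentioned above is easy to sketch: project to a few principal components first, then fit the normal mixture in the reduced space (a workaround in the spirit of the abstract, not the authors' exact model):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 100))   # stand-in: few observations, many dimensions

# Reducing p >> n to a handful of components keeps the
# component-covariance estimates nonsingular.
X_low = PCA(n_components=5).fit_transform(X)
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X_low)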
Abstract:
To account for the preponderance of zero counts and the simultaneous correlation of observations, a class of zero-inflated Poisson mixed regression models can be used to accommodate within-cluster dependence. In this paper, a score test for zero-inflation is developed for assessing correlated count data with excess zeros. The sampling distribution and the power of the test statistic are evaluated by simulation studies. The results show that the test statistic performs satisfactorily under a wide range of conditions. The test procedure is further illustrated using a data set on recurrent urinary tract infections.
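For the simpler case of independent counts, the classical score test for zero-inflation against a fitted Poisson regression (van den Broek, 1995) takes only a few lines; the clustered extension developed in the paper is not reproduced here:

import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def zero_inflation_score_test(y, X):
    """van den Broek (1995) score test; assumes independent observations."""
    fit = sm.Poisson(y, X).fit(disp=0)
    lam = fit.predict(X)                    # fitted Poisson means
    p0 = np.exp(-lam)                       # model-implied P(Y = 0)
    num = (np.sum((y == 0) / p0) - len(y)) ** 2
    den = np.sum((1 - p0) / p0) - len(y) * y.mean()
    s = num / den
    return s, chi2.sf(s, df=1)              # statistic and chi-square(1) p-value

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(500, 1)))
y = rng.poisson(np.exp(0.3 + 0.5 * X[:, 1]))
print(zero_inflation_score_test(y, X))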
Abstract:
The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
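The data-augmentation pattern described here, imputing missing values at each iteration of the Gibbs sampler, can be illustrated on a deliberately small model; the sketch below uses a Gaussian regression with a missing binary covariate (a toy analogue, not the paper's multinomial hierarchical model):

import numpy as np

rng = np.random.default_rng(1)
n, sigma2, tau2, pi0 = 200, 1.0, 100.0, 0.4
x = rng.binomial(1, pi0, n).astype(float)
y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(sigma2), n)
miss = rng.random(n) < 0.2                    # 20% of the covariate unobserved
x[miss] = rng.binomial(1, pi0, miss.sum())    # crude starting imputation

for _ in range(2000):
    # 1) Draw beta | y, x: conjugate normal update (sigma2 known).
    X = np.column_stack([np.ones(n), x])
    V = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
    m = V @ (X.T @ y) / sigma2
    beta = rng.multivariate_normal(m, V)
    # 2) Re-draw each missing x_i | beta, y_i: Bernoulli full conditional.
    for i in np.where(miss)[0]:
        lik1 = pi0 * np.exp(-(y[i] - beta[0] - beta[1]) ** 2 / (2 * sigma2))
        lik0 = (1 - pi0) * np.exp(-(y[i] - beta[0]) ** 2 / (2 * sigma2))
        x[i] = rng.binomial(1, lik1 / (lik0 + lik1))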
Abstract:
Traditional vegetation mapping methods use high-cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model, using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns, with transitional gradients from one vegetation community to another, rather than the arbitrary and often unrealistic sharp boundaries that statistical methods can impose on the model. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of northeastern Australia. The paper presents the full cycle of this vegetation modelling approach, including site sampling, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation, and scale assessments of the model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly studied areas.
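A sketch of the final integration step, under the assumption that each of the separate community models yields a per-pixel probability surface and that the composite map takes the most probable community at each pixel (a plausible reading of the integration, not the paper's documented rule):

import numpy as np

n_models, height, width = 28, 400, 600
rng = np.random.default_rng(2)
prob_stack = rng.random((n_models, height, width))   # stand-in for model outputs

composite = np.argmax(prob_stack, axis=0)            # winning community per pixel
confidence = np.max(prob_stack, axis=0)              # its predicted probability
composite[confidence < 0.5] = -1                     # flag low-confidence pixels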
Abstract:
Objective: To estimate cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2-h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed with the SPSS Clementine data mining system. Decision tree learners (C5 and CART) and a method for mining association rules (the GRI algorithm) are used. Fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are the input risk factors (independent variables), while diabetes onset (the 2-h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques were tested by cross-validation (89.8%). Results: Rules produced for diabetes diagnosis are: A- GRI algorithm: (1) FPG>=108.9 mg/dl, (2) FPG>=107.1 and age>39.5 years. B- CART decision tree: FPG>=110.7 mg/dl. C- C5 decision tree learner: (1) FPG>=95.5 and 54, (2) FPG>=106 and 25.2 kg/m2, (3) FPG>=106 and =133 mg/dl. The three techniques produced rules which cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and that individual risk factors such as age and BMI should be considered in defining the new cut-off value.
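The CART step is straightforward to imitate with scikit-learn in place of SPSS Clementine; the file and column names below are illustrative, not those of the Oman survey data set:

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("oman_survey.csv")                  # hypothetical file and columns
features = ["fpg", "age", "sex", "family_history", "bmi"]
X, y = df[features], df["diabetes_onset"]

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=0)
print(cross_val_score(tree, X, y, cv=10).mean())            # cross-validated accuracy
print(export_text(tree.fit(X, y), feature_names=features))  # readable cutoff rules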
Abstract:
Visualization has proven to be a powerful and widely-applicable tool for the analysis and interpretation of data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space, it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and sub-clusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach first on a toy data set, and then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multi-phase flows in oil pipelines, and to data in 36 dimensions derived from satellite images.
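The two-level idea can be imitated with standard tools: one projection of the complete data set at the top level, then a separate projection per mixture component underneath (a crude stand-in using PCA and a Gaussian mixture, without the shared EM estimation the paper describes):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 12))                   # stand-in for the 12-D oil-flow data

top_view = PCA(n_components=2).fit_transform(X)  # top-level plot coordinates
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
sub_views = {k: PCA(n_components=2).fit_transform(X[labels == k])
             for k in range(3) if np.sum(labels == k) > 2}   # deeper-level plots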
Abstract:
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecule physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help domain experts (screening scientists, chemists, biologists, etc.) understand the data and draw meaningful conclusions. We also benchmark these principled methods against better-known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool lets domain experts explore the projections obtained from the visualization algorithms, providing facilities such as parallel-coordinate plots, magnification factors, directional curvatures, and integration with industry-standard software.
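Of the benchmarks listed, Sammon's mapping is the one least likely to be in a standard toolbox; a compact version using a general-purpose scipy optimiser in place of Sammon's original update rule might look like this (it assumes no duplicate rows, so all pairwise input distances are positive):

import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def sammon(X, n_iter=500, seed=0):
    """Minimise Sammon stress for a 2-D map of the rows of X."""
    delta = pdist(X)                              # input-space distances
    scale = delta.sum()
    def stress(flat):
        d = pdist(flat.reshape(len(X), 2))        # map-space distances
        return np.sum((delta - d) ** 2 / delta) / scale
    y0 = np.random.default_rng(seed).normal(size=len(X) * 2)
    res = minimize(stress, y0, method="L-BFGS-B", options={"maxiter": n_iter})
    return res.x.reshape(len(X), 2)

Y = sammon(np.random.default_rng(1).normal(size=(60, 10)))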
Abstract:
We analyse how the Generative Topographic Mapping (GTM) can be modified to cope with missing values in the training data. Our approach is based on an Expectation-Maximisation (EM) method which estimates the parameters of the mixture components and at the same time deals with the missing values. We incorporate this algorithm into a hierarchical GTM. We verify the method on a toy data set (using a single GTM) and a realistic data set (using a hierarchical GTM). The results show that our algorithm can help to construct informative visualisation plots, even when some of the training points are corrupted with missing values.
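The core trick, evaluating each mixture component over only the observed dimensions of each point, can be isolated in a few lines; below is a sketch of that E-step for a spherical Gaussian mixture with equal mixing weights (the GTM case additionally constrains the component means to a latent grid):

import numpy as np

def responsibilities(X, means, var):
    """X: (n, d) with NaNs for missing entries; means: (k, d); var: shared variance."""
    n, k = len(X), len(means)
    log_r = np.zeros((n, k))
    for i in range(n):
        obs = ~np.isnan(X[i])                     # observed dimensions of point i
        diff = X[i, obs] - means[:, obs]          # (k, n_observed)
        log_r[i] = -0.5 * (np.sum(diff ** 2, axis=1) / var
                           + obs.sum() * np.log(2 * np.pi * var))
    log_r -= log_r.max(axis=1, keepdims=True)     # stabilise before normalising
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

means = np.array([[0.0, 0.0], [1.0, 2.0]])
X = np.array([[0.1, np.nan], [0.9, 2.1]])
print(responsibilities(X, means, var=1.0))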
Abstract:
Hierarchical visualization systems are desirable because a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex high-dimensional data sets. We extend an existing locally linear hierarchical visualization system, PhiVis [1], in several directions: (1) we allow for non-linear projection manifolds (the basic building block is the Generative Topographic Mapping, GTM), (2) we introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree, and (3) we describe folding patterns of the low-dimensional projection manifold in the high-dimensional data space by computing and visualizing the manifold's local directional curvatures. Quantities such as magnification factors [3] and directional curvatures are helpful for understanding the layout of the non-linear projection manifold in the data space and for further refinement of the hierarchical visualization plot. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. We demonstrate the principle of the approach on a complex 12-dimensional data set and mention possible applications in the pharmaceutical industry.
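Magnification factors admit a short numerical sketch: for a mapping f from the two-dimensional latent space into data space, the local area magnification at a latent point is sqrt(det(J^T J)) for the Jacobian J of f, which can be approximated by finite differences:

import numpy as np

def magnification_factor(f, z, eps=1e-5):
    """f maps a 2-D latent point to a D-dimensional data point."""
    f0 = f(z)
    J = np.column_stack([(f(z + eps * e) - f0) / eps
                         for e in np.eye(2)])     # (D, 2) Jacobian estimate
    return np.sqrt(np.linalg.det(J.T @ J))        # local area stretch factor

# Example: an embedding of the plane into 3-D as a paraboloid.
f = lambda z: np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])
print(magnification_factor(f, np.array([0.5, -0.3])))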
Abstract:
This article is aimed primarily at eye care practitioners who are undertaking advanced clinical research and who wish to apply analysis of variance (ANOVA) to their data. ANOVA is a data analysis method of great utility and flexibility. This article describes why and how ANOVA was developed, the basic logic which underlies the method, and the assumptions that the method makes for it to be validly applied to data from clinical experiments in optometry. The application of the method to the analysis of a simple data set is then described. In addition, the methods available for making planned comparisons between treatment means and for making post hoc tests are evaluated. The problem of determining the number of replicates or patients required in a given experimental situation is also discussed.
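For readers who want to try the workflow in software, here is a minimal sketch on toy data: a one-way ANOVA followed by a Tukey HSD post hoc test (one of several post hoc options the article evaluates):

import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(4)
a, b, c = (rng.normal(mu, 1.0, 15) for mu in (10.0, 10.5, 12.0))

f_stat, p = f_oneway(a, b, c)                    # omnibus test across treatments
print(f"F = {f_stat:.2f}, p = {p:.4f}")

values = np.concatenate([a, b, c])
groups = np.repeat(["A", "B", "C"], 15)
print(pairwise_tukeyhsd(values, groups))         # which means actually differ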
Abstract:
When applying multivariate analysis techniques in information systems and social science disciplines, such as management information systems (MIS) and marketing, the assumption that the empirical data originate from a single homogeneous population is often unrealistic. When applying a causal modeling approach, such as partial least squares (PLS) path modeling, segmentation is a key issue in coping with heterogeneity in the estimated cause-and-effect relationships. This chapter presents a new PLS path modeling approach which classifies units on the basis of the heterogeneity of the estimates in the inner model. If unobserved heterogeneity significantly affects the estimated path model relationships at the aggregate data level, the methodology allows homogeneous groups of observations to be formed that exhibit distinctive path model estimates. The approach thus provides differentiated analytical outcomes that permit more precise interpretation of each segment formed. An application to a large data set from the American Customer Satisfaction Index (ACSI) substantiates the methodology's effectiveness in evaluating PLS path modeling results.
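The underlying idea, a finite mixture that assigns units to segments with their own inner-model coefficients, can be sketched for a single structural equation as an EM-fitted mixture of linear regressions (a one-equation simplification, not the chapter's full path-model method):

import numpy as np

def mixreg_em(X, y, n_iter=200, seed=5):
    """EM for a two-segment mixture of linear regressions."""
    n = len(y)
    r = np.random.default_rng(seed).random(n)
    R = np.column_stack([r, 1 - r])            # soft segment memberships
    for _ in range(n_iter):
        # M-step: weighted least squares, variance and weight per segment.
        betas, s2, w = [], [], R.mean(axis=0)
        for k in range(2):
            W = R[:, k]
            b = np.linalg.lstsq(X * W[:, None] ** 0.5, y * W ** 0.5, rcond=None)[0]
            betas.append(b)
            s2.append((W * (y - X @ b) ** 2).sum() / W.sum())
        # E-step: responsibilities from each segment's Gaussian likelihood.
        L = np.column_stack([
            w[k] * np.exp(-(y - X @ betas[k]) ** 2 / (2 * s2[k])) / np.sqrt(s2[k])
            for k in range(2)])
        R = L / L.sum(axis=1, keepdims=True)
    return betas, R

# Toy data with two segments whose slopes differ in sign.
rng = np.random.default_rng(6)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
seg = rng.integers(0, 2, 300)
y = np.where(seg == 0, 1 + 2 * X[:, 1], 1 - 2 * X[:, 1]) + rng.normal(0, 0.3, 300)
betas, R = mixreg_em(X, y)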