913 results for "Data-driven analysis"
Abstract:
Functional connectivity (FC), as measured by correlation between fMRI BOLD time courses of distinct brain regions, has revealed meaningful organization of spontaneous fluctuations in the resting brain. However, an increasing amount of evidence points to non-stationarity of FC; i.e., FC dynamically changes over time, reflecting rich additional information about brain organization but posing new challenges for analysis and interpretation. Here, we propose a data-driven approach based on principal component analysis (PCA) to reveal hidden patterns of coherent FC dynamics across multiple subjects. We demonstrate the feasibility and relevance of this new approach by examining the differences in dynamic FC between 13 healthy control subjects and 15 minimally disabled relapsing-remitting multiple sclerosis patients. We estimated whole-brain dynamic FC of regionally-averaged BOLD activity using sliding time windows. We then used PCA to identify FC patterns, termed "eigenconnectivities", that reflect meaningful patterns in FC fluctuations. We then assessed the contributions of these patterns to the dynamic FC at any given time point and identified a network of connections centered on the default-mode network whose contribution is altered in patients. Our results complement traditional stationary analyses and reveal novel insights into brain connectivity dynamics and their modulation in a neurodegenerative disease.
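As a rough illustration of the pipeline this abstract describes, the sliding-window/PCA step can be sketched with NumPy on synthetic data. All dimensions, window lengths and the random signal below are stand-ins, not the study's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 5 regions, 200 time points, 30-sample windows
# shifted by 5 samples (illustrative values only).
n_regions, n_time, win, step = 5, 200, 30, 5
bold = rng.standard_normal((n_time, n_regions))   # stand-in for regional BOLD

# 1) Sliding-window FC: one vectorised correlation matrix per window.
iu = np.triu_indices(n_regions, k=1)              # unique region pairs
fc = np.array([np.corrcoef(bold[s:s + win].T)[iu]
               for s in range(0, n_time - win + 1, step)])

# 2) PCA on the window-by-connection matrix: the right singular vectors
#    play the role of "eigenconnectivities" (FC patterns); projecting each
#    window onto them gives its time-resolved contributions.
fc_centered = fc - fc.mean(axis=0)
_, svals, vt = np.linalg.svd(fc_centered, full_matrices=False)
eigenconnectivities = vt
weights = fc_centered @ vt.T
```

In the study's terms, the rows of `eigenconnectivities` would be the FC patterns, and `weights` their contributions at each time window.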
Abstract:
OBJECTIVE: To develop a provisional definition for the evaluation of response to therapy in juvenile dermatomyositis (DM) based on the Paediatric Rheumatology International Trials Organisation juvenile DM core set of variables. METHODS: Thirty-seven experienced pediatric rheumatologists from 27 countries achieved consensus on 128 difficult patient profiles as clinically improved or not improved using a stepwise approach (patient's rating, statistical analysis, definition selection). Using the physicians' consensus ratings as the "gold standard measure," chi-square, sensitivity, specificity, false-positive and false-negative rates, area under the receiver operating characteristic curve, and kappa agreement for candidate definitions of improvement were calculated. Definitions with kappa values >0.8 were multiplied by the face validity score to select the top definitions. RESULTS: The top definition of improvement was at least 20% improvement from baseline in 3 of 6 core set variables with no more than 1 of the remaining worsening by more than 30%, which cannot be muscle strength. The second-highest scoring definition was at least 20% improvement from baseline in 3 of 6 core set variables with no more than 2 of the remaining worsening by more than 25%, which cannot be muscle strength (definition P1 selected by the International Myositis Assessment and Clinical Studies group). The third is similar to the second with the maximum amount of worsening set to 30%. This indicates convergent validity of the process. CONCLUSION: We propose a provisional data-driven definition of improvement that reflects well the consensus rating of experienced clinicians, which incorporates clinically meaningful change in core set variables in a composite end point for the evaluation of global response to therapy in juvenile DM.
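The top definition is a simple rule that can be stated as code. The sketch below assumes every core set variable is scaled so that higher values mean worse disease (the real core set mixes directions), and the variable names are hypothetical:

```python
def improved(baseline, followup, muscle_key="muscle_strength",
             min_improved=3, max_worsened=1, worsen_threshold=0.30):
    """Hedged sketch of the top definition: >=20% improvement from baseline
    in at least 3 of 6 core set variables, with at most 1 of the remaining
    variables worsening by more than 30%, which cannot be muscle strength.
    Assumes higher values = worse disease; names are hypothetical."""
    change = {k: (followup[k] - baseline[k]) / baseline[k] for k in baseline}
    better = [k for k, c in change.items() if c <= -0.20]   # >=20% improvement
    worse = [k for k, c in change.items() if c > worsen_threshold]
    return (len(better) >= min_improved
            and len(worse) <= max_worsened
            and muscle_key not in worse)
```

Under definition P1 one would instead pass `max_worsened=2, worsen_threshold=0.25`.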
Abstract:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semi-supervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Abstract:
In the analysis of multivariate categorical data, typically the analysis of questionnaire data, it is often advantageous, for substantive and technical reasons, to analyse a subset of response categories. In multiple correspondence analysis, where each category is coded as a column of an indicator matrix or a row and column of the Burt matrix, it is not correct to simply analyse the corresponding submatrix of data, since the whole geometric structure is different for the submatrix. A simple modification of the correspondence analysis algorithm allows the overall geometric structure of the complete data set to be retained while calculating the solution for the selected subset of points. This strategy is useful for analysing patterns of response amongst any subset of categories and relating these patterns to demographic factors, especially for studying patterns of particular responses such as missing and neutral responses. The methodology is illustrated using data from the International Social Survey Program on Family and Changing Gender Roles in 1994.
Abstract:
It is shown how correspondence analysis may be applied to a subset of response categories from a questionnaire survey, for example the subset of undecided responses or the subset of responses for a particular category. The idea is to maintain the original relative frequencies of the categories and not re-express them relative to totals within the subset, as would normally be done in a regular correspondence analysis of the subset. Furthermore, the masses and chi-square metric assigned to the data subset are the same as those in the correspondence analysis of the whole data set. This variant of the method, called Subset Correspondence Analysis, is illustrated on data from the ISSP survey on Family and Changing Gender Roles.
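The key idea, keeping full-table masses and chi-square metric while decomposing only the chosen columns, can be sketched in a few lines of NumPy. This is a simplified reading of the method, not the authors' implementation:

```python
import numpy as np

def subset_ca(N, cols):
    """Subset CA sketch: row/column masses and the chi-square metric are
    computed from the FULL contingency table N; only the matrix of
    standardized residuals is restricted to the column subset before SVD."""
    P = N / N.sum()
    r = P.sum(axis=1)                                   # row masses (full table)
    c = P.sum(axis=0)                                   # column masses (full table)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sv, _ = np.linalg.svd(S[:, cols], full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]         # principal row coordinates
    return row_coords, sv
```

Restricting `S` only after the full-table geometry is fixed is what distinguishes this from a regular CA of the submatrix.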
Abstract:
We show the equivalence between the use of correspondence analysis (CA) of concatenated tables and the application of a particular version of conjoint analysis called categorical conjoint measurement (CCM). The connection is established using canonical correlation (CC). The second part introduces interaction effects in all three variants of the analysis and shows how to pass between the results of each analysis.
Abstract:
In recent years there has been explosive growth in the development of adaptive and data-driven methods. One efficient, data-driven approach is based on statistical learning theory (SLT) (Vapnik 1998). The theory rests on the Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we try not only to reduce the training error, i.e., to fit the available data with a model, but also to reduce the complexity of the model and thereby the generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the SRM inductive principle. A recent methodology based on SRM is Support Vector Machines (SVM). At present SLT is still under intensive development and SVM are finding new areas of application (www.kernel-machines.org). SVM develop robust, nonlinear data models with excellent generalisation abilities, which is very important both for monitoring and forecasting. SVM are extremely good when the input space is high-dimensional and the training data set is not big enough to develop a corresponding nonlinear model. Moreover, SVM use only support vectors to derive decision boundaries. This opens the way to sampling optimisation, estimation of noise in data, quantification of data redundancy, etc. A presentation of SVM for spatially distributed data is given in (Kanevski and Maignan 2004).
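A minimal sketch of the SRM trade-off described above: a regularized hinge loss fitted by subgradient descent, where the `lam * w` term is the model-complexity part and the hinge term is the training error. This is a toy illustration, not a production SVM:

```python
import numpy as np

def linear_svm(X, y, lam=0.01, epochs=100, lr=0.1):
    """Toy linear SVM via subgradient descent on the regularized hinge
    loss: lam * ||w||^2 penalizes model complexity (SRM), the hinge term
    penalizes training error. Labels y must be +1/-1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        viol = y * (X @ w + b) < 1                # margin violations
        if viol.any():
            gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
            gb = -y[viol].sum() / n
        else:
            gw, gb = lam * w, 0.0
        w -= (lr / t) * gw                        # decaying step size
        b -= (lr / t) * gb
    return w, b
```

Points with margin below 1 are the only ones that contribute to the update, mirroring the role of support vectors in the dual formulation.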
Abstract:
This study investigated the spatial, spectral, temporal and functional properties of functional brain connections involved in the concurrent execution of unrelated visual perception and working memory tasks. Electroencephalography data were analysed using a novel data-driven approach assessing source coherence at the whole-brain level. Three connections in the beta-band (18-24 Hz) and one in the gamma-band (30-40 Hz) were modulated by dual-task performance. Beta-coherence increased within two dorsofrontal-occipital connections in dual-task conditions compared to the single-task condition, with the highest coherence seen during low working memory load trials. In contrast, beta-coherence in a prefrontal-occipital functional connection and gamma-coherence in an inferior frontal-occipitoparietal connection were not affected by the addition of the second task and only showed elevated coherence under high working memory load. Analysis of coherence as a function of time suggested that the dorsofrontal-occipital beta-connections were relevant to working memory maintenance, while the prefrontal-occipital beta-connection and the inferior frontal-occipitoparietal gamma-connection were involved in top-down control of concurrent visual processing. The fact that the increase in coherence in the gamma-connection from low to high working memory load was negatively correlated with reaction time on the perception task (i.e., higher coherence accompanied faster responses) supports this interpretation. Together, these results demonstrate that dual-task demands trigger non-linear changes in functional interactions between frontal-executive and occipitoparietal-perceptual cortices.
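Band-averaged magnitude-squared coherence, the basic quantity behind this analysis, can be illustrated with SciPy on synthetic signals. The sampling rate, band edges and signals below are illustrative assumptions, not the study's settings:

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(1)
fs = 250.0                                  # assumed sampling rate (Hz)
t = np.arange(0, 8, 1 / fs)

# Two hypothetical sources sharing a 20 Hz (beta) component plus noise.
shared = np.sin(2 * np.pi * 20 * t)
x = shared + 0.5 * rng.standard_normal(t.size)
y = shared + 0.5 * rng.standard_normal(t.size)

# Welch-based magnitude-squared coherence, then average over the beta band.
f, cxy = coherence(x, y, fs=fs, nperseg=256)
beta = (f >= 18) & (f <= 24)
beta_coherence = cxy[beta].mean()
```

Coherence lies between 0 and 1; a shared oscillation in a band raises the band average well above the noise floor.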
Abstract:
One of the key challenges in the field of nanoparticle (NP) analysis is producing reliable and reproducible characterisation data for nanomaterials. This study examines reproducibility using a relatively new, but rapidly adopted, technique, Nanoparticle Tracking Analysis (NTA), on a range of particle sizes and materials in several different media. It describes the protocol development and presents both the data and the analysis of results obtained from 12 laboratories, mostly based in Europe, which are primarily QualityNano members. QualityNano is an EU FP7-funded Research Infrastructure that integrates 28 European analytical and experimental facilities in nanotechnology, medicine and the natural sciences with the goal of developing and implementing best practice and quality in all aspects of nanosafety assessment. This study looks at both the development of the protocol and how this leads to highly reproducible results amongst participants. In this study, the parameter being measured is the modal particle size.
Abstract:
Due to the existence of free software and pedagogical guides, the use of data envelopment analysis (DEA) has been further democratized in recent years. Nowadays, it is quite usual for practitioners and decision makers with little or no knowledge of operational research to run their own efficiency analyses. Within DEA, several alternative models allow for an environment adjustment. Five alternative models, each easily accessible to and achievable by practitioners and decision makers, are applied to the empirical case of the 90 primary schools of the State of Geneva, Switzerland. As the State of Geneva practices an upstream positive-discrimination policy towards disadvantaged schools, this empirical case is particularly appropriate for an environment adjustment. The majority of the alternative DEA models deliver divergent results. This is a matter of concern for applied researchers and a matter of confusion for practitioners and decision makers. From a political standpoint, these diverging results could lead to potentially opposite decisions.
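For readers who want to see what one such model computes, an input-oriented, constant-returns-to-scale DEA efficiency score can be obtained from a small linear program. This is a textbook CCR sketch via `scipy.optimize.linprog`, not any of the five specific models compared in the paper:

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y, o):
    """Input-oriented CCR DEA sketch: the efficiency of unit o is the
    smallest theta such that some non-negative combination of all units
    uses at most theta * inputs_o while producing at least outputs_o."""
    n, m = X.shape                         # n units, m inputs
    _, s = Y.shape                         # s outputs
    c = np.r_[1.0, np.zeros(n)]            # minimise theta; vars = [theta, lambdas]
    A_in = np.hstack([-X[o][:, None], X.T])      # X^T lam - theta * x_o <= 0
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])  # -Y^T lam <= -y_o
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[o]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.x[0]
```

An efficient unit scores 1; a unit that could produce its outputs with half its inputs scores 0.5. Environment-adjusted models modify this basic program in different ways, which is precisely where the divergence discussed above arises.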
Abstract:
This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.
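A toy version of the idea, several kernels at different spatial scales combined with learned weights inside a kernel ridge regression, might look as follows. The norm-based weight update is a common simple heuristic, not the exact MKL algorithm used in the paper:

```python
import numpy as np

def rbf(X, Z, s):
    """Gaussian RBF kernel with length scale s."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s ** 2))

def mkl_ridge(X, y, scales, lam=1e-2, iters=10):
    """Toy MKL sketch: alternate between kernel ridge regression on the
    weighted kernel sum and a norm-based re-weighting of the kernels."""
    Ks = [rbf(X, X, s) for s in scales]
    d = np.ones(len(Ks)) / len(Ks)          # kernel weights, start uniform
    n = len(y)
    for _ in range(iters):
        K = sum(w * Kk for w, Kk in zip(d, Ks))
        alpha = np.linalg.solve(K + lam * np.eye(n), y)
        # weight each kernel by the norm of its share of the solution
        norms = np.array([w * np.sqrt(max(alpha @ Kk @ alpha, 0.0))
                          for w, Kk in zip(d, Ks)])
        d = norms / norms.sum()
    return d, alpha, Ks
```

The learned weights `d` indicate which spatial scales matter for the target, which is the interpretability and feature-selection benefit the abstract highlights.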
Abstract:
Flow cytometry has become a valuable tool in cell biology. By analyzing large numbers of cells individually using light-scatter and fluorescence measurements, this technique reveals both cellular characteristics and the levels of cellular components. Flow cytometry has been developed to rapidly enumerate cells and to distinguish among different cell stages and structures using multiple staining. In addition to high-speed multiparametric data acquisition, the possibilities for analysis and cell sorting, which allow further characteristics of individual cells to be studied, have increased researchers' interest in this technique. This chapter gives an overview of the principles of flow cytometry and examples of the application of the technique.
Abstract:
In its 2012 "Global Report on Trafficking in Persons", the United Nations Office on Drugs and Crime states that sexual exploitation is by far the most frequently detected form of human trafficking, accounting for 79% of cases. Link to the full report: http://www.unodc.org/documents/data-and-analysis/glotip/Trafficking_in_Persons_2012_web.pdf
Abstract:
Although extensive research has been conducted on urban freeway capacity estimation methods, minimal research has been carried out for rural highway sections, especially sections within work zones. This study attempted to fill that void by estimating the capacity of rural highway work zones in Kansas. Six work zone locations were selected for data collection and further analysis. An average of six days' worth of field data was collected, from mid-October 2013 to late November 2013, at each of these work zone sites. Two capacity estimation methods were used: the Maximum Observed 15-minute Flow Rate Method and the Platooning Method applied in 15-minute intervals. The Maximum Observed 15-minute Flow Rate Method provided an average capacity of 1469 passenger cars per hour per lane (pcphpl) with a standard deviation of 141 pcphpl, while the Platooning Method provided a maximum average capacity of 1195 pcphpl with a standard deviation of 28 pcphpl. Based on the observed data and the analysis carried out in this study, the suggested maximum capacity is 1500 pcphpl when designing work zones for rural highways in Kansas. This proposed standard value of rural highway work zone capacity could be utilized by engineers and planners to effectively mitigate congestion at or near work zones that would otherwise occur due to construction/maintenance.
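The Maximum Observed 15-minute Flow Rate Method reduces to simple binning arithmetic; a sketch under the assumption that per-minute vehicle counts are available:

```python
import numpy as np

def max_15min_flow_capacity(counts_per_minute, lanes):
    """Maximum Observed 15-minute Flow Rate Method sketch: bin per-minute
    vehicle counts into 15-minute intervals, convert each bin to an hourly
    rate per lane, and take the maximum as the capacity estimate."""
    counts = np.asarray(counts_per_minute, dtype=float)
    n_bins = counts.size // 15
    binned = counts[:n_bins * 15].reshape(n_bins, 15).sum(axis=1)
    flow_rates = binned * 4 / lanes       # veh per 15 min -> veh/h, per lane
    return flow_rates.max()
```

The multiplier 4 converts a 15-minute count to an hourly flow rate, the same conversion that yields values such as 1469 pcphpl in the study.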
Abstract:
It is well established that cancer cells can recruit CD11b(+) myeloid cells to promote tumor angiogenesis and tumor growth. Increasing interest has emerged in the identification of subpopulations of tumor-infiltrating CD11b(+) myeloid cells using flow cytometry techniques. In the literature, however, discrepancies exist on the phenotype of these cells (Coffelt et al., Am J Pathol 2010;176:1564-1576). Since flow cytometry analysis requires particular precautions for accurate sample preparation and trustworthy data acquisition, analysis, and interpretation, some discrepancies might be due to technical reasons rather than biological grounds. We used the syngeneic orthotopic 4T1 mammary tumor model in immunocompetent BALB/c mice to analyze and compare the phenotype of CD11b(+) myeloid cells isolated from peripheral blood and from tumors, using six-color flow cytometry. We report here that nonspecific antibody binding through Fc receptors and the presence of dead cells and cell doublets in tumor-derived samples combine to generate artifacts in the phenotype of tumor-infiltrating CD11b(+) subpopulations. We show that the heterogeneity of tumor-infiltrating CD11b(+) subpopulations analyzed without particular precautions was greatly reduced upon Fc block treatment and exclusion of dead cells and cell doublets. Phenotyping of tumor-infiltrating CD11b(+) cells was particularly sensitive to these parameters compared to circulating CD11b(+) cells. Taken together, our results identify Fc block treatment and the exclusion of dead cells and cell doublets as simple but crucial steps for the proper analysis of tumor-infiltrating CD11b(+) cell populations.
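The two exclusion steps the authors single out can be mimicked on raw event data; the sketch below uses hypothetical channel names and thresholds (real gating is set interactively on the actual distributions):

```python
import numpy as np

def clean_events(fsc_a, fsc_h, viability, viability_cutoff=0.5, ratio_tol=0.15):
    """Sketch of dead-cell and doublet exclusion (hypothetical channels and
    thresholds): drop dead cells via a viability-dye cutoff, and drop
    doublets as events whose FSC-H deviates from FSC-A by more than
    ratio_tol (singlets have FSC-H roughly proportional to FSC-A)."""
    fsc_a, fsc_h, viability = map(np.asarray, (fsc_a, fsc_h, viability))
    alive = viability < viability_cutoff            # low dye uptake = live cell
    singlet = np.abs(fsc_h / fsc_a - 1.0) < ratio_tol
    return alive & singlet                          # boolean mask of kept events
```

Applying such a mask before phenotyping is the kind of simple precaution the abstract argues removes spurious heterogeneity.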