47 results for twitter, conversation retrieval
Abstract:
Shoeprint evidence collected from crime scenes can play an important role in forensic investigations. Usually, the analysis of shoeprints is carried out manually and is based on human expertise and knowledge. As well as being error prone, such a manual process can also be time consuming, which affects the usability and suitability of shoeprint evidence in a court of law. An automatic system for classification and retrieval of shoeprints therefore has the potential to be a valuable tool. This paper presents a solution for the automatic retrieval of shoeprints which is considerably more robust than existing solutions in the presence of geometric distortions such as scale and rotation. It addresses the issue of classifying partial shoeprints in the presence of rotation, scale and noise distortions and relies on the use of two local point-of-interest detectors whose matching scores are combined. In this work, multiscale Harris and Hessian detectors are used to select corners and blob-like structures in a scale-space representation for scale invariance, while the Scale Invariant Feature Transform (SIFT) descriptor is employed to achieve rotation invariance. The proposed technique combines the matching scores of the two detectors at the score level. Our evaluation has shown that it outperforms both individual detectors in most of our extended experiments when retrieving partial shoeprints with geometric distortions, and is clearly better than similar work published in the literature. We also demonstrate improved performance in the face of wear and tear. As a matter of fact, whilst the proposed work outperforms similar algorithms in the literature, it is shown that achieving good retrieval performance does not depend on acquiring a full print from a crime scene, as a partial print can still be used to attain retrieval results comparable to those obtained with the full print. This gives crime investigators more flexibility in choosing the parts of a print to search for in a database of footwear.
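The score-level combination mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the min-max normalization and the weighted sum used here are assumptions, and the names `min_max_normalize`, `fuse_scores`, `rank_database` and the `weight` parameter are hypothetical. Each list holds one matching score per database print, with higher meaning a better match.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so both detectors are on a comparable range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no ranking information, map everything to 0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(harris_scores: Sequence[float],
                hessian_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted sum of normalized scores (an assumed fusion rule)."""
    if len(harris_scores) != len(hessian_scores):
        raise ValueError("both detectors must score the same database prints")
    h = min_max_normalize(harris_scores)
    b = min_max_normalize(hessian_scores)
    return [weight * x + (1.0 - weight) * y for x, y in zip(h, b)]


def rank_database(fused: Sequence[float]) -> List[int]:
    """Return database indices ordered from best to worst fused score."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

With equal weights, a print that scores moderately under one detector and highly under the other can outrank a print that only one detector favours, which is the usual motivation for fusing at the score level.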
Abstract:
This paper presents a machine learning approach to sarcasm detection on Twitter in two languages – English and Czech. Although there has been some research in sarcasm detection in languages other than English (e.g., Dutch, Italian, and Brazilian Portuguese), our work is the first attempt at sarcasm detection in the Czech language. We created a large Czech Twitter corpus consisting of 7,000 manually-labeled tweets and provide it to the community. We evaluate two classifiers with various combinations of features on both the Czech and English datasets. Furthermore, we tackle the issues of rich Czech morphology by examining different preprocessing techniques. Experiments show that our language-independent approach significantly outperforms adapted state-of-the-art methods in English (F-measure 0.947) and also represents a strong baseline for further research in Czech (F-measure 0.582).
Abstract:
The increasing popularity of the social networking service Twitter has made it more involved in day-to-day communications, strengthening social relationships and information dissemination. Conversations on Twitter are now being explored as indicators within early warning systems to alert of imminent natural disasters such as earthquakes and to aid prompt emergency responses to crime. Producers are privileged to have limitless access to market perception from consumer comments on social media and microblogs. Targeted advertising can be made more effective based on user profile information such as demography, interests and location. While these applications have proven beneficial, the ability to effectively infer the location of Twitter users has even greater value. However, accurately identifying where a message originated or the author's location remains a challenge, which drives research in this area. In this paper, we survey a range of techniques applied to infer the location of Twitter users, from the earliest approaches to the state of the art. We find significant improvements over time in granularity and accuracy, driven by refinements to algorithms and the inclusion of more spatial features.
Abstract:
Epidermal growth factor receptor pathway substrate clone 15 (Eps15) is a protein implicated in endocytosis, endosomal protein sorting, and cytoskeletal organization. Its role is, however, still unclear, because of reasons including limitations of dominant-negative experiments and apparent redundancy with other endocytic proteins. We generated Drosophila eps15-null mutants and show that Eps15 is required for proper synaptic bouton development and normal levels of synaptic vesicle (SV) endocytosis. Consistent with a role in SV endocytosis, Eps15 moves from the center of synaptic boutons to the periphery in response to synaptic activity. The endocytic protein, Dap160/intersectin, is a major binding partner of Eps15, and eps15 mutants phenotypically resemble dap160 mutants. Analyses of eps15 dap160 double mutants suggest that Eps15 functions in concert with Dap160 during SV endocytosis. Based on these data, we hypothesize that Eps15 and Dap160 promote the efficiency of endocytosis from the plasma membrane by maintaining high concentrations of multiple endocytic proteins, including dynamin, at synapses.
Abstract:
A RkNN query returns all objects whose nearest k neighbors contain the query object. In this paper, we consider RkNN query processing in the case where the distances between attribute values are not necessarily metric. Dissimilarities between objects could then be a monotonic aggregate of dissimilarities between their values, such aggregation functions being specified at query time. We outline real world cases that motivate RkNN processing in such scenarios. We consider the AL-Tree index and its applicability in RkNN query processing. We develop an approach that exploits the group level reasoning enabled by the AL-Tree in RkNN processing. We evaluate our approach against a Naive approach that performs sequential scans on contiguous data and an improved block-based approach that we provide. We use real-world datasets and synthetic data with varying characteristics for our experiments. This extensive empirical evaluation shows that our approach is better than existing methods in terms of computational and disk access costs, leading to significantly better response times.
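
To make the RkNN definition in the abstract above concrete, here is a minimal sketch of a brute-force sequential scan. It is not the AL-Tree method, nor necessarily the paper's Naive baseline. The per-attribute dissimilarity (absolute difference) and the function names are assumptions chosen for illustration; the monotonic aggregate is passed in at query time, as the abstract describes.

```python
from typing import Callable, Iterable, List, Sequence

Obj = Sequence[float]
Aggregate = Callable[[Iterable[float]], float]


def dissimilarity(a: Obj, b: Obj, aggregate: Aggregate = sum) -> float:
    """Aggregate per-attribute dissimilarities with a monotonic function.

    The absolute difference per attribute is an assumed placeholder; any
    per-attribute dissimilarity (metric or not) could be substituted.
    """
    return aggregate(abs(x - y) for x, y in zip(a, b))


def naive_rknn(objects: Sequence[Obj], query: Obj, k: int,
               aggregate: Aggregate = sum) -> List[int]:
    """Return indices of objects that have the query among their k nearest."""
    result = []
    for i, obj in enumerate(objects):
        d_query = dissimilarity(obj, query, aggregate)
        # Count the other objects that are strictly closer to obj than the query.
        closer = sum(
            1
            for j, other in enumerate(objects)
            if j != i and dissimilarity(obj, other, aggregate) < d_query
        )
        # The query is within obj's k nearest if fewer than k objects beat it.
        if closer < k:
            result.append(i)
    return result
```

Passing `aggregate=max` instead of `sum` changes the aggregation function without touching the scan itself. The scan is quadratic in the number of objects, which is the cost an index such as the AL-Tree is meant to reduce.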