In the morning there was a keynote by Lars Rasmussen, who got his PhD at the University of Edinburgh and then worked at his own start-up, which was bought by Google. At Google he worked on Google Wave, and now he is employed at Facebook, where he works on Facebook Graph Search with a natural language interface. In his keynote he presented the new natural-language Graph Search with some interesting queries, e.g. “Photos of my friends who knit”, “Photos of my friends from national parks”, “Restaurants in Sofia by locals”, etc. He also pointed out some problems with how the system was understood by Facebook employees and by the public. For example, how would you select people who knit? Their idea was to support the obvious query “People who like Knitting”, but the public came up with queries like “People who knit”, “knitters”, etc. He also presented how the system has developed since it started two years ago and gave a high-level overview of its architecture. Next to the obvious NLP parts, they also introduced a “de-sillyfication” step before any further NLP processing. Interestingly, some engineers from his team were also present to answer the more technical questions at the end of the talk.
In the first session I attended the following talks: A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation by Zhenhua Tian, Hengheng Xiang, Ziqi Liu and Qinghua Zheng; ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling by Egoitz Laparra and German Rigau; and Cross-lingual Transfer of Semantic Role Labeling Models by Mikhail Kozhevnikov and Ivan Titov.
During the lunch break I met Pavlina Ivanova and her friend from the association of doctoral candidates in Bulgaria. We had a really nice talk, and tomorrow evening we are planning to walk around the center of Sofia.
In the afternoon I attended:
Argument Inference from Relevant Event Mentions in Chinese Argument Extraction by Peifeng Li, Qiaoming Zhu and Guodong Zhou
Fine-grained Semantic Typing of Emerging Entities by Ndapandula Nakashole, Tomasz Tylenda and Gerhard Weikum
The talk was about how to detect emerging entities, i.e. entities that are not yet in a knowledge base (KB) but may become popular, and how to assign them fine-grained semantic types. For out-of-KB entity detection, they focus on noun phrases, which can represent (1) a class or general concept rather than an entity, (2) an entity already known in the KB, (3) a new name for an entity already in the KB, or (4) a new entity that is unknown to the KB. They use PATTY, which has a collection of 300,000 synsets of typed relational patterns, e.g. <musician> released <album>, <music band> released <album> and <company> released <product>. Using PATTY, they propose a probabilistic weight model P(t1, t2 | p) = P(t1, t2, p) / P(p), where p is a pattern of the form <t1> p <t2> and t1, t2 are the types of its two arguments. If one of the entities is already known in the KB (say, the second one, with type t2), the denominator can instead be P(p, t2).
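To make the weight model above a bit more concrete, here is a minimal sketch that estimates P(t1, t2 | p) and, for a known second argument, P(t1 | p, t2) = P(t1, t2, p) / P(p, t2) by plain maximum-likelihood counting over (t1, p, t2) triples. This is not the authors' implementation: the class and method names are my own, and the triples and their counts are made up just to exercise the formulas.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (type of first argument, relational pattern, type of second argument)
Triple = Tuple[str, str, str]


class TypedPatternModel:
    """Maximum-likelihood estimates over typed pattern occurrences (toy sketch)."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.joint: Counter = Counter()       # count(t1, p, t2)
        self.pattern: Counter = Counter()     # count(p)
        self.pattern_t2: Counter = Counter()  # count(p, t2)
        for t1, p, t2 in triples:
            self.joint[(t1, p, t2)] += 1
            self.pattern[p] += 1
            self.pattern_t2[(p, t2)] += 1

    def p_types_given_pattern(self, t1: str, t2: str, p: str) -> float:
        # P(t1, t2 | p) = P(t1, t2, p) / P(p); the total count cancels out.
        n = self.pattern[p]
        return self.joint[(t1, p, t2)] / n if n else 0.0

    def p_type_given_known_arg(self, t1: str, p: str, t2: str) -> float:
        # Second argument already in the KB with type t2: divide by P(p, t2) instead.
        n = self.pattern_t2[(p, t2)]
        return self.joint[(t1, p, t2)] / n if n else 0.0

    def rank_types_for_new_arg(self, p: str, t2: str) -> List[str]:
        # Candidate types for an unknown first argument, most probable first.
        candidates = {t1 for (t1, q, u) in self.joint if q == p and u == t2}
        return sorted(candidates, key=lambda t1: -self.p_type_given_known_arg(t1, p, t2))


# Made-up counts, only to exercise the formulas.
triples = (
    [("musician", "released", "album")] * 3
    + [("music band", "released", "album")]
    + [("company", "released", "product")] * 4
)
model = TypedPatternModel(triples)
print(model.p_types_given_pattern("musician", "album", "released"))   # 0.375
print(model.p_type_given_known_arg("musician", "released", "album"))  # 0.75
print(model.rank_types_for_new_arg("released", "album"))              # ['musician', 'music band']
```

Conditioning on the known argument's type is what sharpens the guess: among all "released" triples a musician/album pair has probability 3/8, but once we know the object is an album, "musician" gets 3/4 of the mass.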
Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction by Barbara Plank and Alessandro Moschitti
Their task is to extract binary relations of the ACE 2004 types (newswire and broadcast news). What happens when you change the domain? They propose a term generalization approach and a general syntactic structure. They crawled a pivot corpus from the Web. Their idea is to start from a standard syntactic tree kernel, in which the similarity between two trees is the number of subtrees they have in common. The issue here: two structures can be syntactically similar while their leaves differ, e.g. “mother of two” vs. “governor from Texas” have similar subtrees, but a standard kernel only counts exact matches on the words. So they employed a semantic syntactic tree kernel, which allows soft matches between terminal nodes. How is this good for domain adaptation? They focus on two types of semantic similarity: (1) Brown word clusters, for which they induced 1k clusters from the ukWaC corpus (Baroni et al.), and (2) something I forgot :). Their system workflow looks as follows: raw text -> (Charniak) parser -> parse trees with entities -> tree-kernel-based SVMs -> multi-class classification. They also tested across the ACE datasets, training on one and testing on another (exactly what I wanted to ask about, because I did the same thing for coreference resolution across different corpora :)), and, as expected, the results were about 10% lower. Interestingly, there is almost no related work on domain adaptation (DA) for relation extraction (RE); I think the same holds for coreference resolution.
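To see how soft leaf matching changes a tree kernel, here is a minimal sketch of a Collins-Duffy-style subset tree kernel with a pluggable word-similarity function. With exact word matching it counts the common subtrees as described above; with a Brown-cluster similarity, words from the same cluster earn partial credit. This is not Plank and Moschitti's implementation: the nested-tuple tree format, the toy cluster IDs and the partial credit of 0.5 are my own assumptions.

```python
from math import prod
from typing import Callable, Dict, Iterator, Tuple

# A parse tree node is (label, children); leaves are plain word strings.
# Assumption: words appear only under preterminal (POS) nodes.
Tree = Tuple[str, list]
WordSim = Callable[[str, str], float]


def nodes(t: Tree) -> Iterator[Tree]:
    """Yield every non-leaf node of the tree (pre-order)."""
    yield t
    for child in t[1]:
        if isinstance(child, tuple):
            yield from nodes(child)


def is_preterminal(t: Tree) -> bool:
    # A POS node dominating a single word, e.g. ("NN", ["mother"]).
    return len(t[1]) == 1 and isinstance(t[1][0], str)


def common_fragments(n1: Tree, n2: Tree, sim: WordSim, lam: float) -> float:
    """Weighted count of common fragments rooted at n1 and n2."""
    if is_preterminal(n1) and is_preterminal(n2):
        # Soft match on the leaves: same POS tag, words compared with `sim`.
        return lam * sim(n1[1][0], n2[1][0]) if n1[0] == n2[0] else 0.0
    if is_preterminal(n1) or is_preterminal(n2):
        return 0.0
    # Productions must be identical: same label and same child labels.
    if n1[0] != n2[0] or [c[0] for c in n1[1]] != [c[0] for c in n2[1]]:
        return 0.0
    return lam * prod(1.0 + common_fragments(c1, c2, sim, lam)
                      for c1, c2 in zip(n1[1], n2[1]))


def tree_kernel(t1: Tree, t2: Tree, sim: WordSim, lam: float = 1.0) -> float:
    """K(t1, t2): sum of common-fragment counts over all node pairs."""
    return sum(common_fragments(a, b, sim, lam) for a in nodes(t1) for b in nodes(t2))


def exact_match(w1: str, w2: str) -> float:
    # Standard syntactic tree kernel: leaves must be identical.
    return 1.0 if w1 == w2 else 0.0


def brown_similarity(clusters: Dict[str, str], partial: float = 0.5) -> WordSim:
    # Hypothetical soft match: words in the same Brown cluster get `partial` credit.
    def sim(w1: str, w2: str) -> float:
        if w1 == w2:
            return 1.0
        c1 = clusters.get(w1)
        return partial if c1 is not None and c1 == clusters.get(w2) else 0.0
    return sim


a = ("NP", [("DT", ["the"]), ("NN", ["mother"])])
b = ("NP", [("DT", ["the"]), ("NN", ["governor"])])
toy_clusters = {"mother": "0110", "governor": "0110"}  # made-up cluster IDs
print(tree_kernel(a, b, exact_match))                     # 3.0
print(tree_kernel(a, b, brown_similarity(toy_clusters)))  # 4.5
```

With exact matching, “mother” and “governor” contribute nothing, so only the shared determiner and the fragments built on it count; putting both words in one (toy) cluster raises the score, which is the intuition behind using word clusters to bridge vocabulary differences between domains. The naive double loop without memoization is fine for toy trees but quadratic in tree size times recursion depth, so a real system would cache node-pair scores.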
During the coffee break I talked to the Baidu people, who said that Baidu is the largest search engine in the world (maybe, but I am not completely sure about that…). I also found out that “Baidu” is a term from a Chinese poem and means something like: “You search for something and it suddenly appears”. Baidu is also this year’s biggest ACL sponsor, and the organizers asked us to take a look at the sponsor pages, so you as a reader should also glance at the following photos:
In the last session I attended: Smatch: an Evaluation Metric for Semantic Feature Structures by Shu Cai and Kevin Knight; Variable Bit Quantisation for LSH by Sean Moran, Victor Lavrenko and Miles Osborne; Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora by Dhouha Bouamor, Nasredine Semmar and Pierre Zweigenbaum; and lastly The Effects of Lexical Resource Quality on Preference Violation Detection by Jesse Dunietz, Lori Levin and Jaime Carbonell.
In the last talk, the authors also mentioned some useful resources and tools that I could use for pre-processing: VerbNet (http://verbs.colorado.edu/~mpalmer/projects/verbnet.html), SemLink (http://verbs.colorado.edu/semlink/) and SENNA (http://ml.nec-labs.com/senna/).
In the evening there was a banquet (aka dinner) at the Sheraton Hotel. I socialized a bit more and found out that there are a lot of people from industry who are just observing the conference, some of them without any papers at all. I got to know people from Google (they have quite a lot of papers), Intel, Nuance (in case you did not know, Siri uses their speech recognition; a very successful company), Sony, Microsoft, Nice Systems, …
Some shots:
If you cannot see the videos, you can download them from the following URLs: http://zitnik.si/temp/acl2013_1.mp4, http://zitnik.si/temp/acl2013_2.mp4 and http://zitnik.si/temp/acl2013_3.mp4.
Pictures: