Geo-search engines and location-based services allow users to query for points of interest (POIs) in a certain region or near the user's current location. Such queries often ask for classes ('hotels New York', 'supermarket Berlin', 'Italian restaurant London') rather than single points ('Hotel Belvedere New York'). In OpenStreetMap (OSM), the basic class of a POI can be specified, e.g., via the amenity tag (amenity=fast_food), via direct tags (shop=supermarket), or via several other specialized tags, such as the cuisine tag for restaurants. These tags are required for a POI to show up among the search results for a class-based query. Moreover, they are useful for categorizing search results; e.g., a search for 'Venice beach' should inform the user that there are beaches, hotels, fitness studios and clothing stores with that name.
Unfortunately, OSM contains plenty of POIs for which the class is not provided. However, many of those POIs have a name tag ('Sunset Hotel', 'Wal Mart') that already contains some information about the respective class.
In this paper, we investigate methods for the automatic extrapolation of class, amenity and specialized tags based solely on POI names. For example, 'Pizzeria Bella Italia' most likely indicates an Italian restaurant, while 'Tapas Bar' indicates Spanish food. We use machine learning tools to extract, for many amenities, typical words and phrases that occur in the associated name tags, and we learn corresponding POI classifiers. For example, learning indicators for 'shop=hairdresser' on German OSM tags led to high scores for 'fris', 'cut', 'hair' and 'haar'. While 'studio' and 'design' also appeared in many name tags, they are not suitable for distinguishing between 'shop=hairdresser' and 'shop=beauty', with the latter including nail spas. For other kinds of POIs, such as supermarkets or gas stations, the names of large chains ('ALDI', 'Aral') showed up as typical indicators.
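To make the idea of learning name-based indicators concrete, the following minimal sketch scores character n-grams of POI names per tag and picks the best-scoring tag for an unlabeled name. It is an illustration under stated assumptions, not the method used in this paper: the Naive-Bayes-style scoring, the n-gram lengths, the function names (char_ngrams, train, classify) and the toy training names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n_min=3, n_max=4):
    """Return the set of lowercase character n-grams of a POI name."""
    s = name.lower()
    return {s[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(s) - n + 1)}


def train(samples):
    """Count, per tag, in how many names each n-gram occurs."""
    gram_counts = defaultdict(Counter)
    tag_counts = Counter()
    for name, tag in samples:
        tag_counts[tag] += 1
        gram_counts[tag].update(char_ngrams(name))
    return gram_counts, tag_counts


def classify(name, gram_counts, tag_counts):
    """Pick the tag with the highest smoothed log-score (assumed scoring)."""
    total = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag, n_tag in tag_counts.items():
        score = math.log(n_tag / total)  # class prior
        for g in char_ngrams(name):
            # Laplace smoothing so unseen n-grams do not zero out a tag.
            score += math.log((gram_counts[tag][g] + 1) / (n_tag + 2))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Hypothetical toy data; real training data would be tagged OSM POIs.
samples = [
    ("Friseur Schmidt", "shop=hairdresser"),
    ("Haarstudio Anna", "shop=hairdresser"),
    ("Hair & Cut", "shop=hairdresser"),
    ("ALDI Süd", "shop=supermarket"),
    ("REWE City", "shop=supermarket"),
    ("ALDI Nord", "shop=supermarket"),
]
gram_counts, tag_counts = train(samples)
print(classify("Friseur Meier", gram_counts, tag_counts))
print(classify("ALDI Markt", gram_counts, tag_counts))
```

In this toy setting, n-grams such as 'fris' and 'aldi' act as the indicators: they occur only in names of one tag and therefore dominate the score for that tag.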
We show empirically that, with the help of our learned classifiers, tags for POIs with unknown class can be extrapolated with high accuracy. For example, 8% of all hairdressers were untagged but could be identified by our approach.