Video demo of a graph database running on Neo4j. We plot the URLs (nodes) connected via tags (edges)

# Introduction

The video is available at youtube.com/watch?v=9wOhWyHsoHU
The node we choose to analyse is the URL of a live web page on rtve.es, the website of RTVE, the state-owned Spanish public television and media corporation.

# Proof of concept

We showcase the power of graph databases for querying large corpora of websites. The demo shows how we find the interconnections between individual URLs in a corpus of 211,224 nodes (individual URLs), 4,338,066 properties and 13,321,040 relationships.

# Description of the video

We load the web interface of the console, served by a local Neo4j server (Neo4j - Graph Database Kernel 1.9.RC2).

min 00:08
In the search box we enter node number 19. This node contains the URL rtve.es/alacarta/videos/telediario/telediario-1-horas-11-05-13/1814352.shtml
We find 22 tags on this node. The tags relate this node to as many URLs.
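
The video performs this lookup in the Neo4j web console. As a rough illustration of the underlying idea only, the following Python sketch models URLs as nodes and shared tags as edges. The corpus `PAGES`, the `*.example` URLs and the helper names `build_tag_index` and `neighbours` are all hypothetical and are not taken from the demo.

```python
from collections import defaultdict

# Hypothetical toy corpus: URL -> set of tags (NOT the data shown in the video).
PAGES = {
    "a.example/news": {"news", "year", "section"},
    "b.example/markets": {"news", "finance"},
    "c.example/sport": {"section"},
    "d.example/blog": {"travel"},
}


def build_tag_index(pages):
    """Invert the URL -> tags mapping into tag -> set of URLs."""
    index = defaultdict(set)
    for url, tags in pages.items():
        for tag in tags:
            index[tag].add(url)
    return index


def neighbours(url, pages, index):
    """Return {tag: other URLs carrying that tag} for a single URL."""
    result = {}
    for tag in pages[url]:
        others = index[tag] - {url}
        if others:  # skip tags that no other URL shares
            result[tag] = others
    return result


index = build_tag_index(PAGES)
linked = neighbours("a.example/news", PAGES, index)
```

Here `linked` maps each tag of the chosen page to the other pages that carry the same tag, which is the relation the console draws as edges. A tag that no other page shares, such as `"year"` in this toy corpus, produces no edge.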

min 00:18
The tags assigned to the web page include "section", "january", "year" and "news".

min 00:26
We search for the tag "rtve". This tag connects 51 nodes related to it.

min 00:33
We select the tag "la ultima hora" (breaking news). This tag connects three external nodes belonging to two media titles: eleconomista.es and expansion.

min 01:01
We repeat the selection for every tag, then plot the connections between the tags and their nodes.
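
Expanding every tag amounts to enumerating, for each tag, all pairs of URLs that carry it. The sketch below shows one way this could be expressed in Python. The index `TAG_INDEX`, its URLs and the function `edges_for_tags` are illustrative assumptions, not the queries used in the video.

```python
from itertools import combinations

# Hypothetical tag -> URLs index (illustrative values only).
TAG_INDEX = {
    "news": ["a.example", "b.example", "c.example"],
    "finance": ["b.example", "d.example"],
}


def edges_for_tags(tag_index):
    """List (url, tag, url) triples for every pair of URLs sharing a tag."""
    edges = []
    for tag in sorted(tag_index):
        # combinations() yields each unordered pair once, so no edge is duplicated.
        for left, right in combinations(sorted(tag_index[tag]), 2):
            edges.append((left, tag, right))
    return edges


EDGES = edges_for_tags(TAG_INDEX)
for left, tag, right in EDGES:
    print(f"{left} --[{tag}]--> {right}")
```

A tag shared by n URLs yields n(n-1)/2 pairs, which suggests why the relationship count in a tagged corpus can far exceed the node count.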

min 01:30
We plot the relations of another node, number 382. This node is bolsamania.com, a finance and markets news website. The queries work the same way as those explained above for the RTVE URL.
