I have used a Raspberry Pi loaded with Graphviz (http://www.graphviz.org/). I feed my Python script a URL and it spits back a GIF and an SVG graph of the comment relationships, like this:
The number of borders denotes the size of the comment: 1 border is less than 500 characters, 2 borders is 500-1500 characters and 3 borders is over 1500 characters. These thresholds were set arbitrarily. The gray nodes are comments that have become orphaned because their parent posts were deleted by the moderator. I could look for the next parent up the tree and re-attach them there, but I think they kind of look cool as they are.
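The border rule above can be sketched in a few lines of Python. This is only an illustration and not the code from my script: the function name `peripheries_for` is made up here, and the result is meant for the Graphviz `peripheries` node attribute.

```python
def peripheries_for(comment_text: str) -> int:
    """Return the number of borders for a comment node.

    Thresholds follow the arbitrary rule described above:
    under 500 characters -> 1, 500-1500 -> 2, over 1500 -> 3.
    """
    length = len(comment_text)
    if length < 500:
        return 1
    if length <= 1500:
        return 2
    return 3
```

The returned number can be written straight into the node description as `peripheries=N`.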
Graphviz is a fantastic bit of free software that can be used to display relationships between objects. In the past I have used it to auto-generate an HR org chart of a 300+ employee organisation (data pulled from Active Directory using OpenLDAP) and to track and monitor network objects using data pulled from an Nmap scan of the network.
Here are a few more interesting graphs:
In the comment tree below, I like how one person's comment sparked a LOT of further comments:
All the nodes are hyperlinked, so if you open an SVG file in a modern browser, you can click on any node to bring up the actual article or comment. Unfortunately phpBB won't allow SVG files to be attached, so I can't demo that directly here.
The graphs have been shrunk down to fit into the forum webpage, but they can be rather large. At least an SVG file can be zoomed in without becoming pixelated, as it's a vector graphics file, as opposed to a GIF file, which is a raster image.
How does a Graphviz data structure work?
It's all very simple. You have one line per node to describe how the node looks. Here is an example:
Code:
567820 [color=black fillcolor=khaki peripheries=1 URL="https://theconversation.com/factcheck-did-carbon-emissions-fall-faster-before-the-carbon-price-36504#comment_567820" label="Casey Jones"];
Above, the first number is kind of a node ID. It can be anything, but has to be unique. Here I have used the comment ID number. The rest of the line is the description of the node and URL etc.
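As a rough sketch of how such a node line could be produced from Python (the helper `node_line` and its parameters are hypothetical, not taken from my script), the snippet below formats one line per comment. It uses Graphviz's `color` spelling for the outline attribute and escapes backslashes and double quotes in the label so an author name cannot break the quoting. Note that `fillcolor` only shows up when the node style is `filled`, so this assumes that default is set elsewhere in the file.

```python
def node_line(node_id, label, url, peripheries=1, fillcolor="khaki"):
    """Format one Graphviz node description line.

    node_id must be unique within the graph (here: the comment ID).
    """
    # Escape backslashes first, then double quotes, for a quoted DOT string.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"{node_id} [color=black fillcolor={fillcolor} peripheries={peripheries} "
        f'URL="{url}" label="{safe_label}"];'
    )


# Example with a placeholder URL:
print(node_line(567820, "Casey Jones", "https://example.com/article#comment_567820"))
```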
Then you have to show the relationship between the nodes in a separate line like this:
Code:
567820 -> 36504
This asks Graphviz to draw an arrow from node 567820 to node 36504. In this case, 36504 is the ID number of the article's webpage.
All my Python script does is parse the comment tree to find the relationships between the comments and write the lines out to a file, which is fed into Graphviz at the end.
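To make that flow concrete, here is a minimal sketch, assuming the comments have already been parsed into `(comment_id, parent_id, author)` tuples (with `parent_id` set to `None` for top-level comments). The function names, the tuple layout, the `node [style=filled ...]` default line and the second comment in the example are all assumptions for illustration. The `render` helper calls the standard `dot` command with `-T` for the output format and `-o` for the output file, so Graphviz must be installed for that part.

```python
import subprocess


def build_dot(article_id, comments):
    """Build the text of a DOT file for one article and its comments."""
    lines = [
        "digraph comments {",
        "    node [style=filled fillcolor=khaki];",  # assumed default style
        f'    {article_id} [label="Article" shape=box];',
    ]
    # One line per node...
    for comment_id, _parent_id, author in comments:
        lines.append(f'    {comment_id} [label="{author}"];')
    # ...then one line per relationship (child -> parent).
    for comment_id, parent_id, _author in comments:
        target = article_id if parent_id is None else parent_id
        lines.append(f"    {comment_id} -> {target};")
    lines.append("}")
    return "\n".join(lines) + "\n"


def render(dot_path, fmt="svg"):
    """Run Graphviz on a DOT file; requires the 'dot' command on the PATH."""
    out_path = f"{dot_path}.{fmt}"
    subprocess.run(["dot", f"-T{fmt}", dot_path, "-o", out_path], check=True)
    return out_path


if __name__ == "__main__":
    sample = [(567820, None, "Casey Jones"), (567999, 567820, "A Reader")]
    with open("comments.dot", "w", encoding="utf-8") as handle:
        handle.write(build_dot(36504, sample))
    # render("comments.dot", "svg")  # uncomment when Graphviz is installed
```

Calling `render` once with `"svg"` and once with `"gif"` would give both output formats from the same DOT file.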
That's all I really set out to do, but as I progressed and pondered the project, I started wondering about a few extra enhancements:
1) Attempt to analyse the comment text to see if the commenter was for or against the article, or just fed up with another commenter. I have attempted text analysis before, and it's not an easy thing to do (for me), so this will be a low priority.
2) How about crawling over the website and superficially analysing, say, 2000 articles? These could be checked for:
a) the article URL and comment file URL
b) the number of comments for each article
c) record the keywords or tags for each article.
From this I could have a go at working out the relationship between the number of comments and the article type, e.g. Energy, Immigration, Politics, Arts, Food etc. All very interesting.
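Once the crawl data exists, the comparison itself could be quite simple. The sketch below is only an assumption of how the records might look (a dict per article with hypothetical `url`, `comment_count` and `tags` keys) and works out the average number of comments per tag. The sample values are made up.

```python
from collections import defaultdict


def average_comments_per_tag(articles):
    """Return {tag: average comment count} over all articles with that tag."""
    totals = defaultdict(lambda: [0, 0])  # tag -> [article count, comment total]
    for article in articles:
        for tag in article["tags"]:
            totals[tag][0] += 1
            totals[tag][1] += article["comment_count"]
    return {tag: comment_total / article_count
            for tag, (article_count, comment_total) in totals.items()}


# Made-up sample records, just to show the shape of the data:
sample = [
    {"url": "https://example.com/a", "comment_count": 10, "tags": ["Energy"]},
    {"url": "https://example.com/b", "comment_count": 30, "tags": ["Energy", "Politics"]},
]
print(average_comments_per_tag(sample))
```

An article with several tags counts once towards each of them, which seems reasonable for a first, superficial look.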