Tuesday, November 17, 2009

Installing OpenLink Virtuoso triple store on Unix

Recently I had to install the OpenLink Virtuoso triple store for the orechem project that I am working on. Virtuoso has an ocean of documentation, so Marlon suggested that I write down the steps here for future reference. To learn what exactly Virtuoso is, read here.
This post covers how I installed it on Unix. I will write up the Windows installation in the next post.

1) Downloaded the source files using wget
wget "http://sourceforge.net/projects/virtuoso/files/virtuoso/5.0.12/virtuoso-opensource-5.0.12.tar.gz/download"

2) Untarred the archive
tar -xvzf virtuoso-opensource-5.0.12.tar.gz

3) Generating the configure script requires a lot of other packages. I checked here for the list of package dependencies and made sure they were installed.

4) Changed into the directory that was created (cd) and typed ./configure. By default the install target directories are under /usr/local/. To specify a particular target directory (in my case I created a directory called virtuoso-opensource), type ./configure --prefix=pathtodir

5) Typed make (this took around 20-30 minutes), followed by make install.
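
Steps 1 to 5 can be collected into one small script. This is only a sketch of what I did, not an official install procedure. The PREFIX value, the -O option to wget (so that the saved file gets the tarball name rather than "download") and the name of the unpacked directory are my assumptions; adjust them to your setup.

```shell
# Sketch of steps 1-5. VERSION comes from the download URL above;
# PREFIX and the unpacked directory name are assumptions.
VERSION="5.0.12"
PREFIX="$HOME/virtuoso-opensource"
TARBALL="virtuoso-opensource-$VERSION.tar.gz"
SRC_URL="http://sourceforge.net/projects/virtuoso/files/virtuoso/$VERSION/$TARBALL/download"

# The function body is a subshell, so the cd does not leak into the
# calling shell. Each step runs only if the previous one succeeded.
build_virtuoso() (
    wget -O "$TARBALL" "$SRC_URL" &&
    tar -xzf "$TARBALL" &&
    cd "virtuoso-opensource-$VERSION" &&
    ./configure --prefix="$PREFIX" &&
    make &&
    make install
)
```

Nothing is downloaded or built until build_virtuoso is actually called, so the variables can be checked first.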

6) Four directories (var, bin, lib, share) were created in the configure target directory.

7) To start the Virtuoso server, the configuration file "virtuoso.ini" is needed. I found that it is located at var/lib/virtuoso/db

8) Changed into the bin directory (cd bin), then started the server in the foreground (-f) with that configuration file (-c) by typing:
./virtuoso-t -c ~/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.ini -f

10) Accessed the web admin interface from a web browser at http://gf18.ucs.indiana.edu:8890. I used gf18.ucs.indiana.edu because I was working remotely on that machine. If that is not the case for you, use localhost instead. 8890 is the default port on which the HTTP server listens. If this is not working, check here for more information. The first time the Virtuoso server starts, it installs the Conductor VAD (Virtuoso Application Distribution) and an empty database.

11) On the web admin interface, clicked on Conductor and logged in using the default username dba and default password dba. There are a lot of tabs there, and I just explored them. From the System Admin, User Accounts tab I created an account with the user name schalla and set up a password for it.

12) Now comes the actual uploading of RDF triples. There are several methods described here. I used two of them: the WebDAV browser available on the web interface, and "curl" from the command line. To upload a file from the command line I typed:

curl -i -T bio2rdfdemo.rdf http://gf18.ucs.indiana.edu:8890/DAV/home/schalla/rdf_sink/bio2rdfdemo.rdf -u "schalla:password"

I could see the RDF files uploaded into the rdf_sink folder. What Virtuoso actually does is take the RDF files uploaded into the rdf_sink folder and store their triples in the RDF_QUAD table in the database. To learn more about how exactly Virtuoso stores triples, read here.
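
With more than one file, the same curl call can be wrapped in a loop. Here is a minimal sketch, assuming every file goes into the same rdf_sink folder under its own file name. The function names dav_url and upload_rdf are made up for this post. Passing only the user name to -u makes curl prompt for the password instead of leaving it in the shell history.

```shell
# Target WebDAV folder, taken from the curl command above.
DAV_BASE="http://gf18.ucs.indiana.edu:8890/DAV/home/schalla/rdf_sink"

# Map a local file path to its target URL in the rdf_sink folder.
dav_url() {
    printf '%s/%s\n' "$DAV_BASE" "$(basename "$1")"
}

# Upload every file given as an argument; stop at the first failure.
upload_rdf() {
    for f in "$@"; do
        curl -i -T "$f" -u "schalla" "$(dav_url "$f")" || return 1
    done
}
```

For example, upload_rdf bio2rdfdemo.rdf other.rdf would send both files, where other.rdf is just a placeholder name.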

13) Once the RDF file is uploaded, it gets a graph IRI. After a long search I found the correct IRI on the RDF, Graphs tabs of the web interface. The IRI created for the above file was http://local.virt/DAV/home/schalla/rdf_sink/bio2rdfdemo.rdf. Using this as the Default Graph IRI in the SPARQL tab on the web interface, I executed the following SPARQL query

select DISTINCT ?p
where {?s ?p ?o}

and this query returned all the distinct properties in the graph.
Then I tried running the SPARQL query from the command line using curl, as follows

curl -F "query=SELECT DISTINCT ?p FROM <http://local.virt/DAV/home/schalla/rdf_sink/bio2rdfdemo.rdf> WHERE {?s ?p ?o}" http://gf18.ucs.indiana.edu:8890/sparql
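
The graph IRI has to appear in angle brackets after FROM, which is easy to lose when pasting the query around. Here is a small sketch that builds the query text from the IRI and sends it as a GET request with curl's --data-urlencode, rather than as the -F form post above. The function names are made up for illustration, and I am assuming the endpoint accepts the query as a URL parameter, as the SPARQL protocol describes.

```shell
# Graph IRI and endpoint as used in this post.
GRAPH="http://local.virt/DAV/home/schalla/rdf_sink/bio2rdfdemo.rdf"
ENDPOINT="http://gf18.ucs.indiana.edu:8890/sparql"

# Build the query text with the graph IRI wrapped in angle brackets.
distinct_predicates_query() {
    printf 'SELECT DISTINCT ?p FROM <%s> WHERE { ?s ?p ?o }' "$1"
}

# Send the query; -G turns the url-encoded data into a query string.
run_query() {
    curl -s -G "$ENDPOINT" \
        --data-urlencode "query=$(distinct_predicates_query "$GRAPH")"
}
```

Calling run_query prints whatever the endpoint returns; distinct_predicates_query can be called on its own to check the query text first.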

14) Port 1111 is the Virtuoso DBMS port, which can be accessed using isql. The first time I typed ./isql 1111 dba dba I got an error that said "Could not SQLConnect".
I then checked which unixODBC configuration files were in use by typing
odbcinst -j
and got this as output:
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
USER DATA SOURCES..: /globalhome/schalla/.odbc.ini

After some reading on the web here and here, I learned that I needed to include the following in the user data sources file, i.e. the .odbc.ini file.

[LocalVirt]
Driver=/globalhome/schalla/virtuoso-opensource/lib/virtodbc.so
Address=localhost:1111

[ODBC Data Sources]
triples-store=OpenLink Virtuoso

[triples-store]
Driver=OpenLink Virtuoso
Address=localhost:1111

When I typed ./isql 1111 dba dba again, I got the SQL prompt and could see the tables by typing "tables;". isql commands are not the same as SQL. Check here for more information.

15) How do you terminate the Virtuoso server? I did not know how to do this. I thought there would be some command for it and searched a lot in the documentation with no luck. What I finally did was kill the Virtuoso process that was running.
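
A gentler alternative to killing the process, as far as I know, is to send a shutdown statement through Virtuoso's own isql. The sketch below assumes the install prefix and the default dba/dba login used earlier in this post, and that this isql build accepts the exec= argument. The function name stop_virtuoso is made up.

```shell
# Install location and DBMS port as used earlier in this post (assumed).
VIRT_BIN="$HOME/virtuoso-opensource/bin"
ISQL_PORT=1111

# Ask the running server to shut itself down via isql.
stop_virtuoso() {
    "$VIRT_BIN/isql" "$ISQL_PORT" dba dba exec="shutdown;"
}
```

If the password for dba has been changed, the second dba needs to be replaced accordingly.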

Some of the useful links are here, here and here.

Virtuoso's Sponger RDFizer is a cartridge that can generate RDF data from non-RDF data (such as XML or HTML). I need to convert an Atom feed into RDF using GRDDL, and I need to look into how to do this with the Sponger. I will post about that in the next post.