Algorithms, Part I — Social Network Connectivity (With Union-Find)

Nice blog post from Steel Wings about the "Social network connectivity" problem that appears in Coursera's Algorithms, Part I.

Q: Given a social network containing N members and a log file containing M timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend … of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be M log N or better and use extra space proportional to N.

(Java code on GitHub at the bottom of the post.)

My thoughts and solutions:

When solving dynamic connectivity problems (where we ask whether there is a path between two nodes but don't care what the exact path is), we can try the Union-Find algorithm. This is a typical application of union-find to social networks.
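As a sketch of the idea (my own code, not the code from the linked post; the names `UnionFind` and `EarliestConnection` are mine): a weighted quick-union with path compression tracks the number of components, and we report the first timestamp at which the count drops to 1. Each union costs O(log N), so processing the sorted log is O(M log N) time with O(N) extra space, as the question requires.

```java
import java.util.Arrays;

/** Weighted quick-union with path compression, tracking the component count. */
class UnionFind {
    private final int[] parent, size;
    private int count;               // number of components

    UnionFind(int n) {
        parent = new int[n];
        size = new int[n];
        Arrays.fill(size, 1);
        for (int i = 0; i < n; i++) parent[i] = i;
        count = n;
    }

    int find(int p) {
        while (p != parent[p]) {
            parent[p] = parent[parent[p]];   // path compression (halving)
            p = parent[p];
        }
        return p;
    }

    void union(int p, int q) {
        int rp = find(p), rq = find(q);
        if (rp == rq) return;
        if (size[rp] < size[rq]) { int t = rp; rp = rq; rq = t; }
        parent[rq] = rp;                     // link smaller tree under larger
        size[rp] += size[rq];
        count--;
    }

    int count() { return count; }
}

public class EarliestConnection {
    /**
     * timestamps[i] is the time at which members p[i] and q[i] became
     * friends, sorted by time. Returns the earliest time at which all
     * n members are connected, or -1 if they never are.
     */
    static long earliestFullConnection(int n, long[] timestamps, int[] p, int[] q) {
        UnionFind uf = new UnionFind(n);
        for (int i = 0; i < timestamps.length; i++) {
            uf.union(p[i], q[i]);
            if (uf.count() == 1) return timestamps[i];  // everyone connected
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] t = {1, 2, 3, 4};
        int[] p = {0, 2, 0, 1};
        int[] q = {1, 3, 3, 2};
        // {0,1} join at t=1, {2,3} at t=2, the two groups merge at t=3
        System.out.println(earliestFullConnection(4, t, p, q));  // prints 3
    }
}
```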




"Growth of the network seems to show that connectivity is its own reward, and is more valuable than any individual application such as mail or the World-Wide Web." [1]


It is about conditional independence

I have been contemplating the reason why learning the weights of the ER inference rules separately or jointly essentially leads to the same weights given the same ground data (evidence and ground truth), as shown in the slides today. To understand the reason behind this I had to dig back to my notes from when I was still learning about Markov Networks back in February 2014. While stratification might play a part in this, the actual reason is that the query predicates in my rules are independent from each other given the evidence predicates in the ground Markov Networks. As you recall, the ER inference rules have the following form:

Evid1(X) ^ Evid2(X) ^ Evid3(X) => QUERY1(X)

Evid1(X) ^ !Evid2(X) ^ Evid4(X) => QUERY2(X)

Given a domain of values for X, in the ground Markov Network the nodes of QUERY1(X) are always conditionally independent from the nodes of QUERY2(X) given the nodes of Evid1(X), Evid2(X), Evid3(X), and Evid4(X). As a result, the ground atoms of QUERY1 and QUERY2 never appear together in the same potential function in the factorized probability density function, so the optimization process always optimizes the weights of such rules independently of each other. As a consequence, the weights are always the same whether they are learned jointly or individually.
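In symbols (my own notation: E for the ground evidence atoms, Q1 and Q2 for the ground atoms of QUERY1 and QUERY2): because no ground clause contains atoms of both query predicates, the conditional distribution factorizes into a product over the two families of potentials,

```latex
P(Q_1, Q_2 \mid E)
  \;=\; \frac{1}{Z(E)} \prod_i \phi_i(Q_1, E) \prod_j \psi_j(Q_2, E)
  \;=\; P(Q_1 \mid E)\, P(Q_2 \mid E),
```

so the gradient of the conditional log-likelihood with respect to the weight of a QUERY1 rule involves only the $\phi_i$ factors, whether or not the QUERY2 rules are being learned at the same time.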

On dbpedia-owl vs. dbprop namespaces

If you know about DBpedia then you will know that properties in DBpedia resources come in two namespaces: dbpedia-owl and dbprop. The dbprop namespace is for properties extracted directly from the infobox. The dbpedia-owl namespace is for mapped properties. If you are looking for consistent data in your project, you should prefer the dbpedia-owl namespace properties.
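For illustration (a hypothetical snippet; the exact dbprop property names vary from infobox to infobox), the same fact can appear in both namespaces, with the dbpedia-owl version (namespace http://dbpedia.org/ontology/) carrying the cleaned, typed value and the dbprop version (namespace http://dbpedia.org/property/) carrying the raw infobox value:

```turtle
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbprop:      <http://dbpedia.org/property/> .
@prefix xsd:         <http://www.w3.org/2001/XMLSchema#> .

<http://dbpedia.org/resource/Alan_Turing>
    dbpedia-owl:birthDate "1912-06-23"^^xsd:date ;  # mapped, typed value
    dbprop:birthDate      "1912-06-23" .            # raw infobox text
```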


Apache Derby Demo

I was recently looking for a tutorial on using embedded Apache Derby (JavaDB). The best example I found was provided by Wayne Pollock. Here is the code from that page:

import java.sql.*;
import javax.swing.JOptionPane;

public class DerbyDemo {

  public static void main ( String [] args ) {

     String driver = "org.apache.derby.jdbc.EmbeddedDriver";
     String dbName = "DerbyDemoDB";
     String connectionURL = "jdbc:derby:" + dbName + ";create=true";
       // The ";create=true" will create the DB if not created yet.

     String SQL_CreateTable = "create table addresses ( "
      + "ID     int not null generated always as identity "
      + "       (start with 1000), "
      + "lname  varchar(40) not null, fname varchar(40) not null, "
      + "phone  varchar(14), notes varchar(256), "
      + "primary key (ID) )";

     // This SQL inserts three records into the addresses table:
     String SQL_Insert = "insert into addresses "
      + "(lname, fname, phone, notes) values "
      + "('Pollock', 'Wayne', '253-7213', 'Professor'), "
      + "('Piffl', 'Hymie', NULL, 'Fake student name'), "
      + "('Jojo', 'Mojo', NULL, 'Super-villain')";

     String SQL_Query = "SELECT * FROM addresses";

    // Load the Derby Embedded DB driver into the JRE:
    // Note this should not be needed for Java >=6, it is automatic!
    try {
      new org.apache.derby.jdbc.EmbeddedDriver();
    } catch ( Exception e ) {
       System.out.println( "**** Cannot load Derby Embedded DB driver!" );
       return;
    }

    Connection con = null;
    Statement stmnt = null;

    // Try to connect to the DB:
    try {
      con = DriverManager.getConnection( connectionURL );
    } catch ( Exception e ) {
        System.err.println( "**** Cannot open connection to "
          + dbName + "!" );
        return;
    }

    // Drop (delete) the table if it exists.  This is common for demo code,
    // otherwise every time you run the code, it keeps adding copies of the
    // data.  Current versions of Derby throw an Exception if you try to drop
    // a non-existing table, so check if it is there first:

    if ( tableExists( con, "addresses" ) )  {
      System.out.println ( "Dropping table addresses..." );
      try {
          stmnt = con.createStatement();
          stmnt.executeUpdate( "DROP TABLE addresses" );
      } catch ( SQLException e ) {
          String theError = e.getSQLState();
          System.out.println( "Can't drop table: " + theError );
      }
    }

    // Create the table addresses if it doesn't exist:
    if ( ! tableExists( con, "addresses" ) )  {
      System.out.println ( "Creating table addresses..." );
      try {
        stmnt = con.createStatement();
        stmnt.execute( SQL_CreateTable );
      } catch ( SQLException e ) {
        String theError = e.getSQLState();
        System.out.println( "Can't create table: " + theError );
      }
    }

    // Insert records into table (Note if you run this code twice
    // the same people get added but with different IDs):
    try {
      stmnt = con.createStatement();
      System.out.println ( "Inserting rows into table addresses..." );
      stmnt.executeUpdate( SQL_Insert );  // Add some rows
    } catch ( SQLException e ) {
        String theError = e.getSQLState();
        System.out.println( "Can't insert rows in table: " + theError );
    }

    // Query the table and display the results:
    try {
      stmnt = con.createStatement();
      // This is dangerous if the query string contains any external text!
      ResultSet rs = stmnt.executeQuery( SQL_Query );
      displayResults( rs );

      // When using external data in an SQL statement, you should use
      // PreparedStatements instead of Statements, to prevent SQL injection
      // attacks (a common security vulnerability in "textbook-quality"
      // code).  Here's an example to query the table with untrusted user data:

      // The SQL Query to use (note case-insensitive comparison):
      String dangerousQuery =
        "SELECT * FROM addresses WHERE LOWER(lname) = LOWER(?)";

      // Create a prepared statement to use:
      PreparedStatement pStmnt = con.prepareStatement( dangerousQuery );

      // Get the last name to query for, from the user:
      String lastName = JOptionPane.showInputDialog(
         "Please enter your name: " );

      if ( lastName != null ) {
        // Safely substitute data for "?" in query:
        // (Note there are many type-checking set* methods, e.g. "setInt")
        pStmnt.setString( 1, lastName );
        ResultSet lastNameSearchResults = pStmnt.executeQuery();
        System.out.println( "\n\tResults of last name query for " + lastName );
        displayResults( lastNameSearchResults );
      }
    } catch ( SQLException e ) {
        String theError = e.getSQLState();
        System.out.println( "Can't query table: " + theError );
    }

    // Shut down all databases and the Derby engine, when done.  Note,
    // Derby always throws an Exception when shutdown, so ignore it:
    System.out.println ( "Shutting down the database..." );
    try {
      DriverManager.getConnection( "jdbc:derby:;shutdown=true" );
    } catch ( SQLException e ) {} // empty: ignore exception

    // Note that nothing breaks if you don't cleanly shut down Derby, but
    // it will start in recovery mode next time (which takes longer to start).
  }

  // Derby doesn't support the standard SQL views.  To see if a table
  // exists you normally query the right view and see if any rows are
  // returned (none if no such table, one if table exists).  Derby
  // does support a non-standard set of views which are complicated,
  // but standard JDBC supports a DatabaseMetaData.getTables method.
  // That returns a ResultSet but not one where you can easily count
  // rows by "rs.last(); int numRows = rs.getRow()".  Hence the loop.

  private static boolean tableExists ( Connection con, String table ) {
    int numRows = 0;
    try {
      DatabaseMetaData dbmd = con.getMetaData();
      // Note the args to getTables are case-sensitive!
      ResultSet rs = dbmd.getTables( null, "APP", table.toUpperCase(), null );
      while ( rs.next() ) ++numRows;
    } catch ( SQLException e ) {
        String theError = e.getSQLState();
        System.out.println( "Can't query DB metadata: " + theError );
    }
    return numRows > 0;
  }

  private static void displayResults ( ResultSet rs ) {
    try {
      // Collect meta-data:
      ResultSetMetaData meta = rs.getMetaData();
      String catalog = meta.getCatalogName(1);
      String schema  = meta.getSchemaName(1);
      String table   = meta.getTableName(1);
      int numColumns = meta.getColumnCount();

      // Display results:
      System.out.print( "\n\t\t---" );
      if ( catalog != null && catalog.length() > 0 )
         System.out.print( " Catalog: " + catalog );
      if ( schema != null && schema.length() > 0 )
         System.out.print( " Schema: " + schema );

      System.out.println( " Table: " + table + " ---\n" );

      for ( int i = 1; i <= numColumns; ++i )
        System.out.printf( "%-12s", meta.getColumnLabel( i ) );
      System.out.println();

      while ( rs.next() )       // Fetch next row, quit when no rows left.
      {   for ( int i = 1; i <= numColumns; ++i )
          {   String val = rs.getString( i );
              if ( val == null )
                  val = "(null)";
              System.out.printf( "%-12s", val );
          }
          System.out.println();
      }
    } catch ( SQLException e ) {
        String theError = e.getSQLState();
        System.out.println( "Can't view ResultSet: " + theError );
    }
  }
}

Note on MLN Predicate Categories

Markov Logic Networks (MLNs) are an approach to Statistical Relational Learning (SRL) that combines weighted first-order logic formulas and Markov networks to allow probabilistic learning and inference over multi-relational data. Learning and inference in MLNs is supported by a number of algorithms such as Gibbs Sampling, L-BFGS, Voted Perceptron, and MC-SAT. There are a number of software packages that implement MLNs, such as Alchemy, Tuffy, Markov thebeast, and ProbCog.

In Markov Logic Networks one can identify two dimensions for categorizing the predicates in an MLN. The first is by the component being modelled in the domain. To apply the MLN approach to a problem domain, we need to recognize that a domain is made up of three components: objects of interest, attributes of the objects, and relations between the objects. Hence a predicate can be categorized as either a class predicate, a relation predicate, or an attribute predicate. The second dimension for categorizing predicates in MLNs is by the end-user data need. In most probabilistic inference systems, a user query is composed of a target query and evidence which is used to estimate the probability of the target query. Hence, a predicate can be categorized as either an evidence predicate or a query (target) predicate.
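To make the two dimensions concrete, here is a hypothetical MLN fragment in Alchemy-style syntax (the predicates and weights are my own toy example, not from any real model). `Student` is a class predicate, `Smokes` an attribute predicate, and `Friends` a relation predicate; for a smoking-prediction task, `Friends` and `Student` would be evidence predicates and `Smokes` the query predicate.

```
// Predicate declarations (toy example)
Student(person)           // class predicate
Smokes(person)            // attribute predicate
Friends(person, person)   // relation predicate

// Weighted formulas; the weights would be learned from ground data
1.5  Friends(x, y) ^ Smokes(x) => Smokes(y)
0.8  !Smokes(x)
```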

In MLNs, learning can be done either discriminatively or generatively. In discriminative learning, one or more predicates whose values are unknown during testing are designated as target predicates. The learning optimizes the performance with respect to such predicates, assuming that all the remaining predicates are given. In generative learning all predicates are treated equally. Discriminative learning is used when we know ahead of time what type of prediction we want to make. On the other hand, generative learning is used when we want to capture all aspects of the domain. So from a system point of view the second categorization is essential if we perform discriminative learning of a model.

The first categorization allows us to relate a problem to one or more common Statistical Relational Learning tasks, for example collective classification, link prediction, and social network analysis. In collective classification the goal is to predict the class of an object given the object's attributes and relations as evidence. In link prediction the goal is to predict a relation given the class(es) of the objects and their other relations. In social network analysis the target query could be the attributes of the objects or the relations between objects. In general these tasks fall under "classification" in machine learning theory.

Handling Persistent RDF models using Jena SDB

The Jena API is one of the most commonly used APIs for handling RDF. It allows for creating different types of RDF models, including memory models, file-based models, inferencing models, and database-backed models. Older versions of Jena shipped with a database subsystem that supported database-backed models. The Jena authors later shifted to SDB and TDB. SDB uses an SQL database for the storage and query of RDF data. Per the Jena SDB documentation, the use of SDB in new applications is not recommended, as it may be pulled from the main build at very short notice. Developers are encouraged to use TDB, which is a faster, non-transactional database solution. This post explains the setup of SDB and how it is used for storing a simple RDF file in a MySQL database.

For this I will assume that you have the latest release of Jena (Jena 2.11.0). I will also assume that you have loaded your IDE and imported the Jena API into your newly created project.

The first thing you will need to do is set up a MySQL database. In my case I created a MySQL database named "rdfplay". Now that we have the database ready, you should download the maintenance release of SDB. Save and extract the files to your favorite location. From here on I will assume that you extracted SDB to "/path/to/SDB". Under this folder you will find a "lib" directory which contains the jar files needed to run the SDB tools. Get your hands on a copy of the MySQL JDBC driver jar file and place it under "lib"; we will need it when creating the format of our RDF model in MySQL. Next you need to create the store description file for MySQL, which will look like this:

@prefix sdb:     <http://jena.hpl.hp.com/2007/sdb#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

# MySQL - InnoDB

<#store> rdf:type sdb:Store ;
    sdb:layout     "layout2" ;
    sdb:connection <#conn> ;
    sdb:engine     "InnoDB" .       # MySQL specific

<#conn> rdf:type sdb:SDBConnection ;
    sdb:sdbType     "MySQL" ;       # Needed for JDBC URL
    sdb:sdbHost     "localhost" ;
    sdb:sdbName     "rdfplay" ;     # database name
    sdb:driver      "com.mysql.jdbc.Driver" ;
    sdb:sdbUser     "rdfplay" ;
    sdb:sdbPassword "password" .

Save this file with the name sdb-mysql-innodb.ttl in the SDB folder. Next you need to configure a couple of environment variables before formatting the RDF store.

$ export SDBROOT=/path/to/sdb
$ export PATH=$SDBROOT/bin:$PATH
$ export SDB_JDBC=$SDBROOT/lib/mysql-connector-java.jar  # path to the MySQL JDBC driver jar you placed under lib

Then run the following to initialize the database. Be aware that this will wipe any existing data in the database, so you may want to check first.

$ sdbconfig --sdb sdb-mysql-innodb.ttl --format

This will create a basic layout in your database. Four tables should be created, given that we used layout2 in our store description file: Nodes, Prefixes, Quads, and Triples. Now we are ready to load some RDF into our database. Copy the following code into Eclipse and run it. The code simply connects to the database store and checks whether our model (identified by the URI of the RDF file) is in the database. If not, it reads the RDF into an in-memory model and then adds it to the store; otherwise it fetches the model from the store. The last statement prints the statements in the model. The code is pretty much self-explanatory. Happy coding.

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.sdb.SDBFactory;
import com.hp.hpl.jena.sdb.Store;
import com.hp.hpl.jena.util.PrintUtil;

public class DBModelTester {

	private static String MODEL_URI = ""; // set to the URI of your RDF file

	public static void main(String[] args) throws Exception {

		// connect to a store
		Store store = SDBFactory.connectStore("sdb-mysql-innodb.ttl");

		// connect to an RDF dataset in the store
		Dataset ds = SDBFactory.connectDataset(store);

		// here is our model
		Model m = null;

		// check whether the dataset contains our model
		if (!ds.containsNamedModel(MODEL_URI)) { // if not
			System.out.println("Loading Instance --- done once");
			// create an in-memory model
			m = ModelFactory.createDefaultModel();
			// read the RDF into my model
			m.read(MODEL_URI);
			// add it to the data store
			ds.addNamedModel(MODEL_URI, m);
		} else { // already in the store - just fetch it
			m = ds.getNamedModel(MODEL_URI);
		}

		// print the model
		printStatements(m, null, null, null);
		// close the data store
		store.close();
	}

	private static void printStatements(Model m, Resource s, Property p,
			Resource o) {

		for (StmtIterator iter = m.listStatements(s, p, o); iter.hasNext();) {
			Statement stmt = iter.nextStatement();
			System.out.println(" - " + PrintUtil.print(stmt));
		}
	}
}