A little tale: how to debug Java-based software

13.01.2012.

One of the great problems with open-source tools is the lack of good-quality documentation and how-to guides. This article describes a general installation problem that we had with one open-source application (Ephesoft), and the approaches we used to find the root cause without investigating or compiling the application's source code.

Ephesoft is document organization and classification software built on the JEE technology stack. It is typically used for automated mail room processing, contract and mortgage processing, processing of filled-in forms, etc. It usually serves as a “front end” to a document management system that supports complex workflow, archiving, versioning, records management, etc.

The problem we encountered was that the installer did not manage to create all the necessary database objects in our target database. We rounded up the usual suspects (Are the DB users created correctly? Are the usernames and passwords correct? What about grants? Can the application connect to the database? Is there enough space on the file system? Does the log file indicate the source of the error?), but learned little. The application was simply not able to create the schema.

Well, the usual first step is to turn the log level up to debug. Luckily, this web application uses Log4J, so we located log4j.xml, changed the thresholds to DEBUG and restarted the application. Now we had a lot of log entries, but still no clue. The application seemed to be stuck: there was obviously a piece of code that upgrades the database objects from the previous version, but it fails if the objects don’t exist in the first place.
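Before editing the thresholds, it can help to see what is currently set. The helper below is only a sketch: the name show_log_levels is made up for this example, and it assumes a Log4J 1.x XML file that uses the usual Threshold parameter and priority or level elements. A file laid out differently would need another pattern.

```shell
# List the lines of a Log4J 1.x XML config that set a level or a threshold.
# show_log_levels is a hypothetical helper name, not part of Log4J or Ephesoft.
show_log_levels() {
    # -n prints line numbers so the entries are easy to find in an editor
    grep -nE '<(priority|level)[[:space:]]|name="Threshold"' "$1"
}

# Example (the path to log4j.xml depends on the installation):
#   show_log_levels log4j.xml
```

Each reported line is a place where the value would have to be changed to DEBUG.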

The next thing to try was to locate the SQL files that create the schema. We searched the file system and the JAR files for “CREATE TABLE” statements and .sql files, but made no progress. Maybe the DDL generation code was not that simple. So we decided to go one level “below” Java and used the strace command to track which files our application was opening. We also wanted to record the SQL statements that the application sent to the database. This required a small change in Tomcat’s startup.sh so that it logs all syscalls with timestamps:

strace -s 256 -t -f -o /tmp/out java ….

Indeed, we found that the code did not even try to create the tables, nor did it read any SQL files. Now we were puzzled even more.
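To illustrate how such a trace can be checked, here is a minimal sketch that lists the files a traced process opened successfully. It assumes the trace was written to a file with the -f, -t and -o options, so that every line carries a PID and a timestamp before the syscall. The name opened_files is invented for this example.

```shell
# Print the unique paths that were opened successfully, from an strace output file.
# opened_files is a hypothetical helper; it handles both open() and openat() lines.
opened_files() {
    grep -E ' open(at)?\(' "$1" \
        | grep -v ' = -1 ' \
        | sed -nE 's/.*open(at)?\([^"]*"([^"]+)".*/\2/p' \
        | sort -u
}

# Example, with the trace written to /tmp/out:
#   opened_files /tmp/out | grep -i '\.sql$'
```

The same file can be searched for SQL text with a plain grep -i 'create table' /tmp/out, since strings passed to syscalls are logged up to the length given with -s.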

By reviewing the code and the log files, we noticed that our application uses the Hibernate ORM for data access. This was the breakthrough: Hibernate can “autocreate” a schema based on HBM mapping files or Java annotations. We added the corresponding property (hibernate.hbm2ddl.auto) to the Ephesoft database configuration, but the application seemed to simply ignore our setting. We turned back to the Log4J configuration and enabled DEBUG logging for the org.hibernate package. And voilà, we found an entry showing that Hibernate was initialized from a hibernate.cfg.xml located deep inside one of the web application's JAR files. So here is what we quickly did:

jar xvf $EPHESOFT_DIR/WEB-INF/lib/ephesoft.jar

We added the following property to hibernate.cfg.xml:

<property name="hibernate.hbm2ddl.auto">create-drop</property>

And we repackaged the JAR to the original location:

jar cvf $EPHESOFT_DIR/WEB-INF/lib/ephesoft.jar *

After restarting the application, there was finally a clue in the log output:

ERROR [org.hibernate.tool.hbm2ddl.SchemaExport] You have an error
in your SQL syntax; check the manual that corresponds to your MySQL server
version for the right syntax to use near 'type=InnoDB' at line 1....

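A quick search on the Internet suggested using org.hibernate.dialect.MySQL5InnoDBDialect instead of org.hibernate.dialect.MySQLInnoDBDialect. The older dialect appends type=InnoDB to its CREATE TABLE statements, a syntax that newer MySQL servers reject in favor of ENGINE=InnoDB. After changing the original database configuration file and restarting, our application successfully created the schema.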
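When it is not obvious which file sets the dialect, a recursive search of the configuration directory can narrow it down. This is only a sketch: find_dialect_settings is an invented helper name, and the directory to search depends on the installation.

```shell
# Show every line under a directory that names a Hibernate dialect class,
# prefixed with the file name and line number.
# find_dialect_settings is a hypothetical helper name used only for this example.
find_dialect_settings() {
    grep -rnE 'org\.hibernate\.dialect\.[A-Za-z0-9_]+' "$1"
}

# Example (replace the argument with the application's configuration directory):
#   find_dialect_settings "$EPHESOFT_DIR/WEB-INF"
```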

This story illustrates some important aspects of open-source software:

• it relies on standard libraries and packages (we used our knowledge of Log4J and Hibernate to reconfigure them so that they gave us the clues)
• it does not hide what is being used
• it is usually straightforward to extract the source code or set up a debugging session
• it uses numerous configuration files (for its code and included packages)
• the knowledge base is spread across the whole Internet; it is not locked inside a vendor support site
• administrators and developers can reuse their knowledge of subsystems from other applications on the same stack (I think an SAP expert could not help much if the task were to assist with the installation of PeopleSoft, for example)