Batch Import Performance with Grails and MySQL

According to Burt Beckwith's blog, there are two separate leaks: one is in the Hibernate first-level cache, the other is a map that Grails uses for domain object validation errors.

Normally, a Grails Hibernate session executes something quickly and returns. During an import, we do a ton of processing, all within the same Hibernate session. All of the objects that would normally be garbage collected when the session closes pile up instead.

The easiest way to deal with this is to create a simple method to clear out these collections periodically.

We can modify our BookService to clean up GORM after every 100 books we insert:

class BookService {

      def sessionFactory
      // map in which Grails keeps domain object validation errors
      def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

      def importBooksInLibrary(library) {
            library.eachWithIndex { Map bookValueMap, index ->
                  updateOrInsertBook(bookValueMap)
                  // index is zero-based, so this fires after books 100, 200, ...
                  if ((index + 1) % 100 == 0) cleanUpGorm()
            }
      }

      def cleanUpGorm() {
            def session = sessionFactory.currentSession
            session.flush()   // write pending changes to the database
            session.clear()   // empty the Hibernate first-level cache
            propertyInstanceMap.get().clear()   // drop the stored validation errors
      }

      def updateOrInsertBook(Map bookValueMap) {
            // ... same as above
      }
}

http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-my...

Groovy: Execute a process and use the standard input pipe

Here's a short example of how to execute a process in Groovy and use its standard input pipe to supply the process with input data. On the Linux command line it is very easy:

$ echo "FOOBAR" | file -

It's a little bit more complicated in Groovy:

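// Collect the input data in memory
def baos = new ByteArrayOutputStream()

def out = new BufferedOutputStream(baos)
out << "FOOBAR"
out.close()

// proc.out is the standard input of the started process
def proc = "file -".execute()
proc.out << baos.toByteArray()
proc.out.close()   // signal end of input so that the process can finish

println proc.text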

A more sophisticated example for piping processes in Groovy can be found here.

UPDATE:

In my original code the input data (image data) is stored in a ByteArrayOutputStream. With text input you can shorten it even more by writing the string directly to the process:

proc.out << "FOOBAR"

How-to fix user data and temp folder issues in Eclipse + SpringSource Tool Suite 2.6.0 + Grails

Recently I ran into two problems when using Eclipse + STS + Grails, and I want to share the fixes that worked for me. I use the following software versions:

Eclipse 3.6.2 Helios Service Release 2 (SR2)
SpringSource Tool Suite (STS) 2.5.2/2.6.0.RC3
Grails 1.3.7

1) Because I had moved my user home to another partition (D:), STS had problems building the correct GRAILS_ROOT: it expects it to be user.home + ".grails", which does not work if user.home is missing the trailing slash. A workaround is to set user.home yourself when starting Eclipse. You can do this by adding the following line to your eclipse.ini file, after the -vmargs line: -Duser.home=D:\

2) When I executed a Grails command in the Grails Command Prompt (Alt + Ctrl + Shift + G), I ended up with an IOException: Access denied. For some unknown reason, java.io.tmpdir was set to C:\Windows\ in the Grails Command Prompt. Running the same command with a Run Configuration or in cmd.exe worked fine. The fix was to add a variable in the workspace preferences:

  • Window > Preferences > Groovy > Grails > Grails Launch > New Variable 
  • Variable: java.io.tmpdir
  • Value: C:\Users\<USER>\AppData\Local\Temp\

NOTE: Both bugs seem to be fixed in the new 2.6.1 release of STS (number 1 might even be fixed in 2.6.0; I don't remember exactly which version I had installed back then).

How to disable Windows Media Player 12 scan of Windows 7 Libraries

Based on this post I found a way to stop Windows Media Player 12 from scanning the Windows 7 Libraries:

  1. Navigate to c:\Users\[your name]\AppData\Local\Microsoft\Media Player
  2. Make sure that Windows Media Player is not running (also check the Windows Media Player Network Sharing Service)
  3. Make the following files read-only (right click -> Properties -> Read-Only):
    - CurrentDatabase_372.wmdb
    - LocalMLS_[0..4].wmdb
  4. OPTIONAL: To clean your Media Library, delete the files first and recreate them by hand.

Now, Windows Media Player 12 should not scan your libraries anymore. At least it worked for me.

Reminder to self: Never ever forget the trailing newline when working with OpenSSL on the command line!

Whenever you work with OpenSSL on the Linux command line and send the plain data to it using echo and pipes, DO NOT FORGET that echo adds a trailing newline to the end of the input! This matters especially when you use it to test implementations of hash and other cryptographic functions (like RSA, elliptic curve cryptography, etc.) in your own code. Remembering the trailing newline can save you several hours of searching for nonexistent bugs in your code when, for example, a signature does not verify as expected.

If you create the data from the plaintext "foo" in your code (C# here), you get a byte array with 3 items:

byte[] plainBytes = Encoding.UTF8.GetBytes("foo");
// is { 0x66, 0x6F, 0x6F } 

But if you echo the word "foo" in the Linux command line, a trailing newline is automatically added:

echo "foo" | hexdump -C
00000000  66 6f 6f 0a                                       |foo.|

In order to avoid this problem, be sure to ALWAYS call echo with the -n switch which disables the trailing newline and you get the desired result:

echo -n "foo" | hexdump -C
00000000  66 6f 6f                                          |foo|
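
One caveat: not every shell's built-in echo honors -n, so printf '%s' is a more portable way to send data without a trailing newline. The following is only a minimal sketch, assuming a POSIX shell with wc and tr available. The variable names are made up for illustration, and the openssl part is skipped if the tool is not installed.

```shell
#!/bin/sh
# Count the bytes that actually travel down the pipe.
with_newline=$(echo "foo" | wc -c | tr -d ' ')
without_newline=$(printf '%s' "foo" | wc -c | tr -d ' ')

echo "echo:   $with_newline bytes"      # 4 bytes: f, o, o and the newline
echo "printf: $without_newline bytes"   # 3 bytes: f, o, o

# The extra byte changes every digest or signature computed over the input.
if command -v openssl >/dev/null 2>&1; then
    echo "foo" | openssl dgst -sha256
    printf '%s' "foo" | openssl dgst -sha256
fi
```

The two digests differ, which is exactly the kind of mismatch that looks like a bug in your own code.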

BoxCryptor :: On-the-fly encryption for cloud storage

Cloud storage is a great thing and I really love my Dropbox, but the absence of personal encryption was always a bummer for me. Although Dropbox (and many other cloud storage providers like Box.net) encrypt the data on their storage backend, you cannot be 100% sure that your data is confidential, because they own the key and are always able to access your data. They are not 'zero-knowledge' providers like SpiderOak or Wuala!

That's why it is mandatory to encrypt all confidential files on the client side when using a cloud storage provider without 'zero-knowledge'. The Dropbox Wiki lists some good tools for this, notably FreeOTFE, TrueCrypt, EncFS and SecurStick. FreeOTFE and TrueCrypt are quite popular, but they use container files which occupy all of their disk space from the start and cannot grow or shrink. SecurStick and EncFS are great because they are file-based and encrypt each file on its own. However, EncFS is only available for Linux and Mac OS users, and SecurStick is based on WebDAV. Besides that, I didn't like the overall handling of SecurStick.

As I was not happy with any of the existing solutions, I developed my own encryption tool, which gives Windows users an experience similar to what EncFS offers Linux and Mac OS users. BoxCryptor is a cryptographic virtual hard disk that encrypts all data on the fly. Encrypted data is stored transparently in an arbitrary directory.

Save files to the virtual hard disk and BoxCryptor encrypts them on the fly and stores the encrypted files in a directory of your choice, e.g. a directory that is synced to the cloud like your Dropbox folder. When you read files from the virtual hard disk, BoxCryptor decrypts them on the fly so that they are accessible like any other unencrypted file. BoxCryptor makes sure that all data is automatically decrypted just after it is loaded and encrypted just before it is saved.

BoxCryptor currently encrypts files using a modified version of the RC4 encryption algorithm. I'm planning to support AES encryption in the near future, but the stream cipher RC4 was easier to implement for a start.

Be aware that this is a very early release of BoxCryptor which is intended for testing purposes only! You should not use it in production yet! The current version 0.1.0 Alpha of BoxCryptor expires on 31st March 2011 and will provide read-only access to the encrypted files after this date. An updated and more stable version of BoxCryptor should be available by then.

Go to the BoxCryptor website.