     

Next Event >
I am speaking on the SMARTER information retrieval project at Cornell on April 25th. Email me for details.

New Tool! >
run-svm-text Perl script

 

 

run-svm.pl

Important note on April 15th, 2008: run-svm.pl is being replaced by run-svm-text.pl. run-svm.pl will remain available until April 15th, 2009.

Click here to download the run-svm.pl script

run-svm.pl is an evolving tool first created by Dustin Hillard; maintenance is shifting to me as he heads off to Yahoo! to work on their search effort. The authors make no warranties about the use of this software, and users should assume that it may not meet their needs. The code is copyright Dustin Hillard and Stephen Purpura; contact me if you wish to use it for commercial purposes. Use for non-commercial purposes is not restricted, but for academic research please cite the following paper when you use the script:

Purpura, S., Hillard, D. “Automated Classification of Congressional Legislation.” Proceedings of the Seventh International Conference on Digital Government Research. San Diego, CA.

INPUT FILE FORMAT and DEPENDENCIES

The script linked from this web site controls Thorsten Joachims's SVM-light. It takes two inputs: a training text file and a testing text file. The training file provides learning examples, while the testing file provides the records that the user wishes to make predictions for. Both files are pipe-delimited: CLASS | TEXT
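As an illustration of the CLASS | TEXT layout, here is a minimal Python sketch for writing and reading such files. It is not part of run-svm.pl. The function names are hypothetical, and the sketch assumes one record per line, that only the first pipe separates the two fields, and that whitespace around the pipe is insignificant; none of these details is confirmed by the script itself.

```python
from typing import Iterable, Tuple


def format_record(label: str, text: str) -> str:
    """Render one example as a 'CLASS | TEXT' line (without the newline)."""
    # Assumption: one record per line, so any newlines or repeated
    # whitespace inside the text are collapsed to single spaces.
    clean_text = " ".join(text.split())
    return f"{label.strip()} | {clean_text}"


def parse_record(line: str) -> Tuple[str, str]:
    """Split a 'CLASS | TEXT' line into a (class, text) pair."""
    # Assumption: only the first pipe is the delimiter; later pipes
    # are treated as part of the text.
    label, sep, text = line.rstrip("\n").partition("|")
    if not sep:
        raise ValueError(f"no pipe delimiter in line: {line!r}")
    return label.strip(), text.strip()


def write_examples(path: str, examples: Iterable[Tuple[str, str]]) -> None:
    """Write (class, text) pairs to a pipe-delimited file."""
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in examples:
            handle.write(format_record(label, text) + "\n")
```

For instance, `write_examples("TRAIN_FILE", [("20", "A bill to amend the act")])` would produce a one-line training file; the class label `20` is a made-up value for illustration.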

run-svm.pl must be run at least twice: once to build classification models from the training samples, and a second time to generate predictions against the testing set. Sample command lines include:

CREATE TRAINING MODELS

run-svm -input-train TRAIN_FILE -result-dir tmp-test/test-results -stem
-model tmp-test/train-models/ -vocab-file tmp-test/train-feats/vocab  -ver
-ver -feature-dir tmp-test/train-feats

GENERATE TEST PREDICTIONS

run-svm -input-test TEST_FILE -result-dir tmp-test/test-results -stem
-model tmp-test/train-models/ -vocab-file tmp-test/train-feats/vocab  -ver
-ver -feature-dir tmp-test/test-feats
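The two sample commands differ only in the input flag and the feature directory. The Python sketch below makes that explicit by assembling both argument lists from one hypothetical helper, `build_command`. The flags, their order, and the directory names are copied verbatim from the samples above (including the repeated `-ver`); the helper itself and its `work_dir` parameter are assumptions for illustration, and the sketch only builds the argument list rather than running anything.

```python
from typing import List


def build_command(phase: str, data_file: str, work_dir: str = "tmp-test") -> List[str]:
    """Assemble the sample run-svm argument list for 'train' or 'test'."""
    if phase not in ("train", "test"):
        raise ValueError(f"phase must be 'train' or 'test', got {phase!r}")
    return [
        "run-svm",
        f"-input-{phase}", data_file,                    # -input-train or -input-test
        "-result-dir", f"{work_dir}/test-results",       # same in both samples
        "-stem",
        "-model", f"{work_dir}/train-models/",           # models are shared by both runs
        "-vocab-file", f"{work_dir}/train-feats/vocab",  # vocabulary comes from training
        "-ver", "-ver",                                  # repeated as in the samples
        "-feature-dir", f"{work_dir}/{phase}-feats",     # train-feats or test-feats
    ]


if __name__ == "__main__":
    # Print both commands in the order they would need to be run.
    print(" ".join(build_command("train", "TRAIN_FILE")))
    print(" ".join(build_command("test", "TEST_FILE")))
```

Note that the test run reuses the model directory and vocabulary file written by the training run, which is why training has to come first.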

Contact me if you wish to contribute, file bugs, or suggest new features.

Special thanks to the following people for their contributions:

  • Dustin Hillard
  • Thorsten Joachims
  • Claire Cardie
  • Eric Breck


 
   
           