Important note on April 15th, 2008: run-svm.pl is being replaced by run-svm-text.pl. run-svm.pl will remain available until April 15th, 2009.
Click here to download the run-svm.pl script
run-svm.pl is an evolving tool first created by Dustin Hillard; maintenance is shifting to me as he heads off to Yahoo! to work on their search effort. The authors make no warranties about the use of this software, and users should assume that it is not guaranteed to meet their needs. The code is copyright Dustin Hillard and Stephen Purpura; contact me if you wish to use it for commercial purposes. Use for non-commercial purposes is not restricted, but (for academic research) please cite the following paper when you use the script:
Purpura, S., Hillard, D. “Automated Classification of Congressional Legislation.” Proceedings of the Seventh International Conference on Digital Government Research. San Diego, CA.
The script linked from this web site controls Thorsten Joachims's SVM-light. It takes as input a training text file and a testing text file. The training text file provides learning examples, while the testing text file provides the records that the user wishes to make predictions against. Both files are pipe-delimited, one record per line: CLASS | TEXT
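As a rough illustration of the CLASS | TEXT format, here is a minimal Python sketch for writing and reading such records. The helper names (format_line, parse_line) are hypothetical and are not part of run-svm.pl. The sketch also assumes that whitespace around the pipe is not significant, that only the first pipe separates the class from the text, and that stray pipes or line breaks inside the text should be flattened to spaces; check these assumptions against the script before relying on them.

```python
def format_line(label, text):
    """Build one 'CLASS | TEXT' record.

    Assumption: pipes and line breaks inside the text are replaced with
    spaces so that each record stays on a single line with one delimiter.
    """
    clean = " ".join(str(text).replace("|", " ").split())
    return f"{label} | {clean}"


def parse_line(line):
    """Split one record on the first pipe only and trim both fields."""
    label, sep, text = line.rstrip("\n").partition("|")
    if not sep:
        raise ValueError(f"no pipe delimiter found in record: {line!r}")
    return label.strip(), text.strip()
```

Writing a training file then amounts to calling format_line once per example and writing each result on its own line.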
run-svm.pl must be run at least twice: once to build classification models from the training samples, and a second time to generate predictions against the testing set. Sample command lines include:
run-svm -input-train TRAIN_FILE -result-dir tmp-test/test-results -stem
    -model tmp-test/train-models/ -vocab-file tmp-test/train-feats/vocab -ver
    -ver -feature-dir tmp-test/train-feats

run-svm -input-test TEST_FILE -result-dir tmp-test/test-results -stem
    -model tmp-test/train-models/ -vocab-file tmp-test/train-feats/vocab -ver
    -ver -feature-dir tmp-test/test-feats
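The two sample commands differ only in the input flag and the feature directory, so the two passes can be driven from a small wrapper. The following Python sketch is one way to do that under stated assumptions: the function names (build_command, run_both) are hypothetical, the flags and paths are copied from the sample command lines above rather than from any documentation of the script, and the executable is assumed to be reachable on the PATH as run-svm.

```python
import subprocess


def build_command(mode, data_file, work_dir="tmp-test"):
    """Return the argument list for one pass; mode is 'train' or 'test'.

    Flags and directory layout mirror the sample command lines; nothing
    here has been checked against the script's own option parsing.
    """
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test'")
    return [
        "run-svm",
        f"-input-{mode}", data_file,
        "-result-dir", f"{work_dir}/test-results",
        "-stem",
        "-model", f"{work_dir}/train-models/",
        # Both passes point at the vocabulary built during training.
        "-vocab-file", f"{work_dir}/train-feats/vocab",
        "-ver", "-ver",
        "-feature-dir", f"{work_dir}/{mode}-feats",
    ]


def run_both(train_file, test_file, work_dir="tmp-test"):
    """Run the training pass, then the prediction pass, stopping on failure."""
    for mode, path in (("train", train_file), ("test", test_file)):
        subprocess.run(build_command(mode, path, work_dir), check=True)
```

Calling run_both("TRAIN_FILE", "TEST_FILE") would issue the same two commands in order; check=True makes the prediction pass run only if training exits successfully.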
Contact me if you wish to contribute, file bugs, or suggest new features.
Special thanks to the following people for their contributions:
- Dustin Hillard
- Thorsten Joachims
- Claire Cardie
- Eric Breck