-T ~/weka-3-7-9/data/ReutersCorn-test.arff \
Weka is not really the shining example of documentation, but you can still find valuable information about it on their sites. I understand that you want to classify text files, so you should also have a look at Text categorization with WEKA. There is also a new Weka documentation site.

The command line you posted in your question contains an error. I know, you copied it from my answer to another question, but I also just noticed it. You have to omit the -c last, because the ReplaceMissingValues filter doesn't like it.

Classes below in the class hierarchy are for supervised filtering, i.e. taking advantage of the class information. A class must be assigned via -c; for WEKA default behaviour use -c last.

But ReplaceMissingValues is an unsupervised filter, as is StringToWordVector.

Multiple filters

Adding multiple filters is also no problem; that is what the MultiFilter is for. The command line can get a bit messy, though (I chose RandomForest here, because it is a lot faster than NN):
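As a rough illustration of the MultiFilter approach described here, the following sketch wraps two unsupervised filters in a MultiFilter and passes that to a FilteredClassifier. It is not the command from the original answer, which is only partly preserved in this text. The file names train.arff and test.arff and the location of weka.jar in the current directory are assumptions; note that no -c last appears inside the filter specification.

```shell
# Hypothetical file names; replace them with your own ARFF files.
TRAIN="train.arff"
TEST="test.arff"

# MultiFilter applies the filters given by its -F options in the listed order.
# Both filters are unsupervised, so no "-c last" is passed to them.
FILTER_SPEC="weka.filters.MultiFilter \
-F weka.filters.unsupervised.attribute.ReplaceMissingValues \
-F weka.filters.unsupervised.attribute.StringToWordVector"

# Only run Weka when java, weka.jar and both data files are actually present.
if command -v java >/dev/null 2>&1 && [ -f weka.jar ] && [ -f "$TRAIN" ] && [ -f "$TEST" ]; then
    java -classpath weka.jar weka.classifiers.meta.FilteredClassifier \
        -t "$TRAIN" -T "$TEST" \
        -F "$FILTER_SPEC" \
        -W weka.classifiers.trees.RandomForest
else
    echo "Skipping run; filter spec would be: $FILTER_SPEC"
fi
```

If one of the inner filters needs options of its own, its -F argument has to be quoted again inside the outer quotes, which is where the command line starts to get messy.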
I am also not going to be using MLP, as NNs tend to be too slow when I have a few thousand features from the text data. I know how to change it to another classifier though (like NB or libSVM), so that is good.

But I am not sure how to add multiple filters in one call, as I also need to add the StringToWordVector filter (and possibly the Reorder filter to make the class the last attribute instead of the first).

And then how do I get it to actually output the prediction labels of each class? And then store those in an arff with the initial data?
I find the documentation is poor and I am struggling to figure out a few things. I want to use .arff files, one for training and one for testing, and get an output of predictions for the missing labels in the test data.

I have this code as a starting block:

java -classpath weka.jar [...] -t "training_file_with_missing_values.arff"

Running that code gives me "Illegal option -c last" and I am not sure why.
There you will find a button called Environment Variables; click it.

Depending on whether you're the only person using this computer or it is a lab computer shared by many, you can either create a new system-wide environment variable (you are the only user) or a user-dependent one (recommended for multi-user machines). Enter the following name for the variable

CLASSPATH

and add this value

C:\Program Files\Weka-3-8\mysql-connector-java-5.1.6-bin.jar

If you want to add additional jars, you'll have to separate them with the path separator, the semicolon (no spaces!).

I assume that the mysql jar is located in the following directory: /home/johndoe/jars/

Open a shell and execute one of the following commands, depending on the shell you're using:

export CLASSPATH=$CLASSPATH:/home/johndoe/jars/mysql-connector-java-5.1.6-bin.jar

setenv CLASSPATH $CLASSPATH:/home/johndoe/jars/mysql-connector-java-5.1.6-bin.jar

Unix/Linux uses the colon : as path separator, in contrast to Windows, which uses the semicolon.

Note: the prefixing with $CLASSPATH adds the mysql jar at the end of the currently existing CLASSPATH.

The process is the same as on Unix/Linux systems, but since the host system is Win32, and the Java installation is therefore also a Windows application, you'll have to use the semicolon as the separator between several jars.

I am fairly new to Weka and even more new to Weka on the command line.
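The Unix/Linux export step described above can be sketched for a POSIX shell as follows. The helper name append_classpath is hypothetical (it is not part of Weka or Java); the jar path is the one used in the text. The sketch also covers the case where CLASSPATH is still empty, where a plain $CLASSPATH:... would leave a stray leading colon.

```shell
# append_classpath is a hypothetical helper, not part of Weka or Java.
# It prints the existing classpath ($1) with the jar ($2) appended at the end.
append_classpath() {
    existing=$1
    jar=$2
    # ${existing:+...} expands to "$existing:" only when existing is non-empty,
    # so an empty CLASSPATH does not produce a leading colon.
    printf '%s\n' "${existing:+$existing:}$jar"
}

# Jar location taken from the example in the text.
CLASSPATH=$(append_classpath "${CLASSPATH:-}" "/home/johndoe/jars/mysql-connector-java-5.1.6-bin.jar")
export CLASSPATH
echo "$CLASSPATH"
```

Because the new jar goes to the end, entries that were already on the CLASSPATH are still searched first.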
In the Control Panel click on System (or right click on This PC and select Properties) and then go to the Advanced tab.
The CLASSPATH environment variable tells Java where to look for classes. Since Java does the search in a "first-come-first-serve" kind of manner, you'll have to take care where and what to put in your CLASSPATH. I, personally, never use the environment variable, since I'm working often on a project in different versions in parallel. The CLASSPATH would just mess up things, if you're not careful (or just forget to remove an entry). ANT offers a nice way for building (and separating source code and class files) Java projects.

But still, if you're only working on totally separate projects, it might be easiest for you to use the environment variable.

In the following we add the mysql-connector-java-5.1.6-bin.jar to our CLASSPATH variable (this works for any other jar archive) to make it possible to access MySQL Databases via JDBC.

We assume that the mysql-connector-java-5.1.6-bin.jar archive is located in the following directory: C:\Program Files\Weka-3-8
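For readers who, like the author, prefer not to touch the environment variable, one alternative is to pass the classpath to each java call with -cp. The sketch below is for a Unix/Linux shell (colon separator) and only prints the resulting command instead of running it. The helper build_classpath is hypothetical, and the location of weka.jar under /home/johndoe/jars/ is an assumption.

```shell
# build_classpath is a hypothetical helper: it joins its arguments with ':'.
# The argument order is the order in which Java searches the entries.
# The body runs in a subshell, so changing IFS does not leak to the caller.
build_classpath() (
    IFS=':'
    printf '%s\n' "$*"
)

# Assumed jar locations; adjust to your installation.
cp=$(build_classpath \
    "/home/johndoe/jars/weka.jar" \
    "/home/johndoe/jars/mysql-connector-java-5.1.6-bin.jar")

# Dry run: show the command that would start the Weka GUI with this classpath.
echo "java -cp \"$cp\" weka.gui.GUIChooser"
```

Each project or version can then use its own list of jars without leaving stale entries behind in a shared CLASSPATH.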