What combination of features might I pursue to raise my probability of contract award?
Open the WEKA Explorer.
On the Preprocess tab, open the government_contracts.arff file.
Perform pre-processing
Escape any single or double quotes that are not field enclosures (\’, \”) if you are using a delimited text version.
Check ‘UniqueTransactionID’ and click ‘Remove’. A random transaction ID has no predictive value, and discretizing or locally smoothing it can lead to overfitting.
If you have saved the ARFF back into a CSV, you will have to convert the ZIP code fields RecipientZipCode and PlaceOfPerformanceZipCode back to nominal with the unsupervised attribute filter StringToNominal, and convert DollarsObligated back to numeric.
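The quote-escaping step can be scripted before the file is loaded. The following is only a minimal sketch in Python: the function names `escape_quotes` and `enclose` and the sample vendor name are hypothetical, and it assumes the backslash escaping described above is what your delimited file needs.

```python
import re

# Matches a single or double quote that is not already preceded by a backslash.
_UNESCAPED_QUOTE = re.compile(r"""(?<!\\)(['"])""")


def escape_quotes(value):
    """Backslash-escape quotes inside a field value (hypothetical helper)."""
    return _UNESCAPED_QUOTE.sub(r"\\\1", value)


def enclose(value, enclosure='"'):
    """Escape the inner quotes, then wrap the field in its enclosure character."""
    return enclosure + escape_quotes(value) + enclosure


# Made-up vendor name containing both kinds of quote.
print(enclose("O'Brien 5\" Pipe Supply"))
```

Quotes that are already escaped are left alone, so running the helper twice on the same value should not double the backslashes.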
On the Associate tab, select the Apriori algorithm and click ‘Start’. The results:
This indicates that if you pursue Firm Fixed Price contracts with the VA, are located in ZIP code 83110, and the work will be performed within ZIP code 83110, you may have an advantage in the acquisition.
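To make the idea behind an association rule concrete, here is a small Python sketch that computes support and confidence for one rule. It is not WEKA's Apriori implementation: the four rows are made-up toy data, the rule itself is only an illustration, and `rule_metrics` is a hypothetical helper name.

```python
def rule_metrics(rows, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent."""
    # Rows that satisfy every condition on the left-hand side of the rule.
    matches_a = [r for r in rows
                 if all(r.get(k) == v for k, v in antecedent.items())]
    # Of those, the rows that also satisfy the right-hand side.
    matches_both = [r for r in matches_a
                    if all(r.get(k) == v for k, v in consequent.items())]
    support = len(matches_both) / len(rows)
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence


# Toy rows (hypothetical values, not taken from government_contracts.arff).
rows = [
    {"ContractPricing": "Firm Fixed Price", "FundingAgency": "VA",
     "RecipientZipCode": "83110", "PlaceOfPerformanceZipCode": "83110"},
    {"ContractPricing": "Firm Fixed Price", "FundingAgency": "VA",
     "RecipientZipCode": "83110", "PlaceOfPerformanceZipCode": "83110"},
    {"ContractPricing": "Firm Fixed Price", "FundingAgency": "VA",
     "RecipientZipCode": "83110", "PlaceOfPerformanceZipCode": "83001"},
    {"ContractPricing": "Cost Plus", "FundingAgency": "DoD",
     "RecipientZipCode": "20001", "PlaceOfPerformanceZipCode": "20001"},
]

support, confidence = rule_metrics(
    rows,
    antecedent={"ContractPricing": "Firm Fixed Price", "FundingAgency": "VA",
                "RecipientZipCode": "83110"},
    consequent={"PlaceOfPerformanceZipCode": "83110"},
)
print(support, confidence)
```

Support is the share of all rows in which the whole rule holds; confidence is the share of rows matching the left-hand side that also match the right-hand side.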
Can I predict which contracts will likely be awarded in my area?
By Don Krapohl
Open the WEKA Explorer.
On the Preprocess tab, open the government_contracts.arff file.
Perform pre-processing
Escape any single or double quotes that are not field enclosures (\’, \”) if you are using a delimited text version.
Check ‘UniqueTransactionID’ and click ‘Remove’. A random transaction ID has no predictive value, and discretizing or locally smoothing it can lead to overfitting.
If you have saved the ARFF back into a CSV, you will have to convert the ZIP code fields RecipientZipCode and PlaceOfPerformanceZipCode back to nominal with the unsupervised attribute filter StringToNominal, and convert DollarsObligated back to numeric.
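An alternative to re-applying filters after a CSV round trip is to declare the attribute types yourself in an ARFF header. The Python sketch below shows the idea under stated assumptions: `arff_header` is a hypothetical helper, the two sample rows are made up, and only the three attributes named above are included.

```python
import csv
import io


def arff_header(relation, rows, nominal, numeric):
    """Build ARFF header lines, declaring each column's type explicitly."""
    lines = ["@relation " + relation]
    for name in rows[0].keys():
        if name in nominal:
            # Nominal attributes list every distinct value inside braces.
            values = sorted({r[name] for r in rows})
            lines.append("@attribute " + name + " {" + ",".join(values) + "}")
        elif name in numeric:
            lines.append("@attribute " + name + " numeric")
        else:
            lines.append("@attribute " + name + " string")
    lines.append("@data")
    return "\n".join(lines)


# Made-up sample rows standing in for the exported CSV.
sample = (
    "RecipientZipCode,PlaceOfPerformanceZipCode,DollarsObligated\n"
    "83110,83110,1500.00\n"
    "83001,83110,250.50\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))

header = arff_header(
    "government_contracts",
    rows,
    nominal={"RecipientZipCode", "PlaceOfPerformanceZipCode"},
    numeric={"DollarsObligated"},
)
print(header)
```

Declaring the ZIP codes as nominal keeps WEKA from treating them as quantities, which is the same goal the filter step serves.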
On the ‘Select Attributes’ tab, use the attribute evaluator to explore attribute merit: choose the ClassifierSubsetEval evaluator with the Naïve Bayes algorithm and a RandomSearch search, predicting the Product or Service Code (PSC). This yields:
Selected attributes: 2,3,4,6 : 4
ContractPricing
FundingAgency
PlaceofPerformanceZipCode
RecipientZipCode
This indicates that, if you know these contract attributes, the best prediction of a Product or Service Code using the Naïve Bayes algorithm has a subset merit of 0.407, or roughly a 40% predictive ability.
To predict PSC from those attributes, select the Classify tab, choose the bayes classifier -> Naïve Bayes with 10-fold cross-validation, set PSC as the class to predict, and click ‘Start’. The output reports F-measure and other accuracy measures by class. An example of a single class result is:
TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
0         0.014     0           0        0           0.972      REFRIGERATION AND AIR CONDITIONING COMPONENTS
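The per-class row above combines two different kinds of measure, and a small Python sketch may help separate them. The counts and scores below are made-up toy values, the function names are hypothetical, and the fixed 0.5 cutoff is an assumption that only stands in for the classifier's actual decision rule. Precision, recall, and F-measure depend on the decisions made, while ROC area depends only on how the scores rank positives against negatives.

```python
def f_measure(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


def roc_auc(scores, labels):
    """ROC area as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores for one class: two true members, four non-members.
scores = [0.30, 0.20, 0.25, 0.10, 0.05, 0.02]
labels = [1, 1, 0, 0, 0, 0]

# With a 0.5 cutoff nothing is predicted as this class.
predicted = [s >= 0.5 for s in scores]
tp = sum(1 for p, y in zip(predicted, labels) if p and y)
fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
fn = sum(1 for p, y in zip(predicted, labels) if not p and y)

print(f_measure(tp, fp, fn))
print(roc_auc(scores, labels))
```

In this toy case every decision-based measure is zero even though most positives outrank most negatives, which is the same pattern as a row with a TP rate of 0 and a high ROC area.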
To view the threshold curve for the prediction, right-click the result buffer entry at the left and hover over Threshold Curve. Select “REFRIGERATION AND AIR CONDITIONING COMPONENTS”, for example. The curve is as follows:
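This shows an area under the ROC curve of 0.972 (about 97%) for this class. Note that ROC area measures how well the model ranks instances of the class rather than classification accuracy; the TP rate and F-measure of 0 in the row above mean that no instances of this class were correctly classified in cross-validation. The F-Measure visualization further supports this: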
To see an analogous cluster visualization using Excel and the SQL Server 2008 R2 add-ins, see my quick article on Activity Clustering on Geography.