Entropy Tree and Random Forest Classifier
Date
November 2023
Location
Arlington, Texas
This Python script implements a decision tree and a random forest for classification tasks. The decision tree is built with the ID3 algorithm, and each split can be chosen either by maximizing information gain ("optimized") or at random ("randomized"). The random forest is an ensemble of such decision trees with a configurable number of trees. The project targets classification tasks where the input data consists of feature values and class labels.
Here's a breakdown of the code:
Classes:
Node_Decisiont:
Represents a node in the decision tree.
Attributes:
feat: Feature index for the split.
tshld: Threshold value for the split.
left: Left subtree.
right: Right subtree.
value: Class label for leaf nodes.
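A minimal sketch of this node structure, following the attribute names above; the constructor defaults and the is_leaf helper are assumptions, not details taken from the script.

class Node_Decisiont:
    # A single node of the decision tree. Internal nodes store a feature
    # index and a threshold; leaf nodes store a class label in `value`.
    def __init__(self, feat=None, tshld=None, left=None, right=None, value=None):
        self.feat = feat      # feature index used for the split
        self.tshld = tshld    # threshold value for the split
        self.left = left      # left subtree (assumed: feature value <= threshold)
        self.right = right    # right subtree (assumed: feature value > threshold)
        self.value = value    # class label, set only on leaf nodes

    def is_leaf(self):
        return self.value is not None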
Functions:
load_data(file_path):
Loads data from a file.
Assumes each line in the file contains space-separated feature values followed by the label.
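One plausible implementation of this loader, assuming whitespace-separated numeric values with the label in the last column; the use of NumPy here is an assumption.

import numpy as np

def load_data(file_path):
    # Each line: space-separated feature values followed by the class label.
    data = np.loadtxt(file_path)
    feats = data[:, :-1]              # every column except the last is a feature
    labels = data[:, -1].astype(int)  # the last column is the class label
    return feats, labels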
inf_gain(y, left_y, right_y):
Calculates the information gain of a split from the entropy of the parent labels y and the child label sets left_y and right_y.
calculate_entropy(y):
Calculates entropy for a set of class labels.
log_base_2(x):
Calculates the base-2 logarithm.
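These three helpers fit together as in the sketch below; weighting each child's entropy by its share of the samples is the standard information-gain formula and is assumed here.

import math
from collections import Counter

def log_base_2(x):
    return math.log2(x)

def calculate_entropy(y):
    # H(y) = -sum over classes c of p_c * log2(p_c), where p_c is the
    # proportion of labels in y that belong to class c.
    total = len(y)
    if total == 0:
        return 0.0
    return -sum((cnt / total) * log_base_2(cnt / total)
                for cnt in Counter(y).values())

def inf_gain(y, left_y, right_y):
    # Information gain = parent entropy minus the size-weighted child entropies.
    n = len(y)
    weighted = (len(left_y) / n) * calculate_entropy(left_y) \
             + (len(right_y) / n) * calculate_entropy(right_y)
    return calculate_entropy(y) - weighted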
build_tree(feats, labels, option, depth=0, max_depth=5):
Constructs a decision tree based on the specified option (optimized or randomized).
Implements stopping conditions such as reaching the maximum depth or encountering a pure node (all labels identical).
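A condensed sketch of that recursion, reusing the Node_Decisiont class and the split helpers described in this breakdown; the majority-vote leaf label and the handling of degenerate splits are assumptions.

from collections import Counter

def build_tree(feats, labels, option, depth=0, max_depth=5):
    labels = list(labels)
    # Stop when the node is pure or the maximum depth is reached.
    if len(set(labels)) == 1 or depth >= max_depth:
        return Node_Decisiont(value=Counter(labels).most_common(1)[0][0])

    # Choose the split: highest information gain ("optimized") or random ("randomized").
    if option == "optimized":
        feat, tshld = find_best_split(feats, labels)
    else:
        feat, tshld = find_random_split(feats, labels)

    left_idx = [i for i, x in enumerate(feats) if x[feat] <= tshld]
    right_idx = [i for i, x in enumerate(feats) if x[feat] > tshld]
    if not left_idx or not right_idx:
        # Degenerate split: fall back to a majority-vote leaf.
        return Node_Decisiont(value=Counter(labels).most_common(1)[0][0])

    left = build_tree([feats[i] for i in left_idx], [labels[i] for i in left_idx],
                      option, depth + 1, max_depth)
    right = build_tree([feats[i] for i in right_idx], [labels[i] for i in right_idx],
                       option, depth + 1, max_depth)
    return Node_Decisiont(feat=feat, tshld=tshld, left=left, right=right)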
find_best_split(feats, labels):
Finds the best split for the decision tree using information gain.
find_random_split(feats, labels):
Finds a random split for the decision tree.
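Possible shapes for the two split strategies, reusing inf_gain from the sketch above; scanning every observed feature value as a candidate threshold, and drawing the random threshold from the observed values, are assumptions.

import random

def find_best_split(feats, labels):
    # Exhaustively test each feature and each observed value as a threshold,
    # keeping the pair with the highest information gain.
    best_gain, best_feat, best_tshld = -1.0, 0, 0.0
    for feat in range(len(feats[0])):
        for tshld in sorted({x[feat] for x in feats}):
            left_y = [y for x, y in zip(feats, labels) if x[feat] <= tshld]
            right_y = [y for x, y in zip(feats, labels) if x[feat] > tshld]
            if not left_y or not right_y:
                continue
            gain = inf_gain(labels, left_y, right_y)
            if gain > best_gain:
                best_gain, best_feat, best_tshld = gain, feat, tshld
    return best_feat, best_tshld

def find_random_split(feats, labels):
    # Pick a feature uniformly at random and a threshold from its observed
    # values; `labels` is kept only to match the documented signature.
    feat = random.randrange(len(feats[0]))
    tshld = random.choice([x[feat] for x in feats])
    return feat, tshld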
predict(tree, x):
Predicts the class label for a given input based on the decision tree.
predict_forest(forest, x):
Predicts the class label for a given input based on the random forest.
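Prediction is a walk from the root to a leaf; majority voting across the trees is assumed for the forest version.

from collections import Counter

def predict(tree, x):
    # Descend left or right until a leaf is reached, then return its label.
    node = tree
    while node.value is None:
        node = node.left if x[node.feat] <= node.tshld else node.right
    return node.value

def predict_forest(forest, x):
    # Each tree casts one vote; the most common prediction wins.
    votes = [predict(tree, x) for tree in forest]
    return Counter(votes).most_common(1)[0][0]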
evlt(tree, feats, labels):
Evaluates the accuracy of a decision tree on a given dataset.
evlt_forest(forest, feats, labels):
Evaluates the accuracy of a random forest on a given dataset.
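Both evaluators reduce to the fraction of correct predictions; this sketch assumes that and reuses the predict helpers above.

def evlt(tree, feats, labels):
    correct = sum(1 for x, y in zip(feats, labels) if predict(tree, x) == y)
    return correct / len(labels)

def evlt_forest(forest, feats, labels):
    correct = sum(1 for x, y in zip(feats, labels) if predict_forest(forest, x) == y)
    return correct / len(labels)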
main(training_file, test_file, option):
Main function that orchestrates loading data, building, and evaluating the model.
print_results(index, predicted_class, true_class, accy):
Prints results for each object in the dataset, including its index, predicted class, true class, and accuracy.
print_results_for_model(model, feats, labels):
Prints results for a single decision tree model.
print_results_for_forest(model, feats, labels):
Prints results for a random forest model.
build_and_evaluate_tree(train_feats, train_labels, test_feats, test_labels, option):
Builds and evaluates a single decision tree model.
build_and_evaluate_forest(train_feats, train_labels, test_feats, test_labels, num_trees):
Builds and evaluates a random forest model.
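A rough sketch of how these pieces could be wired together; how the forest is requested via the option argument, the number of trees, and the command-line entry point are all assumptions rather than details taken from the script.

import sys

def build_and_evaluate_tree(train_feats, train_labels, test_feats, test_labels, option):
    tree = build_tree(train_feats, train_labels, option)
    print_results_for_model(tree, test_feats, test_labels)
    return evlt(tree, test_feats, test_labels)

def build_and_evaluate_forest(train_feats, train_labels, test_feats, test_labels, num_trees):
    # Assumed: every tree in the forest is grown with randomized splits.
    forest = [build_tree(train_feats, train_labels, "randomized") for _ in range(num_trees)]
    print_results_for_forest(forest, test_feats, test_labels)
    return evlt_forest(forest, test_feats, test_labels)

def main(training_file, test_file, option):
    train_feats, train_labels = load_data(training_file)
    test_feats, test_labels = load_data(test_file)
    if option in ("optimized", "randomized"):
        build_and_evaluate_tree(train_feats, train_labels,
                                test_feats, test_labels, option)
    else:
        # Assumed: any other option selects the forest, with a default size.
        build_and_evaluate_forest(train_feats, train_labels,
                                  test_feats, test_labels, num_trees=20)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2], sys.argv[3])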




