Because of code confidentiality, this page only demonstrates the framework

Machine Learning: Forum Post Sorting

Piazza is a learning management system which allows students to ask questions in a forum-type format. Instructors can set up “tags” for the class content, such as “Assignment 1,” “Project 2,” “Midterm Exam,” “Class Logistic,” etc. Each post shoud be properly categorized by the student who posts it, or it should be automatically categorized by the ai(“Potatobot”).
piazza1

Piazza Dataset

Forum theme: EECS 280 (the second largest course at University of Michigan)

Labels:

By topic: "Project 1," "Project 2," "Project 3," "Project 4," "Project 5," "Midterm," "Final"

By author: "Instructor," "Student"

Train: W14-W16, EECS280 Piazza Post, with proper labels

Validation: SP16, EECS280 Piazza Post

Test: W17, EECS280 Piazza Post

(Post content was downloaded by crawlers)

(Index categorization is based on binary search tree, the searching is based on map)


Model Construction

Set: texts are considered unordered and non-repetitive

Conditional Probability: we write P(A) to denote the probability (between 0 and 1) that some event A will occur. We write P(A|B) to denote the probability that event A will occur given that we already know event B has occurred.



Prediction Model

piazza3 piazza4

Result Demonstruation

Small size sample run:

trained on 8 examples

test data:
  correct = euchre, predicted = euchre, log-probability score = -13.7
  content = my code segfaults when bob is the dealer

  correct = euchre, predicted = calculator, log-probability score = -12.5
  content = no rational explanation for this bug

  correct = calculator, predicted = calculator, log-probability score = -13.6
  content = countif function in stack class not working

performance: 2 / 3 posts predicted correctly


Midium size sample run:

trained on 2552 examples
test data:
  correct = exam, predicted = exam, log-probability score = -162
  content = final exam scores have been released and regrade requests are open please take a look at the solutions before submitting a regrade request solutions have been posted on the google drive you have until sunday at 100 pm to submit a regrade request requests submitted after this time will not be processed exam statistics can be found in the grade statistics thread in 7
  ......
  
performance: 245 / 332 posts predicted correctly


Large size sample run:

trained on 11365 examples

test data:
  correct = instructor, predicted = instructor, log-probability score = -112
  content = we want to express our appreciation for those of you who completed the projects with honesty and integrity this semester 9 cases involving 15 current and 10 past students from our course are being referred to the honor council pin
  ......
  
performance: 2602 / 2988 posts predicted correctly




Related Posts