Date posted: 06-Jul-2015 | Category: Engineering | Uploaded by: sonam05
Presented by Sonam (10103470)
Stack Overflow is a question and answer site for professional and enthusiast programmers.
Tags are user-generated labels (keywords) that summarize the features of a question from different points of view.
Questions that are not related to programming topics are marked ‘closed’ by experienced users and community moderators.
Questions can also be deleted or locked by experienced users and community moderators.
•Tag recommendation for questions being posted on Stack Overflow
•Prediction of ‘closed’ questions at post creation time
•Prediction of ‘deleted’ questions after deletion
•Easier question posting
•Better organization of the site
•Feedback to the question asker
•Assistance for community moderators
•Feedback to the moderator/owner on whether a question is worth deleting or should remain undeleted
Database Snapshot
•TF.IDF WEIGHTING
•NAÏVE BAYES CLASSIFICATION
•SVM CLASSIFICATION
•K-NEAREST NEIGHBOR CLASSIFICATION
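The slide names TF.IDF weighting and k-nearest neighbor classification but gives no formulas. The sketch below shows one common way to combine them for tag recommendation, assuming whitespace tokenization, raw term frequency times idf = log(N / df), cosine similarity, and tags ranked by the summed similarity of the k nearest training posts. The function names and the (text, tags) input format are hypothetical, not taken from the original system.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption: the slides do not
    # say how post text was tokenized).
    return text.lower().split()


def vectorize(text, df, n_docs):
    """Sparse TF.IDF vector: raw term count * log(N / df); unseen terms are dropped."""
    tf = Counter(tokenize(text))
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if t in df}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_tags(question, train_posts, k=3, top_n=3):
    """Score tags by the summed cosine similarity of the k nearest training posts.

    train_posts is a list of (text, [tags]) pairs (hypothetical format).
    """
    n_docs = len(train_posts)
    df = Counter(t for text, _ in train_posts for t in set(tokenize(text)))
    query = vectorize(question, df, n_docs)
    neighbors = sorted(
        ((cosine(query, vectorize(text, df, n_docs)), tags) for text, tags in train_posts),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, tags in neighbors:
        if sim <= 0:
            continue  # ignore neighbors that share no weighted terms with the query
        for tag in tags:
            scores[tag] += sim
    return [tag for tag, _ in scores.most_common(top_n)]


# Hypothetical training posts.
train_posts = [
    ("how to sort a list in python", ["python", "list"]),
    ("python dictionary comprehension syntax", ["python", "dictionary"]),
    ("java arraylist sort comparator", ["java", "sorting"]),
    ("segmentation fault pointer in c", ["c", "pointers"]),
]
print(recommend_tags("sort python list quickly", train_posts))
```

Summing neighbor similarities per tag lets a tag shared by several close neighbors outrank a tag attached to only one of them.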
•Flow chart of tag prediction
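Naïve Bayes classification, also listed for tag prediction, can rank tags in a similar way. This is a minimal multinomial naïve Bayes sketch with Laplace smoothing (alpha = 1), assuming each tag is treated as its own class and a post with several tags is counted once per tag; the original system may have handled multi-tag posts differently, and the function names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(posts):
    """Collect tag priors and per-tag word counts from (text, [tags]) pairs.

    Assumption: each tag is its own class, and a post with several tags
    contributes its words once to every one of its tags.
    """
    tag_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, tags in posts:
        words = text.lower().split()
        vocab.update(words)
        for tag in tags:
            tag_counts[tag] += 1
            word_counts[tag].update(words)
    return tag_counts, word_counts, vocab


def rank_tags(text, model, top_n=3, alpha=1.0):
    """Rank tags by log P(tag) + sum of log P(word | tag), Laplace-smoothed."""
    tag_counts, word_counts, vocab = model
    total_tags = sum(tag_counts.values())
    words = [w for w in text.lower().split() if w in vocab]  # skip unseen words
    scores = {}
    for tag, count in tag_counts.items():
        denom = sum(word_counts[tag].values()) + alpha * len(vocab)
        score = math.log(count / total_tags)
        for w in words:
            score += math.log((word_counts[tag][w] + alpha) / denom)
        scores[tag] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical training posts.
train_posts = [
    ("how to sort a list in python", ["python", "list"]),
    ("python dictionary comprehension syntax", ["python", "dictionary"]),
    ("java arraylist sort comparator", ["java", "sorting"]),
    ("segmentation fault pointer in c", ["c", "pointers"]),
]
model = train_naive_bayes(train_posts)
print(rank_tags("segmentation fault pointer", model, top_n=2))
```

Working in log space avoids numeric underflow when a post contains many words.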
The following graph compares accuracy with and without feedback.
The graph shows the per-post accuracy when recommending 1 tag, 2 tags, and the top 3, top 4, and top 5 tags.
Accuracy of the full tag recommendation system as the number of recommended tags varies.
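The deck does not define how accuracy for the top-k tags is computed. One plausible definition, assumed here, counts a post as correct when at least one of its actual tags appears among the first k recommendations; the helper name and example data are hypothetical.

```python
def top_k_accuracy(recommended, actual, k):
    """Fraction of posts whose first k recommended tags include at least one
    of the post's actual tags (assumed definition of per-post accuracy)."""
    hits = sum(1 for rec, true in zip(recommended, actual) if set(rec[:k]) & set(true))
    return hits / len(actual)


# Hypothetical ranked recommendations and true tags for three posts.
recommended = [["python", "list", "java"], ["java", "c", "sorting"], ["c", "python", "pointers"]]
actual = [["list"], ["sorting"], ["java"]]

for k in range(1, 6):
    print(f"top {k}: {top_k_accuracy(recommended, actual, k):.2f}")
```

Under this definition accuracy can only stay the same or rise as k grows, since each larger k keeps every earlier recommendation.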
•RANDOM FOREST CLASSIFIER
•ADABOOST CLASSIFIER
•EXTRATREES CLASSIFIER
•Score of post
•User’s reputation
•Age of user account
•Score of other posts of user
•Post content
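The five features above must be turned into numbers before any of the classifiers can use them. The sketch below assumes a hypothetical Post record (its field names are illustrative, not the Stack Overflow data dump schema), measures account age in days at posting time, averages the scores of the user's other posts, and reduces post content to word count and number of <code> tags, which is only one of many possible content features.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical record layout; field names are illustrative only.
    score: int
    user_reputation: int
    account_created: datetime
    post_created: datetime
    other_post_scores: list
    body: str


def extract_features(post):
    """Map one post to a numeric vector covering the five listed feature groups."""
    age_days = (post.post_created - post.account_created).days
    others = post.other_post_scores
    mean_other = sum(others) / len(others) if others else 0.0
    return [
        post.score,
        post.user_reputation,
        age_days,                    # age of the user account when the post was made
        mean_other,                  # mean score of the user's other posts
        len(post.body.split()),      # post content: word count (assumed feature)
        post.body.count("<code>"),   # post content: number of code tags (assumed feature)
    ]


example = Post(3, 150, datetime(2014, 1, 1), datetime(2014, 3, 2), [2, 4],
               "How do I <code>sort</code> a list?")
print(extract_features(example))
```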
•Flow chart of closed question prediction
The graph shows the importance of each feature according to the Random Forest classifier.
The graph shows the importance of each feature according to the AdaBoost classifier.
The graph shows the importance of each feature according to the ExtraTrees classifier.
The following graph compares accuracy for different numbers of features.
The following graph compares accuracy for different numbers of estimators.
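The slides do not say which implementation of the three classifiers was used. To illustrate what the feature-importance and estimator-count graphs measure, here is a from-scratch sketch of discrete AdaBoost with decision stumps (labels +1 for closed, -1 otherwise). Feature importance is taken as each feature's share of the total estimator weight alpha, a simple convention that is not necessarily how the graphs were produced; the toy data is hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(
                    wi
                    for row, yi, wi in zip(X, y, w)
                    if (polarity if row[j] >= thr else -polarity) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, n_estimators):
    """Discrete AdaBoost over decision stumps; y holds +1 (closed) / -1 (not closed)."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_estimators):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # clip so a perfect stump does not divide by zero
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified posts, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row)) for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row, n_used=None):
    """Weighted vote of the first n_used stumps (all of them by default)."""
    votes = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble[:n_used])
    return 1 if votes >= 0 else -1


def feature_importance(ensemble, n_features):
    """Each feature's share of the total alpha of the stumps that split on it."""
    imp = [0.0] * n_features
    for alpha, (_, j, _, _) in ensemble:
        imp[j] += alpha
    total = sum(imp) or 1.0
    return [v / total for v in imp]


def staged_accuracy(ensemble, X, y):
    """Accuracy with the first 1, 2, ..., len(ensemble) estimators."""
    return [
        sum(predict(ensemble, row, m) == yi for row, yi in zip(X, y)) / len(y)
        for m in range(1, len(ensemble) + 1)
    ]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[1, 5], [2, 1], [3, 4], [4, 2]]
y = [-1, -1, 1, 1]
ensemble = adaboost(X, y, n_estimators=3)
print(feature_importance(ensemble, 2), staged_accuracy(ensemble, X, y))
```

staged_accuracy evaluates the first 1, 2, ... stumps of a single trained ensemble, which is cheaper than retraining once per estimator count.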
Comparison of the three classifiers
Based on the number of closed questions found:
Accuracy comparison:
Based on the number of estimators:
Accuracy comparison:
Based on varying the training set size:
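A comparison by training set size is usually produced as a learning curve: hold out a fixed test split, train on growing prefixes of the remaining data, and record test accuracy. The sketch takes fit and predict callables so any of the three classifiers could be plugged in. The threshold classifier in the example is a hypothetical toy stand-in, not one of the classifiers from the slides.

```python
import random


def learning_curve(X, y, fit, predict, sizes, test_fraction=0.25, seed=0):
    """Held-out accuracy after training on the first `size` shuffled training rows.

    fit(X_train, y_train) -> model and predict(model, x) -> label are supplied
    by the caller, so any classifier can be plugged in.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    if n_test == 0:
        raise ValueError("test split is empty; use more data or a larger test_fraction")
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    results = []
    for size in sizes:
        subset = train_idx[:size]
        model = fit([X[i] for i in subset], [y[i] for i in subset])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        results.append((size, correct / n_test))
    return results


def fit_threshold(X_train, y_train):
    # Hypothetical toy classifier: threshold feature 0 halfway between class means.
    pos = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    neg = [x[0] for x, label in zip(X_train, y_train) if label == -1]
    if not pos or not neg:
        return 0.0  # subset contains one class only: fall back to a fixed threshold
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x[0] >= threshold else -1


X = [[i] for i in range(20)]
y = [1 if i >= 10 else -1 for i in range(20)]
for size, acc in learning_curve(X, y, fit_threshold, predict_threshold, sizes=[5, 10, 15]):
    print(size, round(acc, 2))
```

Keeping the test split fixed across sizes means differences in accuracy come from the amount of training data rather than from a changing evaluation set.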
•RANDOM FOREST CLASSIFIER
•ADABOOST CLASSIFIER
•EXTRATREES CLASSIFIER
•Score of post
•User’s reputation
•Age of user account
•Score of other posts of user
•Post content
•Flow chart of deleted question prediction
Although deleted questions are mostly relevant, they are removed by reputed authors who want to protect their reputation on Stack Overflow.
Accuracy comparison:
Based on the number of estimators:
Accuracy comparison:
Based on varying the training set size:
•Tag recommendation was implemented with and without feedback. Accuracy was better with feedback.
•‘Closed’ question prediction was implemented with three different classifiers and with different numbers of features and estimators. AdaBoost gave the highest accuracy of the three.
•Deleted question prediction was likewise implemented with all three classifiers. The results indicate that many deleted questions deserve deletion, but some should be restored.
•Increase the accuracy of our algorithms.
•Predict trends on Stack Overflow.
•Predict and identify unanswered questions.
•Predict the quality of answers using non-textual features.