This is the last project of my undergrad, so I'm going to try to get it finished this week. I've separated the scripts into a reader and a summarizer. The reader teaches the algorithm, and the summarizer implements it. I'm slowly growing my corpus day by day. I looked into parsing RSS feeds to increase the corpus size, but I decided I could save time and retain accuracy by adding articles manually. I'm aiming for a corpus of 100 articles; right now I'm at 40.
I'm planning on testing different corpus sizes and different algorithms against a grading system of my own design. Hopefully I can show that as the corpus grows, the algorithm gets more effective.
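A rough sketch of how an experiment like that could be wired up is below. None of it is the project's actual code: the function names (`train`, `summarize`, `grade`, `evaluate`), the word-frequency scoring, and the word-overlap grade are all assumptions made up for illustration, standing in for whatever the real reader, summarizer, and grading system do.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(corpus):
    """Hypothetical 'reader' step: count word frequencies over the corpus."""
    counts = Counter()
    for article in corpus:
        counts.update(tokenize(article))
    return counts


def summarize(model, article, n_sentences=1):
    """Hypothetical 'summarizer' step: keep the sentences whose words have
    the highest average corpus frequency, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]

    def score(sentence):
        words = tokenize(sentence)
        return sum(model[w] for w in words) / len(words) if words else 0.0

    chosen = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in chosen)


def grade(summary, reference):
    """Placeholder grade (an assumption): the fraction of distinct reference
    words that also appear in the summary."""
    ref_words = set(tokenize(reference))
    if not ref_words:
        return 0.0
    return len(ref_words & set(tokenize(summary))) / len(ref_words)


def evaluate(corpus, test_set, sizes):
    """Train on the first `size` articles for each size and return a
    {size: mean grade} mapping over the (article, reference) test pairs."""
    results = {}
    for size in sizes:
        model = train(corpus[:size])
        scores = [grade(summarize(model, art), ref) for art, ref in test_set]
        results[size] = sum(scores) / len(scores)
    return results
```

With a list of article strings in `corpus` and a held-out list of `(article, reference_summary)` pairs in `test_set`, a call such as `evaluate(corpus, test_set, sizes=[10, 20, 40])` would give one mean grade per corpus size, which is the kind of number needed to see whether a larger corpus helps. Swapping in a different `summarize` function would cover the comparison between algorithms.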
More updates to come.