Text Summarization on the Books of Harry Potter

Dursley stopped dead. Twelve times he clicked the Put-Outer, until the only lights left on the whole street were two tiny pinpricks in the distance, which were the eyes of the cat watching him. Dumbledore slipped the Put-Outer back inside his cloak and set off down the street toward number four, where he sat down on the wall next to the cat. "But I c-c-can't stand it — Lily an' James dead — an' poor little Harry off ter live with Muggles —" "Yes, yes, it's all very sad, but get a grip on yourself, Hagrid, or we'll be found," Professor McGonagall whispered, patting Hagrid gingerly on the arm as Dumbledore stepped over the low garden wall and walked to the front door. Dumbledore turned and walked back down the street.

Luhn Summarizer

One of the first text summarization algorithms was published in 1958 by Hans Peter Luhn, working at IBM Research. Luhn’s algorithm is a naive approach based on term frequency (a precursor of TF-IDF) that also looks at the “window size” of non-important words between words of high importance. It also assigns higher weights to sentences occurring near the beginning of a document.

It was now reading the sign that said Privet Drive — no, looking at the sign; cats couldn't read maps or signs. He didn't see the owls swooping past in broad daylight, though people down in the street did; they pointed and gazed open-mouthed as owl after owl sped overhead. No one knows why, or how, but they're saying that when he couldn't kill Harry Potter, Voldemort's power somehow broke — and that's why he's gone." "But I c-c-can't stand it — Lily an' James dead — an' poor little Harry off ter live with Muggles —" "Yes, yes, it's all very sad, but get a grip on yourself, Hagrid, or we'll be found," Professor McGonagall whispered, patting Hagrid gingerly on the arm as Dumbledore stepped over the low garden wall and walked to the front door. G'night, Professor McGonagall — Professor Dumbledore, sir."

LSA Summarizer

Latent Semantic Analysis is a comparatively recent technique that combines term frequency with singular value decomposition.

He dashed back across the road, hurried up to his office, snapped at his secretary not to disturb him, seized his telephone, and had almost finished dialing his home number when he changed his mind. It seemed that Professor McGonagall had reached the point she was most anxious to discuss, the real reason she had been waiting on a cold, hard wall all day, for neither as a cat nor as a woman had she fixed Dumbledore with such a piercing stare as she did now. He looked simply too big to be allowed, and so wild — long tangles of bushy black hair and beard hid most of his face, he had hands the size of trash can lids, and his feet in their leather boots were like baby dolphins. For a full minute the three of them stood and looked at the little bundle; Hagrid's shoulders shook, Professor McGonagall blinked furiously, and the twinkling light that usually shone from Dumbledore's eyes seemed to have gone out. A breeze ruffled the neat hedges of Privet Drive, which lay silent and tidy under the inky sky, the very last place you would expect astonishing things to happen.

TextRank Summarizer

TextRank is another text summarizer based on the ideas of PageRank. It was developed at the same time as LexRank, though by a different group of people. TextRank is a bit simpler than LexRank: the two algorithms are very similar, but LexRank applies a heuristic post-processing step to remove sentences that largely duplicate one another.

Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you'd expect to be involved in anything strange or mysterious, because they just didn't hold with such nonsense. Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, beefy man with hardly any neck, although he did have a very large mustache. Mrs. Dursley was thin and blonde and had nearly twice the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbors.

Edmundson Summarizer

In 1969, Harold Edmundson developed the summarizer bearing his name. Edmundson’s algorithm was, along with Luhn’s, one of the seminal text summarization techniques. What sets the Edmundson summarizer apart from the others is that it takes into account “bonus words”, words the user marks as highly important; “stigma words”, words of low or even negative importance; and “stop words”, the same as those used elsewhere in NLP. Edmundson suggested using the words in a document’s title as bonus words. Using the chapter title as bonus words, this is what Edmundson outputs:

The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street. When Dudley had been put to bed, he went into the living room in time to catch the last report on the evening news: "And finally, bird-watchers everywhere have reported that the nation's owls have been behaving very unusually today. Twelve times he clicked the Put-Outer, until the only lights left on the whole street were two tiny pinpricks in the distance, which were the eyes of the cat watching him. Dumbledore slipped the Put-Outer back inside his cloak and set off down the street toward number four, where he sat down on the wall next to the cat. He couldn't know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: "To Harry Potter — the boy who lived!"

Another addition I made was to use LDA to extract topic keywords, and then add those topic keywords back in as additional bonus words.
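To make the bonus/stigma/stop word idea concrete, here is a minimal sketch of that part of an Edmundson-style scorer. It is not the code used to produce the summary above, and it leaves out the other signals Edmundson described (such as sentence position). The function names, the tiny stop word list, the unit weights (+1 for a bonus word, -1 for a stigma word), and the toy sentences are all assumptions made for illustration; the title "The Boy Who Lived" is assumed to be the chapter title. The `extra_bonus` parameter stands in for keywords obtained some other way, for example from an LDA topic model.

```python
import re

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "was", "were"}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cue_score(sentence, bonus_words, stigma_words, stop_words=STOP_WORDS):
    """+1 per bonus word, -1 per stigma word; stop words are ignored."""
    score = 0
    for word in tokenize(sentence):
        if word in stop_words:
            continue
        if word in bonus_words:
            score += 1
        elif word in stigma_words:
            score -= 1
    return score


def summarize(text, title, stigma_words=frozenset(), extra_bonus=frozenset(), n=2):
    """Return the n best-scoring sentences, in their original order."""
    # Title words (minus stop words) act as bonus words, plus any extras.
    bonus = (set(tokenize(title)) - STOP_WORDS) | set(extra_bonus)
    sentences = split_sentences(text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cue_score(sentences[i], bonus, stigma_words),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    toy_text = (
        "The cat sat on the wall. People toasted the boy who lived. "
        "The owls flew past in daylight. Nobody liked the boring weather report."
    )
    print(summarize(toy_text, "The Boy Who Lived", stigma_words={"boring"}))
    # Adding a keyword as an extra bonus word changes which sentences win.
    print(summarize(toy_text, "The Boy Who Lived",
                    stigma_words={"boring"}, extra_bonus={"owls"}))
```

Feeding topic keywords in through `extra_bonus` keeps the scoring rule unchanged: the keywords simply enlarge the bonus set derived from the title.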
