Coding a Python script to find dead links in a Wikipedia page.

So we use status-code validation with Python's requests module:

```python
temp_page = requests.get(i, verify=False)
if temp_page.status_code == 403:
    forbidden_urls.append(i)
elif temp_page.status_code != 200:
    print("* ", i)
    dead_links.append(i)
```

In the code above, pages returning status code 403 (Forbidden) are set aside, because the requests module is blocked from fetching them even though a browser can still open those URLs.

Imagine a Wikipedia page containing a thousand links: how long would checking them all take? (I know Python is slow compared to JavaScript.) In my case, finding the dead links on Narendra Modi's wiki page took around 1610 seconds, roughly 26 minutes, which is humongous. I couldn't leave it at that, so I looked for ways to cut that time and ended up caching results in a .csv file. The first run still takes about the same 26 minutes, but the second run takes only around 0.0082 seconds (a sketch of this check-and-cache loop appears at the end of this post).

[Screenshot: time taken on the first run]

[Screenshot: time taken on the second run]

Now that the basic logic was mostly done, and since a command-line application would attract no one, I decided to build a Flask application with a simple UI. Flask is a micro web framework written in Python, used alongside Jinja2 templating, which reduces redundancy in HTML code. A simple "Hello, World" Flask application would look like the sketch at the end of this post.

The app combined with the logic mentioned above looks like this:

[Screenshot: Homepage]

and the result looks like this:

[Screenshot: result page]

This app is still in development, and I thank Raja CSP Raman very much for encouraging me to complete this project. You can find the code for this web app here! Any suggestions and improvements are most welcome. This post will be updated with any further progress.
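To make the check-and-cache idea concrete, here is a minimal sketch of the check loop combined with a .csv cache so that repeat runs skip the network entirely. This is an illustration, not the code from the post: the function names, the link_cache.csv layout, and the use of BeautifulSoup to collect the page's links are my assumptions; only the status-code rules (403 set aside, any other non-200 counted as dead) come from the logic described above.

```python
# Sketch only: helper names, cache layout and BeautifulSoup link extraction
# are assumptions, not the author's actual script.
import csv
import os

import requests
from bs4 import BeautifulSoup


def extract_links(page_url):
    """Collect absolute http(s) links from a page (assumed approach)."""
    html = requests.get(page_url, verify=False).text
    soup = BeautifulSoup(html, "html.parser")
    return {a["href"] for a in soup.find_all("a", href=True)
            if a["href"].startswith("http")}


def check_links(urls, cache_file="link_cache.csv"):
    """Return (dead_links, forbidden_urls), reusing cached status codes if present."""
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file, newline="") as f:
            cache = {url: int(code) for url, code in csv.reader(f)}

    dead_links, forbidden_urls = [], []
    for url in urls:
        if url in cache:
            code = cache[url]          # second run: no network call at all
        else:
            try:
                code = requests.get(url, verify=False, timeout=10).status_code
            except requests.RequestException:
                code = 0               # unreachable, treat as dead
            cache[url] = code

        if code == 403:                # requests is blocked, but a browser may not be
            forbidden_urls.append(url)
        elif code != 200:
            print("* ", url)
            dead_links.append(url)

    with open(cache_file, "w", newline="") as f:
        csv.writer(f).writerows(cache.items())
    return dead_links, forbidden_urls


if __name__ == "__main__":
    links = extract_links("https://en.wikipedia.org/wiki/Narendra_Modi")
    dead, forbidden = check_links(links)
    print(len(dead), "dead,", len(forbidden), "forbidden")
```

On the first run every link is fetched, so the time is dominated by the network; on later runs the statuses come straight from the CSV, which is why the second run is almost instantaneous.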

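For reference, a minimal "Hello, World" Flask application looks roughly like this; the author's own snippet may have differed in its details.

```python
# A minimal Flask "Hello, World" app: one route that returns a string.
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello, World!"


if __name__ == "__main__":
    app.run(debug=True)
```

Running the file starts Flask's development server on http://127.0.0.1:5000/.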
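Finally, here is one way the checker could be wired into a small Flask UI, as a stand-in for the app shown in the screenshots. The route, the form field, and the inline Jinja2 template are my assumptions; the author's actual app lives in the linked repository.

```python
# Sketch of a Flask UI around the checker. The "deadlinks" module name is
# hypothetical and refers to the check_links/extract_links sketch above.
from flask import Flask, request, render_template_string

from deadlinks import check_links, extract_links  # hypothetical module

app = Flask(__name__)

# Inline Jinja2 template: a form plus a list of any dead links found.
PAGE = """
<form method="post">
  Wikipedia URL: <input name="url" size="60">
  <button type="submit">Check</button>
</form>
{% if dead is not none %}
  <h3>{{ dead|length }} dead links</h3>
  <ul>{% for link in dead %}<li>{{ link }}</li>{% endfor %}</ul>
{% endif %}
"""


@app.route("/", methods=["GET", "POST"])
def home():
    dead = None
    if request.method == "POST":
        url = request.form["url"]
        dead, _forbidden = check_links(extract_links(url))
    return render_template_string(PAGE, dead=dead)


if __name__ == "__main__":
    app.run(debug=True)
```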