Web Scraping HTML Tables with Python

For our purpose, we will inspect the elements of the table, as illustrated below:

[Figure: Inspecting cell of HTML Table]

Based on the HTML code, the data are stored between <tr>..</tr> tags. Finally, we will store the data in a Pandas DataFrame.

    import requests
    import lxml.html as lh
    import pandas as pd

Scrape Table Cells

The code below allows us to get the Pokemon stats data from the HTML table.

    url = 'http://pokemondb.net/pokedex/all'
    # Create a handle, page, to handle the contents of the website
    page = requests.get(url)
    # Store the contents of the website under doc
    doc = lh.fromstring(page.content)
    # Parse data that are stored between <tr>..</tr> of HTML
    tr_elements = doc.xpath('//tr')

As a sanity check, ensure that all the rows have the same width.
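The row-width sanity check can be sketched as follows. This is a minimal illustration, not the original author's code: SAMPLE_HTML is a small made-up table standing in for the downloaded page so the example runs offline, and the helper names row_widths and has_uniform_width are hypothetical. It relies on the fact that len() of an lxml element returns its number of child elements, which for a <tr> is its number of cells.

```python
import lxml.html as lh

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <tr><th>#</th><th>Name</th><th>Total</th></tr>
  <tr><td>001</td><td>Bulbasaur</td><td>318</td></tr>
  <tr><td>002</td><td>Ivysaur</td><td>405</td></tr>
</table>
"""


def row_widths(tr_elements):
    """Return the number of child cells in each <tr> element."""
    return [len(tr) for tr in tr_elements]


def has_uniform_width(tr_elements):
    """Return True when every row has the same number of cells."""
    return len(set(row_widths(tr_elements))) <= 1


# Same parsing steps as in the article, applied to the sample string.
doc = lh.fromstring(SAMPLE_HTML)
tr_elements = doc.xpath('//tr')

widths = row_widths(tr_elements)
uniform = has_uniform_width(tr_elements)
print(widths, uniform)
```

With the real page, tr_elements would come from doc.xpath('//tr') on the downloaded content instead. If the widths differ, some rows probably belong to another table or a header band and should be filtered out before building the DataFrame.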
