Python Web Scraping Tutorial 3 (Downloading Stock Data)

Author: Chris Reeves
102875 Views
14m 5s Length
555 Rating

https://www.eventbrite.com/e/python-programming-class-tickets-9797688149

Code for the tutorials can be found at my GitHub repository. Even more code is available there for free as well. http://github.com/creeveshft

Downloading stock data from Yahoo Finance with regular expressions and URL libraries. This is a simple but processor-intensive way to search for data. I will go over a better way to request data in later tutorials.

To see my data feeds and other products for sale and lease, visit my website, where you can purchase data feeds or software products. http://christopherreevesofficial.com

Follow me on Twitter: http://twitter.com/cjreeves2011
The web scraping news system is located here: http://adbnews.com
For consulting work greater than $50,000, or for comments and suggestions, email creeveshft@gmail.com
Read my personal blog: http://blog.christopherreevesofficial.com

Comments

Great tutorial. Thanks!
How would you save what you scraped into a dataframe?
I'm getting empty brackets.
<span class="Fw(b) D(ib) Fz(36px) Mb(-4px)" data-reactid=".7lb0s3cs0a.0.$0.0.1.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.1.0.$price.0">(.+?)</span>
Any help?
Do you have a video that teaches how to grab dynamic data?
Thank you!
Hi, hoping you can still help even though it's been 3 years since you posted the video. I think I understand why I get [] when I print price: the regex in the video matches on the span id, but Yahoo now uses a span data-reactid attribute, which generates a different ID each time you refresh the page, rendering this method obsolete. I'd like to hear your opinion on this. Thanks!
the price of aapl is []
the price of spy is []
the price of goog is []
the price of nflx is []

This is my output. How do I fix this? Please help me.
error: nothing to repeat

I'm having trouble on line 6: pattern = re.compile(regex)

Please help.
Removing the span id solved the empty brackets issue for me:
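On the "nothing to repeat" error: re.compile raises it when a quantifier such as + or * has nothing in front of it to repeat. One likely cause, though only a guess without seeing the script, is typing (+?) instead of (.+?) in the capture group. A minimal sketch, where the sample HTML and the price 101.25 are made up for illustration:

```python
import re

# Assumption: the dot before "+" was dropped while typing the pattern.
broken = '<span id="yfs_l84_aapl">(+?)</span>'
fixed = '<span id="yfs_l84_aapl">(.+?)</span>'

try:
    re.compile(broken)
    compiled_ok = True
    message = ""
except re.error as exc:
    # "+" has no preceding token to repeat, so compilation fails here.
    compiled_ok = False
    message = str(exc)

# Made-up stand-in for the downloaded page text.
sample_html = '<span id="yfs_l84_aapl">101.25</span>'
prices = re.findall(re.compile(fixed), sample_html)

print(compiled_ok, message)
print(prices)
```

If the pattern in your script already has the dot, compare it character by character against the one in the video; a stray * or + anywhere in it can produce the same error.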
regex = '"yfs_l84_'+symbolslist[i]+'">(.+?)</span>'
How do I use regex when the HTML I am searching for is spread across multiple lines?
For example:
<div class="FL gL_10 UC">P/E</div>
<div class="FR gD_12">(.+?)</div>
In the HTML source there are multiple items with similar lines: one set for PE, another for DE Ratio, etc.
Can you make it find the stock by itself, so you don't have to supply the name and link?
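For the multi-line case, one option is to put \s* between the two tags, since \s also matches newlines, and to include the label text in the pattern so that P/E and DE Ratio are told apart. A minimal sketch, assuming the page looks like the two lines quoted above; the sample values and the value_for helper are made up for illustration:

```python
import re

# Made-up stand-in for the page source, using the markup quoted in the question.
sample_html = '''<div class="FL gL_10 UC">P/E</div>
<div class="FR gD_12">18.40</div>
<div class="FL gL_10 UC">DE Ratio</div>
<div class="FR gD_12">0.75</div>'''


def value_for(label, html):
    """Return the value in the div that follows the div holding `label`."""
    pattern = re.compile(
        r'<div class="FL gL_10 UC">' + re.escape(label) + r'</div>\s*'
        r'<div class="FR gD_12">(.+?)</div>'
    )
    match = pattern.search(html)
    return match.group(1) if match else None


pe = value_for("P/E", sample_html)
de = value_for("DE Ratio", sample_html)
print(pe, de)
```

If other markup can sit between the two divs, \s* will not be enough; a pattern compiled with re.DOTALL and a non-greedy .*? in the gap is a common alternative.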
Hi Chris,

Thank you for the videos. I have learnt a lot from them.

Could you make another video based on the new yahoo finance website? I can't seem to get the data sources from the network tab. The network tab looks very different in the new version. Thank you!
Why not use enumerate?
You're great Chris! Thanks for the tutorials!
Hi Chris, thanks for making such good videos. Question: why did you choose a while loop and not a for loop? I went ahead and tried to run a for loop and I got an EOL error...
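The while loop from the video can be written as a for loop, and enumerate supplies the counter if the index is still wanted. An "EOL while scanning string literal" error usually points to an unclosed quote rather than to the loop itself, though that is a guess without seeing the code. A minimal sketch with a made-up symbol list and a hypothetical helper name:

```python
# Made-up list; in the video the symbols are read from a file.
symbolslist = ["aapl", "spy", "goog", "nflx"]


def label_symbols(symbols):
    """Pair each symbol with its position, replacing the manual i += 1 counter."""
    lines = []
    for i, symbol in enumerate(symbols):
        lines.append("%d: %s" % (i, symbol))
    return lines


for line in label_symbols(symbolslist):
    print(line)
```

When the index is not needed at all, a plain "for symbol in symbolslist:" is enough.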
Very very very very clear! Good job
Great series. I can replicate your example but ran into some trouble creating my own.
I want to get the prices of smartphones from this website, http://tweakers.net. It's a Dutch site.

The textfile 'TweakersTelefoons.txt' contains 3 entries:
samsung-galaxy-s6-32gb-zwart
lg-nexus-5x-32gb-zwart
huawei-nexus-6p-32gb-zwart

I'm using Python 2.7 and this is the code I used:

import urllib
import re

symbolfile = open("TweakersTelefoons.txt")
symbolslist = symbolfile.read()
symbolslist = symbolslist.split("\n")

i = 0
while i < len(symbolslist):
    url = "http://tweakers.net/pricewatch/[^.]*/" + symbolslist[i] + ".html"
    ## http://tweakers.net/pricewatch/423541/samsung-galaxy-s6-32gb-zwart.html is the original html

    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()

    regex = '<span itemprop="lowPrice">(.+?)</span>'
    ## <span itemprop="lowPrice">€ 471,95</span> is what the original code looks like
    pattern = re.compile(regex)
    price = re.findall(pattern, htmltext)

    print "the price of", symbolslist[i], "is ", price
    i += 1

Output:
the price of samsung-galaxy-s6-32gb-zwart is []
the price of lg-nexus-5x-32gb-zwart is []
the price of huawei-nexus-6p-32gb-zwart is []

The prices are not shown.
I tried using [^.] to get rid of the euro sign, but that didn't work.
Furthermore, it might be because in Europe we use a "," instead of a "." as the separator for decimals.
Please help.

Thank you in advance.
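A note on the Tweakers attempt: the regex itself does match the span quoted in the code comment, so the empty lists more likely come from the URL. [^.]* is regex syntax and is sent to the server literally, so the request probably never reaches the product page; the numeric id (423541 in the example) would have to be stored next to each name. The euro sign and the decimal comma can then be handled after matching. A minimal sketch that only covers the parsing step, using the span from the comment as stand-in page text; parse_euro_price is a hypothetical helper:

```python
import re

# Stand-in for the downloaded page, taken from the code comment above.
sample_html = '<span itemprop="lowPrice">€ 471,95</span>'


def parse_euro_price(text):
    """Convert a European-style price such as '€ 471,95' to a float."""
    # Drop everything except digits, dots and commas (currency sign, spaces).
    digits = re.sub(r"[^\d.,]", "", text)
    # Assumption: '.' is a thousands separator and ',' the decimal separator.
    return float(digits.replace(".", "").replace(",", "."))


raw = re.findall(r'<span itemprop="lowPrice">(.+?)</span>', sample_html)
prices = [parse_euro_price(p) for p in raw]
print(raw, prices)
```

Splitting the file on "\n" can also leave an empty last entry when the file ends with a newline, which would add one more failing request to the loop.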
Does not work!!!
This is what I have done:

from urllib.request import urlopen
import re

htmlfile = urlopen("https://uk.finance.yahoo.com/q?s=ocdo&ql=1")
htmltext = htmlfile.read()
regex = '<span class="" id="yfs_184_ocdo.l">(.+?)</span>'
pattern = re.compile(regex)
price = re.findall(pattern, htmltext)
print(price)

Need help, this does not work in Python 3.3.

Please update your code. Thanks.
For those of you getting the empty brackets, the problem is in your regex: it must be "yfs_l84", not "yfs_184". It's a lowercase L, not a one.
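Both points apply to the Python 3.3 attempt above: its regex uses 184 with a one, and in Python 3 read() returns bytes, which a str pattern cannot search until the text is decoded. A minimal sketch with a made-up byte string standing in for the downloaded page; the UTF-8 encoding and the price are assumptions:

```python
import re

# Made-up stand-in for htmlfile.read(), which returns bytes in Python 3.
htmlbytes = b'<span class="" id="yfs_l84_ocdo.l">251.30</span>'

# Assumption: the page is UTF-8 encoded.
htmltext = htmlbytes.decode("utf-8")

# Lowercase L in "l84"; the dot in the symbol is escaped so it matches literally.
good_pattern = re.compile(r'<span class="" id="yfs_l84_ocdo\.l">(.+?)</span>')
# Same pattern with the digit one, as typed in the attempt above.
bad_pattern = re.compile(r'<span class="" id="yfs_184_ocdo\.l">(.+?)</span>')

price = good_pattern.findall(htmltext)
no_price = bad_pattern.findall(htmltext)

try:
    # Searching the undecoded bytes with a str pattern is rejected outright.
    good_pattern.findall(htmlbytes)
    bytes_error = False
except TypeError:
    bytes_error = True

print(price, no_price, bytes_error)
```

This only shows the string handling; whether the page still carries a yfs_l84 id at all is a separate question, as other comments here point out.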