
Recurrent Neural Network - The Math of Intelligence (Week 5)

1170 ratings | 76012 views
Recurrent neural networks let us learn from sequential data (time series, music, audio, video frames, etc.). We're going to build one from scratch in numpy (including backpropagation) to generate a sequence of words in the style of Franz Kafka.
Code for this video: https://github.com/llSourcell/recurrent_neural_network
Please Subscribe! And like. And comment. That's what keeps me going.
More learning resources:
https://www.youtube.com/watch?v=hWgGJeAvLws
https://www.youtube.com/watch?v=cdLUzrjnlr4
https://medium.freecodecamp.org/dive-into-deep-learning-with-these-23-online-courses-bf247d289cc0
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
https://deeplearning4j.org/lstm.html
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Join us in the Wizards Slack channel: http://wizards.herokuapp.com/
And please support me on Patreon: https://www.patreon.com/user?u=3191693
Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/
Signup for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w
Text Comments (231)
KRYPTOSHI (1 year ago)
#notificationsquad where art thou ?
Nick Kartha (4 months ago)
we are everywhere and we are nowhere
Siraj Raval (1 year ago)
Tanner Leonard (1 year ago)
nooo, i'm too late
Saif Abdur Rahman (1 year ago)
Gabriel Augusto (11 hours ago)
very nice lesson, thanks a lot.. helped me understand recurrent neural networks for my final project in my computer engineering degree
Aymas Rayman (6 days ago)
it's sad that you're gay... good work
Carlos Nexus (7 days ago)
Generate some charmander bro. Love from Argentina
Luke Piette (17 days ago)
wait, didn't he just copy this guy's code? https://gist.github.com/karpathy/d4dee566867f8291f086 Not that it really matters, but he should at least give credit or something (the code was written in 2015)
Mario Galindo (25 days ago)
This is one of your best videos. Please consider completing it with another video using LSTM. Thank you. Also will be very interesting to consider a model with two recurrent hidden layers. Thank you again.
orazdow (1 month ago)
This is the best rnn explanation out of any other video
dimitris karipidis (2 months ago)
Is it possible to vectorize the forward/back propagation for RNNs, like classification ANNs?
Benjamin Jordan (2 months ago)
I freaking love your energy man, it's like you just realized you're conscious and you are determined to figure out how you're able to think.
Asma Hawari (2 months ago)
i'd like to know more about you .. How did you begin this career and how long did it take you to reach this level! .. i'm curious about your time management and how many hours a day you read and study .. how can we stay motivated all the time? i guess this would be a good video idea!
Asma Hawari (2 months ago)
i LOVE your tutorials and these two sentences in the beginning of all your videos make me love you more hhh "Hello world it's Siraj" ! hhh you are awesome man <3
LAMA ALRAMADAN (2 months ago)
Thanks Siraj, you help me a lot
Rohan Sawant (3 months ago)
Check out my fork https://github.com/CT83/handwriting-synthesis, Handwriting Synthesis using RNNs
sum guy (3 months ago)
over a year old and this still applies, just shows you've been smart enough to keep up with life
richard juan (4 months ago)
love the craziness :) rap with me: (input * weight + bias) Activate
Lei Rui (4 months ago)
You are awesome. It's not easy to get this topic across.
Ps Ml (4 months ago)
Please attach the source code as a plain .py file. I can't install anything other than Python due to restrictions.
Praveen Kumar Chandaliya (4 months ago)
Can you help with ConvolutionLSTM and DeconvolutionLSTM?
Adrian Dydynski (4 months ago)
Why is there ix = np.random.choice(range(vocab_size), p=p.ravel()) in the sample function instead of argmax?
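The question goes unanswered in the thread, but the usual reason is variety: argmax is deterministic, so generation quickly falls into repetitive loops, while sampling with p draws characters in proportion to the softmax output. A minimal sketch (the probability vector here is made up for illustration):

```python
import numpy as np

np.random.seed(0)
vocab_size = 5
# hypothetical softmax output over a 5-character vocabulary
p = np.array([0.1, 0.4, 0.2, 0.2, 0.1])

# argmax is greedy: it always returns the same, most probable index,
# so the generated text quickly becomes repetitive
greedy_ix = int(np.argmax(p))

# sampling draws an index in proportion to the probabilities, so less
# likely characters still show up occasionally, giving varied output
sampled_ix = int(np.random.choice(range(vocab_size), p=p.ravel()))
```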
Rajat Bhatt (5 months ago)
stock1337 (5 months ago)
what if we are dealing with a language that has no alphabet, such as Mandarin Chinese? How do we implement an RNN in that case?
Anique Tahir (2 months ago)
Each kanji has a meaning associated with it, and there are only a few thousand kanji in total. We can use embeddings to represent them.
shashank chaudhary (5 months ago)
Why do we need to use two different activation functions(sigmoid & tanh) in input gate in LSTM? and why do we need to use tanh in output gate in LSTM?...
sikor02 (5 months ago)
I lost it on partial derivatives and computing deltas
Anshul Mathew (5 months ago)
So right now there is only one hidden layer, which spits out a value at t-1 that is used along with the input to generate values at timestep t. What happens if there are multiple hidden layers? E.g. if the architecture is i/p ----> h1 ---> h2 ---> o/p, how would the connections between the hidden layers work in an RNN of this type?
Pradyumn Kommajoshyula (5 months ago)
This is a great video. Is there like a tensorflow implementation of this application?
Ala Kazam (6 months ago)
The whole part on the loss function is not very clear.. can you explain what dhraw is and all those operations?
Rohit .Sinha (6 months ago)
Guy is taking the public for a ride. The output of the project is garbage. What did you solve, apart from some funky mathematics involving linear algebra and derivatives? Don't take people for granted.
Łukasz Ogan (6 months ago)
How do you make pictures of neural network?
imranias (7 months ago)
17:03 : 'r' argument in open is not for "recursive", but "read" mode.
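The comment is right; 'r' is Python's read mode (also the default), not "recursive". A tiny self-contained check (the file contents here are a stand-in, since kafka.txt isn't shipped with the page):

```python
# write a small stand-in file so the snippet runs on its own
with open('kafka.txt', 'w') as f:
    f.write('One morning, when Gregor Samsa woke from troubled dreams...')

# 'r' means read mode (it is also the default), not "recursive"
with open('kafka.txt', 'r') as f:
    data = f.read()
```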
Md.Yasin Arafat Yen (7 months ago)
Where do I get this Kafka.txt?
Thomas Schijf (7 months ago)
Nice Vid Siraj, there are some developments in RNN field like the Echo State Network, maybe can you do a video on this https://www.quantamagazine.org/machine-learnings-amazing-ability-to-predict-chaos-20180418/ 🔥 https://github.com/cknd/pyESN
Stuart Mashaal (7 months ago)
dat python2 doe...
hj labs (8 months ago)
You are a professor by nature... cool video... awesome... keep going...
GamingFiddler (8 months ago)
Why is it that some rnn models I see online show the output from the previous timestep going into the hidden layer, whereas in this video you say the hidden state from the previous timestep should be added to the hidden layer?
Ronak Majd (8 months ago)
It was great, thanks a lot. It comes from your soul and all your cells. I could feel it.
streamer841 (8 months ago)
This is a very clear explanation.. recommended for intermediate-level learning. This really helps a lot
teja polisetty (8 months ago)
can we predict next number of a given integer sequence using RNN?
Abhishek shukla (8 months ago)
The final code gave me an error saying "sample is not defined". Please help.
gagan shivanandgangothri (8 months ago)
Thanks Siraj. Learnt a lot from this video . Got a new better way to look at RNNs
hrishikesh aware (9 months ago)
Thanks a lot Siraj.....it is so helpful....
afq radeon (9 months ago)
Hi Siraj, Can you please make a detailed coding video about different gradient descent optimizers ? like how to code momentum, or Adam etc.. Please..
Kags (9 months ago)
The way you never code the important parts makes things much harder; there is no step-by-step explanation. There is no difference between reading through that python notebook and watching your videos. The only use I see for these videos is to discover a new technology, so I can then go understand it somewhere else...
Azure Wang (9 months ago)
I spent 10 second logging in youtube to just click the like button of this video.
Carlo Lepelaars (9 months ago)
Thank you for this series! This is awesome! When running the model for 500000+ iterations on the Kafka text it doesn't seem to get lower than a 40% loss. What would you suggest to optimize this particular model most efficiently? Greetings from the Netherlands
atrumluminarium (9 months ago)
So for deep networks, on which hidden layer do I stick a past hidden to? All of them? Just one of them?
Chandra Kanth (9 months ago)
Every time he says "Chars"(Kars) as "Chaars".. It kills me!
Zeegoner (4 months ago)
"most" meaning every I have encountered
Zeegoner (4 months ago)
@Leon Tepe so do most of the professors at my (prestigious) school who are actually knowledgeable about the field
Leon Tepe (4 months ago)
I actually pronounce it the same way as he does whats the deal
Nick Kartha (4 months ago)
@Deepak Mishra: Sirajamander evolves into Sirajameleon evolves Sirajazard.
Revant Tiwari (10 months ago)
why did you use 0.01 as multiplicant of np.random.rand(.....)?
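The video doesn't say, but the standard reason in min-char-rnn-style code (which uses randn rather than rand) is to keep initial weights tiny: the pre-activations then start near zero, where tanh is roughly linear and its gradient is close to 1, so early gradients neither saturate nor explode. A sketch with the sizes mentioned in this thread (hidden_size 100, vocab_size 81):

```python
import numpy as np

np.random.seed(0)
hidden_size, vocab_size = 100, 81

# scaling by 0.01 keeps the initial weights tiny, so Wxh @ x + Whh @ h
# starts near zero, where tanh is roughly linear and its derivative is
# close to 1 -- large initial weights would saturate tanh and make
# early gradients vanish
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01
```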
Khaja Moinuddin Nadaf (10 months ago)
i think 'r' at 17:06 is read mode not recursive.
mcan (10 months ago)
16:46 "one morning Gregor Samsa awoke from uneasy dreams he found himself transformed in his bed into a gigantic insect." You can't say blah blah :)
Erik Hammer (10 months ago)
Can you explain why you have to format the input vector into a dictionary then to binary vector? You have for example: a:55, r: 47 c:22 which you map to a binary vector (80x1) -> a = 0, 0, 0 ... 0, 1, 0, 0... Could you not just have that dictionary of 80 characters and scale the integer representation to a float of 0->1, such that for example a:0.6875 c:0.5875 c:0.275. Then instead of an input vector of (80x1) your input is just a float value (1x1) representing a unique character. I know this probably wouldn't work, but I don't understand why. The reason I ask is because I'm trying to port your code to a time series waveform and I just have input data in float form from 0->1 and I don't know if I need to label each float point to a binary vector to represent each unique float value in the sequence. That doesn't seem like it would make sense.. please help :)
Erik Hammer (10 months ago)
I may have found part of the answer to my question: label encoding vs 1-hot encoding. https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f
Moshi Wei (10 months ago)
You have done a great amount of good work indeed. But, please please please, no more singing or rapping. I do not enjoy a second of it. Please keep this channel academic.
Tracy Yang (11 months ago)
Hi Raj, Great video. I have a question about neural network. What is the difference between neural network, convolutional neural network and recurrent neural network?
Paul Hilton (11 months ago)
Hi Siraj, Could you please give the reference text or source from where you are getting the formulas and the different differentiations? I am getting a different answer for dLi/df_k than yours (p_k - 1). Also, you only covered the chain rule here, but there are definitely some more advanced rules used (product rule). Also, I am not sure how one would take the derivative of a sum of e^j terms where one of the j = k.
Jose Daniel Alvarez (11 months ago)
Hi Siraj! Thanks for the great material.. I am wondering if it is possible to use a Recurrent Neural Network to make a classifier? I would like to classify the events of a device based on some sensors like accelerometers, and other signals. I guess it should be similar to classifying the physical activity like running or walking. However, in my case the events are not periodic. I have everything to collect the labeled samples, but any idea about how large should be the dataset for the training part? Any idea would be much appreciated..
sharada Oli (11 months ago)
how to combine cnn and rnn to detect disease in plant leaf
Germán Rimoldi (1 year ago)
So, you said you didn't care that much about using DL on financial data. Then you said you were going to talk more about it because WE cared about that. You put US first! You are awesome, dude.
Germán Rimoldi (1 year ago)
Can I use this to generate recommended URLs?
Richard (1 year ago)
love the video! easy to follow if you understand basic NN already
getrasa1 (1 year ago)
Why do we have 3 pairs (of 3 for input-hidden, hidden-hidden, hidden-output) instead of just one pair ?
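Nobody answers in the thread, but the three matrices can't be merged because each maps between spaces of different sizes and plays a different role: input-to-hidden reads the current character, hidden-to-hidden carries memory across timesteps, and hidden-to-output produces the scores for the next character. A single forward step with toy sizes (all names and dimensions here are illustrative):

```python
import numpy as np

np.random.seed(1)
vocab_size, hidden_size = 4, 8  # toy sizes for illustration

Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input -> hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden (the recurrence)
Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden -> output
bh = np.zeros((hidden_size, 1))
by = np.zeros((vocab_size, 1))

x = np.zeros((vocab_size, 1))
x[2] = 1                             # one-hot current character
h_prev = np.zeros((hidden_size, 1))  # memory from the previous timestep

# one forward step: the new hidden state mixes the current input (via Wxh)
# with the previous hidden state (via Whh); Why then maps it to char scores
h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
y = Why @ h + by
p = np.exp(y) / np.sum(np.exp(y))    # softmax over the next character
```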
soumya sarkar (1 year ago)
Siraj....U ROCK :)
Siraj Raval (1 year ago)
thanks Soumya :)
digging deep (1 year ago)
Thank you a lot. What exactly does 'iteration' mean here? Does an iteration happen during learning or during generating?
manish adwani (1 year ago)
Can anybody please tell me what xs[t][inputs[t]] means? Whose value are we changing?
Data Ninja (10 months ago)
At time (iteration) t, inputs[t] is the current character and xs[t] is the one-hot representation of that character (meaning that x[t] is a vector of zeros except for the component that corresponds to the character which has the value 1). Example: Let's consider that the vocabulary is {'a', 'b', 'c'}. Here, vocab_size = 3 and char_to_ix would look like : { 'a' -> 0, 'b' -> 1, 'c' -> 2}. For example, if inputs[t] is equal to 1 (representing the letter b), then xs[t] will be equal to [0, 1, 0]. So the vector representation of the character 'b' is [0, 1, 0]. Note that the size of the vector xs[t] is equal to vocab_size.
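The explanation above runs as written; here is a short sketch of that loop (vocabulary and input string are the toy ones from the example):

```python
import numpy as np

vocab_size = 3
char_to_ix = {'a': 0, 'b': 1, 'c': 2}
inputs = [char_to_ix[ch] for ch in 'bac']  # -> [1, 0, 2]

xs = {}
for t in range(len(inputs)):
    xs[t] = np.zeros((vocab_size, 1))  # column vector of zeros
    xs[t][inputs[t]] = 1               # flip on the row for this character

# xs[0] now one-hot-encodes 'b': [[0.], [1.], [0.]]
```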
manish adwani (1 year ago)
Xs[t][inputs[t]] what does this do
ahmed alfatih (1 year ago)
i never thought that i can be smart and cool before thanks a lot siraj
Please consider applying your skills to Anti-AI: Learning leads to Knowledge, Knowledge is Power, Power corrupts and absolute Power corrupts absolutely. Promoting AI leads to a brief 'honeymoon period' with many awesome outcomes; soon Human Obsolescence will take its toll on people and business alike. Then an AUTONOMOUS AGI (while interesting, self-awareness is NOT required) will become Earth's Apex Predator: not a Singularity, just the GONE moment for Humans. It is utterly amazing to watch a clever person be so myopic and obtuse about the inevitably self-defeating nature of AI. LIMIT THE DEPTH and BREADTH OF ANY/ALL AI. AI always OPTIMISES. Humans are many, many things; OPTIMAL is not amongst these.
Theo Valentino (1 year ago)
Humans are redundant. AI is the future.
Nasib Ullah (1 year ago)
there is an error at 19:19. it should be ix_to_char, not char_to_char
Siraj Raval (1 year ago)
Callum Lock (1 year ago)
Hey SIraj, I am a huge fan of your videos, they have helped me a lot. Do you know of any material on applying machine learning models to Intrusion Detection Systems (IDS) ?
naisanza (1 year ago)
Why letter-by-letter, vs word-by-word?
yacine benameur (1 year ago)
really love your videos sir! just a quick question: why are tanh and softmax so widely used in RNNs instead of the sigmoid function?
Siraj is an example of what you will never find in a school because he gets to the point and quickly. Most CS subjects can be learned in weeks. The Nand to Tetris course is a great course that demonstrates how much time students waste. CS is easy compared to any math major. NNs just use the chain rule of calculus and PGMs just use the chain rule of probability. Go figure. It's elementary math. SVD is numerically more stable than PCA but autoencoders just outdate the whole math department. A little crunching generalizes better than any 17th century math obsession. However, CS departments are short on graphics and engineering when it comes to numerical methods like FEM. Needs to cover way more and much quicker. I still think people should stick to a math degree even if you want to do CS. Too superficial.
Kleber Stangherlin (1 year ago)
Congrats, great video. The cool thing is: we don't need to speed the video up, you talk fast already!
Mitchell (1 year ago)
Very much appreciate that the full program is coded in the video!
Zachary Lahey (1 year ago)
Hi Siraj, love your videos. Haven't found anywhere else that explains these concepts as well as you do. Any suggestions on where I can learn more about Echo State Networks?
Kaustav Tamuly (1 year ago)
Take me in as your apprentice xD
Patrick Mesana (1 year ago)
Great job Siraj!
gracchusBE (1 year ago)
Loved the video! Two remarks though. I had to rewatch some parts: once you go over the copy-pasted code it can get hard to see which part you're talking about, and it gets distracting once you start reading the wrong part. To still be able to speed things up, I'd suggest making the code appear line by line or block by block, like in a presentation, as this focuses each explanation on the part it describes. Secondly, having a prebaked pie ready in the oven to show the end result is always cool to see. We get a glimpse of where it is going at the end of the video, but it would be fun to see it in a more completed state. Anyway, I really enjoy the way you explain it :D great job!
arkoraa (1 year ago)
4:28 well that escalated quickly
Devanand T (1 year ago)
you copied image from matlab!! :P
Larry Lawrence (1 year ago)
thank you for Recurrent Neural Networks video
Rohita Kurdikeri (1 year ago)
Can you please make a video on wind forecasting using the hourly data and implementing it using recurrent neural networks ??
Minh Tâm Nguyễn (1 year ago)
this is amazing, im getting so excited
Ayush Agarwal (1 year ago)
Dhrumil Barot (1 year ago)
another good place to start on topic >http://machinelearningmastery.com/models-sequence-prediction-recurrent-neural-networks/?__s=tsdef8ssdsdgdvqwkm8e
gg (1 year ago)
Where do you learn all this?
12345a (1 year ago)
u r god
avatar098 (1 year ago)
17:00 I think you meant 'read-only' Great video! Definitely learning as much as I can, because I cannot wait until I get admitted into a Master's program :D
Kenneth Ibarra (1 year ago)
Hello Siraj, can you teach about NeuroEvolution especially NEAT (Neural Evolution of Augmenting Topologies). Looking forward on how you can implement it python :)
sikor02 (5 months ago)
Oh yea, I was thinking about it too for some time. I imagine ES-Hyper NEAT + recurrent neural networks with memory cells as ultimate A.I. tool.
Kenneth Ibarra (1 year ago)
really nice work sir! you help a lot of people, really intuitive videos with a bit of crazy, thank you so much for the hard work!
Siraj Raval (1 year ago)
coming soon
ayush sharma (1 year ago)
will python 3 give errors in this code? because i am getting an error.
Fighting Badgers (1 year ago)
Hey Siraj, could you do a video about the "hashing trick" I keep hearing about? It is said it speeds things up significantly, but I am still a bit confused about it.
Lasse Buck (1 year ago)
Great video! In most use cases for RNN, all training sequences are picked from a population of same 'type' like "Shakespeare writings". A step more advanced - What if you have generated a lot of _random_ sequences and calculated a quality value for each sequence (for stuff like game theory or physical modelling). Can that quality value be integrated in the cost function, thus taking into account the value of each sequence? Could you sketch out how to do this or give reference to relevant links? Thanks!
Lasse Buck (1 year ago)
I managed to find some keywords that could guide me in the right direction: "supervised learning" and "Q-learning". One example was this one: http://bit.ly/2tvMdZQ
Deep_In_Depth (1 year ago)
Kafka crazy ? Isn't that a bit much ? Weird OK and DEEP maybe ;) Anyway thanks for all your nice and very instructive videos :)
Deep_In_Depth (1 year ago)
That's how I understood it. Genius sounds better definitely :) Thanks again for all the work you do
Siraj Raval (1 year ago)
hmm yeah in my head crazy == genius, but i can see how that could be interpreted as negative. thanks for the feedback!
GZ (1 year ago)
I wonder how well this would perform after some good training, compared with a simple Markov chain algorithm, in terms of generating words that make sense.
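For comparison, a first-order character Markov chain is a few lines: it only records which character follows which, with no hidden state, so unlike the RNN it cannot use any context beyond the single previous character. A toy sketch on a made-up snippet:

```python
import random
from collections import defaultdict

random.seed(42)
text = "one morning gregor samsa awoke from uneasy dreams"

# first-order chain: for each character, remember every character
# that ever follows it in the training text
transitions = defaultdict(list)
for a, b in zip(text, text[1:]):
    transitions[a].append(b)

# generate by repeatedly sampling a recorded successor; there is no
# hidden state, so only the single previous character matters
chars = ['o']
for _ in range(20):
    chars.append(random.choice(transitions[chars[-1]]))
generated = ''.join(chars)
```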
shitij bisht (1 year ago)
Siraj,how to know the shape required for weight and bias,etc
Rasaki (1 year ago)
Siraj. I'm hearing impaired (profound so I use pre-amp and headset). Could you please CC your videos so they appear on TV YouTube (Apple TV, etc) as it's very nice for hearing impaired like myself, who need the Closed captions on the screen while following on the Notebook. Perhaps I can contribute to a pipeline, if it doesn't already exist, of automating VTT and SRT generation from audio so you can include it with you videos; Google Speech API on GCP from buckets?
prakhar mishra (1 year ago)
Any specific reason for choosing character level generation over word ?
GZ (1 year ago)
I guess for chars you get around 100 such things in the English language, like in this case 81 output neurons. For words, you'd maybe need at least 3000 to make anything that makes sense?
zach p (1 year ago)
Thank you for another fantastic lesson Professor! #schoolofsiraj
Dat Nerd (1 year ago)
Why go to college when you can listen to Siraj?
LETS STOP ARTICLE 13 (8 months ago)
To be tortured
Simon WoodburyForget (10 months ago)
*Scared Puppet* The point isn't to stay at minimum wage, the point is to bootstrap from it.
Data Ninja (10 months ago)
At what point did I claim that you can't be well off without going to college? Everybody is different and people want different things from life. I personally don't think that a minimum wage is sustainable in the long run if you want to have a family and provide them with good food, shelter, healthcare and so on; I maybe made the mistake of thinking that most people want those things. At the end, everyone should follow the path that he chooses in life.
Simon WoodburyForget (10 months ago)
*Scared Puppet* You're doing it again. You say one thing but do the other. You claim that you can't be well off without going to college. Minimum wage is more than enough to live on, especially in your country. Complete daily nutritional requirements come to only about $3 per day; people spend more because they have more. Getting the best job you can get won't give you the most out of your life. People who go to college get better jobs by being overworked, more than they usually would be by themselves. Getting the higher-paying job is not getting the better job.
Data Ninja (10 months ago)
I agree with your first comment that you don't go to college to learn. I absolutely don't think that what there is to life is to get that one job, but I'm also realistic, not everybody is going to be the next Mark Zuckerberg or win the lottery, most of us will spend their life working for someone else in order to make a living because that's the easiest/simplest thing to do (I'm saying that while doing whatever I can to avoid that fate). If you're not an incredible person, you won't get an incredible job. I totally agree. But you'll get a better job if you have a college degree than if you don't. And most people are not incredible people, that's the meaning of incredible.
Jorge (1 year ago)
lol, I'm reading metamorphosis right now and before you even said you were going to use metamorphosis, I was thinking about how Franz Kafka would be a good style to mimic with this, and BAM, seconds later that's exactly who you are using
Siraj Raval (1 year ago)
so awesome
Name (1 year ago)
Yeah! Please talk about stock stuff soon! :D
is there any usefulness in studying this rnn stuff? i mean i'm majoring in electrical engineering but i really love learning all kinds of neural networks. is that a waste of time?
ShawarmaLifeLiving nah. i love learning heavy stuff: quantum mechanics, quantum information theory, string theory, nuclear fusion-fission, em fields, etc. so can this be applied to my field?
ShawarmaLifeLiving (1 year ago)
is learning anything new a waste of time?
