Recurrent Neural Network - The Math of Intelligence (Week 5)

Recurrent neural networks let us learn from sequential data (time series, music, audio, video frames, etc.). We're going to build one from scratch in NumPy (including backpropagation) to generate a sequence of words in the style of Franz Kafka. Code for this video: https://github.com/llSourcell/recurrent_neural_network Please Subscribe! And like. And comment. That's what keeps me going. More learning resources: https://www.youtube.com/watch?v=hWgGJeAvLws https://www.youtube.com/watch?v=cdLUzrjnlr4 https://medium.freecodecamp.org/dive-into-deep-learning-with-these-23-online-courses-bf247d289cc0 http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/ https://deeplearning4j.org/lstm.html http://karpathy.github.io/2015/05/21/rnn-effectiveness/ Join us in the Wizards Slack channel: http://wizards.herokuapp.com/ And please support me on Patreon: https://www.patreon.com/user?u=3191693 Follow me: Twitter: https://twitter.com/sirajraval Facebook: https://www.facebook.com/sirajology Instagram: https://www.instagram.com/sirajraval/ Sign up for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w
Very nice lesson, thanks a lot. It helped me understand recurrent neural networks for the final project of my computer engineering degree.
Wait, didn't he just copy this guy's code? https://gist.github.com/karpathy/d4dee566867f8291f086 Not that it really matters, but he should at least give credit or something (the code was written in 2015).
This is one of your best videos. Please consider completing it with another video using LSTM. Thank you. Also will be very interesting to consider a model with two recurrent hidden layers. Thank you again.
This is the best rnn explanation out of any other video
Is it possible to vectorize the forward/back propagation for RNNs, like classification ANNs?
I freaking love your energy man, it's like you just realized you're conscious and you are determined to figure out how you're able to think.
I'd like to know more about you. How did you begin in this career, and how long did it take you to reach this level? I'm curious about your time management, how many hours you read and study, and how we can stay motivated all the time. I guess this is a good video idea!
i LOVE your tutorials and these two sentences in the beginning of all your videos make me love you more hhh "Hello world it's Siraj" ! hhh you are awesome man <3
Thanks Siraj, you help me a lot.
Check out my fork https://github.com/CT83/handwriting-synthesis, Handwriting Synthesis using RNNs
Over a year old and this still applies. Just shows you've been smart to keep up with life.
Love the craziness :) Rap with me: (input * weight + bias) Activate
You are awesome. It's not easy to get this topic through.
Please attach the source code as a plain .py file. I can't install anything other than Python due to restrictions.
Can you help with ConvolutionLSTM and DeConvolutionLSTM?
Why does the sample function use ix = np.random.choice(range(vocab_size), p=p.ravel()) instead of argmax?
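A minimal sketch of the difference the question above is pointing at. The probability vector `p` here is made up for illustration (it is not output from the video's model), and the seed is only there for reproducibility. argmax always returns the single most likely index, while sampling draws each index in proportion to its probability, which is what lets generated text vary from run to run.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only so the sketch is reproducible

vocab_size = 4
# Hypothetical softmax output with shape (vocab_size, 1), as in the line quoted above
p = np.array([[0.1], [0.6], [0.2], [0.1]])

# Greedy choice: always the single most probable index
greedy_ix = int(np.argmax(p))

# Sampling: each index is drawn in proportion to its probability
sampled = [int(rng.choice(vocab_size, p=p.ravel())) for _ in range(1000)]
counts = np.bincount(sampled, minlength=vocab_size)

print("argmax index:", greedy_ix)
print("sample counts per index:", counts)
```

With argmax, the same hidden state always yields the same next character, so generation tends to fall into repeating loops; sampling keeps the less likely characters in play.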
What if we are dealing with a language that has no alphabet, such as Mandarin/Chinese? How do we implement an RNN in that case?
Each kanji has some meaning associated with it. The total number of kanji is a few thousand. We can use embeddings to represent them.
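A rough sketch of what "use embeddings" could look like in NumPy. The three-character vocabulary, the embedding size `embed_dim`, and the helper `embed` are all hypothetical; a real vocabulary would have thousands of rows, and the matrix `E` would be learned during training rather than left at its random initial values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary (hypothetical); a real one would contain thousands of characters
chars = ["日", "本", "語"]
char_to_ix = {ch: i for i, ch in enumerate(chars)}

embed_dim = 8  # hypothetical embedding size
# Embedding matrix: one dense row per character, normally trained with the network
E = rng.standard_normal((len(chars), embed_dim)) * 0.01

def embed(ch):
    """Look up the dense vector for a character (hypothetical helper)."""
    return E[char_to_ix[ch]]

vec = embed("本")

# The lookup is equivalent to multiplying a one-hot vector by E
one_hot = np.zeros(len(chars))
one_hot[char_to_ix["本"]] = 1
same = np.allclose(one_hot @ E, vec)
print(vec.shape, same)
```

The lookup replaces a very wide one-hot input with a short dense vector, which keeps the input-to-hidden weight matrix small even when the character set is large.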
Why do we need to use two different activation functions (sigmoid and tanh) in the input gate of an LSTM? And why do we need to use tanh in the output gate?
I lost it on partial derivatives and computing deltas
So right now there is only one hidden layer, which outputs a value at t-1 that is used along with the input to generate values at timestep t. What happens if there are multiple hidden layers? For example, if the architecture is i/p ---> h1 ---> h2 ---> o/p, how would the connections between the hidden layers work in an RNN of this type?
This is a great video. Is there like a tensorflow implementation of this application?
The whole part on the loss function is not very clear. Can you explain what dhraw is and what all those operations do?
Guy is taking public for a ride. The output of the project is garbage. What did you solve apart from some funky mathematics which includes linear algebra and derivatives. Don't take people for granted.
How do you make pictures of neural network?
17:03 : 'r' argument in open is not for "recursive", but "read" mode.
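A small check of the point above, as a sketch: the file name `sample.txt` is made up, and a temporary directory is used so nothing is left behind. The second argument of `open` is the mode, and `'r'` opens the file for reading only.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # 'w' opens the file for writing
    with open(path, "w", encoding="utf-8") as f:
        f.write("one morning")

    # 'r' opens the file for reading only
    with open(path, "r", encoding="utf-8") as f:
        data = f.read()
        try:
            f.write("more")
            write_allowed = True
        except io.UnsupportedOperation:
            # Writing to a file opened in read mode is rejected
            write_allowed = False

print(data, write_allowed)
```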
Where do I get this Kafka.txt?
Nice Vid Siraj, there are some developments in RNN field like the Echo State Network, maybe can you do a video on this https://www.quantamagazine.org/machine-learnings-amazing-ability-to-predict-chaos-20180418/ 🔥 https://github.com/cknd/pyESN
dat python2 doe...
You are a professor by nature... cool video... awesome... keep going...
Why is it that some RNN models I see online show the output from the previous timestep going into the hidden layer, whereas in this video you say the hidden layer from the previous timestep should be added to the hidden layer?
It was great, thanks a lot. It comes from your soul and all your cells. I could feel it.
This is a very clear explanation, recommended for intermediate-level learning. This really helps a lot.
can we predict next number of a given integer sequence using RNN?
The final code gave me an error saying "sample is not defined". Please help.
Thanks Siraj. Learnt a lot from this video . Got a new better way to look at RNNs
Thanks a lot Siraj.....it is so helpful....
Hi Siraj, Can you please make a detailed coding video about different gradient descent optimizers ? like how to code momentum, or Adam etc.. Please..
The way you never code the important parts makes things much harder; there is no step-by-step explanation. There is no difference between reading through that Python notebook and watching your videos. The only use I see for these videos is to discover a new technology, so I can go understand it somewhere else...
I spent 10 seconds logging in to YouTube just to click the like button on this video.
Thank you for this series! This is awesome! When running the model for 500000+ iterations on the Kafka text it doesn't seem to get lower than a 40% loss. What would you suggest to optimize this particular model most efficiently? Greetings from the Netherlands
So for deep networks, on which hidden layer do I stick a past hidden to? All of them? Just one of them?
Every time he says "Chars"(Kars) as "Chaars".. It kills me!
"most" meaning every I have encountered
@Leon Tepe so do most of the professors at my (prestigious) school who are actually knowledgeable about the field
I actually pronounce it the same way he does. What's the deal?
@Deepak Mishra: Sirajamander evolves into Sirajameleon evolves into Sirajazard.
Why did you use 0.01 as the multiplier for np.random.rand(.....)?
I think 'r' at 17:06 is read mode, not recursive.
16:46 "one morning Gregor Samsa awoke from uneasy dreams he found himself transformed in his bed into a gigantic insect." You can't say blah blah :)
Can you explain why you have to format the input vector into a dictionary and then into a binary vector? You have, for example, a:55, r:47, c:22, which you map to a binary vector (80x1) -> a = 0, 0, 0 ... 0, 1, 0, 0... Could you not just take that dictionary of 80 characters and scale the integer representation to a float in 0->1, so that for example a:0.6875, r:0.5875, c:0.275? Then instead of an input vector of (80x1), your input is just a float value (1x1) representing a unique character. I know this probably wouldn't work, but I don't understand why. The reason I ask is that I'm trying to port your code to a time series waveform, and I just have input data in float form from 0->1. I don't know if I need to map each float to a binary vector to represent each unique float value in the sequence. That doesn't seem like it would make sense. Please help :)
I may have found part of the answer to my question: label encoding vs 1-hot encoding. https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f
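A small sketch of why the two encodings behave differently, using the index values quoted in the question (a:55, r:47, c:22, vocabulary of 80). The helper names `scaled` and `one_hot` are made up for illustration. Scaling the indices to floats makes some characters numerically "closer" than others purely because of how the dictionary happened to be ordered, while one-hot vectors keep every pair of distinct characters equally far apart.

```python
import numpy as np

# Index values quoted in the comment above; vocabulary size of 80
char_to_ix = {"a": 55, "r": 47, "c": 22}
vocab_size = 80

def scaled(ch):
    """Label encoding scaled into 0..1 (hypothetical helper)."""
    return char_to_ix[ch] / vocab_size

def one_hot(ch):
    """One-hot encoding as a flat vector (hypothetical helper)."""
    v = np.zeros(vocab_size)
    v[char_to_ix[ch]] = 1
    return v

# Scaled labels: 'a' looks much closer to 'r' than to 'c'
d_scaled_ar = abs(scaled("a") - scaled("r"))
d_scaled_ac = abs(scaled("a") - scaled("c"))

# One-hot: every pair of distinct characters is the same distance apart
d_onehot_ar = np.linalg.norm(one_hot("a") - one_hot("r"))
d_onehot_ac = np.linalg.norm(one_hot("a") - one_hot("c"))

print(d_scaled_ar, d_scaled_ac)
print(d_onehot_ar, d_onehot_ac)
```

For a waveform the situation is different: the float values already carry a meaningful order and magnitude, so this particular objection to feeding them in directly does not apply in the same way.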
You have done a great deal of good work indeed. But please, please, please, no more singing or rapping. I do not enjoy a second of it. Please keep this channel academic.
Hi Raj, Great video. I have a question about neural network. What is the difference between neural network, convolutional neural network and recurrent neural network?
Hi Siraj, could you please give the reference text or source for the formulas and the different differentiations? I am getting a different answer for dLi/df_k than yours (p_k - 1). Also, you only covered the chain rule here, but there are definitely some more advanced rules used (the product rule). Also, I am not sure how one would differentiate the summation of e^j where one of the j = k.
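For reference, a sketch of the standard softmax cross-entropy derivation, which may be what the p_k - 1 result refers to. The notation is an assumption, not taken from the video: f is the vector of unnormalized scores, y_i is the index of the correct class, and p_k is the softmax probability of class k. Taking the logarithm first means no product rule is needed, and it handles the sum that contains the j = k term directly.

```latex
p_k = \frac{e^{f_k}}{\sum_j e^{f_j}}, \qquad
L_i = -\log p_{y_i} = -f_{y_i} + \log \sum_j e^{f_j}

\frac{\partial}{\partial f_k} \log \sum_j e^{f_j}
  = \frac{e^{f_k}}{\sum_j e^{f_j}} = p_k
  \quad \text{(only the } j = k \text{ term depends on } f_k\text{)}

\frac{\partial L_i}{\partial f_k} = p_k - \mathbb{1}[k = y_i]
  = \begin{cases}
      p_k - 1 & \text{if } k = y_i \\
      p_k     & \text{otherwise}
    \end{cases}
```

Under these assumptions, p_k - 1 is the gradient only for the correct class; every other component of the gradient is just p_k.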
Hi Siraj! Thanks for the great material.. I am wondering if it is possible to use a Recurrent Neural Network to make a classifier? I would like to classify the events of a device based on some sensors like accelerometers, and other signals. I guess it should be similar to classifying the physical activity like running or walking. However, in my case the events are not periodic. I have everything to collect the labeled samples, but any idea about how large should be the dataset for the training part? Any idea would be much appreciated..
how to combine cnn and rnn to detect disease in plant leaf
So, you said you didn't care that much about using DL on financial data. Then you said you were going to talk more about it because WE cared about that. You put US first! You are awesome, dude.
Can I use this to generate recommended URLs?
love the video! easy to follow if you understand basic NN already
Why do we have 3 pairs (of 3 for input-hidden, hidden-hidden, hidden-output) instead of just one pair ?
Siraj....U ROCK :)
thanks Soumya :)
Thanks a lot. What exactly does 'iteration' mean? Does an iteration happen when learning or when generating?
Can anybody please tell me what xs[t][inputs[t]] means? Whose value are we changing?
At time (iteration) t, inputs[t] is the current character and xs[t] is the one-hot representation of that character (meaning that xs[t] is a vector of zeros except for the component that corresponds to the character, which has the value 1). Example: let's consider that the vocabulary is {'a', 'b', 'c'}. Here, vocab_size = 3 and char_to_ix would look like: {'a' -> 0, 'b' -> 1, 'c' -> 2}. If inputs[t] is equal to 1 (representing the letter b), then xs[t] will be equal to [0, 1, 0]. So the vector representation of the character 'b' is [0, 1, 0]. Note that the size of the vector xs[t] is equal to vocab_size.
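The explanation above can be sketched in a few lines of NumPy. The variable names follow the ones used in the comment (`xs`, `inputs`, `char_to_ix`, `vocab_size`); the three-letter vocabulary and the input string "bca" are only illustrative.

```python
import numpy as np

chars = ['a', 'b', 'c']
vocab_size = len(chars)
char_to_ix = {ch: i for i, ch in enumerate(chars)}  # {'a': 0, 'b': 1, 'c': 2}

# Encode an illustrative input string as a list of integer indices
inputs = [char_to_ix[ch] for ch in "bca"]

xs = {}
for t in range(len(inputs)):
    xs[t] = np.zeros((vocab_size, 1))  # start from an all-zero column vector
    xs[t][inputs[t]] = 1               # set the row of the current character to 1

print(xs[0].ravel())  # one-hot vector for 'b'
```

So `xs[t][inputs[t]] = 1` changes a single entry of the vector `xs[t]`: the row whose index is the integer code of the character at timestep t.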
Xs[t][inputs[t]] what does this do
i never thought that i can be smart and cool before thanks a lot siraj
Hey SIraj, I am a huge fan of your videos, they have helped me a lot. Do you know of any material on applying machine learning models to Intrusion Detection Systems (IDS) ?
Why letter-by-letter, vs word-by-word?
Really love your videos, sir! Just a quick question: why are tanh and softmax widely used in RNNs instead of the sigmoid function?
Another good place to start on the topic: http://machinelearningmastery.com/models-sequence-prediction-recurrent-neural-networks/?__s=tsdef8ssdsdgdvqwkm8e
Where do you learn all this?
u r god
17:00 I think you meant 'read-only' Great video! Definitely learning as much as I can, because I cannot wait until I get admitted into a Master's program :D
Hello Siraj, can you teach about NeuroEvolution, especially NEAT (NeuroEvolution of Augmenting Topologies)? Looking forward to seeing how you would implement it in Python :)
Oh yea, I was thinking about it too for some time. I imagine ES-Hyper NEAT + recurrent neural networks with memory cells as ultimate A.I. tool.
really nice work sir! you help a lot of people, really intuitive videos with a bit of crazy, thank you so much for the hard work!
coming soon
Will Python 3 give an error with this code? I am getting an error.
Hey Siraj, could you do a video about the "hashing trick" I keep hearing about? It is said it speeds things up significantly, but I am still a bit confused about it.
Great video! In most use cases for RNN, all training sequences are picked from a population of same 'type' like "Shakespeare writings". A step more advanced - What if you have generated a lot of _random_ sequences and calculated a quality value for each sequence (for stuff like game theory or physical modelling). Can that quality value be integrated in the cost function, thus taking into account the value of each sequence? Could you sketch out how to do this or give reference to relevant links? Thanks!
I managed to find some keywords that could guide me in the right direction: "supervised learning" and "Q-learning". One example was this one: http://bit.ly/2tvMdZQ
Kafka crazy ? Isn't that a bit much ? Weird OK and DEEP maybe ;) Anyway thanks for all your nice and very instructive videos :)
That's how I understood it. Genius sounds better definitely :) Thanks again for all the work you do
hmm yeah in my head crazy == genius, but i can see how that could be interpreted as negative. thanks for the feedback!
I wonder how well this would perform after some good training, compared with a simple Markov chain algorithm, in terms of generating words that make sense.
Siraj, how do we know the shapes required for the weights, biases, etc.?
Siraj. I'm hearing impaired (profound so I use pre-amp and headset). Could you please CC your videos so they appear on TV YouTube (Apple TV, etc) as it's very nice for hearing impaired like myself, who need the Closed captions on the screen while following on the Notebook. Perhaps I can contribute to a pipeline, if it doesn't already exist, of automating VTT and SRT generation from audio so you can include it with you videos; Google Speech API on GCP from buckets?
Any specific reason for choosing character level generation over word ?
I guess for chars you get around 100 such things in English language, like in this case 81 output neurons. For words, maybe at least 3000 to make something that makes any sense?
Thank you for another fantastic lesson Professor! #schoolofsiraj
Why go to college when you can listen to Siraj?
To be tortured
*Scared Puppet* The point isn't to stay at minimum wage, the point is to bootstrap from it.
At what point did I claim that you can't be well off without going to college? Everybody is different and people want different things from life. I personally don't think that a minimum wage is sustainable in the long run if you want to have a family and provide them with good food, shelter, healthcare and so on; I maybe made the mistake of thinking that most people want those things. At the end, everyone should follow the path that he chooses in life.
*Scared Puppet* You're doing it again. You say one thing but do the other. You claim that you can't be well off without going to college. Minimum wage is more than enough to live on, especially in your country. Complete daily nutritional requirements only come to approximately $3 per day; people spend more because they have more. Getting the best job you can get won't give you the most out of your life. People who go to college get better jobs by being overworked, more than they usually would be by themselves. Getting the better-paying job is not getting the better job.
I agree with your first comment that you don't go to college to learn. I absolutely don't think that what there is to life is to get that one job, but I'm also realistic, not everybody is going to be the next Mark Zuckerberg or win the lottery, most of us will spend their life working for someone else in order to make a living because that's the easiest/simplest thing to do (I'm saying that while doing whatever I can to avoid that fate). If you're not an incredible person, you won't get an incredible job. I totally agree. But you'll get a better job if you have a college degree than if you don't. And most people are not incredible people, that's the meaning of incredible.
lol, I'm reading metamorphosis right now and before you even said you were going to use metamorphosis, I was thinking about how Franz Kafka would be a good style to mimic with this, and BAM, seconds later that's exactly who you are using
so awesome
Yeah! Please talk about stock stuff soon! :D
is learning anything new a waste of time?

