Using ChatGPT with YOUR OWN Data. This is magical. (LangChain OpenAI API)

1,287,936 views
1,993 likes
Published 2023-06-19
Here's how to use ChatGPT on your own personal files and custom data. Source code: github.com/techleadhd/chatgpt-retrieval
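The pattern the video demonstrates boils down to a few lines against the 2023-era LangChain API. The snippet below is a minimal sketch rather than the repo's exact code; the "data/" folder and the example question are placeholder assumptions.

```python
# Minimal sketch: index your own files and query them with ChatGPT via LangChain.
# Assumptions (not from the repo): langchain, openai, chromadb and unstructured are
# installed, OPENAI_API_KEY is set, and your files live in a "data/" folder.
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import DirectoryLoader
from langchain.indexes import VectorstoreIndexCreator

loader = DirectoryLoader("data/")  # txt, pdf, docx, ... via the unstructured loaders
index = VectorstoreIndexCreator().from_loaders([loader])
print(index.query("What do my documents say about X?", llm=ChatOpenAI()))
```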
Ace your coding interviews with ex-Google/ex-Facebook training. techinterviewpro.com/
Make passive income with crypto in DeFi Pro. defipro.dev/
💻 Get access to 100+ programming interview problems explained: coderpro.com/
📷 Learn to build a successful business on YouTube from scratch: youtubebackstage.com/
💻 I’ll send you FREE daily coding interview questions to practice your skills: dailyinterviewpro.com/

🛒 My computer and camera gear: www.amazon.com/shop/techlead/list/UVRWWQKBFRR
⌨️ My favorite keyboards: iqunix.store/techlead

Follow me on social media for more tips & fun:
instagram.com/techleadhd/
twitter.com/techleadhd/

Disclaimer: This description may contain affiliate links. Cryptocurrencies are not investments and are subject to market volatility.

All Comments (21)
  • @davidl.e5203
    This is rare. TechLead is actually uploading a useful coding tutorial instead of his opinions.
  • @jayhu6075
    This is the way to explain LangChain, in TechLead's style. You nailed it. Hopefully more of this stuff in the future. Thanks.
  • @TheRealTommyR
    This is exactly what I wanted to do with my own data, but I haven't spent any time yet researching and figuring out a way to do it. I'm glad there is a public way to do it.
  • @MacroAnarchy
    I'm studying law at the moment and I'm seriously scared about how this will change the legal industry. Honestly, I could see it replacing 90% of lawyering.
  • Awesome seeing TechLead do programming, the Maestro at work.
  • @adasi008
    By far one of the best ChatGPT video tutorials I've seen on YouTube. Great work
  • @RunningBugs
    So after digging into the code, I found that LangChain is actually doing the following:
    1. For all your data, it stores the documents in vector storage using embeddings.
    2. When you query something, it first does a similarity search over the embeddings database to find the files related to your question.
    3. After finding the related files, it takes all the text of those files together with a context message sent as the first system message: "Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n {your text data}".
    This tells us a few things:
    1. Why does it sometimes lack outside-world information? If the question you ask isn't covered by your documents, it will return nothing valuable, exactly as that system message instructs.
    2. Is there a limit on the size of your data? Yes. You can't use it with very large files, because after filtering the documents it sends all of the related text to the API server. Right now gpt-3.5-turbo-16k is probably the model to use, and it's best if the total size of the related docs stays under 16k tokens. So the best practice is to group your data into topics and make sure that, for any query, the documents returned by the similarity search don't exceed the model's token limit. I think 16k tokens is roughly a 13-15 page paper.
    3. By removing or changing the system message, you might get better results for common-sense questions. I really don't like the default system message: in the playground, asking gpt-3.5-turbo-16k "Who is George Washington?" with an empty system message gives better answers than the LangChain solution.
    4. LangChain uses the unstructured library (it reports errors if you don't install it), which means you can use not only txt files but also PDF files, Word files, etc. I haven't tested it, but it very likely supports querying multiple PDFs with code similar to the video's: put several PDFs in a folder, build a directory index, and ask questions about your papers.
    5. LangChain supports not only the ChatGPT models but also other models in the chat_models package. Google PaLM 2 chat is supported as of Jul 10, 2023, so if you have a key you can use other models too. I don't think PaLM 2 has common-sense knowledge as good as ChatGPT's, but it seems like a better language-generation model than at least gpt-3.5-turbo-16k, so PaLM 2 may produce better results on your own data, while OpenAI's models answer common-sense questions better once you change the default system message.
    A few days ago OpenAI said general access to gpt-4 is starting: people with a history of successful API payments get access immediately, and access for new developers will roll out by the end of July.
    It's also quite cool to be able to use your own data. If you want to create something like an AI assistant, you can have code collect the current time, user information, etc. into a folder, so the assistant can do much more than the current ones. Another very cool thing is Auto-GPT, which works great with gpt-4 (gpt-3.5 isn't smart enough and behaves much worse). If you ask Auto-GPT something, it can Google it itself and reply with real-time information.
    The Auto-GPT example of creating a recipe based on the next holiday is also a nice illustration. Hopefully gpt-4 access arrives sooner.
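The flow described in the comment above maps onto a handful of LangChain calls. Below is a minimal sketch against the 2023-era langchain package; the folder name, model choice, and prompt wording are assumptions for illustration, not the repo's exact code.

```python
# Rough sketch of the pipeline the comment describes (2023-era langchain API).
# Assumptions: langchain, openai and chromadb installed, OPENAI_API_KEY set,
# documents in a "data/" folder; prompt text is illustrative, not the default.
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# Step 1: load the documents and embed them into a vector store.
docs = DirectoryLoader("data/").load()
db = Chroma.from_documents(docs, OpenAIEmbeddings())

# Steps 2-3: at query time, similar chunks are retrieved and "stuffed" into the
# prompt. Overriding the default prompt changes the system-message behavior.
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use the following context if it is relevant; otherwise answer from "
        "your own knowledge.\n----------------\n{context}\n\nQuestion: {question}"
    ),
)
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-16k"),
    chain_type="stuff",
    retriever=db.as_retriever(),
    chain_type_kwargs={"prompt": prompt},
)
print(qa.run("Who is George Washington?"))
```

Passing a custom prompt through chain_type_kwargs is one way to swap out the default "don't try to make up an answer" instruction mentioned in point 3.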
  • @e-matesecom
    Hours and hours of ChatGPT courses... I learned more by watching 5 minutes of your video. Congratulations on the clarity and the practical approach 👍
  • @andygilet5538
    Great video. I'm a junior data scientist in Belgium and it's actually helping me with one of my projects. You're totally right when you say that everyone should learn Python. I only learned C and C# during my studies, but now that I've learned Python I'm using it almost every day.
  • @jsnmad
    This one video alone saves so much time compared to watching hours of some of the playlists out there. It's better to start here and then go straight to the LangChain docs to work out other use cases. Excellent, TechLead.
  • @seize2581
    Thanks TechLead, it's nice to see this type of video!
  • @jcollins519
    Semantra is a pretty cool tool to analyze your documents and be able to search them with natural language. It's probably more research-oriented since it links you to the different pages and snippets that match your query.
  • @ripern
    Awesome tutorial! Simply explained and so many good examples!
  • @hichamalaoui34
    Maybe 8 months late, and LangChain has been updated since, but this is one of the best videos I've watched. Thank you.
  • @fenchelteefee
    Great vid, especially the part at the end with MS's case study of customer reviews for cars, for those who are actually struggling to find real-world applications for the new AI stuff. Thank you!
  • @sr9814
    Loved this. I am a Sales guy with zero coding exp. I listen to content like yours to glean some nuggets to better understand the impacts and have meaningful conversations with my customers. Truly helpful content.
  • @ezit4me
    This was an amazing tutorial. Thank you for making it so easy to follow.
  • @danield.7359
    You made my day. I've been struggling to fine-tune a GPT-3 model, with mediocre success and an enormous data collection and preparation effort. It would never even get close to the results achieved with LangChain in 1 minute of coding and 9 minutes of data preparation.