hey everyone kevin here today i want to show you how you can bulk extract data out of a pdf file into an excel spreadsheet you could have hundreds of pdfs or even thousands of pdfs with some data that you want to get out we can do this using microsoft excel using functionality that comes with excel you don’t need any third-party tools at all it’s pretty simple so let’s jump on the pc and let’s find out how we can do this here i am now on my pc and i have five different pdf files and i want to extract data from these files and bring it into microsoft excel if you want to follow along with this video i’ve also included a link to all of these sample files so you can do exactly what i’m doing and this way you’ll learn how this works let’s click into one of these files to see what’s stored within this is a cookie order form for the kevin cookie company and it contains things like the order number the customer name what type of cookie they wanted to order and then how many of them along with some additional information now i want to get all of this data and i want to bring it into microsoft excel and with just one order form it’s not too hard i could probably even just copy and paste this or even just manually type the data over but here if i go back i have five different order forms that i want to bring in so if i do that manually that’s going to take a little bit of time but let’s imagine maybe i had 100 order forms or maybe i had 500 or maybe 10 000 order forms there’s no way that i’m going to do that manually so instead i want to let excel do the heavy lifting for me let’s jump over into microsoft excel and first off i’ll show you how we can just import one pdf once we do that then we’ll look at how we can import bulk pdf data into microsoft excel here i am now in microsoft excel and first let’s just import one pdf to see how this even works once you have excel open let’s go to the top tabs and click on the one called data we’re going to be working with data today over on the left hand side there’s a category called get and transform data we want to get data into excel over here on the left hand side we have a drop down for get data and the first option in this menu is called from file we want to get data from a pdf file if we look over at this menu there’s the option that says from pdf let’s start out by clicking on this next this opens up the windows file picker and you can navigate to where you have the pdf saved on your computer if you’re following along navigate to where you saved the sample files here i see my five pdfs now once again we’re going to start off with just extracting data from one pdf later on we’ll come back to how to do it for multiple pdfs i’ll simply click on the first file and then let’s click on import this now opens up a navigator and over here i can see that it has the pdf file selected down below it found two different items within the pdf file one of them is a table and then there’s also a page if i click here on table i can see all of the data that i have in the pdf and this is what we saw just a moment ago you have the order number the customer name all of the information is showing up here so this is what we want it to look like now we could transform the data and when we do the bulk import we’ll come back and do this but for now let’s just click on load just to see how we can get a simple pdf into excel this is now successfully imported all the data into excel here you see the order number the customer name the cookie type all of the data is showing up properly and you might be thinking well hey big deal i mean i could just copy and paste the information directly into excel why would i want to use this here i have the original pdf order form and here i see all those same details now here i could just highlight all that same data and i’ll click on copy and let’s try pasting it in it doesn’t paste it in the table format so it’s not nearly as neat as using this method to getting data in from a pdf so even just for a single pdf this is a more optimal way of getting your data in along with getting your data in so it just looks better and it’s in a table format over on the right hand side we see a pane for queries and connections now what’s really neat is all of the data that shows up here in this table is connected or basically has a live connection with that pdf so let’s say that we go back and we update that pdf we can refresh this view and it’ll pull in any of the updated information i’m going to jump back to the pdf and let’s make an update there and let’s see how this works i’m in microsoft word now and i used word to create all of these pdfs and here i have that cookie order form i’ll simply click into here it looks like megan wanted 30 cookies and here we see 30 cookies showing up in the spreadsheet now i have a hunch that megan actually wanted to order more cookies so maybe instead of 30 let’s go with let’s say 999 we’re just going to overwhelm her with cookies next i’m going to go up to file let’s save a copy and let’s save this as a pdf so i’ll select pdf here and let me just overwrite the existing pdf that we imported into excel so i’ll save this and we’re going to overwrite that file back in excel now over on the right hand side once again under this queries and connections we can click on this refresh icon when we click on refresh here you see now that it’s pulled in the latest data from that pdf so once again this is a connected file so excel is connected to that pdf and as any information within the pdf updates it also updates in excel along with being able to manually check for new data or updated data i can also come over here to the table i can right click on it and then i can go down to properties within properties i can define how often i wanted to check for updates maybe i want to set a predefined amount of time so maybe after 5 minutes or maybe after 60 minutes i can have it automatically check for updates i could also have it check for refresh data when i open the file so i have a few different options on how often i wanted to refresh for an individual pdf it probably doesn’t really make that much of a difference and i won’t have to check for updates that often but in a moment when we bulk import data into excel from many different pdfs let’s say that maybe a new order comes in or maybe you have a new invoice come in you want to check for new information being added so this will come in really handy and we’ll see how this works as we progress through this tutorial now that we know how to import just an individual pdf into excel let’s switch gears and see how we can bulk import data into excel and it’s going to be pretty similar but there are a few differences for this once again let’s go up to the top tabs and click on the one called data over on the far left hand side once again let’s click on get data let’s click on from file and this time instead of going to from pdf this time we want to select pdfs from a folder i have a folder on my computer that contains all of the pdfs that i want to import so i’ll click on this option this opens up the windows file picker and here i’ve navigated to the folder where i have all of my pdfs now i won’t see anything within the folder we’re simply using this to tell excel where the folder is so just navigate to the folder that contains all of your pdfs and once you navigate there let’s click on open this opens up a prompt and it looks like we’re off to a good start it found all five of the pdfs that i have within that folder so so far so good if we look in the bottom right hand corner though we now have to make a decision and there are a lot of buttons here which one do we choose and which one do we use well let’s run through these to see what they do and we’re going to start with the one called load so right here i can either load or load too and basically what that will do is it’ll just load this table as we see it into excel but if you look closely at this table we don’t have any of the details of the cookie order form we don’t have the customer name we don’t have the order date any of the details here so we don’t want to just load it as is to the right of that we also have the option to transform the data what does that mean well transform is just a fancy way of saying tweak the data before you bring it into excel so this will allow us to tweak what’s up here in this table but once again we don’t have any of the order details we don’t have the customer name we don’t have any of that information so we also don’t want to click into transform data also this one’s probably pretty obvious but if we click on cancel it’ll just bring us back to our worksheet so we also don’t want to choose that so that leaves us with one button over on the left hand side and it says combine i’ll click on this to see what’s within this menu so we have a few options we could combine and transform or we can combine and load so if we were to combine and load it would pull out the details from each one of these pdfs and drop it into the excel sheet as is so if you remember with the single pdf we simply combined and loaded so it just brought it in as is but let’s say maybe you want to clean up your data maybe you want to adjust some of the columns before you bring it into excel that’s when we want to transform the data and with these order forms i want to clean it up a little bit before i bring it in if you don’t want to do any cleanup you can just combine and load and you’re done this tutorial is over thanks for watching but if you want to transform your data and clean things up let’s click on this option called combine and transform this opens up the next screen that allows us to combine our files and right up top you’ll see a drop down list and here we can choose a sample file so this will allow us to look at one of our files to make sure that we’re selecting the writing data and here i see all five of my files and the first file this is just the first one there i’ll select that down below i can see the data that it found within this file and just like we did with a single pdf i could select the data or i could select the entire page i’ll just select a table and here i see all of the data that i was expecting this all looks good so now let’s click on ok this now drops us into excel’s power query editor and at this point you might be thinking wow i’ve never seen this interface before and it looks pretty complicated i just wanted to simply extract data from many different pdfs and bring it into excel why am i here we’re going to walk through this step by step and i’ll show you exactly how this works so you can extract data from your pdfs in the exact format that you want think of it as a small investment of time now for a big savings of time in the future right here in the center of the power query editor we can see what the output looks like if we had just combined and loaded it here we see all the different pdf file names and here we see all of the data from within those pdfs so we’re making progress now we can see what’s contained within those pdfs and here i see all the different details but right now it’s not really structured in such a good way i want all of this data to be in columns i want to call them with the order number a column with the customer name and so on so how do we do that well over on the left hand side we’re going to work with this thing called the transform sample file we’ll be able to reorder and organize how our data should look using this sample file once we make our changes to the sample file it’ll apply to the overall data set so let’s click into the one called transform sample file this now drops us into the sample file and here we can see what an individual order form looks like so just like when we extracted the pdf into excel before this matches that exactly so so far all of our data is showing up just like we’d expect and at this point we want to start transforming it and i said before we want to get some of these rows into columns so up here we have all of these different tabs with all of these different controls right here you can see that you can do things like remove rows over here within transform we could transpose we could reverse rows you could change data types you have all of these different options you can even add some additional columns now we’re going to walk through a few examples to clean up this data so hopefully by doing that you’ll start to understand what’s possible first i want to take all of my rows and i want to turn them into columns and to do that we’re going to use something called transpose up here i have transform selected up here as one of the top tabs and right here there is the option to transpose let’s click on this and check that out it now took everything and it transposed it so here i see order number your full name as a column header but it’s not a true header yet here you’ll see that the header says column one column two i want this the order number the full name i want all of that to be in the column header once again over on the top ribbon let’s click into transform and right here there’s the option to use the first row as headers i’ll click on that and look at that it automatically converted it to the header so we’re making some good progress at cleaning up this data everything is looking pretty good but when i look more closely at my data here i see the order number it has the order number in row one it has a customer name but then for whatever reason on the order form cookie type and quantity were combined so here i have chocolate chip and then there’s another row with the quantity that the customer wanted to order and that shows up on row 2. now i don’t want this to be another row i want the quantity order to be another column so it’s not that hard to add another column here we’re just going to go to the top where it says add column and here we can add a custom column let’s click on this this opens up a prompt now where i can define a custom column and once again i want to take the quantity down here and i want to pull that into its own column so here why don’t we call this column quantity and then i’ll put in a colon just so it matches the formatting of all of the other columns next i need to define a custom column formula so i can get the quantity over into my new column if you’ve never used column formulas before down here you can learn about power query formulas it opens up a pretty nice article that contains all of the different formulas and then how you use them what you need to feed in so this will provide you with a lot more information in my specific example i want to take the value from this specific cell and i want to populate it in this new column to do that i’m going to use a formula called record dot field as i type it in here you’ll see that there are many different formulas or functions that you can choose from once again you could go through the help article to see what all of the different options are with record dot field i need to feed in the row and also the column that i want to record and as i’m doing this i have to refer back to one of these previous steps for this example just to keep it simple i’ll refer back to the previous step called change type so over here let’s open up the parentheses and here let’s first refer back to the previous step so i’ll refer back to change to type so i’ll type that in now that i’m referring back to this next i need to define well what row do i want to use for this i’ll insert a curly bracket i’ll enter one and then i’ll close the curly bracket now you might be wondering why did i enter a number one this value right here is in the second row well it turns out that the first row here is actually the zero row and this is the one row so that’s why we refer to this as one so it’s really the second row that i’m pulling in next i also need to define the column and the column that it’s in is the cookie type and quantity so here i’ll insert a comma i’ll insert some quotes and then here let’s type in that column header i’ll type in cookie type and quantity i need to make sure that i type it in exactly as it appears above so i also want to make sure i include that colon so it has to match exactly now this is all i need to type in for my formula now that i’m all done i’ll close the parentheses and this should pull that value from this cell right here and it’ll insert it into a new column so let’s click on ok and check that out i now have 999 populated in this column that’s perfect so i no longer need this row down here and i can just go ahead and delete that row now to delete it let’s go to home and right here there’s the option to remove a row i’ll click on this and then here’s an option for remove bottom rows let’s click on that this opens up a prompt where i can define which bottom rows i want to remove now i just want to remove the one row at the bottom so i’ll type in a number one and then click on ok and that now successfully removes the bottom row also when we look up at our data here we have a column header called cookie type and quantity but we just remove the quantity into its own column so let’s double click on this and we can change the title of this column header so it just says cookie type that looks good as we’ve been going through and making all of these different changes you might have noticed over on the right hand side there’s something called applied steps and this keeps track of all of the different transformations that we’re making to our data if you’ve ever used macros before it’s kind of the same idea basically records all of the changes that you make and here you see that we just renamed that column and the last step says renamed columns if let’s say maybe i didn’t want to rename that column i can click on the x and in a sense that’s like undoing that change and here i can click through all the steps and i can basically go back in time as i was making these different changes so here if i go right up to the top this brings me back to what the data looked like at the very beginning so it’s just a quick way to refresh yourself on what all the different changes are now i have all of the data in a pretty good state and i now want to get this into microsoft excel as a next step let’s go back over onto the left hand side and let’s click into pdf right here i’ve now clicked into pdf and this doesn’t look good it looks like we are confronted with an error message here it says the column column one of the table wasn’t found now if you remember as we went through this sample file we changed a whole bunch of these different column headers we made changes to all the different data here and so this is still referring to outdated data here if you look up above you’ll notice that it’s no longer column one we switch the rows to be the column headers so right up here we have a step called change type we could simply delete that step there and now we have all of the data showing up properly so that’s exactly how we want it to look once again over on the left hand side we currently have the pdf query selected and this shows us all of the data combined and one thing you’ll notice is it took all of our transformations from the sample file and it applied it to the overall file here in pdf so now we have all of the aggregated data and look at that it’s pulling together every single pdf and here it’s showing us all of the data across all of them so it looks pretty good before we import it into excel though there is one additional change that i want to make if we look at the different columns here you’ll notice that the data type isn’t quite right for all of them especially here when i go over and i look at delivery date it’s currently not set to a date format and i want to set that before i bring it into excel if i just bring it in as is it’ll bring in a really weird number that corresponds with this date format so we can select all of this data press ctrl a and then let’s go to the top ribbon click on transform and there’s the option to detect the data type let’s click on this and check that out it automatically identified the data type here it saw that this is a date it saw that this is a number here it saw that this was text so it set the appropriate data type for all of these also over on the left hand side i don’t need the source name this isn’t a column i need to remove this column let’s click on home over here and right here there’s the option to remove a column i’m simply going to click on remove column and that now deleted it once again it also added a step over here so if we made a change that we didn’t intend to make we could also remove that last step okay so we’ve been spending a bunch of time in the power query editor let’s now get this data into excel up in the top left hand corner let’s now click on close and load and let’s bring it in and check that out it has now successfully extracted all of the data from those five different pdfs directly into microsoft excel and not only has it imported all of the data it’s also taken all of our transformations so now everything is ordered by column this is really looking great now it’s one thing to pull in all of the data and get it into excel but let’s say another order comes in how do we make sure we capture it and right here i’m just going to paste in a new order you know at the kevin cookie company we have a lot of people ordering cookies so these order forms come in all the time so now i not only have these five orders but we now have a sixth one how do i make sure that it shows up in my excel sheet now once again if you remember when we first launched in the power query i said that you’re going to save time in the long run by putting in the work up front and so i now have my sixth pdf you don’t see it here yet but once again we need to refresh this if i go over here i can manually refresh the sheet i’ll click on this and look at that it automatically pulled in the new order so now i have six orders here so anytime a new file gets added to that folder this will automatically reflect that just like we looked at earlier one option is to manually refresh but we can also have excel periodically check for us we simply right click here and then we can go down to properties and here you can set how often you want to refresh so this way you’ll always have the latest data within your sheet all right well that’s how easy it is if this video helped you please give it a thumbs up to see more videos like this in the future make sure to hit that subscribe button also if you want to see me cover any other topics on this channel leave a note down below alright well that’s all i had for you today i hope you enjoyed and as always i hope to see you next time bye [Music]