How To Clean Data in Excel
You’ll learn how to “clean up” data in this lesson by using the TRIM, PROPER, and CLEAN functions to remove extra spaces, non-printable characters, and fix the capitalization of words – and you’ll apply it directly to the customer due diligence file we’ve been working with.
Files and Resources
Quick Reference Guides
Text Manipulation Formulas (Across all PC and Mac versions):
- =TRIM: Remove extra spaces
- =PROPER: Makes first letter in each word uppercase
- =CLEAN: Removes all non-printable characters from text
- =VALUE: Converts text to number
- =TEXT: Converts number or text to new format
PC 2007+ Shortcuts:
- Alt + A + E: Text to Columns
- Ctrl + C: Copy
- Ctrl + Alt + V + F: Paste Formulas
- Ctrl + Alt + V + V: Paste Values
PC 2003 Shortcuts:
- Alt + D + E: Text to Columns
- Ctrl + C: Copy
- Alt + E + S + F: Paste Formulas
- Alt + E + S + V: Paste Values
- No equivalent: Text to Columns
- ⌘ + C: Copy
- Ctrl + ⌘ + V, ⌘ + F: Paste Formulas
- Ctrl + ⌘ + V, ⌘ + V: Paste Values
Useful Function Combos:
- =TRIM(PROPER()): Removes extra spaces and capitalizes each word of text
- =VALUE(TRIM(CLEAN())): Converts numbers to text and removes non-printable characters
How To Clean Data in Excel – Transcript
Hello and welcome to our lesson on cleaning up data, which is one of the final ones in this Formatting Module. What we’re going to be doing in this lesson is using some of these functions that you see onscreen. So trim, trim with spaces, proper to properly capitalize. Well, not really capitalize, but capitalize each word in text. Clean to remove non-printable characters, and then value and text possibly, to change around the formatting of text and numbers.
We’ll be using those along with some of the copy and paste functions in Excel to take our address data over here in this Customer Order due diligence file that shows us customer orders of the past 10 years or over a 10-year period. We’re going to go in and fix all these addresses. You can see the problem right now that some parts of them, like the state abbreviations are properly capitalized whereas the cities are not, and the addresses themselves are also not properly capitalized.
So we’re going to go in and fix all that using these functions. What I’m going to do here is since we’re really just going over functions that we’ve learned previously, I’m going to do it the first time, very quickly. Then you’re going to try it on your own via an exercise and then if you don’t get it or you forget the keys or you can’t reproduce it, then we’ll walk through it more slowly together, and I’ll go through each step and all the shortcut keys in detail.
So here is what we’re going to do first. The first thing I like to do here is create some extra columns, so that we have actual space to do this. Now here we can see that we basically have four fields. We have the address itself; then we have the city, then the state, and then the zip code. So we’re going to need four columns. ‘CTRL + Spacebar’, and then ‘CTRL + SHFT + plus’. We can change the width here with ‘ALT + OCW’. I’ll change that to 10, and then ‘CTRL + Spacebar’, ‘CTRL + SHFT + plus’. Press ‘F4’ to do that a few more times or ‘CMD + Y’ on the Mac.
‘ALT + OCW’ to change the width, and then what we’re going to do here is press ‘SHFT + CTRL + down arrow key’, and then we’re going to use the text to columns feature in Excel that we went over briefly in the import/export lesson. Press ‘ALT + A’ to go to data or ‘ALT + D’ for data in Excel 2003 and older versions, ‘ALT + AE’ for text to columns. On the Mac remember there’s no shortcut key, so you have to use the mouse to go up to the ribbon to do this.
We want to select delimited here, and then next, and we want commas, and then for the destination what we want to do is select F3 instead. So I actually use the mouse, you could just press ‘F2’ or ‘CTRL + U’ on the Mac to exit out and go and do that. And then I’m going to say finish, and then you can see how all of these are separate fields now.
Now we still have a problem, which is that the states here, these are not correct. We have some issues here and we have some extra spaces, so we’re going to have to fix that. But what we can do now is go in and insert some extra columns. Now before I even get started with this, make sure that you go to ‘ALT + TO’ in Excel or ‘CMD + comma’ on the Mac.
Go to formulas or calculations. Make sure that workbook calculations are set to ‘automatic’. If they’re not then this is not going to work correctly the first time around, so make sure these are set to ‘automatic’. And so what I’m going to do here is use the trim and proper functions to get rid of all these extra spaces, and then to get the capitalization correct.
So I’m going to say equals proper trim and we’re going to target this text. We’re going to copy this formula all the way down with ‘CTRL + C’ or ‘CMD + C’ on the Mac; ‘CTRL + down arrow key’, and then ‘ALT + ESF’ or ‘CTRL + ALT + VF’ or ‘CMD + CTRL + V’ and then ‘CMD + F’ on the Mac. Then we can just copy this over with ‘CTRL + C’ and ‘CTRL + V’ or ‘CMD + C’ and ‘CMD + V’ on the Mac. We have that.
And then we have all of these in place, so all of these are correct now. What I am now going to do is copy and paste values. So I’m going to use the ‘SHFT + right arrow keys’, ‘SHFT + CTRL + down’, ‘CTRL + C’ or ‘CMD + C’ on the Mac, then ‘ALT + ESV’ or ‘CTRL + ALT + VV’ or ‘CMD + V’ on the Mac. So you have those. And then what I can do now is delete the unnecessary columns. So I’m going to highlight this first one, which does not have the correct capitalization, ‘CTRL + Spacebar’, ‘CTRL + minus’.
Do the same thing for the cities, ‘CTRL + Spacebar’, ‘CTRL + minus’, and then do the same thing for the states and zip codes; ‘CTRL + Spacebar’, ‘CTRL + minus’. Now let’s auto-fit these column widths so we can see everything. So ‘SHFT + CTRL’, and then ‘SHFT + right arrow key’, ‘ALT + OCA’. On the Mac you cannot do this. You have to go to the home ribbon, and go to format and then auto-fit column width instead.
So let’s take a look at this. The addresses here look much better. The cities look much better, but these states and zip codes are still not quite right. What can we do to fix this? The easiest thing to do here is to actually insert a few extra columns, so I’m going to press ‘CTRL + spacebar’, and then ‘CTRL + SHFT + plus’ to insert those two columns twice. Now what we can do here is use the text to columns feature, just on this part.
So I’m going to highlight this whole thing, ‘SHFT + CTRL + down arrow key’, and then ‘ALT + AE’ or ‘ALT + DE’ in older versions of Excel. On the Mac you have to use the mouse, go up to data, delimited. We are separating these by a space, so let’s do that. And then for the destination cell let’s say I3, and we’ll say OK to this warning message error that Excel gives us, so we have that.
So we have all of these in place now, and so what we need to do very quickly is just uppercase, capitalize properly everything in these state abbreviations. So let’s go through and do that quickly. I’m going to delete the contents of this column, because we don’t need it anymore. And then we’re just going to use the upper function. Remember this converts a text string to all uppercase letters.
Use that on the states, and then ‘CTRL + down arrow key’, and then ‘SHFT + CTRL + up’, then ‘ALT + ESF’ or ‘CTRL + ALT + VF’ or ‘CMD + CTRL + V’ and then ‘CMD + F’ on the Mac. We have this. And then once again, we can just copy and paste these values using the same shortcut combinations, except we select values under paste special now and then ‘CTRL + spacebar’ to highlight this column, and then ‘CTRL + minus sign’ to delete it. So now if we go and look at some of these, you can see that it’s much better overall.
Now there are still some issues here. For example, we’re missing a trailing zero in front of this zip code. And if you really look and dig into the data here, you’ll see that there are some other problems. For example, whenever we have NE or SW here, or something of that nature. Well, we have an issue because it’s not quite displaying correctly. What’s going on here is that the NE that is capital N, and lowercase E rather than all caps there.
So there are some issues and with these zip codes, similar problems apply. And if you really want to, if you want an added challenge you could go in and fix this. Probably the easiest way to do is to look in this column. Look for anything that only has two letters in it, and then make sure you apply proper to that. So you can do a search and then apply the upper function to that. So that’s probably the easiest thing to do here, if you want an added challenge, but we’re just going to leave it at this, because I don’t want this lesson to go on for too long.
So what you should do now is replicate everything that I just showed you. So take this data and convert it into the columns that you see here using the steps that I just went through. What I’m going to do now is just delete all of this data, so that you do not see it anymore, and then you can go in and try it yourself, and maybe you’ll even find a slightly better way to do it, over what I showed you just now. So I’m going to delete this. I am going to remove these columns.
So pause this video right now. Give it a shot yourself. If you get confused or you can’t quite get it, un-pause it, then we’ll walk through it, and I’ll go through it a bit more slowly this time around and show you all the shortcut keys. Okay, good. So here’s what we’re going to do. Once again, we’re going to start by inserting extra columns over here. So I’m going to press ‘CTRL + spacebar’, and then ‘CTRL + SHFT + plus’ a few times to insert some extra columns. I’m going to change the width of all these columns by going ‘ALT + OCW’. I’ll change it to 10.
And then what we’re going to do is select this whole area and say, ‘SHFT + CTRL + Down’, and ‘ALT + AE’ for text to columns or ‘ALT + DE’ in older versions of Excel or use the mouse on the Mac. We want delimited. So we’ll say next, and then we are going to have a space, actually we’re going to have a comma separating all of these values for now. So we have a comma in between all of these. I’m going to hit next once again using ‘ALT + N’ or the mouse.
We’re going to stick to general. For destination, I’m going to say F3 right here, and then we’re going to press ‘ALT + F’ for finish or just click this. And Excel is giving this warning. We’ll just say OK. So we have all of this in place now, and we can actually delete this data, because we don’t need it anymore. So ‘SHFT + CTRL + down arrow key’, and then hit the delete key. Now this time around I’m going to do some more thinking about how to properly do this, and maybe how to do it a little bit more efficiently.
So what we can do now is use the proper function and I’m just going to apply it to these two columns, because clearly we don’t need to apply a proper to these state abbreviations. So that can save us some time this time around. So let’s say proper and then trim. Trim is to get rid of the extra spaces in between here on this data. So we have that, and we can see it’s already looking much better.
And then we’ll insert another column here, so ‘CTRL + spacebar’, and then ‘CTRL + SHFT + plus’, and then we’re going to apply the same function right here to these cities. So now let’s copy these formulas down, so ‘CTRL + C’, and then ‘CTRL + down arrow key’, and then ‘SHFT + CTRL + up arrow key’, and then we can use ‘ALT + ESF’, and then ‘CMD + CTRL + V’ and ‘CMD + F’ on the Mac to paste all of these in, paste these formulas. We have that.
And then we’re going to do the same thing for these others over here. So just take this formula, ‘CTRL + C’ or ‘CMD + C’ on the Mac, ‘CTRL + Down arrow key’, and then ‘SHFT + CTRL + up arrow key’, and then ‘ALT + ESF’ or ‘CMD + CTRL + V’, ‘CMD + F’ to paste formulas, and we have all of these in place, and the capitalization is fixed for the most part. Now what we can do is just paste all these as values and get rid of the unnecessary data. So I’m going to use the ‘SHFT + right arrow key’, ‘CTRL + C’ or ‘CMD + C’, and then ‘SHFT + CTRL + up’. So copy this whole area, ‘CTRL + C’ or ‘CMD + C’.
And then special paste as values, so ‘ALT + ESV’ on the PC or ‘CMD + CTRL + V’, ‘CMD + V’ on the Mac. So we have that, and all of these are in place now. And so now what we can do is get rid of some of the data that we don’t need with poor capitalization, so this column for example. ‘CTRL + spacebar’, ‘CTRL + minus key’ gets rid of it. And then the same thing right over here.
Now these state abbreviations, what do we actually need to fix here? Well, the capitalization is already all correct, but we want to do two things. Number one is we want to separate the states and the zip codes, and then number two is we want to get rid of all these extra trailing spaces. So to do that what I can do is go to the top, and enter the trim function, equals trim. Apply it to these state abbreviations, state, and zip codes really, so we have that.
And then same idea ‘CTRL + C’, go to the bottom and then ‘ALT + ESF’ or ‘CMD + CTRL + V’, ‘CMD + F’ on the Mac, paste formulas, so we have that. And then once again, we can copy and paste this whole area, ‘SHFT + CTRL + up arrow key’, and then ‘CTRL + C’, ‘ALT + ESV’ or ‘CMD + CTRL + V’ or ‘CMD + V’ on the Mac. And so we have this, and you can see how we’ve gotten rid of the extra trailing spaces, and possibly the extra following spaces here.
So let’s delete this unnecessary column now. And then the last thing I’m going to do is just separate the state and zip code here. So I’m going to insert a new column, ‘CTRL + spacebar’, and then ‘CTRL + SHFT + plus sign’ to do that. Highlight this whole with ‘SHFT + CTRL + down arrow’, ‘ALT + AE’ for text to columns or ‘ALT + DE’ in older versions of Excel.
We want delimited. This time we have a space separating both of these, so we want that. So we selected space, and then ‘ALT + N’ or just click on next. And then destination cell, we want H3 right here. So I’m going to go finish or ‘ALT + F’ for that. We’re going to ignore this warning and you can just say OK, and so we have that. We can see that once again, it’s not perfect, some of our zip codes here are not quite right. So this zero for example, Excel has deleted this inadvertently, but overall the data is in much better shape. So we have that.
And now as the final step what we can do is just write address at the top here, and then city, state, zip, and we can auto-fit all of these columns ‘SHFT + CTRL + down’, hold down ‘SHFT’ and release ‘CTRL’ and then right arrow key a few times. And then a couple of ways we can auto-fit. We can go to ‘ALT + OCW’ or ‘ALT + H’ for home. O for format. and I for auto-fit. Column width on the Mac you have to use the mouse, go to the home ribbon, format, and then do the same things, and we have that set and formatted now.
So again, this is not perfect. Some of the data here will still need to be cleaned up. For example, the issue that I mentioned before, the abbreviations for the addresses not quite being right. There’s the issue with the zip codes here, and how some of the zeros have been deleted. So if you want an extra challenge, you can go in now and try to use some of the functions we went over before, like length for example, to fix this issue with the zip codes. And do the same thing really with the addresses here. So as an extra challenge if you want to further fix the data, you can do that.
But for now we’re going to stop, because this is already looking much better. It’s much more readable, and understandable, and easier to manipulate in Excel. So that’s it for our lesson on how to clean up data by using all of these functions; trim, proper, clean, and then the normal, copy and paste formulas and values, and text to column functions; as well as the basic navigation control arrow keys, and ‘CTRL + SHFT + arrow keys’ in Excel.
Now you should have a better idea of how to use this, and how you can use it in real life to clean up data. So that’s it for this lesson. Coming up next we’re going to be jumping into sorting and filtering data, and then go into a few more advanced topics on the subject of formatting within Excel.