Download from a List of Files from Windows


TL;DR – You can use a FOR loop and curl to download a list of URLs from a stock Windows 10 distribution.

Updated December 7, 2020

Sometimes you need to download a bunch of files from a site, such as from the DAV bulk download: enough of them that you don’t want to click and wait for each one in a browser. Perhaps you even have a list of those files. In many cases, you can just use a download tool like wget or uGet. Wget comes stock on most Linux distributions, but neither tool comes stock on Windows. They’re easy enough to get, but not if you’re in a situation where you can’t install programs on your machine. This is pretty common in the public sector and I wouldn’t be surprised if it’s the norm in the private sector too. In that case, how do you do it? I’m assuming you don’t have any programming tools, like Python, Perl, Rust, etc., installed and that route isn’t open to you. All you have is whatever ships with Windows 10.

Luckily Windows 10 does ship with some things in the cmd.exe program that can get you there. Let’s assume you’ve got all your URLs to download in a file called lof.txt, consisting of one URL per line. What we’re going to need to do is:

  1. read a URL from the file one at a time;
  2. figure out an output name for the URL;
  3. download the file and store it as the output name; and
  4. repeat for the next line.

For all of this, you’ll need to use the cmd.exe window. If that’s new to you, just type ‘cmd.exe’ in the search box on the Windows taskbar. You should get a black window with the title ‘Command Prompt’. A plain cmd.exe window is all you need; the ‘cmd.exe /v’ variant (delayed expansion) that an earlier version of this post called for is no longer required.

Read through the file

To read a line at a time from our file (lof.txt), we can use the following command. We’ll print out each line just so we can see it.

FOR /f "tokens=*" %a in (lof.txt) do @echo url=%a 
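
A small variation on that loop, as a sketch: by default FOR /f skips blank lines and any line that starts with a semicolon, and the usebackq option lets you quote the file name if it contains spaces. The name "my list.txt" is a made-up example, not a file from this post.

```batch
rem Hypothetical list file with a space in its name. With usebackq, the
rem quoted text is read as a file name instead of a literal string.
FOR /f "usebackq tokens=*" %a in ("my list.txt") do @echo url=%a
```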

Download file

That’s the basics of how we’ll do step 1. Let’s talk about step 3 since it will show why we have to do step 2. To download the content of a URL, you can use the built-in curl.exe command, which has shipped with Windows 10 since version 1803. Type curl -h in your command window to see the help for it. At the most basic, you can just give curl a URL as an argument and it will spew back the contents of that URL to the screen. For example, try:

curl https://coast.noaa.gov

Get a filename

However, we have a bunch of URLs and we don’t want them sent to the screen; we want them stored in files with names that make some sense. We can store them in a file by using the -o or --output option. That means we need to figure out an output name. It makes sense to base that output name on the URL in some way. For the moment, I’ll assume all our URLs have the same basic location; for example, they might all be *.laz files under https://coast.noaa.gov/htdata/lidar4_z/geoid18/data/8937/ms/. If our first file was https://coast.noaa.gov/htdata/lidar4_z/geoid18/data/8937/ms/20191110_NCMP_MS_16RCU6542.laz, we’d want the output file to be 20191110_NCMP_MS_16RCU6542.laz. Our curl command for that would look like:

curl -o 20191110_NCMP_MS_16RCU6542.laz https://coast.noaa.gov/htdata/lidar4_z/geoid18/data/8937/ms/20191110_NCMP_MS_16RCU6542.laz

Delayed Expansion (NOT)

My original post had a whole section here on dealing with delayed expansion. However, a kind reader noted that there are options in the FOR loop to make that unnecessary, so I’ve updated with his suggestions. By using the ~ notation when referencing the variable we can get just the filename and extension for our output file. Specifically, if our FOR loop had variable %a, then we can use %~nxa to get the filename and extension part. That means we can wrap our curl command in a FOR loop with:

FOR /f "tokens=*" %a in (lof.txt) do curl -o %~nxa %a

Batch it

If you’d rather do this in a batch file instead of on the command line, you can. Essentially, you just put the command in a file whose name ends in .bat, and then run it or double-click it. There is one change you have to make though. In a batch file, the loop variable is written with double percent signs (%%a instead of %a).


FOR /f "tokens=*" %%a in (lof.txt) do curl -o %%~nxa %%a
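
If you expect to rerun the batch file after an interruption, a slightly more defensive version might look like the sketch below. It is not part of the original recipe: the quotes, the skip-if-present check, and the curl options -L (follow redirects) and --fail (treat HTTP errors as failures instead of saving the error page) are my additions, and the existence check can’t tell a complete file from a partial one.

```batch
@echo off
rem Sketch of a rerunnable download loop. Assumes lof.txt is in the
rem current directory. A URL is skipped when a file with its name is
rem already present, even if that file is only a partial download.
FOR /f "tokens=*" %%a in (lof.txt) do (
    if not exist "%%~nxa" curl -L --fail -o "%%~nxa" "%%a"
)
```

Delete any partial file by hand before rerunning so that it gets fetched again.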

Caveats

This still has some problems. It won’t handle subdirectories correctly. If our list of URLs had some files under “ms” and some under “al” and we needed to keep them separated that way, our command above won’t do it: %~nxa keeps only the filename, so every file lands in the current directory and files with the same name overwrite each other. One approach to handle that, if there are only a few directories, is to create the needed output directories first and then do an extra substitution to change the / to \ in the filename. A more complicated approach is to test for the existence of the needed directories, create them if needed, and then do the substitutions. That is left as an exercise for the reader.
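
One way to sketch the directory handling without any string substitution is to let a second FOR /f split each URL on / and pick out the directory and the filename. This is an untested illustration, not a general solution: the token numbers 8 and 9 are an assumption that only holds for URLs with exactly the same depth as the example in this post.

```batch
@echo off
rem Sketch only: tokens 8 and 9 assume URLs shaped like
rem https://coast.noaa.gov/htdata/lidar4_z/geoid18/data/8937/ms/name.laz
rem Splitting on / gives the directory as token 8 and the file as token 9.
FOR /f "tokens=*" %%u in (lof.txt) do (
    FOR /f "tokens=8,9 delims=/" %%a in ("%%u") do (
        if not exist "%%a\" mkdir "%%a"
        curl -o "%%a\%%b" "%%u"
    )
)
```

FOR /f treats consecutive delimiters as one, so the // after https: does not produce an empty token. For URLs with a different number of path segments, the token numbers would need to change.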

One last caveat: I don’t do a lot of Windows command-line scripting, so there may be a better way. Please comment with any improvements (thanks to Mike Brown for the tip on FOR loop options). It also may be easier to do in PowerShell, but I believe restrictions are often put on running PowerShell, so I’ve assumed that isn’t an option.

3 comments

  1. No need for delayed expansion and setting temporary variables. Just use the variable substitution mentioned near the end of “for /?”:

    for /f "tokens=*" %a in (lof.txt) do curl -o %~nxa %a

      • First thank you for the post!

        I have been looking for a long time for how to loop over URLs in a Windows setting. Coming from a Linux environment and being used to ‘wget’, it was a pain to try and find a native solution. So thanks for pointing towards using FOR along with the curl command.
        Here is a tip that is easy to miss in the curl help, but it works!

        Use -O (capital letter O, the short form of --remote-name) as the output option for curl. That takes the file name from the URL and saves the file under it, which is what you want if you are keeping the original file name. This way, you just read the URLs from the file and curl will download and save each file with its original name. The command then becomes:

        for /f "tokens=*" %a in (lof.txt) do curl -O %a

        Hope it might help someone else
        Cheers
