Delete all but the most recent X files in bash

Ezra UNIX/Linux

<div class="paragraph"> <p>To give a more concrete example, imagine some cron job writing out a file (say, a log file or a tar-ed up backup) to a directory every hour. I’d like a way to have another cron job running which would remove the oldest files in that directory until only the, say, 5 most recent remain.</p> </div> <div class="paragraph"> <p>And just to be clear, if there’s only one file present, it should never be deleted.</p> </div> <div class="paragraph"> <p>Here’s a pragmatic, POSIX-compliant solution that comes with only one caveat: it cannot handle filenames with embedded newlines - but I don’t consider that a real-world concern for most people.</p> </div> <div class="quoteblock"> <blockquote> <div class="paragraph"> <p>For the record, here’s the explanation for <a href="http://mywiki.wooledge.org/ParsingLs">why it’s generally not a good idea to parse <code>ls</code> output</a>.</p> </div> </blockquote> </div> <div class="listingblock"> <div class="content"> <pre class="highlight"><code class="language-sh" data-lang="sh">ls -tp | grep -v '/$' | tail -n +6 | xargs -I {} rm -- {}</code></pre> </div> </div> <div class="admonitionblock note"> <table> <tr> <td class="icon"> <div class="title">Note</div> </td> <td class="content"> <div class="paragraph"> <p>This command operates in the <strong>current directory</strong>; to target a directory explicitly, use a subshell (<code>(…)</code>) with <code>cd</code>:</p> </div> <div class="listingblock"> <div class="content"> <pre class="highlight"><code class="language-sh" data-lang="sh">(cd /path/to && ls -tp | grep -v '/$' | tail -n +6 | xargs -I {} rm -- {})</code></pre> </div> </div> <div class="paragraph"> <p>The same <strong>applies analogously</strong> to the commands below.</p> </div> </td> </tr> </table> </div> <div class="paragraph"> <p>The above is <strong>inefficient</strong>, because <code>xargs</code> has to invoke <code>rm</code> separately <strong>for each filename</strong>.</p> </div> <div class="paragraph"> 
<p>However, your platform’s specific <code>xargs</code> implementation may allow you to solve this problem:</p> </div> <div class="paragraph"> <p>A solution that works with GNU <code>xargs</code> is to use <code>-d '\n'</code>, which makes <code>xargs</code> treat each input line as a separate argument, yet pass as many arguments as will fit on a command line at once:</p> </div> <div class="listingblock"> <div class="content"> <pre class="highlight"><code class="language-sh" data-lang="sh">ls -tp | grep -v '/$' | tail -n +6 | xargs -d '\n' -r rm --</code></pre> </div> </div> <div class="admonitionblock note"> <table> <tr> <td class="icon"> <div class="title">Note</div> </td> <td class="content"> <div class="paragraph"> <p>Option <code>-r</code> (<code>--no-run-if-empty</code>) ensures that <code>rm</code> is not invoked if there’s <strong>no input</strong>.</p> </div> </td> </tr> </table> </div> <div class="paragraph"> <p>A solution that <strong>works with both GNU <code>xargs</code> and BSD <code>xargs</code></strong> (including on <strong>macOS</strong>) - though technically still <strong>not POSIX-compliant</strong> - is to use <code>-0</code> to handle <code>NUL</code>-separated input, after first translating newlines to <code>NUL</code> (<code>0x0</code>) chars, which also passes (typically) all filenames at once:</p> </div> <div class="listingblock"> <div class="content"> <pre class="highlight"><code class="language-sh" data-lang="sh">ls -tp | grep -v '/$' | tail -n +6 | tr '\n' '\0' | xargs -0 rm --</code></pre> </div> </div> <div class="paragraph"> <p>Explanation:</p> </div> <div class="ulist"> <ul> <li> <p><code>ls -tp</code> prints the names of filesystem items sorted by how recently they were modified, in descending order (most recently modified items first) (<code>-t</code>), with directories printed with a trailing <code>/</code> to mark them as such (<code>-p</code>).</p> <div class="ulist"> <ul> <li> <p><strong>Note</strong>: It is the fact that 
<code>ls -tp</code> always outputs file/directory names only, not full paths, that necessitates the subshell approach mentioned above for targeting a directory other than the current one (<code>(cd /path/to && ls -tp …)</code>).</p> </li> </ul> </div> </li> <li> <p><code>grep -v '/$'</code> then weeds out directories from the resulting listing, by omitting (<code>-v</code>) lines that have a trailing <code>/</code> (<code>/$</code>).</p> <div class="ulist"> <ul> <li> <p><strong>Caveat</strong>: Since a symlink that points to a directory is technically not itself a directory, such symlinks will not be excluded.</p> </li> </ul> </div> </li> <li> <p><code>tail -n +6</code> skips the first <strong>5 entries</strong> in the listing, in effect returning all but the 5 most recently modified files, if any.</p> <div class="ulist"> <ul> <li> <p><strong>Note</strong> that in order to exclude <code>N</code> files, <code>N+1</code> must be passed to <code>tail -n +</code>.</p> </li> </ul> </div> </li> <li> <p><code>xargs -I {} rm -- {}</code> (and its variations) then invokes <code>rm</code> on all these files; if there are no matches at all, <code>xargs</code> won’t do anything.</p> <div class="ulist"> <ul> <li> <p><code>xargs -I {} rm -- {}</code> defines placeholder <code>{}</code> that represents each input line as a whole, so <code>rm</code> is then invoked once for each input line, but with filenames with embedded spaces handled correctly.</p> </li> <li> <p><code>--</code> in all cases ensures that any filenames that happen to start with <code>-</code> aren’t mistaken for options by <code>rm</code>.</p> </li> </ul> </div> </li> </ul> </div> <hr> <div class="paragraph"> <p>A <strong>variation</strong> on the original problem, <strong>in case the matching files need to be processed individually or collected in a shell array</strong>:</p> </div> <div class="listingblock"> <div class="content"> <pre class="highlight"><code class="language-sh" data-lang="sh"># One by one, in a shell loop (POSIX-compliant):
ls -tp | grep -v '/$' | tail -n +6 | while IFS= read -r f; do echo "$f"; done</code></pre> </div> </div> <div class="listingblock"> <div class="content"> <pre class="highlight"><code class="language-sh" data-lang="sh"># One by one, but using a Bash process substitution (<(…)),
# so that the variables inside the while loop remain in scope:
while IFS= read -r f; do echo "$f"; done < <(ls -tp | grep -v '/$' | tail -n +6)</code></pre> </div> </div> <div class="listingblock"> <div class="content"> <pre class="highlight"><code class="language-sh" data-lang="sh"># Collecting the matches in a Bash array:
IFS=$'\n' read -d '' -ra files < <(ls -tp | grep -v '/$' | tail -n +6)
printf '%s\n' "${files[@]}" # print array elements</code></pre> </div> </div>
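<div class="paragraph"> <p>As a rough sketch of how the pieces above might be packaged for a cron job, the pipeline can be wrapped in a small POSIX shell function. The name <code>keep_newest</code> and its two parameters are hypothetical, chosen for illustration only. The caveat about filenames with embedded newlines still applies, and deletion goes through the <code>while read</code> loop from the variation above rather than <code>xargs</code>, so <code>rm</code> is never invoked when there is nothing to delete:</p> </div> <div class="listingblock"> <div class="content"> <pre class="highlight"><code class="language-sh" data-lang="sh"># keep_newest DIR [N]  (hypothetical helper, not part of any standard tool)
# Deletes all but the N (default: 5) most recently modified files in DIR.
# Subdirectories are never deleted; filenames with newlines are unsupported.
keep_newest() {
  (
    # The subshell keeps the directory change and variables from leaking out.
    cd -- "$1" || exit 1
    keep=${2:-5}
    # To skip the N newest entries, tail must start at entry N+1.
    ls -tp | grep -v '/$' | tail -n "+$((keep + 1))" |
      while IFS= read -r f; do rm -- "$f"; done
  )
}</code></pre> </div> </div> <div class="paragraph"> <p>Under these assumptions, <code>keep_newest /path/to 5</code> would behave like the subshell one-liner shown earlier; since it calls <code>rm</code> once per file, it trades efficiency for portability.</p> </div>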