FilenameUtils.getName Source Code Analysis

Posted on 2022-09-27 by WalkonNet

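1. Background

I recently used org.apache.commons.io.FilenameUtils#getName, a method that takes a file path and returns the file name. I took a quick look at its source. It is not complicated, but it differs slightly from what I had imagined and is worth learning from, so this article walks through it briefly.

2. Source Code Analysis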

org.apache.commons.io.FilenameUtils#getName

    /**
     * Gets the name minus the path from a full fileName.
     * <p>
     * This method will handle a file in either Unix or Windows format.
     * The text after the last forward or backslash is returned.
     * <pre>
     * a/b/c.txt --&gt; c.txt
     * a.txt     --&gt; a.txt
     * a/b/c     --&gt; c
     * a/b/c/    --&gt; ""
     * </pre>
     * <p>
     * The output will be the same irrespective of the machine that the code is running on.
     *
     * @param fileName  the fileName to query, null returns null
     * @return the name of the file without the path, or an empty string if none exists.
     * Null bytes inside string will be removed
     */
    public static String getName(final String fileName) {
        // A null input returns null directly
        if (fileName == null) {
            return null;
        }
        // NonNul check
        requireNonNullChars(fileName);
        // Find the last separator
        final int index = indexOfLastSeparator(fileName);
        // Take everything after the last separator
        return fileName.substring(index + 1);
    }
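
To make the four Javadoc cases concrete, here is a minimal, self-contained sketch that mirrors the logic above without depending on Commons IO. The class name GetNameSketch is hypothetical and exists only for illustration; the real implementation is the library code shown above.

```java
// Hypothetical stand-in for FilenameUtils.getName, for illustration only.
final class GetNameSketch {

    // Mirrors the library logic: null in, null out; reject NUL; cut after the last separator.
    static String getName(String fileName) {
        if (fileName == null) {
            return null;
        }
        if (fileName.indexOf(0) >= 0) {
            throw new IllegalArgumentException("Null byte present in file/path name");
        }
        int index = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        // With no separator, index is -1, so substring(0) returns the whole string.
        return fileName.substring(index + 1);
    }

    public static void main(String[] args) {
        System.out.println(getName("a/b/c.txt"));          // c.txt
        System.out.println(getName("a.txt"));              // a.txt
        System.out.println(getName("a/b/c"));              // c
        System.out.println("[" + getName("a/b/c/") + "]"); // []
    }
}
```

The last case is the one that is easy to get wrong: a trailing separator yields an empty string rather than the last directory name.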

2.1 Question 1: Why is the NonNul check needed?

2.1.1 How is the check performed?

org.apache.commons.io.FilenameUtils#requireNonNullChars

   /**
     * Checks the input for null bytes, a sign of unsanitized data being passed to to file level functions.
     *
     * This may be used for poison byte attacks.
     *
     * @param path the path to check
     */
    private static void requireNonNullChars(final String path) {
        if (path.indexOf(0) >= 0) {
            throw new IllegalArgumentException("Null byte present in file/path name. There are no "
                + "known legitimate use cases for such data, but several injection attacks may use it");
        }
    }

Source of java.lang.String#indexOf(int):

 /**
     * Returns the index within this string of the first occurrence of
     * the specified character. If a character with value
     * {@code ch} occurs in the character sequence represented by
     * this {@code String} object, then the index (in Unicode
     * code units) of the first such occurrence is returned. For
     * values of {@code ch} in the range from 0 to 0xFFFF
     * (inclusive), this is the smallest value <i>k</i> such that:
     * <blockquote><pre>
     * this.charAt(<i>k</i>) == ch
     * </pre></blockquote>
     * is true. For other values of {@code ch}, it is the
     * smallest value <i>k</i> such that:
     * <blockquote><pre>
     * this.codePointAt(<i>k</i>) == ch
     * </pre></blockquote>
     * is true. In either case, if no such character occurs in this
     * string, then {@code -1} is returned.
     *
     * @param   ch   a character (Unicode code point).
     * @return  the index of the first occurrence of the character in the
     *          character sequence represented by this object, or
     *          {@code -1} if the character does not occur.
     */
    public int indexOf(int ch) {
        return indexOf(ch, 0);
    }

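As the Javadoc shows, indexOf(0) looks for the position of the character whose ASCII code is 0; if one is found, requireNonNullChars throws an IllegalArgumentException. According to the ASCII table, the value 0 is the control character NUL, which an ordinary file name should never contain.

2.1.2 Why perform this check?

A null byte is a byte whose value is 0, written 0x00 in hexadecimal. There is a class of security vulnerabilities related to it: C uses the null byte as the string terminator, while other languages (Java, PHP, and so on) have no such terminator. For example, suppose a Java web application only allows users to upload images in .jpg format; this discrepancy can be exploited to upload a .jsp file. If a user uploads a file named hack.jsp<NUL>.jpg, Java considers the name to satisfy the .jpg rule, but when the underlying C system function is called to write the file to disk, <NUL> is treated as the string terminator, so the file is saved as hack.jsp. Some programming languages do not allow <NUL> in file names; if the language you use does not handle it for you, you have to handle it yourself. That is why this check is necessary.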
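
As a small illustration of what indexOf(0) reports, the sketch below runs the same test on a suspicious name and on an ordinary one. The class NulIndexDemo and the helper containsNul are hypothetical names introduced here, not part of Commons IO.

```java
// Hypothetical demo class; only java.lang.String is used.
final class NulIndexDemo {

    // Returns true when the string contains the NUL character (code point 0).
    static boolean containsNul(String s) {
        return s.indexOf(0) >= 0;
    }

    public static void main(String[] args) {
        String suspicious = "hack.jsp\0.jpg";
        System.out.println(suspicious.indexOf(0));    // 8: the NUL sits right after "hack.jsp"
        System.out.println(suspicious.length());      // 13: Java keeps the NUL and what follows
        System.out.println(containsNul("photo.jpg")); // false
    }
}
```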

Code example:

package org.example;
import org.apache.commons.io.FilenameUtils;
public class FilenameDemo {
    public static void main(String[] args) {
        String filename= "hack.jsp\0.jpg";
        System.out.println( FilenameUtils.getName(filename));
    }
}

Error output:

Exception in thread "main" java.lang.IllegalArgumentException: Null byte present in file/path name. There are no known legitimate use cases for such data, but several injection attacks may use it
   at org.apache.commons.io.FilenameUtils.requireNonNullChars(FilenameUtils.java:998)
   at org.apache.commons.io.FilenameUtils.getName(FilenameUtils.java:984)
   at org.example.FilenameDemo.main(FilenameDemo.java:8)

If the check is removed:

package org.example;
import org.apache.commons.io.FilenameUtils;
public class FilenameDemo {
    public static void main(String[] args) {
        String filename= "hack.jsp\0.jpg";
        // No validation added
        String name = getName(filename);
        // Get the extension
        String extension = FilenameUtils.getExtension(name);
        System.out.println(extension);
    }
    public static String getName(final String fileName) {
        if (fileName == null) {
            return null;
        }
        final int index = FilenameUtils.indexOfLastSeparator(fileName);
        return fileName.substring(index + 1);
    }
}

Java does indeed recognize the extension as jpg:

jpg

On JDK 8 and later, an attempt to create a file named hack.jsp\0.jpg fails, because the JDK performs a similar check at a lower level.
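
One way to observe this kind of rejection is through java.nio.file.Paths.get, which refuses a string containing NUL before any file is touched. This is a sketch under the assumption that the NIO path API is a fair proxy for the check meant here; the class NulPathDemo and the helper isRejected are hypothetical names.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Hypothetical demo class; it never creates a file on disk.
final class NulPathDemo {

    // Returns true if the JDK refuses to build a Path from the given string.
    static boolean isRejected(String name) {
        try {
            Paths.get(name);
            return false;
        } catch (InvalidPathException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected("hack.jsp\0.jpg"));
        System.out.println(isRejected("photo.jpg"));
    }
}
```

The first call should be rejected and the second accepted, so the malicious name never reaches the file system through this API.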

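If you are curious, try writing a file named hack.jsp\0.jpg from C; the resulting file name will most likely be hack.jsp.

2.2 Question 2: Why not pick the separator based on the current operating system?

The last separator is located by org.apache.commons.io.FilenameUtils#indexOfLastSeparator: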
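
Without leaving Java, the mismatch can be simulated by cutting the string at the first NUL, which is roughly how a NUL-terminated C string API would read it. This is only a simulation, not a call into C; the class CStringViewDemo and the helper asCStringWouldSee are hypothetical.

```java
// Hypothetical demo class that imitates C's NUL-terminated view of a string.
final class CStringViewDemo {

    // Everything from the first NUL character onward is ignored,
    // as it would be by a function that stops reading at the terminator.
    static String asCStringWouldSee(String name) {
        int nul = name.indexOf(0);
        return nul < 0 ? name : name.substring(0, nul);
    }

    public static void main(String[] args) {
        String uploaded = "hack.jsp\0.jpg";
        System.out.println(uploaded.endsWith(".jpg"));   // true: the Java-side check passes
        System.out.println(asCStringWouldSee(uploaded)); // hack.jsp: what would reach the disk
    }
}
```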

 /**
     * Returns the index of the last directory separator character.
     * <p>
     * This method will handle a file in either Unix or Windows format.
     * The position of the last forward or backslash is returned.
     * <p>
     * The output will be the same irrespective of the machine that the code is running on.
     *
     * @param fileName  the fileName to find the last path separator in, null returns -1
     * @return the index of the last separator character, or -1 if there
     * is no such character
     */
    public static int indexOfLastSeparator(final String fileName) {
        if (fileName == null) {
            return NOT_FOUND;
        }
        final int lastUnixPos = fileName.lastIndexOf(UNIX_SEPARATOR);
        final int lastWindowsPos = fileName.lastIndexOf(WINDOWS_SEPARATOR);
        return Math.max(lastUnixPos, lastWindowsPos);
    }

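The method's purpose is to get the file name, so semantically it must return the correct name no matter which system's separator the path uses. Imagine calling it on Windows with a Unix path: would it be reasonable to get the wrong file name back? A function's design should account for compatibility, so it cannot cut the name using only the current system's separator. The source looks up both the Windows and the Unix separator and uses whichever appears later in the string (Math.max of the two positions), which is clearly more sensible.

3. Zoom Out

3.1 Code Robustness

In day-to-day coding we should program defensively and guard against erroneous or illegal input.

3.2 Code Rigor

Never take things for granted when writing code. First think through exactly what the function is supposed to do, rather than being a "CV engineer" who copies and pastes code without thinking. We should also write unit tests that consider the edge cases thoroughly, so that both normal and abnormal cases are covered.

3.3 How to Write Comments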
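
The difference can be sketched by comparing a variant that honors a single separator with one that honors both. The separator is passed in explicitly here so the output does not depend on the machine; the class SeparatorDemo and both helpers are hypothetical and only imitate the idea behind indexOfLastSeparator.

```java
// Hypothetical demo class contrasting single-separator and dual-separator lookups.
final class SeparatorDemo {

    // Platform-style variant: honors only the one separator it is given.
    static String nameUsing(char separator, String path) {
        return path.substring(path.lastIndexOf(separator) + 1);
    }

    // Same idea as indexOfLastSeparator: honor both separators and take the later one.
    static String nameUsingBoth(String path) {
        int index = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(index + 1);
    }

    public static void main(String[] args) {
        String unixPath = "/home/user/report.txt";
        String windowsPath = "C:\\Users\\user\\report.txt";
        System.out.println(nameUsing('\\', unixPath));   // /home/user/report.txt (wrong)
        System.out.println(nameUsing('/', windowsPath)); // C:\Users\user\report.txt (wrong)
        System.out.println(nameUsingBoth(unixPath));     // report.txt
        System.out.println(nameUsingBoth(windowsPath));  // report.txt
    }
}
```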

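The Javadoc of org.apache.commons.io.FilenameUtils#requireNonNullChars states the reason for the design: This may be used for poison byte attacks.

Comments should not mumble about the obvious. When a design is likely to confuse readers, the comment must explain why it was done that way.

Drawing on work experience, here are some further commenting tips. (1) For a somewhat complex or important design, use a comment to lay out the core idea, as in java.util.concurrent.ThreadPoolExecutor#execute: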

    /**
     * Executes the given task sometime in the future.  The task
     * may execute in a new thread or in an existing pooled thread.
     *
     * If the task cannot be submitted for execution, either because this
     * executor has been shutdown or because its capacity has been reached,
     * the task is handled by the current {@link RejectedExecutionHandler}.
     *
     * @param command the task to execute
     * @throws RejectedExecutionException at discretion of
     *         {@code RejectedExecutionHandler}, if the task
     *         cannot be accepted for execution
     * @throws NullPointerException if {@code command} is null
     */
    public void execute(Runnable command) {
        if (command == null)
            throw new NullPointerException();
        /*
         * Proceed in 3 steps:
         *
         * 1. If fewer than corePoolSize threads are running, try to
         * start a new thread with the given command as its first
         * task.  The call to addWorker atomically checks runState and
         * workerCount, and so prevents false alarms that would add
         * threads when it shouldn't, by returning false.
         *
         * 2. If a task can be successfully queued, then we still need
         * to double-check whether we should have added a thread
         * (because existing ones died since last checking) or that
         * the pool shut down since entry into this method. So we
         * recheck state and if necessary roll back the enqueuing if
         * stopped, or start a new thread if there are none.
         *
         * 3. If we cannot queue task, then we try to add a new
         * thread.  If it fails, we know we are shut down or saturated
         * and so reject the task.
         */
        int c = ctl.get();
        if (workerCountOf(c) < corePoolSize) {
            if (addWorker(command, true))
                return;
            c = ctl.get();
        }
        if (isRunning(c) && workQueue.offer(command)) {
            int recheck = ctl.get();
            if (! isRunning(recheck) && remove(command))
                reject(command);
            else if (workerCountOf(recheck) == 0)
                addWorker(null, false);
        }
        else if (!addWorker(command, false))
            reject(command);
    }

(2) For related code, use @see or {@link } to give readers a quick way to jump to it. For example:

    /**
     * Sets the core number of threads.  This overrides any value set
     * in the constructor.  If the new value is smaller than the
     * current value, excess existing threads will be terminated when
     * they next become idle.  If larger, new threads will, if needed,
     * be started to execute any queued tasks.
     *
     * @param corePoolSize the new core size
     * @throws IllegalArgumentException if {@code corePoolSize < 0}
     *         or {@code corePoolSize} is greater than the {@linkplain
     *         #getMaximumPoolSize() maximum pool size}
     * @see #getCorePoolSize
     */
    public void setCorePoolSize(int corePoolSize) {
        if (corePoolSize < 0 || maximumPoolSize < corePoolSize)
            throw new IllegalArgumentException();
        int delta = corePoolSize - this.corePoolSize;
        this.corePoolSize = corePoolSize;
        if (workerCountOf(ctl.get()) > corePoolSize)
            interruptIdleWorkers();
        else if (delta > 0) {
            // We don't really know how many new threads are "needed".
            // As a heuristic, prestart enough new workers (up to new
            // core size) to handle the current number of tasks in
            // queue, but stop if queue becomes empty while doing so.
            int k = Math.min(delta, workQueue.size());
            while (k-- > 0 && addWorker(null, true)) {
                if (workQueue.isEmpty())
                    break;
            }
        }
    }

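(3) In everyday business development, it is highly recommended to put links to related documents and configuration pages in the comment as well, which makes later maintenance much easier. For example: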

    /**
     * Some feature
     *
     * Related documents:
     * <a href="https://blog.csdn.net/w605283073">Design document</a>
     * <a href="https://blog.csdn.net/w605283073">Third-party API address</a>
     */
    public void demo(){
        // omitted
    }

(4) For utility classes, consider listing the output for common inputs, as in org.apache.commons.lang3.StringUtils#center(java.lang.String, int, char):

 /**
     * <p>Centers a String in a larger String of size {@code size}.
     * Uses a supplied character as the value to pad the String with.</p>
     *
     * <p>If the size is less than the String length, the String is returned.
     * A {@code null} String returns {@code null}.
     * A negative size is treated as zero.</p>
     *
     * <pre>
     * StringUtils.center(null, *, *)     = null
     * StringUtils.center("", 4, ' ')     = "    "
     * StringUtils.center("ab", -1, ' ')  = "ab"
     * StringUtils.center("ab", 4, ' ')   = " ab "
     * StringUtils.center("abcd", 2, ' ') = "abcd"
     * StringUtils.center("a", 4, ' ')    = " a  "
     * StringUtils.center("a", 4, 'y')    = "yayy"
     * </pre>
     *
     * @param str  the String to center, may be null
     * @param size  the int size of new String, negative treated as zero
     * @param padChar  the character to pad the new String with
     * @return centered String, {@code null} if null String input
     * @since 2.0
     */
    public static String center(String str, final int size, final char padChar) {
        if (str == null || size <= 0) {
            return str;
        }
        final int strLen = str.length();
        final int pads = size - strLen;
        if (pads <= 0) {
            return str;
        }
        str = leftPad(str, strLen + pads / 2, padChar);
        str = rightPad(str, size, padChar);
        return str;
    }

(5) For a deprecated method, always state why it is deprecated and give the alternative, as in java.security.Signature#setParameter(java.lang.String, java.lang.Object):

    /**
     * (partially omitted)
     * 
     * @see #getParameter
     *
     * @deprecated Use
     * {@link #setParameter(java.security.spec.AlgorithmParameterSpec)
     * setParameter}.
     */
    @Deprecated
    public final void setParameter(String param, Object value)
            throws InvalidParameterException {
        engineSetParameter(param, value);
    }

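4. Summary

Many excellent open-source projects are designed very rigorously, and even simple code often hides careful thought. When we have time, we can read some of them, starting with the simple parts: first think about how you would implement it yourself, then compare that with the author's approach, and you will gain more. When reading source code, it is not enough to know what the code looks like; you also need to understand why it was designed that way.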
