Source Code Analysis of FilenameUtils.getName

1. Background

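I recently used org.apache.commons.io.FilenameUtils#getName, which takes a file path and returns the file name. A quick look at the source showed that, although the code is not complicated, it differs slightly from what I had imagined and is worth studying. This article walks through it briefly.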

2. Source Code Analysis

org.apache.commons.io.FilenameUtils#getName

    /**
     * Gets the name minus the path from a full fileName.
     * <p>
     * This method will handle a file in either Unix or Windows format.
     * The text after the last forward or backslash is returned.
     * <pre>
     * a/b/c.txt --&gt; c.txt
     * a.txt     --&gt; a.txt
     * a/b/c     --&gt; c
     * a/b/c/    --&gt; ""
     * </pre>
     * <p>
     * The output will be the same irrespective of the machine that the code is running on.
     *
     * @param fileName  the fileName to query, null returns null
     * @return the name of the file without the path, or an empty string if none exists.
     * Null bytes inside string will be removed
     */
    public static String getName(final String fileName) {
        // A null input returns null directly
        if (fileName == null) {
            return null;
        }
        // Reject names that contain the NUL character
        requireNonNullChars(fileName);
        // Find the last separator
        final int index = indexOfLastSeparator(fileName);
        // Return everything after the last separator
        return fileName.substring(index + 1);
    }

2.1 Question 1: Why is the NUL check needed?

2.1.1 How is the check performed?

org.apache.commons.io.FilenameUtils#requireNonNullChars

    /**
     * Checks the input for null bytes, a sign of unsanitized data being passed to to file level functions.
     *
     * This may be used for poison byte attacks.
     *
     * @param path the path to check
     */
    private static void requireNonNullChars(final String path) {
        if (path.indexOf(0) >= 0) {
            throw new IllegalArgumentException("Null byte present in file/path name. There are no "
                + "known legitimate use cases for such data, but several injection attacks may use it");
        }
    }

Source of java.lang.String#indexOf(int):

    /**
     * Returns the index within this string of the first occurrence of
     * the specified character. If a character with value
     * {@code ch} occurs in the character sequence represented by
     * this {@code String} object, then the index (in Unicode
     * code units) of the first such occurrence is returned. For
     * values of {@code ch} in the range from 0 to 0xFFFF
     * (inclusive), this is the smallest value <i>k</i> such that:
     * <blockquote><pre>
     * this.charAt(<i>k</i>) == ch
     * </pre></blockquote>
     * is true. For other values of {@code ch}, it is the
     * smallest value <i>k</i> such that:
     * <blockquote><pre>
     * this.codePointAt(<i>k</i>) == ch
     * </pre></blockquote>
     * is true. In either case, if no such character occurs in this
     * string, then {@code -1} is returned.
     *
     * @param   ch   a character (Unicode code point).
     * @return  the index of the first occurrence of the character in the
     *          character sequence represented by this object, or
     *          {@code -1} if the character does not occur.
     */
    public int indexOf(int ch) {
        return indexOf(ch, 0);
    }

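So indexOf(0) looks for the position of the character whose code is 0, and an IllegalArgumentException is thrown if one is found. The ASCII table shows that the value 0 is the control character NUL, which an ordinary file name should not contain.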
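To make the behavior of indexOf(0) concrete, here is a minimal sketch that uses only the JDK. The class name NulCharDemo and the helper containsNul are made up for illustration; they are not part of Commons IO.

```java
final class NulCharDemo {

    /** Returns true when the text contains the NUL character (code point 0). */
    static boolean containsNul(String text) {
        // indexOf(int) takes a code point; 0 is the NUL character, written '\0' in Java.
        return text.indexOf(0) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(containsNul("c.txt"));          // false
        System.out.println(containsNul("hack.jsp\0.jpg")); // true
        System.out.println("hack.jsp\0.jpg".indexOf(0));   // 8
    }
}
```

The last line prints 8 because "hack.jsp" is eight characters long, so the NUL character sits right after it.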

2.1.2 Why is this check needed?

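A null byte is a byte whose value is 0, i.e. 0x00 in hexadecimal. There are security vulnerabilities related to null bytes, because C uses the null byte as the string terminator, whereas other languages (Java, PHP, etc.) do not treat it as a terminator.

For example, suppose a Java web application only allows users to upload .jpg images. This mismatch can be exploited to upload a .jsp file instead. If a user uploads a file named hack.jsp<NUL>.jpg, Java considers the name to match the .jpg format, but when a C-level system function is called to write the file to disk, <NUL> is treated as the string terminator and the file ends up being saved as hack.jsp.

Some programming languages do not allow <NUL> in file names. If the language you use does not handle this for you, you have to handle it yourself. That is why this check is necessary.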
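The mismatch between the two views of the same name can be modeled in plain Java. This is only a simulation under assumptions: looksLikeJpg stands in for a naive whitelist check in a hypothetical upload handler, and asSeenByCApi imitates how an API that stops at the first NUL would read the string. Neither name comes from a real library.

```java
import java.util.Locale;

final class NulTruncationDemo {

    /** Naive extension check of the kind a hypothetical upload handler might perform. */
    static boolean looksLikeJpg(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".jpg");
    }

    /** Models how an API that treats NUL as the string terminator would read the name. */
    static String asSeenByCApi(String fileName) {
        int nul = fileName.indexOf('\0');
        return nul < 0 ? fileName : fileName.substring(0, nul);
    }

    public static void main(String[] args) {
        String uploaded = "hack.jsp\0.jpg";
        System.out.println(looksLikeJpg(uploaded)); // true
        System.out.println(asSeenByCApi(uploaded)); // hack.jsp
    }
}
```

The check passes on the full string, while the simulated terminator-based reader ends up with a .jsp name. That gap is what the NUL check closes.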

Code example:

package org.example;
import org.apache.commons.io.FilenameUtils;
public class FilenameDemo {
    public static void main(String[] args) {
        String filename= "hack.jsp\0.jpg";
        System.out.println( FilenameUtils.getName(filename));
    }
}

Error output:

Exception in thread "main" java.lang.IllegalArgumentException: Null byte present in file/path name. There are no known legitimate use cases for such data, but several injection attacks may use it
    at org.apache.commons.io.FilenameUtils.requireNonNullChars(FilenameUtils.java:998)
    at org.apache.commons.io.FilenameUtils.getName(FilenameUtils.java:984)
    at org.example.FilenameDemo.main(FilenameDemo.java:8)

If the check is removed:

package org.example;
import org.apache.commons.io.FilenameUtils;
public class FilenameDemo {
    public static void main(String[] args) {
        String filename= "hack.jsp\0.jpg";
        // Local copy of getName without the NUL check
        String name = getName(filename);
        // Get the extension
        String extension = FilenameUtils.getExtension(name);
        System.out.println(extension);
    }
    public static String getName(final String fileName) {
        if (fileName == null) {
            return null;
        }
        final int index = FilenameUtils.indexOfLastSeparator(fileName);
        return fileName.substring(index + 1);
    }
}

Java does indeed identify the extension as jpg:

jpg

On JDK 8 and later, an attempt to create a file named hack.jsp\0.jpg is rejected by a similar check in the lower layers, so the file cannot be created.
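One way to observe a JDK-level rejection without touching the disk is to build a java.nio.file.Path from the name. The sketch below assumes the default file system provider; the class and method names are invented for this example.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

final class NulPathDemo {

    /** Returns true if java.nio refuses to build a path from the given name. */
    static boolean isRejectedByNio(String name) {
        try {
            Paths.get(name);
            return false;
        } catch (InvalidPathException e) {
            // Thrown when the name contains characters the file system does not allow
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejectedByNio("hack.jsp"));       // false
        System.out.println(isRejectedByNio("hack.jsp\0.jpg")); // true
    }
}
```

This only demonstrates the NIO path parser. It does not replace validating user input before it reaches any file API.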

If you are interested, try writing a file named hack.jsp\0.jpg from C. The resulting file name will most likely be hack.jsp.

2.2 Question 2: Why not pick the separator based on the current operating system?

Finding the last separator: org.apache.commons.io.FilenameUtils#indexOfLastSeparator

    /**
     * Returns the index of the last directory separator character.
     * <p>
     * This method will handle a file in either Unix or Windows format.
     * The position of the last forward or backslash is returned.
     * <p>
     * The output will be the same irrespective of the machine that the code is running on.
     *
     * @param fileName  the fileName to find the last path separator in, null returns -1
     * @return the index of the last separator character, or -1 if there
     * is no such character
     */
    public static int indexOfLastSeparator(final String fileName) {
        if (fileName == null) {
            return NOT_FOUND;
        }
        final int lastUnixPos = fileName.lastIndexOf(UNIX_SEPARATOR);
        final int lastWindowsPos = fileName.lastIndexOf(WINDOWS_SEPARATOR);
        return Math.max(lastUnixPos, lastWindowsPos);
    }

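The purpose of this method is to get the file name, so semantically it must return the correct file name regardless of which system's separator the path uses. Imagine calling it on Windows with a Unix path: would it be reasonable to get a wrong file name? Compatibility should be part of the function's design, so it cannot simply use the current system's separator to cut out the file name. The source looks up both the Windows and the Unix separator and uses whichever appears last, which is clearly more reasonable.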
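The difference between the two approaches can be sketched with the JDK alone. The helpers below are a simplified re-implementation for illustration, not the Commons IO code itself (for instance, they skip the null and NUL handling), and all names are my own.

```java
import java.io.File;

final class SeparatorDemo {

    /** Index of the last '/' or '\\', whichever comes later; -1 if neither occurs. */
    static int lastSeparator(String path) {
        return Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
    }

    /** Platform-independent: the text after the last separator of either style. */
    static String nameOf(String path) {
        return path.substring(lastSeparator(path) + 1);
    }

    /** Platform-dependent variant, shown only for contrast. */
    static String nameOfUsingSystemSeparator(String path) {
        return path.substring(path.lastIndexOf(File.separatorChar) + 1);
    }

    public static void main(String[] args) {
        System.out.println(nameOf("a/b/c.txt"));           // c.txt
        System.out.println(nameOf("C:\\dir\\report.txt")); // report.txt
        System.out.println(nameOf("a.txt"));               // a.txt
        System.out.println(nameOf("a/b/c/").isEmpty());    // true
        // Depends on the OS: "report.txt" on Windows, the whole input on Unix-like systems
        System.out.println(nameOfUsingSystemSeparator("C:\\dir\\report.txt"));
    }
}
```

When no separator is found, the index is -1, so substring(index + 1) becomes substring(0) and returns the whole input. That is why a.txt maps to itself without a special case.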

3. Zoom Out

3.1 Code Robustness

In everyday coding we should program defensively and guard against erroneous or illegal input.

3.2 Code Rigor

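Never take things for granted when writing code. First work out exactly what the function is supposed to do, rather than being a "CV engineer" (Ctrl+C / Ctrl+V) who copies code without thinking. We should also write unit tests that consider the edge cases thoroughly, so that both normal and abnormal cases are covered.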
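As a rough illustration of covering both normal and abnormal cases, here is a table-driven check written in plain Java so that it runs without a test framework. The method baseName is a hypothetical stand-in that mirrors the logic discussed in this article. In a real project you would call the library method from a JUnit test instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BaseNameCases {

    /** Hypothetical method under test; mirrors the logic discussed in this article. */
    static String baseName(String path) {
        if (path == null) {
            return null;
        }
        if (path.indexOf(0) >= 0) {
            throw new IllegalArgumentException("NUL character in path");
        }
        int index = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(index + 1);
    }

    /** Runs every case and returns how many of them failed. */
    static int runCases() {
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("a/b/c.txt", "c.txt");   // normal Unix path
        expected.put("a\\b\\c.txt", "c.txt"); // normal Windows path
        expected.put("a/b\\c.txt", "c.txt");  // mixed separators
        expected.put("a.txt", "a.txt");       // no separator at all
        expected.put("a/b/c/", "");           // trailing separator
        expected.put("", "");                 // empty input

        int failures = 0;
        for (Map.Entry<String, String> e : expected.entrySet()) {
            if (!e.getValue().equals(baseName(e.getKey()))) {
                failures++;
            }
        }
        if (baseName(null) != null) {
            failures++; // null in, null out
        }
        try {
            baseName("hack.jsp\0.jpg");
            failures++; // the NUL input should have been rejected
        } catch (IllegalArgumentException expectedException) {
            // expected: illegal input is refused
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failed cases: " + runCases()); // failed cases: 0
    }
}
```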

3.3 How to Write Comments

The Javadoc of org.apache.commons.io.FilenameUtils#requireNonNullChars states the reason for this design: This may be used for poison byte attacks.

Comments should not mutter the obvious. For any design that is likely to confuse readers, the comment must explain the reason behind it.

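In addition, drawing on work experience, here are some other commenting tips:

(1) For a design that is somewhat complex or important, use comments to lay out the core idea, as in java.util.concurrent.ThreadPoolExecutor#execute: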

    /**
     * Executes the given task sometime in the future.  The task
     * may execute in a new thread or in an existing pooled thread.
     *
     * If the task cannot be submitted for execution, either because this
     * executor has been shutdown or because its capacity has been reached,
     * the task is handled by the current {@link RejectedExecutionHandler}.
     *
     * @param command the task to execute
     * @throws RejectedExecutionException at discretion of
     *         {@code RejectedExecutionHandler}, if the task
     *         cannot be accepted for execution
     * @throws NullPointerException if {@code command} is null
     */
    public void execute(Runnable command) {
        if (command == null)
            throw new NullPointerException();
        /*
         * Proceed in 3 steps:
         *
         * 1. If fewer than corePoolSize threads are running, try to
         * start a new thread with the given command as its first
         * task.  The call to addWorker atomically checks runState and
         * workerCount, and so prevents false alarms that would add
         * threads when it shouldn't, by returning false.
         *
         * 2. If a task can be successfully queued, then we still need
         * to double-check whether we should have added a thread
         * (because existing ones died since last checking) or that
         * the pool shut down since entry into this method. So we
         * recheck state and if necessary roll back the enqueuing if
         * stopped, or start a new thread if there are none.
         *
         * 3. If we cannot queue task, then we try to add a new
         * thread.  If it fails, we know we are shut down or saturated
         * and so reject the task.
         */
        int c = ctl.get();
        if (workerCountOf(c) < corePoolSize) {
            if (addWorker(command, true))
                return;
            c = ctl.get();
        }
        if (isRunning(c) && workQueue.offer(command)) {
            int recheck = ctl.get();
            if (! isRunning(recheck) && remove(command))
                reject(command);
            else if (workerCountOf(recheck) == 0)
                addWorker(null, false);
        }
        else if (!addWorker(command, false))
            reject(command);
    }

(2) For related code, use @see or {@link } so that readers can jump straight to it, as in java.util.concurrent.ThreadPoolExecutor#setCorePoolSize:

    /**
     * Sets the core number of threads.  This overrides any value set
     * in the constructor.  If the new value is smaller than the
     * current value, excess existing threads will be terminated when
     * they next become idle.  If larger, new threads will, if needed,
     * be started to execute any queued tasks.
     *
     * @param corePoolSize the new core size
     * @throws IllegalArgumentException if {@code corePoolSize < 0}
     *         or {@code corePoolSize} is greater than the {@linkplain
     *         #getMaximumPoolSize() maximum pool size}
     * @see #getCorePoolSize
     */
    public void setCorePoolSize(int corePoolSize) {
        if (corePoolSize < 0 || maximumPoolSize < corePoolSize)
            throw new IllegalArgumentException();
        int delta = corePoolSize - this.corePoolSize;
        this.corePoolSize = corePoolSize;
        if (workerCountOf(ctl.get()) > corePoolSize)
            interruptIdleWorkers();
        else if (delta > 0) {
            // We don't really know how many new threads are "needed".
            // As a heuristic, prestart enough new workers (up to new
            // core size) to handle the current number of tasks in
            // queue, but stop if queue becomes empty while doing so.
            int k = Math.min(delta, workQueue.size());
            while (k-- > 0 && addWorker(null, true)) {
                if (workQueue.isEmpty())
                    break;
            }
        }
    }

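(3) In everyday business development, it is highly recommended to put links to the related documents and configuration pages into the comments as well, which makes later maintenance much easier. For example: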

    /**
     * Some feature
     *
     * Related documents:
     * <a href="https://blog.csdn.net/w605283073">Design document</a>
     * <a href="https://blog.csdn.net/w605283073">Third-party API address</a>
     */
    public void demo() {
        // omitted
    }

(4) For utility classes, consider listing the output for common inputs, as in org.apache.commons.lang3.StringUtils#center(java.lang.String, int, char):

    /**
     * <p>Centers a String in a larger String of size {@code size}.
     * Uses a supplied character as the value to pad the String with.</p>
     *
     * <p>If the size is less than the String length, the String is returned.
     * A {@code null} String returns {@code null}.
     * A negative size is treated as zero.</p>
     *
     * <pre>
     * StringUtils.center(null, *, *)     = null
     * StringUtils.center("", 4, ' ')     = "    "
     * StringUtils.center("ab", -1, ' ')  = "ab"
     * StringUtils.center("ab", 4, ' ')   = " ab "
     * StringUtils.center("abcd", 2, ' ') = "abcd"
     * StringUtils.center("a", 4, ' ')    = " a  "
     * StringUtils.center("a", 4, 'y')    = "yayy"
     * </pre>
     *
     * @param str  the String to center, may be null
     * @param size  the int size of new String, negative treated as zero
     * @param padChar  the character to pad the new String with
     * @return centered String, {@code null} if null String input
     * @since 2.0
     */
    public static String center(String str, final int size, final char padChar) {
        if (str == null || size <= 0) {
            return str;
        }
        final int strLen = str.length();
        final int pads = size - strLen;
        if (pads <= 0) {
            return str;
        }
        str = leftPad(str, strLen + pads / 2, padChar);
        str = rightPad(str, size, padChar);
        return str;
    }

(5) For deprecated methods, always state why they are deprecated and give a replacement, as in java.security.Signature#setParameter(java.lang.String, java.lang.Object):

    /**
     * (partially omitted)
     * 
     * @see #getParameter
     *
     * @deprecated Use
     * {@link #setParameter(java.security.spec.AlgorithmParameterSpec)
     * setParameter}.
     */
    @Deprecated
    public final void setParameter(String param, Object value)
            throws InvalidParameterException {
        engineSetParameter(param, value);
    }
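The JDK snippet above names the replacement but not the reason. A comment that covers both might look like the following sketch. TemperatureFormatter and its methods are entirely hypothetical and exist only to show the shape of such a Javadoc.

```java
import java.util.Locale;

final class TemperatureFormatter {

    /**
     * Formats a Celsius value with one decimal place.
     *
     * @deprecated The output depends on the JVM's default locale, so the decimal
     *             separator varies between machines. Use
     *             {@link #format(double, Locale)} and pass the locale explicitly.
     */
    @Deprecated
    static String format(double celsius) {
        return format(celsius, Locale.getDefault());
    }

    /** Formats a Celsius value with one decimal place using the given locale. */
    static String format(double celsius, Locale locale) {
        return String.format(locale, "%.1f C", celsius);
    }

    public static void main(String[] args) {
        System.out.println(format(21.5, Locale.ROOT));    // 21.5 C
        System.out.println(format(21.5, Locale.GERMANY)); // 21,5 C
    }
}
```

Readers of the deprecated overload learn both what went wrong (locale-dependent output) and where to go next, without having to dig through the commit history.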

4. Summary

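The code in many excellent open-source projects is designed very rigorously, and even simple code often embodies careful thought. When we have time, we can read some of these projects, starting with the simple parts. It pays to first think about how you would implement something yourself and then compare that with the author's approach. When reading source code, it is not enough to know what the code looks like. We should also understand why it was designed that way.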
