Class RawParseUtils


  • public final class RawParseUtils
    extends Object
    Handy utility functions to parse raw object contents.
    • Method Summary

      Modifier and Type Method Description
      static int author​(byte[] b, int ptr)
      Locate the "author " header line data.
      static int commitMessage​(byte[] b, int ptr)
      Locate the position of the commit message body.
      static int committer​(byte[] b, int ptr)
      Locate the "committer " header line data.
      static String decode​(byte[] buffer)
      Decode a buffer under UTF-8, if possible.
      static String decode​(byte[] buffer, int start, int end)
      Decode a buffer under UTF-8, if possible.
      static String decode​(Charset cs, byte[] buffer)
      Decode a buffer under the specified character set if possible.
      static String decode​(Charset cs, byte[] buffer, int start, int end)
      Decode a region of the buffer under the specified character set if possible.
      static String decodeNoFallback​(Charset cs, byte[] buffer, int start, int end)
      Decode a region of the buffer under the specified character set if possible.
      static int encoding​(byte[] b, int ptr)
      Locate the "encoding " header line.
      static int endOfFooterLineKey​(byte[] raw, int ptr)
      Locate the end of a footer line key string.
      static int endOfParagraph​(byte[] b, int start)
      Locate the end of a paragraph.
      static String extractBinaryString​(byte[] buffer, int start, int end)
      Decode a region of the buffer under the ISO-8859-1 encoding.
      static int formatBase10​(byte[] b, int o, int value)
      Format a base 10 numeric into a temporary buffer.
      static int headerEnd​(byte[] b, int ptr)
      Locate the end of the header.
      static int headerStart​(byte[] headerName, byte[] b, int ptr)
      Find the start of the contents of a given header.
      static int lastIndexOfTrim​(byte[] raw, char ch, int pos)
      Get last index of ch in raw, trimming spaces.
      static IntList lineMap​(byte[] buf, int ptr, int end)
      Index the region between [ptr, end) to find line starts.
      static IntList lineMapOrBinary​(byte[] buf, int ptr, int end)
      Like lineMap(byte[], int, int) but throw BinaryBlobException if a NUL byte is encountered.
      static int match​(byte[] b, int ptr, byte[] src)
      Determine if b[ptr] matches src.
      static int next​(byte[] b, int ptr, char chrA)
      Locate the first position after a given character.
      static int nextLF​(byte[] b, int ptr)
      Locate the first position after the next LF.
      static int nextLF​(byte[] b, int ptr, char chrA)
      Locate the first position after either the given character or LF.
      static int parseBase10​(byte[] b, int ptr, MutableInteger ptrResult)
      Parse a base 10 numeric from a sequence of ASCII digits into an int.
      static Charset parseEncoding​(byte[] b)
      Parse the "encoding " header into a character set reference.
      static String parseEncodingName​(byte[] b)
      Parse the "encoding " header as a string.
      static int parseHexInt16​(byte[] bs, int p)
      Parse 4 character base 16 (hex) formatted string to unsigned integer.
      static int parseHexInt32​(byte[] bs, int p)
      Parse 8 character base 16 (hex) formatted string to unsigned integer.
      static int parseHexInt4​(byte digit)
      Parse a single hex digit to its numeric value (0-15).
      static long parseHexInt64​(byte[] bs, int p)
      Parse 16 character base 16 (hex) formatted string to unsigned long.
      static long parseLongBase10​(byte[] b, int ptr, MutableInteger ptrResult)
      Parse a base 10 numeric from a sequence of ASCII digits into a long.
      static PersonIdent parsePersonIdent​(byte[] raw, int nameB)
      Parse a name line (e.g. author, committer, tagger) into a PersonIdent.
      static PersonIdent parsePersonIdent​(String in)
      Parse a name string (e.g. author, committer, tagger) into a PersonIdent.
      static PersonIdent parsePersonIdentOnly​(byte[] raw, int nameB)
      Parse name data (e.g. as within a reflog) into a PersonIdent.
      static int parseTimeZoneOffset​(byte[] b, int ptr)
      Parse a Git style timezone string.
      static int parseTimeZoneOffset​(byte[] b, int ptr, MutableInteger ptrResult)
      Parse a Git style timezone string.
      static int prev​(byte[] b, int ptr, char chrA)
      Locate the first position before a given character.
      static int prevLF​(byte[] b, int ptr)
      Locate the first position before the previous LF.
      static int prevLF​(byte[] b, int ptr, char chrA)
      Locate the previous position before either the given character or LF.
      static int tagger​(byte[] b, int ptr)
      Locate the "tagger " header line data.
      static int tagMessage​(byte[] b, int ptr)
      Locate the position of the tag message body.
    • Method Detail

      • match

        public static final int match​(byte[] b,
                                      int ptr,
                                      byte[] src)
        Determine if b[ptr] matches src.
        Parameters:
        b - the buffer to scan.
        ptr - first position within b; this should match src[0].
        src - the buffer to test for equality with b.
        Returns:
        ptr + src.length if b[ptr..ptr + src.length) == src; else -1.
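
        The contract above can be illustrated with a minimal sketch. It is a plain Java re-implementation of the documented behavior, not JGit's source code; the class name MatchSketch is hypothetical, and treating a src that would run past the end of b as a mismatch is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of the documented contract; not JGit's source code. */
final class MatchSketch {
    /** Returns ptr + src.length if b holds src starting at ptr; otherwise -1. */
    static int match(byte[] b, int ptr, byte[] src) {
        // Assumption: a src that would run past the end of b counts as a mismatch.
        if (ptr < 0 || ptr + src.length > b.length) {
            return -1;
        }
        for (int i = 0; i < src.length; i++) {
            if (b[ptr + i] != src[i]) {
                return -1;
            }
        }
        return ptr + src.length;
    }

    public static void main(String[] args) {
        byte[] buf = "tree abc\nauthor A. U. Thor\n".getBytes(StandardCharsets.US_ASCII);
        byte[] key = "author ".getBytes(StandardCharsets.US_ASCII);
        // On a match the result points at the first byte after the key.
        System.out.println(match(buf, 9, key));
        // On a mismatch the result is -1.
        System.out.println(match(buf, 0, key));
    }
}
```

        Because a successful match returns the position just past src, the result can be fed directly into the next scanning call.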
      • formatBase10

        public static int formatBase10​(byte[] b,
                                       int o,
                                       int value)
        Format a base 10 numeric into a temporary buffer.

        Formatting is performed backwards. The method starts at offset o-1 and ends at o-1-digits, where digits is the number of positions necessary to store the base 10 value.

        The argument and return values from this method make it easy to chain writing, for example:

         final byte[] tmp = new byte[64];
         int ptr = tmp.length;
         tmp[--ptr] = '\n';
         ptr = RawParseUtils.formatBase10(tmp, ptr, 32);
         tmp[--ptr] = ' ';
         ptr = RawParseUtils.formatBase10(tmp, ptr, 18);
         tmp[--ptr] = 0;
         final String str = new String(tmp, ptr, tmp.length - ptr);
         
        Parameters:
        b - buffer to write into.
        o - one offset past the location where writing will begin; writing proceeds towards lower index values.
        value - the value to store.
        Returns:
        the new offset value o. This is the position of the last byte written. Additional writing should start at one position earlier.
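
        To make the backwards formatting concrete, the following is a minimal sketch of how such a method could work. It is not JGit's implementation; the class name FormatBase10Sketch is hypothetical, and the handling of negative values is an assumption that the description above does not spell out.

```java
/** Illustrative sketch of backwards base 10 formatting; not JGit's source code. */
final class FormatBase10Sketch {
    /** Writes value so that its last digit lands at b[o - 1]; returns the new offset. */
    static int formatBase10(byte[] b, int o, int value) {
        if (value == 0) {
            b[--o] = '0';
            return o;
        }
        final boolean negative = value < 0;
        // Widen before taking the absolute value so Integer.MIN_VALUE is safe.
        long v = Math.abs((long) value);
        while (v != 0) {
            // Least significant digit first, moving towards lower indexes.
            b[--o] = (byte) ('0' + v % 10);
            v /= 10;
        }
        if (negative) {
            // Assumption: a leading '-' is written for negative values.
            b[--o] = '-';
        }
        return o;
    }
}
```

        The returned offset is the index of the first byte written, so the formatted text occupies the range from the returned offset up to the original o.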
      • parseBase10

        public static final int parseBase10​(byte[] b,
                                            int ptr,
                                            MutableInteger ptrResult)
        Parse a base 10 numeric from a sequence of ASCII digits into an int.

        A digit sequence may be preceded by an optional run of spaces and may start with a '+' or a '-' to indicate the sign. Any other character causes the method to stop and return the current result to the caller.

        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start parsing digits at.
        ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.
        Returns:
        the value at this location; 0 if the location is not a valid numeric.
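
        The parsing rules above can be sketched in plain Java as follows. This is an illustration only, not JGit's implementation: the class name Base10Sketch is hypothetical, a one-element int[] stands in for MutableInteger, and overflow is not checked.

```java
/** Illustrative sketch of the documented parsing rules; not JGit's source code. */
final class Base10Sketch {
    /**
     * Parses optional spaces, an optional sign and a run of ASCII digits.
     * ptrResult is a hypothetical stand-in for MutableInteger; it may be null.
     */
    static int parseBase10(byte[] b, int ptr, int[] ptrResult) {
        final int end = b.length;
        int result = 0;
        boolean negative = false;

        // Skip the optional run of leading spaces.
        while (ptr < end && b[ptr] == ' ') {
            ptr++;
        }
        // Consume an optional sign.
        if (ptr < end && (b[ptr] == '-' || b[ptr] == '+')) {
            negative = b[ptr] == '-';
            ptr++;
        }
        // Accumulate digits; any other byte stops the scan.
        while (ptr < end && b[ptr] >= '0' && b[ptr] <= '9') {
            result = result * 10 + (b[ptr] - '0');
            ptr++;
        }
        if (ptrResult != null) {
            ptrResult[0] = ptr;
        }
        return negative ? -result : result;
    }
}
```

        Returning the stop position through the optional out-parameter lets a caller continue scanning right after the number, for example to read a timezone that follows a timestamp.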
      • parseLongBase10

        public static final long parseLongBase10​(byte[] b,
                                                 int ptr,
                                                 MutableInteger ptrResult)
        Parse a base 10 numeric from a sequence of ASCII digits into a long.

        A digit sequence may be preceded by an optional run of spaces and may start with a '+' or a '-' to indicate the sign. Any other character causes the method to stop and return the current result to the caller.

        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start parsing digits at.
        ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.
        Returns:
        the value at this location; 0 if the location is not a valid numeric.
      • parseHexInt16

        public static final int parseHexInt16​(byte[] bs,
                                              int p)
        Parse 4 character base 16 (hex) formatted string to unsigned integer.

        The number is read in network byte order, that is, most significant nibble first.

        Parameters:
        bs - buffer to parse digits from; positions [p, p+4) will be parsed.
        p - first position within the buffer to parse.
        Returns:
        the integer value.
        Throws:
        ArrayIndexOutOfBoundsException - if the string is not hex formatted.
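
        As an illustration of reading the most significant nibble first, here is a minimal sketch in plain Java. It is not JGit's implementation; the class name HexSketch and the helper hexDigit are hypothetical, and the exception is thrown explicitly here to mirror the documented behavior.

```java
/** Illustrative sketch of 4 digit hex parsing; not JGit's source code. */
final class HexSketch {
    /** Converts one ASCII hex digit to its value in the range 0-15. */
    static int hexDigit(byte d) {
        if (d >= '0' && d <= '9') {
            return d - '0';
        }
        if (d >= 'a' && d <= 'f') {
            return d - 'a' + 10;
        }
        if (d >= 'A' && d <= 'F') {
            return d - 'A' + 10;
        }
        // Mirrors the documented exception type for non-hex input.
        throw new ArrayIndexOutOfBoundsException("not a hex digit: " + (char) d);
    }

    /** Parses positions [p, p+4), most significant nibble first. */
    static int parseHexInt16(byte[] bs, int p) {
        int r = 0;
        for (int i = 0; i < 4; i++) {
            // Shift the earlier nibbles up and append the next one.
            r = (r << 4) | hexDigit(bs[p + i]);
        }
        return r;
    }
}
```

        The 32 and 64 bit variants follow the same pattern with 8 and 16 digits respectively.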
      • parseHexInt32

        public static final int parseHexInt32​(byte[] bs,
                                              int p)
        Parse 8 character base 16 (hex) formatted string to unsigned integer.

        The number is read in network byte order, that is, most significant nibble first.

        Parameters:
        bs - buffer to parse digits from; positions [p, p+8) will be parsed.
        p - first position within the buffer to parse.
        Returns:
        the integer value.
        Throws:
        ArrayIndexOutOfBoundsException - if the string is not hex formatted.
      • parseHexInt64

        public static final long parseHexInt64​(byte[] bs,
                                               int p)
        Parse 16 character base 16 (hex) formatted string to unsigned long.

        The number is read in network byte order, that is, most significant nibble first.

        Parameters:
        bs - buffer to parse digits from; positions [p, p+16) will be parsed.
        p - first position within the buffer to parse.
        Returns:
        the long value.
        Throws:
        ArrayIndexOutOfBoundsException - if the string is not hex formatted.
        Since:
        4.3
      • parseHexInt4

        public static final int parseHexInt4​(byte digit)
        Parse a single hex digit to its numeric value (0-15).
        Parameters:
        digit - hex character to parse.
        Returns:
        numeric value, in the range 0-15.
        Throws:
        ArrayIndexOutOfBoundsException - if the input digit is not a valid hex digit.
      • parseTimeZoneOffset

        public static final int parseTimeZoneOffset​(byte[] b,
                                                    int ptr)
        Parse a Git style timezone string.

        The sequence "-0315" will be parsed as the numeric value -195, as the lower two positions count minutes, not 100ths of an hour.

        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start parsing digits at.
        Returns:
        the timezone at this location, expressed in minutes.
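
        The arithmetic behind the "-0315" example can be worked through with a short sketch. It assumes the digits have already been read as a signed decimal number (so "-0315" becomes -315); the class and method names are hypothetical and this is not JGit's source code.

```java
/** Illustrative sketch of the hours/minutes arithmetic; not JGit's source code. */
final class TimeZoneSketch {
    /**
     * Converts a signed hhmm value (for example -315 for "-0315") to minutes.
     * Java integer division truncates towards zero and the remainder keeps the
     * sign of the dividend, so negative offsets need no special handling.
     */
    static int toMinutes(int hhmm) {
        final int hours = hhmm / 100;   // -315 / 100 == -3
        final int minutes = hhmm % 100; // -315 % 100 == -15
        return hours * 60 + minutes;    // -180 + -15 == -195
    }

    public static void main(String[] args) {
        System.out.println(toMinutes(-315)); // prints -195
        System.out.println(toMinutes(530));  // prints 330
    }
}
```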
      • parseTimeZoneOffset

        public static final int parseTimeZoneOffset​(byte[] b,
                                                    int ptr,
                                                    MutableInteger ptrResult)
        Parse a Git style timezone string.

        The sequence "-0315" will be parsed as the numeric value -195, as the lower two positions count minutes, not 100ths of an hour.

        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start parsing digits at.
        ptrResult - optional location to return the new ptr value through. If null the ptr value will be discarded.
        Returns:
        the timezone at this location, expressed in minutes.
        Since:
        4.1
      • next

        public static final int next​(byte[] b,
                                     int ptr,
                                     char chrA)
        Locate the first position after a given character.
        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start looking for chrA at.
        chrA - character to find.
        Returns:
        new position just after chrA.
      • nextLF

        public static final int nextLF​(byte[] b,
                                       int ptr)
        Locate the first position after the next LF.

        This method stops on the first '\n' it finds.

        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start looking for LF at.
        Returns:
        new position just after the first LF found.
      • nextLF

        public static final int nextLF​(byte[] b,
                                       int ptr,
                                       char chrA)
        Locate the first position after either the given character or LF.

        This method stops on the first match it finds from either chrA or '\n'.

        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start looking for chrA or LF at.
        chrA - character to find.
        Returns:
        new position just after the first chrA or LF to be found.
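
        The "stop on whichever comes first" behavior can be sketched as below. This is a plain Java illustration, not JGit's implementation; the class name NextSketch is hypothetical, and returning b.length when neither byte is found is an assumption that the description above does not state.

```java
/** Illustrative sketch of scanning for a character or LF; not JGit's source code. */
final class NextSketch {
    /** Returns the position just after the first chrA or '\n' at or after ptr. */
    static int nextLF(byte[] b, int ptr, char chrA) {
        while (ptr < b.length) {
            final byte c = b[ptr++];
            if (c == chrA || c == '\n') {
                return ptr;
            }
        }
        // Assumption: with no match the scan ends at the end of the buffer.
        return ptr;
    }
}
```

        A caller scanning an identity line for '<' can compare the byte just before the returned position against '<' to tell whether the character or the end of the line was reached first.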
      • headerEnd

        public static final int headerEnd​(byte[] b,
                                          int ptr)
        Locate the end of the header. Note that headers may be more than one line long.
        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start looking for the end-of-header.
        Returns:
        new position just after the header. This is either b.length, or the index of the header's terminating newline.
        Since:
        5.1
      • headerStart

        public static final int headerStart​(byte[] headerName,
                                            byte[] b,
                                            int ptr)
        Find the start of the contents of a given header.
        Parameters:
        b - buffer to scan.
        headerName - header to search for
        ptr - position within buffer to start looking for header at.
        Returns:
        new position at the start of the header's contents, -1 for not found
        Since:
        5.1
      • prev

        public static final int prev​(byte[] b,
                                     int ptr,
                                     char chrA)
        Locate the first position before a given character.
        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start looking for chrA at.
        chrA - character to find.
        Returns:
        new position just before chrA, -1 for not found
      • prevLF

        public static final int prevLF​(byte[] b,
                                       int ptr)
        Locate the first position before the previous LF.

        This method stops on the first '\n' it finds.

        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start looking for LF at.
        Returns:
        new position just before the first LF found, -1 for not found
      • prevLF

        public static final int prevLF​(byte[] b,
                                       int ptr,
                                       char chrA)
        Locate the previous position before either the given character or LF.

        This method stops on the first match it finds from either chrA or '\n'.

        Parameters:
        b - buffer to scan.
        ptr - position within buffer to start looking for chrA or LF at.
        chrA - character to find.
        Returns:
        new position just before the first chrA or LF to be found, -1 for not found
      • lineMap

        public static final IntList lineMap​(byte[] buf,
                                            int ptr,
                                            int end)
        Index the region between [ptr, end) to find line starts.

        The returned list is 1 indexed. Index 0 contains Integer.MIN_VALUE to pad the list out.

        Using a 1 indexed list means that line numbers can be directly accessed from the list, so list.get(1) (aka get line 1) returns ptr.

        The last element (index map.size()-1) always contains end.

        Parameters:
        buf - buffer to scan.
        ptr - position within the buffer corresponding to the first byte of line 1.
        end - 1 past the end of the content within buf.
        Returns:
        a line map indicating the starting position of each line.
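
        The layout of the returned list can be illustrated with a minimal sketch. It uses a java.util.List in place of JGit's IntList and is not JGit's implementation; the class name LineMapSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of a 1 indexed line map; not JGit's source code. */
final class LineMapSketch {
    static List<Integer> lineMap(byte[] buf, int ptr, int end) {
        final List<Integer> map = new ArrayList<>();
        // Index 0 is padding so that index N is the start of line N.
        map.add(Integer.MIN_VALUE);
        while (ptr < end) {
            map.add(ptr);
            // Advance to the position just after the next LF, or to end.
            while (ptr < end && buf[ptr++] != '\n') {
                // scanning only
            }
        }
        // The last element marks the end of the indexed content.
        map.add(end);
        return map;
    }
}
```

        With this layout, line N occupies the byte range from map.get(N) up to, but not including, map.get(N + 1), which also holds for the final line because the last element is end.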
      • lineMapOrBinary

        public static final IntList lineMapOrBinary​(byte[] buf,
                                                    int ptr,
                                                    int end)
                                             throws BinaryBlobException
        Like lineMap(byte[], int, int) but throw BinaryBlobException if a NUL byte is encountered.
        Parameters:
        buf - buffer to scan.
        ptr - position within the buffer corresponding to the first byte of line 1.
        end - 1 past the end of the content within buf.
        Returns:
        a line map indicating the starting position of each line.
        Throws:
        BinaryBlobException - if a NUL byte or a lone CR is found.
        Since:
        5.0
      • author

        public static final int author​(byte[] b,
                                       int ptr)
        Locate the "author " header line data.
        Parameters:
        b - buffer to scan.
        ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at the message body.
        Returns:
        position just after the space in "author ", so the first character of the author's name. If no author header can be located -1 is returned.
      • committer

        public static final int committer​(byte[] b,
                                          int ptr)
        Locate the "committer " header line data.
        Parameters:
        b - buffer to scan.
        ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at the message body.
        Returns:
        position just after the space in "committer ", so the first character of the committer's name. If no committer header can be located -1 is returned.
      • tagger

        public static final int tagger​(byte[] b,
                                       int ptr)
        Locate the "tagger " header line data.
        Parameters:
        b - buffer to scan.
        ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer and does not accidentally look at the message body.
        Returns:
        position just after the space in "tagger ", so the first character of the tagger's name. If no tagger header can be located -1 is returned.
      • encoding

        public static final int encoding​(byte[] b,
                                         int ptr)
        Locate the "encoding " header line.
        Parameters:
        b - buffer to scan.
        ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the buffer and does not accidentally look at the message body.
        Returns:
        position just after the space in "encoding ", so the first character of the encoding's name. If no encoding header can be located -1 is returned (and UTF-8 should be assumed).
      • parseEncodingName

        @Nullable
        public static String parseEncodingName​(byte[] b)
        Parse the "encoding " header as a string.

        Locates the "encoding " header (if present) and returns its value.

        Parameters:
        b - buffer to scan.
        Returns:
        the encoding header as specified in the commit; null if the header was not present, in which case UTF-8 should be assumed.
        Since:
        4.2
      • parseEncoding

        public static Charset parseEncoding​(byte[] b)
        Parse the "encoding " header into a character set reference.

        Locates the "encoding " header (if present) by first calling encoding(byte[], int) and then returns the proper character set to apply to this buffer to evaluate its contents as character data.

        If no encoding header is present UTF-8 is assumed.

        Parameters:
        b - buffer to scan.
        Returns:
        the Java character set representation. Never null.
        Throws:
        IllegalCharsetNameException - if the character set requested by the encoding header is malformed and unsupportable.
        UnsupportedCharsetException - if the JRE does not support the character set requested by the encoding header.
      • parsePersonIdent

        public static PersonIdent parsePersonIdent​(String in)
        Parse a name string (e.g. author, committer, tagger) into a PersonIdent.

        Leading spaces are not trimmed from the string, so they will show up in the parsed name.

        Parameters:
        in - the string to parse a name from.
        Returns:
        the parsed identity or null in case the identity could not be parsed.
      • parsePersonIdent

        public static PersonIdent parsePersonIdent​(byte[] raw,
                                                   int nameB)
        Parse a name line (e.g. author, committer, tagger) into a PersonIdent.

        When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.

        Parameters:
        raw - the buffer to parse character data from.
        nameB - first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.
        Returns:
        the parsed identity or null in case the identity could not be parsed.
      • parsePersonIdentOnly

        public static PersonIdent parsePersonIdentOnly​(byte[] raw,
                                                       int nameB)
        Parse name data (e.g. as within a reflog) into a PersonIdent.

        When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.

        Parameters:
        raw - the buffer to parse character data from.
        nameB - first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.
        Returns:
        the parsed identity. Never null.
      • endOfFooterLineKey

        public static int endOfFooterLineKey​(byte[] raw,
                                             int ptr)
        Locate the end of a footer line key string.

        If the region at raw[ptr] matches ^[A-Za-z0-9-]+: (e.g. "Signed-off-by: A. U. Thor\n") then this method returns the position of the first ':'.

        If the region at raw[ptr] does not match ^[A-Za-z0-9-]+: then this method returns -1.

        Parameters:
        raw - buffer to scan.
        ptr - first position within raw to consider as a footer line key.
        Returns:
        position of the ':' which terminates the footer line key if this is otherwise a valid footer line key; otherwise -1.
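
        The pattern check above can be sketched without a regular expression. The code below follows the documented pattern ^[A-Za-z0-9-]+: literally, including the requirement of at least one key character; it is an illustration with a hypothetical class name, not JGit's implementation.

```java
/** Illustrative sketch of the footer key scan; not JGit's source code. */
final class FooterKeySketch {
    /** Returns the position of the ':' ending a footer key at ptr, or -1. */
    static int endOfFooterLineKey(byte[] raw, int ptr) {
        final int start = ptr;
        while (ptr < raw.length) {
            final byte c = raw[ptr];
            final boolean keyChar = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-';
            if (keyChar) {
                ptr++;
                continue;
            }
            // The first non-key byte must be ':' and the key must be non-empty.
            return (c == ':' && ptr > start) ? ptr : -1;
        }
        // Ran off the end of the buffer without seeing ':'.
        return -1;
    }
}
```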
      • decode

        public static String decode​(byte[] buffer)
        Decode a buffer under UTF-8, if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
        Parameters:
        buffer - buffer to pull raw bytes from.
        Returns:
        a string representation of the entire buffer, after decoding it as UTF-8 or, failing that, one of the fallback character sets.
      • decode

        public static String decode​(byte[] buffer,
                                    int start,
                                    int end)
        Decode a buffer under UTF-8, if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
        Parameters:
        buffer - buffer to pull raw bytes from.
        start - start position in buffer
        end - one position past the last location within the buffer to take data from.
        Returns:
        a string representation of the range [start,end), after decoding the region as UTF-8 or, failing that, one of the fallback character sets.
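
        The fallback chain described above (UTF-8, then the platform default, then ISO-8859-1) can be sketched with the standard java.nio.charset API. This is an illustration under those stated assumptions, not JGit's implementation; the class name DecodeSketch and the helper strict are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of a decode fallback chain; not JGit's source code. */
final class DecodeSketch {
    /** Decodes strictly, failing on malformed or unmappable input. */
    static String strict(Charset cs, byte[] buf, int start, int end)
            throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(buf, start, end - start))
                .toString();
    }

    /** Tries UTF-8, then the platform default, then ISO-8859-1. */
    static String decode(byte[] buf, int start, int end) {
        final Charset[] candidates = { StandardCharsets.UTF_8, Charset.defaultCharset() };
        for (Charset cs : candidates) {
            try {
                return strict(cs, buf, start, end);
            } catch (CharacterCodingException e) {
                // Fall through to the next candidate.
            }
        }
        // ISO-8859-1 maps every byte to a character, so this step cannot fail.
        return new String(buf, start, end - start, StandardCharsets.ISO_8859_1);
    }
}
```

        Strict decoding is what makes the fallback possible: with the default replacement behavior a decoder would silently substitute characters instead of reporting that the bytes are not valid in that character set.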
      • decode

        public static String decode​(Charset cs,
                                    byte[] buffer)
        Decode a buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
        Parameters:
        cs - character set to use when decoding the buffer.
        buffer - buffer to pull raw bytes from.
        Returns:
        a string representation of the entire buffer, after decoding it through the specified character set.
      • decode

        public static String decode​(Charset cs,
                                    byte[] buffer,
                                    int start,
                                    int end)
        Decode a region of the buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
        Parameters:
        cs - character set to use when decoding the buffer.
        buffer - buffer to pull raw bytes from.
        start - first position within the buffer to take data from.
        end - one position past the last location within the buffer to take data from.
        Returns:
        a string representation of the range [start,end), after decoding the region through the specified character set.
      • decodeNoFallback

        public static String decodeNoFallback​(Charset cs,
                                              byte[] buffer,
                                              int start,
                                              int end)
                                       throws CharacterCodingException
        Decode a region of the buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, an exception is thrown.
        Parameters:
        cs - character set to use when decoding the buffer.
        buffer - buffer to pull raw bytes from.
        start - first position within the buffer to take data from.
        end - one position past the last location within the buffer to take data from.
        Returns:
        a string representation of the range [start,end), after decoding the region through the specified character set.
        Throws:
        CharacterCodingException - the input is not in any of the tested character sets.
      • extractBinaryString

        public static String extractBinaryString​(byte[] buffer,
                                                 int start,
                                                 int end)
        Decode a region of the buffer under the ISO-8859-1 encoding. Each byte is treated as a single character in the 8859-1 character encoding, performing a raw binary->char conversion.
        Parameters:
        buffer - buffer to pull raw bytes from.
        start - first position within the buffer to take data from.
        end - one position past the last location within the buffer to take data from.
        Returns:
        a string representation of the range [start,end).
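
        The raw byte-to-char conversion can be shown in a few lines. The sketch below is an illustration with a hypothetical class name, not JGit's implementation; it relies on the fact that ISO-8859-1 maps each byte value 0-255 to the character with the same code point.

```java
/** Illustrative sketch of a raw binary to char conversion; not JGit's source code. */
final class BinaryStringSketch {
    static String extractBinaryString(byte[] buffer, int start, int end) {
        final StringBuilder r = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            // Mask to 0-255 so that bytes above 0x7f do not become negative chars.
            r.append((char) (buffer[i] & 0xff));
        }
        return r.toString();
    }
}
```

        Because the mapping is one-to-one, encoding the resulting string back with ISO-8859-1 reproduces the original bytes.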
      • commitMessage

        public static final int commitMessage​(byte[] b,
                                              int ptr)
        Locate the position of the commit message body.
        Parameters:
        b - buffer to scan.
        ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer.
        Returns:
        position of the user's message buffer.
      • tagMessage

        public static final int tagMessage​(byte[] b,
                                           int ptr)
        Locate the position of the tag message body.
        Parameters:
        b - buffer to scan.
        ptr - position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer.
        Returns:
        position of the user's message buffer.
      • endOfParagraph

        public static final int endOfParagraph​(byte[] b,
                                               int start)
        Locate the end of a paragraph.

        A paragraph is ended by two consecutive LF bytes or CRLF pairs.

        Parameters:
        b - buffer to scan.
        start - position in buffer to start the scan at. Most callers will want to pass the first position of the commit message (as found by commitMessage(byte[], int)).
        Returns:
        position of the LF at the end of the paragraph; b.length if no paragraph end could be located.
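
        For LF-terminated text, the documented return value can be illustrated with a simple scan for two consecutive LF bytes. This sketch deliberately ignores CRLF pairs and is not JGit's implementation; the class name ParagraphSketch is hypothetical.

```java
/** Illustrative LF-only sketch of finding a paragraph end; not JGit's source code. */
final class ParagraphSketch {
    /** Returns the position of the first LF of an LF LF pair, or b.length. */
    static int endOfParagraph(byte[] b, int start) {
        for (int i = start; i + 1 < b.length; i++) {
            if (b[i] == '\n' && b[i + 1] == '\n') {
                // b[i] is the LF that ends the last line of the paragraph.
                return i;
            }
        }
        // Assumption: CRLF pairs are not handled in this simplified version.
        return b.length;
    }
}
```

        Applied to a commit message, the bytes from start up to the returned position form the first paragraph, which by convention is the subject of the commit.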
      • lastIndexOfTrim

        public static int lastIndexOfTrim​(byte[] raw,
                                          char ch,
                                          int pos)
        Get last index of ch in raw, trimming spaces.
        Parameters:
        raw - buffer to scan.
        ch - character to find.
        pos - starting position.
        Returns:
        last index of ch in raw, trimming spaces.
        Since:
        4.1