Class RawParseUtils
- java.lang.Object
-
- org.eclipse.jgit.util.RawParseUtils
-
public final class RawParseUtils extends Object
Handy utility functions to parse raw object contents.
-
-
Field Summary
Fields
Modifier and Type Field Description
static Charset UTF8_CHARSET
Deprecated. Use StandardCharsets.UTF_8 instead.
-
Method Summary
Modifier and Type Method Description
static int author(byte[] b, int ptr)
Locate the "author " header line data.
static int commitMessage(byte[] b, int ptr)
Locate the position of the commit message body.
static int committer(byte[] b, int ptr)
Locate the "committer " header line data.
static String decode(byte[] buffer)
Decode a buffer under UTF-8, if possible.
static String decode(byte[] buffer, int start, int end)
Decode a buffer under UTF-8, if possible.
static String decode(Charset cs, byte[] buffer)
Decode a buffer under the specified character set if possible.
static String decode(Charset cs, byte[] buffer, int start, int end)
Decode a region of the buffer under the specified character set if possible.
static String decodeNoFallback(Charset cs, byte[] buffer, int start, int end)
Decode a region of the buffer under the specified character set if possible.
static int encoding(byte[] b, int ptr)
Locate the "encoding " header line.
static int endOfFooterLineKey(byte[] raw, int ptr)
Locate the end of a footer line key string.
static int endOfParagraph(byte[] b, int start)
Locate the end of a paragraph.
static String extractBinaryString(byte[] buffer, int start, int end)
Decode a region of the buffer under the ISO-8859-1 encoding.
static int formatBase10(byte[] b, int o, int value)
Format a base 10 numeric into a temporary buffer.
static int headerEnd(byte[] b, int ptr)
Locate the end of the header.
static int headerStart(byte[] headerName, byte[] b, int ptr)
Find the start of the contents of a given header.
static int lastIndexOfTrim(byte[] raw, char ch, int pos)
Get last index of ch in raw, trimming spaces.
static IntList lineMap(byte[] buf, int ptr, int end)
Index the region between [ptr, end) to find line starts.
static IntList lineMapOrBinary(byte[] buf, int ptr, int end)
Like lineMap(byte[], int, int) but throws BinaryBlobException if a NUL byte is encountered.
static int match(byte[] b, int ptr, byte[] src)
Determine if b[ptr] matches src.
static int next(byte[] b, int ptr, char chrA)
Locate the first position after a given character.
static int nextLF(byte[] b, int ptr)
Locate the first position after the next LF.
static int nextLF(byte[] b, int ptr, char chrA)
Locate the first position after either the given character or LF.
static int parseBase10(byte[] b, int ptr, MutableInteger ptrResult)
Parse a base 10 numeric from a sequence of ASCII digits into an int.
static Charset parseEncoding(byte[] b)
Parse the "encoding " header into a character set reference.
static String parseEncodingName(byte[] b)
Parse the "encoding " header as a string.
static int parseHexInt16(byte[] bs, int p)
Parse 4 character base 16 (hex) formatted string to unsigned integer.
static int parseHexInt32(byte[] bs, int p)
Parse 8 character base 16 (hex) formatted string to unsigned integer.
static int parseHexInt4(byte digit)
Parse a single hex digit to its numeric value (0-15).
static long parseHexInt64(byte[] bs, int p)
Parse 16 character base 16 (hex) formatted string to unsigned long.
static long parseLongBase10(byte[] b, int ptr, MutableInteger ptrResult)
Parse a base 10 numeric from a sequence of ASCII digits into a long.
static PersonIdent parsePersonIdent(byte[] raw, int nameB)
Parse a name line (e.g. author, committer, tagger) into a PersonIdent.
static PersonIdent parsePersonIdent(String in)
Parse a name string (e.g. author, committer, tagger) into a PersonIdent.
static PersonIdent parsePersonIdentOnly(byte[] raw, int nameB)
Parse a name data (e.g. as within a reflog) into a PersonIdent.
static int parseTimeZoneOffset(byte[] b, int ptr)
Parse a Git style timezone string.
static int parseTimeZoneOffset(byte[] b, int ptr, MutableInteger ptrResult)
Parse a Git style timezone string.
static int prev(byte[] b, int ptr, char chrA)
Locate the first position before a given character.
static int prevLF(byte[] b, int ptr)
Locate the first position before the previous LF.
static int prevLF(byte[] b, int ptr, char chrA)
Locate the previous position before either the given character or LF.
static int tagger(byte[] b, int ptr)
Locate the "tagger " header line data.
static int tagMessage(byte[] b, int ptr)
Locate the position of the tag message body.
-
-
-
Field Detail
-
UTF8_CHARSET
@Deprecated public static final Charset UTF8_CHARSET
Deprecated. Use StandardCharsets.UTF_8 instead.
UTF-8 charset constant.
- Since:
- 2.2
-
-
Method Detail
-
match
public static final int match(byte[] b, int ptr, byte[] src)
Determine if b[ptr] matches src.
- Parameters:
b
- the buffer to scan.
ptr
- first position within b, this should match src[0].
src
- the buffer to test for equality with b.
- Returns:
- ptr + src.length if the src.length bytes starting at b[ptr] equal src; else -1.
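The return convention can be illustrated with a minimal, self-contained sketch of a prefix comparison that follows the same contract. This is not JGit's implementation; the class name MatchSketch and the explicit bounds check are assumptions made for the example.

```java
// Hypothetical stand-in that mirrors the documented contract of match().
final class MatchSketch {
    /** Returns ptr + src.length when b contains src starting at ptr, otherwise -1. */
    static int match(byte[] b, int ptr, byte[] src) {
        if (ptr < 0 || ptr + src.length > b.length) {
            return -1; // not enough bytes left in b to hold src
        }
        for (int i = 0; i < src.length; i++) {
            if (b[ptr + i] != src[i]) {
                return -1; // first mismatch ends the comparison
            }
        }
        return ptr + src.length; // position just past the matched region
    }
}
```

Returning the position after the match lets a caller continue parsing from the returned value without recomputing the offset.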
-
formatBase10
public static int formatBase10(byte[] b, int o, int value)
Format a base 10 numeric into a temporary buffer.
Formatting is performed backwards. The method starts at offset o-1 and ends at o-1-digits, where digits is the number of positions necessary to store the base 10 value.
The argument and return values from this method make it easy to chain writing, for example:
    final byte[] tmp = new byte[64];
    int ptr = tmp.length;
    tmp[--ptr] = '\n';
    ptr = RawParseUtils.formatBase10(tmp, ptr, 32);
    tmp[--ptr] = ' ';
    ptr = RawParseUtils.formatBase10(tmp, ptr, 18);
    tmp[--ptr] = 0;
    final String str = new String(tmp, ptr, tmp.length - ptr);
- Parameters:
b
- buffer to write into.
o
- one offset past the location where writing will begin; writing proceeds towards lower index values.
value
- the value to store.
- Returns:
- the new offset value o. This is the position of the last byte written. Additional writing should start at one position earlier.
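To show why the returned offset chains naturally, here is a rough sketch of backwards base 10 formatting written without any JGit dependency. The class name FormatBase10Sketch is hypothetical, and the handling of zero and negative values is an assumption about reasonable behavior rather than a statement of what the library does.

```java
// Hypothetical sketch of backwards decimal formatting; not the JGit source.
final class FormatBase10Sketch {
    /** Writes value ending at b[o - 1] and returns the index of the first byte written. */
    static int formatBase10(byte[] b, int o, int value) {
        if (value == 0) {
            b[--o] = '0';
            return o;
        }
        boolean negative = value < 0;
        long v = Math.abs((long) value); // widen first so Integer.MIN_VALUE does not overflow
        while (v != 0) {
            b[--o] = (byte) ('0' + (int) (v % 10)); // least significant digit goes last in the buffer
            v /= 10;
        }
        if (negative) {
            b[--o] = '-';
        }
        return o;
    }
}
```

Because digits are produced least significant first, writing from the end of the buffer towards the front avoids a separate reversal step.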
-
parseBase10
public static final int parseBase10(byte[] b, int ptr, MutableInteger ptrResult)
Parse a base 10 numeric from a sequence of ASCII digits into an int.
Digit sequences can begin with an optional run of spaces before the sequence, and may start with a '+' or a '-' to indicate sign position. Any other characters will cause the method to stop and return the current result to the caller.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start parsing digits at.
ptrResult
- optional location to return the new ptr value through. If null the ptr value will be discarded.
- Returns:
- the value at this location; 0 if the location is not a valid numeric.
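The scanning rules described above (leading spaces, an optional sign, then digits until the first other character) can be sketched in plain Java. This is an illustration only: the class ParseBase10Sketch is hypothetical, and it returns the new position in a two-element array instead of a MutableInteger so the example stays self-contained.

```java
// Hypothetical sketch of the documented scanning rules; not the JGit source.
final class ParseBase10Sketch {
    /** Returns {value, newPtr}; value is 0 when no digits are found. */
    static int[] parseBase10(byte[] b, int ptr) {
        final int sz = b.length;
        while (ptr < sz && b[ptr] == ' ') {
            ptr++; // skip the optional run of leading spaces
        }
        boolean negative = false;
        if (ptr < sz && (b[ptr] == '-' || b[ptr] == '+')) {
            negative = b[ptr] == '-';
            ptr++;
        }
        int r = 0;
        while (ptr < sz && b[ptr] >= '0' && b[ptr] <= '9') {
            r = r * 10 + (b[ptr] - '0');
            ptr++;
        }
        return new int[] { negative ? -r : r, ptr }; // stops at the first non-digit
    }
}
```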
-
parseLongBase10
public static final long parseLongBase10(byte[] b, int ptr, MutableInteger ptrResult)
Parse a base 10 numeric from a sequence of ASCII digits into a long.
Digit sequences can begin with an optional run of spaces before the sequence, and may start with a '+' or a '-' to indicate sign position. Any other characters will cause the method to stop and return the current result to the caller.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start parsing digits at.
ptrResult
- optional location to return the new ptr value through. If null the ptr value will be discarded.
- Returns:
- the value at this location; 0 if the location is not a valid numeric.
-
parseHexInt16
public static final int parseHexInt16(byte[] bs, int p)
Parse 4 character base 16 (hex) formatted string to unsigned integer.
The number is read in network byte order, that is, most significant nybble first.
- Parameters:
bs
- buffer to parse digits from; positions [p, p+4) will be parsed.
p
- first position within the buffer to parse.
- Returns:
- the integer value.
- Throws:
ArrayIndexOutOfBoundsException
- if the string is not hex formatted.
-
parseHexInt32
public static final int parseHexInt32(byte[] bs, int p)
Parse 8 character base 16 (hex) formatted string to unsigned integer.
The number is read in network byte order, that is, most significant nybble first.
- Parameters:
bs
- buffer to parse digits from; positions [p, p+8) will be parsed.
p
- first position within the buffer to parse.
- Returns:
- the integer value.
- Throws:
ArrayIndexOutOfBoundsException
- if the string is not hex formatted.
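Reading "most significant nybble first" amounts to shifting the accumulated value left by four bits before adding each digit. The following sketch shows that loop for eight digits. The class HexSketch is hypothetical, and unlike the documented method it signals bad input with IllegalArgumentException, which is a choice made for this example.

```java
// Hypothetical sketch of fixed-width hex parsing; not the JGit source.
final class HexSketch {
    static int hexDigit(byte d) {
        if (d >= '0' && d <= '9') return d - '0';
        if (d >= 'a' && d <= 'f') return d - 'a' + 10;
        if (d >= 'A' && d <= 'F') return d - 'A' + 10;
        throw new IllegalArgumentException("not a hex digit: " + d); // assumption: differs from the documented exception
    }

    /** Parses bs[p, p+8) as an unsigned 32 bit value stored in an int. */
    static int parseHexInt32(byte[] bs, int p) {
        int r = 0;
        for (int i = 0; i < 8; i++) {
            r = (r << 4) | hexDigit(bs[p + i]); // earlier digits end up in the higher bits
        }
        return r;
    }
}
```

Since Java has no unsigned int, values above 0x7fffffff come back negative; Integer.toUnsignedLong can recover the unsigned magnitude when needed.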
-
parseHexInt64
public static final long parseHexInt64(byte[] bs, int p)
Parse 16 character base 16 (hex) formatted string to unsigned long.
The number is read in network byte order, that is, most significant nibble first.
- Parameters:
bs
- buffer to parse digits from; positions [p, p+16) will be parsed.
p
- first position within the buffer to parse.
- Returns:
- the integer value.
- Throws:
ArrayIndexOutOfBoundsException
- if the string is not hex formatted.
- Since:
- 4.3
-
parseHexInt4
public static final int parseHexInt4(byte digit)
Parse a single hex digit to its numeric value (0-15).
- Parameters:
digit
- hex character to parse.
- Returns:
- numeric value, in the range 0-15.
- Throws:
ArrayIndexOutOfBoundsException
- if the input digit is not a valid hex digit.
-
parseTimeZoneOffset
public static final int parseTimeZoneOffset(byte[] b, int ptr)
Parse a Git style timezone string.
The sequence "-0315" will be parsed as the numeric value -195, as the lower two positions count minutes, not 100ths of an hour.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start parsing digits at.
- Returns:
- the timezone at this location, expressed in minutes.
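The arithmetic behind the "-0315" example is worth spelling out: the value is split into an hours part and a minutes part before being converted. A small sketch, assuming the zone is already available as a String (the class TimeZoneSketch and its method are hypothetical and operate on strings, not on a byte buffer):

```java
// Hypothetical sketch of the +hhmm to minutes conversion; not the JGit source.
final class TimeZoneSketch {
    /** Converts a Git style zone such as "-0315" into an offset in minutes. */
    static int toMinutes(String tz) {
        int hhmm = Integer.parseInt(tz);   // "-0315" parses as -315
        int sign = hhmm < 0 ? -1 : 1;
        int abs = Math.abs(hhmm);
        int hours = abs / 100;             // upper two digits
        int minutes = abs % 100;           // lower two digits are minutes, not hundredths
        return sign * (hours * 60 + minutes);
    }
}
```

For "-0315" this gives -(3 * 60 + 15) = -195, matching the value quoted in the description.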
-
parseTimeZoneOffset
public static final int parseTimeZoneOffset(byte[] b, int ptr, MutableInteger ptrResult)
Parse a Git style timezone string.
The sequence "-0315" will be parsed as the numeric value -195, as the lower two positions count minutes, not 100ths of an hour.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start parsing digits at.
ptrResult
- optional location to return the new ptr value through. If null the ptr value will be discarded.
- Returns:
- the timezone at this location, expressed in minutes.
- Since:
- 4.1
-
next
public static final int next(byte[] b, int ptr, char chrA)
Locate the first position after a given character.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start looking for chrA at.
chrA
- character to find.
- Returns:
- new position just after chrA.
-
nextLF
public static final int nextLF(byte[] b, int ptr)
Locate the first position after the next LF.
This method stops on the first '\n' it finds.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start looking for LF at.
- Returns:
- new position just after the first LF found.
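A forward scan of this kind is typically used to step through a buffer one line at a time. The sketch below shows the idea in plain Java; the class NextLfSketch is hypothetical, and returning b.length when no LF remains is an assumption made so that the loop in countLines terminates cleanly.

```java
// Hypothetical sketch of an LF scan and a line-walking loop; not the JGit source.
final class NextLfSketch {
    /** Returns the index just after the next '\n', or b.length if none is found (assumed). */
    static int nextLF(byte[] b, int ptr) {
        while (ptr < b.length) {
            if (b[ptr++] == '\n') {
                return ptr;
            }
        }
        return ptr;
    }

    /** Counts lines by repeatedly jumping to the start of the following line. */
    static int countLines(byte[] b) {
        int lines = 0;
        for (int ptr = 0; ptr < b.length; ptr = nextLF(b, ptr)) {
            lines++;
        }
        return lines;
    }
}
```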
-
nextLF
public static final int nextLF(byte[] b, int ptr, char chrA)
Locate the first position after either the given character or LF.
This method stops on the first match it finds from either chrA or '\n'.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start looking for chrA or LF at.
chrA
- character to find.
- Returns:
- new position just after the first chrA or LF to be found.
-
headerEnd
public static final int headerEnd(byte[] b, int ptr)
Locate the end of the header. Note that headers may be more than one line long.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start looking for the end-of-header.
- Returns:
- new position just after the header. This is either b.length, or the index of the header's terminating newline.
- Since:
- 5.1
-
headerStart
public static final int headerStart(byte[] headerName, byte[] b, int ptr)
Find the start of the contents of a given header.
- Parameters:
b
- buffer to scan.
headerName
- header to search for.
ptr
- position within buffer to start looking for header at.
- Returns:
- new position at the start of the header's contents, -1 for not found
- Since:
- 5.1
-
prev
public static final int prev(byte[] b, int ptr, char chrA)
Locate the first position before a given character.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start looking for chrA at.
chrA
- character to find.
- Returns:
- new position just before chrA, -1 for not found
-
prevLF
public static final int prevLF(byte[] b, int ptr)
Locate the first position before the previous LF.
This method stops on the first '\n' it finds.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start looking for LF at.
- Returns:
- new position just before the first LF found, -1 for not found
-
prevLF
public static final int prevLF(byte[] b, int ptr, char chrA)
Locate the previous position before either the given character or LF.
This method stops on the first match it finds from either chrA or '\n'.
- Parameters:
b
- buffer to scan.
ptr
- position within buffer to start looking for chrA or LF at.
chrA
- character to find.
- Returns:
- new position just before the first chrA or LF to be found, -1 for not found
-
lineMap
public static final IntList lineMap(byte[] buf, int ptr, int end)
Index the region between [ptr, end) to find line starts.
The returned list is 1 indexed. Index 0 contains Integer.MIN_VALUE to pad the list out.
Using a 1 indexed list means that line numbers can be directly accessed from the list, so list.get(1) (aka get line 1) returns ptr.
The last element (index map.size()-1) always contains end.
- Parameters:
buf
- buffer to scan.
ptr
- position within the buffer corresponding to the first byte of line 1.
end
- 1 past the end of the content within buf.
- Returns:
- a line map indicating the starting position of each line.
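The layout of the returned map (a padding entry at index 0, one entry per line start, and end as the final entry) is easier to see with a concrete buffer. The sketch below rebuilds that layout with a java.util.List in place of IntList; the class LineMapSketch is hypothetical and only follows the description given here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a 1-indexed line map; not the JGit source.
final class LineMapSketch {
    static List<Integer> lineMap(byte[] buf, int ptr, int end) {
        List<Integer> map = new ArrayList<>();
        map.add(Integer.MIN_VALUE); // index 0 is padding so that line N lives at index N
        while (ptr < end) {
            map.add(ptr); // start of the current line
            while (ptr < end && buf[ptr] != '\n') {
                ptr++;
            }
            if (ptr < end) {
                ptr++; // step past the LF onto the next line start
            }
        }
        map.add(end); // final entry marks the end of the content
        return map;
    }
}
```

For the five bytes "a\nbc\n" the result is [Integer.MIN_VALUE, 0, 2, 5], so the length of line N can be computed as map.get(N + 1) - map.get(N).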
-
lineMapOrBinary
public static final IntList lineMapOrBinary(byte[] buf, int ptr, int end) throws BinaryBlobException
Like lineMap(byte[], int, int) but throws BinaryBlobException if a NUL byte is encountered.
- Parameters:
buf
- buffer to scan.
ptr
- position within the buffer corresponding to the first byte of line 1.
end
- 1 past the end of the content within buf.
- Returns:
- a line map indicating the starting position of each line.
- Throws:
BinaryBlobException
- if a NUL byte or a lone CR is found.
- Since:
- 5.0
-
author
public static final int author(byte[] b, int ptr)
Locate the "author " header line data.
- Parameters:
b
- buffer to scan.
ptr
- position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at message body.
- Returns:
- position just after the space in "author ", so the first character of the author's name. If no author header can be located -1 is returned.
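The warning about not looking at the message body can be made concrete with a simplified header scan that stops at the first blank line. This is a sketch under assumptions: the class AuthorHeaderSketch is hypothetical, it treats every header as a single line, and it does not reproduce whatever shortcuts the library takes for the tree and parent lines.

```java
// Hypothetical, simplified header scan; not the JGit source.
final class AuthorHeaderSketch {
    private static final byte[] KEY = {'a', 'u', 't', 'h', 'o', 'r', ' '};

    /** Returns the index of the first byte after "author ", or -1 if no such header exists. */
    static int author(byte[] b, int ptr) {
        while (ptr < b.length) {
            if (b[ptr] == '\n') {
                return -1; // blank line: the header block ended before an author was seen
            }
            if (startsWith(b, ptr, KEY)) {
                return ptr + KEY.length;
            }
            while (ptr < b.length && b[ptr] != '\n') {
                ptr++; // skip the rest of this header line
            }
            ptr++; // step past the LF
        }
        return -1;
    }

    private static boolean startsWith(byte[] b, int ptr, byte[] key) {
        if (ptr + key.length > b.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (b[ptr + i] != key[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Starting the scan at 0 guarantees the blank-line check runs before any body text is reached, which is why a line in the message that happens to begin with "author " is never mistaken for the header.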
-
committer
public static final int committer(byte[] b, int ptr)
Locate the "committer " header line data.
- Parameters:
b
- buffer to scan.
ptr
- position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer and does not accidentally look at message body.
- Returns:
- position just after the space in "committer ", so the first character of the committer's name. If no committer header can be located -1 is returned.
-
tagger
public static final int tagger(byte[] b, int ptr)
Locate the "tagger " header line data.
- Parameters:
b
- buffer to scan.
ptr
- position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer and does not accidentally look at message body.
- Returns:
- position just after the space in "tagger ", so the first character of the tagger's name. If no tagger header can be located -1 is returned.
-
encoding
public static final int encoding(byte[] b, int ptr)
Locate the "encoding " header line.
- Parameters:
b
- buffer to scan.
ptr
- position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the buffer and does not accidentally look at the message body.
- Returns:
- position just after the space in "encoding ", so the first character of the encoding's name. If no encoding header can be located -1 is returned (and UTF-8 should be assumed).
-
parseEncodingName
@Nullable public static String parseEncodingName(byte[] b)
Parse the "encoding " header as a string.
Locates the "encoding " header (if present) and returns its value.
- Parameters:
b
- buffer to scan.
- Returns:
- the encoding header as specified in the commit; null if the header was not present, in which case the default (UTF-8) should be assumed.
- Since:
- 4.2
-
parseEncoding
public static Charset parseEncoding(byte[] b)
Parse the "encoding " header into a character set reference.
Locates the "encoding " header (if present) by first calling encoding(byte[], int) and then returns the proper character set to apply to this buffer to evaluate its contents as character data.
If no encoding header is present UTF-8 is assumed.
- Parameters:
b
- buffer to scan.
- Returns:
- the Java character set representation. Never null.
- Throws:
IllegalCharsetNameException
- if the character set requested by the encoding header is malformed and unsupportable.
UnsupportedCharsetException
- if the JRE does not support the character set requested by the encoding header.
-
parsePersonIdent
public static PersonIdent parsePersonIdent(String in)
Parse a name string (e.g. author, committer, tagger) into a PersonIdent.
Leading spaces won't be trimmed from the string, i.e. will show up in the parsed name afterwards.
- Parameters:
in
- the string to parse a name from.
- Returns:
- the parsed identity or null in case the identity could not be parsed.
-
parsePersonIdent
public static PersonIdent parsePersonIdent(byte[] raw, int nameB)
Parse a name line (e.g. author, committer, tagger) into a PersonIdent.
When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.
- Parameters:
raw
- the buffer to parse character data from.
nameB
- first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.
- Returns:
- the parsed identity or null in case the identity could not be parsed.
-
parsePersonIdentOnly
public static PersonIdent parsePersonIdentOnly(byte[] raw, int nameB)
Parse a name data (e.g. as within a reflog) into a PersonIdent.
When passing in a value for nameB callers should use the return value of author(byte[], int) or committer(byte[], int), as these methods provide the proper position within the buffer.
- Parameters:
raw
- the buffer to parse character data from.
nameB
- first position of the identity information. This should be the first position after the space which delimits the header field name (e.g. "author" or "committer") from the rest of the identity line.
- Returns:
- the parsed identity. Never null.
-
endOfFooterLineKey
public static int endOfFooterLineKey(byte[] raw, int ptr)
Locate the end of a footer line key string.
If the region at raw[ptr] matches ^[A-Za-z0-9-]+: (e.g. "Signed-off-by: A. U. Thor\n") then this method returns the position of the first ':'.
If the region at raw[ptr] does not match ^[A-Za-z0-9-]+: then this method returns -1.
- Parameters:
raw
- buffer to scan.
ptr
- first position within raw to consider as a footer line key.
- Returns:
- position of the ':' which terminates the footer line key if this is otherwise a valid footer line key; otherwise -1.
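The pattern ^[A-Za-z0-9-]+: can be checked with a simple byte loop instead of a regular expression. The following sketch does that; the class FooterKeySketch is hypothetical and is written only from the pattern stated above.

```java
// Hypothetical sketch of a footer key scan for ^[A-Za-z0-9-]+: ; not the JGit source.
final class FooterKeySketch {
    /** Returns the index of the ':' ending the key that starts at ptr, or -1. */
    static int endOfFooterLineKey(byte[] raw, int ptr) {
        final int start = ptr;
        while (ptr < raw.length) {
            byte c = raw[ptr];
            if (c == ':') {
                return ptr > start ? ptr : -1; // '+' in the pattern needs at least one key character
            }
            boolean keyChar = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-';
            if (!keyChar) {
                return -1; // e.g. a space before any ':' means this is not a footer key
            }
            ptr++;
        }
        return -1; // ran out of input without finding ':'
    }
}
```

For "Signed-off-by: A. U. Thor\n" scanned from position 0 the result is 13, the index of the colon.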
-
decode
public static String decode(byte[] buffer)
Decode a buffer under UTF-8, if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
- Parameters:
buffer
- buffer to pull raw bytes from.
- Returns:
- a string representation of the entire buffer, after decoding it through the first of these character sets that succeeds.
-
decode
public static String decode(byte[] buffer, int start, int end)
Decode a buffer under UTF-8, if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
- Parameters:
buffer
- buffer to pull raw bytes from.
start
- start position in buffer.
end
- one position past the last location within the buffer to take data from.
- Returns:
- a string representation of the range [start,end), after decoding the region through the specified character set.
-
decode
public static String decode(Charset cs, byte[] buffer)
Decode a buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
- Parameters:
cs
- character set to use when decoding the buffer.
buffer
- buffer to pull raw bytes from.
- Returns:
- a string representation of the entire buffer, after decoding it through the specified character set.
-
decode
public static String decode(Charset cs, byte[] buffer, int start, int end)
Decode a region of the buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, the fail-safe ISO-8859-1 encoding is tried.
- Parameters:
cs
- character set to use when decoding the buffer.
buffer
- buffer to pull raw bytes from.
start
- first position within the buffer to take data from.
end
- one position past the last location within the buffer to take data from.
- Returns:
- a string representation of the range [start,end), after decoding the region through the specified character set.
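The fallback chain described for the decode methods (requested character set, then platform default, then ISO-8859-1) can be approximated with the standard java.nio.charset API. The sketch below is an assumption about how such a chain might be written, not the library's code; the class DecodeSketch and its method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a decode-with-fallback chain; not the JGit source.
final class DecodeSketch {
    /** Decodes strictly: malformed or unmappable input raises an exception instead of being replaced. */
    static String strict(Charset cs, byte[] buf, int start, int end) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(buf, start, end - start))
                .toString();
    }

    /** Tries cs, then the platform default, then falls back to ISO-8859-1. */
    static String decode(Charset cs, byte[] buf, int start, int end) {
        for (Charset candidate : new Charset[] { cs, Charset.defaultCharset() }) {
            try {
                return strict(candidate, buf, start, end);
            } catch (CharacterCodingException e) {
                // this character set rejected the bytes; try the next candidate
            }
        }
        // ISO-8859-1 maps every byte to a character, so this step cannot fail
        return new String(buf, start, end - start, StandardCharsets.ISO_8859_1);
    }
}
```

The strict decoder settings matter: the String constructors silently substitute replacement characters for bad input, which would prevent the chain from ever reaching its later steps.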
-
decodeNoFallback
public static String decodeNoFallback(Charset cs, byte[] buffer, int start, int end) throws CharacterCodingException
Decode a region of the buffer under the specified character set if possible. If the byte stream cannot be decoded that way, the platform default is tried and if that too fails, an exception is thrown.
- Parameters:
cs
- character set to use when decoding the buffer.
buffer
- buffer to pull raw bytes from.
start
- first position within the buffer to take data from.
end
- one position past the last location within the buffer to take data from.
- Returns:
- a string representation of the range [start,end), after decoding the region through the specified character set.
- Throws:
CharacterCodingException
- the input is not in any of the tested character sets.
-
extractBinaryString
public static String extractBinaryString(byte[] buffer, int start, int end)
Decode a region of the buffer under the ISO-8859-1 encoding. Each byte is treated as a single character in the 8859-1 character encoding, performing a raw binary->char conversion.
- Parameters:
buffer
- buffer to pull raw bytes from.
start
- first position within the buffer to take data from.
end
- one position past the last location within the buffer to take data from.
- Returns:
- a string representation of the range [start,end).
-
commitMessage
public static final int commitMessage(byte[] b, int ptr)
Locate the position of the commit message body.
- Parameters:
b
- buffer to scan.
ptr
- position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the commit buffer.
- Returns:
- position of the user's message buffer.
-
tagMessage
public static final int tagMessage(byte[] b, int ptr)
Locate the position of the tag message body.
- Parameters:
b
- buffer to scan.
ptr
- position in buffer to start the scan at. Most callers should pass 0 to ensure the scan starts from the beginning of the tag buffer.
- Returns:
- position of the user's message buffer.
-
endOfParagraph
public static final int endOfParagraph(byte[] b, int start)
Locate the end of a paragraph.
A paragraph is ended by two consecutive LF bytes or CRLF pairs.
- Parameters:
b
- buffer to scan.
start
- position in buffer to start the scan at. Most callers will want to pass the first position of the commit message (as found by commitMessage(byte[], int)).
- Returns:
- position of the LF at the end of the paragraph; b.length if no paragraph end could be located.
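A reduced version of the paragraph rule, covering only the LF case, can be sketched as a search for two adjacent LF bytes. The class ParagraphSketch is hypothetical; it deliberately ignores CRLF pairs, and its edge-case behavior (for example a buffer that ends in a single LF) may differ from the library.

```java
// Hypothetical LF-only sketch of paragraph end detection; not the JGit source.
final class ParagraphSketch {
    /** Returns the index of the LF that ends the paragraph, or b.length if none is found. */
    static int endOfParagraph(byte[] b, int start) {
        for (int i = start; i + 1 < b.length; i++) {
            if (b[i] == '\n' && b[i + 1] == '\n') {
                return i; // first LF of the blank-line separator
            }
        }
        return b.length; // the paragraph runs to the end of the buffer
    }
}
```

Applied to a commit message, the range from the message start to the returned index is the first paragraph, which is commonly treated as the subject.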
-
lastIndexOfTrim
public static int lastIndexOfTrim(byte[] raw, char ch, int pos)
Get last index of ch in raw, trimming spaces.
- Parameters:
raw
- buffer to scan.
ch
- character to find.
pos
- starting position.
- Returns:
- last index of ch in raw, trimming spaces.
- Since:
- 4.1
-
-