Class PackParser

    • Constructor Detail

      • PackParser

        protected PackParser​(ObjectDatabase odb,
                             InputStream src)
        Initialize a pack parser.
        Parameters:
        odb - database the parser will write its objects into.
        src - the stream the parser will read.
    • Method Detail

      • isAllowThin

        public boolean isAllowThin()
        Whether a thin pack (missing base objects) is permitted.
        Returns:
        true if a thin pack (missing base objects) is permitted.
      • setAllowThin

        public void setAllowThin​(boolean allow)
        Configure this index pack instance to allow a thin pack.

        Thin packs are sometimes used during network transfers to allow a delta to be sent without a base object. Such packs are not permitted on disk.

        Parameters:
        allow - true to enable a thin pack.
      • isCheckObjectCollisions

        protected boolean isCheckObjectCollisions()
        Whether received objects are verified to prevent collisions.
        Returns:
        true if received objects are verified to prevent collisions.
        Since:
        4.1
      • setCheckObjectCollisions

        protected void setCheckObjectCollisions​(boolean check)
        Enable checking for collisions with existing objects.

        By default PackParser looks for each received object in the repository. If the object already exists, the existing object is compared byte-for-byte with the newly received copy to ensure they are identical. The receive is aborted with an exception if any byte differs. This check prevents an attacker from substituting a replacement object into this repository in the event that a practical way of producing SHA-1 collisions is found.

        This check may be very costly to perform, and some repositories may have other ways to segregate newly received object data. The check is enabled by default, but can be explicitly disabled if the implementation can provide the same guarantee, or is willing to accept the risks associated with bypassing the check.

        Parameters:
        check - true to enable collision checking (strongly encouraged).
        Since:
        4.1
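
        The byte-for-byte comparison described above can be pictured with the following minimal sketch. It uses only the Java standard library; the class CollisionCheckSketch, its map-backed store, and the exception type are hypothetical illustrations and are not part of JGit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an object store; not part of JGit. */
final class CollisionCheckSketch {
    // Maps an object name (for example a hex SHA-1) to the stored content.
    private final Map<String, byte[]> existing = new HashMap<>();

    /** Seeds the store with an object that is already present. */
    void put(String id, byte[] data) {
        existing.put(id, data.clone());
    }

    /**
     * Accepts a received object. If an object with the same name already
     * exists, both copies must be identical byte for byte.
     */
    void receive(String id, byte[] received) {
        byte[] stored = existing.get(id);
        if (stored == null) {
            existing.put(id, received.clone());
            return;
        }
        if (!Arrays.equals(stored, received)) {
            // Assumption: a real parser would abort the receive here.
            throw new IllegalStateException("collision detected for " + id);
        }
    }
}
```

        The comparison cost grows with the size of every object that already exists locally, which is why the text above calls the check potentially very costly.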
      • setNeedNewObjectIds

        public void setNeedNewObjectIds​(boolean b)
        Configure this index pack instance to keep track of new objects.

        By default the parser does not keep track of the new objects it creates while parsing. Setting this flag to true allows the caller to use getNewObjectIds() to retrieve that list.

        Parameters:
        b - true to enable keeping track of new objects.
      • setNeedBaseObjectIds

        public void setNeedBaseObjectIds​(boolean b)
        Configure this index pack instance to keep track of the objects assumed for delta bases.

        By default an index pack doesn't save the objects that were used as delta bases. Setting this flag to true will allow the caller to use getBaseObjectIds() to retrieve that list.

        Parameters:
        b - true to enable keeping track of delta bases.
      • isCheckEofAfterPackFooter

        public boolean isCheckEofAfterPackFooter()
        Whether the EOF should be read from the input after the footer.
        Returns:
        true if the EOF should be read from the input after the footer.
      • setCheckEofAfterPackFooter

        public void setCheckEofAfterPackFooter​(boolean b)
        Ensure EOF is read from the input stream after the footer.
        Parameters:
        b - true if the EOF should be read; false if it is not checked.
      • isExpectDataAfterPackFooter

        public boolean isExpectDataAfterPackFooter()
        Whether there is data expected after the pack footer.
        Returns:
        true if there is data expected after the pack footer.
      • setExpectDataAfterPackFooter

        public void setExpectDataAfterPackFooter​(boolean e)
        Set if there is additional data in InputStream after pack.
        Parameters:
        e - true if there is additional data in InputStream after pack. This requires the InputStream to support the mark and reset functions.
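
        Why mark and reset matter here can be shown with a small standard-library sketch: a reader that buffers past the end of the pack has to rewind so the trailing data stays available to the caller. The class MarkResetSketch and its parameters are hypothetical and do not reflect how PackParser is implemented.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical illustration of over-reading and rewinding; not JGit code. */
final class MarkResetSketch {
    /**
     * Reads up to bufSize bytes in one pass (possibly past the pack), then
     * rewinds so that exactly packLen bytes end up consumed from the stream.
     */
    static byte[] readPack(InputStream in, int packLen, int bufSize) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(bufSize);
            byte[] buf = new byte[bufSize];
            int n = 0;
            while (n < bufSize) {
                int r = in.read(buf, n, bufSize - n);
                if (r < 0) {
                    break; // end of stream
                }
                n += r;
            }
            if (n < packLen) {
                throw new IllegalArgumentException("stream shorter than pack");
            }
            in.reset(); // back to the marked position
            long remaining = packLen;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    throw new IllegalStateException("unable to skip pack bytes");
                }
                remaining -= skipped;
            }
            return Arrays.copyOf(buf, packLen);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        After readPack returns, the next read on the stream yields the first byte that follows the pack.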
      • getNewObjectIds

        public ObjectIdSubclassMap<ObjectId> getNewObjectIds()
        Get the new objects that were sent by the user.
        Returns:
        the new objects that were sent by the user.
      • getBaseObjectIds

        public ObjectIdSubclassMap<ObjectId> getBaseObjectIds()
        Get the set of objects the incoming pack assumed for delta purposes.
        Returns:
        the set of objects the incoming pack assumed for delta purposes.
      • setObjectChecker

        public void setObjectChecker​(ObjectChecker oc)
        Configure the checker used to validate received objects.

        Usually object checking isn't necessary, as Git implementations only create valid objects in pack files. However, additional checking may be useful if processing data from an untrusted source.

        Parameters:
        oc - the checker instance; null to disable object checking.
      • setObjectChecking

        public void setObjectChecking​(boolean on)
        Configure the checker used to validate received objects.

        Usually object checking isn't necessary, as Git implementations only create valid objects in pack files. However, additional checking may be useful if processing data from an untrusted source.

        This is shorthand for:

         setObjectChecker(on ? new ObjectChecker() : null);
         
        Parameters:
        on - true to enable the default checker; false to disable it.
      • getLockMessage

        public String getLockMessage()
        Get the message to record with the pack lock.
        Returns:
        the message to record with the pack lock.
      • setLockMessage

        public void setLockMessage​(String msg)
        Set the lock message for the incoming pack data.
        Parameters:
        msg - if not null, the message to associate with the incoming data while it is locked to prevent garbage collection.
      • setMaxObjectSizeLimit

        public void setMaxObjectSizeLimit​(long limit)
        Set the maximum allowed Git object size.

        If an object is larger than the given size the pack-parsing will throw an exception aborting the parsing.

        Parameters:
        limit - the Git object size limit. If zero then there is no limit.
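
        The limit semantics can be sketched as follows, assuming a limit of zero disables the check as stated above. The class SizeLimitSketch and the exception type are hypothetical; the exception that PackParser itself throws is not specified here.

```java
/** Hypothetical illustration of the size-limit rule; not JGit code. */
final class SizeLimitSketch {
    /**
     * Throws if size exceeds limit. A limit of zero means no limit.
     */
    static void checkSize(long size, long limit) {
        if (limit > 0 && size > limit) {
            throw new IllegalArgumentException(
                    "object too large: " + size + " > " + limit);
        }
    }
}
```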
      • getObjectCount

        public int getObjectCount()
        Get the number of objects in the stream.

        The object count is only available after parse(ProgressMonitor) has returned. The count may have been increased if the stream was a thin pack, and missing base objects were appended onto it by the subclass.

        Returns:
        number of objects parsed out of the stream.
      • getObject

        public PackedObjectInfo getObject​(int nth)
        Get the information about the requested object.

        The object information is only available after parse(ProgressMonitor) has returned.

        Parameters:
        nth - index of the object in the stream. Must be between 0 and getObjectCount()-1.
        Returns:
        the object information.
      • getSortedObjectList

        public List<PackedObjectInfo> getSortedObjectList​(Comparator<PackedObjectInfo> cmp)
        Get all of the objects, sorted by their name.

        The object information is only available after parse(ProgressMonitor) has returned.

        To maintain lower memory usage and good runtime performance, this method sorts the objects in-place and therefore impacts the ordering presented by getObject(int).

        Parameters:
        cmp - comparison function; if null, objects are sorted by ObjectId.
        Returns:
        sorted list of objects in this pack stream.
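
        The side effect of in-place sorting on getObject(int) can be illustrated with a small standard-library sketch. The class InPlaceSortSketch is hypothetical and uses strings in place of PackedObjectInfo; a null comparator here falls back to natural ordering, which stands in for ordering by ObjectId.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of a parser's object table; not JGit code. */
final class InPlaceSortSketch {
    private final String[] entries;

    InPlaceSortSketch(String... ids) {
        entries = ids.clone();
    }

    /** Returns the nth entry in the table's current order. */
    String get(int nth) {
        return entries[nth];
    }

    /**
     * Sorts the backing array itself (no copy) and returns a list view of it.
     * A null comparator selects the elements' natural ordering.
     */
    List<String> sorted(Comparator<String> cmp) {
        Arrays.sort(entries, cmp);
        return Arrays.asList(entries);
    }
}
```

        Callers that need the original stream order should therefore read it through get(int) before requesting the sorted list.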
      • getPackSize

        public long getPackSize()
        Get the size of the newly created pack.

        This will also include the pack index size if an index was created. This method should only be called after pack parsing is finished.

        Returns:
        the pack size (including the index size) or -1 if the size cannot be determined
        Since:
        3.3
      • getReceivedPackStatistics

        public ReceivedPackStatistics getReceivedPackStatistics()
        Returns the statistics of the parsed pack.

        This should only be called after pack parsing is finished.

        Returns:
        the statistics of the parsed pack.
        Since:
        4.6
      • parse

        public PackLock parse​(ProgressMonitor receiving,
                              ProgressMonitor resolving)
                       throws IOException
        Parse the pack stream.
        Parameters:
        receiving - receives progress feedback during the initial receiving objects phase. If null, NullProgressMonitor will be used.
        resolving - receives progress feedback during the resolving objects phase.
        Returns:
        the pack lock, if one was requested by setting setLockMessage(String).
        Throws:
        IOException - the stream is malformed, or contains corrupt objects.
        Since:
        6.0
      • readObjectHeader

        protected PackParser.ObjectTypeAndSize readObjectHeader​(PackParser.ObjectTypeAndSize info)
                                                         throws IOException
        Read the header of the current object.

        After the header has been parsed, this method automatically invokes onObjectHeader(Source, byte[], int, int) to allow the implementation to update its internal checksums for the bytes read.

        When this method returns, the database will be positioned on the first byte of the deflated data stream.

        Parameters:
        info - the info object to populate.
        Returns:
        info, after populating.
        Throws:
        IOException - the size cannot be read.
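
        For orientation, the following sketch decodes an object header as laid out in the Git pack format: the first byte carries the type in bits 4 to 6 and the low 4 bits of the size, and each continuation byte (signalled by the high bit) contributes 7 more size bits. The class ObjectHeaderSketch is hypothetical and is not the code PackParser uses.

```java
/** Hypothetical decoder for a pack object header; not JGit code. */
final class ObjectHeaderSketch {
    final int type;          // 3-bit type code from the first byte
    final long size;         // inflated size of the object
    final int headerLength;  // number of header bytes consumed

    private ObjectHeaderSketch(int type, long size, int headerLength) {
        this.type = type;
        this.size = size;
        this.headerLength = headerLength;
    }

    /**
     * Decodes the header starting at raw[pos]. A truncated header causes an
     * ArrayIndexOutOfBoundsException in this sketch.
     */
    static ObjectHeaderSketch decode(byte[] raw, int pos) {
        int p = pos;
        int c = raw[p++] & 0xff;
        int type = (c >> 4) & 7;
        long size = c & 0x0f;
        int shift = 4;
        while ((c & 0x80) != 0) {
            c = raw[p++] & 0xff;
            size |= ((long) (c & 0x7f)) << shift;
            shift += 7;
        }
        return new ObjectHeaderSketch(type, size, p - pos);
    }
}
```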
      • verifySafeObject

        protected void verifySafeObject​(AnyObjectId id,
                                        int type,
                                        byte[] data)
                                 throws CorruptObjectException
        Verify the integrity of the object.
        Parameters:
        id - identity of the object to be checked.
        type - the type of the object.
        data - raw content of the object.
        Throws:
        CorruptObjectException
        Since:
        4.9
      • buffer

        protected byte[] buffer()
        Get a temporary byte array for use by the caller.
        Returns:
        a temporary byte array for use by the caller.
      • newInfo

        protected PackedObjectInfo newInfo​(AnyObjectId id,
                                           PackParser.UnresolvedDelta delta,
                                           ObjectId deltaBase)
        Construct a PackedObjectInfo instance for this parser.
        Parameters:
        id - identity of the object to be tracked.
        delta - if the object was previously an unresolved delta, this is the delta object that was tracking it. Otherwise null.
        deltaBase - if the object was previously an unresolved delta, this is the ObjectId of the base of the delta. The base may be outside of the pack stream if the stream was a thin-pack.
        Returns:
        info object containing this object's data.
      • setExpectedObjectCount

        protected void setExpectedObjectCount​(long expectedObjectCount)
        Set the expected number of objects in the pack stream.

        The object count in the pack header is not always correct for some Dfs pack files. For example, an INSERT pack always assumes 1 object in the header, since the actual object count is unknown when the pack is written.

        If an external implementation wants to override the expectedObjectCount, it should call this method during onPackHeader(long).

        Parameters:
        expectedObjectCount - the expected number of objects in the pack stream.
        Since:
        4.9
      • onStoreStream

        protected abstract void onStoreStream​(byte[] raw,
                                              int pos,
                                              int len)
                                       throws IOException
        Store bytes received from the raw stream.

        This method is invoked during parse(ProgressMonitor) as data is consumed from the incoming stream. Implementors may use this event to archive the raw incoming stream to the destination repository in large chunks, without paying attention to object boundaries.

        The only component of the pack not supplied to this method is the last 20 bytes of the pack that comprise the trailing SHA-1 checksum. Those are passed to onPackFooter(byte[]).

        Parameters:
        raw - buffer to copy data out of.
        pos - first offset within the buffer that is valid.
        len - number of bytes in the buffer that are valid.
        Throws:
        IOException - the stream cannot be archived.
      • onObjectHeader

        protected abstract void onObjectHeader​(PackParser.Source src,
                                               byte[] raw,
                                               int pos,
                                               int len)
                                        throws IOException
        Store (and/or checksum) an object header.

        Invoked after any of the onBegin() events. The entire header is supplied in a single invocation, before any object data is supplied.

        Parameters:
        src - where the data came from
        raw - buffer to read data from.
        pos - first offset within buffer that is valid.
        len - number of bytes in buffer that are valid.
        Throws:
        IOException - the stream cannot be archived.
      • onObjectData

        protected abstract void onObjectData​(PackParser.Source src,
                                             byte[] raw,
                                             int pos,
                                             int len)
                                      throws IOException
        Store (and/or checksum) a portion of an object's data.

        This method may be invoked multiple times per object, depending on the size of the object, the size of the parser's internal read buffer, and the alignment of the object relative to the read buffer.

        Invoked after onObjectHeader(Source, byte[], int, int).

        Parameters:
        src - where the data came from
        raw - buffer to read data from.
        pos - first offset within buffer that is valid.
        len - number of bytes in buffer that are valid.
        Throws:
        IOException - the stream cannot be archived.
      • onInflatedObjectData

        protected abstract void onInflatedObjectData​(PackedObjectInfo obj,
                                                     int typeCode,
                                                     byte[] data)
                                              throws IOException
        Invoked for commits, trees, tags, and small blobs.
        Parameters:
        obj - the object info, populated.
        typeCode - the type of the object.
        data - inflated data for the object.
        Throws:
        IOException - the object cannot be archived.
      • onPackHeader

        protected abstract void onPackHeader​(long objCnt)
                                      throws IOException
        Provide the implementation with the original stream's pack header.
        Parameters:
        objCnt - number of objects expected in the stream.
        Throws:
        IOException - the implementation refuses to work with this many objects.
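
        As background for objCnt, the sketch below reads the object count from a 12-byte pack header as defined by the Git pack format: the signature PACK, a 4-byte big-endian version, and a 4-byte big-endian object count. The class PackHeaderSketch is hypothetical; treating only versions 2 and 3 as supported is an assumption based on the pack format documentation.

```java
import java.nio.ByteBuffer;

/** Hypothetical reader for the 12-byte pack header; not JGit code. */
final class PackHeaderSketch {
    /** Returns the object count as an unsigned 32-bit value held in a long. */
    static long readObjectCount(byte[] hdr) {
        if (hdr.length < 12
                || hdr[0] != 'P' || hdr[1] != 'A'
                || hdr[2] != 'C' || hdr[3] != 'K') {
            throw new IllegalArgumentException("not a pack header");
        }
        ByteBuffer buf = ByteBuffer.wrap(hdr); // big-endian by default
        int version = buf.getInt(4);
        if (version != 2 && version != 3) {
            throw new IllegalArgumentException("unsupported version " + version);
        }
        return buf.getInt(8) & 0xffffffffL;
    }
}
```

        Returning a long keeps counts above Integer.MAX_VALUE representable, which matches the long parameter of onPackHeader.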
      • onPackFooter

        protected abstract void onPackFooter​(byte[] hash)
                                      throws IOException
        Provide the implementation with the original stream's pack footer.
        Parameters:
        hash - the trailing 20 bytes of the pack, which are a SHA-1 checksum of all of the preceding pack data.
        Throws:
        IOException - the stream cannot be archived.
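
        One way to picture the relationship between the pack data and its footer is the following standard-library sketch, which recomputes SHA-1 over everything except the last 20 bytes and compares it with the trailer. The class PackFooterSketch and both of its methods are hypothetical helpers, not part of JGit.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical footer check for an in-memory pack; not JGit code. */
final class PackFooterSketch {
    private static MessageDigest sha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns body followed by the 20-byte SHA-1 of body. */
    static byte[] appendFooter(byte[] body) {
        byte[] hash = sha1().digest(body);
        byte[] out = Arrays.copyOf(body, body.length + hash.length);
        System.arraycopy(hash, 0, out, body.length, hash.length);
        return out;
    }

    /** True if the trailing 20 bytes equal the SHA-1 of all preceding bytes. */
    static boolean footerMatches(byte[] pack) {
        if (pack.length < 20) {
            return false;
        }
        int bodyLen = pack.length - 20;
        MessageDigest md = sha1();
        md.update(pack, 0, bodyLen);
        byte[] trailer = Arrays.copyOfRange(pack, bodyLen, pack.length);
        return MessageDigest.isEqual(md.digest(), trailer);
    }
}
```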
      • onAppendBase

        protected abstract boolean onAppendBase​(int typeCode,
                                                byte[] data,
                                                PackedObjectInfo info)
                                         throws IOException
        Provide the implementation with a base that was outside of the pack.

        This event only occurs on a thin pack for base objects that were outside of the pack and came from the local repository. Usually an implementation uses this event to compress the base and append it onto the end of the pack, so the pack stays self-contained.

        Parameters:
        typeCode - type of the base object.
        data - complete content of the base object.
        info - packed object information for this base. Implementors must populate the CRC and offset members if returning true.
        Returns:
        true if the info should be included in the object list returned by getSortedObjectList(Comparator), false if it should not be included.
        Throws:
        IOException - the base could not be included into the pack.
      • onEndThinPack

        protected abstract void onEndThinPack()
                                       throws IOException
        Event indicating a thin pack has been completely processed.

        This event is invoked only if a thin pack has delta references to objects external from the pack. The event is called after all of those deltas have been resolved.

        Throws:
        IOException - the pack cannot be archived.
      • seekDatabase

        protected abstract PackParser.ObjectTypeAndSize seekDatabase​(PackParser.UnresolvedDelta delta,
                                                                     PackParser.ObjectTypeAndSize info)
                                                              throws IOException
        Reposition the database to re-read a previously stored object.

        If the database is computing CRC-32 checksums for object data, it should reset its internal CRC instance during this method call.

        Parameters:
        delta - the object position to begin reading from. This is an instance previously returned by onEndDelta().
        info - object to populate with type and size.
        Returns:
        the info object.
        Throws:
        IOException - the database cannot reposition to this location.
      • readDatabase

        protected abstract int readDatabase​(byte[] dst,
                                            int pos,
                                            int cnt)
                                     throws IOException
        Read from the database's current position into the buffer.
        Parameters:
        dst - the buffer to copy read data into.
        pos - position within dst to start copying data into.
        cnt - ideal target number of bytes to read. Actual read length may be shorter.
        Returns:
        number of bytes stored.
        Throws:
        IOException - the database cannot be accessed.
      • checkCRC

        protected abstract boolean checkCRC​(int oldCRC)
        Check the current CRC matches the expected value.

        This method is invoked when an object is read back in from the database and its data is used during delta resolution. The CRC is validated after the object has been fully read, allowing the parser to verify there was no silent data corruption.

        Implementations are free to ignore this check by always returning true if they are performing other data integrity validations at a lower level.

        Parameters:
        oldCRC - the prior CRC that was recorded during the first scan of the object from the pack stream.
        Returns:
        true if the CRC matches; false if it does not.
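
        A minimal sketch of this contract, assuming the implementation accumulates a java.util.zip.CRC32 over the bytes it reads back: the class CrcCheckSketch and its update and value methods are hypothetical, and only checkCRC mirrors the method documented above.

```java
import java.util.zip.CRC32;

/** Hypothetical CRC bookkeeping for re-read objects; not JGit code. */
final class CrcCheckSketch {
    private final CRC32 crc = new CRC32();

    /** Clears the running checksum, for example when repositioning. */
    void reset() {
        crc.reset();
    }

    /** Folds len bytes starting at raw[pos] into the running checksum. */
    void update(byte[] raw, int pos, int len) {
        crc.update(raw, pos, len);
    }

    /** Current checksum truncated to the 32 bits that CRC-32 produces. */
    int value() {
        return (int) crc.getValue();
    }

    /** True if the running checksum equals the one recorded earlier. */
    boolean checkCRC(int oldCRC) {
        return value() == oldCRC;
    }
}
```

        Because the checksum is accumulated incrementally, it does not matter whether the first scan and the re-read split the object into the same chunks.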
      • onBeginWholeObject

        protected abstract void onBeginWholeObject​(long streamPosition,
                                                   int type,
                                                   long inflatedSize)
                                            throws IOException
        Event notifying the start of an object stored whole (not as a delta).
        Parameters:
        streamPosition - position of this object in the incoming stream.
        type - type of the object; one of Constants.OBJ_COMMIT, Constants.OBJ_TREE, Constants.OBJ_BLOB, or Constants.OBJ_TAG.
        inflatedSize - size of the object when fully inflated. The size stored within the pack may be larger or smaller, and is not yet known.
        Throws:
        IOException - the object cannot be recorded.
      • onEndWholeObject

        protected abstract void onEndWholeObject​(PackedObjectInfo info)
                                          throws IOException
        Event notifying the end of the current whole object.
        Parameters:
        info - object information.
        Throws:
        IOException - the object cannot be recorded.
      • onBeginOfsDelta

        protected abstract void onBeginOfsDelta​(long deltaStreamPosition,
                                                long baseStreamPosition,
                                                long inflatedSize)
                                         throws IOException
        Event notifying start of a delta referencing its base by offset.
        Parameters:
        deltaStreamPosition - position of this object in the incoming stream.
        baseStreamPosition - position of the base object in the incoming stream. The base must be before the delta, therefore baseStreamPosition < deltaStreamPosition. This is not the position returned by a prior end object event.
        inflatedSize - size of the delta when fully inflated. The size stored within the pack may be larger or smaller, and is not yet known.
        Throws:
        IOException - the object cannot be recorded.
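
        To show where a base position smaller than the delta position comes from, the sketch below decodes the backward offset of an offset delta as described in the Git pack format and subtracts it from the delta's position. The class OfsDeltaSketch is hypothetical, and rejecting a zero or oversized offset is an assumption of this sketch.

```java
/** Hypothetical decoder for an offset-delta base position; not JGit code. */
final class OfsDeltaSketch {
    /**
     * Decodes the variable-length backward offset starting at raw[pos] and
     * returns deltaStreamPosition minus that offset.
     */
    static long basePosition(long deltaStreamPosition, byte[] raw, int pos) {
        int p = pos;
        int c = raw[p++] & 0xff;
        long ofs = c & 127;
        while ((c & 128) != 0) {
            // Each continuation byte adds 1 before shifting, so multi-byte
            // encodings never overlap with shorter ones.
            ofs += 1;
            c = raw[p++] & 0xff;
            ofs <<= 7;
            ofs += c & 127;
        }
        if (ofs <= 0 || ofs > deltaStreamPosition) {
            throw new IllegalArgumentException("invalid base offset " + ofs);
        }
        return deltaStreamPosition - ofs;
    }
}
```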
      • onBeginRefDelta

        protected abstract void onBeginRefDelta​(long deltaStreamPosition,
                                                AnyObjectId baseId,
                                                long inflatedSize)
                                         throws IOException
        Event notifying start of a delta referencing its base by ObjectId.
        Parameters:
        deltaStreamPosition - position of this object in the incoming stream.
        baseId - name of the base object. This object may be later in the stream, or might not appear at all in the stream (in the case of a thin-pack).
        inflatedSize - size of the delta when fully inflated. The size stored within the pack may be larger or smaller, and is not yet known.
        Throws:
        IOException - the object cannot be recorded.
      • onEndDelta

        protected PackParser.UnresolvedDelta onEndDelta()
                                                 throws IOException
        Event notifying the end of the current delta object.
        Returns:
        object information that must be populated with at least the offset.
        Throws:
        IOException - the object cannot be recorded.