
Creating a Contract Based on Previous Clauses and Rankings thereof - US (Tech Patents and Software Patents)

Patent no: 8,209,278
Issued: June 26, 2012
Inventor: Straus; Jay Bradley (New York, NY)
Attorney: Michael Feigin

Abstract

A computer system is disclosed herein that surveys numerous legal documents which memorialize business or legal transactions and then determines common provisions in such documents. The forms of these "core provisions" are then stored in a knowledge base of the system. The system then allows attorney users to apply the system to analyze unsurveyed documents and use the knowledge base to recognize the core provisions that are most similar to the text of these unsurveyed documents. The user can then edit these unsurveyed documents in a rapid accurate fashion by automated means to revise text to match these core provisions. Document editing functionality is also present, along with means to use correlations to determine the likelihood of the presence/absence of specific provisions and the presence/absence of various particular documents in groups of documents used to memorialize certain types of transactions.

Claims

I claim:

1. A computer system for processing user selected kind of documents, comprising: a memory; a processor coupled to the memory and operative to perform the operations of: conducting a survey of a quantity of documents of said kind to identify textual patterns present in such documents, which textual patterns are substantially similar to common textual provisions which frequently recur in said kind of said documents; where substantial similarity is determined by: (x) degree of similarity as a user of said kind of said documents, of ordinary skill, would recognize said patterns or provisions to be variants of one another as would appear in ordinary course use of said kind of said documents, or (y) numerical similarity through satisfaction of a numerical threshold present in said computer system such that a numerical comparison of strings of text is in excess of said numerical threshold; analyzing other existing documents or other existing sets of documents to determine the presence or absence of said identified textual patterns which were identified in said survey; receiving a subjective rank of favorability towards a party contractually agreeing to be bound by at least one block of text identified as being associated with a textual pattern, and a length of said at least one block of text for said at least one block of text; editing said other existing documents or other existing sets of documents for a user of said computer system to revise said other existing document or said other existing set of documents in a manner including exhibiting said rank of favorability towards said party to a said document and a length for said at least one block of text to include or exclude text of such identified textual patterns.

2. The computer system of claim 1 further performing an operation of exhibiting text of an existing document and receiving changes to said text.

3. The computer system of claim 1 further performing an operation of ordering identified textual patterns by length.

4. The computer system of claim 1 further performing an operation of presenting said blocks of text ordered by favorability.

5. The computer system of claim 4 further performing an operation of ordering said blocks of text by length.

6. The computer system of claim 1 further carrying out an operation of surveying said documents and calculating numerical correlations corresponding to the likelihood of presence or absence of said identified textual patterns.

7. The computer system of claim 1 further carrying out an operation of surveying groups of said documents of different classes, where such classes of documents are frequently concurrently present in user identified types of transactions, and for calculating numerical correlations corresponding to the likelihood of presence or absence of said concurrence.

8. The computer system of claim 1 further carrying out an operation of exhibiting a common editing platform such that distinct users of said computer system contemporaneously edit the same document or sets of documents, and one of said distinct users can view edits from another said distinct user.

9. The computer system of claim 1 further carrying out an operation of exchanging the roles of two parties to a document in a given provision by replacing the terminology which references the first party with the terminology that references the second party, and replacing the terminology that references the second party with the terminology that references the first party.

10. The computer system of claim 1 further carrying out an operation of internal cross referencing to provisions within said documents to be analyzed through replacement of said cross references with descriptive information regarding said provisions.

11. The computer system of claim 1 further carrying out estimating favorability of a provision by an averaging or other weight-based combination of favorability of constituent sentences of said provision.

12. The computer system of claim 1 further carrying out a step of populating a grid of length and favorability properties of said identified textual patterns by: (i) rounding favorabilities to a specific level of accuracy to determine specific discrete grid axis values; (ii) initial assignment of provisions to grid points based on length and favorability; (iii) assigning provisions to otherwise unassigned grid points in a specific row of said grid by repetitively replicating the assignment to other elements in a given row until another initially assigned grid point is encountered or a boundary value is reached; and (iv) assigning certain identified patterns to otherwise unassigned grid points in other rows by replicating entire row segments except for such grid values where an initial assignment has been established or a boundary value is reached.

13. The computer system of claim 1 further carrying out integrating said system with a document management system.

14. The computer system of claim 1, wherein said subjective rank of favorability is unrelated to a frequency of occurrence.

15. The system of claim 1, wherein said exhibiting of subjective favorability towards a party is modified based on ranks of favorability by multiple parties.

16. The system of claim 15, wherein subjective rankings of favorability are received by at least two attorneys at a law firm, and a ranking of one attorney is given more weight than a ranking of another attorney based on a position within a law firm of each said attorney providing a said ranking.

17. The system of claim 1, wherein said subjective ranking of favorability is obtained, at least in part, based on a prior subjective ranking of favorability of a said provision with a said similar textual provision.

18. The system of claim 17, wherein said subjective ranking of favorability is further obtained, at least in part, based on a dictionary lookup.

19. A method of drafting a document, comprising the steps of: conducting a survey of a quantity of documents of said kind to identify textual patterns present in such documents, which textual patterns are substantially similar to common textual provisions which frequently recur in said kind of said documents; where substantial similarity is determined by: (x) degree of similarity as a user of said kind of said documents, of ordinary skill, would recognize said patterns or provisions to be variants of one another as would appear in ordinary course use of said kind of said documents, or (y) numerical similarity through satisfaction of a numerical threshold present in said computer system such that a numerical comparison of strings of text is in excess of said numerical threshold; analyzing other existing documents or other existing sets of documents to determine the presence or absence of said identified textual patterns which were identified in said survey; receiving a subjective rank of favorability towards a party contractually agreeing to be bound by at least one block of text identified as being associated with a textual pattern, and a length of said at least one block of text for said at least one block of text; editing said other existing documents or other existing sets of documents for a user of said computer system to revise said other existing document or said other existing set of documents in a manner including exhibiting said rank of favorability towards said party to a said document and a length for said at least one block of text to include or exclude text of such identified textual patterns.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

Not Applicable.

FEDERALLY SPONSORED RESEARCH

Not Applicable.

SEQUENCE LISTING OF PROGRAM

Not Applicable.

BACKGROUND OF THE INVENTION

1. Field of Invention

This invention relates to legal document computer systems, specifically as used to analyze and edit such documents or sets of such documents.

2. The Prior Art

The prior art consists of various approaches for the initial creation of a draft of a legal agreement by assembling contract provisions contained in a computer database. The agreement is assembled by adding provisions sequentially, one on top of the other, like stacking building blocks. Once assembled, the initial draft can then be sent by a law firm to opposing counsel to begin negotiations. Some examples of such prior art are disclosed in U.S. Pat. No. 5,692,206 issued to Shirley et al. (1997), U.S. Pat. No. 6,934,905 issued to Tighe (2005) and U.S. Pat. No. 7,080,076 to Williamson et al. (2006).

A computer database of contract provisions is actually an excellent idea. However, its application in the context of the prior art is essentially useless. This is because law firms have no need for the prior art's technology. The reality of law firm practice is that law firms already have standard model documents which are already drafted and ready for use. Thus, they have no need to assemble documents using a legal text database. The documents are already assembled.

An Example of Actual Law Firm Practice:

A Stock Purchase Agreement.

As a more specific example of why the prior art is not useful, consider a start-up company that wants to raise capital. It can do this by privately selling stock to investors pursuant to a stock purchase agreement.

In such a situation, the company would approach its law firm and ask them to prepare a stock purchase agreement. The law firm would likely also prepare a disclosure document, describing the company to the potential investors and setting forth the potential risks of the investment.

The law firm would accomplish these tasks by virtue of the fact that it would already have model documents on its computer system. It would already have a template stock purchase agreement available for use on its word processing system. It would already have a template disclosure document. So to prepare new drafts for the company, it would simply take these template forms, fill in the company's name and address, and the documents would be ready to be sent to the opposition (i.e. the investors and their law firm).

Of course there are situations more complicated than simply filling in a name and address. There may be risks specific to the company's proposed business that need to be included in the disclosure document. Or the proposed investment may have non-standard terms regarding profit sharing or atypical restrictions on the investors' assigning their newly purchased interests in the company.

In those instances, standard practice is for a law firm to choose the most similar versions of the same kinds of documents that it has worked on for a different, previous client. Using these most similar versions, the law firm would change the previous client's name and address, and then further revise the documents as necessary. Sometimes it might even use other sample documents from multiple previous clients to assist in the process. Perhaps one client's prior deal had one similar aspect of the current deal but a different prior deal had another aspect. However, even in that circumstance, simple cut and paste functionality available in any commercial word processing program quickly and easily allows for combining text as necessary.

In other words, the use of a separate contract provision database program to assemble an initial set of documents is not particularly helpful, since the documents are nearly fully assembled at the outset. The use of a separate program for such a database would, in the foregoing context, simply be cumbersome and a distraction.

Same Process for all Documents.

Such use of standard forms, or recycling old documents used for previous clients with some minor changes, is the reality of law firm practice today. It is as true for drafting a stock purchase agreement as it is for drafting an agreement for a loan or for a sale of real estate. This is because it simply doesn't make sense to "reinvent" the wheel by assembling each document over and over again from scratch. The time and energy involved in such an approach rule it out on efficiency grounds. Furthermore, such an approach of new document assembly introduces additional risks to the process--key aspects of documents might be accidentally left out in the assembly process.

OBJECTS AND ADVANTAGES

Several objects and advantages of the present invention are: (a) to provide a method to survey sets of existing legal documents to determine common textual patterns within specific kinds of documents; (b) to provide a method to survey sets of existing legal documents to determine correlation and anti-correlation information regarding the common presence or absence of specific pairs of provisions or pairs of documents; (c) to provide a user interface to allow a user to designate a set of documents as an overall "Project" and to add and remove specific documents from the Project; (d) to provide recognition functionality so that provisions in previously unsurveyed documents may be recognized as similar or identical to common textual patterns determined by the survey functionality; (e) to provide a user interface to allow a user to edit the text of specific documents, including to match more closely to common textual patterns determined by the survey functionality; (f) to make available common textual patterns organized by attributes such as length and favorability; (g) to integrate the method with a document management system; (h) to integrate the method into a system whereby multiple users may suggest edits to a single document; and (i) to allow comparisons of documents with similar kinds of provisions in different sequences in the two documents being compared.

Further objects and advantages will become apparent from a consideration of the ensuing description and drawings.

SUMMARY

In accordance with the present invention, a method is disclosed to survey sets of legal documents and determine common patterns in such documents, particularly common textual patterns. Such common textual patterns are then organized by key attributes such as length and favorability. A graphical user interface is also provided to allow an attorney user to analyze existing legal documents that have otherwise never been surveyed by the System. The System includes recognition functionality so that provisions in the unsurveyed documents can be matched to the most similar common textual patterns determined by the survey process. The attorney user can then edit these existing legal documents to revise their text to more closely match the common textual patterns determined in the survey process, as desired. In particular, the attorney user can choose to revise provisions to reflect the desired length and favorability attributes previously established. Additionally, the attorney user can input further information into the survey databases which the system, or the attorney user, "learns" during an analysis of an existing document. The end result is that an attorney user can receive a set of proposed documents from opposing counsel and revise the documents with great speed and accuracy in a manner not available through the prior art. The prior art focused on the initial drafting of existing documents, which is not useful in revising existing documents, and is also essentially useless, since most law firms already have a wide selection of standard template documents already drafted and ready for use. Furthermore, by breaking down documents into their common textual patterns, or "Core Provisions," computer redline document comparisons can be performed that were previously impossible. Finally, the method can be integrated into a broader shared document management or common editing functionality.

DRAWINGS

Figures

FIG. 1 shows the document view of the graphical user interface of the System, where a user can review and edit the text of a specific document.

FIG. 2 shows a level hierarchy used by the System to organize projects, documents and their contents.

FIG. 3 shows the project view of the graphical user interface of the System, where a user can add or remove specific documents from a given set of documents collected together as a single project.

FIG. 4 shows an excerpt from a sample provision database containing information regarding Core Provisions, which are the common provisions that frequently recur in specific kinds of documents.

FIG. 5 shows one possible means of organizing specific Core Provisions, indicated by Core Provision identification numbers, in a grid-like fashion based on their length and favorability.

FIG. 6 shows an excerpt from a proprietary document comparison method of the System, where a meaningful comparison of two different documents can be generated even if their generally corresponding provisions which are the true subject of the comparison are in different sequential orders in the two documents.

FIG. 7 shows a summary flowchart setting forth a general relationship of the main functionalities of the System.

PREFERRED EMBODIMENT

A. Brief Introduction to Functionality.

In order to more clearly describe in detail the various components of the present invention, it is useful to first present a brief initial summary of the functionality of the invention.

In contrast to the prior art, the present invention disclosed herein does not focus on the initial assembly of documents. The present invention involves the analysis and revision of existing legal documents. Typically the documents being reviewed and revised by a law firm are received from an opposing party's law firm in the course of a deal. In other words, the documents are unfamiliar to the law firm preparing revisions. The present invention disclosed herein (the "System") thus assists the law firm to quickly and accurately revise, or "mark up," the documents that were received.

More specifically, the present invention provides three broad types of functionality:

1. FUNCTION #1: The analysis of large numbers of sample legal documents to see patterns in these sample documents. The System essentially takes a "survey" of a large number of deals and documents to find such "patterns." Such patterns could be the kinds of provisions commonly seen in specific documents (referred to herein as "Core Provisions"), the text of such provisions, and the kinds of documents which make up specific deals (e.g., what documents are present in an investment in a company, in a real estate closing or in a bank loan). The patterns could also involve correlations (e.g., provision type #1 is almost always seen along with provision type #2, but almost never with provision type #3). The analysis results would be stored in computer databases. Functionality is also provided for attorney input (i.e. human input) to expand or otherwise revise the information that the System has "learned."

2. FUNCTION #2: Review of specific proposed draft documents by comparing the proposed draft documents to the information "learned" by the system during Function #1. For example, if a real estate closing is proposed, does the purchase agreement have the proper provisions? Is the text of the provisions that are included the same as the text commonly seen? Is a provision missing? Is a document missing? Is a provision or document present that should be omitted? The System recognizes what is similar to and what is different from the information learned during the survey process. The results of the System's analysis would be made available to its users (i.e. attorneys working at a law firm) through an appropriate interface. Not only would the analysis indicate what portion of the draft documents is typical and what portion is atypical, the System would also suggest corresponding changes.

3. FUNCTION #3: Ability to edit the proposed draft documents. The System's analysis of what needs to be changed in a given document would be linked to document edit functionality. The user of the System could thus run a proposed document through the System, obtain a suggested change (per Function #2) and then implement the change by causing the document to be edited to fully or partially implement the System's proposed change. The System could perform the edit on behalf of the user, or the user could directly edit the text. Once revised, the user is then free to send the revised document back to the opposing counsel. Means of distribution, such as email, could also be incorporated into the System as desired, or an external pre-existing means of distribution could be used. A sample of a graphical user interface, where a document is being analyzed by the System and a suggested edit of one document provision to more closely match a "Core Provision" is being suggested, is shown in FIG. 1.

Thus, the System greatly reduces the amount of time to revise a document and increases the accuracy and completeness of a reviewing law firm's work product.
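The recognition step of Function #2 can be illustrated with a minimal sketch. It assumes that "similarity" is measured with the character-based ratio of Python's standard difflib module and that a match must exceed a numerical threshold; the function name best_match, the sample Core Provision texts and the threshold value 0.6 are hypothetical choices made for illustration and are not taken from this specification.

```python
from difflib import SequenceMatcher

# Hypothetical Core Provisions, keyed by an illustrative identifier.
CORE_PROVISIONS = {
    "governing_law": "This Agreement shall be governed by the laws of the State of New York.",
    "entire_agreement": "This Agreement constitutes the entire agreement between the parties.",
}

def best_match(provision_text, core_provisions, threshold=0.6):
    """Return (core_id, score) for the most similar Core Provision.

    core_id is None when no score exceeds the threshold (an assumed cutoff).
    """
    best_id, best_score = None, 0.0
    for core_id, core_text in core_provisions.items():
        # ratio() is 2*M/T: M matching characters over T total characters.
        matcher = SequenceMatcher(None, provision_text.lower(), core_text.lower(), autojunk=False)
        score = matcher.ratio()
        if score > best_score:
            best_id, best_score = core_id, score
    if best_score > threshold:
        return best_id, best_score
    return None, best_score

# An unsurveyed provision that is a variant of a Core Provision.
draft = "This Agreement will be governed by the laws of the State of New Jersey."
match_id, match_score = best_match(draft, CORE_PROVISIONS)
print(match_id, round(match_score, 2))
```

A production system would likely compare at the sentence or phrase level as well; the character ratio is used here only because it is readily available and makes the role of the threshold easy to see.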

B. Description of Components of System and Figures.

Now that the broad functionality of the System has been described, the specific components making up the System are set forth below.

1. COMPONENT #1: Hierarchical Structure. As an initial organizational matter, the software System set forth herein involves the establishment of a hierarchy of computer text information into different levels. Each level is processed and treated in a slightly different fashion, as will be further disclosed herein. This hierarchical level organizational approach is COMPONENT #1 of the System.

A chart summarizing some information about the hierarchical levels, as will be further detailed herein, is shown in FIG. 2.

a. LEVEL 1--PROJECTS. Projects (also sometimes called "deals" or "matters") represent the grouping of computer files or documents for a given law firm client that corresponds to a particular project for that client. For example, suppose a law firm has a client that manufactures auto parts (called "PartsCo"). That client might approach the law firm for assistance in negotiating a joint venture with a third party (called "OtherCo") to enter a new line of business where they together provide consulting services to the auto industry. The work product created or analyzed by the law firm for this matter, as stored on the law firm's computer systems, would constitute a single "Project" for its client PartsCo. If later in the year PartsCo came back to the law firm for help with leasing a manufacturing plant, that lease of the plant would constitute a new and distinct Project for PartsCo.

b. LEVEL 2--DOCUMENTS. The most important legal documents are generally agreements, but other documents are possible as well. For example, if PartsCo wants to enter into a joint venture with OtherCo, some documents that might be involved in that project are: (i) a certificate of incorporation to form a corporation that will be the joint venture entity that they use to actually provide the consulting services (called "Joint Corp."); (ii) bylaws for Joint Corp.; (iii) resolutions of the board of directors of Joint Corp. appointing officers and issuing shares to PartsCo and OtherCo; and (iv) a shareholders agreement between PartsCo and OtherCo regarding their intent to operate Joint Corp. as a consulting business. Those four documents would make up that joint venture Project for PartsCo. As another example, if PartsCo wanted to lease a new manufacturing plant, the relevant documents might be: (i) a letter of intent to lease the plant, subject to further due diligence by PartsCo; (ii) an engineer's report conducted as part of the due diligence; (iii) a lease agreement signed by PartsCo where it agrees to lease the plant; (iv) a closing certificate from the lessor confirming that the plant is in good condition to be leased by PartsCo; and (v) a receipt from the lessor evidencing the initial rent payment by PartsCo. Together, these five documents make up the Project for PartsCo of leasing the new manufacturing plant. In other words, the collection of all relevant documents makes up a given client's Project.

c. LEVEL 3--PROVISIONS. A legal agreement can be broken down into a collection of related provisions. These provisions are generally present in an agreement in numbered fashion, e.g. Section 1, Section 2, etc. Each provision typically pertains to a specific concept and is usually a paragraph long.

For example, a company called Partmaker might enter into an agreement to sell parts to a client. In that agreement, there might be a provision where Partmaker confirms that it will conduct its business in a lawful manner, such as follows:

Section 9.1 Representations of Partmaker. Partmaker hereby represents, warrants and covenants that it shall maintain all rights and licenses necessary for it to enter into this Agreement and fulfill its obligations hereunder, and that it shall perform the services set forth herein in accordance with highest industry standards.

A collection of provisions makes up a document.

d. LEVEL 4--SENTENCES. Generally provisions are about a paragraph long, and so they are made up of multiple sentences. Thus, the collection of relevant sentences makes up a provision.

e. LEVEL 5--PHRASES. For reasons that will become clearer later on, the next level after sentences is phrases, not words. This is because it will be useful to recognize certain phrases that appear frequently in legal documents, even though the specific sentences in which they are contained may vary.

f. LEVEL 6--WORDS. Clearly, multiple words make up phrases.

g. LEVEL 7--LETTERS. Letters (or, perhaps even more generally, alphanumeric characters) are naturally at the bottommost level of the hierarchy. Ultimately, all documents are collections of alphanumeric characters and are essentially stored as such in each law firm's applicable computer database.
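The upper levels of the hierarchy described above can be sketched as nested data structures. This is only an illustrative sketch: the class names, field names and the naive sentence-splitting rule are assumptions made for the example and do not describe the System's actual storage format. Levels 4 and 6 are derived from the provision text on demand rather than stored separately.

```python
import re
from dataclasses import dataclass, field
from typing import List

@dataclass
class Provision:                      # Level 3
    heading: str
    text: str

    def sentences(self) -> List[str]:  # Level 4
        # Naive split after ., ! or ?; abbreviations such as "Corp." in real
        # legal text would need more careful handling.
        parts = re.split(r"(?<=[.!?])\s+", self.text.strip())
        return [p for p in parts if p]

    def words(self) -> List[str]:      # Level 6
        return re.findall(r"[A-Za-z0-9']+", self.text)

@dataclass
class Document:                       # Level 2
    title: str
    provisions: List[Provision] = field(default_factory=list)

@dataclass
class Project:                        # Level 1
    client: str
    name: str
    documents: List[Document] = field(default_factory=list)

# Hypothetical sample data loosely based on the PartsCo example.
rep = Provision(
    "Representations of Partmaker",
    "Partmaker shall maintain all licenses. It shall perform the services lawfully.",
)
agreement = Document("Supply Agreement", [rep])
project = Project("PartsCo", "Parts supply", [agreement])
print(len(project.documents), len(rep.sentences()), len(rep.words()))
```

Phrases (Level 5) are omitted because recognizing them depends on a list of frequently recurring phrases, which this sketch does not attempt to model.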

2. COMPONENT #2: Database regarding Documents in Projects. Component #2 of the System is a computer database for each kind of project containing information on the documents typically contained in such projects (each, the project's "Document Database").

As an aside, it is worth noting that since computer databases can be combined into one larger single database with different segments of the larger database corresponding to different smaller combined pieces, the distinction of whether they are separate databases, or separate designated sub-databases within a larger database, may be largely a matter of semantics. References herein to specific databases should thus be understood in this context, e.g. a different database for each kind of project is not materially distinct for purposes of the System from a single database containing multiple portions thereof corresponding to each kind of project.

In addition, what is meant by a particular "kind" of project also merits some initial discussion: Each project is classified in the System for organizational purposes by that project's "type" and "sub-type." For example, a type of project might be a real-estate lease. Different sub-types might then be a residential lease, a manufacturing plant lease, or an office space lease. As another example, a type of a Project might be a merger, where possible sub-types would be a hostile merger, parent-subsidiary merger or a negotiated merger.

The System would allow for individual law firms to customize the applicable types and sub-types of a Project. Thus, a law firm that works on many real-estate matters might have many distinct sub-types for leases, but a law firm that does mostly wills would likely need only a few lease sub-types. It is likely the System would include a certain minimum number of types and sub-types as default categories as well.

Thus, a given "kind" of project would be determined by its type and sub-type. In other words, if in a given year a law firm worked for hundreds of clients, but for four of those clients it worked on mergers that were negotiated mergers, it would have worked on four projects of this "kind." Each such project would be of a "merger" type and a "negotiated merger" sub-type.

Each kind of Project would have its own Document Database. For each kind of Project, the Document Database for such kind would include information on the kinds of documents generally included in that kind of Project. For example, for the merger/negotiated kind of Project, the corresponding Document Database might record information to the effect that the System has taken a survey of all these kinds of Projects it has ever analyzed, and a merger agreement is present in 100% of these kinds of Projects. It might also record that a warrant agreement (i.e. so that warrants can be issued to certain key employees) would be present in 20% of the Projects of this kind that it has surveyed.

This can be very useful information. For example, suppose a law firm is representing a client that wishes to merge into a target company. In that case the target company might send over a draft set of documents to effect the merger, including warrant agreements for the benefit of certain of the target company's key executives. The client's law firm would start a new Project in the System and add these received documents to that Project. The law firm would then use the System to analyze the documents in this new Project and the System would output its analysis. One item the System might note is that warrant agreements, included in the draft documents received, are actually present only 20% of the time, i.e. 80% of the time they are absent. The System would then output a suggestion to remove these agreements from the Project, since they are not standard documents for inclusion. Of course, the attorneys using the System could ignore the suggestion, but the information would be very useful to the client's law firm in terms of reviewing the draft documents and providing a response to the target company.

Note that if the warrant agreements were absent, the System would note this as well--it would indicate that in 20% of these kinds of Projects warrant agreements are present. It could then provide a suggestion that such documents be included. Furthermore, the law firm might want to do this (i.e. include warrant agreements) even if it doesn't immediately appear favorable to its own client. For example, perhaps the target company is otherwise being difficult in the negotiations and the inclusion of these warrant agreements would make the deal more attractive to the target. Thus, the inclusion of the warrant agreements would be a good idea that might help close the deal, where this idea might not have occurred to the law firm without the analysis of the System.
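The survey statistics and resulting suggestions discussed above can be sketched as follows. The function names, the sample survey data and the 25% cutoff for treating a document as atypical are hypothetical and chosen only for illustration; the specification does not prescribe particular thresholds.

```python
from collections import Counter

def document_frequencies(surveyed_projects):
    """Share of surveyed Projects (of one kind) that contain each kind of document."""
    counts = Counter()
    for docs in surveyed_projects:
        counts.update(set(docs))
    total = len(surveyed_projects)
    return {kind: n / total for kind, n in counts.items()}

def review_project(received_docs, freqs, rare=0.25):
    """List notes comparing a received set of documents to the survey."""
    received = set(received_docs)
    notes = []
    for kind in sorted(freqs):
        share = freqs[kind]
        if kind in received and share <= rare:
            # Present here but uncommon in the survey: suggest removal.
            notes.append(f"{kind}: present in only {share:.0%} of surveyed Projects; consider removing")
        elif kind not in received:
            # Absent here: report how often it appears so the user can decide.
            notes.append(f"{kind}: absent here but present in {share:.0%} of surveyed Projects")
    return notes

# Hypothetical survey of five merger/negotiated Projects.
survey = [
    {"merger agreement", "warrant agreement"},
    {"merger agreement"},
    {"merger agreement"},
    {"merger agreement"},
    {"merger agreement"},
]
freqs = document_frequencies(survey)
for note in review_project({"merger agreement", "warrant agreement"}, freqs):
    print(note)
```

As in the discussion above, the notes are suggestions only; the attorney user remains free to keep an atypical document or to add an uncommon one.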

More sophisticated information regarding correlations and anti-correlations would also be included in the Document Databases. For example, in taking a survey of the "mergers/negotiated" kind of Projects, the System might note a correlation that when a warrant agreement is present in a particular kind of Project, 80% of the time warrant certificates are also present (warrant certificates serve as a conventional type of evidence that warrants have been issued, but are not always necessary given the presence of a signed warrant agreement). This information can be used in the process of representing the client as well--perhaps the target company sent over draft warrant certificates as part of the deal. If the law firm knows that such certificates are present 80% of the time, it would be less likely to waste time negotiating to get them removed. Similarly, the Document Database for the particular Project might record survey information that when warrant agreements are present, option agreements are almost never present (i.e. anti-correlation information). Thus, if both option agreements and warrant agreements were sent over by the other side, the System would suggest that one of these kinds of agreements be deleted.
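One simple way to express the correlation and anti-correlation information described above is as a conditional share: of the surveyed Projects that contain one kind of document, what fraction also contain another. The sketch below assumes this measure; the function names, sample data and the 5% cutoff for "almost never" are hypothetical, and other statistical measures of correlation could equally be used.

```python
def conditional_presence(surveyed_projects, given, other):
    """Share of Projects containing `given` that also contain `other`."""
    with_given = [docs for docs in surveyed_projects if given in docs]
    if not with_given:
        return None  # `given` never appeared in the survey
    return sum(other in docs for docs in with_given) / len(with_given)

def flag_anticorrelated(received_docs, surveyed_projects, max_share=0.05):
    """Pairs of received documents that the survey almost never sees together."""
    received = sorted(received_docs)
    flags = []
    for i, first in enumerate(received):
        for second in received[i + 1:]:
            share = conditional_presence(surveyed_projects, first, second)
            if share is not None and share <= max_share:
                flags.append((first, second))
    return flags

# Hypothetical survey of eight merger/negotiated Projects.
survey = (
    [{"merger agreement", "warrant agreement", "warrant certificate"}] * 4
    + [{"merger agreement", "warrant agreement"}]
    + [{"merger agreement", "option agreement"}] * 3
)
p_cert = conditional_presence(survey, "warrant agreement", "warrant certificate")
p_option = conditional_presence(survey, "warrant agreement", "option agreement")
flags = flag_anticorrelated(
    {"merger agreement", "option agreement", "warrant agreement"}, survey
)
print(p_cert, p_option, flags)
```

A flagged pair corresponds to the suggestion that one of the two kinds of agreements be deleted; which one to delete is left to the attorney user.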

3. COMPONENT #3: Graphical User Interface. Component #3 of the System is a graphical user interface (or "GUI"). In order for the System's suggestions to be communicated to the user, and to allow for the suggestions to be implemented, the System requires a user interface. In the preferred embodiment this would be a windows based graphical user interface.

a. Different Views. Furthermore, while the GUI would be a single integrated interface, allowing access to nearly all the functionality of the System, the GUI would have different screens, or "views," available to the user depending upon the particular functionality being accessed. In particular, there are two views that are of critical importance to the functioning of the System. These are the "Project View" and the "Document View."

b. Project View. The functionality described above in the description of Component #2 (the Document Database for each kind of Project) would be effected through the Project View of the GUI. This view would show the documents contained within a specific Project and contain "button" or similar type controls for a user to perform functions with respect to the contents of that Project. A sample view of the Project view is shown in FIG. 3.

c. The Document View. While the Project View is clearly of great importance, most of a user's time spent working with the System will be within the Document View. The Document View will show, among other things, the content of an individual document and provide means to revise it. While, strictly speaking, the Document View is a part of Component #3 (i.e., the overall GUI), it is sufficiently important that it is also treated separately as an individual component of the overall System, and is thus described in greater detail later in this specification, in the form of Component #5. A sample view of the Document View is shown in FIG. 1.

4. COMPONENT #4: Database regarding Provisions in Documents. Component #4 of the System is a collection of computer databases, one for each "Kind" of document, of the provisions typically contained in that kind of document (each, the document kind's "Provision Database"). The frequently encountered provisions are referred to as "Core Provisions."

The presence of this Provision Database is indicated in level 2 of the summary chart of hierarchical levels shown in FIG. 2. A representation of an excerpt of contents of one sample Provision Database, namely for the Kind of Document identified as being of a type "Software License Agreement" and sub-type "Licensee Favorable with Technical Support Levels" is shown explicitly in FIG. 4.

a. Kind of Document for each Provision Database. Similar to the matter discussed previously as to the "kind" of a Project, it is initially worth noting what is meant by a particular "kind" of document. Just as before, each document is classified by its type and sub-type.

Thus, a given "kind" of document would be determined by its type and sub-type. If in a given year, a law firm worked for hundreds of clients, but during that year it worked on twelve asset purchase agreements of a goods manufacturer, it would have worked on twelve documents of this kind (i.e. of the "asset purchase agreement/goods manufacturer" kind). By analyzing the provisions contained in these twelve documents, the System would essentially take a "survey" to populate the provision database corresponding to this kind of document. This database would contain, among other things, the text of the various provisions commonly seen in this kind of document. Nor would the applicable Provision Database necessarily be limited to the provisions contained in those twelve documents for which the law firm was hired to provide services. As will be further discussed, the law firm could choose to process other sample documents through the System, even if the firm had never before worked on those documents for its own clients (i.e. perhaps it obtained other copies from public filings or other law firms) to further enhance the "knowledge" of its copy of the System in this area. This will be further described below.

b. Kind of provisions. Just as Projects and Documents are broken up into different types and subtypes, as will be further explored, the Provisions within the Provision Databases will also be broken out by type and sub-type.

c. Core Provisions. When the System conducts a survey of documents as part of the survey process, it seems theoretically possible (depending on specific settings of the System) that every provision that the System processes can be recorded in its databases. However, this is not the standard approach. The standard, and more useful, approach takes into account that it is important to distinguish between a provision that the System "sees" once and a provision that it sees over and over again.

In other words, only a provision that is frequently encountered in an identical or substantially identical form would be recorded in the System's database as a "Core Provision." Such common provisions are called "Core Provisions" because they are the core constituents making up so many of the given kind of documents. Note that to the extent that a user would want to include a particular provision in the System's database for classification as a Core Provision, functionality would be provided to accomplish such an inclusion. This is true even if the System has not seen the provision frequently (or even more than once). A provision in a document which is analyzed by the System and recorded in its databases for other purposes, but not accorded status as a "Core Provision" would be deemed to be a "Non-Core Provision."

It is important to understand the reason for the distinction between provisions in general, and Core Provisions specifically. Core Provisions are of unique value as a concept because an underlying approach in the drafting or editing of legal documents is to use the same language over and over again, to the greatest extent possible. This is because if a lawyer drafts something new, it is easy to make a mistake. Such a mistake might even not be apparent on close examination if a lawyer is not well versed in a particular area--subtle changes in phrasing can sometimes have dramatic consequences for tax purposes or other regulatory compliance issues. Thus, lawyers often try to keep the amount of text that is truly "new" to a minimum. Language which is "old" has essentially been vetted over time as acceptable for its intended purpose. Often the language has also acquired particular meaning within the legal community by convention, or even by court decisions which interpret the language when it is contained within a contract which is the subject of a dispute. Using such established language patterns is thus a preferred approach, so to the extent that the System encounters provisions through its analysis of legal documents that are atypical, such atypical provisions are of lesser value. Indeed, their primary value is to bulk up the overall knowledge base of the System to assist in its recognition functionality.

This idea of a Core Provision is probably best illustrated by an example. Suppose a law firm's client, the car manufacturer CarCo, is buying some parts to include in its computers from a supplier, SupplierCo. It is buying these parts pursuant to the kind of agreement identified as "Purchase Agreement/Manufacturer Purchase from Supplier." The following document section, Section 10.1, is a sample provision that could be excerpted from this agreement for the sale of parts from SupplierCo to CarCo:

10.1. Inventory Management. During the Term, SupplierCo shall keep in stock a committed quantity of Parts that, at no time shall be less than the quantity of Parts ordered by CarCo over the prior thirty (30) days. At all times SupplierCo shall ensure that such quantity will be sufficient to meet CarCo's orders as forecasted by CarCo. Inventory shall be maintained on a rotating basis (first in-first out) and no Parts shall be delivered from inventory that are older than six (6) months unless instructed by CarCo. As the above inventory is shipped to CarCo, additional Parts shall immediately replace them in inventory.

If the System sees this type of provision sufficiently frequently in this kind of document, it will learn that this is a common provision in these types of documents, and classify it as a "Core Provision" for this "Kind" of agreement. Then, in the future, when the System analyzes documents that contain provisions which are similar but slightly distinct from an established Core Provision, the System can note the similarity and can, among other things, suggest a revision to make the text match that of the Core Provision.

The Provision Database for each Kind of Document thus includes the Core Provisions, as determined by the System, for that Kind of Document.

d. Identifying the Text of Core Provisions. The foregoing explanation of Core Provisions leads, naturally, to the question of how the System actually identifies the common provisions that should qualify as Core Provisions. Broadly speaking, Core Provisions are identified by the System processing many sample documents, "recognizing" the word patterns that appear frequently in identical or substantially identical form, and then recording such patterns as Core Provisions. More specifically, while there may be different techniques to accomplish this identification, it is anticipated that the preferred embodiment would use a sequence of four steps as follows:

1. Step #1 of Identifying Core Provisions: Import the text of sample Provisions into the System. Each provision contained in a document analyzed by the System would, at least initially, be separately and distinctly imported into the System for analysis. As part of this process, each provision identified in a document would typically have any unique names of parties or other unique identifiers stripped out (at least for these internal analysis purposes) and replaced with standard alternatives to make the provisions more uniform across different samples of the kind of document in question. For example, "CarCo" could be replaced internally by "Client" and "SupplierCo" by "Counterparty" since the parties would likely have different names in different agreements. Each provision that is identified within the sample documents analyzed would be assigned a unique identification number or other means of identification, such as sequential storage in a computer array.
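By way of illustration only, the import step might look like the following sketch. The function names, the alias dictionary, and the choice of sequential integers as identifiers are assumptions made for this example; the specification does not prescribe them.

```python
import re
from itertools import count


def normalize_provision(text, party_aliases):
    """Replace each unique party name with its generic placeholder."""
    for name, placeholder in party_aliases.items():
        # \b keeps the match to whole words, so "CarCo's" becomes "Client's".
        text = re.sub(rf"\b{re.escape(name)}\b", placeholder, text)
    return text


def import_provisions(provisions, party_aliases):
    """Return {identification number: normalized provision text}."""
    ids = count(1)  # sequential identification numbers starting at 1
    return {next(ids): normalize_provision(p, party_aliases) for p in provisions}


if __name__ == "__main__":
    aliases = {"CarCo": "Client", "SupplierCo": "Counterparty"}
    imported = import_provisions(
        ["SupplierCo shall ensure that such quantity meets CarCo's orders."],
        aliases,
    )
    print(imported[1])
```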

2. Step #2 of Identifying Core Provisions: Assign Checksums to Each Imported Sample Provision. A "checksum" would be calculated for each provision. While such a checksum could serve as a relatively unique characteristic number for the provision, more importantly provisions that have checksums which are similar in value are themselves probably similar in their text. For example, a simplistic version of such a checksum would be the number of words, or the number of characters, in a given provision. Clearly, similar provisions would have a similar number of words or characters. A somewhat more useful version of such a checksum would be a weighted sum of the characters, e.g. A could count as "1", B could count as "2" and the checksum would be computed by adding up the values of all these characters contained in a given provision. The mathematical difference between such checksums for two different provisions would thus provide a quick quantitative estimate of how similar are those two provisions. The closer the values of the checksums, the more similar the two provisions are likely to be.
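The two checksum variants just described can be sketched as follows. Treating upper and lower case alike and ignoring non-letter characters in the weighted version are assumptions of this example, as are the function names.

```python
def simple_checksum(text):
    """Simplistic checksum: the number of words in the provision."""
    return len(text.split())


def weighted_checksum(text):
    """Weighted sum of letters, with A counting as 1, B as 2, ... Z as 26.

    Assumption for this sketch: case is ignored and any character that is
    not one of the 26 basic letters contributes nothing.
    """
    total = 0
    for ch in text.upper():
        if "A" <= ch <= "Z":
            total += ord(ch) - ord("A") + 1
    return total


if __name__ == "__main__":
    first = "Counterparty shall keep Parts in stock."
    second = "Counterparty shall keep all Parts in stock."
    # A small difference between checksums hints at similar text.
    print(abs(weighted_checksum(first) - weighted_checksum(second)))
```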

3. Step #3 of Identifying Core Provisions: Make More Detailed Similarity Calculations.

(a) Why more detailed similarity is necessary here. Checksums are useful for quick estimates for identifying similar provisions, but they are only estimates. They are, in other words, useful for a quick initial pass to determine which provisions would be most fruitful to compare against one another, but then a more detailed comparison is required to truly determine similarity. Thus, the next step would be for each provision imported into the System to be compared to other provisions of similar checksums (i.e. the choice of provisions to be compared against one another would be based on initial estimates of similarity resulting from the checksum procedure). The similarities between each pair of these provisions would then be more precisely calculated.

A simple example shows that this more precise calculation is necessary because reliance solely on checksums is insufficient in this context: consider the words "mad" and "dam." Each has the same number of letters, and each would have the same checksum, but clearly they are different words.
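A possible sketch of the checksum-based first pass follows. It selects which pairs of provisions deserve the detailed comparison, and its demonstration shows the "mad"/"dam" collision. The `window` parameter (how far apart two checksums may be) and the function names are assumptions of this example.

```python
def weighted_checksum(text):
    """Weighted letter sum (a=1 ... z=26); non-letters are ignored."""
    return sum(ord(c) - ord("a") + 1 for c in text.lower() if "a" <= c <= "z")


def candidate_pairs(provisions, window=20):
    """Return id pairs whose checksums differ by at most `window`.

    `provisions` maps an identification number to provision text.
    """
    sums = sorted((weighted_checksum(text), pid) for pid, text in provisions.items())
    pairs = []
    for i, (checksum_i, pid_i) in enumerate(sums):
        for checksum_j, pid_j in sums[i + 1:]:
            if checksum_j - checksum_i > window:
                break  # the list is sorted, so later entries are even further away
            pairs.append((pid_i, pid_j))
    return pairs


if __name__ == "__main__":
    # Identical checksums, different words: the pair is only a candidate,
    # and still needs the more detailed comparison.
    print(weighted_checksum("mad"), weighted_checksum("dam"))
    print(candidate_pairs({1: "mad", 2: "dam", 3: "zzzzzzzz"}, window=0))
```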

Ultimately, the purpose of these provision comparisons is to find clusters or groups of similar provisions. This clustering together into sufficiently similar forms then allows the identification of the "Core Provisions."

(b) How to conduct more detailed similarity calculations. The issue then is how to conduct the more precise similarity calculation which is called for. The approach set forth herein involves counting the number of discrepancies between any two provisions being compared. Each character that has to be deleted from provision #1 to make it look like provision #2, and each character that has to be added to provision #1 to make it look like provision #2, would be considered a "deviation." Then a similarity can be calculated and defined by the following formula, where a value of 1.0 means exact similarity between provisions and a value close to 0.0 means no similarity is present: similarity = actual text length / (actual text length + number of deviations)

In general, it is anticipated that the "actual text length" to be used would be the smaller of the lengths of the two provisions being compared. This use of the smaller length tends to give greater impact to the number of deviations in the calculations. This can be seen by a simple example: suppose the text "a" was compared with "abcdefghij." It appears there are 9 deviations (i.e. the 9 letters "bcdefghij" need to be added to the first string to obtain the second) here and the two text strings are quite dissimilar. If the larger length of 10 was used, the result of the similarity formula would be 10/(10+9)=10/19 or a little over 0.5. This suggests a moderate amount of similarity (i.e. about halfway between the extreme of 0, or no similarity, and 1, complete identity). Clearly this is not optimal, as the provisions are quite dissimilar. If the text length of the smaller string is used, i.e. 1 since there is just one character in the string "a", the similarity formula provides a result of 1/(1+9)=1/10 or 0.1. This is thus a much more representative result, which properly shows that the two provisions are not really very similar at all.

Note that other similarity measurements are possible. The formula suggested above is merely a reasonably accurate approach with the advantage of being subject to rapid calculation. In addition, arguments could be made that a better choice for the text length to be used in the formula would be an average of the two lengths, not the smaller length. Nevertheless, for most purposes it appears the smaller length provides a more useful result.
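The formula and the alternative choices of text length discussed above can be checked with a short sketch. The `basis` parameter and its three option names are assumptions introduced for this example.

```python
def similarity(len_a, len_b, deviations, basis="min"):
    """similarity = text length / (text length + number of deviations).

    basis selects which length is used: "min" (the smaller length, as
    preferred in the text), "max" (the larger), or "mean" (the average).
    """
    if basis == "min":
        length = min(len_a, len_b)
    elif basis == "max":
        length = max(len_a, len_b)
    elif basis == "mean":
        length = (len_a + len_b) / 2
    else:
        raise ValueError(f"unknown basis: {basis!r}")
    if length + deviations == 0:
        return 1.0  # two empty strings are treated as identical
    return length / (length + deviations)


if __name__ == "__main__":
    # "a" versus "abcdefghij": lengths 1 and 10, with 9 deviations.
    print(similarity(1, 10, 9, basis="min"))  # 1/10
    print(similarity(1, 10, 9, basis="max"))  # 10/19
```

With the smaller length the two strings score 0.1, while the larger length yields 10/19, which overstates how alike they are.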

Regardless of the precise formula chosen, an issue now remains as to how to specifically calculate the number of deviations. For a simple string comparison, such as "a" and "abcdefghij," the differences in the text are clear. However, the issue is not as clear when comparing two provisions which are longer and more detailed. In that instance, as next discussed, a more detailed approach is necessary.

(c) Use Redline Approach to Calculate Deviations. The need to compare text strings is a commonly encountered task, particularly in legal documents. The visual output of such a comparison is often called a "redline" or "blackline" where new text which is added is shown in a different style or color, such as underlined and bold faced, while text which is deleted is also distinctly indicated (e.g. it can be shown in red font with a "strike through" line in the middle of the deleted text). Occasionally text which is identified as having been moved from one place to another is distinctly indicated as well.

Since the process of creating a redline is sufficiently common, there are likely to be standard computational algorithms to carry out such a task. Nevertheless, for completeness a simple algorithm to accomplish this is proposed herein.

First of all, a minimum possible deviation segment is generally necessary. What is meant by this is that for text of substantial length, it is not meaningful to show a letter by letter set of deviations, as this is confusing and misrepresents the nature of the differences. For example, suppose one is comparing the sentence "The parties agree to meeting and discussion sessions to address future price changes" with "The parties agree to drafting and to discuss matters pertaining to future price charges." If we show added text in all caps, and deleted text in brackets, a useful comparison redline of sentence 2 against sentence 1 would be: "The parties agree to DRAFTING AND TO DISCUSS MATTERS PERTAINING TO [meeting and discussion sessions to address] future price CHARGES [changes]." This resulting redlined sentence clearly and distinctly demonstrates the changes in a useful manner.

Contrast this with the following possible redline output: "The parties agree to DRAFT[meet]ing and TO discuss[ion] MATTER[session]s PERTAINING to [address] future price chaR[n]ges." If you read this alternative output through, letter by letter, you will find that it is in fact a correct redline. However, it is confusing and less useful than the former result. This is because the "resolution" of the changes is too fine--a user of a redlining algorithm generally does not want to see potential letter by letter changes. The changes need to be of a larger size so as to be appropriately grouped together and displayed in the redline format.

Thus, an appropriate redlining algorithm would break up the text for comparison into blocks of certain minimum sizes, either word by word, or a minimum character size (this could be adjustable, likely 5 or 10 characters would be appropriate). Once the two text strings are broken up into these blocks, the algorithm involves searching for identical blocks, and then finding the largest consecutive sequence of identical blocks. Once found, this area of the text would serve as an initial location on which to "build out" the resulting redline. This largest sequence of identical text would be shown as unchanged in the redline output, and then the algorithm would involve walking forward and backward from that point, indicating whether whole blocks are to be marked as added, deleted, or unchanged. A refinement of this approach would be to consider whether there are other large sequences of identical text. Again the criteria for qualification as such a large sequence could be adjustable (perhaps 25-30 characters would be a minimum size). These other large sequences of identical text, although perhaps not the single largest such sequence, would also be recorded in the redline algorithm as being unchanged. Then the "walk forward" and "walk backward" approach for comparison would involve showing the blocks of text between such identical sequences as either deleted or added.
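One way the block-based redline just described might be realized is sketched below. It uses word-sized blocks, anchors on the largest run of identical blocks, and then repeats the search on the text before and after that run. The `min_run` parameter (the smallest run of words accepted as an anchor, counted in words rather than characters) and all function names are assumptions of this sketch.

```python
def longest_common_run(a, b):
    """Return (i, j, n) such that a[i:i+n] == b[j:j+n] with n as large as possible."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(len(a)):
        cur = [0] * (len(b) + 1)
        for j in range(len(b)):
            if a[i] == b[j]:
                run = prev[j] + 1  # length of the identical run ending at a[i], b[j]
                cur[j + 1] = run
                if run > best[2]:
                    best = (i - run + 1, j - run + 1, run)
        prev = cur
    return best


def redline(old, new, min_run=2):
    """Compare two word lists; return (tag, words) with tag 'same', 'del' or 'add'."""
    i, j, n = longest_common_run(old, new)
    if n < min_run:
        # No sufficiently large identical run: mark whole blocks as changed.
        ops = []
        if old:
            ops.append(("del", old))
        if new:
            ops.append(("add", new))
        return ops
    # Build out backward and forward from the identical run.
    return (
        redline(old[:i], new[:j], min_run)
        + [("same", old[i:i + n])]
        + redline(old[i + n:], new[j + n:], min_run)
    )


def render(ops):
    """Show added text in all caps and deleted text in brackets."""
    parts = []
    for tag, words in ops:
        text = " ".join(words)
        if tag == "add":
            text = text.upper()
        elif tag == "del":
            text = f"[{text}]"
        parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    s1 = "The parties agree to meeting and discussion sessions to address future price changes"
    s2 = "The parties agree to drafting and to discuss matters pertaining to future price charges"
    print(render(redline(s1.split(), s2.split())))
```

Requiring at least two identical words per anchor keeps isolated matches such as "and" or "to" from fragmenting the output, which is the same concern that motivates the minimum sequence size in the text.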

Again, it is worth noting that the foregoing approach is merely one possible means of implementing a redline comparison (both for output to a user when necessary, and for internal calculations in the System considered here). More sophisticated algorithms may be currently available or later developed. The present redline algorithm is only one possible embodiment.

In terms of internal use within the System, as described in this section, the redline would be used to compare possible provisions and identify deviations between them. Once the deviations are determined, they can be used in the similarity formula presented previously to obtain a more accurate quantitative assessment of how similar are the two provisions being compared.
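As a sketch of how a redline can feed the similarity formula, the example below substitutes Python's standard `difflib.SequenceMatcher` for the redline algorithm proposed in this specification, compares word by word, and counts the characters of every deleted or added word as deviations. Ignoring whitespace when counting deviations is an assumption of this example.

```python
from difflib import SequenceMatcher


def count_deviations(text_a, text_b):
    """Count characters deleted from text_a plus characters added to reach text_b."""
    a, b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    deviations = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deviations += sum(len(word) for word in a[i1:i2])
        if tag in ("insert", "replace"):
            deviations += sum(len(word) for word in b[j1:j2])
    return deviations


def provision_similarity(text_a, text_b):
    """Apply the similarity formula using the smaller of the two text lengths."""
    deviations = count_deviations(text_a, text_b)
    length = min(len(text_a), len(text_b))
    if length + deviations == 0:
        return 1.0  # two empty provisions are treated as identical
    return length / (length + deviations)


if __name__ == "__main__":
    first = "Counterparty shall pay within thirty days"
    second = "Counterparty shall pay within sixty days"
    print(count_deviations(first, second))
    print(round(provision_similarity(first, second), 3))
```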

4. Step #4 of Identifying Core Provisions: Identify a group of identical common provisions. Once each provision has initially been compared against all others (first by rough checksum procedure, and then by more refined similarity calculations on a smaller subset identified in the first pass) it is possible to identify groups or clusters of provisions that are identical or substantially identical. (Note, as an aside, that the checksum procedure used here is, strictly speaking, not required--it is simply a computationally efficient means to quickly make a first pass comparison among a large number of provisions. It would be possible to make direct comparisons without using checksums first, but it would be a more time consuming approach.)

The anticipated procedure is best explained by an example. Suppose that many sample documents of a particular kind are analyzed by the System, and of the many hundreds of provisions it processes it recognizes 20 different provisions with a checksum value in the range of 400 through 420. The System thus separately analyzes this group of provisions in this checksum range and calculates all the similarities among this group using the more precise similarity formula and procedures discussed above. It starts with one particular provision (perhaps the one with the most common or representative checksum in the range) and, using the redlining and similarity formula approach discussed above, calculates its similarity to all the other provisions in the checksum range. For other provisions which are identical, the similarity formula should give a result of 1.0. For other provisions which are very close to identical, the formula should give a result close to 1.0, such as 0.99 or 0.98. It is anticipated that the System would have an adjustable threshold to make the determination whether provisions are sufficiently similar to be considered identical. For example, a cut-off of 0.97 might be used, and then all provisions with a calculated similarity of 0.97 or higher would be considered identical for purposes of this analysis. Of course, a value of 1.0 could also be used and then no discrepancies at all would be acceptable for purposes of this analysis.

Let us suppose that on the first pass of this analysis that, on comparing the first provision to the others in this overall group of 20, the System decides that eight of the other provisions are identical to the first one. Then there would be a total of nine (i.e. the eight identified and the original provision used for comparison) that would be considered identical. These identical provisions would be separated out from the overall group of 20 and identified as examples of a "Core Provision." Likely the single most representative example of the nine (perhaps the one then calculated to be most similar to all the others, or the one with the most typical or average checksum) would be identified formally as the official version of the text of this "Core Provision."

The process would then be repeated on the remaining 11 provisions. Perhaps, by way of example, two more clusters of 4 provisions and 3 provisions, respectively, would be identified as other Core Provisions. That would ultimately result in 4 isolated provisions that are not identical to any others, and three distinct Core Provisions.
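The repeated seed-and-compare procedure described above could be sketched as follows. The similarity function is passed in as a parameter; simply taking the first remaining provision as the seed and the `min_cluster` parameter are assumptions of this example (the text suggests the seed might instead be chosen by its checksum).

```python
def cluster_provisions(provisions, similarity, threshold=0.97, min_cluster=2):
    """Group provisions whose similarity to a seed meets the threshold.

    Returns (clusters, isolated): each cluster is a candidate Core Provision,
    and isolated holds provisions that matched nothing else.
    """
    remaining = list(provisions)
    clusters, isolated = [], []
    while remaining:
        seed, rest = remaining[0], remaining[1:]
        matches, others = [], []
        for provision in rest:
            if similarity(seed, provision) >= threshold:
                matches.append(provision)
            else:
                others.append(provision)
        if len(matches) + 1 >= min_cluster:
            clusters.append([seed] + matches)
        else:
            isolated.append(seed)
        remaining = others  # repeat the process on what is left
    return clusters, isolated


if __name__ == "__main__":
    # Toy similarity for demonstration: texts count as identical when their
    # first words agree. A real run would use the redline-based similarity.
    def toy(a, b):
        return 1.0 if a.split()[0] == b.split()[0] else 0.0

    texts = ["inventory v1", "inventory v2", "payment v1", "inventory v3", "notices v1"]
    print(cluster_provisions(texts, toy))
```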

e. Identifying the Kind of Core Provisions. Once Core Provisions have been recognized, and their text identified and stored within the System, the Core Provisions need to be categorized into their type and sub-type. In other words, once the text of a Core Provision has been established, the "kind" of Core Provision needs to be determined (i.e. its type and sub-type). This categorization of Core Provisions into different kinds will be useful to the attorney users of the System, as will later become even more apparent.

Determining the type of the Core Provision can be greatly assisted by the caption or title of the provision. Generally the caption of a potential Core Provision would be "stripped out" in the context of trying to identify common provisions that make up a Core Provision (much as unique client names would be replaced by generic alternatives). This is done to make the provisions more uniform and facilitate their comparison. However, despite the fact that the information is stripped away for purposes of this internal analysis, it can still be retained separately, such as in a text array corresponding to the provision. For example, in the sample Section 10.1 identified above, the caption "Inventory Management" would be stripped away, but retained in connection with the text as the corresponding caption.

Once certain provisions are identified as Core Provisions, their corresponding captions can be compared. It is likely that these captions will be identical, or nearly so. The most common or representative version of the caption can thus be chosen as an initial default estimate of the type of provision. An initial default estimate of the sub-type can simply be a generic heading, such as "general."
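A minimal sketch of this default estimate is shown below. Normalizing captions by trimming whitespace and a trailing period and by title-casing them, and returning None when no caption is available, are assumptions of this example.

```python
from collections import Counter


def default_kind(captions):
    """Return (type, sub_type) estimated from the captions of one cluster."""
    normalized = [c.strip().rstrip(".").title() for c in captions if c.strip()]
    if not normalized:
        return (None, "general")  # no caption available; the type is left open
    most_common_caption, _ = Counter(normalized).most_common(1)[0]
    return (most_common_caption, "general")


if __name__ == "__main__":
    print(default_kind(["Inventory Management", "INVENTORY MANAGEMENT.", "Stock Levels"]))
```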

Once these initial estimates for the type and sub-type have been made, refinements are possible. The nature of these refinements will be discussed in greater detail in related contexts later herein, but briefly, they are (i) seeking user input to confirm or revise the initial type and sub-type estimate; and (ii) comparisons against other existing Core Provisions, either within the Provision Database for this kind of document or even in other Provision Databases, in order to provide more precise estimates of type and sub-type.

f. Properties of Core Provisions.

Once Core Provisions have been identified, and categorized into their kind (i.e. their type and sub-type) then, in order to maximize their usefulness as part of the System, certain attributes or properties of such Core Provisions must be identified.

As will become clearer from further discussion herein, three of the most important properties of a Core Provision are: (i) checksum; (ii) length and (iii) favorability. Indeed, length and favorability are central aspects of a Core Provision. Functionality to search through Core Provisions based on length and favorability, in order to edit a document's provisions to match the desired Core Provisions, is set forth in FIG. 1 as arrow button control complex 100.

Checksum calculation for a Core Provision is a straightforward matter. It is computed as checksums have previously been described, i.e. a weighted sum of all the characters in the text of a Core Provision.

Length is also a straightforward matter. It is a simple matter for the System to calculate the overall length of the text string making up a Core Provision and record this information.

Favorability requires more analysis. The concept underlying this property is that certain provisions are more favorable to the client than others. In order to make provisions subject to analysis and retrieval based on favorability, a number representing the provision's favorability (as measured when contained in the applicable kind of document) needs to be recorded for each Core Provision. While any numerical scale could be used, it is anticipated that a traditional scale along the lines of "1-10" would be the most natural. More specifically, a ranking of "10" would be the most favorable a provision of a given type and sub-type could be, within the applicable kind of document. A ranking of "0" would be neutral. And a ranking of "-10" would be the most unfavorable a provision could be.
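The three properties discussed in this subsection, and a search over them, might be represented as in the sketch below. The class name, the field names, and the `alternatives` helper are hypothetical and are offered only to illustrate searching Core Provisions by length and favorability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreProvision:
    text: str
    checksum: int
    favorability: int  # -10 (most unfavorable), 0 (neutral), 10 (most favorable)

    def __post_init__(self):
        if not -10 <= self.favorability <= 10:
            raise ValueError("favorability must be between -10 and 10")

    @property
    def length(self):
        """Overall length of the text string making up the provision."""
        return len(self.text)


def alternatives(provisions, current, max_length=None):
    """Core Provisions more favorable than `current`, most favorable first."""
    picks = [
        p for p in provisions
        if p.favorability > current.favorability
        and (max_length is None or p.length <= max_length)
    ]
    return sorted(picks, key=lambda p: (-p.favorability, p.length))


if __name__ == "__main__":
    neutral = CoreProvision("Counterparty shall keep Parts in stock.", 400, 0)
    strong = CoreProvision(
        "Counterparty shall at all times keep Parts in stock at its own cost.", 410, 8
    )
    for option in alternatives([neutral, strong], neutral):
        print(option.favorability, option.length)
```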

However, assigning such a number can be a somewhat subjective decision and it would be quite challenging for a typical software program to make this assessment. Accordingly, a number of different techniques would be useful in this context. Four such techniques are anticipated:

(1) User Input of Favorability Number. The most useful one is the most direct: the System requests user input as to a Core Provision's favorability. By requesting that the attorneys using the System provide the favorability number, the System obtains the value of the experience of the attorneys. Furthermore, the information, as it later appears in other uses of the System, will be consistent with the expectations of the user attorneys since it originated with them.

Note that the System needs to be able to accommodate usage by multiple attorneys within a law firm. To the extent that the attorneys share common databases, means would be provided to con
