Using triplet learning for improvement of our invoice recognition model

In one of our previous blogs we talked about how we use a convolutional neural network for invoice recognition. At IxorThink, we have kept improving this model: adding recognition fields, pattern recognition for addresses, classification of customer and supplier information, and so on.

We recently updated our recognition API so that clients can send feedback. This makes it possible to track whether a field was changed by the user after detection, which most probably means the user corrected a recognition error.

Using this feedback to "feed back" into the model may seem straightforward, but it needs to be handled with care. First of all, correctly labeling documents for training takes a lot of time and resources, and the volume of feedback is simply too high to use all of it for training. Secondly, the training dataset needs to stay curated and balanced: if one invoice template has considerably more examples in the dataset than the others, the trained model will be skewed and overfitting becomes more likely.

The solution is to carefully select which invoices are needed for training. Preferably, we select documents whose template is not yet contained in our dataset. This is the best way to keep our dataset balanced. Our clients send us automated feedback through our IxorDocs Recognition API, and we automatically select which invoices to learn from to keep improving.

Computing whether a document has lookalikes in our dataset is not that easy: documents can be scanned or rotated, and can contain different languages. To compute the layout similarity of two invoice documents, we introduced invoice embeddings.

An embedding is a low-dimensional space into which you translate complex data points. This makes the data easier to use as input for machine learning, or to capture underlying semantics. In most cases, an embedding places similar inputs close to each other in the embedding space. A well-known example is Word2Vec, which translates words to a vector space. In this case, we want to translate invoice documents to a low-dimensional embedding space in which invoices of the same supplier or template are close together. This means we can use a distance metric (like cosine or Euclidean distance) to measure whether two invoices are alike.

To learn the mapping from invoice to embedding space, we use triplet learning. The main idea is to train the neural network with groups of three samples: one anchor, one positive and one negative. The anchor and the positive sample are invoices of the same template, and we train the network so that they end up close to each other in the embedding space. At the same time, we make sure that the anchor and the negative are far away from each other. You can find more about this technique in one of our previous blogs.
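To make the ideas in this post more concrete, here is a minimal sketch in plain Python. It is not our production code: the function names, the `margin` of 0.2 and the `threshold` of 0.3 are assumptions chosen for illustration, and the embeddings are hand-written toy vectors rather than the output of a trained network. The sketch shows a common hinge-style formulation of the triplet loss, plus a simple check that uses cosine distance to decide whether an invoice looks like a template we have not seen yet.

```python
import math
from typing import Iterable, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 minus the cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Hinge-style triplet loss.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is. `margin` is an assumed value.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def is_new_template(candidate: Vector, known: Iterable[Vector],
                    threshold: float = 0.3) -> bool:
    """True when no known embedding lies within `threshold` cosine distance.

    `threshold` is a hypothetical cut-off; in practice it would need tuning.
    """
    return all(cosine_distance(candidate, k) > threshold for k in known)


if __name__ == "__main__":
    # Toy 2-D embeddings: two invoices of template A, one of template B.
    template_a_1 = [1.0, 0.0]
    template_a_2 = [0.9, 0.1]
    template_b = [0.0, 1.0]

    # Well-separated triplet: positive is close, negative is far away.
    print(triplet_loss(template_a_1, template_a_2, template_b))
    # Badly ordered triplet: the loss becomes positive and would drive training.
    print(triplet_loss(template_a_1, template_b, template_a_2))
    # Is template B a candidate for labeling, given that we only know template A?
    print(is_new_template(template_b, [template_a_1, template_a_2]))
```

During training, a loss like this would be computed on embeddings produced by the network and minimised over many triplets. The novelty check illustrates how a trained embedding could then be used to pick feedback invoices whose template is not yet in the dataset.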