ruạṛ
a �u:jOO � @ s� d dl Z d dlmZmZ d dlmZmZmZmZ zd dl m Z W n eyV eZ Y n0 ddl mZmZmZmZ ddlmZmZmZmZ ddlmZ dd lmZmZ dd lmZmZmZm Z m!Z!m"Z" e �#d�Z$e �%� Z&e&�'e �(d�� de)e*e*e+ee ee e,e,ed� dd�Z-dee*e*e+ee ee e,e,ed� dd�Z.d e e*e*e+ee ee e,e,ed� dd�Z/d!e e*e*e+ee ee e,ed�dd�Z0dS )"� N)�basename�splitext)�BinaryIO�List�Optional�Set)�PathLike� )�coherence_ratio�encoding_languages�mb_encoding_languages�merge_coherence_ratios)�IANA_SUPPORTED�TOO_BIG_SEQUENCE�TOO_SMALL_SEQUENCE�TRACE)� mess_ratio)�CharsetMatch�CharsetMatches)�any_specified_encoding� iana_name�identify_sig_or_bom� is_cp_similar�is_multi_byte_encoding�should_strip_sig_or_bomZcharset_normalizerz)%(asctime)s | %(levelname)s | %(message)s� � 皙�����?TF) � sequences�steps� chunk_size� threshold�cp_isolation�cp_exclusion�preemptive_behaviour�explain�returnc 1 C s� t | ttf�s td�t| ����|r>tj}t�t � t� t� t| �} | dkr�t� d� |rvt�t � t� |prtj� tt| dddg d�g�S |dur�t�td d �|�� dd� |D �}ng }|dur�t�td d �|�� dd� |D �}ng }| || k�rt�td||| � d}| }|dk�r:| | |k �r:t| | �}t| �tk } t| �tk}| �rlt�td�| �� n|�r�t�td�| �� g }|�r�t| �nd} | du�r�|�| � t�td| � t� }g }g }d}d}d}t� }t| �\}}|du�r|�|� t�tdt|�|� |�d� d|v�r.|�d� |t D �]�}|�rP||v�rP�q6|�rd||v �rd�q6||v �rr�q6|�|� d}||k}|�o�t|�}|dv �r�|�s�t�td|� �q6zt|�}W n, t t!f�y� t�td|� Y �q6Y n0 zr|�r<|du �r<t"|du �r | dtd�� n| t|�td�� |d� n&t"|du �rL| n| t|�d� |d�}W nb t#t$f�y� } zDt |t$��s�t�td|t"|�� |�|� W Y d}~�q6W Y d}~n d}~0 0 d}|D ]}t%||��r�d} �q�q�|�rt�td||� �q6t&|�sdnt|�| t| | ��}|�oD|du�oDt|�| k } | �rZt�td|� tt|�d �}!t'|!d �}!d}"d}#g }$g }%|D �]�}&|&| | d! k�r��q�| |&|&| � }'|�r�|du �r�||' }'z|'j(||�r�d"nd#d$�}(W nR t#�y: } z8t�td%|t"|�� |!}"d}#W Y d}~ �q6W Y d}~n d}~0 0 |�r�|&dk�r�| |& d&k�r�t)|d'�})|�r�|(d|)� |v�r�t&|&|&d d(�D ]T}*| |*|&| � }'|�r�|du �r�||' }'|'j(|d"d$�}(|(d|)� |v �r� �q�q�|$�|(� |%�t*|(|�� |%d( |k�r|"d7 }"|"|!k�s,|�r�|du �r� �q6�q�|#�s�|�r�|�s�z| td)�d� j(|d#d$� W nR t#�y� } z8t�td*|t"|�� |�|� W Y d}~�q6W Y d}~n d}~0 0 |%�r�t+|%�t|%� nd}+|+|k�s�|"|!k�r`|�|� t�td+||"t,|+d, d-d.�� |dd| fv �r6|#�s6t| ||dg |�},|| k�rH|,}n|dk�rX|,}n|,}�q6t�td/|t,|+d, d-d.�� |�s�t-|�}-nt.|�}-|-�r�t�td0�|t"|-��� g }.|dk�r�|$D ],}(t/|(d1|-�r�d2�|-�nd�}/|.�|/� �q�t0|.�}0|0�rt�td3�|0|�� |�t| ||+||0|�� || ddfv �r~|+d1k �r~t� d4|� |�rlt�t � t� |� t|| g� S ||k�r6t� d5|� |�r�t�t � t� |� t|| g� S �q6t|�dk� rt|�s�|�s�|�r�t�td6� |� rt� d7|j1� |�|� nd|� r |du � sD|� r:|� r:|j2|j2k� sD|du� rZt� d8� |�|� n|� rtt� d9� |�|� |� r�t� d:|�3� j1t|�d � n t� d;� |� r�t�t � t� |� |S )<ae Given a raw bytes sequence, return the best possibles charset usable to render str objects. If there is no results, it is a strong indicator that the source is binary/not text. By default, the process will extract 5 blocs of 512o each to assess the mess and coherence of a given sequence. And will give up a particular code page after 20% of measured mess. Those criteria are customizable at will. The preemptive behavior DOES NOT replace the traditional detection workflow, it prioritize a particular code page but never take it for granted. Can improve the performance. You may want to focus your attention to some code page or/and not others, use cp_isolation and cp_exclusion for that purpose. This function will strip the SIG in the payload/sequence every time except on UTF-16, UTF-32. By default the library does not setup any handler other than the NullHandler, if you choose to set the 'explain' toggle to True it will alter the logger configuration to add a StreamHandler that is suitable for debugging. Custom logging format and handler can be set manually. z4Expected object of type bytes or bytearray, got: {0}r z<Encoding detection on empty bytes, assuming utf_8 intention.�utf_8g F� Nz`cp_isolation is set. use this flag for debugging purpose. limited list of encoding allowed : %s.z, c S s g | ]}t |d ��qS �F�r ��.0�cp� r. �I/opt/bart/bart_venv/lib/python3.9/site-packages/charset_normalizer/api.py� <listcomp>] � zfrom_bytes.<locals>.<listcomp>zacp_exclusion is set. use this flag for debugging purpose. limited list of encoding excluded : %s.c S s g | ]}t |d ��qS r) r* r+ r. r. r/ r0 h r1 z^override steps (%i) and chunk_size (%i) as content does not fit (%i byte(s) given) parameters.r z>Trying to detect encoding from a tiny portion of ({}) byte(s).zIUsing lazy str decoding because the payload is quite large, ({}) byte(s).z@Detected declarative mark in sequence. Priority +1 given for %s.zIDetected a SIG or BOM mark on first %i byte(s). Priority +1 given for %s.�ascii> �utf_32�utf_16z[Encoding %s wont be tested as-is because it require a BOM. Will try some sub-encoder LE/BE.z2Encoding %s does not provide an IncrementalDecoderg ��A)�encodingz9Code page %s does not fit given bytes sequence at ALL. %sTzW%s is deemed too similar to code page %s and was consider unsuited already. Continuing!zpCode page %s is a multi byte encoding table and it appear that at least one character was encoded using n-bytes.� � � �ignore�strict)�errorszaLazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. %s� � ���g j�@z^LazyStr Loading: After final lookup, code page %s does not fit given bytes sequence at ALL. %szc%s was excluded because of initial chaos probing. Gave up %i time(s). Computed mean chaos is %f %%.�d � )�ndigitsz=%s passed initial chaos probing. Mean measured chaos is %f %%z&{} should target any language(s) of {}g�������?�,z We detected language {} using {}z.Encoding detection: %s is most likely the one.zoEncoding detection: %s is most likely the one as we detected a BOM or SIG within the beginning of the sequence.zONothing got out of the detection process. Using ASCII/UTF-8/Specified fallback.z7Encoding detection: %s will be used as a fallback matchz:Encoding detection: utf_8 will be used as a fallback matchz:Encoding detection: ascii will be used as a fallback matchz]Encoding detection: Found %s as plausible (best-candidate) for content. With %i alternatives.z=Encoding detection: Unable to determine any suitable charset.)4� isinstance� bytearray�bytes� TypeError�format�type�logger�level� addHandler�explain_handler�setLevelr �len�debug� removeHandler�logging�WARNINGr r �log�join�intr r r �append�setr r �addr r �ModuleNotFoundError�ImportError�str�UnicodeDecodeError�LookupErrorr �range�max�decode�minr �sum�roundr r r r r5 �fingerprint�best)1r r r r! r"