Collation in ICU [45 pages]

Collation in ICU
  Mark Davis
  Chief SW Globalization Architect
  IBM
  Dublin, Ireland, 1/26/2022, 21st International Unicode Conference

What is ICU?
  Premier Unicode Enablement Library
  Open-source: non-viral license
  Full-Featured, Cross-Platform
  C, C++, Java APIs
  Collation, Charset Conversion, Resources, Boundaries, Calendars, Transforms (case, norm., translit., ...), Format/Parse (dates, times, msgs, nums., curr., ...), Unicode strings/props
  http:/

Collation = Sorting Order
  How hard can it be?
    A B C
  Complications
    Languages are complex and varied
    Unicode is a big set of characters
    Performance is crucial

Varies By:
  Language
    Swedish: z < ö
    German: ö < z
  Usage
    Dictionary: f of
    Telephone: of f
  Customizations
    A < a
    a < A
  Versioning
    Fixes
    New Gov. Stds
    New Characters

Strength Levels: L1, L2, L3
  1. Base characters: a < b
  2. Accents: as s at
     ignored if there is a L1 character difference
  3. Case: ao Ao a
     ignored if there is a L1 or L2 difference
  4. Punctuation: ab < a-b < aB
     ignored* if there is a L1, L2, or L3 difference
  5. Tie-breaker: NFD code point order

Context Sensitivity
  Contractions: H < Z, but CZ < CH
  Expansions: OE OF
  Both

Canonical Equivalence

Oddities
  Normal accents
    cote < coté < côte < côté
    first accent difference determines order
  French accents
    cote < côte < coté < côté
    last accent difference determines order
  Logical Order Exception (Thai, Lao)

Merging Database Fields
  F1 = LastName, F2 = FirstName
  Sequential: F1, then F2
  Weak 1st: F1 (L1), F2
  Merged: L1, L2, L3
  diSilva, John; diSilva, Fred; di Silva, John; di Silva, Fred; dsilva, John; dsilva, Fred
  diSilva, John; dsilva, John; di Silva, John; di Silva, Fred; diSilva, Fred; dsilva, Fred
  diSilva, John; di Silva, John; dsilva, John; diSilva, Fred; di Silva, Fred; dsilva, Fred

Customizations
  Parameters that change collation behavior
  Choice of language (locale)
  Runtime choices
  Examples to follow

Parametric Customizations
  Strength
    Base
    Base+Accent
    Base+Accent+Case
    &c.
  Case: A < a; a < A
  Punctuation: di Silva < diSilva; diSilva < di Silva

Punctuation / Spaces (Alternates)
  Base Character
    di silva, di Silva, Di silva, Di Silva, Dickens, disilva, diSilva, Disilva, DiSilva
  Ignorable
    Dickens, di silva, disilva, di Silva, diSilva, Di silva, Disilva, Di Silva, DiSilva

Extended Customizations
  User-defined: “&” “ampersand”
  Merging tailorings: Iranian + French
  Script Order: b ? b ?
  Numbers: A-10 < A-2; A-2 < A-10

Other Uses: String Searching
  Match according to locale conventions:
    e.g. w = v for Swedish
  Use collation options:
    ignore case, accent
    other customizations

Other Uses: Selection Bounds
  Return all records where:
    Zo name Zorma
  Ignore case / accents
    Zoe / zoe / Zo / zo / ...

UCA
  UTS #10: Unicode Collation Algorithm
  Levels, Expansions, Contractions, Punctuation, Canonical Equivalence, etc.
  Default ordering: all Unicode code points
  Provides for tailoring to given languages
  Also see: The Unicode Standard, 5.17: Sorting and Searching
  Aligned with ISO 14651

APIs
  String Compare
  Sort Keys
  String Search
  Selection Boundaries
  Merged sortkeys

Sort Keys
  Transform string into series of bytes which will binary-compare
    a:  06 C3 01 20 01 02 00
    A:  06 C3 01 20 01 08 00
    :   06 C3 01 20 32 01 02 02 00
    ab: 06 C3 06 D7 01 20 20 01 02 02 00
    b:  06 D7 01 20 01 02 00
  Byte groups, left to right: Level 1, Level 2, Level 3

String Compare vs. Sort Keys
  Same results in either case
  SC faster for single comparisons
    average 5 to 10 times!
  SK faster for multiple comparisons
    index once
    binary compare many times

String Search
  Naive Approach
    key matches in target at iff target.substring(x, y) key
  Boundary Complications
    Ignorables: “a” matches in “(a)”?
    Contractions: “c” matches in “churo”?
    Normalization: “” matches in “a”?

WARNING 1: Basics
  Not aligned with character set or repertoire
    Latin-1: Swedish and German sorting differs
  Not code point (binary) order
    Binary: Z a v a
    Swedish: v w
  Not a property of strings: same Database
    Swedish user: views/selects
    German user: views/selects

WARNING 2: Operations
  Order not preserved under concatenation / substringing
    x < y does not imply xz < yz
    x < y does not imply zx < zy
    xz < yz does not imply x < y
    zx < zy does not imply x < y

WARNING 3: Dependence
  Collation is a relation over strings
  Sort keys embody part of that relation
  Thus, comparing sort keys from different tailorings (or parameters) gives undefined results.

WARNING 4: Stability
  Stable Sort
    Records with equal comparison come out in original order
    Property of algorithm, not comparison
  Semi-Stable Comparison
    x y x y
    Property of comparison, not algorithm
    Degrades performance
    Doesn't do what people think (or really want)!

ICU/Java Collation Architecture
  L1-3, contractions, expansions, ...
  Locale tailorings
  Fully rule-based specification
  Arbitrary runtime user customizations
    & ? = question mark
    & $ = dollar sign
    & z george

Java
  Sun licensed and includes an early version of ICU collation in Java
  ICU version:
    Dramatically faster
    Much reduced memory consumption
    Halved sort-key length
    Many additional features

ICU Collation I
  Full UCA compliance
  Full supplementary character support
  Solid performance
  Small Sort-Keys
  Small Memory Footprint

ICU Collation II
  Parametric control
  Tailorable to any language
  Simultaneous Multiple Versions
  Merging Sort Keys
  Selection Bounds

Memory-Mappable, Fast Init
  Old: separate allocations
  New: offsets within mem-map

Delta Tailoring: Minimize Memory Usage
  Diagram labels: FR, found, UCA: One Copy; 80K, not found, code, not, synthesize, input, output

Simultaneous Multiple Versions
  Programs can link against different versions of ICU, simultaneously.
  Preserves exact binary order over time.
  Diagram labels: Application, New DB, Old DB, ICU 2.1, ICU 2.0

Performance
  Checks for identical prefixes first
  Invokes normalization only when needed
  Fast paths for common cases
  Minimizes comparison time
  Minimizes sort key length

Sort Key Compression
  Common weights are 1-byte
    Primary, secondary, tertiary, quaternary
  Sequences are compressed
  UTF-16 Values for “Märk Davis” (22 bytes)
    004D 00E4 0072 006B 0020 0044 0061 0076 0069 0073 0000
  Sort Key (L3, ignorable punctuation - 19 bytes)
    2F 17 39 2B 1D 17 41 27 3B 01 77 96 0A 01 8F 80 8F 07 00

ICU vs. Windows, glibc
  Full UCA!
  String comparison: comparable speed
    -20% ... +400%
  Sort keys: much shorter
    50%
  Warning: speed comparisons are approximate!
    Depends on data, parameters, features, CPU

More Information
  ICU: http:/
  Document: http:/
  Version of these slides: http:/

Q & A

Fast C or D (FCD)
  Accepts all NFD, most NFC, without normalization
  Columns: X, FCD, NFC, NFD
    A-ring: Y Y
    Angstrom: Y
    A + ring: Y Y
    A + grave: Y Y
    A-ring + grave: Y
    A + cedilla + ring: Y Y
    A + ring + cedilla:
    A-ring + cedilla: Y

Backup Slides
  Not used in the presentation, except in response to questions

Performance: Coding
  Avoided unnecessary function calls.
    Example: strlen too expensive!
  Avoided use of objects
    Rewrote core code in C
    C++ API wraps the C core code.
  Fast-pathed common cases
  Used stack memory buffers (with expansion if necessary)
  Made inner loops as tight as possible

WARNING 5: Math. Relation
  S = Unicode Strings
  Reflexive
    ∀ a ∈ S: a ≤ a
  Antisymmetric
    ∀ a, b ∈ S: a ≤ b & b ≤ a ⇒ a = b
  Transitive
    ∀ a, b, c ∈ S: a ≤ b & b ≤ c ⇒ a ≤ c
  Total
    ∀ a, b ∈ S: a ≤ b ∨ b ≤ a

Identical Prefixes
  Sorting / Searching Databases
    Many comparisons to “close” strings
  Check initial prefixes with binary compare
  Drop into collation loop at first difference
  Complication

Initial Prefix Complication
  Need to back up if in “bad” position:
    Type: Example
    Contraction (Spanish): ch
    Normalization: a
    Surrogate Pair:

Fractional UCA
  Fractional weights for compression
  Gaps for tailoring, future UCA additions
  Only stores differences in tailoring file
  Reduces memory footprint
  Table (UCA vs. Frac. UCA, for a and b):
    primary: 0861 0865 0871 0875 1718 60 18 6619
    secondary: 2020202003030303
    tertiary: 0202020203030303

Exceptional Values
  Normal weight storage
    P P P P P P P P P P P P P P P P S S S S S S S S C C T T T T T T
    P: 16b, S: 8b, C: 1 + 1, T: 6b
  Special Weight Storage
    F F F F T T T T d d d d d d d d d d d d d d d d d d d d d d d d
    F: 4b, T: 4b Tag, d: 24 bit data
    NOT_FOUND, EXPANSION, CONTRACTION, THAI, ...

Minimal Memory
  Flat-file (memory mapped)
    speeds initialization
    reduces memory footprint
    (next slide)
  Delta Tailoring
    Single copy of UCA (80K)
    Small delta files per locale
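
The sample keys on the Sort Keys slide (for a, A, ab, b) can be reproduced with a toy model, which may help show how the three strength levels end up as one byte string that compares correctly with plain binary comparison. This is only a sketch: the weight table, the constant names and the function `sort_key` are hypothetical, picked to match the bytes on the slide, and the rule "any combining mark gets secondary weight 32" is an assumption. It is not how ICU stores or computes weights.

```python
import unicodedata

# Hypothetical toy weights, chosen only to match the sample keys on the
# "Sort Keys" slide. Real UCA/ICU tables cover all of Unicode.
PRIMARY = {"a": b"\x06\xc3", "b": b"\x06\xd7"}
COMMON_SECONDARY = 0x20
ACCENT_SECONDARY = 0x32   # assumption: one weight for every combining mark
LOWER_TERTIARY = 0x02
UPPER_TERTIARY = 0x08
LEVEL_SEPARATOR = b"\x01"
TERMINATOR = b"\x00"


def sort_key(text, strength=3):
    """Build a toy sort key: all L1 weights, then all L2, then all L3.

    strength=1 keeps base letters only, 2 adds accents, 3 adds case.
    Raises KeyError for letters outside the toy table.
    """
    primary, secondary, tertiary = bytearray(), bytearray(), bytearray()
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # A combining mark has no primary weight in this model.
            secondary.append(ACCENT_SECONDARY)
            tertiary.append(LOWER_TERTIARY)
            continue
        primary += PRIMARY[ch.lower()]
        secondary.append(COMMON_SECONDARY)
        tertiary.append(UPPER_TERTIARY if ch.isupper() else LOWER_TERTIARY)
    levels = [primary, secondary, tertiary][:strength]
    return LEVEL_SEPARATOR.join(levels) + TERMINATOR


# "Index once, binary compare many times": the keys are plain bytes, so the
# built-in sort needs no collation logic of its own.
ordered = sorted(["b", "ab", "\u00e1", "A", "a"], key=sort_key)
```

Because every level-1 weight precedes every level-2 weight in the key, an accent or case difference can only decide the order when the base letters are identical, which is the "ignored if there is a L1 difference" rule from the Strength Levels slide. Keys built with different `strength` values are not comparable with each other, in line with WARNING 3.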

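
The FCD slide states that FCD text (all NFD, most NFC) can be accepted without normalization. A minimal sketch of such a check follows, assuming the usual FCD condition, which the slide itself does not spell out: for every adjacent pair of code points, the combining class at the end of the first one's canonical decomposition must not exceed the combining class at the start of the second one's decomposition, unless the latter is zero. The function name `is_fcd` is hypothetical; only Python's standard `unicodedata` module is used.

```python
import unicodedata


def is_fcd(text):
    """Return True if text passes the FCD check described above.

    Sketch only: decomposes one code point at a time and compares
    canonical combining classes across each boundary.
    """
    prev_trail = 0
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(decomposed[0])
        trail = unicodedata.combining(decomposed[-1])
        # A non-zero leading class lower than the previous trailing class
        # means canonical reordering would be needed, so the fast path
        # cannot be used.
        if lead != 0 and prev_trail > lead:
            return False
        prev_trail = trail
    return True


# Rows taken from the slide's table, written as escapes.
A_RING = "\u00c5"
ANGSTROM = "\u212b"
RING, GRAVE, CEDILLA = "\u030a", "\u0300", "\u0327"

samples = {
    "A-ring": A_RING,
    "Angstrom": ANGSTROM,
    "A + ring": "A" + RING,
    "A + grave": "A" + GRAVE,
    "A-ring + grave": A_RING + GRAVE,
    "A + cedilla + ring": "A" + CEDILLA + RING,
    "A + ring + cedilla": "A" + RING + CEDILLA,
    "A-ring + cedilla": A_RING + CEDILLA,
}
results = {name: is_fcd(value) for name, value in samples.items()}
```

In this sketch only the last two rows fail: the ring (class 230) would have to move after the cedilla (class 202), so a collator would need to normalize those strings before comparing them.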