Study log 211228

2021-12-28 17:32 Author: mayoiwill

Elasticsearch basics

========================

# 211228

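# Expanding the PVC

- Reference
  - https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-volume-claim-templates.html
- Add a PVC section (`volumeClaimTemplates`) to the Kubernetes manifest
- Re-running `kubectl apply -f` fails with an error saying the change is not allowed
- Delete the existing cluster
  - `kubectl delete elasticsearch quickstart`
- Re-running `kubectl apply -f` now succeeds
- Check the PVC
  - `kubectl get pvc`
  - It has been expanded to 10Gi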
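The PVC section added to the manifest can be sketched as below. This is a minimal sketch that builds the ECK `Elasticsearch` resource as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The resource name `quickstart` and the 10Gi size come from the notes; the version `7.16.2`, the node set name `default` and `count: 1` are assumptions. The claim name `elasticsearch-data` is the name the linked ECK page uses for the data volume.

```python
import json


def eck_manifest(name: str, storage: str, version: str = "7.16.2") -> dict:
    """Build an ECK Elasticsearch manifest with an explicit volume claim template.

    The default version and the node set name/count are assumptions.
    """
    return {
        "apiVersion": "elasticsearch.k8s.elastic.co/v1",
        "kind": "Elasticsearch",
        "metadata": {"name": name},
        "spec": {
            "version": version,
            "nodeSets": [
                {
                    "name": "default",  # assumed node set name
                    "count": 1,         # assumed single node
                    "volumeClaimTemplates": [
                        {
                            # claim name used for the data volume in the ECK docs
                            "metadata": {"name": "elasticsearch-data"},
                            "spec": {
                                "accessModes": ["ReadWriteOnce"],
                                "resources": {"requests": {"storage": storage}},
                            },
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Save the output to quickstart.json, then: kubectl apply -f quickstart.json
    print(json.dumps(eck_manifest("quickstart", "10Gi"), indent=2))
```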

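# Basic index usage

- Reference: https://learnku.com/docs/elasticsearch73/7.3/index-some-documents/6450

## Creating an index

- Index a single document directly with `PUT`
  - If the index does not exist yet, it is created automatically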

- Request

```
PUT /customer/_doc/1
{
  "name": "John Doe"
}
```

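- Explanation
  - `PUT`: create the document (or replace it if it already exists)
  - `/customer`: the index name (roughly like a table name); in what follows it is changed to `test_doc`
  - `/_doc`: the built-in endpoint for operating on the index's documents
  - `/1`: the ID of the document being written is 1
  - The body is a JSON object
    - It can contain multiple fields
    - Can fields be nested (multi-level)?
- Result
  - An index named `test_doc` is created
  - A document with ID 1 is added to it
  - The document has the ID 1 (held in the `_id` metadata field) and a `name` field
  - Elasticsearch automatically creates a mapping for the index (dynamic mapping)
    - See below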

- Retrieve the data
  - `GET /test_doc/_doc/1`
  - Or use the `_search` endpoint

```
GET /test_doc/_search
{
  "query": {
    "match_all": {}
  }
}
```

- For the `_search` syntax in more detail, see the mapping section below
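For scripting these requests outside the Kibana console, a minimal Python sketch (standard library only) can build them. It only constructs `urllib.request.Request` objects and sends nothing. The base URL `http://localhost:9200` is an assumption; an ECK cluster uses HTTPS and basic authentication by default, so a real call would also need the right scheme and credentials. The search is built as `POST`, which `_search` also accepts, because not every HTTP client sends a body with `GET`.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed endpoint, e.g. via kubectl port-forward


def es_request(method, path, body=None, user=None, password=None):
    """Build (but do not send) an Elasticsearch REST request."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if user is not None:
        # HTTP basic auth, the same thing curl's -u user:password sends
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# The requests from the notes above.
put_doc = es_request("PUT", "/test_doc/_doc/1", {"name": "John Doe"})
get_doc = es_request("GET", "/test_doc/_doc/1")
search = es_request("POST", "/test_doc/_search", {"query": {"match_all": {}}})

# Against a reachable cluster, urllib.request.urlopen(put_doc) would send it.
```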

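## Working with data

- Update a document
  - Simply `PUT` it again
  - Check with `GET`
    - The content has been updated
    - The built-in `_version` field has also increased to 2
- Delete a document
  - `DELETE /test_doc/_doc/1`
  - Check with `GET`
    - `"found": false`
  - Check with `_search`
    - `hits.total.value` is 0
- Bulk insert
  - The `_bulk` endpoint

```
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
```

- Not tried yet; it probably needs `-u` to supply a username and password
- Check the disk usage and other stats of each index
  - `GET /_cat/indices?v`
  - There are some built-in indices
    - including some Kibana data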
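The `_bulk` request above reads newline-delimited JSON from `accounts.json`: each document is an action line followed by its source line, and the body must end with a newline. A minimal Python sketch that produces such a body from a list of dicts (the two sample accounts are hypothetical). curl's `-u user:password` option is how the credentials guessed at above would be passed.

```python
import json


def bulk_body(docs, id_field=None):
    """Build an NDJSON body for the _bulk API.

    Each document becomes two lines: an "index" action and the document source.
    If id_field is given, that field's value is used as the document _id.
    """
    lines = []
    for doc in docs:
        action = {"index": {}}
        if id_field is not None:
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


accounts = [  # hypothetical sample documents
    {"account_number": 1, "firstname": "Amber", "state": "IL"},
    {"account_number": 6, "firstname": "Hattie", "state": "TN"},
]
print(bulk_body(accounts, id_field="account_number"), end="")
```

Redirecting the output to `accounts.json` gives a file the curl command can send; `--data-binary` matters here because it keeps the newlines intact.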

## Searching an index

- Basic search
  - Reference: https://learnku.com/docs/elasticsearch73/7.3/start-searching/6451
  - `"from"` and `"size"` enable pagination
- Aggregations
  - Reference: https://learnku.com/docs/elasticsearch73/7.3/analyze-results-with-aggregations/6452
  - Similar to SQL `GROUP BY`
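A sketch of the request bodies these two notes describe, built as Python dicts: `from`/`size` paging for a 1-based page number, and a `terms` aggregation as the `GROUP BY` counterpart. The field `state.keyword` follows the linked tutorial's `bank` data and is an assumption here; `local_terms_buckets` is a hypothetical helper that only illustrates what the buckets count, computed locally.

```python
from collections import Counter


def page_body(page, page_size, query=None):
    """Search body for a 1-based page number, using from/size."""
    return {
        "query": query if query is not None else {"match_all": {}},
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
    }


def group_by_body(field, agg_name="group_by"):
    """Aggregation-only body, roughly SELECT field, COUNT(*) ... GROUP BY field."""
    return {
        "size": 0,  # no hits, only the aggregation buckets
        "aggs": {agg_name: {"terms": {"field": field}}},
    }


def local_terms_buckets(docs, field, size=10):
    """Local illustration of what a terms aggregation counts: documents per value."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Highest count first; ties may be ordered differently from Elasticsearch.
    return [{"key": k, "doc_count": c} for k, c in counts.most_common(size)]


state_agg = group_by_body("state.keyword", "group_by_state")
```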

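## Modifying the mapping

- Reference
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-dynamic
- Modifying the mapping
  - update mapping API
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
- Use query conditions that fit the mapped field type
  - The `doc` field is of type `text`, so `match` can be used

```
GET /test_doc/_search
{
  "query": {
    "match": {
      "doc": "test"
    }
  }
}
```

- `te`: no match, because the default standard analyzer splits text into whole words, so there is no `te` token
- `test 1`: matches (by default `match` combines the terms with OR)
- With `match_phrase`
  - `test`: matches
  - `test 1`: no match, because the two terms do not appear consecutively
- `doc.keyword` is of type `keyword`
  - `test`: no match; a `keyword` field only matches the entire value exactly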
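The three behaviours above can be reproduced with a toy Python model. Assumptions: the `doc` value `"this is a test for doc 1"` is hypothetical, and `tokenize` (lowercase, keep ASCII letters and digits) is only a rough stand-in for the standard analyzer. `match` uses the default `OR` operator, which is why `test 1` matches without the terms being adjacent.

```python
import re


def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(field_value, query):
    """match with the default OR operator: any query term equals some token."""
    tokens = set(tokenize(field_value))
    return any(term in tokens for term in tokenize(query))


def match_phrase(field_value, query):
    """match_phrase: all query terms appear consecutively and in order."""
    tokens, terms = tokenize(field_value), tokenize(query)
    n = len(terms)
    return n > 0 and any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


def match_keyword(field_value, query):
    """match on a keyword field: no analysis, the whole value must be equal."""
    return field_value == query


doc = "this is a test for doc 1"  # hypothetical field value
```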

## Dynamic templates in mappings

## Explicit mappings (specifying field types)

## Runtime fields

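- Set the mapping's `dynamic` to `runtime`
  - New fields are then added as runtime fields, so the index does not grow because of new fields
  - The default is `true`, in which case new fields are indexed
- Once an index has been created, fields under `properties` cannot be deleted
  - The only option is to reindex
- Delete a runtime field from the mapping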

```
PUT my-index-000001/_mapping
{
  "runtime": {
    "day_of_week": null
  }
}
```

- Result: the value is still in `_source`
  - but the field can no longer be used as a `match` condition
- Add it back with `PUT _mapping`

```
PUT /test_doc/_mapping
{
  "runtime": {
    "author": {
      "type": "keyword"
    }
  }
}
```
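A toy Python model of the runtime-field notes above, not Elasticsearch code. `apply_dynamic` shows where an unseen top-level scalar field ends up for each `dynamic` setting (type guessing is simplified: real dynamic mapping also runs date and numeric detection, and adds a `.keyword` sub-field under `true`). `search_term` is a hypothetical exact-value search that only accepts mapped fields, so removing a runtime field makes it unsearchable while the stored document keeps the value.

```python
def guess_types(value):
    """Simplified type guess: (type under dynamic=true, type under dynamic=runtime)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean", "boolean"
    if isinstance(value, int):
        return "long", "long"
    if isinstance(value, float):
        return "float", "double"
    return "text", "keyword"  # plain strings


def apply_dynamic(mapping, doc):
    """Toy model: map unseen top-level scalar fields of doc according to mapping["dynamic"]."""
    dynamic = str(mapping.get("dynamic", "true")).lower()
    props = mapping.setdefault("properties", {})
    runtime = mapping.setdefault("runtime", {})
    for field, value in doc.items():
        if field in props or field in runtime or value is None:
            continue  # already mapped, or null (a null adds no field)
        if dynamic == "strict":
            raise ValueError(f"strict mapping rejects new field [{field}]")
        indexed_type, runtime_type = guess_types(value)
        if dynamic == "true":
            props[field] = {"type": indexed_type}  # indexed, so the index grows
        elif dynamic == "runtime":
            runtime[field] = {"type": runtime_type}  # computed at query time, not indexed
        # dynamic == "false": the value stays in _source but is not mapped
    return mapping


def search_term(docs, mapping, field, value):
    """Toy exact-value search: only mapped fields (properties or runtime) are searchable.

    Both kinds are read straight from the stored document here; a real indexed
    field would be looked up in the inverted index instead.
    """
    mapped = field in mapping.get("properties", {}) or field in mapping.get("runtime", {})
    return [d for d in docs if d.get(field) == value] if mapped else []
```

Deleting `author` from `mapping["runtime"]` (the toy counterpart of setting it to `null` with `PUT _mapping`) makes `search_term` return nothing, while the stored document still holds the value.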

## Defining `runtime_mappings` in `_search`

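## Indexing a runtime field

- Reference
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-indexed.html
- Runtime fields are not actually indexed
- To index one, create a new index
  - Copy the runtime field's definition into the new index's `properties`
  - Load the data into it again with `_bulk`
    - Other ways to load the data? (The `_reindex` API copies documents from one index into another.)
  - Delete the old index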
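A minimal Python sketch of the "copy the runtime definition into `properties`" step, plus the body for the `_reindex` alternative. The sample mappings are hypothetical, modelled on the linked page's `voltage_corrected` example, which shows a definition with a `script` placed under `properties`. The sketch refuses to merge when a runtime field shadows an indexed field of the same name; that is a design choice, not Elasticsearch behaviour.

```python
import copy


def runtime_to_indexed(mappings):
    """Mappings for a new index in which every runtime field is moved into properties.

    Definitions (type, and script if present) are copied unchanged, so the new
    index stores the values instead of computing them at query time.
    """
    new = copy.deepcopy(mappings)  # leave the old mappings untouched
    runtime = new.pop("runtime", {})
    props = new.setdefault("properties", {})
    for name, definition in runtime.items():
        if name in props:
            # A runtime field may shadow an indexed field with the same name;
            # this sketch does not guess which definition should win.
            raise ValueError(f"field [{name}] already exists in properties")
        props[name] = definition
    return new


def reindex_body(source_index, dest_index):
    """Body for POST _reindex: copy all documents from one index to another."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


old_mappings = {  # hypothetical mappings
    "properties": {"voltage": {"type": "double"}},
    "runtime": {
        "voltage_corrected": {
            "type": "double",
            "script": {
                "source": "emit(doc['voltage'].value * params['multiplier'])",
                "params": {"multiplier": 4},
            },
        }
    },
}
new_mappings = runtime_to_indexed(old_mappings)
```

Creating the new index with `{"mappings": new_mappings}` and then sending `reindex_body("old-index", "new-index")` to `POST _reindex` would replace the manual `_bulk` step; both index names are placeholders.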

## grok pattern

- Can recognize `'%{COMMONAPACHELOG}'` (the common Apache access log format)
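To see which fields `%{COMMONAPACHELOG}` pulls out of a log line, a simplified Python regular expression can stand in for it. This is an approximation, not the real grok definition (which is composed from sub-patterns such as `IPORHOST` and `HTTPDATE`); the group names follow grok's legacy, non-ECS field names. The sample line is the standard example from the Apache HTTP Server log documentation.

```python
import re

# Simplified approximation of grok's %{COMMONAPACHELOG} (legacy field names).
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?|(?P<rawrequest>[^"]*))" '
    r'(?P<response>\d+) (?:(?P<bytes>\d+)|-)'
)


def parse_common_log(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = COMMON_LOG.match(line)
    return m.groupdict() if m else None


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
fields = parse_common_log(sample)
```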

# Analyzers

## Field types

- Focus on the text family
  - text
  - annotated-text
  - completion
  - search_as_you_type
  - token_count
- Document ranking types
  - dense_vector
  - sparse_vector
  - rank_feature
  - rank_features
- Special types such as geo (geospatial indexing?)

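- Any field can hold an array of values
  - A field value can be `"aaa"` or `["aaa","bbb"]`
  - A `match` query for `"aaa"` matches both
- This is useful for scenarios such as
  - A papers table plus an authors table
    - Each paper has N authors, recorded in a paper_author relation table
    - In the index, the authors field is simply an array
    - A query combining, say, a title condition and an author condition is then supported directly
  - This raises the question of how to turn rows of the source database tables into index documents
    - We will look at doing this kind of transformation with Flink later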
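A minimal batch sketch of that table-to-document step in Python, assuming hypothetical table shapes: `papers` rows with `id`/`title`, `authors` rows with `id`/`name`, and `(paper_id, author_id)` rows for the join table. Each paper becomes one document whose `authors` field is an array. `paper_id` is kept as an ordinary field because Elasticsearch does not accept `_id` inside the document body; it could serve as the `_id` in a bulk action line. This is only an illustration, not the Flink approach the notes plan to study.

```python
def denormalize(papers, authors, paper_authors):
    """Turn normalized tables into one search document per paper.

    papers:        list of {"id", "title"} rows
    authors:       list of {"id", "name"} rows
    paper_authors: list of (paper_id, author_id) rows from the join table
    """
    author_name = {a["id"]: a["name"] for a in authors}
    names_by_paper = {}
    for paper_id, author_id in paper_authors:
        names_by_paper.setdefault(paper_id, []).append(author_name[author_id])
    return [
        {"paper_id": p["id"], "title": p["title"], "authors": names_by_paper.get(p["id"], [])}
        for p in papers
    ]


papers = [{"id": 1, "title": "Search at scale"}, {"id": 2, "title": "Inverted indexes"}]
authors = [{"id": 10, "name": "Alice"}, {"id": 11, "name": "Bob"}]
rows = [(1, 10), (1, 11), (2, 11)]
docs = denormalize(papers, authors, rows)
```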

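## Understanding analyzers

- An analyzer consists of three parts
  - zero or more character filters
  - exactly one tokenizer
  - zero or more token filters
- An analyzer can be applied at index time or at search time
  - Some analyzers can only be used at search time (for example, ones containing a `synonym_graph` filter)
  - Normally the same analyzer should be used at index time and at search time
  - But a separate search-time analyzer can make sense
    - The terms it produces are then usually stricter
    - For example, at index time `apple` may be split into `a`, `ap`, `app`, ... (edge n-grams)
    - but at search time `appli` stays just `appli`
- Stemming (a token filter)
  - Dictionary-based stemmers give good results but perform poorly
  - `snowball` is commonly used
  - Combine it with `keyword_marker` and similar filters to customize stemming
- Token graphs
  - Only `synonym_graph` and `word_delimiter_graph` produce correct token graphs
  - The variants without the `_graph` suffix do not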
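The three-stage structure and the index-time vs search-time example can be illustrated with a toy Python model. Every function here is hypothetical and only loosely mirrors the real `html_strip` character filter, `whitespace` tokenizer, `lowercase` filter and edge n-gram filter. The edge n-gram case is also the one where the Elasticsearch docs for the `edge_ngram` tokenizer recommend a different search analyzer.

```python
import re


def make_analyzer(char_filters=(), tokenizer=str.split, token_filters=()):
    """Toy analyzer: char filters, then one tokenizer, then token filters, in that order."""
    def analyze(text):
        for cf in char_filters:   # 0..N character filters act on the raw string
            text = cf(text)
        tokens = tokenizer(text)  # exactly one tokenizer
        for tf in token_filters:  # 0..N token filters act on the token list
            tokens = tf(tokens)
        return tokens
    return analyze


def strip_html(text):
    return re.sub(r"<[^>]+>", "", text)


def lowercase(tokens):
    return [t.lower() for t in tokens]


def edge_ngrams(tokens, min_gram=1, max_gram=10):
    """Prefixes of each token, similar in spirit to an edge n-gram filter."""
    out = []
    for t in tokens:
        out.extend(t[:n] for n in range(min_gram, min(max_gram, len(t)) + 1))
    return out


index_analyzer = make_analyzer([strip_html], str.split, [lowercase, edge_ngrams])
search_analyzer = make_analyzer([strip_html], str.split, [lowercase])

indexed_terms = set(index_analyzer("<b>Apple</b>"))  # {"a", "ap", "app", "appl", "apple"}
```

With the stricter search analyzer, `appli` produces only `appli`, which is not among the indexed terms; running the index analyzer on the query instead would produce `a`, `ap`, ... and match far too much.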

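### Testing analyzers

- Using `_analyze` on its own

```
POST _analyze
{
  "analyzer": "whitespace",
  "text": "The quick brown fox."
}
```

- Specifying an analyzer for a field of an index
  - Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/test-analyzer.html
  - Set the field's `type` to `text` and, at the same level, set `analyzer` to the name of the custom analyzer
  - Define the custom analyzer under `settings.analysis.analyzer`
  - To test, call `<index name>/_analyze`
    - specifying `field` and `text` (the index does not need to contain any data)
  - TODO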
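The steps above can be written out as request bodies. A minimal Python sketch that builds them; the index name `my-index-000001`, the field `my_text`, the analyzer name `my_custom_analyzer` and the chosen filters are illustrative assumptions, while `html_strip`, `standard`, `lowercase` and `asciifolding` are built-in components.

```python
import json


def index_with_custom_analyzer(field, analyzer_name):
    """Body for PUT <index>: define a custom analyzer and attach it to a text field."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                # type text and, at the same level, the custom analyzer's name
                field: {"type": "text", "analyzer": analyzer_name}
            }
        },
    }


def analyze_field_body(field, text):
    """Body for GET <index>/_analyze: uses whatever analyzer the field is mapped with."""
    return {"field": field, "text": text}


create = index_with_custom_analyzer("my_text", "my_custom_analyzer")
print("PUT my-index-000001\n" + json.dumps(create, indent=2))
print("GET my-index-000001/_analyze\n"
      + json.dumps(analyze_field_body("my_text", "Is this <b>déjà vu</b>?"), ensure_ascii=False))
```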

## Custom synonym substitution

## Applying an analyzer to an index
